In the world of AI language models, there is no one-size-fits-all solution. Commercial users are increasingly realising that, while very large language models are broadly capable, they can be overkill for many applications.

Often, the penny drops when the bill arrives from a favourite proprietary model or cloud compute provider, or when users ask whether they can even guarantee availability of the A100 and H100 GPU systems needed to run state-of-the-art models.

Instead, many are looking for efficient open-source alternatives to models like GPT-3/4.
Flan-T5
In December 2022, Google released the paper Scaling Instruction-Finetuned Language Models[1], in which they performed extensive finetuning of a range of models (PaLM, T5, U-PaLM) across a broad set of tasks.
In that paper, they released Flan-T5 checkpoints, which "achieve strong few-shot performance" with a comparatively modest parameter count, "even compared to much larger models" such as the largest members of the GPT family.
In this article, we show how to run Flan-T5 on Graphcore IPUs. Flan-T5-Large runs on an IPU-POD4, which you can use for free for six hours via Paperspace's trial, while Flan-T5-XL runs on a paid IPU-POD16.
We will look at a range of common NLP workloads and consider the following questions:
  • Just how good is Flan-T5?
  • How do you run Flan-T5 on IPUs?
  • What can you do with Flan-T5?
  • Why upgrade to Flan-T5-XL?
https://reurl.cc/v75Rre
Just how good is Flan-T5?
Let's start with some performance figures from Google's paper:
Part of Table 5 from the Scaling Instruction-Finetuned Language Models paper[2]
These results are remarkable. Note that:
  • Flan-T5 performs roughly twice as well as T5 on MMLU, BBH and MGSM
  • On TyDiQA we even see the emergence of new abilities
  • Flan-T5-Large is better than all previous T5 variants, even the XXL
This means Flan-T5 is a very capable model that may be quite different from the T5 you know. Now let's see how Flan-T5-Large and Flan-T5-XL compare with other models on the MMLU benchmark:
Part of the MMLU leaderboard from Papers With Code[3] (CoT = chain of thought)
Noting that MMLU was held out from Flan-T5's training, this table shows:
  • Flan-T5-Large and Flan-T5-XL (with 0.8B and 3B parameters respectively) perform similarly to models with significantly more parameters, such as GPT-3 (175B parameters) and Galactica (120B parameters)
  • GPT-3 needs to be finetuned on the benchmark task in order to beat Flan-T5-XL
  • Flan-T5 outperforms the smaller versions of more recent LLMs such as PaLM and LLaMA (while also being several times smaller than them)
How do you run Flan-T5 on IPUs?
You can find the Flan-T5 checkpoints on Hugging Face and, using Graphcore's Hugging Face integration (🤗Optimum Graphcore[4]), easily run Flan-T5 with a standard inference pipeline.
If you already have an existing Hugging Face-based application, trying it out on IPUs is as simple as:
- from transformers import pipeline 
+ from optimum.graphcore import pipeline 
- text_generator = pipeline("text2text-generation", model="google/flan-t5-large") 
+ text_generator = pipeline("text2text-generation", model="google/flan-t5-large", ipu_config="Graphcore/t5-large-ipu") 
text_generator("Please solve the following equation: x^2 - 9 = 0") 
[{'generated_text': '3'}] 
Now, let's define a text generator of our own to use in the rest of this notebook. First, make sure your Python virtual environment has the latest version of 🤗Optimum Graphcore[5] installed:
%pip install "optimum-graphcore>=0.6.1, <0.7.0" 
The location of the cache directories can be configured through environment variables, or directly in the notebook:
import os

executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/")
num_available_ipus = int(os.getenv("NUM_AVAILABLE_IPU", 4))
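This read-the-variable-or-fall-back pattern can be wrapped in a small helper if you configure several values this way. The helper below is purely illustrative and not part of the original notebook:

```python
import os

def env_or_default(name, default, cast=str):
    # Read `name` from the environment, fall back to `default` when it is
    # unset, and cast the result (e.g. int for IPU counts).
    return cast(os.getenv(name, default))

executable_cache_dir = env_or_default("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/")
num_available_ipus = env_or_default("NUM_AVAILABLE_IPU", 4, int)
```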
Next, let's import pipeline from optimum.graphcore and create our Flan-T5 pipeline for the appropriate number of IPUs:
from optimum.graphcore import pipeline

size = {4: "large", 16: "xl"}
flan_t5 = pipeline(
    "text2text-generation",
    model=f"google/flan-t5-{size[num_available_ipus]}",
    ipu_config=f"Graphcore/t5-{size[num_available_ipus]}-ipu",
    max_input_length=896,
)
flan_t5.model.ipu_config.executable_cache_dir = executable_cache_dir
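Note that the size dictionary only has entries for 4 and 16 IPUs, so any other pod size makes the lookup raise a KeyError. A hypothetical guard (our own addition, not from the original notebook) could instead fall back to the largest checkpoint that fits:

```python
def pick_size(num_ipus, sizes={4: "large", 16: "xl"}):
    # Return the largest Flan-T5 configuration that fits on `num_ipus` IPUs.
    fitting = [n for n in sizes if n <= num_ipus]
    if not fitting:
        raise ValueError(f"Need at least {min(sizes)} IPUs, got {num_ipus}")
    return sizes[max(fitting)]
```

With this, pick_size(8) would select "large" rather than failing.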
Now, let's ask it some random questions:

questions = [
    "Solve the following equation for x: x^2 - 9 = 0",
    "At what temperature does nitrogen freeze?",
    "In order to reduce symptoms of asthma such as tightness in the chest, wheezing, and difficulty breathing, what do you recommend?",
    "Which country is home to the tallest mountain in the world?",
]
for out in flan_t5(questions):
    print(out)
Graph compilation: 100%|██████████| 100/100 [05:20<00:00] 

Graph compilation: 100%|██████████| 100/100 [02:56<00:00] 
{'generated_text': '3'} 
{'generated_text': '-32 °C'} 
{'generated_text': 'ibuprofen'} 
{'generated_text': 'nepal'} 
Note that some of these answers may be wrong; retrieving knowledge from the model itself is not what Flan-T5 is for. However, they are less likely to be wrong if you use Flan-T5-XL (view this notebook on an IPU-POD16 to see the difference).
What can you do with Flan-T5?
Flan-T5 has been finetuned on thousands of different tasks across hundreds of datasets, so whatever your task, it is worth seeing if Flan-T5 can meet your requirements. Here we will demonstrate a few common ones:
Sentiment analysis
sentiment_analysis = (
    "Review: It gets too hot, the battery only can last 4 hours. Sentiment: Negative\n"
    "Review: Nice looking phone. Sentiment: Positive\n"
    "Review: Sometimes it freezes and you have to close all the open pages and then reopen where you were. Sentiment: Negative\n"
    "Review: Wasn't that impressed, went back to my old phone. Sentiment:"
)
flan_t5(sentiment_analysis)[0]["generated_text"]
Negative
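The few-shot prompt above is plain string concatenation. If you run this pattern over many reviews, a small helper (hypothetical, not part of the original notebook) can assemble the prompt from labelled examples:

```python
def build_few_shot_prompt(examples, query, input_key="Review", label_key="Sentiment"):
    # examples: list of (text, label) pairs used as in-context demonstrations;
    # query: the unlabelled text the model should produce a label for.
    lines = [f"{input_key}: {text} {label_key}: {label}" for text, label in examples]
    lines.append(f"{input_key}: {query} {label_key}:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("Nice looking phone.", "Positive"),
     ("It gets too hot.", "Negative")],
    "Wasn't that impressed, went back to my old phone.",
)
```

The resulting prompt ends with an unfilled "Sentiment:" slot, which the model completes.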
Advanced named entity recognition
The snippets below are adapted from the Wikipedia page of each company mentioned.
advanced_ner = """Microsoft Corporation is a company that makes computer software and video games. Bill Gates and Paul Allen founded the company in 1975 
[Company]: Microsoft, [Founded]: 1975, [Founders]: Bill Gates, Paul Allen 
Amazon.com, Inc., known as Amazon , is an American online business and cloud computing company. It was founded on July 5, 1994 by Jeff Bezos 
[Company]: Amazon, [Founded]: 1994, [Founders]: Jeff Bezos 
Apple Inc. is a multinational company that makes personal computers, mobile devices, and software. Apple was started in 1976 by Steve Jobs and Steve Wozniak.""" 
flan_t5(advanced_ner)[0]["generated_text"]
[Company]: Apple, [Founded]: 1976, [Founders]: Steve Jobs, Steve Wozniak
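Because the model echoes the [Field]: value format from the demonstrations, its output is easy to post-process. Here is a sketch of a parser for this bracketed format (our own helper, not part of the notebook):

```python
import re

def parse_fields(text):
    # Split "[Company]: Apple, [Founded]: 1976, ..." into a dict keyed by the
    # bracketed field names; values keep internal commas (e.g. founder lists).
    parts = re.split(r"\[([^\]]+)\]:\s*", text)
    # parts[0] is any text before the first field; names and values alternate after.
    return {key: value.rstrip(", ").strip()
            for key, value in zip(parts[1::2], parts[2::2])}

parse_fields("[Company]: Apple, [Founded]: 1976, [Founders]: Steve Jobs, Steve Wozniak")
# → {'Company': 'Apple', 'Founded': '1976', 'Founders': 'Steve Jobs, Steve Wozniak'}
```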
Question answering
The snippet below is taken from the squad[6] dataset.
context = 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.'

question = "Which NFL team represented the AFC at Super Bowl 50?"
# The correct answer is Denver Broncos
flan_t5(f"{context} {question}")[0]['generated_text']
Denver Broncos
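The pipeline was created with max_input_length=896, so very long contexts get truncated. One simple workaround, sketched here under the rough assumption that word count tracks token count, is to split the context into overlapping windows and query each one:

```python
def chunk_words(text, size=200, overlap=50):
    # Split `text` into overlapping word windows so that each
    # context + question pair stays within the pipeline's input limit.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(1, len(words) - overlap), step)]

# Each chunk would then be queried separately, e.g.:
# answers = [flan_t5(f"{chunk} {question}")[0]["generated_text"]
#            for chunk in chunk_words(context)]
```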
Intent classification
intent_classification = """[Text]: I really need to get a gym membership, I'm exhausted. 
[Intent]: get gym membership 
[Text]: What do I need to make a carbonara? 
[Intent]: cook carbonara 
[Text]: I need all these documents sorted and filed by Monday. 
[Intent]:""" 
flan_t5([intent_classification])[0]["generated_text"]
file documents
Summarization
The snippets below are from the xsum[7] dataset.
summarization=""" 
Document: Firstsource Solutions said new staff will be based at its Cardiff Bay site which already employs about 800 people. 
The 300 new jobs include sales and customer service roles working in both inbound and outbound departments. 
The company's sales vice president Kathryn Chivers said: "Firstsource Solutions is delighted to be able to continue to bring new employment to Cardiff." 
Summary: Hundreds of new jobs have been announced for a Cardiff call centre. 
Document: The visitors raced into a three-goal first-half lead at Hampden. 
Weatherson opened the scoring with an unstoppable 15th-minute free-kick, and he made it 2-0 in the 27th minute. 
Matt Flynn made it 3-0 six minutes later with a fine finish. 
Queen's pulled a consolation goal back in stoppage time through John Carter. 
Summary: Peter Weatherson netted a brace as Annan recorded only their second win in eight matches. 
Document: Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday. 
Detectives said three firearms, ammunition and a five-figure sum of money were recovered. 
A 26-year-old man who was arrested and charged appeared at Edinburgh Sheriff Court on Thursday. 
Summary: 
""" 
flan_t5(summarization)[0]["generated_text"]
A man has been arrested after a firearm was found in a property in Edinburgh.

Text classification
text_classification_1 = """A return ticket is better value than a single. 
topic: travel cost 
You can start from the basic stitches, and go from there. 
topic: learning knitting 
The desk which I bought yesterday is very big. 
topic: furniture size 
George Washington was president of the United States from 1789 to 1797. 
topic:""" 
flan_t5(text_classification_1)[0]["generated_text"]
George Washington presidency
text_classification_2 = """FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models - it is an enhanced version of T5 that has been finetuned in a mixture of tasks. 
keywords: released, enhanced, finetuned 
The IPU, or Intelligence Processing Unit, is a highly flexible, easy-to-use parallel processor designed from the ground up for AI workloads. 
keywords: processor, AI 
Paperspace is the platform for AI developers, providing the speed and scale needed to take AI models from concept to production. 
keywords:""" 
flan_t5(text_classification_2)[0]["generated_text"]
paperspace, AI, scale
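Since the model returns the keywords as a single comma-separated string, a one-line post-processing step (our own addition, not from the notebook) turns it into a list:

```python
def parse_keywords(generated_text):
    # Split the model's comma-separated keyword string into a clean list,
    # dropping any empty entries left by stray commas.
    return [kw.strip() for kw in generated_text.split(",") if kw.strip()]

parse_keywords("paperspace, AI, scale")
# → ['paperspace', 'AI', 'scale']
```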
Why upgrade to Flan-T5-XL?
As we saw earlier, the paper's results show that Flan-T5-XL is on average roughly 40% better than Flan-T5-Large across its validation tasks. So when deciding whether Flan-T5-XL is worth its cost, you can ask yourself the following questions:
  • Does my data require better language understanding for the task at hand?
  • Is my task too complex for a model as small as Flan-T5-Large, yet too easy for a model as large as GPT-3?
  • Does my task need the longer output sequences that Flan-T5-XL can generate?

To demonstrate this, let's now look at an example task where the answer to all of the above questions is yes. Say you have a customer-service AI that you use to answer basic questions, in order to reduce the workload of your customer-service staff. This needs:
  • Strong linguistic ability, to parse and generate medium-sized chunks of text
  • An LLM that learns well from context, but does not need all of human history embedded in its parameters
  • The ability to produce multi-sentence responses, but not much longer than that
In the code below, some context about Graphcore is provided in the input, along with the lead-in to a conversational response from the model. From this example, you can see that Flan-T5-XL is able to understand the information provided in the context and give useful, natural answers to the questions asked.
from IPython.display import clear_output 

class ChatBot: 
    def __init__(self, model, context) -> None: 
        self.model = model 
        self.initial_context = context 
        self.context = self.initial_context 
        self.user, self.persona = [x.split(":")[0] for x in context.split("\n")[-2:]] 
    def ask(self, question): 
        question += "." if question[-1] not in [".", "?", "!"] else "" 
        x = f"{self.context}\n{self.user}: {question}\n{self.persona}: " 
        # print(f"\n{x}\n") 
        y = self.model(x) 
        response = y[0]["generated_text"] 
        self.context = f"{x}{response}" 
        return response 
    def session(self): 
        print("Starting session", flush=True) 
        prompt = input() 
        while prompt != "": 
            if prompt == "reset": 
                clear_output() 
                print("Starting session", flush=True) 
                self.context = self.initial_context 
                prompt = input() 
            print(f"{self.user.title()}: {prompt}", flush=True) 
            answer = self.ask(prompt) 
            print(f"{self.persona.title()}: {answer}", flush=True) 
            prompt = input() 
        print("Ending session", flush=True)
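The key design choice in ChatBot is that ask() appends each completed turn to self.context, so the model always sees the full dialogue history. The pattern can be exercised without an IPU by passing any callable that mimics the pipeline's output format, as in this minimal stand-in (ours, purely for illustration):

```python
class MiniChat:
    # Minimal stand-in illustrating the context-accumulation pattern above;
    # `model` is any callable that returns the pipeline's output format:
    # a list holding one {"generated_text": ...} dict.
    def __init__(self, model, context):
        self.model = model
        self.context = context

    def ask(self, question):
        prompt = f"{self.context}\nuser: {question}\nassistant: "
        response = self.model(prompt)[0]["generated_text"]
        self.context = prompt + response  # the history grows every turn
        return response

# A stub "model" that just reports how much history it was given:
stub = lambda p: [{"generated_text": f"({len(p.split())} words of history seen)"}]
bot = MiniChat(stub, "system: be helpful")
bot.ask("hi")
bot.ask("hi again")  # the second prompt contains the whole first exchange
```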
context = f"""This is a conversation between a [customer] and a [virtual assistant]. 
The [virtual assistant] works at Graphcore. Here is some information about Graphcore: 
- Graphcore is located in Bristol. 
- Graphcore invented the intelligence processing unit (IPU). It is purpose built for AI applications. 
- The currently available IPU models are: Classic IPU, Bow IPU, C600. 
- IPUs are available on: Paperspace, Gcore Cloud and Graphcloud. 
[virtual assistant]: Hello, welcome to Graphcore, how can I help you today? 
[customer]: I'd like to ask some questions about your company. 
[virtual assistant]: Ok, I can help you with that.""" 
chatbot = ChatBot(flan_t5, context) 
chatbot.session()
Starting session 

[Customer]: What is an IPU? 
[Virtual Assistant]: The Intelligence Processing Unit (IPU) is a computer chip that is used to process artificial intelligence. 
[Customer]: Who makes it? 
[Virtual Assistant]: Graphcore is the manufacturer of the IPU. 
[Customer]: Can I use them? 
[Virtual Assistant]: Yes, I'm sure you can. 
[Customer]: Where? 
[Virtual Assistant]: The IPU is available on Paperspace, Gcore and Graphcloud. 
Ending session
flan_t5.model.detachFromDevice() 

Conclusion
To sum up, here are the answers to the questions we posed at the start:
Just how good is Flan-T5?
Twice as good as T5 on the MMLU benchmark, and comparable with GPT-3.
How do you run Flan-T5 on IPUs?
Change one import and add one keyword argument to your pipeline instantiation.
What can I do with Flan-T5?
Given the breadth of its finetuning tasks, almost anything.
Why should I upgrade to Flan-T5-XL?
It performs around 40% better than Flan-T5-Large and can handle more demanding tasks.
https://reurl.cc/v75Rre
If you would like to learn more about how we got T5 working properly in float16, check out the second article in this post.
You can also try other versions of T5 on IPUs:
Zero-shot text classification on IPUs using MT5-Large - inference[8]
Machine translation on IPUs using MT5-Small - fine-tuning[9]
Summarization on IPUs using T5-Small - fine-tuning[10]
If you would like to keep exploring NLP on IPUs, check out our GPT-J fine-tuning blog and the corresponding notebook.
[1]https://arxiv.org/abs/2210.11416
[2]https://arxiv.org/abs/2210.11416
[3]https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu
[4]https://github.com/huggingface/optimum-graphcore
[5]https://github.com/huggingface/optimum-graphcore
[6]https://huggingface.co/datasets/squad
[7]https://huggingface.co/datasets/xsum
[8]https://ipu.dev/f8VpDW
[9]https://ipu.dev/lwssDB
[10]https://ipu.dev/UFm0ow
For more Graphcore news, in-depth technical articles, and to connect with other innovators, visit the Graphcore China website at graphcore.cn and follow the Graphcore WeChat, Weibo and Zhihu communities.