Language models like ChatGPT have revolutionized the field of natural language processing, but they still struggle with some basic tasks such as arithmetic and fact-checking. Last Thursday, researchers from Meta revealed Toolformer, an AI language model that can teach itself to use external tools such as search engines, calculators, and calendars without sacrificing its core language modeling abilities.
The key to Toolformer is that it can use APIs (application programming interfaces), sets of protocols that allow different applications to communicate with one another, often in a seamless and automated manner. During training, researchers gave Toolformer a small set of human-written examples demonstrating how each API is used and then allowed it to annotate a large language modeling dataset with potential API calls. It did this in a “self-supervised” way, meaning that it could learn without needing explicit human guidance.
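The text above does not say how candidate API calls are judged during annotation. One plausible criterion, assumed here purely for illustration, is to keep a call only when seeing its result makes the following text noticeably easier for the model to predict. In the minimal sketch below, the `Candidate` fields, the `keep_useful_calls` helper, the threshold value, and the loss numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One proposed API call at some position in a training text (hypothetical)."""
    call: str                # e.g. "Calculator(400 / 1400)"
    loss_plain: float        # model loss on the following tokens, no call inserted
    loss_call_only: float    # loss with the call inserted but its result withheld
    loss_with_result: float  # loss with the call and its result inserted


def keep_useful_calls(candidates: List[Candidate], threshold: float = 1.0) -> List[Candidate]:
    """Keep candidates whose result lowers the loss by at least `threshold`.

    Assumed rule: compare against the better of the two baselines, so a call
    is kept only if its *result* helps, not merely the presence of the call.
    """
    kept = []
    for c in candidates:
        baseline = min(c.loss_plain, c.loss_call_only)
        if baseline - c.loss_with_result >= threshold:
            kept.append(c)
    return kept
```

No human labels appear anywhere in this filter: the only signal is the model's own prediction loss, which is one way to read the “self-supervised” description.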
The model learned to predict each text-based API call as if it were any other form of text. When in operation (generating text in response to human input), it can insert the calls when needed. Moreover, Toolformer can “decide” for itself which tool to use in a given context and how to use it.
This API-calling ability enables Toolformer to use external software tools like search engines, calculators, language translators, and factual references. For example, large language models (LLMs) are well known for not being particularly good at arithmetic. Toolformer can work around that limitation by using a calculator program. Or if someone wanted an LLM-based assistant to add a date to their calendar, Toolformer could handle that task by using an API connection to a calendar app.
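To make the idea of text-based API calls concrete, here is a minimal sketch of how calls embedded in generated text might be detected and filled in with tool results. The bracketed `[Tool(args)]` syntax, the tool names, and the `run_tool_calls` helper are assumptions made for illustration, not Meta's implementation, and the calendar tool is a stand-in that only returns today's date.

```python
import ast
import operator
import re
from datetime import date

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr: str) -> str:
    """Evaluate a basic arithmetic expression without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return f"{ev(ast.parse(expr, mode='eval').body):.2f}"


def current_date(_: str) -> str:
    """Stand-in calendar tool: ignores its argument, returns today's date."""
    return date.today().isoformat()


# Hypothetical tool registry mapping call names to Python functions.
TOOLS = {"Calculator": calculator, "Calendar": current_date}

# Matches calls written as [Name(arguments)] inside the text (assumed syntax).
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")


def run_tool_calls(text: str, tools=TOOLS) -> str:
    """Replace each recognized call with the call plus its result."""
    def fill(match):
        name, arg = match.group(1), match.group(2)
        tool = tools.get(name)
        if tool is None:
            return match.group(0)  # unknown tool: leave the text untouched
        return f"[{name}({arg}) -> {tool(arg)}]"

    return CALL.sub(fill, text)


print(run_tool_calls("400 of 1400 is [Calculator(400 / 1400)] of the total."))
```

In this sketch the language model would only need to emit the bracketed call as ordinary text; the arithmetic itself is delegated to the calculator, which is the workaround described above.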
Toolformer is based on a pre-trained GPT-J model with 6.7 billion parameters. Experiments conducted by the researchers on various tool-using tasks seem to demonstrate that Toolformer achieves far stronger performance than the much larger GPT-3 model, which contains 175 billion parameters.
This isn’t the first time researchers have attempted to make up for limitations in language models. In fact, the recent Bing Chat model making the news this week can perform web searches on its own when needed, and others have attempted integrations with browsers, calculators, and search engines. According to Meta’s researchers, most existing approaches to integrating tools into language models have relied on large amounts of human annotations or have been limited to task-specific settings. In contrast, Toolformer can learn to use a range of tools in a generalized way that does not require specialized training for specific tasks.
With techniques like those found in Toolformer, we’re looking at a potential future where LLMs augmented with the ability to use external apps will, ostensibly, become far more versatile and reliable assistants. But the ability to perform API calls might also increase an LLM’s capability to cause harm to user data (in apps) or create trouble in the outside world (through a web browser or communications tools), abilities that it might accidentally invoke while providing an answer.
https://arstechnica.com/?p=1918021