AI bot evolution: Agents will be the next big thing in artificial intelligence
Jaspreet Bindra

Bill Gates wrote a prescient blog recently on how agents will be the next big thing in software (https://bit.ly/3tSMNkB). In his inimitable style, he explained: “To do any task on a computer, you must tell your device which app to use. You can use Microsoft Word and Google Docs to draft a business proposal, but they can’t help you send an email, share a selfie, analyze data, schedule a party, or buy movie tickets. In the next five years, this will change completely. You won’t have to use different apps for different tasks. You’ll simply tell your device, in everyday language, what you want to do. This type of software—something that responds to natural language and can accomplish many different tasks based on its knowledge of the user—is called an agent.” He went on to predict how agents will upend the software industry and replace apps to become the new platforms we use every day. Big Tech companies and startups have heeded his advice. The first glimpse of an agent-led world came with OpenAI’s GPT Store, which hosts more than three million GPTs; these proto-agents are a peek into how agent stores will replace app stores. Microsoft, OpenAI, and Google are scrambling to develop software that can do complex tasks by itself, with minimal guidance from you. Hence the name agents – they have ‘agency’. Aaron Holmes writes in The Information (https://bit.ly/3U0dpJu) about how Microsoft is building software that can create, send, and track an invoice based on order history. Another agent can “detect a large product order a business customer hasn’t filled, draft an invoice, and ask the business whether it wants to send that invoice to the client who placed the order. From there, the agent could automatically track the customer’s response and payment and log it in the company’s system.” These agents are powered by OpenAI’s GPT-4 and are the next iteration of the Copilots that Microsoft has launched.
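The invoice workflow Holmes describes – detect an unfilled order, draft an invoice, ask the business before sending, then track and log the result – can be sketched in a few lines. This is a hypothetical toy, not Microsoft's implementation; all class and function names are illustrative, and the "ask" step is modelled as a simple approval callback:

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    customer: str
    amount: float
    invoiced: bool = False

@dataclass
class InvoiceAgent:
    """Toy agent: detect unfilled orders, draft invoices, send only
    after human approval, then log the result (the 'copilot' pattern)."""
    log: list = field(default_factory=list)

    def detect_unfilled(self, orders):
        # Find orders that have not yet been invoiced
        return [o for o in orders if not o.invoiced]

    def draft_invoice(self, order):
        return f"Invoice: {order.customer} owes ${order.amount:.2f}"

    def run(self, orders, approve):
        for order in self.detect_unfilled(orders):
            draft = self.draft_invoice(order)
            if approve(draft):          # ask the business before sending
                order.invoiced = True
                self.log.append(draft)  # track and log in the system
        return self.log

orders = [Order("Acme", 1200.0), Order("Globex", 450.0, invoiced=True)]
agent = InvoiceAgent()
sent = agent.run(orders, approve=lambda draft: True)
```

The point of the sketch is the control flow: the agent proposes, a human disposes, and only then does the action execute – which is exactly the boundary between copilot and autopilot discussed later in the piece.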
OpenAI is also busy building agents that could work across different applications at the same time, moving data from a spreadsheet to a PowerPoint slide, for example. Companies are working on more complex agents that could run through multiple applications: create an itinerary and book tickets, accommodation, restaurants, and taxis, for instance. Planning a holiday is an onerous, time-consuming task where you need to work through a gigantic set of choices and apps, taking hours or even days of your time. An empowered agent would know your preferences and quirks from your history and data and could do this in minutes. Another startup that Holmes writes about is Adept, co-founded by ex-Googler Anmol Gulati. Adept’s AI was built using videos of people actually working on their PCs, creating an Excel spreadsheet or a PowerPoint deck. Trained on these very human activities, Adept is building an ‘AI Teammate’ which can do these tasks for you. What is interesting is that the first users of agents would probably be their creators – software developers themselves. Millions of them are already using Microsoft’s GitHub Copilot, which helps them write code better and faster. Agents built into such tools could listen to a problem a developer is facing, suggest some ways to address it, and then write, run, and test the code if the developer wants.
Agents would also create the next class of devices for the post-smartphone era, like the Rabbit R1 and the AI Pin, both of which were unveiled in the last few months. They use GenAI models as their operating system (OS), natural spoken language as the user interface (UI) and, importantly, have rudimentary agents instead of apps. So, for example, you can call an Uber, order food on DoorDash, or play Spotify by just telling your Rabbit R1 to do so. The Large Action Model (LAM), which is built on LLMs, is the OS of the Rabbit, and it becomes your personal voice assistant. The LAM OS uses its long-term memory of you to translate your requests into actionable steps and responses; it comprehends which apps and services you use daily. The LAM can learn to see and act in the world the way humans do. It is early days, but I believe the app-led devices of today will give way to these new agent-led devices.
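In spirit, what the LAM does is map a spoken request, plus its memory of which services you actually use, onto a sequence of actions. A minimal sketch of that dispatch idea, with entirely made-up intent phrases and service names (the real LAM is a learned model, not a lookup table):

```python
# Hypothetical sketch of a LAM-style dispatch loop. The device keeps
# long-term memory of the user's preferred services and maps a
# natural-language request to a list of actionable steps.

user_memory = {
    "ride": "Uber",        # the services this user relies on daily
    "food": "DoorDash",
    "music": "Spotify",
}

intents = {
    "take me": ("ride", "book_ride"),
    "order dinner": ("food", "place_order"),
    "play": ("music", "start_playback"),
}

def plan_actions(request: str) -> list[str]:
    """Translate a request into actionable steps against known services."""
    steps = []
    for phrase, (category, action) in intents.items():
        if phrase in request.lower():
            service = user_memory[category]
            steps.append(f"{action} via {service}")
    return steps

steps = plan_actions("Play some jazz and order dinner")
```

A single sentence here yields two steps, routed to the user's own services – which is why the personalised memory matters as much as the language understanding.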
While all this is super-exciting and novel, there are thorny ethical concerns. So far, in this evolving dance between humans and AI, humans have held on to agency – the power to act. That is why Microsoft calls its software Copilot: it is not an autopilot, acting on its own, nor the pilot, because the human remains the pilot. With agents, we devolve agency to AI – it potentially becomes the autopilot, and perhaps soon the pilot itself. Thus far, we have managed to keep the lid of this particular Pandora’s Box shut; with agents, we just might crack it open.