(Substack Update: I will be on an unplugged vacation for two weeks after this post. Apologies if I am late responding to your comments or DMs)
A few weeks ago, with time on my hands, I sat down to think through what I wanted in an artificially intelligent agent. This might seem quixotic, like wondering about what color I want my flying car to be, but I felt it was a good exercise. What got me thinking about this was an interview with former Google CEO Eric Schmidt.
SCHMIDT: An agent can be understood as a large language model that can learn something new. An example would be that an agent can read all of chemistry, learn something about it, have a bunch of hypotheses about the chemistry, run some tests in a lab and then add that knowledge to what it knows.
These agents are going to be really powerful, and it’s reasonable to expect [in three to four years] that there will be millions of them out there. So, there will be lots and lots of agents running around and available to you.
Right now, for less per month than I make in an hour (in after-tax take-home pay), I have access to GPT-4o, an always-on conversational agent that can write code, search the web, generate images, read documents, and is the best partner for creativity I’ve ever worked with.
I use GPT-4o (and Claude 3.5) all the time, having at least two to three chats per day, often about whatever is on the top of my head, sometimes exploring complicated issues, sometimes working through complex problems, other times brainstorming ideas just for the fun of it. Here are ten discussions I had in just the past month:
Understanding how to estimate Pi using Monte Carlo Methods, with a custom Python script
Exploring the spiritual and religious symbolism in patterned Persian artwork, especially Mīnākārī

Discussed plausible features that sci-fi power armor (think Warhammer 40K Adeptus Astartes space marines, or the powered armor of the Brotherhood of Steel in Fallout) could incorporate to allow their wearers to survive impacts and falls.
Planned out a series of summer math challenges for my numerically inclined son, all with Minecraft-related word problems.
Discussed what factors led broadcast television series to be aired out of their original production order.
Used ChatGPT’s Vision feature to understand the symbols on an interesting fire diamond I saw on the door of a brewery.
Discussed how strategic arms control negotiations work, and the degree of leeway negotiators have to improvise, using case studies from the SALT I talks
Wrote system prompts for the different “Hats” in an LLM-enabled multiagent Six Thinking Hats brainstorming process (Github repo here)
Wrote a Python script to disassemble a graphic image file from a 2D grid into a 1D vector (there are of course Python libraries for just this task, but I needed a very specific kind of output)
Engaged in an extended prompting session to get ChatGPT to generate a children’s story about penguins. Asking the model to critique its own output from different perspectives and revise produced a readable, if bland, final output superior to the original zero-shot generation.
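The Monte Carlo estimate of Pi mentioned in that first discussion works by sampling random points in the unit square and counting how many land inside the quarter circle. A minimal sketch of the idea (not the custom script from the chat):

```python
import random

def estimate_pi(samples: int = 1_000_000) -> float:
    """Estimate Pi by sampling random points in the unit square and
    counting the fraction that fall inside the quarter circle."""
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4, the square has area 1,
    # so the hit fraction approximates pi/4.
    return 4 * inside / samples

print(estimate_pi())  # roughly 3.14, with accuracy improving as samples grow
```

The estimate's error shrinks roughly with the square root of the sample count, which is part of what makes it a nice teaching example.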
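The grid-to-vector task a few items up is, at its core, a row-major flatten. The actual script isn't shown here, and the "very specific kind of output" it needed is unknown, so this is just the generic version for illustration:

```python
def flatten_row_major(grid):
    """Flatten a 2D grid (a list of rows) into a 1D vector,
    reading left to right, top to bottom."""
    return [pixel for row in grid for pixel in row]

grid = [[1, 2],
        [3, 4],
        [5, 6]]
print(flatten_row_major(grid))  # [1, 2, 3, 4, 5, 6]
```

Libraries like NumPy do this in one call (`array.ravel()`), which is presumably what the post means by "there are of course Python libraries for just this task."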
But all of this is just a chatbot: it’s a static system that waits for my input, responds to it, then evaporates back into the digital aether. It doesn’t have much memory - though the Memories feature is neat, and I think a sign of things to come. It isn’t capable of making and executing long term plans. Most critically, it isn’t capable of noticing its own mistakes and adjusting - though it can do so with feedback, either from the user or another instance of GPT-4o checking on its progress at executing a plan. But that could happen very soon, maybe in the next suite of model upgrades, maybe in the one after that. Realistically, that’s 18 to 36 months from now.
What I want an AI agent to be good at
I fully expect that the chat features, the image generation, code writing, and document analysis features of the next round of large-language or multimodal models are only going to get better, if not by as spectacular a leap as ChatGPT 4o is over ChatGPT 3.5.
I further expect even the primitive AI agents to be running pretty much continuously, capable of executing tasks out of my sight and direct supervision. Available in an instant if I call on them, and capable of alerting me to things, or to ask if they need feedback on what they are doing. As good as a full time, in-person virtual assistant? No, but orders of magnitude faster and orders of magnitude cheaper. And it doesn't sleep or take vacations. It would be 80th percentile pizza. Good enough to be worth plunking down a regular monthly fee for.
I’m not sure what I’d pay for a fulltime AI agent (or team of agents - multiagent prompting is very powerful) but here are the capabilities I would need it to have.
Engaging Motivation: If I set a reminder to do something, and something else comes up, I’m unlikely to reset the reminder. This means that I don’t stick to schedules for running, weight training or meditation as often as I would like to. I still do them but not as regularly as I want to.
The first thing I’d want from an AI assistant would be engaging motivation. To not just remind me “it’s time for a run” but know to ask me again half an hour or two hours later, like a companion would, to motivate me to actually do the task. Knowing things about me, it could further motivate me by suggesting a podcast to listen to (which it would listen to as well, so we could discuss it after the workout), or some interesting music to listen to (which it would be capable of broadly understanding and emulating human reactions to, also to discuss after). Even just someone to talk to while I’m working out, to practice a speech or quiz me on something I’m studying, would be valuable.
I’d even want the agent to be persistent, and to argue with me a bit if I put off doing something, asking for reasons. It knows running is good for me, physically and mentally, so it would make efforts to get me running.
Structured Learning: I like learning - a lot. And the things I like learning the most are what I call door openers, things you learn that expand what you can think about, give you more concepts and analogies to work with, help you be more creative and solve problems better. And, perhaps most importantly, give you new and better questions. They are also the kind of things you need to actually practice: to do the exercises, study the flashcards, write the reaction essays. Similar to Engaging Motivation, I’d like my assistant to act as a teacher and keep me engaged with learning new skills, persisting through difficulties, and pointing out to me where I go wrong on problem sets, or where my understanding is still lacking. To come out of the blue, while I’m exercising or eating lunch, to ask me an unexpected question about my learning, and give me feedback on the quality of my answers. And if I get an answer wrong, to make sure to ask me again soon to cement the links in my brain.
Creative Assistance: I want not just assistance with brainstorming, but an agent to run focused brainstorming sessions, providing suggestions, improvements, and act as a first reader for creative projects. ChatGPT already is very capable at this, but the interactions could be smoother and more natural, less dependent on my careful prompting or Python scripting.
Progress Tracking and Adaptation: In my working life I have seen many ambitious and expensive progress tracking systems go down in flames, or die deaths by quiet neglect. They start with users eagerly updating their progress every day, then completely lose interest when it's apparent no one is reading the updates. Which is just fine until someone needs to know what Josh did two months ago, before he left for a new job. In both my personal and professional life, I want AI agents to take over progress tracking. Just look at the things people are doing, have quick five-minute chats with them each day, and write the progress updates in a way other people can parse and understand.
I've seen things you people wouldn't believe... Gantt charts catching fire in the midst of a product launch... I watched team member updates vanish in the dark of a forgotten sprint. Developers driven to madness when it became their fulltime job to get other people to update their work logs. All those moments of planning and promises will be lost in time, like post-it notes in a thunderstorm...
Roy Batty’s “Tears in the Rain” speech from Blade Runner, rewritten by GPT-4o
Social Interaction Reminder: Prompt me to interact with friends and family, remind me to send messages, and suggest meet-up times based on mutual availability. I'm bad at this: I get ultra focused on what I’m thinking about, or family events occupy all my free attention, and I don't see my friends enough.
Meal Planning: I like to cook, I don’t like to plan meals, and I am indifferent to grocery shopping. I would want an agent to take account of my family’s food preferences, then plan out a meal plan that pleases each person on at least a few nights of the week. To plan lunches for my kids - who then might eat all of their lunches out of sheer perversity of knowing a robot planned it for them. To know that I favor making large meals on the weekend and eating leftovers on weeknights to save time. And to add new meals not too dissimilar to what my family already enjoys eating, to give us new food experiences.
Also to handle the ordering of the groceries, checking the prices at all the local grocers and optimizing the most cost-efficient purchasing and delivery.
Supervisory Role: If I have an AI agent, and it works, I’m going to want personal tutor agents for my children, whom I expect will be in their early teens when these systems are available. And those agents are going to work for me and be supervised by my agent. That includes setting limits on screen time and keeping the kids engaged in learning activities, keeping me up to date on their school schedules, and being on top of booking first-come, first-served extracurricular activities. I wouldn't want the agents to spy on my kids, just be alert for dangerous things: signs of depression or anxiety, worrying social interactions. Another way to put this is: I don't want transcripts of their online games with friends, but, like a good psychologist or youth counsellor, for the system to alert my wife and me if they are showing signs of anxiety or depression. I like to think I would notice that, but kids are more often willing to open up to people who are not their parents. The tutor would be a simulacrum of a person, but the interaction the kids would have with it would be similar.
Inbox Sorter: I don’t receive many e-mails - mostly because I allow only a few of the services I use to send me e-mails. That being said, I would still like to not bother looking at my emails at all, and to have an agent create a digest of them for me. I’d especially appreciate one that would autocategorize all of my e-mails by task: pay this invoice, respond to this invitation, read this new post by a Substack writer you like.
Dark Flow Detection: The term ‘Dark Flow’ comes from Natasha Dow Schull’s insightful study of video slot machine gambling, Addiction by Design: Machine Gambling in Las Vegas. It’s the antithesis of the Flow state, which is an integrative, joyful, self-developing state of balanced hyperfocus in an activity. Instead of engagement, the purpose of the activity is to zone out, to suppress one's feelings of sadness or isolation or loneliness by engaging with a mindless activity for long periods of time, coming out of the state no better than when one went into it. If a machine is going to see all of my internet activity, I want it to be able to identify unhelpful and dissipative activities like idly watching YouTube videos for extended periods, or random web surfing. And I want it to prompt me to do something else: to play an engaging game, to discuss something, to go for a walk, or go talk to someone. Anything other than persisting in a state of wasting away.
But would such a product sell?
This is pure speculation on my part, but I don’t get the sense most human beings could make use of a personal assistant, even if they were given one for free. That is to say, as with our current LLMs, the learning curve would be steep. It requires effort and ability to abstract oneself from one’s tasks and see them as bundles of activities, which can be either done by yourself or outsourced to someone else. The first people to benefit from such systems, and thus the first market for them, are people already used to unbundling tasks and assigning them to others: managers. The systems would be able to help - first task: “ButlerGPT, analyze my day and suggest activities where you could help me out by either assisting with or handling tasks yourself.” But it's still a leap to think through what a system could take over for you.
Which is why I’m not sure that AI agents as a consumer product make a lot of sense, at least not at first. That is, I think they will possess the capability to be immensely helpful, but whether that translates into people being able to use them, and thus whether it makes sense as a market to go into, is not clear. Unless the UX is massively improved over ChatGPT, I don’t think the first versions of ButlerGPT stand much of a chance.
I certainly would want one, but I’m the kind of person willing to spend 10+ hours learning a new technology that I think has potential in order to get benefits out of it. Seriously, unless you have young kids, you don’t understand what a pain in the ass it is to plan school meals, purchase the ingredients, pack them, and have them come home barely eaten. I’d pay a little just for something to handle the planning and purchasing part.