I gave in to the hype around openclaw and gave it a go. I had stayed away from it because I knew it was 400k lines of code, and that's a huge red flag for something I'm supposed to entrust with my money. Turns out I was right, but it led me to try some other products1 and in the process I gradually learned how these things work. They are really ridiculously simple; I don't know how openclaw is taken seriously. It's obviously mega slop.
An AI agent is just an interpreter for an LLM: it works out what effect the LLM would like to have on the world, goes ahead and applies that effect, and tells the LLM what the result was. As the user of an agent you're just another tool2 from its perspective. The only difference between you and its other tools is that it listens to you all the time. Its other tools are only heard after being invoked.
Despite how impressive agents make LLMs look, an LLM is still just a function that takes in text and puts out text. An agent is just something that tells that function about a list of spells, and when the LLM emits one of those magic spells the agent casts it. This is all Claude Code is, and it's also all openclaw is. The difference is that openclaw has been given a longer list of spells. One of them is 'cron', which makes it appear as if the agent has come to life. All it's really doing, though, is scheduling tasks for itself.
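To make the point about how simple this is, here's a toy version of that loop. Everything here is illustrative: `llm` is a stand-in for the text-in, text-out function, the `CALL name {...}` incantation format is made up, and `schedule_task` is a crude stand-in for a 'cron' spell.

```python
import json

scheduled = []  # tasks the agent has queued for itself

def schedule_task(delay_seconds, prompt):
    """A stand-in for the 'cron' spell: the agent schedules a prompt
    to be fed back to itself later."""
    scheduled.append((delay_seconds, prompt))
    return f"scheduled in {delay_seconds}s"

def read_file(path):
    """An ordinary spell: read a file and hand the contents back."""
    with open(path) as f:
        return f.read()

# The "list of spells" the LLM is told about.
TOOLS = {"schedule_task": schedule_task, "read_file": read_file}

def agent_step(llm, context):
    """One turn: ask the LLM, and if it emits a spell, cast it and
    tell the LLM what the result was."""
    output = llm(context)
    if output.startswith("CALL "):  # our made-up incantation format
        name, raw_args = output[5:].split(" ", 1)
        result = TOOLS[name](**json.loads(raw_args))
        return context + output + f"\nRESULT: {result}\n"
    return context + output  # plain text, just append it
```

An agent is this function run in a loop, with the accumulated `context` carried forward each turn.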
Actually, there is one more thing agents add to LLMs, and that's short and long term memory. Long term memory has become known as skills. Skills are just collections of files along with a brief note about when to use them. They enable agents to, for example, build themselves a dataset of colour schemes for designing GUIs, or a dataset of suppliers that saves them time when estimating the cost of a project. Humans can manually edit these long term memories if they like, and in doing so we are teaching our agents how to do things more efficiently or just more to our liking.
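A sketch of what loading skills might look like, under the layout just described: a directory per skill, each holding its files plus a short note on when to use it. The `NOTE.md` filename is my own assumption, not any real convention.

```python
from pathlib import Path

def load_skills(skills_dir):
    """Collect each skill's 'when to use' note, so the agent can show
    the LLM a menu of its long term memories."""
    skills = {}
    for skill in Path(skills_dir).iterdir():
        note = skill / "NOTE.md"  # hypothetical filename for the brief note
        if note.exists():
            skills[skill.name] = note.read_text().strip()
    return skills
```

The notes go into the LLM's context; the rest of each skill's files are only read in when the note says they're relevant.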
The implementation of short term memory is more interesting though. Conceptually it's just a concatenation of all previous input and output of the LLM. This would quickly generate context (LLM input) larger than the LLM can make use of, so occasionally the agent will ask the LLM to summarise its recent activity; the older activity can then be forgotten, and context accumulates from there as per usual. This alone gives the agent short term memory. However, there is a further improvement that can be made, which is to abstract the user's input before storing it in memory. "So find me a book" and "Look for a book" become the same thing, and "google this book" in its abstract form would be closer to both than "order me a sandwich". This enables the agent to find memories that are older than the last compaction but highly relevant to the current query, and therefore behave as if it has a much larger context than it actually has.
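The two ideas above can be sketched together. This assumes two hypothetical functions: `llm`, which summarises text, and `embed`, which maps text to a vector so that paraphrases like "find me a book" and "look for a book" land close together; real agents would use an embedding model here, but any such mapping works for the sketch.

```python
import math

def cosine(a, b):
    """Similarity between two vectors; -1 if either carries no signal."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return -1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

class Memory:
    def __init__(self, llm, embed, limit=20):
        self.llm, self.embed, self.limit = llm, embed, limit
        self.recent = []   # raw turns since the last compaction
        self.archive = []  # (abstract vector, original text) pairs

    def add(self, text):
        self.recent.append(text)
        if len(self.recent) > self.limit:
            # Compaction: summarise recent activity, keep only the summary
            # in context, and file the originals away in abstract form.
            summary = self.llm("Summarise:\n" + "\n".join(self.recent))
            self.archive += [(self.embed(t), t) for t in self.recent]
            self.recent = [summary]

    def recall(self, query, k=3):
        """Fetch archived turns whose abstract form is close to the query,
        even if they predate the last compaction."""
        q = self.embed(query)
        ranked = sorted(self.archive, key=lambda p: cosine(p[0], q),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```

The agent's working context is then `recent` plus whatever `recall` pulls back for the current query.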
The best project I'm aware of is zeptoclaw. It's much simpler than openclaw, but being Rust it's not really human readable.
2. Tool is the word the industry has given to the things the interpreter invokes on behalf of the LLM.