We are excited to release Council - an open-source framework for the rapid development and robust deployment of customized generative AI applications using teams of ‘agents.’
Council extends the Large Language Model (LLM) ecosystem by enabling advanced control and scalable oversight for AI applications built from these agents. The agents are designed to have limited autonomy: they can act on a task within a budget and under human supervision. Users can leverage this approach to create sophisticated agents with predictable behavior, which makes it possible to route work among agents automatically and to compare, evaluate, and select the best results for a (sub-)task.
In this post, we describe our vision for Council, how we see AI models evolving and what that means for agents, and why we believe an important direction for the future is to build a community around ‘teams of cooperating agents’ that we can trust and oversee.
AI’s Innovation (and Underlying Risks)
Recent advances in LLMs like GPT-4, Llama 2, and Claude 2 have unlocked impressive new capabilities, integrating superhuman knowledge with reasoning, code generation, and analysis across countless industries. There are tremendous opportunities to empower knowledge workers like researchers, analysts, and product managers by building systems for specialized tasks that use these models as modular components (i.e., agents).
For example, an AI assistant for analyzing user churn across nearly any kind of product can use one agent to query databases, another to clean and prepare data, a third to generate hypotheses and perform statistical analysis, and a fourth to generate visual or written insights that can help inform strategies.
However, there are also risks if such systems are deployed irresponsibly. They can hallucinate, interpret prompts incorrectly, and respond inconsistently. Simplistic schemes for integrating them into applications can cause errors to compound — and real-world applications built without oversight can codify harmful biases, gain unfounded user trust, or act without regard for common sense constraints.
With so many opportunities (but also risks and challenges) ahead as LLMs come to power the future of AI, our industry needs robust, all-in-one solutions to problems that are increasingly present and potentially dangerous.
At the intersection of AI innovation and our ability to control outcomes is the potential of carefully built collaborating agents. These agents help balance the broad knowledge and vast potential of LLMs against the need to integrate human oversight, so that applications can minimize risk, adhere to budgetary constraints, and confidently complete tasks.
Our proposed solution, Council, makes this kind of control flow possible without limiting the AI outputs that bring so much value to builders and application users. Through it, we are preparing (and empowering) AI for a future where innovation never has to sacrifice safety and predictability. Let's look at where AI stands today to see how Council can get us to where we need to go (and can go).
AI Development Trajectory — What We Expect
Today’s LLMs are trained on broad datasets of over a trillion words to acquire general knowledge about our world. Through self-supervised learning, they develop superhuman knowledge with powerful capabilities in reasoning, conversational understanding, and language generation.
There has been a rapid increase in the amount of additional information that LLMs can integrate. Research advances have been deployed to allow prompting with more input (such as GPT-4 supporting 32,000 tokens of input and Claude supporting 100,000), and there is rapid progress on retrieval-augmented generation, which uses specialized search techniques to find relevant information that gives context to tasks.
So what do we see coming to the rapidly expanding LLM space? To name just a few innovations on the horizon:
- Rapid progress in other dimensions for LLMs as they become multi-modal foundation models (with support for video, images, and sound).
- Significant improvements in planning and reasoning, as well as integrating and updating external knowledge.
- Continued optimization to allow faster and lower cost inference.
These optimizations will come from model compression techniques like pruning, quantization, and distillation, as well as increasingly efficient accelerator chips and other optimization techniques. Such advances will enable practical applications to leverage much more iterative generation, exploration of alternatives, and feedback.
Investment management firm Ark Invest projects that the cost of inference for LLMs will decline more than 10,000-fold, following Wright’s Law. As the cost of inference drops dramatically, we believe there will be tremendous value in leveraging the power of AI models in concert to do more sophisticated tasks with more autonomy.
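For readers unfamiliar with Wright’s Law, it is commonly written as a power law in cumulative production. The notation below is the standard textbook form, not something taken from the Ark Invest projection, and the rate used in the last line (cost halving with every doubling) is purely an illustrative assumption. Here C(x) is the unit cost after x cumulative units and C_1 is the cost of the first unit.

```latex
C(x) = C_1 \, x^{-b}
\qquad\Longrightarrow\qquad
\frac{C(2x)}{C(x)} = 2^{-b}

% A 10{,}000-fold decline after n doublings of cumulative production:
2^{-b n} = 10^{-4}
\quad\Longrightarrow\quad
n = \frac{4 \log_2 10}{b} \approx \frac{13.3}{b}

% Illustrative assumption: b = 1 (cost halves with every doubling) gives n \approx 13.3.
```

In other words, under that assumed rate a 10,000-fold decline corresponds to roughly thirteen doublings of cumulative production. A slower learning rate would require proportionally more.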
Safely Unlocking the Potential of AI
LLMs show promise for complex reasoning and planning when prompted systematically. Techniques like Chain-of-Thought, Self-Consistency, and Multi-Chain Reasoning elicit substantially more coherent behavior from LLMs. As a logical (and critical) next step, we see tremendous promise in framing tasks in terms of how a team of ‘agents’ with complementary skills can collaborate to achieve results. We have observed that LLMs produce significantly better plans for collaborating teams compared to plans they produce for a single generalist agent.
However, thoughtlessly applying these models carries high risks. Techniques like self-critique, constitutional AI, and reinforcement learning from human feedback show promise for improving model behavior, both during training and at inference time. For example, an agent can learn to identify and mitigate issues like bias by soliciting user critiques on its outputs.
Architecting overarching control flow is essential to steering advanced reasoning capabilities towards safe outcomes for developers and users of applications. Interaction design is key for providing transparency without overwhelming users. Continuously measuring and optimizing output quality also relies on a repertoire of validation techniques.
As foundation models continue to improve in reasoning quality, integrating more advanced search techniques will become increasingly valuable. For example, Monte Carlo tree search allows an agent to explore alternative plans under computational budgets and constraints. To make this efficient, LLMs can be used to estimate probabilities for different options with upper confidence bounds (much as AlphaZero does in searching a game tree). For more complex tasks, we expect that generating alternative plans recursively in this way will greatly improve result quality.
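To make the upper-confidence-bound idea concrete, here is a minimal sketch of a PUCT-style selection rule, in the spirit of AlphaZero, applied to a single level of candidate plans. It is not Council code. The `Node` class, the `search` function, the `c_puct` constant, and the toy priors and rewards are all hypothetical stand-ins. In a real system the priors would come from an LLM and the rewards from an evaluator, and the search would recurse into sub-plans rather than stop at one level.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Node:
    prior: float              # probability a (hypothetical) LLM assigned to this option
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Mean reward so far plus a prior-weighted exploration bonus
    # that shrinks as the child is visited more often.
    bonus = c_puct * child.prior * math.sqrt(parent.visits + 1) / (1 + child.visits)
    return child.mean_value() + bonus


def search(root: Node, evaluate: Callable[[str], float], budget: int) -> str:
    # Spend a fixed evaluation budget, always trying the highest-scoring option.
    for _ in range(budget):
        name = max(root.children, key=lambda n: puct_score(root, root.children[n]))
        child = root.children[name]
        child.value_sum += evaluate(name)
        child.visits += 1
        root.visits += 1
    # Return the most-visited option.
    return max(root.children, key=lambda n: root.children[n].visits)


# Toy usage: priors and rewards are made-up numbers, not model outputs.
rewards = {"plan_a": 0.2, "plan_b": 0.9, "plan_c": 0.5}
root = Node(prior=1.0, children={
    "plan_a": Node(prior=0.5),
    "plan_b": Node(prior=0.3),
    "plan_c": Node(prior=0.2),
})
best = search(root, lambda name: rewards[name], budget=50)
```

The `budget` argument is what ties the search to a computational constraint: the loop stops after a fixed number of evaluations regardless of how many options remain unexplored.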
For example, if a manager needs market data for a direct-to-consumer food delivery service, they can use an application that taps Council's Research Manager and SQL Engineer to work together to generate analytical insights. The Research Manager formulates hypotheses, and the SQL Engineer collects relevant data through queries. The Research Manager can then perform statistical analyses to support or refute the hypotheses, resulting in actionable insights for the manager.
This process applies to various industries, including technical support, where multiple agents with different expertise collaborate to enhance efficiency and customer experience using data such as configuration, logs, code, and support ticket history. Council offers developers the ability to build applications that use AI with confidence and control over accurate outcomes.
Defense in Depth
To enable this potential, it’s important to build applications in a robust, predictable manner. There isn’t a single technique to achieve this. Instead, a resilient approach requires defense in depth with complementary mitigations. LLMs already use techniques such as Reinforcement Learning from Human Feedback to filter inappropriate or dangerous content. Because agents interact through natural language, it is natural to include automated evaluation and revision of output to improve quality. For example, it is possible to apply Constitutional AI to generate critiques of agent output and to revise them (in an analogous manner to how it was used at training time). Similarly, approaches like CRITIC involve agents critiquing and revising output with the use of tools.
In addition, because agents interact through natural language, their exchanges are open to human review and oversight. For example, agents can solicit user input on ambiguous situations to provide key guidance.
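The critique-and-revise pattern described above can be sketched as a small loop. This is an illustrative outline only, not Council's implementation or the published Constitutional AI or CRITIC procedures. The function name `critique_and_revise`, its parameters, and the toy critic and reviser are all hypothetical. In practice the `critique` callable could be an LLM prompted with a set of principles, a tool-backed checker, or a human reviewer.

```python
from typing import Callable, List, Optional, Tuple


def critique_and_revise(
    draft: str,
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Revise a draft until the critic has no objection or rounds run out."""
    history = [draft]
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:          # the critic found nothing to fix
            break
        draft = revise(draft, feedback)
        history.append(draft)
    return draft, history


# Toy stand-ins for an LLM critic and reviser (assumptions for illustration).
def toy_critique(text: str) -> Optional[str]:
    return "Avoid overclaiming." if "guaranteed" in text else None


def toy_revise(text: str, feedback: str) -> str:
    return text.replace("guaranteed", "likely")


final, history = critique_and_revise("Profit is guaranteed.", toy_critique, toy_revise)
```

Capping the loop with `max_rounds` keeps the revision step inside a fixed budget, and returning the full `history` leaves a trail that a human can review afterwards.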
Council intends to allow for a flexible approach to quality management for agents. This encompasses automated generation of tests and automated evaluation of result quality. However, it also needs to include facilities to integrate user and rater feedback. We believe that the gold standard for building high-quality LLM systems is the approach to ratings that leading search engines have created (e.g., an overview of Google’s approach).
Council intends to improve control and predictability by emphasizing compartmentalization between components, measurable metrics as feedback signals, and continuous incremental improvements leveraging research in orchestration as well as underlying model improvements.
With all the options for how agents can interact and how to best produce results, the best path to take is hard to discern. LLMs have tremendous knowledge and flexibility, so agents should take advantage of those capabilities. But it is still important to allow human oversight, to allow for budget constraints and to have the control to let agents reliably complete known tasks.
To balance these forces, we believe that control flow is one of the most important aspects of agent design and that controllers that can combine the power of LLMs, search, and constraints will be an area of rapid evolution. So Council makes a Controller a first-class object to allow experimentation and reuse of effective control flow techniques.
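As a rough illustration of what treating a controller as a first-class object could look like, here is a minimal sketch of a controller that routes a task to several agents, scores their answers, and stops when a call budget runs out. Every name here (`Budget`, `SketchController`, `route`, `score`) is hypothetical and chosen for this example. None of it is Council's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Agent = Callable[[str], str]


@dataclass
class Budget:
    remaining_calls: int

    def spend(self) -> bool:
        # Consume one agent call if any are left.
        if self.remaining_calls <= 0:
            return False
        self.remaining_calls -= 1
        return True


class SketchController:
    def __init__(
        self,
        agents: Dict[str, Agent],
        route: Callable[[str, List[str]], List[str]],
        score: Callable[[str, str], float],
    ) -> None:
        self.agents = agents
        self.route = route    # decides which agents to try, and in what order
        self.score = score    # evaluates an answer to the task

    def run(self, task: str, budget: Budget) -> Optional[Tuple[str, str, float]]:
        best: Optional[Tuple[str, str, float]] = None
        for name in self.route(task, list(self.agents)):
            if not budget.spend():
                break                      # budget exhausted: stop early
            answer = self.agents[name](task)
            value = self.score(task, answer)
            if best is None or value > best[2]:
                best = (name, answer, value)
        return best


# Toy agents, router, and evaluator (assumptions for illustration).
controller = SketchController(
    agents={"echo": lambda t: t, "short": lambda t: t[:5], "upper": lambda t: t.upper()},
    route=lambda task, names: sorted(names),
    score=lambda task, answer: 1.0 if answer.isupper() else 0.0,
)
result = controller.run("churn report", Budget(remaining_calls=3))
```

Because routing, scoring, and budgeting are passed in as separate pieces, each can be swapped out and tested on its own, which is the kind of experimentation and reuse the paragraph above has in mind.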
A Stronger (and Safer) Future for AI, Empowered by Council
The staggering recent progress in AI represents an unprecedented opportunity to generate insights, automate rote work and empower knowledge workers. However, it is important to employ these increasingly powerful models responsibly to ensure safety.
By combining techniques like flexible control architectures, defense in depth, and quality metrics, Council aims to realize the promise of AI for building specialized applications that you can deploy with confidence. We are eager to foster a community to advance techniques that harness the promise of AI for collaborating agents while continually improving safety and predictability. We would welcome you to join the Council community.
Stay connected to ChainML on Twitter.