Usage#
The following sections provide an overview of how to use KodeAgent to create and run intelligent agents capable of performing various tasks using LLMs and tools.
ReAct and CodeAct Agents#
Using KodeAgent, you can create a ReAct agent and run a task like this:
from kodeagent import ReActAgent, print_response, extract_as_markdown, search_web

agent = ReActAgent(
    name='Web agent',
    model_name='gemini/gemini-2.5-flash-lite',
    tools=[search_web, extract_as_markdown],
    max_iterations=7,  # This parameter is being deprecated; pass it to run() instead
)

for task in [
    'What are the festivals in Paris? How they differ from Kolkata?',
]:
    print(f'User: {task}')

    async for response in agent.run(task):
        print_response(response, only_final=True)
The print_response function displays the agent’s responses using a rich format. Setting only_final=True ensures that only the final response is printed. Otherwise, the intermediate streaming responses from the agent are also shown.
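The snippets on this page use a top-level async for, which works in notebooks and async REPLs. In a plain Python script the loop has to live inside a coroutine driven by asyncio.run(). The following is a minimal sketch of such a wrapper. To keep it runnable without KodeAgent or an API key, fake_run is a hypothetical stand-in for agent.run() that only mimics the dictionary-like responses shown elsewhere on this page; collect_final is likewise an illustrative helper, not part of the library.

```python
import asyncio
from typing import AsyncIterator, Optional


async def fake_run(task: str) -> AsyncIterator[dict]:
    """Hypothetical stand-in for agent.run(); yields streamed responses."""
    yield {'type': 'log', 'value': f'Working on: {task}'}
    yield {'type': 'final', 'value': 'done'}


async def collect_final(task: str) -> Optional[str]:
    """Consume the whole stream and keep only the final response, if any."""
    final = None
    async for response in fake_run(task):
        if response['type'] == 'final':
            final = response['value']
    return final


def main() -> None:
    # asyncio.run() creates the event loop that the async generator needs.
    print(asyncio.run(collect_final('demo task')))


if __name__ == '__main__':
    main()
```

In a real script, the body of collect_final would iterate over agent.run(task) instead of the stand-in.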
Tasks can also be run with input files. The files can be local files, remote files, or URLs. Run a task with files in this way:
async for response in agent.run(
    task='Caption these images',
    files=[  # Always a list of files
        '/home/user/image1.jpg',
        'http://example.com/image2.jpg',
    ],
    max_iterations=5,
):
    print_response(response, only_final=True)
You can also create a CodeAct agent:
from kodeagent import CodeActAgent, search_web, extract_as_markdown

agent = CodeActAgent(
    name='Web agent',
    model_name='gemini/gemini-2.5-flash-lite',
    tools=[search_web, extract_as_markdown],
    run_env='host',
    max_iterations=7,
    allowed_imports=['re', 'requests', 'duckduckgo_search', 'markitdown'],
    pip_packages='ddgs~=9.5.2;"markitdown[all]";',
)
The run_env parameter specifies the environment where the agent’s code will execute. Setting it to 'host' runs the code directly on the host machine. Setting it to 'e2b' runs the code in an E2B sandbox instead.
Function Calling Agent (SLM Optimized)#
The FunctionCallingAgent (FCA) is specifically designed and optimized for Small Language Models (SLMs). It uses the model’s native function-calling capabilities instead of the more complex ReAct or CodeAct reasoning loops.
While its internal implementation is specialized for SLMs, it maintains the same run() API as ReActAgent and CodeActAgent, making it easy to swap agents according to your model choice. Some features, such as observability, are not yet supported by the function-calling agent.
from kodeagent import FunctionCallingAgent, search_web, calculator, print_response

# Optimized for models like Qwen, Granite, Phi, or smaller local models
agent = FunctionCallingAgent(
    model_name='ollama/qwen3:4b-instruct-2507-fp16',
    tools=[search_web, calculator],
    litellm_params={'temperature': 0, 'timeout': 90},  # Required for robust tool use with SLMs
)

async for response in agent.run('Calculate the market cap of Apple'):
    print_response(response, only_final=True)
Key Features#
Lightweight: Minimal overhead, ideal for models with at least 4B parameters.
Native Tooling: Leverages the model’s built-in tool-calling interface.
Final Answer Tool: Automatically includes a final_answer pseudo-tool to ensure structured outputs from smaller models.
Nudge Mechanism: Includes built-in loop detection and “nudges” to prevent smaller models from getting stuck.
Graceful Fallback: If the model finishes without a final answer tool call, the agent uses a summary step to produce a clear final response.
The function-calling agent may not work with every SLM. Our experiments indicate that it works well with models having 4B parameters or more. When possible, use q8 or higher quantization. In particular, FunctionCallingAgent has been tested with:
qwen3:8b-q8_0
qwen3:4b-instruct-2507-fp16
granite4:7b-a1b-h
phi4-mini:3.8b-q8_0
ministral-3:8b-instruct-2512-q8_0
Try out this Colab notebook to see it in action.
The agent may not work with reasoning/thinking models. They often produce answers without any tool calls.
Choosing the Right Agent#
KodeAgent provides three types of agents. Choosing the right one depends on your target model and the task’s complexity.
| Agent | Best For | Model Type | Notes |
|---|---|---|---|
| FunctionCallingAgent | General tasks, orchestration | SLMs (≥ 4B params) | Optimized for smaller models, such as Qwen, Granite, or Phi. |
| ReActAgent | Search, simple reasoning | Frontier LLMs | Uses standard Thought-Action-Observation loop. |
| CodeActAgent | Data heavy, algorithmic | Frontier LLMs | Most powerful; writes and executes Python code. |
When to use which?#
Model Size: If you are running small local models (SLMs), use FunctionCallingAgent. It is lightweight and designed to work within the constraints of models with fewer parameters. Try 8-bit or higher quantization for better performance.
Task Complexity: For tasks involving complex data manipulation, algorithmic problem solving, or multi-step calculations, CodeActAgent is usually more robust than ReActAgent.
Safety & Sandboxing: Because CodeActAgent executes arbitrary Python code, it is highly recommended to run it in a sandboxed environment (e.g., run_env='e2b') rather than the host machine.
LLM Performance: Frontier models like Gemini 3.0 or GPT-5 perform exceptionally well with both ReActAgent and CodeActAgent.
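The guidance above can be condensed into a small decision helper. This is only a sketch of the rules of thumb listed here, not a KodeAgent API: suggest_agent and its two boolean parameters are hypothetical names, and real choices may depend on more factors than these two.

```python
def suggest_agent(small_model: bool, data_heavy: bool) -> str:
    """Return the agent class name suggested by the rules of thumb above."""
    if small_model:
        # Small local models: use native function calling.
        return 'FunctionCallingAgent'
    if data_heavy:
        # Complex data manipulation or algorithmic work: write and run code.
        return 'CodeActAgent'
    # Search and simple reasoning with a frontier model.
    return 'ReActAgent'


print(suggest_agent(small_model=True, data_heavy=False))
print(suggest_agent(small_model=False, data_heavy=True))
print(suggest_agent(small_model=False, data_heavy=False))
```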
Task Result and State#
A task can be executed only by calling the run() method with a task description and, optionally, files. The run() method streams responses from the task execution, which you iterate over asynchronously. When you only need the final response, read it from agent.task.result after the stream is exhausted:
async for _ in agent.run('Some task description'):
    pass

final_response = agent.task.result
print(final_response)
ⓘ NOTE
The result is available only when the agent has found the final answer. Otherwise, it will be None. An agent can fail to find a final answer for several reasons, e.g., a timeout, an error in the code, or reaching the maximum number of iterations. See the next section for more details on how to check if the final answer has been found.
agent.task provides access to the current task object, which contains useful information such as the task inputs, the final result, and LLM usage statistics. Refer to the Task model’s API documentation for more details.
The agent also tracks the LLM usage statistics (reported by LiteLLM) for each task, detailing the component-wise token usage and the cost in USD. This can be accessed in two ways: raw data (dict) and formatted report (str), for example:
# Access raw LLM usage data
llm_usage_data = agent.get_usage_metrics()
# Print formatted LLM usage report
llm_usage_report = agent.get_usage_report()
print(llm_usage_report)
You can also view the plan followed by the agent to complete the task:
print(agent.current_plan)
Memory and State Management#
KodeAgent is designed to be minimalistic and memoryless across tasks by default. Each call to run() is independent, and the agent does not retain conversation history or results from previous tasks.
However, you have two ways to manage state across runs if needed.
1. Recurrent Mode (Single-Task Context)#
You can enable Recurrent Mode by passing recurrent_mode=True to the run() method. When enabled, the current task description is automatically augmented with context from the immediately preceding task executed by that agent instance.
What is Augmented?#
In Recurrent Mode, the agent is provided with:
Previous Task: The description of the last task.
Result: The final answer from the last task (truncated if too long).
Status: Whether the last task completed successfully or failed.
Generated Files: A list of files created during the last task.
Progress Summary: If the previous task was interrupted or failed to finish, a summary of what was achieved so far (via salvage_response).
Example Usage#
Recurrent mode is useful for chaining tasks where the second task depends on the outcome of the first:
# Task 1: Perform a calculation or data retrieval
async for response in agent.run('Find the population of France in 2023'):
    print_response(response, only_final=True)

# Task 2: Use the result of Task 1 with recurrent_mode=True
async for response in agent.run('What would it be with a 0.5% growth?', recurrent_mode=True):
    print_response(response, only_final=True)
In the second run, the agent’s task description is internally modified to include:
### Previous Task Context
**Previous Task**: Find the population of France in 2023
**Result**: 68.1 million
**Status**: ✅ Completed
---
### Current Task
What would it be with a 0.5% growth?
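To make the augmentation concrete, here is a minimal sketch that assembles a task description in the format shown above. It is an illustration under stated assumptions, not KodeAgent's internal code: augment_task, the max_result_chars limit, the '...' truncation marker, and the failure status label are all hypothetical, and the generated-files and progress-summary fields are left out for brevity.

```python
def augment_task(
    current_task: str,
    previous_task: str,
    result: str,
    completed: bool,
    max_result_chars: int = 500,  # Hypothetical truncation limit
) -> str:
    """Prefix the current task with a summary of the previous one."""
    if len(result) > max_result_chars:
        result = result[:max_result_chars] + '...'
    # The failure label is an assumption; the page only shows the success case.
    status = '✅ Completed' if completed else '❌ Failed'
    return '\n'.join([
        '### Previous Task Context',
        f'**Previous Task**: {previous_task}',
        f'**Result**: {result}',
        f'**Status**: {status}',
        '---',
        '### Current Task',
        current_task,
    ])


augmented = augment_task(
    current_task='What would it be with a 0.5% growth?',
    previous_task='Find the population of France in 2023',
    result='68.1 million',
    completed=True,
)
print(augmented)
```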
Tracing#
When using tracing (Langfuse or LangSmith), the augmented task description is captured as the task input. This ensures that the context provided to the agent is fully visible in your observability dashboard.
ⓘ NOTE
While langfuse is included with KodeAgent by default, langsmith is not and must be installed separately using pip install langsmith.
2. Chat History Injection (Full Session Context)#
For more complex state management or multi-session persistence, you can manually provide an existing OpenAI-compliant Chat History to the run() method. This allows you to “resume” an agent from a previously saved state. For example, an application that uses KodeAgent to solve tasks can save the chat history to a database or external log, and then use it to resume the agent from that state. Here is an example:
# A previously saved chat history
history = [
    {"role": "user", "content": "What is 5 + 5?"},
    {"role": "assistant", "content": "10.0"},
]

# Inject the history to continue the conversation in a new task
async for response in agent.run('Now add 20 to that result', chat_history=history):
    print_response(response, only_final=True)
The injected chat history may also include tool calls and tool results from previous tasks. The agent will use this information to continue the conversation in a seamless manner. However, any incorrect tool call sequences in the provided chat_history will raise errors.
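For reference, a history that includes a tool call and its result could look like the sketch below, which follows the OpenAI chat message layout: the assistant message carries tool_calls, and a later tool message answers each call through tool_call_id. The call ID and the expression argument name are made up for illustration; the arguments that KodeAgent's calculator tool actually expects may differ.

```python
import json

# A saved history with one tool call that has been resolved.
history = [
    {'role': 'user', 'content': 'What is 5 + 5?'},
    {
        'role': 'assistant',
        'content': None,
        'tool_calls': [
            {
                'id': 'call_1',  # Hypothetical ID
                'type': 'function',
                'function': {
                    'name': 'calculator',
                    # Arguments are a JSON-encoded string; the key is an assumption.
                    'arguments': json.dumps({'expression': '5 + 5'}),
                },
            }
        ],
    },
    {'role': 'tool', 'tool_call_id': 'call_1', 'content': '10.0'},
    {'role': 'assistant', 'content': '10.0'},
]

# Every requested tool call should be answered by a tool message.
requested = {
    call['id']
    for message in history
    if message['role'] == 'assistant'
    for call in (message.get('tool_calls') or [])
}
answered = {m['tool_call_id'] for m in history if m['role'] == 'tool'}
print(requested == answered)
```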
Validation#
KodeAgent performs stringent validation on the provided chat_history to ensure it is compliant with OpenAI’s API structure:
Roles: Must be one of system, user, assistant, or tool.
System Message: Only one system message is allowed, and it must be at index 0. If you don’t provide one, the agent’s default system prompt will be prepended automatically.
Tool Integrity: Every tool message must refer to a valid tool_call_id from a preceding assistant message.
Pending Resolution: You cannot inject a history that ends with unresolved tool calls (i.e., an assistant message with tool_calls that does not have corresponding tool results).
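If you store histories yourself, it can help to check them before passing them to run(). The sketch below applies the four rules listed above in plain Python. It is not KodeAgent's validator: validate_chat_history and its error messages are hypothetical, and the library's own checks may be stricter or raise different exceptions.

```python
ALLOWED_ROLES = {'system', 'user', 'assistant', 'tool'}


def validate_chat_history(history: list) -> None:
    """Raise ValueError if the history breaks one of the rules above."""
    seen_calls = set()      # Tool call IDs issued by assistant messages so far
    resolved_calls = set()  # Tool call IDs answered by tool messages

    for index, message in enumerate(history):
        role = message.get('role')
        if role not in ALLOWED_ROLES:
            raise ValueError(f'Message {index}: invalid role {role!r}')
        # A system message anywhere but index 0 also rules out duplicates.
        if role == 'system' and index != 0:
            raise ValueError(f'Message {index}: system message must be at index 0')
        if role == 'assistant':
            for call in message.get('tool_calls') or []:
                seen_calls.add(call['id'])
        elif role == 'tool':
            call_id = message.get('tool_call_id')
            if call_id not in seen_calls:
                raise ValueError(f'Message {index}: unknown tool_call_id {call_id!r}')
            resolved_calls.add(call_id)

    unresolved = seen_calls - resolved_calls
    if unresolved:
        raise ValueError(f'Unresolved tool calls: {sorted(unresolved)}')


validate_chat_history([
    {'role': 'user', 'content': 'What is 5 + 5?'},
    {'role': 'assistant', 'content': '10.0'},
])
print('History is valid')
```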
ⓘ NOTE
chat_history and recurrent_mode are mutually exclusive. Attempting to provide both will raise a ValueError.
Streaming Responses#
The agent.run() method is an asynchronous generator that yields AgentResponse objects. This allows you to monitor the agent’s progress in real-time.
Response Structure#
Each AgentResponse is a dictionary with the following fields:
| Field | Type | Description |
|---|---|---|
| type | | The type of update: |
| value | | The content of the update. For |
| | | Optional identifier for the source of the response (e.g., ‘run’, ‘think’, ‘act’). |
| | | Optional dictionary containing additional information like |
Handling Responses#
You can use the response type to filter or format the output:
async for response in agent.run(task):
    if response['type'] == 'final':
        print(f"Final Answer: {response['value']}")
    elif response['type'] == 'step':
        # value is a ChatMessage object (ReActChatMessage or CodeActChatMessage)
        print(f"Step: {response['value'].thought}")
    elif response['type'] == 'log':
        print(f"Log: {response['value']}")
Setting API Keys#
Set your LLM and other API keys as environment variables. KodeAgent relies on several environment variables for model access, code execution, and observability.
| Category | Variable | Description |
|---|---|---|
| LLM (LiteLLM) | GOOGLE_API_KEY | API key for Gemini models. |
| | OPENAI_API_KEY | API key for OpenAI models. |
| | | API key for Claude models. |
| Code Execution | E2B_API_KEY | Required for |
| Observability | LANGFUSE_PUBLIC_KEY | Public key for Langfuse tracing. |
| | LANGFUSE_SECRET_KEY | Secret key for Langfuse tracing. |
| | LANGFUSE_HOST | Host URL for Langfuse (default: https://cloud.langfuse.com). |
| Tools | | Required for |
Setting Variables#
For example, if you are using Linux, you can add the following lines to your ~/.bashrc or ~/.bash_profile:
# OpenAI API key (if using OpenAI models)
export OPENAI_API_KEY=your_openai_api_key
# Gemini API key (if using Gemini models); for Vertex AI, see LiteLLM documentation
export GOOGLE_API_KEY=your_google_api_key
# For remote code execution on E2B
export E2B_API_KEY=your_e2b_key
# For logging with Langfuse
export LANGFUSE_PUBLIC_KEY=pk_something
export LANGFUSE_SECRET_KEY=sk_something
export LANGFUSE_HOST=your_url
On Windows, navigate to Settings > Environment Variables and add the variables there.
If you are using KodeAgent in development mode, you can also create a .env file in the root directory of the project and add your keys there:
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
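A missing key usually surfaces only when the first LLM or sandbox call fails, so it can be convenient to check the environment up front. The following sketch uses only the standard library; missing_env_vars is a hypothetical helper, not part of KodeAgent, and the list of required names should match the models and features you actually use.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example: an OpenAI model with code execution on E2B
missing = missing_env_vars(['OPENAI_API_KEY', 'E2B_API_KEY'])
if missing:
    print(f'Set these variables before creating the agent: {missing}')
else:
    print('All required variables are set')
```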
Observation and Feedback#
The Observer is an internal component that monitors the agent’s work to detect if it’s stuck in a loop or has stalled. It can provide corrective feedback to help the agent get back on track.
By default, the Observer is enabled and triggered based on a predefined threshold in the agent’s logic. It analyzes the chat history and the current plan to ensure the agent is making meaningful progress.
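The Observer's internal logic is not described here. Purely as an illustration of what loop detection can look like, the sketch below flags a run in which a single action dominates the most recent steps. It is a swapped-in heuristic, not KodeAgent's Observer: looks_stuck, window, and max_repeats are hypothetical names and thresholds, and the real component also takes the current plan into account.

```python
from collections import Counter
from typing import Sequence


def looks_stuck(actions: Sequence[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Return True if one action repeats too often among the latest steps."""
    recent = list(actions)[-window:]
    if not recent:
        return False
    # Count how often the most common action occurs in the window.
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats


steps = ['search("paris festivals")'] * 3
print(looks_stuck(steps))
print(looks_stuck(['search("a")', 'extract("b")', 'search("c")']))
```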
Customizing Agent Identity#
KodeAgent provides two approaches to optionally customize the system prompt of ReActAgent and CodeActAgent:
persona: Use this parameter to define a specific role or behavior for the agent while keeping the default system prompt structure. This is the recommended way to steer the agent’s identity.
system_prompt: Use this parameter to completely override the default system prompt with your own.
Both of these parameters are optional.
ⓘ NOTE
The persona and system_prompt parameters are mutually exclusive. If both are provided, system_prompt will take precedence, and persona will be ignored.
Examples#
Setting a Persona:
agent = ReActAgent(
    name='Web agent',
    model_name='gemini/gemini-2.5-flash-lite',
    tools=[search_web, extract_as_markdown],
    max_iterations=7,
    persona='You are an expert assistant specialized in analyzing CSV files.',
)
Overriding the System Prompt:
agent = ReActAgent(
    name='Web agent',
    model_name='gemini/gemini-2.5-flash-lite',
    tools=[search_web, extract_as_markdown],
    max_iterations=7,
    system_prompt='You are a helpful assistant. Always respond in markdown. Use these tools...',
)
⚠ CAUTION
It is strongly recommended that the default system prompt is retained almost entirely; new or additional instructions can be added to it. For example, if you are building a CSV agent, you can add instructions to analyze CSV files to the default system prompt. Removing the instructions from the default system prompt altogether may affect the agent’s performance.
How is Persona Injected?#
The default system prompt is generic and designed to work for a wide range of tasks. In some scenarios you might want to build specialized agents that exhibit specific behaviors or expertise. The persona parameter allows you to do this without completely overriding the default system prompt.
For example, if you are building a CSV agent, via persona, you can instruct the agent to focus on CSV analysis while still following the core instructions of the default prompt.
When you provide a persona, it is injected into the default system prompt at a designated placeholder. This allows the agent to adapt its behavior according to the specified persona while still following the core instructions of the default prompt.
In particular, the first few lines of the default system prompt contain a placeholder for the persona:
You are an expert AI agent that solves tasks using specialized tools through a structured reasoning process.
{persona}
## Your Process
So, word the persona string accordingly to fit naturally in this context. For a detailed example of persona usage, refer to the CSV cleaning agent example using CodeActAgent.
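To preview how a persona reads in context, you can substitute it into the opening lines quoted above. The sketch below assumes a plain text replacement of the {persona} placeholder; PROMPT_HEAD copies only the three lines shown on this page, and inject_persona is a hypothetical helper, so the library's actual substitution mechanism may differ.

```python
# Opening lines of the default system prompt, as quoted above
PROMPT_HEAD = (
    'You are an expert AI agent that solves tasks using specialized tools '
    'through a structured reasoning process.\n'
    '{persona}\n'
    '## Your Process\n'
)


def inject_persona(template: str, persona: str = '') -> str:
    """Replace the {persona} placeholder with the given persona text."""
    # str.replace leaves any other braces in the template untouched.
    return template.replace('{persona}', persona.strip())


preview = inject_persona(
    PROMPT_HEAD,
    'You are an expert assistant specialized in analyzing CSV files.',
)
print(preview)
```

Reading the preview aloud is a quick way to check that the persona follows naturally from the sentence before it.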