
Streaming

Streaming delivers agent responses token-by-token as they're generated, instead of waiting for the complete response.

What is Streaming?

Without streaming:

[User waits...]
[User waits...]
[Complete response arrives]

With streaming:

The capital
The capital of
The capital of France
The capital of France is
The capital of France is Paris.


Benefits

  • Real-time feedback - Users see output as soon as the first tokens arrive
  • Better UX - Conversations feel interactive rather than request-and-wait
  • Early cancellation - Stop a response that is heading in the wrong direction (see the sketch after this list)
  • Perceived speed - Responses feel faster even when total generation time is the same
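
Early cancellation just means dropping the connection. A minimal sketch, assuming the REST streaming endpoint shown later on this page (streamUrl and stopButton are placeholders for your endpoint and UI control):

// Minimal sketch: cancel a streaming response mid-flight.
const controller = new AbortController();

const res = await fetch(streamUrl, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Explain quantum computing' }),
  signal: controller.signal,
});

// Aborting drops the connection, so the client stops receiving chunks;
// whether the server also halts generation depends on the deployment.
stopButton.addEventListener('click', () => controller.abort());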

Enabling Streaming

Configuration

agents:
  assistant:
    reasoning:
      enable_streaming: true  # Enable streaming

REST API (SSE - Server-Sent Events)

curl -N http://localhost:8080/agents/assistant/messages/stream \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum computing"
  }'

Response (SSE format):

data: {"chunk": "Quantum"}
data: {"chunk": " computing"}
data: {"chunk": " uses"}
data: {"chunk": " quantum"}
data: {"chunk": " mechanics..."}
data: [DONE]
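
The browser's built-in EventSource API only issues GET requests, so the POST stream route above can instead be consumed with fetch and a stream reader. A minimal sketch (Node 18+ or a modern browser; the chunk shape is taken from the example response above):

const res = await fetch('http://localhost:8080/agents/assistant/messages/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Explain quantum computing' }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let pending = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  pending += decoder.decode(value, { stream: true });

  const lines = pending.split('\n');
  pending = lines.pop(); // keep any partial line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') continue;
    process.stdout.write(JSON.parse(payload).chunk); // swap for a UI update in the browser
  }
}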

WebSocket

const ws = new WebSocket('ws://localhost:8080/agents/assistant/stream');

// Wait for the connection to open before sending, or send() throws
ws.onopen = () => {
  ws.send(JSON.stringify({
    message: "Explain quantum computing"
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.chunk) {
    process.stdout.write(data.chunk);
  }
};

gRPC Streaming

stream, err := client.StreamMessage(ctx, &pb.SendMessageRequest{
    Agent:   "assistant",
    Message: "Explain quantum computing",
})
if err != nil {
    log.Fatal(err)
}

for {
    chunk, err := stream.Recv()
    if err == io.EOF {
        break // server finished streaming
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Print(chunk.Content)
}

CLI (Automatic)

# Streaming enabled by default in CLI
hector call "Explain quantum computing" --agent assistant --config config.yaml
# Response streams as it's generated

Streaming with Tools

When an agent uses tools, the stream includes tool-execution progress:

agents:
  coder:
    reasoning:
      enable_streaming: true
      show_tool_execution: true
    tools: ["write_file", "execute_command"]

Streamed output:

Let me create that file...
[Tool: write_file("hello.py", "print('Hello')")]
File created successfully.
Now let me test it...
[Tool: execute_command("python hello.py")]
Output: Hello
The program works correctly!
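
On the client, tool-execution lines can be rendered differently from normal response text. A minimal sketch, assuming tool calls stream inline as "[Tool: ...]" text matching the output above (renderToolCall and appendToChat are hypothetical UI helpers):

eventSource.onmessage = (event) => {
  if (event.data === '[DONE]') return;
  const { chunk } = JSON.parse(event.data);
  if (chunk.startsWith('[Tool:')) {
    renderToolCall(chunk);  // hypothetical helper: show as a tool-call badge
  } else {
    appendToChat(chunk);    // hypothetical helper: append response text
  }
};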

Streaming with Sessions

Combine streaming with session memory for the best experience:

agents:
  assistant:
    reasoning:
      enable_streaming: true
    memory:
      working:
        strategy: "summary_buffer"
        budget: 4000
      longterm:
        storage_scope: "session"

REST API:

# Create session
SESSION_ID=$(curl -X POST http://localhost:8080/agents/assistant/sessions | jq -r '.session_id')

# Stream messages in session
curl -N http://localhost:8080/agents/assistant/sessions/$SESSION_ID/messages/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

The agent streams its responses while maintaining session context across messages.
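
The same flow from JavaScript (a sketch; the endpoints and the session_id field are taken from the curl example above):

// Sketch: create a session, then stream messages within it.
const res = await fetch('http://localhost:8080/agents/assistant/sessions', {
  method: 'POST',
});
const { session_id } = await res.json();

// Stream within the session (streamMessage is a hypothetical wrapper
// around the fetch-based SSE reader shown earlier on this page)
await streamMessage(
  `http://localhost:8080/agents/assistant/sessions/${session_id}/messages/stream`,
  'Hello'
);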


Advanced Configuration

Streaming Options

agents:
  assistant:
    reasoning:
      enable_streaming: true
      show_tool_execution: true  # Show tool calls in stream
      show_thinking: false       # Show internal reasoning
      show_debug_info: false     # Show debug details

Streaming Customization

agents:
  custom:
    streaming:
      chunk_size: 10      # Characters per chunk
      delay_ms: 50        # Delay between chunks
      buffer_size: 1024   # Buffer size
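
These knobs trade smoothness for latency: with chunk_size: 10 and delay_ms: 50, output is capped at about 20 chunks per second, roughly 200 characters per second, so a 1,000-character answer takes at least ~5 seconds to finish streaming.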

Use Cases

Chat Applications

agents:
  chatbot:
    reasoning:
      enable_streaming: true
    memory:
      working:
        strategy: "buffer_window"
        window_size: 20

Frontend:

const eventSource = new EventSource(
  `http://localhost:8080/agents/chatbot/sessions/${sessionId}/messages/stream`
);

eventSource.onmessage = (event) => {
  if (event.data === '[DONE]') {
    eventSource.close();
    return;
  }
  const data = JSON.parse(event.data);
  appendToChat(data.chunk);
};

Customer Support

agents:
  support:
    reasoning:
      enable_streaming: true
      show_tool_execution: true
    tools: ["search", "agent_call"]

Code Assistants

agents:
  coder:
    reasoning:
      enable_streaming: true
      show_tool_execution: true
      show_thinking: true  # Show reasoning process
    tools: ["write_file", "execute_command", "search"]

Monitoring & Debugging

Monitor stream chunks:

curl -N http://localhost:8080/agents/assistant/messages/stream?debug=true \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'

API Reference

REST Endpoints

Endpoint                                        Method  Description
/agents/{agent}/messages/stream                 POST    Stream message (SSE)
/agents/{agent}/sessions/{id}/messages/stream   POST    Stream in session (SSE)
/agents/{agent}/stream                          WS      WebSocket streaming

gRPC Methods

service A2AService {
  rpc StreamMessage(SendMessageRequest) returns (stream MessageChunk);
}

See API Reference for full details.


Best Practices

Enable for Interactive Apps

# ✅ Good: Streaming for interactive apps
agents:
  fast:
    reasoning:
      enable_streaming: true

# ❌ Bad: Streaming disabled for interactive apps
agents:
  slow:
    reasoning:
      enable_streaming: false  # Users wait for complete response

Error Handling

// ✅ Good: Handle stream errors
const eventSource = new EventSource(url);

eventSource.onerror = (error) => {
  console.error('Stream error:', error);
  eventSource.close();
};

// ❌ Bad: No error handling
const eventSource = new EventSource(url);
eventSource.onmessage = (event) => { /* ... */ };

Buffering for Display

// ✅ Good: Buffer chunks and batch UI updates
let buffer = '';
let updateTimer = null;

eventSource.onmessage = (event) => {
  if (event.data === '[DONE]') return;
  buffer += JSON.parse(event.data).chunk;

  clearTimeout(updateTimer);
  updateTimer = setTimeout(() => {
    updateUI(buffer); // updateUI must append, since the buffer resets below
    buffer = '';
  }, 50); // debounce: flush shortly after each burst of chunks
};

Next Steps