spring-ai-playground

description: Agentic Chat - one local runtime combining documents, tools, models, and memory, with system-prompt presets, per-turn reasoning, and rich Markdown rendering.

Agentic Chat

Where: top navigation → Agentic Chat.

Agentic Chat is the unified runtime where Spring AI Playground combines documents, tools, models, and conversation state. It is also where everything you assemble elsewhere - system prompts, built-in and authored tools, proxied MCP servers, and indexed documents - comes together as the live context for a single conversation.

Agentic Chat workspace - the conversation area above the prompt input, with the reasoning, built-in tool, external MCP, and RAG document selectors on the selector row and the New Chat, Export, Prompt Library, and Settings actions in the header

This unified interface lets you:

Key Features

This area is closely aligned with Spring AI’s workflow and agentic guidance. If you want the conceptual background behind the two modes it combines, chain-based RAG workflows and tool-driven agentic reasoning, see Building Effective Agents. For how the Playground assembles the full model context - system prompt, retrieved documents, tools, memory, and options - see Context Engineering.

The chat workspace

The screen has three regions:

Composing a request

Reasoning effort

The lightbulb dropdown on the selector row sets how hard the model thinks on the next turn - Off, Low, Medium, or High. It is dynamic: change it between turns without starting a new chat.

The reasoning effort dropdown open on the selector row, showing Off, Low, Medium, and High

The control is provider-aware and only appears for models that support it. The level maps to each provider’s own knobs - on OpenAI it becomes reasoning_effort; on Ollama it toggles thinking and its depth. Off sends no reasoning option at all, which is the safe default for non-reasoning models. See Context Engineering → Reasoning effort for the mapping.
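The level-to-option mapping described above can be pictured with a small sketch. This is an illustration only, not the Playground's code: the class, enum, and method names are hypothetical, and the Ollama keys `think` and `think_depth` are placeholder assumptions for "thinking and its depth". Only `reasoning_effort` for OpenAI and the "Off sends nothing" rule come from the text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not taken from the Playground source.
final class ReasoningEffortSketch {

    enum Effort { OFF, LOW, MEDIUM, HIGH }

    enum Provider { OPENAI, OLLAMA }

    /** Returns the extra per-turn request options; empty when effort is OFF. */
    static Map<String, Object> optionsFor(Provider provider, Effort effort) {
        Map<String, Object> options = new LinkedHashMap<>();
        if (effort == Effort.OFF) {
            return options; // Off sends no reasoning option at all
        }
        String level = effort.name().toLowerCase(Locale.ROOT);
        if (provider == Provider.OPENAI) {
            options.put("reasoning_effort", level);
        } else {
            // Assumed placeholder keys: a thinking toggle plus a depth hint.
            options.put("think", Boolean.TRUE);
            options.put("think_depth", level);
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(optionsFor(Provider.OPENAI, Effort.HIGH));
        System.out.println(optionsFor(Provider.OLLAMA, Effort.LOW));
        System.out.println(optionsFor(Provider.OLLAMA, Effort.OFF));
    }
}
```

Because the options are computed per call, changing the level between turns needs no new conversation, which matches the "dynamic" behaviour of the dropdown.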

Choosing tools and documents

The tools icon on the selector row opens the tool popover. It is the per-chat switch for what the agent may call:

The tool selector popover - a Use built-in MCP server toggle, then multi-select boxes for Custom tools, Built-in tools to expose, and Composed external tools

The document selector beside it enables Vector Database collections for RAG grounding. Both selections are remembered per conversation.

System prompts and presets

The system prompt frames every turn. You can type one in the settings drawer, or pull a ready-made preset or a variable-driven template from the Prompt Library (clipboard icon in the header). The two are related but distinct - a preset is a complete prompt you apply as-is; a template has variable placeholders you fill in first - and each has its own page below.

The chat settings drawer

The Settings cog opens the chat model drawer - the static configuration for the conversation. Editing it and pressing Apply & New Chat starts a fresh conversation with the new settings.

The chat settings drawer - sections for Model, Context, Generation, and Advanced details, with the Apply and New Chat button

The Advanced details section expanded, showing the provider-options JSON editor

The drawer is provider-aware: the option labels, the stop-sequence cap, and the JSON placeholder change with the active provider. The full property mapping lives in Context Engineering → Generation options. Out-of-range entries (a Recent-messages or Max-Tokens value below 1, or more stop sequences than the provider allows) are flagged inline; Apply & New Chat is blocked until you fix them, and the drawer stays open with the offending field highlighted.
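The range checks described above amount to three simple rules. The sketch below shows one way to express them; it is not the drawer's actual validation code, and the class name, method name, and field keys are hypothetical. The stop-sequence cap is passed in as a parameter because it depends on the active provider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the drawer's range checks; all names are illustrative.
final class DrawerValidationSketch {

    /** Returns the out-of-range fields; an empty list means Apply may proceed. */
    static List<String> invalidFields(int recentMessages, int maxTokens,
                                      List<String> stopSequences, int stopSequenceCap) {
        List<String> invalid = new ArrayList<>();
        if (recentMessages < 1) {
            invalid.add("recentMessages");
        }
        if (maxTokens < 1) {
            invalid.add("maxTokens");
        }
        if (stopSequences.size() > stopSequenceCap) {
            invalid.add("stopSequences");
        }
        return invalid;
    }

    public static void main(String[] args) {
        // The cap of 4 here is only an example value for the demo.
        List<String> stops = Arrays.asList("END", "STOP");
        System.out.println(invalidFields(10, 1024, stops, 4));
        System.out.println(invalidFields(0, 0, stops, 1));
    }
}
```

A caller would block Apply & New Chat whenever the returned list is non-empty and use the entries to decide which fields to highlight.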

Switching models and the download gate

The download gate applies only when the active provider is Ollama (local models you pull). With Ollama active, the model dropdown marks any model that is not yet pulled with a download indicator.

The model dropdown with a download indicator on models that are not yet downloaded locally

If you apply an Ollama model that is not installed, the chat does not start on a missing model. A gate dialog appears first; choosing Download pulls it with a live progress bar and a cancel option, and when the download finishes the chat starts on the new model.

For a remote provider such as OpenAI there is nothing to download, so the download indicator, helper text, and gate dialog do not appear.

The model download gate dialog - a message that the model is not downloaded in Ollama yet, with Cancel and Download buttons
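The gate decision described in this section depends on two things only: whether the active provider is Ollama, and whether the chosen model is already pulled. A minimal sketch under those assumptions follows; the class, enums, and method are hypothetical, and the set of pulled models stands in for whatever the app actually queries.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the download-gate decision; names are illustrative.
final class DownloadGateSketch {

    enum Provider { OLLAMA, OPENAI }

    enum Action { START_CHAT, SHOW_DOWNLOAD_GATE }

    /** Decides what happens when the user applies a model in the drawer. */
    static Action onApply(Provider provider, String model, Set<String> pulledModels) {
        // Only a local Ollama model can be "missing"; remote providers have nothing to pull.
        if (provider == Provider.OLLAMA && !pulledModels.contains(model)) {
            return Action.SHOW_DOWNLOAD_GATE;
        }
        return Action.START_CHAT;
    }

    public static void main(String[] args) {
        Set<String> pulled = Collections.singleton("qwen3.5:2b");
        System.out.println(onApply(Provider.OLLAMA, "qwen3.5:2b", pulled));
        System.out.println(onApply(Provider.OLLAMA, "qwen3.5:9b", pulled));
        System.out.println(onApply(Provider.OPENAI, "any-remote-model", pulled));
    }
}
```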

Provider lock

Each conversation is stamped with the provider that created it. If you open a saved conversation while the app is running a different provider, the conversation is shown read-only with a banner, so its history stays intact but you cannot append turns that the current provider could not have produced.

A conversation opened under a different provider - the input is disabled and a banner reads that the conversation was created with Ollama but the app is now running OpenAi

Reading a response

Markdown, code, math, and diagrams

Assistant turns render as full Markdown. Code blocks are syntax-highlighted (highlight.js) with a language label and a one-click copy button; math renders with KaTeX both inline ($...$) and as display blocks ($$...$$); and fenced ` ```mermaid ` blocks render as diagrams. Links open in a new tab. Rendering runs once the turn finishes streaming.

A rendered assistant turn - a highlighted Python code block with a copy button, inline and display math, a Mermaid flow diagram, and a small table

Message actions

Hovering a turn reveals its action bar - six controls, left to right: Collapse, Copy, Show raw, Read aloud, Quote in prompt, and Export.

The hover action bar on an assistant message - Collapse, Copy, Show raw, Read aloud, Quote in prompt, and Export

Timing and token metrics

Every assistant turn carries its own metrics in the header line - the time, how long the turn took, and the token counts, for example 4.2s · 331 tokens (in 90 · out 241). When a turn reasons or calls tools, the tokens spent in those stages are attributed to their respective panels.
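To make the header format concrete, here is a small sketch that builds a line in the same shape as the example above. It is illustrative only: the class and method names are hypothetical, and it assumes the total is simply input plus output tokens, which matches the example (90 + 241 = 331) but may not hold when reasoning tokens are counted separately.

```java
import java.util.Locale;

// Hypothetical sketch of the per-turn metrics line; names are illustrative.
final class TurnMetricsSketch {

    /** Formats elapsed time and token counts in the shape shown in the chat header. */
    static String headerLine(long elapsedMillis, int inputTokens, int outputTokens) {
        double seconds = elapsedMillis / 1000.0;
        int total = inputTokens + outputTokens; // assumption: total = in + out
        // \u00B7 is the middle dot used as a separator in the header.
        return String.format(Locale.ROOT, "%.1fs \u00B7 %d tokens (in %d \u00B7 out %d)",
                seconds, total, inputTokens, outputTokens);
    }

    public static void main(String[] args) {
        System.out.println(headerLine(4200, 90, 241));
    }
}
```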

Reasoning and tool panels

When a turn thinks, calls tools, or retrieves documents, those stages appear as collapsible panels above the answer, each summarizing its duration and token cost:

An agentic turn with an expanded THINK panel showing the model's reasoning and an expanded MCP TOOLS panel showing a getCurrentTime call with its request and result, above the final answer

The panels collapse once a stage completes so the answer stays front and center; click any panel to reopen it. This is the same visibility the Observability dashboards capture after the fact.

Exporting a conversation

The Export conversation action in the header (and the per-message Export) writes the chat out as Markdown (.md), Plain text (.txt), JSON (.json), or a PDF (print).

The Export menu listing Markdown, Plain text, JSON, and PDF

Two Integrated Paradigms

1. RAG: Knowledge via Chain Workflow

When documents are selected, Agentic Chat follows a deterministic retrieval pattern:

2. MCP: Actions via Agentic Reasoning

When MCP connections are enabled, Agentic Chat can behave like an agent:

When a tool requires approval, Agentic Chat pauses and asks you to approve or decline the call before it runs - the on-device half of Human-in-the-Loop Approval. Declining tells the model the call was not run, so it won’t silently retry.
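The approve-or-decline pause can be sketched as a thin wrapper around the tool call. This is a minimal illustration, not the Playground's approval mechanism: the class, method, and message wording are hypothetical, and the user prompt is modelled as a plain predicate. The point it shows is that a declined call returns an explicit "not run" result to the model instead of failing silently.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of a human-in-the-loop gate around one tool call.
final class ToolApprovalSketch {

    /**
     * Runs the tool only if no approval is needed or the user approves.
     * On decline, the tool is never invoked and the model gets an explicit notice.
     */
    static String runWithApproval(String toolName, boolean requiresApproval,
                                  Predicate<String> askUser, Supplier<String> toolCall) {
        if (requiresApproval && !askUser.test(toolName)) {
            return "Tool call '" + toolName + "' was declined by the user and was not run.";
        }
        return toolCall.get();
    }

    public static void main(String[] args) {
        Supplier<String> clock = () -> "2024-01-01T12:00:00Z"; // stand-in tool result
        System.out.println(runWithApproval("getCurrentTime", true, name -> true, clock));
        System.out.println(runWithApproval("getCurrentTime", true, name -> false, clock));
    }
}
```

Returning the decline as the tool result gives the model a definite outcome to reason about, which is one plausible way to get the "won't silently retry" behaviour described above.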

Workflow Integration

The intended end-to-end flow is:

  1. prepare tools in Tool Studio or connect them in MCP Server
  2. prepare knowledge in Vector Database
  3. enable the relevant documents and MCP connections in Agentic Chat
  4. send a request and observe how retrieval and tool use combine

This is the place where the rest of the product becomes visible as one coherent system rather than separate screens. The outputs of Tool Studio, MCP Server, and Vector Database all converge here.

Requirements for Agentic Reasoning

Basic chat can work with any supported provider. Tool-enabled agentic behavior works best with models that support function calling and stronger reasoning.

For Ollama-based flows:

The default playground.chat.models list features qwen3.5:2b (default) plus qwen3.5:9b / qwen3.6:35b for stronger tool-oriented reasoning, with gemma4:e4b, gpt-oss:20b, and deepseek-r1:8b as alternatives. See Picking a Model in the Tutorials for the tradeoffs.

Agentic Chat Architecture Overview

The diagram below is a conceptual reference drawn from the agentic systems material in the Spring AI docs. It is included to show how the Playground’s Agentic Chat maps onto the broader Spring AI mental model; it does not describe a separate product feature hidden behind the UI. Read it as a guide to how the Playground combines model reasoning, retrieval, tool execution, and memory in one chat runtime. For the concrete build-up of that context in this project - system prompt, presets and templates, RAG, tools, memory, and per-request options - see Context Engineering.

Spring AI Agentic System Structure

If you want the fuller conceptual background, start with Building Effective Agents. That reference explains the workflow-versus-agent distinction that this Playground makes concrete through Tool Studio, MCP Server, Vector Database, and Agentic Chat.

This chat experience lets developers explore Spring AI’s workflow and agentic paradigms by combining chain-based RAG workflows with agentic, tool-augmented reasoning. In practice, it follows Spring AI’s Agentic Systems architecture, where grounded retrieval and dynamic tool execution coexist in one context-aware chat runtime.

| Component | Type | Description | Configuration Location | Key Benefits | Model Requirements |
|---|---|---|---|---|---|
| LLM | Core Model | Executes chain-based workflows and performs agentic reasoning for tool usage within a unified chat runtime. | Agentic Chat | Central reasoning and response generation; supports both deterministic workflows and agentic patterns. | Chat models; tool-aware and reasoning-capable models recommended. |
| Retrieval (RAG) | Chain Workflow | Deterministic retrieval and prompt augmentation using vector search over selected documents. | Vector Database | Predictable, controllable knowledge grounding; tunable retrieval parameters such as Top-K and thresholds. | Standard chat plus embedding models. |
| Tools (MCP) | Agentic Execution | Dynamic tool selection and invocation via MCP, driven by LLM reasoning and tool schemas. | Tool Studio, MCP Server | Enables external actions, multi-step reasoning, and adaptive behavior. | Tool-enabled models with function calling and reasoning support. |
| Memory | Shared Agentic State | The full conversation is kept locally; each turn the model sees a configurable trailing window, supplied through MessageChatMemoryAdvisor over an LlmWindowChatMemory decorator. | Agentic Chat drawer (per-chat Recent messages) + spring.ai.playground.chat.memory-max-messages (default 10); history-max-messages (2000) caps the local store | Coherent multi-turn dialogue without inflating every request; the recent-context window is tunable per conversation. | Models benefit from a longer window when the task needs more history. |
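The Memory row describes a trailing window over a longer local history. The sketch below shows that idea in isolation; it is not MessageChatMemoryAdvisor or LlmWindowChatMemory, and the class and method names are hypothetical. Messages are plain strings here purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a trailing message window; not the Playground's memory classes.
final class MemoryWindowSketch {

    /** Returns a view of the last maxMessages entries of the stored history. */
    static <T> List<T> trailingWindow(List<T> history, int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        int from = Math.max(0, history.size() - maxMessages);
        return history.subList(from, history.size());
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            history.add("message " + i);
        }
        // 10 mirrors the documented default for memory-max-messages.
        List<String> window = trailingWindow(history, 10);
        System.out.println(window.size() + " of " + history.size() + " messages sent");
        System.out.println("first in window: " + window.get(0));
    }
}
```

The full history stays intact in the list; only the view handed to the model shrinks, which is the separation the table draws between the local store and the per-turn window.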

By leveraging these elements, Agentic Chat goes beyond basic Q&A and becomes a practical environment for building effective, modular AI applications that combine workflow predictability with agentic autonomy.

What the Chat can reach

Agentic Chat is a consumer of three inventories curated elsewhere in the Playground. Use these references to know what’s available before composing a chat session:

→ Try it: Tutorials - end-to-end flows that combine Tool Studio, MCP Inspector, Vector Database, and Agentic Chat.