description: Agentic Chat - one local runtime combining documents, tools, models, and memory, with system-prompt presets, per-turn reasoning, and rich Markdown rendering.
Where: top navigation → Agentic Chat.
Agentic Chat is the unified runtime where Spring AI Playground combines documents, tools, models, and conversation state. It is also where everything you assemble elsewhere - system prompts, built-in and authored tools, proxied MCP servers, and indexed documents - comes together as the live context for a single conversation.
This unified interface lets you:
This area is closely aligned with Spring AI’s workflow and agentic guidance. If you want the conceptual background behind these two modes, see Building Effective Agents. For how the Playground assembles the full model context - system prompt, retrieved documents, tools, memory, and options - see Context Engineering.
The screen has three regions:
The lightbulb dropdown on the selector row sets how hard the model thinks on the next turn - Off, Low, Medium, or High. It is dynamic: change it between turns without starting a new chat.
The control is provider-aware and only appears for models that support it. The level maps to each provider’s own knobs - on OpenAI it becomes reasoning_effort; on Ollama it toggles thinking and its depth. Off sends no reasoning option at all, which is the safe default for non-reasoning models. See Context Engineering → Reasoning effort for the mapping.
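The level-to-option mapping described above could be sketched roughly as follows. This is a minimal illustration, not the Playground's implementation: the class, the enum, the provider strings, and the Ollama keys (`think`, `thinkDepth`) are hypothetical placeholders. Only `reasoning_effort` is named in the text above; the real mapping is on the Context Engineering page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names and Ollama keys are illustrative assumptions.
class ReasoningOptionsSketch {

    enum Level { OFF, LOW, MEDIUM, HIGH }

    // Builds the extra per-turn request options for the given provider and level.
    static Map<String, Object> optionsFor(String provider, Level level) {
        Map<String, Object> options = new LinkedHashMap<>();
        if (level == Level.OFF) {
            return options; // Off sends no reasoning option at all
        }
        String effort = level.name().toLowerCase(Locale.ROOT);
        switch (provider) {
            case "openai":
                options.put("reasoning_effort", effort);
                break;
            case "ollama":
                // Assumed placeholder keys: thinking switched on, plus a depth hint.
                options.put("think", true);
                options.put("thinkDepth", effort);
                break;
            default:
                break; // unknown provider: send nothing
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(optionsFor("openai", Level.HIGH));
        System.out.println(optionsFor("ollama", Level.OFF));
    }
}
```

Returning an empty map for Off keeps the request identical to one sent to a non-reasoning model, which is why Off is the safe default.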
The tools icon on the selector row opens the tool popover. It is the per-chat switch for what the agent may call:
The document selector beside it enables Vector Database collections for RAG grounding. Both selections are remembered per conversation.
The system prompt frames every turn. You can type one in the settings drawer, or pull a ready-made preset or a variable-driven template from the Prompt Library (clipboard icon in the header). The two are related but distinct - a preset is a complete prompt you apply as-is; a template has variable placeholders you fill in first - and each has its own page below.
The Settings cog opens the chat model drawer - the static configuration for the conversation. Editing it and pressing Apply & New Chat starts a fresh conversation with the new settings.
Model - Ollama or Model - OpenAI) and lets you switch the chat model.
The drawer is provider-aware: the option labels, the stop-sequence cap, and the JSON placeholder change with the active provider. The full property mapping lives in Context Engineering → Generation options. Out-of-range entries (a Recent-messages or Max-Tokens below 1, or more stop sequences than the provider allows) are flagged inline; Apply & New Chat is blocked until you fix them, and the drawer stays open with the offending field highlighted.
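The inline range checks could look something like the sketch below. It assumes a hypothetical validator that returns the names of the offending fields, with the provider's stop-sequence cap passed in as a parameter. The field names and the cap of 4 in the usage line are illustrative values, not taken from the Playground source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the drawer's range checks; names are illustrative.
class DrawerValidationSketch {

    // Returns the offending field names; an empty list means Apply may proceed.
    static List<String> validate(int recentMessages, int maxTokens,
                                 List<String> stopSequences, int stopSequenceCap) {
        List<String> errors = new ArrayList<>();
        if (recentMessages < 1) {
            errors.add("recentMessages");
        }
        if (maxTokens < 1) {
            errors.add("maxTokens");
        }
        if (stopSequences.size() > stopSequenceCap) {
            errors.add("stopSequences");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Recent-messages of 0 is out of range, so Apply would stay blocked.
        System.out.println(validate(0, 256, List.of("END"), 4));
    }
}
```

Collecting every failure rather than stopping at the first lets the drawer highlight all offending fields in one pass.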
The download gate applies only when the active provider is Ollama (local models you pull). With Ollama active, the model dropdown marks any model that is not yet pulled with a download indicator.
If you apply an Ollama model that is not installed, the chat does not start on a missing model. A gate dialog appears first; choosing Download pulls it with a live progress bar and a cancel option, and when the download finishes the chat starts on the new model.
For a remote provider such as OpenAI there is nothing to download, so the download indicator, helper text, and gate dialog do not appear.
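The gate decision itself is small enough to sketch. The example below is an assumption-laden illustration: the class, the enum, and the `"ollama"` provider string are hypothetical, and the set of pulled models stands in for whatever the app actually queries from the local Ollama instance.

```java
import java.util.Set;

// Hypothetical sketch of the download-gate decision; names are illustrative.
class DownloadGateSketch {

    enum Action { START_CHAT, SHOW_DOWNLOAD_GATE }

    // The gate only applies to the local provider and only for models not yet pulled.
    static Action onApply(String provider, String model, Set<String> pulledModels) {
        boolean localProvider = "ollama".equals(provider);
        if (localProvider && !pulledModels.contains(model)) {
            return Action.SHOW_DOWNLOAD_GATE;
        }
        return Action.START_CHAT;
    }

    public static void main(String[] args) {
        System.out.println(onApply("ollama", "qwen3.5:9b", Set.of("qwen3.5:2b")));
        System.out.println(onApply("openai", "any-remote-model", Set.of()));
    }
}
```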
Each conversation is stamped with the provider that created it. If you open a saved conversation while the app is running a different provider, the conversation is shown read-only with a banner, so its history stays intact but you cannot append turns that the current provider could not have produced.
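As a rough illustration of the provider stamp, the check below compares the stamp saved with a conversation against the provider the app is currently running. The record and its field names are hypothetical; the Playground's stored conversation model may look different.

```java
// Hypothetical sketch of the provider-stamp check; the record is illustrative.
class ProviderStampSketch {

    // A saved conversation remembers the provider that created it.
    record Conversation(String id, String stampedProvider) { }

    // Read-only whenever the running provider differs from the stamp.
    static boolean isReadOnly(Conversation conversation, String activeProvider) {
        return !conversation.stampedProvider().equalsIgnoreCase(activeProvider);
    }

    public static void main(String[] args) {
        Conversation saved = new Conversation("c-1", "ollama");
        System.out.println(isReadOnly(saved, "openai")); // shown with the read-only banner
        System.out.println(isReadOnly(saved, "ollama")); // editable
    }
}
```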
Assistant turns render as full Markdown. Code blocks are syntax-highlighted (highlight.js) with a language label and a one-click copy button; math renders with KaTeX both inline ($...$) and as display blocks ($$...$$); and fenced ` ```mermaid ` blocks render as diagrams. Links open in a new tab. Rendering runs once the turn finishes streaming.
Hovering a turn reveals its action bar - six controls left to right:
> quote for a follow-up.
Every assistant turn carries its own metrics in the header line - the time, how long the turn took, and the token counts, for example 4.2s · 331 tokens (in 90 · out 241). When a turn reasons or calls tools, the tokens spent in those stages are attributed to their respective panels.
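The duration and token portion of that header line can be reproduced with a small formatter. This is a sketch only: the method name and the assumption that the total equals input plus output tokens are inferred from the example above (90 + 241 = 331), not taken from the Playground source.

```java
import java.util.Locale;

// Hypothetical sketch of the per-turn metrics header; names are illustrative.
class TurnMetricsSketch {

    // Formats duration and token counts in the style of the example header.
    static String header(long elapsedMillis, int inputTokens, int outputTokens) {
        double seconds = elapsedMillis / 1000.0;
        int totalTokens = inputTokens + outputTokens; // assumption: total = in + out
        // \u00B7 is the middle dot used as a separator; Locale.ROOT keeps the decimal point.
        return String.format(Locale.ROOT, "%.1fs \u00B7 %d tokens (in %d \u00B7 out %d)",
                seconds, totalTokens, inputTokens, outputTokens);
    }

    public static void main(String[] args) {
        System.out.println(header(4200, 90, 241));
    }
}
```

Called with 4200 ms, 90 input tokens, and 241 output tokens, the formatter yields the same string as the example in the text.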
When a turn thinks, calls tools, or retrieves documents, those stages appear as collapsible panels above the answer, each summarizing its duration and token cost:
The panels collapse once a stage completes so the answer stays front and center; click any panel to reopen it. This is the same visibility the Observability dashboards capture after the fact.
The Export conversation action in the header (and the per-message Export) writes the chat out as Markdown (.md), Plain text (.txt), JSON (.json), or a PDF (print).
When documents are selected, Agentic Chat follows a deterministic retrieval pattern:
When MCP connections are enabled, Agentic Chat can behave like an agent:
When a tool requires approval, Agentic Chat pauses and asks you to approve or decline the call before it runs - the on-device half of Human-in-the-Loop Approval. Declining tells the model the call was not run, so it won’t silently retry.
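The approve-or-decline pause can be pictured as a guard in front of the tool call. The sketch below is a simplified assumption: the approver is modeled as a `Predicate` on the tool name, the tool as a `Function` from arguments to result, and the decline message is invented wording. None of these names come from the Playground or Spring AI.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of an approval gate in front of a tool call.
class ApprovalGateSketch {

    // Runs the tool only when no approval is needed or the user approves it.
    static String invoke(String toolName, String arguments, boolean requiresApproval,
                         Predicate<String> approver, Function<String, String> tool) {
        if (requiresApproval && !approver.test(toolName)) {
            // The decline is returned as the tool result so the model knows nothing ran.
            return "Tool call '" + toolName + "' was declined by the user and was not run.";
        }
        return tool.apply(arguments);
    }

    public static void main(String[] args) {
        Function<String, String> tool = arguments -> "done";
        System.out.println(invoke("deleteFile", "{}", true, name -> false, tool));
        System.out.println(invoke("deleteFile", "{}", true, name -> true, tool));
    }
}
```

Feeding the decline back as an explicit result, rather than dropping the call, is what keeps the model from silently retrying.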
The intended end-to-end flow is:
This is the place where the rest of the product becomes visible as one coherent system rather than separate screens. The outputs of Tool Studio, MCP Server, and Vector Database all converge here.
Basic chat can work with any supported provider. Tool-enabled agentic behavior works best with models that support function calling and stronger reasoning.
For Ollama-based flows:
The default playground.chat.models list features qwen3.5:2b (default) plus qwen3.5:9b / qwen3.6:35b for stronger tool-oriented reasoning, with gemma4:e4b, gpt-oss:20b, and deepseek-r1:8b as alternatives. See Picking a Model in the Tutorials for the tradeoffs.
The diagram below is a conceptual reference drawn from the agentic systems material in the Spring AI docs.
It shows how the Playground’s Agentic Chat maps onto the broader Spring AI mental model: model reasoning, retrieval, tool execution, and memory combined in one chat runtime. It does not describe a separate product feature hidden behind the UI. For the concrete build-up of that context in this project - system prompt, presets and templates, RAG, tools, memory, and per-request options - see Context Engineering.

If you want the fuller conceptual background, start with Building Effective Agents. That reference explains the workflow-versus-agent distinction that this Playground makes concrete through Tool Studio, MCP Server, Vector Database, and Agentic Chat.
This chat experience lets you explore Spring AI’s workflow and agentic paradigms by combining chain-based RAG workflows with agentic, tool-augmented reasoning. In practice, it follows Spring AI’s Agentic Systems architecture, where grounded retrieval and dynamic tool execution coexist in one context-aware chat runtime.
| Component | Type | Description | Configuration Location | Key Benefits | Model Requirements |
|---|---|---|---|---|---|
| LLM | Core Model | Executes chain-based workflows and performs agentic reasoning for tool usage within a unified chat runtime. | Agentic Chat | Central reasoning and response generation; supports both deterministic workflows and agentic patterns. | Chat models; tool-aware and reasoning-capable models recommended. |
| Retrieval (RAG) | Chain Workflow | Deterministic retrieval and prompt augmentation using vector search over selected documents. | Vector Database | Predictable, controllable knowledge grounding; tunable retrieval parameters such as Top-K and thresholds. | Standard chat plus embedding models. |
| Tools (MCP) | Agentic Execution | Dynamic tool selection and invocation via MCP, driven by LLM reasoning and tool schemas. | Tool Studio, MCP Server | Enables external actions, multi-step reasoning, and adaptive behavior. | Tool-enabled models with function calling and reasoning support. |
| Memory | Shared Agentic State | The full conversation is kept locally; each turn the model sees a configurable trailing window, supplied through MessageChatMemoryAdvisor over an LlmWindowChatMemory decorator. | Agentic Chat drawer (per-chat Recent messages) + spring.ai.playground.chat.memory-max-messages (default 10); history-max-messages (2000) caps the local store | Coherent multi-turn dialogue without inflating every request; the recent-context window is tunable per conversation. | Models benefit from a longer window when the task needs more history. |
By leveraging these elements, Agentic Chat goes beyond basic Q&A and becomes a practical environment for building effective, modular AI applications that combine workflow predictability with agentic autonomy.
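The trailing window in the Memory row can be illustrated with a few lines of plain Java. This is a simplified sketch, not the LlmWindowChatMemory implementation: it treats messages as opaque items, assumes `maxMessages` is at least 1, and ignores any special handling the real memory may apply (for example to system messages).

```java
import java.util.List;

// Hypothetical sketch of a trailing message window; names are illustrative.
class TrailingWindowSketch {

    // Returns the last maxMessages items of the history (or all of it if shorter).
    static <T> List<T> window(List<T> history, int maxMessages) {
        int from = Math.max(0, history.size() - maxMessages);
        return history.subList(from, history.size());
    }

    public static void main(String[] args) {
        List<String> history = List.of("u1", "a1", "u2", "a2", "u3");
        // The full history stays in the local store; only the tail is sent to the model.
        System.out.println(window(history, 2));
    }
}
```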
Agentic Chat is a consumer of three inventories curated elsewhere in the Playground. Use these references to know what’s available before composing a chat session:
${ENV_VAR} requirements per page.
SpringAiPlaygroundRagAdvisor short-circuits when no documents are selected, so retrieval is opt-in per conversation).
→ Try it: Tutorials - end-to-end flows that combine Tool Studio, MCP Inspector, Vector Database, and Agentic Chat.