Overview

The OpenBrowse browser agent can read, click, navigate, execute code, and connect to external services.

The OpenBrowse agent is an AI-powered browser automation tool that runs in your side panel. It can interact with web pages, manage tabs, remember context across sessions, and connect to external services via MCP.

Press ⌥I (or Alt+I on Windows/Linux) or click the OpenBrowse extension icon to open the agent side panel.

How it works

When you give the agent a task, it uses a loop of thought and action. It reasons about what you want to achieve, looks at the current page (using tools like snapshot), and then takes actions (like clickElement or navigate) to fulfill the request.

Here's an example of what you can ask:

"Find the cheapest full-size mechanical keyboard on this page and add it to my cart."

The agent will:

Analyze the page to find the list of products.
Reason about which product matches your criteria.
Execute a click on the "Add to cart" button.

Capabilities

The agent is equipped with a comprehensive set of Tools, allowing it to:

Browse — Read page content, take screenshots, scroll, navigate.
Interact — Click elements, type text, select tabs.
Execute — Run JavaScript or Python in sandboxed environments.
Read & write files — Operate on a per-conversation virtual Workspace using Read, Write, Edit, Glob, Grep, and LS.
Remember — Persistent memory across conversations (see Memory).
Plan — Create and manage structured to-do lists for complex, multi-step tasks.
Use skills — Invoke installed Skills to apply curated workflows on demand.
Connect — MCP connectors for GitHub, Linear, Slack, and more (see Connectors).

Long conversations & compaction

Once a conversation grows large enough to threaten the model's context window, OpenBrowse automatically compacts the older history. A compaction event is inserted inline as a regular chat message — a synthetic "what did we do so far?" turn followed by a summary — so you keep full visibility into what was condensed.

The full message list stays in the UI; only the LLM sees the compacted view (head dropped, summary substituted, oversized tool outputs pruned). Compaction runs at the next safe step boundary during multi-tool runs, and Stop cancels any in-flight summarization cleanly.

Best practices

Be specific: The more context you provide, the better the agent can fulfill your request.
Use Spaces: The agent is aware of your current Space and its context. If you are in your "Work" space, it will prefer tools and memory associated with that space.
Watch the tools: The agent explains what tools it's calling. You have full visibility into its actions.

How it works

Capabilities

Long conversations & compaction

Best practices

On this page