Overview

Browser control endpoints allow you to directly manipulate the browser environment independently of agent tasks. Use these endpoints to:
  • Navigate to specific URLs before or during agent execution
  • Capture screenshots for debugging, monitoring, or verification

When to Use Browser Control

Navigate Endpoint

Use the navigate endpoint when you need:
  • Pre-positioning the browser before sending a task
  • Direct control over navigation (bypassing agent decision-making)
  • Immediate navigation without waiting for agent processing
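As a rough illustration of pre-positioning the browser, the sketch below builds a navigate request without sending it. The base URL, the path, the `url` payload field, and the bearer-token header are all hypothetical placeholders, since this guide does not specify them; consult the API reference for the real values.

```python
import json
import urllib.request

# ASSUMPTION: base URL, path, payload field, and auth scheme are hypothetical
# placeholders, not the documented API.
BASE_URL = "https://api.example.com"


def build_navigate_request(session_id: str, url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the browser to navigate."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/sessions/{session_id}/navigate",  # hypothetical path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
    )


navigate_req = build_navigate_request(
    "session-123", "https://example.com/login", "YOUR_API_KEY"
)
# To actually send it: urllib.request.urlopen(navigate_req, timeout=30)
```

Building the request separately from sending it keeps the example side-effect free and makes it easy to inspect exactly what would go over the wire.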

Screenshot Endpoint

Use the screenshot endpoint when you need:
  • Visual debugging of agent behavior
  • Progress monitoring during long-running tasks
  • Verification of page state after operations
  • Documentation of agent actions
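A minimal sketch of capturing and storing a screenshot for later verification follows. It assumes, hypothetically, that the endpoint is a GET on a `/screenshot` path and that the response body is raw image bytes; the base URL, path, auth header, and `.png` extension are illustrative and not taken from this guide.

```python
import urllib.request
from pathlib import Path

# ASSUMPTION: base URL, path, and auth scheme are hypothetical placeholders.
BASE_URL = "https://api.example.com"


def build_screenshot_request(session_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the current browser screenshot."""
    return urllib.request.Request(
        f"{BASE_URL}/sessions/{session_id}/screenshot",  # hypothetical path
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},  # hypothetical auth scheme
    )


def save_screenshot(image_bytes: bytes, directory: Path, label: str) -> Path:
    """Write image bytes to <directory>/<label>.png and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{label}.png"  # assumes the image is a PNG
    path.write_bytes(image_bytes)
    return path


screenshot_req = build_screenshot_request("session-123", "YOUR_API_KEY")
# To fetch and store, assuming the response body is the raw image:
# with urllib.request.urlopen(screenshot_req, timeout=30) as resp:
#     save_screenshot(resp.read(), Path("screenshots"), "after-login")
```

Labelling each file after the step it documents (for example `after-login`) makes it easier to line screenshots up with agent actions when reviewing a run.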

Comparison with Agent Messages

Method                      When to Use
Navigate API                Direct control, immediate navigation
Message with start_url      Let agent handle navigation naturally
Message with URL in text    Agent decides how to interpret URL
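To make the comparison concrete, the three approaches might correspond to request bodies like the ones below. Only `start_url` is named in this guide; the `message` and `url` field names and the task wording are hypothetical and shown purely for illustration.

```python
target = "https://example.com/pricing"

# 1. Navigate API: the browser moves immediately; no agent reasoning is involved.
#    ASSUMPTION: the "url" field name is hypothetical.
navigate_body = {"url": target}

# 2. Message with start_url: the agent begins its task from the given page.
#    ASSUMPTION: the "message" field name is hypothetical.
message_with_start_url = {
    "message": "Summarize the pricing table",
    "start_url": target,
}

# 3. Message with the URL in the text: the agent decides how to interpret the URL.
message_with_url_in_text = {
    "message": f"Go to {target} and summarize the pricing table",
}
```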

Best Practices

  • Use the navigate endpoint before sending a message to ensure the agent starts on the right page.
  • Screenshots are useful for debugging and verification, but avoid excessive calls, as they can be resource-intensive.
  • The navigate endpoint changes browser state immediately. Use it carefully to avoid disrupting a running agent task.
  • For most use cases, providing a start_url in your message is more flexible than calling the navigate endpoint directly.
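One way to avoid excessive screenshot calls is to enforce a minimum interval between captures on the client side. The helper below is a generic sketch and not part of the API; the class name and the five-second interval are arbitrary choices for illustration.

```python
import time
from typing import Callable, Optional


class ScreenshotThrottle:
    """Allow at most one screenshot per `min_interval_s` seconds (illustrative helper)."""

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last: Optional[float] = None

    def allow(self) -> bool:
        """Return True and record the time if enough time has passed, else False."""
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval_s:
            self._last = now
            return True
        return False


throttle = ScreenshotThrottle(min_interval_s=5.0)
if throttle.allow():
    pass  # request a screenshot here; skip the call when allow() returns False
```

Using a monotonic clock keeps the interval check correct even if the system wall-clock time is adjusted during a long-running task.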