Overview
Browser control endpoints allow you to directly manipulate the browser environment independently of agent tasks. Use these endpoints to:
- Navigate to specific URLs before or during agent execution
- Capture screenshots for debugging, monitoring, or verification
When to Use Browser Control
Navigate Endpoint
Use the navigate endpoint when you need:
- Pre-positioning of the browser before sending a task
- Direct control over navigation (bypassing agent decision-making)
- Immediate navigation without waiting for agent processing
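As a rough illustration of pre-positioning the browser, the sketch below builds (but does not send) a navigate request using only the Python standard library. The base URL, the path, the `url` body field, and the bearer-token header are all assumptions made for this example; they are not taken from this page, so check the API reference for the real values.

```python
import json
import urllib.request

# Hypothetical values: this page does not specify the host, path, or body shape.
BASE_URL = "https://api.example.com"
NAVIGATE_PATH = "/sessions/{session_id}/navigate"


def build_navigate_request(session_id: str, url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking the browser session to open `url`."""
    body = json.dumps({"url": url}).encode("utf-8")  # "url" field name is assumed
    return urllib.request.Request(
        BASE_URL + NAVIGATE_PATH.format(session_id=session_id),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the assumed details in one place that is easy to correct.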
Screenshot Endpoint
Use the screenshot endpoint when you need:
- Visual debugging of agent behavior
- Progress monitoring during long-running tasks
- Verification of page state after operations
- Documentation of agent actions
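For progress monitoring or documentation, it can help to save each capture under a predictable name. The sketch below assumes the screenshot arrives as raw PNG bytes, which this page does not confirm. The call to the screenshot endpoint itself is passed in as a `fetch` callable, a hypothetical stand-in for whatever client you use.

```python
from pathlib import Path
from typing import Callable


def save_screenshot(fetch: Callable[[], bytes], out_dir: Path, label: str, step: int) -> Path:
    """Fetch one screenshot and write it to out_dir as '<step>-<label>.png'."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded step numbers keep the files sorted in capture order.
    path = out_dir / f"{step:03d}-{label}.png"  # PNG format is an assumption
    path.write_bytes(fetch())
    return path
```

For example, `save_screenshot(fetch, Path("shots"), "after-login", 3)` would write `shots/003-after-login.png`.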
Comparison with Agent Messages
| Method | When to Use |
|---|---|
| Navigate API | Direct control, immediate navigation |
| Message with start_url | Let agent handle navigation naturally |
| Message with URL in text | Agent decides how to interpret URL |
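The three rows above might correspond to request bodies along the following lines. Only `start_url` is named on this page. The `url` and `message` field names are assumptions for illustration, so treat this as a sketch of the difference in intent rather than the actual schema.

```python
from typing import Any, Dict


def navigate_payload(url: str) -> Dict[str, Any]:
    """Navigate API: the browser goes to `url` immediately, with no agent involved."""
    return {"url": url}  # assumed field name


def message_with_start_url(task: str, start_url: str) -> Dict[str, Any]:
    """Agent message: the agent starts its task from `start_url`."""
    return {"message": task, "start_url": start_url}  # "message" is assumed


def message_with_url_in_text(task: str, url: str) -> Dict[str, Any]:
    """Agent message: the URL is only part of the text, so the agent interprets it."""
    return {"message": f"Go to {url} and {task}"}  # "message" is assumed
```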
Best Practices
Screenshots are useful for debugging and verification, but avoid excessive calls as they can be resource-intensive.
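One way to avoid excessive calls is to throttle on the client side. The sketch below wraps a hypothetical `fetch` callable (standing in for the screenshot call) and reuses the last image if less than a minimum interval has passed. The 5-second default is an arbitrary choice for illustration, not a documented limit.

```python
import time
from typing import Callable, Optional


class ThrottledScreenshots:
    """Return a cached screenshot unless at least `min_interval_s` seconds have elapsed."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        min_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_image: Optional[bytes] = None

    def get(self) -> bytes:
        now = self._clock()
        # Call the endpoint on first use, or once the interval has elapsed.
        if self._last_time is None or now - self._last_time >= self._min_interval_s:
            self._last_image = self._fetch()
            self._last_time = now
        assert self._last_image is not None
        return self._last_image
```

The injectable `clock` exists only to make the wrapper testable. In normal use the default `time.monotonic` is sufficient.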
For most use cases, providing a start_url in your message is more flexible than using the navigate endpoint directly.