@Kylejeong2
Member

what

WIP: benchmarking browser automation MCP servers on Online-Mind2Web.

`Task: ${task.confirmed_task}`,
"",
"Use the available MCP tools to browse and complete this task.",
"When you are done, briefly summarize what you did.",
Member Author

Should we have it give a response to the task too?

I don't know if we can force structured output, but we can add it to the prompt to try to get some kind of structured output,

i.e. result, steps, and reasoning.
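A minimal sketch of what that could look like, not what the PR implements. It assumes the prompt lines from the diff are joined with newlines, and it asks for (rather than forces) a JSON reply. The names `TaskReport`, `buildPrompt`, and `parseReport` are hypothetical; only `confirmed_task` and the first three prompt lines come from the diff.

```typescript
// Hypothetical shape for the structured answer. The field names follow the
// review comment (result, steps, reasoning); they are not from the PR.
interface TaskReport {
  result: string;
  steps: string[];
  reasoning: string;
}

// Build the prompt. The first three lines mirror the snippet under review;
// the last two are the assumed structured-output request.
function buildPrompt(task: { confirmed_task: string }): string {
  return [
    `Task: ${task.confirmed_task}`,
    "",
    "Use the available MCP tools to browse and complete this task.",
    "When you are done, reply with only a JSON object of the form:",
    '{"result": string, "steps": string[], "reasoning": string}',
  ].join("\n");
}

// Best-effort parse. The model is only asked to emit JSON, so this returns
// null whenever the reply does not contain an object of the expected shape.
function parseReport(reply: string): TaskReport | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    const data: unknown = JSON.parse(reply.slice(start, end + 1));
    if (typeof data !== "object" || data === null) return null;
    const r = data as Record<string, unknown>;
    if (
      typeof r.result === "string" &&
      Array.isArray(r.steps) &&
      r.steps.every((s) => typeof s === "string") &&
      typeof r.reasoning === "string"
    ) {
      return {
        result: r.result,
        steps: r.steps as string[],
        reasoning: r.reasoning,
      };
    }
    return null;
  } catch {
    // JSON.parse throws on malformed input; treat that as "no report".
    return null;
  }
}
```

Returning null instead of throwing lets the benchmark record an unstructured reply as a soft failure and still keep the raw text for scoring.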

shrey150 force-pushed the shreypandya/benchmark-mcp branch from 0ebcf5b to 8eae59e on November 17, 2025 20:45
shrey150 force-pushed the shreypandya/benchmark-mcp branch from 8c47e7d to c13052c on November 18, 2025 00:13
@Kylejeong2
Member Author

tabled

Kylejeong2 closed this on Nov 19, 2025
3 participants