Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Documentation not updated for new user-facing types
  - I documented the new `vf.Text`/`vf.Image`/`vf.Audio` types and ergonomic message constructors in `docs/reference.md` and added constructor usage guidance in `docs/environments.md` (with synced generated AGENTS docs).
- ✅ Fixed: Return type annotation mismatches actual returned types
  - I updated CUA mode return annotations to `list[vf.ContentPart]` and made `is_valid_tool_content_parts` accept pydantic content-part models so multipart screenshot payloads are preserved instead of stringified.
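For orientation, the validator behavior described here can be sketched as a self-contained snippet that mirrors the `tool_utils.py` change in the preview diff; `FakeTextPart` is a made-up stand-in for a pydantic content-part model, not a verifiers class:

```python
from collections.abc import Mapping
from typing import Any

VALID_TOOL_CONTENT_PART_TYPES = {"text", "image_url"}


def is_valid_tool_content_parts(value: Any) -> bool:
    """Return True if value is a list of dict or pydantic-style content parts."""
    if not isinstance(value, list):
        return False
    for item in value:
        if isinstance(item, Mapping):
            content_type = item.get("type")
        elif hasattr(item, "model_dump"):  # duck-typed pydantic model
            content_type = getattr(item, "type", None)
        else:
            return False
        if content_type not in VALID_TOOL_CONTENT_PART_TYPES:
            return False
    return True


class FakeTextPart:
    """Hypothetical stand-in for a pydantic content-part model."""

    type = "text"

    def model_dump(self) -> dict:
        return {"type": "text", "text": "hi"}
```

With this shape, a mixed list of raw dicts and model instances validates, so multipart tool output is no longer cast to a string.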
Or push these changes by commenting:
@cursor push a7c23af157
Preview (a7c23af157)
diff --git a/assets/lab/environments/AGENTS.md b/assets/lab/environments/AGENTS.md
--- a/assets/lab/environments/AGENTS.md
+++ b/assets/lab/environments/AGENTS.md
@@ -121,6 +121,17 @@
 ]
+If you prefer typed constructors over raw dicts, you can build the same prompt with:
+
+```python
+[
+    vf.SystemMessage("You are a helpful math tutor."),
+    vf.UserMessage("What is 2+2?"),
+]
+```
+
+`vf.UserMessage` / `vf.SystemMessage` also support multipart content via `vf.Text`, `vf.Image`, and `vf.Audio` parts.
+
 If your dataset already has a `prompt` column, `question` is ignored. However, if a `system_prompt` is provided, it will be prepended to existing prompts that don't already start with a system message.

 ### Evaluation Datasets
diff --git a/docs/environments.md b/docs/environments.md
--- a/docs/environments.md
+++ b/docs/environments.md
@@ -115,6 +115,17 @@
 ]
+If you prefer typed constructors over raw dicts, you can build the same prompt with:
+
+```python
+[
+    vf.SystemMessage("You are a helpful math tutor."),
+    vf.UserMessage("What is 2+2?"),
+]
+```
+
+`vf.UserMessage` / `vf.SystemMessage` also support multipart content via `vf.Text`, `vf.Image`, and `vf.Audio` parts.
+
 If your dataset already has a `prompt` column, `question` is ignored. However, if a `system_prompt` is provided, it will be prepended to existing prompts that don't already start with a system message.

 ### Evaluation Datasets
diff --git a/docs/reference.md b/docs/reference.md
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -21,19 +21,41 @@
### Messages
```python
-Messages = str | list[ChatMessage]
+ContentPart = vf.Text | vf.Image | vf.Audio | dict[str, Any]
+MessageContent = str | list[ContentPart]
+Message = (
+ vf.SystemMessage
+ | vf.UserMessage
+ | vf.AssistantMessage
+ | vf.ToolMessage
+ | vf.TextMessage
+)
+Messages = list[Message]
-The primary message type. Either a plain string (completion mode) or a list of chat messages (chat mode).
+Provider-agnostic message types used across environments and clients.
-### ChatMessage
+### Content Parts (vf.Text, vf.Image, vf.Audio)
-ChatMessage = ChatCompletionMessageParam  # from openai.types.chat
+vf.Text("hello")
+vf.Image("data:image/png;base64,...")
+vf.Audio(data="...", format="wav")

-OpenAI's chat message type with role, content, and optional tool_calls / tool_call_id fields.
+`vf.Text`, `vf.Image`, and `vf.Audio` are aliases for content-part models and can be used directly when building multipart message content.
+### Ergonomic Message Constructors
+
+```python
+user = vf.UserMessage("Look at this", vf.Image("data:image/png;base64,..."))
+system = vf.SystemMessage("You are a helpful assistant.")
+tool_call = vf.ToolCall(id="call_0", name="search", arguments={"q": "verifiers"})
+tool_result = vf.ToolMessage(tool_call_id=tool_call, content=[vf.Text("done")])
+```
+
+These constructors are optional conveniences for environment authors; raw dict-based messages are still supported.
+
Info
@@ -264,7 +286,7 @@
dataset: Dataset | None = None,
eval_dataset: Dataset | None = None,
system_prompt: str | None = None,
- few_shot: list[ChatMessage] | None = None,
+ few_shot: list[Message] | None = None,
parser: Parser | None = None,
rubric: Rubric | None = None,
sampling_args: SamplingArgs | None = None,
@@ -433,7 +455,7 @@
num_train_examples: int = 100,
num_eval_examples: int = 50,
seed: int = 0,
- prompt_renderer: Callable[..., ChatMessages] | None = None,
+ prompt_renderer: Callable[..., Messages] | None = None,
max_turns: int = -1,
rubric: Rubric | None = None,
**kwargs,
diff --git a/environments/AGENTS.md b/environments/AGENTS.md
--- a/environments/AGENTS.md
+++ b/environments/AGENTS.md
@@ -121,6 +121,17 @@
 ]
+If you prefer typed constructors over raw dicts, you can build the same prompt with:
+
+```python
+[
+    vf.SystemMessage("You are a helpful math tutor."),
+    vf.UserMessage("What is 2+2?"),
+]
+```
+
+`vf.UserMessage` / `vf.SystemMessage` also support multipart content via `vf.Text`, `vf.Image`, and `vf.Audio` parts.
+
 If your dataset already has a `prompt` column, `question` is ignored. However, if a `system_prompt` is provided, it will be prepended to existing prompts that don't already start with a system message.

 ### Evaluation Datasets
diff --git a/tests/test_tool_env.py b/tests/test_tool_env.py
--- a/tests/test_tool_env.py
+++ b/tests/test_tool_env.py
@@ -34,6 +34,14 @@
]
assert is_valid_tool_content_parts(content) is True
+    def test_valid_pydantic_content_parts(self):
+        """Valid list with pydantic text/image content parts."""
+        content = [
+            vf.Text("Here's the screenshot"),
+            vf.Image("data:image/png;base64,abc123"),
+        ]
+        assert is_valid_tool_content_parts(content) is True
+
def test_empty_list_is_valid(self):
"""Empty list is valid (no invalid parts)."""
assert is_valid_tool_content_parts([]) is True
@@ -372,6 +380,33 @@
 ]

+    @pytest.mark.asyncio
+    async def test_call_tool_returns_pydantic_content_parts(
+        self, mock_client, sample_chat_dataset
+    ):
+        """Test that call_tool preserves pydantic text/image content parts."""
+
+        def pydantic_parts_tool() -> list:
+            return [
+                vf.Text("Here's the screenshot"),
+                vf.Image("data:image/png;base64,abc"),
+            ]
+
+        env = vf.ToolEnv(
+            tools=[pydantic_parts_tool],
+            client=mock_client,
+            model="test-model",
+            dataset=sample_chat_dataset,
+        )
+        result = await env.call_tool("pydantic_parts_tool", {}, "call_0")
+        assert isinstance(result["content"], list)
+        assert result["content"][0] == {"type": "text", "text": "Here's the screenshot"}
+        assert result["content"][1] == {
+            "type": "image_url",
+            "image_url": {"url": "data:image/png;base64,abc"},
+        }
+
@pytest.mark.asyncio
async def test_call_tool_casts_invalid_list_to_str(
self, mock_client, sample_chat_dataset
):
diff --git a/verifiers/envs/integrations/browser_env/modes/cua_mode.py b/verifiers/envs/integrations/browser_env/modes/cua_mode.py
--- a/verifiers/envs/integrations/browser_env/modes/cua_mode.py
+++ b/verifiers/envs/integrations/browser_env/modes/cua_mode.py
@@ -741,7 +741,9 @@
self.logger.warning(f"Failed to save screenshot: {e}")
return None
-    def _format_response(self, response: dict, session_id: str = "") -> list[dict]:
+    def _format_response(
+        self, response: dict, session_id: str = ""
+    ) -> list[vf.ContentPart]:
"""Format action response as multipart content with text and image."""
success = response.get("success", False)
error = response.get("error")
@@ -763,7 +765,7 @@
f"Viewport: {viewport.get('width', 0)}x{viewport.get('height', 0)}"
)
-        content: list = [vf.Text("\n".join(text_parts))]
+        content: list[vf.ContentPart] = [vf.Text("\n".join(text_parts))]

         if screenshot_b64 and session_id:
             self._save_screenshot(session_id, screenshot_b64, url)
@@ -1029,7 +1031,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Click at coordinates (x, y) on the page."""
response = await self._execute_action(
session_id,
@@ -1046,7 +1048,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Double-click at coordinates (x, y) on the page."""
response = await self._execute_action(
session_id,
@@ -1062,7 +1064,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Type text into the currently focused element."""
response = await self._execute_action(
session_id,
@@ -1078,7 +1080,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Press keyboard key(s)."""
response = await self._execute_action(
session_id,
@@ -1097,7 +1099,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Scroll the page at a specific position."""
response = await self._execute_action(
session_id,
@@ -1119,7 +1121,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Navigate to a URL."""
try:
response = await self._execute_action(
@@ -1141,7 +1143,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Navigate back in browser history."""
response = await self._execute_action(
session_id,
@@ -1156,7 +1158,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Navigate forward in browser history."""
response = await self._execute_action(
session_id,
@@ -1172,7 +1174,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Wait for a specified amount of time."""
try:
response = await self._execute_action(
@@ -1194,7 +1196,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",
-    ) -> list[dict]:
+    ) -> list[vf.ContentPart]:
"""Capture a screenshot of the current page state."""
response = await self._execute_action(
session_id,
diff --git a/verifiers/utils/tool_utils.py b/verifiers/utils/tool_utils.py
--- a/verifiers/utils/tool_utils.py
+++ b/verifiers/utils/tool_utils.py
@@ -1,3 +1,4 @@
+from collections.abc import Mapping
from typing import Any
from agents.function_schema import function_schema
@@ -10,14 +11,19 @@
 def is_valid_tool_content_parts(value: Any) -> bool:
     """Check if value is a valid list of tool content parts.

-    Valid content parts have a "type" field with value "text" or "image_url".
+    Valid content parts have a "type" field with value "text" or "image_url",
+    and can be either dict-like objects or pydantic models.
     """
     if not isinstance(value, list):
         return False
     for item in value:
-        if not isinstance(item, dict):
+        if isinstance(item, Mapping):
+            content_type = item.get("type")
+        elif hasattr(item, "model_dump"):
+            content_type = getattr(item, "type", None)
+        else:
             return False
-        if item.get("type") not in VALID_TOOL_CONTENT_PART_TYPES:
+        if content_type not in VALID_TOOL_CONTENT_PART_TYPES:
             return False
     return True
</details>
<sub>This Bugbot Autofix run was free. To enable autofix for future PRs, go to the <a href="https://www.cursor.com/dashboard?tab=bugbot">Cursor dashboard</a>.</sub>
</details>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Description
Env authors no longer need raw dicts to build messages. `UserMessage("describe", Image.from_pil(img))` just works. `ToolCall` accepts dict `arguments`, and `ToolMessage` accepts a `ToolCall` for `tool_call_id`. Migrated the browser env, MCP env, and gym env.
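A rough sketch of the described ergonomics, using simplified stand-in classes rather than verifiers' actual `ToolCall`/`ToolMessage` implementations:

```python
import json


class ToolCall:
    """Stand-in: `arguments` may be a dict, auto-serialized to a JSON string."""

    def __init__(self, id: str, name: str, arguments):
        self.id = id
        self.name = name
        self.arguments = json.dumps(arguments) if isinstance(arguments, dict) else arguments


class ToolMessage:
    """Stand-in: `tool_call_id` may be a ToolCall instance or a raw id string."""

    def __init__(self, tool_call_id, content):
        self.tool_call_id = tool_call_id.id if isinstance(tool_call_id, ToolCall) else tool_call_id
        self.content = content


call = ToolCall(id="call_0", name="search", arguments={"q": "verifiers"})
result = ToolMessage(tool_call_id=call, content="done")
assert call.arguments == '{"q": "verifiers"}'
assert result.tool_call_id == "call_0"
```

Accepting the `ToolCall` object directly removes the need to thread raw id strings through tool-result construction.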
Type of Change
Testing
`uv run pytest` locally.
Checklist
Additional Notes
Note
Medium Risk
Touches core message/tool typing and serialization paths plus multiple environment integrations; subtle provider formatting or JSON-serialization regressions are possible without broad test coverage.
Overview
Adds typed, ergonomic constructors for building prompts and tool results:
`vf.SystemMessage`/`vf.UserMessage` now accept either a plain string or multipart content via `vf.Text`/`vf.Image`/`vf.Audio`, `vf.ToolCall` auto-serializes dict `arguments`, and `vf.ToolMessage` accepts a `ToolCall` for `tool_call_id`.

Updates `ToolEnv`/tool utilities and integrations to handle structured tool outputs: tools may return a `list` of content parts (text/image) that is preserved, and the browser CUA mode, `MCPEnv`, and `GymEnv` are migrated off raw message dicts to these constructors. Public exports and docs are expanded to document and expose the new types (`vf.Text`, `vf.Image`, `vf.Audio`) and usage patterns.

Written by Cursor Bugbot for commit 778b1e9. This will update automatically on new commits. Configure here.
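The preserved serialization described above can be illustrated with simplified stand-ins for the content-part models; the expected dict shapes match the assertions in the test diff, but `Text`, `Image`, and `serialize_content` here are illustrative, not verifiers' real API:

```python
class Text:
    """Stand-in for a pydantic text content part."""

    def __init__(self, text: str):
        self.text = text

    def model_dump(self) -> dict:
        return {"type": "text", "text": self.text}


class Image:
    """Stand-in for a pydantic image content part."""

    def __init__(self, url: str):
        self.url = url

    def model_dump(self) -> dict:
        return {"type": "image_url", "image_url": {"url": self.url}}


def serialize_content(parts: list) -> list:
    # Preserve structured parts as provider-style dicts instead of str()-casting the list.
    return [p.model_dump() if hasattr(p, "model_dump") else p for p in parts]


content = serialize_content([Text("Here's the screenshot"), Image("data:image/png;base64,abc")])
assert content[0] == {"type": "text", "text": "Here's the screenshot"}
assert content[1] == {"type": "image_url", "image_url": {"url": "data:image/png;base64,abc"}}
```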