Skip to content

ergonomic message constructors#1002

Open
hallerite wants to merge 4 commits intomainfrom
hallerite/constructors
Open

ergonomic message constructors#1002
hallerite wants to merge 4 commits intomainfrom
hallerite/constructors

Conversation

@hallerite
Copy link
Member

@hallerite hallerite commented Mar 10, 2026

Description

Env authors no longer need raw dicts to build messages. UserMessage("describe", Image.from_pil(img)) just works. ToolCall accepts dict arguments, ToolMessage accepts a ToolCall for tool_call_id. Migrated browser env, mcp env, and gym env.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches core message/tool typing and serialization paths plus multiple environment integrations; subtle provider formatting or JSON-serialization regressions are possible without broad test coverage.

Overview
Adds typed, ergonomic constructors for building prompts and tool results: vf.SystemMessage/vf.UserMessage now accept either a plain string or multipart content via vf.Text/vf.Image/vf.Audio, vf.ToolCall auto-serializes dict arguments, and vf.ToolMessage accepts a ToolCall for tool_call_id.

Updates ToolEnv/tool utilities and integrations to handle structured tool outputs: tools may return a list of content parts (text/image) that is preserved, and the browser CUA mode, MCPEnv, and GymEnv are migrated off raw message dicts to these constructors. Public exports and docs are expanded to document and expose the new types (vf.Text, vf.Image, vf.Audio) and usage patterns.

Written by Cursor Bugbot for commit 778b1e9. This will update automatically on new commits. Configure here.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Documentation not updated for new user-facing types
    • I documented the new vf.Text/vf.Image/vf.Audio types and ergonomic message constructors in docs/reference.md and added constructor usage guidance in docs/environments.md (with synced generated AGENTS docs).
  • ✅ Fixed: Return type annotation mismatches actual returned types
    • I updated CUA mode return annotations to list[vf.ContentPart] and made is_valid_tool_content_parts accept pydantic content-part models so multipart screenshot payloads are preserved instead of stringified.

Create PR

Or push these changes by commenting:

@cursor push a7c23af157
Preview (a7c23af157)
diff --git a/assets/lab/environments/AGENTS.md b/assets/lab/environments/AGENTS.md
--- a/assets/lab/environments/AGENTS.md
+++ b/assets/lab/environments/AGENTS.md
@@ -121,6 +121,17 @@
 ]

+If you prefer typed constructors over raw dicts, you can build the same prompt with:
+
+```python
+[

  • vf.SystemMessage("You are a helpful math tutor."),
  • vf.UserMessage("What is 2+2?"),
    +]
    +```

+vf.UserMessage / vf.SystemMessage also support multipart content via vf.Text, vf.Image, and vf.Audio parts.
+
If your dataset already has a prompt column, question is ignored. However, if a system_prompt is provided, it will be prepended to existing prompts that don't already start with a system message.

Evaluation Datasets

diff --git a/docs/environments.md b/docs/environments.md
--- a/docs/environments.md
+++ b/docs/environments.md
@@ -115,6 +115,17 @@
]


+If you prefer typed constructors over raw dicts, you can build the same prompt with:
+
+```python
+[
+    vf.SystemMessage("You are a helpful math tutor."),
+    vf.UserMessage("What is 2+2?"),
+]
+```
+
+`vf.UserMessage` / `vf.SystemMessage` also support multipart content via `vf.Text`, `vf.Image`, and `vf.Audio` parts.
+
If your dataset already has a `prompt` column, `question` is ignored. However, if a `system_prompt` is provided, it will be prepended to existing prompts that don't already start with a system message.

### Evaluation Datasets

diff --git a/docs/reference.md b/docs/reference.md
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -21,19 +21,41 @@
### Messages

```python
-Messages = str | list[ChatMessage]
+ContentPart = vf.Text | vf.Image | vf.Audio | dict[str, Any]
+MessageContent = str | list[ContentPart]
+Message = (
+    vf.SystemMessage
+    | vf.UserMessage
+    | vf.AssistantMessage
+    | vf.ToolMessage
+    | vf.TextMessage
+)
+Messages = list[Message]

-The primary message type. Either a plain string (completion mode) or a list of chat messages (chat mode).
+Provider-agnostic message types used across environments and clients.

-### ChatMessage
+### Content Parts (vf.Text, vf.Image, vf.Audio)

-ChatMessage = ChatCompletionMessageParam  # from openai.types.chat
+vf.Text("hello")
+vf.Image("data:image/png;base64,...")
+vf.Audio(data="...", format="wav")

-OpenAI's chat message type with role, content, and optional tool_calls / tool_call_id fields.
+vf.Text, vf.Image, and vf.Audio are aliases for content-part models and can be used directly when building multipart message content.

+### Ergonomic Message Constructors
+
+python +user = vf.UserMessage("Look at this", vf.Image("data:image/png;base64,...")) +system = vf.SystemMessage("You are a helpful assistant.") +tool_call = vf.ToolCall(id="call_0", name="search", arguments={"q": "verifiers"}) +tool_result = vf.ToolMessage(tool_call_id=tool_call, content=[vf.Text("done")]) +
+
+These constructors are optional conveniences for environment authors; raw dict-based messages are still supported.
+

Info

@@ -264,7 +286,7 @@
        dataset: Dataset | None = None,
        eval_dataset: Dataset | None = None,
        system_prompt: str | None = None,
-        few_shot: list[ChatMessage] | None = None,
+        few_shot: list[Message] | None = None,
        parser: Parser | None = None,
        rubric: Rubric | None = None,
        sampling_args: SamplingArgs | None = None,
@@ -433,7 +455,7 @@
        num_train_examples: int = 100,
        num_eval_examples: int = 50,
        seed: int = 0,
-        prompt_renderer: Callable[..., ChatMessages] | None = None,
+        prompt_renderer: Callable[..., Messages] | None = None,
        max_turns: int = -1,
        rubric: Rubric | None = None,
        **kwargs,

diff --git a/environments/AGENTS.md b/environments/AGENTS.md
--- a/environments/AGENTS.md
+++ b/environments/AGENTS.md
@@ -121,6 +121,17 @@
]

+If you prefer typed constructors over raw dicts, you can build the same prompt with:
+
+```python
+[

  • vf.SystemMessage("You are a helpful math tutor."),
  • vf.UserMessage("What is 2+2?"),
    +]
    +```

+vf.UserMessage / vf.SystemMessage also support multipart content via vf.Text, vf.Image, and vf.Audio parts.
+
If your dataset already has a prompt column, question is ignored. However, if a system_prompt is provided, it will be prepended to existing prompts that don't already start with a system message.

Evaluation Datasets

diff --git a/tests/test_tool_env.py b/tests/test_tool_env.py
--- a/tests/test_tool_env.py
+++ b/tests/test_tool_env.py
@@ -34,6 +34,14 @@
]
assert is_valid_tool_content_parts(content) is True

  • def test_valid_pydantic_content_parts(self):

  •    """Valid list with pydantic text/image content parts."""
    
  •    content = [
    
  •        vf.Text("Here's the screenshot"),
    
  •        vf.Image("data:image/png;base64,abc123"),
    
  •    ]
    
  •    assert is_valid_tool_content_parts(content) is True
    
  • def test_empty_list_is_valid(self):
    """Empty list is valid (no invalid parts)."""
    assert is_valid_tool_content_parts([]) is True
    @@ -372,6 +380,33 @@
    ]

    @pytest.mark.asyncio

  • async def test_call_tool_returns_pydantic_content_parts(

  •    self, mock_client, sample_chat_dataset
    
  • ):

  •    """Test that call_tool preserves pydantic text/image content parts."""
    
  •    def pydantic_parts_tool() -> list:
    
  •        return [
    
  •            vf.Text("Here's the screenshot"),
    
  •            vf.Image("data:image/png;base64,abc"),
    
  •        ]
    
  •    env = vf.ToolEnv(
    
  •        tools=[pydantic_parts_tool],
    
  •        client=mock_client,
    
  •        model="test-model",
    
  •        dataset=sample_chat_dataset,
    
  •    )
    
  •    result = await env.call_tool("pydantic_parts_tool", {}, "call_0")
    
  •    assert isinstance(result["content"], list)
    
  •    assert result["content"][0] == {"type": "text", "text": "Here's the screenshot"}
    
  •    assert result["content"][1] == {
    
  •        "type": "image_url",
    
  •        "image_url": {"url": "data:image/png;base64,abc"},
    
  •    }
    
  • @pytest.mark.asyncio
    async def test_call_tool_casts_invalid_list_to_str(
    self, mock_client, sample_chat_dataset
    ):

diff --git a/verifiers/envs/integrations/browser_env/modes/cua_mode.py b/verifiers/envs/integrations/browser_env/modes/cua_mode.py
--- a/verifiers/envs/integrations/browser_env/modes/cua_mode.py
+++ b/verifiers/envs/integrations/browser_env/modes/cua_mode.py
@@ -741,7 +741,9 @@
self.logger.warning(f"Failed to save screenshot: {e}")
return None

  • def _format_response(self, response: dict, session_id: str = "") -> list[dict]:
  • def _format_response(
  •    self, response: dict, session_id: str = ""
    
  • ) -> list[vf.ContentPart]:
    """Format action response as multipart content with text and image."""
    success = response.get("success", False)
    error = response.get("error")
    @@ -763,7 +765,7 @@
    f"Viewport: {viewport.get('width', 0)}x{viewport.get('height', 0)}"
    )
  •    content: list = [vf.Text("\n".join(text_parts))]
    
  •    content: list[vf.ContentPart] = [vf.Text("\n".join(text_parts))]
    
       if screenshot_b64 and session_id:
           self._save_screenshot(session_id, screenshot_b64, url)
    

@@ -1029,7 +1031,7 @@
session_id: str = "",
sandbox_id: str = "",
tool_call_id: str = "",

  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Click at coordinates (x, y) on the page."""
    response = await self._execute_action(
    session_id,
    @@ -1046,7 +1048,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Double-click at coordinates (x, y) on the page."""
    response = await self._execute_action(
    session_id,
    @@ -1062,7 +1064,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Type text into the currently focused element."""
    response = await self._execute_action(
    session_id,
    @@ -1078,7 +1080,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Press keyboard key(s)."""
    response = await self._execute_action(
    session_id,
    @@ -1097,7 +1099,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Scroll the page at a specific position."""
    response = await self._execute_action(
    session_id,
    @@ -1119,7 +1121,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Navigate to a URL."""
    try:
    response = await self._execute_action(
    @@ -1141,7 +1143,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Navigate back in browser history."""
    response = await self._execute_action(
    session_id,
    @@ -1156,7 +1158,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Navigate forward in browser history."""
    response = await self._execute_action(
    session_id,
    @@ -1172,7 +1174,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Wait for a specified amount of time."""
    try:
    response = await self._execute_action(
    @@ -1194,7 +1196,7 @@
    session_id: str = "",
    sandbox_id: str = "",
    tool_call_id: str = "",
  • ) -> list[dict]:
  • ) -> list[vf.ContentPart]:
    """Capture a screenshot of the current page state."""
    response = await self._execute_action(
    session_id,

diff --git a/verifiers/utils/tool_utils.py b/verifiers/utils/tool_utils.py
--- a/verifiers/utils/tool_utils.py
+++ b/verifiers/utils/tool_utils.py
@@ -1,3 +1,4 @@
+from collections.abc import Mapping
from typing import Any

from agents.function_schema import function_schema
@@ -10,14 +11,19 @@
def is_valid_tool_content_parts(value: Any) -> bool:
"""Check if value is a valid list of tool content parts.

  • Valid content parts have a "type" field with value "text" or "image_url".
  • Valid content parts have a "type" field with value "text" or "image_url",
  • and can be either dict-like objects or pydantic models.
    """
    if not isinstance(value, list):
    return False
    for item in value:
  •    if not isinstance(item, dict):
    
  •    if isinstance(item, Mapping):
    
  •        content_type = item.get("type")
    
  •    elif hasattr(item, "model_dump"):
    
  •        content_type = getattr(item, "type", None)
    
  •    else:
           return False
    
  •    if item.get("type") not in VALID_TOOL_CONTENT_PART_TYPES:
    
  •    if content_type not in VALID_TOOL_CONTENT_PART_TYPES:
           return False
    
    return True

</details>
<sub>This Bugbot Autofix run was free. To enable autofix for future PRs, go to the <a href="https://www.cursor.com/dashboard?tab=bugbot">Cursor dashboard</a>.</sub>

</details>

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@hallerite hallerite requested a review from willccbb March 10, 2026 04:07
@snimu snimu mentioned this pull request Mar 10, 2026
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant