Add Execution Mode for ADK Agent (multi-agent cost + non-real-time UX) #4115

@rafiatha09

Is your feature request related to a problem? Please describe.
Yes.

In my experience, multi-agent workflows are expensive and slow to run in real time, and most of my use cases do not need an immediate answer.

A concrete example: I have a “scoring agent” used to prioritize client projects/deals. This agent is expensive because it needs to retrieve and synthesize data from multiple sources (docs, CRM, email, meeting transcripts, and browsing), which means multiple complex tool calls and often multiple agent handoffs. The common expectation is:

“Please assess these deals, and return the results later.”

This becomes a cost and UX problem:

  • Cost: scoring a single deal can be expensive, and we often need to score many deals.
  • UX: users don’t want to wait until the process finishes; they still want to interact with the agent while the job is running.

ADK already supports batch/async execution in the context of offline evaluation, but production use cases frequently need a first-class “non-real-time execution mode” (e.g., “finish this task and email me later”).

Given that Gemini supports offline/batch-style processing, it would be very valuable if ADK could expose an execution mode that allows:

  1. running agent jobs asynchronously (non-blocking)
  2. optionally routing model calls through a batch backend for cost/throughput efficiency
  3. resuming the workflow when the job completes

Describe the solution you'd like

Introduce an explicit Execution Mode for ADK agents and a first-class Job Service for job lifecycle management.

  1. New Agent API: Execution Mode
    Add a config option to Agent (or LlmAgent) such as:
  • execution_mode="realtime" (default)
  • execution_mode="offline" (enqueue job + return immediately)

Example (based on the multi-agent pattern in adk-samples):

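from google.adk.agents import Agent

# Assumed to be defined elsewhere, as in the adk-samples pattern:
# the `membership` tool and roadside_agent, claims_agent, rewards_agent.

# Sub-agents must be defined before the root agent that references them.
membership_agent = Agent(
    name="membership_agent",
    model="gemini-2.5-flash",
    description="Registers new members",
    instruction="...",
    tools=[membership],
    execution_mode="offline",  # NEW (proposed): run this agent as an offline job
)

root_agent = Agent(
    name="root_agent",
    global_instruction="You are a helpful virtual assistant for Cymbal Auto Insurance.",
    instruction="...",
    sub_agents=[membership_agent, roadside_agent, claims_agent, rewards_agent],
    tools=[membership],
    model="gemini-2.5-flash",
    execution_mode="realtime",  # NEW (proposed): the default
)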

  2. Persist status in a shared store (JobService / JobStoreService)
    Introduce a dedicated Job Service abstraction to store and retrieve job records across invocations.

Motivation:
Job state (QUEUED/RUNNING/DONE/FAILED) is operational metadata that changes frequently and needs deterministic reads/writes. This does not map cleanly to “memory” (which is primarily for long-term knowledge retrieval) or to session state alone (which is scoped per session). A dedicated service makes the semantics clear and avoids mixing concerns.

Proposed interface (conceptual):

  • create_job(job_spec) -> job_id
  • get_job(job_id) -> job_record
  • update_job(job_id, patch)
  • store_result(job_id, result_ref or result_payload)
  • load_result(job_id) -> result_payload

Default implementation:

  • InMemoryJobService for local dev

Pluggable implementations:

  • Redis / SQL / Firestore / etc.
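
To make the proposed interface concrete, here is a minimal sketch of it as an abstract base class with a dict-backed InMemoryJobService. Neither class exists in ADK today. The method names come from the list above, while the argument types, the plain-dict job records, and the use of uuid for job IDs are my assumptions.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any


class JobService(ABC):
    """Hypothetical interface for job lifecycle storage (not an ADK class)."""

    @abstractmethod
    def create_job(self, job_spec: dict[str, Any]) -> str: ...

    @abstractmethod
    def get_job(self, job_id: str) -> dict[str, Any]: ...

    @abstractmethod
    def update_job(self, job_id: str, patch: dict[str, Any]) -> None: ...

    @abstractmethod
    def store_result(self, job_id: str, result_payload: Any) -> None: ...

    @abstractmethod
    def load_result(self, job_id: str) -> Any: ...


class InMemoryJobService(JobService):
    """Dict-backed implementation intended for local development only."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict[str, Any]] = {}
        self._results: dict[str, Any] = {}

    def create_job(self, job_spec: dict[str, Any]) -> str:
        job_id = uuid.uuid4().hex
        # Every new job starts in the QUEUED state.
        self._jobs[job_id] = {**job_spec, "job_id": job_id, "status": "QUEUED"}
        return job_id

    def get_job(self, job_id: str) -> dict[str, Any]:
        # Return a copy so callers cannot mutate the stored record directly.
        return dict(self._jobs[job_id])

    def update_job(self, job_id: str, patch: dict[str, Any]) -> None:
        self._jobs[job_id].update(patch)

    def store_result(self, job_id: str, result_payload: Any) -> None:
        self._results[job_id] = result_payload

    def load_result(self, job_id: str) -> Any:
        return self._results[job_id]
```

A Redis, SQL, or Firestore backend would subclass the same JobService and replace the two dicts with persistent storage.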

Job record fields (minimum viable):

  • job_id
  • agent_name
  • parent_invocation_id (optional)
  • status: QUEUED | RUNNING | DONE | FAILED
  • created_at / updated_at
  • result_ref (or result payload)
  • error (if failed)
  • optional: retry_count, next_retry_at
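
The fields above could be captured in a small dataclass. This is only a sketch: the field names follow the list, but the types, the defaults, and the transition helper are assumptions on my side.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


def _now() -> datetime:
    # Timezone-aware UTC timestamps avoid ambiguity across workers.
    return datetime.now(timezone.utc)


@dataclass
class JobRecord:
    job_id: str
    agent_name: str
    parent_invocation_id: Optional[str] = None
    status: JobStatus = JobStatus.QUEUED
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    result_ref: Optional[str] = None
    error: Optional[str] = None
    retry_count: int = 0
    next_retry_at: Optional[datetime] = None

    def transition(self, status: JobStatus, error: Optional[str] = None) -> None:
        """Hypothetical helper: set a new status and refresh updated_at."""
        self.status = status
        self.error = error
        self.updated_at = _now()
```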

  3. Runtime behavior: enqueue offline agent and return control immediately
    When a parent agent delegates to a child agent with execution_mode="offline":
  • The runtime enqueues a job via JobService
  • Returns a structured “job pending” event to the parent flow
  • The root agent can immediately respond to the user with:
    • an acknowledgement (“I started the scoring job…”)
    • the job_id
    • what to do next (“You can continue chatting; ask ‘status <job_id>’ anytime.”)
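
The enqueue-and-return behavior could look roughly like the asyncio sketch below. It is not how the ADK runtime works today. The JOBS dict stands in for a JobService, and delegate_offline plus the OFFLINE_JOB_PENDING event type are hypothetical names used only for illustration.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable

# Hypothetical in-process job table; a real JobService would persist this.
JOBS: dict[str, dict[str, Any]] = {}


async def _run_job(job_id: str, work: Callable[[], Awaitable[Any]]) -> None:
    """Run the offline agent's work and record the outcome."""
    JOBS[job_id]["status"] = "RUNNING"
    try:
        JOBS[job_id]["result"] = await work()
        JOBS[job_id]["status"] = "DONE"
    except Exception as exc:
        JOBS[job_id]["error"] = str(exc)
        JOBS[job_id]["status"] = "FAILED"


def delegate_offline(
    agent_name: str, work: Callable[[], Awaitable[Any]]
) -> dict[str, Any]:
    """Enqueue the job and return a 'job pending' event without awaiting it.

    Must be called from inside a running event loop.
    """
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"agent_name": agent_name, "status": "QUEUED"}
    task = asyncio.get_running_loop().create_task(_run_job(job_id, work))
    # Keep a reference so the task is not garbage-collected mid-run.
    JOBS[job_id]["task"] = task
    return {"type": "OFFLINE_JOB_PENDING", "job_id": job_id, "agent_name": agent_name}
```

The parent flow receives the event right away and can reply to the user with the job_id while the task keeps running in the background.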

  4. On subsequent calls, check job status before re-running
    Every time the parent/root tries to invoke the offline agent again (or when the user asks for status), ADK checks JobService:

If RUNNING:

  • Do not re-run.
  • Emit a structured signal/event back to the root agent:
    {type: OFFLINE_JOB_RUNNING, job_id, last_update_at, optional_progress}
  • Root agent can respond:
    “Your scoring job is still running. You can continue chatting; I’ll update you when it’s done.”

If DONE:

  • Load the completed result payload via JobService (or result_ref).
  • Inject it into the parent flow as if it were the child agent’s final response.
  • Continue the workflow normally.

If FAILED:

  • Attach failure details.
  • Allow retry policy (either automatic retry or user-triggered retry).
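
The three branches above can be sketched as one dispatch function. The OFFLINE_JOB_RUNNING shape follows the signal described above. The OFFLINE_JOB_DONE and OFFLINE_JOB_FAILED event types, the can_retry flag, the max_retries parameter, and treating QUEUED like RUNNING are assumptions added for illustration.

```python
from typing import Any


def resolve_offline_call(job: dict[str, Any], max_retries: int = 1) -> dict[str, Any]:
    """Map a stored job record to the event handed back to the parent flow."""
    status = job["status"]
    if status in ("QUEUED", "RUNNING"):
        # Do not re-run; tell the root agent the job is still in flight.
        return {
            "type": "OFFLINE_JOB_RUNNING",
            "job_id": job["job_id"],
            "last_update_at": job.get("updated_at"),
            "optional_progress": job.get("progress"),
        }
    if status == "DONE":
        # Surface the stored payload as if it were the child's final response.
        return {
            "type": "OFFLINE_JOB_DONE",
            "job_id": job["job_id"],
            "final_response": job["result"],
        }
    if status == "FAILED":
        # Attach failure details and say whether a retry is still allowed.
        return {
            "type": "OFFLINE_JOB_FAILED",
            "job_id": job["job_id"],
            "error": job.get("error"),
            "can_retry": job.get("retry_count", 0) < max_retries,
        }
    raise ValueError(f"unknown job status: {status!r}")
```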

Describe alternatives you've considered

  1. Status quo: Only use offline evaluation batching
    Offline evaluation is helpful for regression testing and benchmarking, but it does not solve production async job execution patterns like:
    “please finish this task and email me later.”

  2. Custom job orchestration outside ADK
    It’s possible to build a custom queue, persistence layer, and polling logic around ADK, but this becomes duplicated boilerplate and inconsistent across teams.

  3. Store job state in MemoryService or Session state

  • Session state is scoped per session, so it is not ideal for tracking jobs across sessions.
  • MemoryService is typically optimized for knowledge retrieval rather than frequent operational status updates.

A dedicated JobService is clearer, safer, and more extensible.

Additional context
My current production-like use cases do not require real-time completion, but they do require online access to tools/data sources:

  1. AI Scoring Agent (prioritize client deals/projects)
  • pulls from CRM, docs, email, meeting notes, browsing
  • outputs score + rationale
  • expected UX: “assess these deals and return later”
  2. SOW (Scope of Work) document generation
  • pulls from meeting transcripts, emails, requirement docs, client metadata
  • outputs a structured SOW draft
  • expected UX: “generate SOW and notify me when ready”

Labels: core [Component] (This issue is related to the core interface and implementation)
