agent-device

A CLI that lets AI agents control iOS and Android devices, influenced by Vercel’s agent-browser.

The project is in early development and considered experimental. Pull requests are welcome!

Features

  • Platforms: iOS (simulator + physical device core automation) and Android (emulator + device).
  • Core commands: open, back, home, app-switcher, press, long-press, focus, type, fill, scroll, scrollintoview, wait, alert, screenshot, close, reinstall.
  • Inspection commands: snapshot (accessibility tree), appstate, apps, devices.
  • Device tooling: adb (Android), simctl/devicectl (iOS via Xcode).
  • Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).

Install

npm install -g agent-device

Or use it without installing:

npx agent-device open SampleApp

Quick Start

Use refs for agent-driven exploration and normal automation flows.

agent-device open Contacts --platform ios # creates session on iOS Simulator
agent-device snapshot
agent-device click @e5
agent-device fill @e6 "John"
agent-device fill @e7 "Doe"
agent-device click @e3
agent-device close

CLI Usage

agent-device <command> [args] [--json]

Basic flow:

agent-device open SampleApp
agent-device snapshot
agent-device click @e7
agent-device fill @e8 "hello"
agent-device close SampleApp

Debug flow:

agent-device trace start
agent-device snapshot -s "Sample App"
agent-device find label "Wi-Fi" click
agent-device trace stop ./trace.log

Coordinates:

  • All coordinate-based commands (press, long-press, swipe, focus, fill) use device coordinates with origin at top-left.
  • X increases to the right, Y increases downward.

Gesture series examples:

agent-device press 300 500 --count 12 --interval-ms 45
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
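
To make the repeat flags more concrete, the following TypeScript sketch expands a swipe series for the two --pattern values and derives a repeatable per-iteration offset for --jitter-px. It is an illustration only: expandSwipes and jitterOffset are hypothetical names, the reading of ping-pong as "reverse every second swipe" is an assumption, and the jitter formula is not taken from the tool.

```typescript
type Point = { x: number; y: number };
type Swipe = { from: Point; to: Point };

// Expand one swipe into `count` swipes.
// Assumption: "ping-pong" reverses direction on every second iteration.
function expandSwipes(
  from: Point,
  to: Point,
  count: number,
  pattern: "one-way" | "ping-pong",
): Swipe[] {
  const series: Swipe[] = [];
  for (let i = 0; i < count; i++) {
    const reversed = pattern === "ping-pong" && i % 2 === 1;
    series.push(reversed ? { from: to, to: from } : { from, to });
  }
  return series;
}

// Hypothetical deterministic jitter: a pure function of the iteration index,
// so the same inputs always give the same offsets, each within ±jitterPx.
function jitterOffset(index: number, jitterPx: number): Point {
  const span = 2 * jitterPx + 1;
  return {
    x: (index % span) - jitterPx,
    y: ((index * 3) % span) - jitterPx,
  };
}
```

With these helpers, expandSwipes({ x: 540, y: 1500 }, { x: 540, y: 500 }, 8, "ping-pong") would model the swipe example above as four upward and four downward swipes.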

Command Index

  • boot, open, close, reinstall, home, back, app-switcher
  • snapshot, find, get
  • click, focus, type, fill, press, long-press, swipe, scroll, scrollintoview, pinch, is
  • alert, wait, screenshot
  • trace start, trace stop
  • settings wifi|airplane|location on|off
  • appstate, apps, devices, session list

iOS Snapshots

Notes:

  • iOS snapshots use XCTest on simulators and physical devices.
  • Scope snapshots with -s "<label>" or -s @ref.
  • If XCTest returns 0 nodes (e.g., foreground app changed), agent-device fails explicitly.

Flags:

  • --version, -V print version and exit
  • --platform ios|android
  • --device <name>
  • --udid <udid> (iOS)
  • --serial <serial> (Android)
  • --activity <component> (Android app launch only; package/Activity or package/.Activity; not for URL opens)
  • --session <name>
  • --count <n> repeat count for press/swipe
  • --interval-ms <ms> delay between press iterations
  • --hold-ms <ms> hold duration per press iteration
  • --jitter-px <n> deterministic coordinate jitter for press
  • --pause-ms <ms> delay between swipe iterations
  • --pattern one-way|ping-pong repeat pattern for swipe
  • --verbose for daemon and runner logs
  • --json for structured output

Pinch:

  • pinch is supported on iOS simulators.
  • On Android, pinch currently returns UNSUPPORTED_OPERATION in the adb backend.

Swipe timing:

  • swipe accepts optional durationMs (default 250, range 16..10000).
  • Android uses requested swipe duration directly.
  • iOS uses a safe normalized duration to avoid long-press side effects.

Skills

Install the automation skills listed in SKILL.md.

npx skills add https://github.com/callstackincubator/agent-device --skill agent-device

Sessions:

  • open starts a session. Without arguments, it boots/activates the target device/simulator without launching an app.
  • All interaction commands require an open session.
  • If a session is already open, open <app|url> switches the active app or opens a deep link URL.
  • close stops the session and releases device resources. Pass an app to close it explicitly, or omit to just close the session.
  • Use --session <name> to manage multiple sessions.
  • Session scripts are written to ~/.agent-device/sessions/<session>-<timestamp>.ad when recording is enabled with --save-script.
  • --save-script accepts an optional path: --save-script ./workflows/my-flow.ad.
  • For ambiguous bare values, use an explicit form: --save-script=workflow.ad or a path-like value such as ./workflow.ad.
  • Deterministic replay is .ad-based; use replay --update (-u) to update selector drift and rewrite the replay file in place.
  • On iOS, appstate is session-scoped and requires an active session on the target device.
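
When driving sessions from a script rather than a shell, a thin wrapper around the CLI may be convenient. The sketch below is one possible shape under stated assumptions: buildArgs and runAgentDevice are hypothetical helpers, the agent-device binary is assumed to be on PATH, and --json output is assumed to be a single JSON document on stdout. Only flags documented above are used.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

type RunOptions = { session?: string; platform?: "ios" | "android" };

// Build the argument vector: <command> [args] --json [--session <name>] [--platform <p>]
function buildArgs(command: string, args: string[], options: RunOptions = {}): string[] {
  const argv = [command, ...args, "--json"];
  if (options.session) argv.push("--session", options.session);
  if (options.platform) argv.push("--platform", options.platform);
  return argv;
}

// Run one command and parse its structured output.
// The result is typed as unknown because the JSON shape is not specified here.
async function runAgentDevice(
  command: string,
  args: string[] = [],
  options: RunOptions = {},
): Promise<unknown> {
  const { stdout } = await execFileAsync("agent-device", buildArgs(command, args, options));
  return JSON.parse(stdout);
}

// Example (requires the CLI and a device/simulator):
//   await runAgentDevice("open", ["Settings"], { session: "a", platform: "ios" });
//   await runAgentDevice("snapshot", [], { session: "a" });
//   await runAgentDevice("close", [], { session: "a" });
```

Passing the same session name to every call keeps independent flows separated, which mirrors the --session <name> usage described above.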

Navigation helpers:

  • boot --platform ios|android ensures the target is ready without launching an app.
  • Use boot mainly when starting a new session and open fails because no booted simulator/emulator is available.
  • open [app|url] [url] already boots/activates the selected target when needed.
  • reinstall <app> <path> uninstalls and installs the app binary in one command (Android + iOS simulator).
  • reinstall accepts package/bundle id style app names and supports ~ in paths.

Deep links:

  • open <url> supports deep links with scheme://....
  • open <app> <url> opens a deep link on iOS.
  • Android opens deep links via VIEW intent.
  • iOS simulator opens deep links via simctl openurl.
  • iOS device opens deep links via devicectl --payload-url.
  • On iOS devices, http(s):// URLs open in Safari when no app is active. Custom scheme URLs (myapp://) require an active app in the session.
  • --activity cannot be combined with URL opens.

agent-device open "myapp://home" --platform android
agent-device open "https://example.com" --platform ios          # open link in web browser
agent-device open MyApp "myapp://screen/to" --platform ios      # open deep link to MyApp
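
The open [app|url] [url] forms can be told apart by checking for a scheme:// prefix. The following sketch illustrates that classification and the --activity restriction; parseOpenArgs and the regular expression are hypothetical and are not taken from the tool's argument parser.

```typescript
// Matches "scheme://..." per the generic URI scheme grammar (letter, then letters/digits/+/./-).
const URL_SCHEME = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//;

type OpenRequest = { app?: string; url?: string };

// Classify the positional arguments of `open` into an app and/or a URL.
function parseOpenArgs(positional: string[], activity?: string): OpenRequest {
  const [first, second] = positional;
  const request: OpenRequest = {};
  if (first !== undefined) {
    if (URL_SCHEME.test(first)) request.url = first;
    else request.app = first;
  }
  if (second !== undefined) {
    // Assumption: a second positional argument is only valid as <app> <url>.
    if (request.app === undefined || !URL_SCHEME.test(second)) {
      throw new Error("expected: open <app> <url>");
    }
    request.url = second;
  }
  if (request.url !== undefined && activity !== undefined) {
    throw new Error("--activity cannot be combined with URL opens");
  }
  return request;
}
```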

Find (semantic):

  • find <text> <action> [value] finds by any text (label/value/identifier) using a scoped snapshot.
  • find text|label|value|role|id <value> <action> [value] for specific locators.
  • Actions: click (default), fill, type, focus, get text, get attrs, wait [timeout], exists.

Assertions:

  • is predicates: visible, hidden, exists, editable, selected, text.
  • is text uses exact equality.

Replay update:

  • replay <path> runs deterministic replay from .ad scripts.
  • replay -u <path> attempts selector updates on failures and atomically rewrites the same file.
  • Refs are the default/core mechanism for interactive agent flows.
  • Update targets: click, fill, get, is, wait.
  • Selector matching is a replay-update internal: replay parses .ad lines into actions, tries them, snapshots on failure, resolves a better selector, then rewrites that failing line.

Update examples:

# Before (stale selector)
click "id=\"old_continue\" || label=\"Continue\""

# After replay -u (rewritten in place)
click "id=\"auth_continue\" || label=\"Continue\""

# Before (ref-based action from discovery)
snapshot -i -c -s "Continue"
click @e13 "Continue"

# After replay -u (upgraded to selector-based action)
snapshot -i -c -s "Continue"
click "id=\"auth_continue\" || label=\"Continue\""
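
The selector expressions in these examples chain key="value" terms with ||. As a rough illustration of how such a chain could be read and resolved, the sketch below parses the unescaped expression and tries each term from left to right. The names parseSelectorChain and resolveSelector, the flat node shape, and the left-to-right fallback order are all assumptions; the real replay logic may differ.

```typescript
type SelectorTerm = { key: string; value: string };
type UiNode = Record<string, string>;

// Parse `id="auth_continue" || label="Continue"` into ordered terms.
// Limitation of this sketch: values containing "||" are not supported.
function parseSelectorChain(expression: string): SelectorTerm[] {
  return expression.split("||").map((part) => {
    const match = /^\s*([A-Za-z]+)\s*=\s*"(.*)"\s*$/.exec(part);
    if (!match) throw new Error(`Unrecognized selector term: ${part.trim()}`);
    return { key: match[1], value: match[2] };
  });
}

// Assumed semantics: the first term that matches any node wins.
function resolveSelector(terms: SelectorTerm[], nodes: UiNode[]): UiNode | undefined {
  for (const term of terms) {
    const hit = nodes.find((node) => node[term.key] === term.value);
    if (hit) return hit;
  }
  return undefined;
}
```

Under these assumptions, a stale id term simply falls through to the label term, which is one way a fallback chain can keep a script working until replay -u rewrites the line.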

Android fill reliability:

  • fill clears the current value, then enters text.
  • type enters text into the focused field without clearing.
  • fill now verifies the entered value on Android.
  • If value does not match, agent-device clears the field and retries once with slower typing.
  • This reduces IME-related character swaps on long strings (e.g. emails and IDs).
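
The verify-and-retry behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the adb backend's implementation: the TextField interface, fillWithVerify, and the 50 ms per-character delay are all hypothetical.

```typescript
// Hypothetical abstraction over a focused text field on the device.
interface TextField {
  clear(): Promise<void>;
  typeText(text: string, perCharDelayMs: number): Promise<void>;
  readValue(): Promise<string>;
}

// Clear, type, verify; on mismatch clear and retry exactly once with slower typing.
async function fillWithVerify(field: TextField, text: string): Promise<void> {
  await field.clear();
  await field.typeText(text, 0);
  if ((await field.readValue()) === text) return;

  // Single retry with an assumed slower typing speed.
  await field.clear();
  await field.typeText(text, 50);
  const finalValue = await field.readValue();
  if (finalValue !== text) {
    throw new Error(`fill verification failed: expected "${text}", got "${finalValue}"`);
  }
}
```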

Settings helpers:

  • settings wifi on|off
  • settings airplane on|off
  • settings location on|off (iOS uses per-app permission for the current session app)

Note: iOS supports these only on simulators. iOS wifi/airplane toggles status bar indicators, not actual network state. Airplane off clears status bar overrides.

App state:

  • On Android, appstate shows the foreground app/activity.
  • On iOS, appstate returns the currently tracked session app (source: session) and requires an active session on the selected device.
  • apps lists default/system apps as well as user-installed ones (use --user-installed to filter).

Debug

  • agent-device trace start
  • agent-device trace stop ./trace.log
  • The trace log includes snapshot logs and XCTest runner logs for the session.
  • Built-in retries cover transient runner connection failures and Android UI dumps.
  • For snapshot issues (missing elements), compare with --raw flag for unaltered output and scope with -s "<label>".
  • If startup fails with stale metadata hints, remove stale ~/.agent-device/daemon.json / ~/.agent-device/daemon.lock and retry.

Boot diagnostics:

  • Boot failures include normalized reason codes in error.details.reason (JSON mode) and verbose logs.
  • Reason codes: IOS_BOOT_TIMEOUT, IOS_RUNNER_CONNECT_TIMEOUT, ANDROID_BOOT_TIMEOUT, ADB_TRANSPORT_UNAVAILABLE, CI_RESOURCE_STARVATION_SUSPECTED, BOOT_COMMAND_FAILED, UNKNOWN.
  • Android boot waits fail fast for permission/tooling issues and do not always collapse into timeout errors.
  • Use agent-device boot --platform ios|android when starting a new session only if open cannot find/connect to an available target.
  • Set AGENT_DEVICE_RETRY_LOGS=1 to print structured retry telemetry (attempt, phase, delay, elapsed/remaining deadline, reason).
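
Callers that consume --json output may want to branch on the reason code. The sketch below reads error.details.reason from a parsed payload and narrows it to the codes listed above. extractBootReason is a hypothetical helper, and nothing about the JSON envelope beyond the error.details.reason path is assumed.

```typescript
const BOOT_REASONS = [
  "IOS_BOOT_TIMEOUT",
  "IOS_RUNNER_CONNECT_TIMEOUT",
  "ANDROID_BOOT_TIMEOUT",
  "ADB_TRANSPORT_UNAVAILABLE",
  "CI_RESOURCE_STARVATION_SUSPECTED",
  "BOOT_COMMAND_FAILED",
  "UNKNOWN",
] as const;

type BootReason = (typeof BOOT_REASONS)[number];

// Read error.details.reason defensively; anything unrecognized maps to "UNKNOWN".
function extractBootReason(payload: unknown): BootReason {
  const reason = (payload as { error?: { details?: { reason?: unknown } } } | null)
    ?.error?.details?.reason;
  return (BOOT_REASONS as readonly unknown[]).includes(reason)
    ? (reason as BootReason)
    : "UNKNOWN";
}
```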

App resolution

  • Bundle/package identifiers are accepted directly (e.g., com.apple.Preferences).
  • Human-readable names are resolved when possible (e.g., Settings).
  • Built-in aliases include Settings for both platforms.

iOS notes

  • Core runner commands: snapshot, wait, click, fill, get, is, find, press, long-press, focus, type, scroll, scrollintoview, back, home, app-switcher.
  • Simulator-only commands: alert, pinch, record, reinstall, settings.
  • iOS device runs require valid signing/provisioning (Automatic Signing recommended). Optional overrides: AGENT_DEVICE_IOS_TEAM_ID, AGENT_DEVICE_IOS_SIGNING_IDENTITY, AGENT_DEVICE_IOS_PROVISIONING_PROFILE.

Testing

pnpm test

Useful local checks:

pnpm typecheck
pnpm test:unit
pnpm test:smoke

Build

pnpm build

Environment selectors:

  • ANDROID_DEVICE=Pixel_9_Pro_XL or ANDROID_SERIAL=emulator-5554
  • IOS_DEVICE="iPhone 17 Pro" or IOS_UDID=<udid>
  • AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS=<ms> to adjust iOS simulator boot timeout (default: 120000, minimum: 5000).
  • AGENT_DEVICE_DAEMON_TIMEOUT_MS=<ms> to override daemon request timeout (default 90000). Increase for slow physical-device setup (for example 120000).
  • AGENT_DEVICE_IOS_TEAM_ID=<team-id> optional Team ID override for iOS device runner signing.
  • AGENT_DEVICE_IOS_SIGNING_IDENTITY=<identity> optional signing identity override.
  • AGENT_DEVICE_IOS_PROVISIONING_PROFILE=<profile> optional provisioning profile specifier for iOS device runner signing.
  • AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH=<path> optional override for iOS runner derived data root. By default, simulator uses ~/.agent-device/ios-runner/derived and physical device uses ~/.agent-device/ios-runner/derived/device. If you set this override, use separate paths per kind to avoid simulator/device artifact collisions.
  • AGENT_DEVICE_IOS_CLEAN_DERIVED=1 rebuild iOS runner artifacts from scratch. When AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH is set, cleanup is blocked by default; set AGENT_DEVICE_IOS_ALLOW_OVERRIDE_DERIVED_CLEAN=1 only for trusted custom paths.

Test screenshots are written to:

  • test/screenshots/android-settings.png
  • test/screenshots/ios-settings.png

Contributing

See CONTRIBUTING.md.

Made at Callstack

agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.