@linamy85 linamy85 commented Jan 21, 2026

  • Implement simple baseline using jax.device_put and jax.device_get.
  • Ensure 2D data layout (rows, 128) for memory alignment.
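A minimal sketch of what such a baseline could look like, not the code merged in this PR. It assumes "MB" means MiB, a float32 payload, and per-iteration wall-clock timing; the helper names `make_host_array` and `time_transfers` are hypothetical. Only `jax.device_put`, `jax.device_get`, and the `(rows, 128)` layout come from the description above.

```python
import time

import jax
import numpy as np

LANES = 128  # minor dimension from the PR description: (rows, 128)


def make_host_array(data_size_mb: int, dtype=np.float32) -> np.ndarray:
    """Build a 2D host array of shape (rows, 128) holding data_size_mb MiB (assumed unit)."""
    n_elems = data_size_mb * 1024 * 1024 // np.dtype(dtype).itemsize
    rows = n_elems // LANES
    return np.ones((rows, LANES), dtype=dtype)


def time_transfers(data_size_mb: int, num_runs: int = 20):
    """Time host-to-device and device-to-host copies; returns per-run seconds."""
    host = make_host_array(data_size_mb)
    device = jax.devices()[0]  # first visible device; the real benchmark may use several
    h2d, d2h = [], []
    back = None
    for _ in range(num_runs):
        t0 = time.perf_counter()
        on_device = jax.device_put(host, device)
        on_device.block_until_ready()  # wait for the asynchronous transfer to finish
        t1 = time.perf_counter()
        back = jax.device_get(on_device)  # returns a NumPy array on the host
        t2 = time.perf_counter()
        h2d.append(t1 - t0)
        d2h.append(t2 - t1)
    return h2d, d2h, back


if __name__ == "__main__":
    h2d, d2h, _ = time_transfers(data_size_mb=1, num_runs=5)
    print(f"H2D median: {np.median(h2d) * 1e3:.3f} ms")
    print(f"D2H median: {np.median(d2h) * 1e3:.3f} ms")
```

Calling `block_until_ready()` before stopping the H2D timer matters because `jax.device_put` can return before the copy completes. On a CPU-only backend the numbers are not meaningful as transfer bandwidth, since host and device share memory.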

To execute:

kl apply -f Ironwood/guides/host_device/tpu7x-host-device-benchmark.yaml

Sample output:

+ export TPU_VISIBLE_CHIPS=0
+ TPU_VISIBLE_CHIPS=0
+ bash ./Ironwood/scripts/run_host_device_benchmark.sh --config Ironwood/configs/host_device/host_device.yaml
--- Starting Host-Device Transfer Benchmark (H2D/D2H) ---
********************************************************
WARNING: This benchmark is currently a WORK IN PROGRESS
********************************************************

Configuration:
    Interleaved: false

--- Running Config: Ironwood/configs/host_device/host_device.yaml ---

==============================Starting benchmark 'host_device'==============================

Running benchmark: host_device with params: {'data_size_mb': 1, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_0'}
Benchmarking (Simple) Transfer with Data Size: 1 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 16, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_1'}
Benchmarking (Simple) Transfer with Data Size: 16 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 128, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_2'}
Benchmarking (Simple) Transfer with Data Size: 128 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 256, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_3'}
Benchmarking (Simple) Transfer with Data Size: 256 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 512, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_4'}
Benchmarking (Simple) Transfer with Data Size: 512 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 1024, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_5'}
Benchmarking (Simple) Transfer with Data Size: 1024 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 2048, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_6'}
Benchmarking (Simple) Transfer with Data Size: 2048 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 4096, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_7'}
Benchmarking (Simple) Transfer with Data Size: 4096 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 8192, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_8'}
Benchmarking (Simple) Transfer with Data Size: 8192 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 16384, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_9'}
Benchmarking (Simple) Transfer with Data Size: 16384 MB on 2 devices for 20 iterations
Running benchmark: host_device with params: {'data_size_mb': 32768, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/trace/benchmark_10'}
Benchmarking (Simple) Transfer with Data Size: 32768 MB on 2 devices for 20 iterations
Metrics written to CSV at ../microbenchmarks/host_device/t_host_device_TDZKVGG9P2.tsv.
--- Finished Config: Ironwood/configs/host_device/host_device.yaml ---

@linamy85 linamy85 force-pushed the feature/simple-host-device-baseline branch 3 times, most recently from fca5cc3 to ce217c7, January 21, 2026 09:59
chishuen previously approved these changes Jan 21, 2026

@chishuen chishuen left a comment

Discussed offline. Overall LGTM. Left some minor comments.

1. Use 2D dimension to match memory layout
2. Use default device_get and device_put
@linamy85 linamy85 force-pushed the feature/simple-host-device-baseline branch from ce217c7 to b286e69, January 21, 2026 12:10
@chishuen chishuen self-requested a review January 21, 2026 16:42
@chishuen chishuen merged commit e3cf453 into AI-Hypercomputer:main Jan 21, 2026
2 checks passed