@linamy85
Collaborator

This change propagates the dtype from the benchmark function arguments to the final reporting results. It affects benchmarks that were previously missing this metadata, including:

  • gemm_multiple_run
  • inference_add
  • inference_rmsnorm
  • inference_silu_mul
  • inference_sigmoid
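A minimal sketch of the kind of propagation described above, assuming a runner that calls each benchmark with keyword arguments and merges the returned metrics into one report row. The names `dtype_name`, `run_with_metadata`, and `fake_gemm` are hypothetical and not taken from the repository, and `float4_e2m1fn` here is a stand-in class rather than the JAX type.

```python
from typing import Any, Callable, Dict


def dtype_name(dtype: Any) -> str:
    """Return a short printable name for a dtype-like object."""
    # Classes expose __name__; dtype instances usually expose .name.
    for attr in ("__name__", "name"):
        value = getattr(dtype, attr, None)
        if isinstance(value, str):
            return value
    return str(dtype)


def run_with_metadata(
    benchmark_fn: Callable[..., Dict[str, Any]], params: Dict[str, Any]
) -> Dict[str, Any]:
    """Run a benchmark and copy its dtype argument into the report row."""
    row = dict(benchmark_fn(**params))
    if "dtype" in params:
        # setdefault keeps any dtype the benchmark already reported itself.
        row.setdefault("dtype", dtype_name(params["dtype"]))
    return row


class float4_e2m1fn:  # stand-in for a real dtype class (assumption)
    pass


def fake_gemm(m: int, k: int, n: int, dtype: Any) -> Dict[str, Any]:
    # Hypothetical benchmark that does not report its own dtype.
    return {"total_flops": 2 * m * k * n}


row = run_with_metadata(
    fake_gemm, {"m": 16384, "k": 18432, "n": 16384, "dtype": float4_e2m1fn}
)
print(row)
```

Doing the copy in the runner rather than in each benchmark would cover every benchmark that accepts a `dtype` argument, which is one plausible way to fix several benchmarks at once; the actual change may be structured differently.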

@linamy85 force-pushed the fix/propagate-dtype branch from 7f0fbb4 to 7df8f7d on January 23, 2026 07:20
Collaborator

@chishuen left a comment

LGTM

@linamy85 merged commit 2c847e8 into AI-Hypercomputer:main on Jan 23, 2026
2 checks passed
@linamy85 deleted the fix/propagate-dtype branch on January 23, 2026 14:36
@junjieqian
Collaborator

Hi @linamy85, it seems the final report still does not include the data type for gemm_multiple_run. Could you check again? Thank you.

@linamy85
Collaborator Author

Hi @junjieqian, I can see the dtype in the TSV output when running the following Kubernetes config. Could you share the YAML that you were using?

apiVersion: v1
kind: Pod
metadata:
  name: microbenchmark
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu7x
    cloud.google.com/gke-tpu-topology: 2x2x1
  containers:
  - name: tpu-job
    image: python:3.12
    ports:
    - containerPort: 8431
    securityContext:
      privileged: false
    command:
    - bash
    - -c
    - |
      set -ex

      git clone https://github.com/AI-Hypercomputer/accelerator-microbenchmarks.git
      cd accelerator-microbenchmarks
      pip install -r requirements.txt

      python3 Ironwood/src/run_benchmark.py --config=Ironwood/configs/training/gemm_multiple_run.yaml

      sleep 36000

    resources:
      requests:
        google.com/tpu: 4
      limits:
        google.com/tpu: 4
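One way to check whether a generated TSV report carries the dtype is sketched below. It assumes the report has a header row and a column named `dtype`; that column name, the other column names, and the helper `dtypes_in_report` are hypothetical and not confirmed by this thread. The sample row reuses the shape and dtype from the run discussed here.

```python
import csv
import io
from typing import List


def dtypes_in_report(tsv_text: str, column: str = "dtype") -> List[str]:
    """Return the values of `column` from a tab-separated report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # fieldnames is read from the header row; it is None for empty input.
    if column not in (reader.fieldnames or []):
        raise KeyError(f"report has no {column!r} column")
    return [row[column] for row in reader]


# Illustrative report text; the column layout is an assumption.
sample = "m\tk\tn\tdtype\n16384\t18432\t16384\tfloat4_e2m1fn\n"
print(dtypes_in_report(sample))
```

For a file on disk, the same function can be fed the result of `pathlib.Path(path).read_text()`.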

@junjieqian
Collaborator

Hi @linamy85, thanks for checking this! We did not actually look at the CSV file, only at the stdout logs, which do not include the dtype.
Would you mind adding it to the print log as well?
Thanks

@linamy85
Collaborator Author

@junjieqian To make sure we're aligned, would a [float4_e2m1fn] prefix on the following log line help?

[float4_e2m1fn] Total floating-point ops: 9895604649984, Step Time (median): 16.36, Throughput (median): 605.00 TFLOP / second / device, TotalThroughput (median): 4840.01 TFLOP / second, MFU: 52.45%

For result gathering, I feel it would be easier if we could rely on the final TSV file, though I know that's quite difficult at the moment due to the lack of GCS support.

log for single run

==============================Starting benchmark 'gemm_multiple_run'==============================

Running benchmark: gemm_multiple_run with params: {'m': 16384, 'k': 18432, 'n': 16384, 'num_runs': 100, 'dtype': <class 'jax.numpy.float4_e2m1fn'>, 'trace_dir': '../microbenchmarks/gemm_multiple_run_fp4/benchmark_0'}
Running gemm_multiple_run benchmark 100
[gemm_multiple_run] Running iteration 0 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 10 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 20 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 30 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 40 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 50 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 60 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 70 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 80 of 100 with float4_e2m1fn_16384x16384x18432...
[gemm_multiple_run] Running iteration 90 of 100 with float4_e2m1fn_16384x16384x18432...
Unique PIDs: {3, 4, 37, 38, 20, 21, 54, 55}
Collected 100 events from trace for pid 3.
[16.357372149, 16.356966387, 16.356578631, 16.356494598, 16.356966387, 16.356685474, 16.356817527, 16.356793517, 16.356641056, 16.356123649, 16.356306122, 16.356777911, 16.356222089, 16.356584634, 16.35657503, 16.356619448, 16.356470588, 16.356492197, 16.356786315, 16.356092437, 16.356364946, 16.356726291, 16.356442977, 16.356752701, 16.356268908, 16.35607443, 16.356295318, 16.356204082, 16.356129652, 16.355857143, 16.356102041, 16.356626651, 16.355811525, 16.356255702, 16.356326531, 16.356422569, 16.35630012, 16.356361345, 16.356201681, 16.356086435, 16.356452581, 16.357060024, 16.356105642, 16.356382953, 16.35652461, 16.356328932, 16.356165666, 16.355831933, 16.356129652, 16.356297719, 16.356169268, 16.356297719, 16.356229292, 16.35630012, 16.356343337, 16.355992797, 16.356163265, 16.355931573, 16.356142857, 16.356313325, 16.356333733, 16.356342137, 16.356271309, 16.356060024, 16.35622569, 16.356398559, 16.356909964, 16.356506603, 16.356357743, 16.356567827, 16.35609964, 16.356482593, 16.356405762, 16.356771909, 16.356645858, 16.356792317, 16.35629892, 16.356187275, 16.356336134, 16.356256903, 16.356228091, 16.356009604, 16.356192077, 16.356626651, 16.356704682, 16.356114046, 16.356420168, 16.356164466, 16.356243697, 16.35644898, 16.356571429, 16.356236495, 16.356433373, 16.356410564, 16.356590636, 16.356092437, 16.356990396, 16.356142857, 16.356846339, 16.356685474]
The XLA dump is stored in ../microbenchmarks/gemm_multiple_run_fp4/hlo_graphs
Could not find replica_groups in ../microbenchmarks/gemm_multiple_run_fp4/hlo_graphs/gemm_multiple_run_m_16384_k_18432_n_16384_num_runs_100_dtype_float4.after_optimizations.txt.
[float4_e2m1fn] Total floating-point ops: 9895604649984, Step Time (median): 16.36, Throughput (median): 605.00 TFLOP / second / device, TotalThroughput (median): 4840.01 TFLOP / second, MFU: 52.45%
Writing metrics to JSONL file: ../microbenchmarks/gemm_multiple_run_fp4/metrics_report.jsonl
Metrics written to CSV at ../microbenchmarks/gemm_multiple_run_fp4/t_gemm_multiple_run_U3GEQ267RK.tsv.
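The summary line in the log above can be sanity-checked with a little arithmetic. The sketch below assumes the usual 2·m·k·n operation count for a GEMM and assumes the reported step time is in milliseconds; neither is stated in the log, but both are consistent with its numbers. The helper names are hypothetical.

```python
def gemm_flops(m: int, k: int, n: int) -> int:
    """Floating-point ops for an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * k * n


def tflops_per_second(total_flops: int, step_time_ms: float) -> float:
    """Throughput in TFLOP/s, assuming the step time is given in milliseconds."""
    return total_flops / (step_time_ms * 1e-3) / 1e12


total = gemm_flops(16384, 18432, 16384)
per_device = tflops_per_second(total, 16.36)  # rounded median from the log
print(total)
print(round(per_device, 1))
```

With the rounded 16.36 step time this lands within about 0.2 TFLOP/s of the logged 605.00; the log presumably uses the unrounded median. The logged total throughput is also about eight times the per-device figure, which matches the eight unique PIDs reported in the trace.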

linamy85 added a commit that referenced this pull request Jan 26, 2026
As requested in [another PR](#84 (comment)) for easier result inspection.