@linamy85 (Collaborator) commented Jan 26, 2026

Change Summary

Adds a dtype prefix to the aggregated result line in the benchmark log for easier result inspection, as requested in another PR.

This affects the following benchmarks:

  • gemm_multiple_run
  • inference_add
  • inference_rmsnorm
  • inference_silu_mul
  • inference_sigmoid

Sample log

The aggregated result line is now prefixed with the dtype, as follows:

[float4_e2m1fn] Total floating-point ops: 9895604649984, Step Time (median): 16.36, Throughput (median): 605.00 TFLOP / second / device, TotalThroughput (median): 4840.01 TFLOP / second, MFU: 52.45%
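
For anyone collecting these aggregated lines from a log, here is a minimal parsing sketch. The regular expression is inferred only from the single sample line above, and `parse_aggregated_line` and the dictionary keys are hypothetical names, not part of the benchmark code. If the real format varies (extra fields, different units), the pattern would need adjusting.

```python
import re

# Pattern inferred from the one sample line in this PR (an assumption,
# not a format guaranteed by the benchmark code).
AGGREGATED_LINE_RE = re.compile(
    r"^\[(?P<dtype>[^\]]+)\]\s+"
    r"Total floating-point ops:\s*(?P<total_flops>\d+),\s*"
    r"Step Time \(median\):\s*(?P<step_time>[\d.]+),\s*"
    r"Throughput \(median\):\s*(?P<throughput>[\d.]+) TFLOP / second / device,\s*"
    r"TotalThroughput \(median\):\s*(?P<total_throughput>[\d.]+) TFLOP / second,\s*"
    r"MFU:\s*(?P<mfu>[\d.]+)%"
)


def parse_aggregated_line(line):
    """Return the fields of one dtype-prefixed aggregated line, or None if it does not match."""
    match = AGGREGATED_LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "dtype": match.group("dtype"),
        "total_flops": int(match.group("total_flops")),
        "step_time_median": float(match.group("step_time")),
        "throughput_tflops_per_device": float(match.group("throughput")),
        "total_throughput_tflops": float(match.group("total_throughput")),
        "mfu_percent": float(match.group("mfu")),
    }


sample = (
    "[float4_e2m1fn] Total floating-point ops: 9895604649984, "
    "Step Time (median): 16.36, "
    "Throughput (median): 605.00 TFLOP / second / device, "
    "TotalThroughput (median): 4840.01 TFLOP / second, MFU: 52.45%"
)
result = parse_aggregated_line(sample)
```

Keying the parsed records by `dtype` makes it straightforward to group results when one log contains runs for several dtypes.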

The benchmark often appears to hang because stdout is not flushed until the
benchmark completes.
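
One common way to make output appear while a run is still in progress is to flush on every write. The sketch below illustrates that general technique and is not necessarily what this PR changes. `log_result` is a hypothetical helper name.

```python
import io
import sys


def log_result(message, stream=None):
    """Print one line and flush immediately so it is visible before the run ends."""
    out = sys.stdout if stream is None else stream
    # flush=True forces the buffered text out right after this write.
    print(message, file=out, flush=True)


# Usage with an in-memory stream standing in for stdout.
buffer = io.StringIO()
log_result("[float4_e2m1fn] step finished", stream=buffer)
```

Running the interpreter with `python -u` or setting `PYTHONUNBUFFERED=1` is another option that avoids touching the code.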
@linamy85 linamy85 requested a review from junjieqian January 26, 2026 03:05
@junjieqian (Collaborator)

Thank you very much for the quick change! This is what is needed, as users sometimes want to collect the aggregated results from the log, while the CSV files provide detailed information.

@linamy85 linamy85 merged commit 5564ea5 into AI-Hypercomputer:main Jan 26, 2026
2 checks passed