Add per-request cost support (obj_cost) and cost-miss reporting #308

Closed
1a1a11a wants to merge 1 commit into develop from codex/add-cost-support-with-object-size

@1a1a11a 1a1a11a commented Mar 15, 2026

Motivation

  • Provide first-class per-request "cost" in traces so simulations can report cost-weighted miss ratios in addition to object counts and bytes.
  • Allow traces to carry a dedicated cost column (CSV/binary) without breaking existing traces that only supply object size.

Description

  • Added obj_cost to request_t with default initialization to 1 and included cost in print_request debug output.
  • Extended reader_init_param_t with obj_cost_field and added parsing of obj-cost-col / cost-col in parse_reader_params to configure a cost column from CSV parameters.
  • CSV and binary readers now parse an optional cost field (obj_cost) when present and wire the cost-field index/offset through readerInternal structures; when not configured the reader sets req->obj_cost = req->obj_size as a backward-compatible fallback.
  • read_one_req initializes obj_cost and ensures ignore_obj_size behavior still applies; verbose logging now prints size and cost for each read request.
  • cachesim simulation logic collects req_cost and miss_cost, computes cost_miss_ratio, and prints it alongside request miss ratio and byte miss ratio in the output.
  • Updated README.md examples and plotting script invocation to demonstrate obj-cost-col usage.
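
The cost accounting described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `sketch_request_t`, `sketch_stats_t`, `effective_cost`, `record_request`, and `ratio` are hypothetical names, and only the fields `obj_size` / `obj_cost` and the counters `req_cost` / `miss_cost` are taken from the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for request_t; only the two fields used here. */
typedef struct {
  int64_t obj_size;
  int64_t obj_cost;
} sketch_request_t;

/* Hypothetical running totals, named after the counters in the description. */
typedef struct {
  uint64_t req_cnt, miss_cnt;   /* request counts */
  uint64_t req_byte, miss_byte; /* byte totals */
  uint64_t req_cost, miss_cost; /* cost totals */
} sketch_stats_t;

/* Fallback from the description: when no cost column is configured,
 * the cost of a request is its object size. */
static int64_t effective_cost(const sketch_request_t *req, bool has_cost_col) {
  return has_cost_col ? req->obj_cost : req->obj_size;
}

/* Account for one request; sizes and costs are assumed non-negative. */
static void record_request(sketch_stats_t *s, const sketch_request_t *req,
                           bool has_cost_col, bool is_hit) {
  int64_t cost = effective_cost(req, has_cost_col);
  s->req_cnt += 1;
  s->req_byte += (uint64_t)req->obj_size;
  s->req_cost += (uint64_t)cost;
  if (!is_hit) {
    s->miss_cnt += 1;
    s->miss_byte += (uint64_t)req->obj_size;
    s->miss_cost += (uint64_t)cost;
  }
}

/* miss / total, returning 0 for an empty trace instead of dividing by zero. */
static double ratio(uint64_t miss, uint64_t total) {
  return total == 0 ? 0.0 : (double)miss / (double)total;
}
```

With these definitions, the cost miss ratio would be `ratio(s.miss_cost, s.req_cost)`, computed the same way as the request and byte miss ratios.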

Testing

  • Attempted an automated CMake configuration (cmake -S . -B build) which failed due to a missing system dependency glib-2.0, so no full build/test ran in this environment (failure is environmental, not code-related).
  • No unit or integration tests were executed in the workspace; changes were validated via static edits and local file diffs and committed.

Codex Task

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a first-class 'request cost' (obj_cost) to the cache simulation framework, so that simulations can compute and report cost-weighted miss ratios alongside the existing object-count and byte-based metrics. Trace readers can parse the new cost field from CSV and binary formats, the simulation engine accumulates and reports it, and existing traces without a cost column remain compatible.

Highlights

  • Per-request Cost Support: Introduced a new obj_cost field to the request_t structure, allowing each request to carry an associated cost, defaulting to 1.
  • Configurable Cost Column: Added an obj_cost_field to reader_init_param_t and parsing of the obj-cost-col / cost-col parameters, so that both CSV and binary trace readers can be pointed at a dedicated cost column.
  • Cost-Weighted Miss Ratio Reporting: Modified the cachesim simulation logic to collect total request cost (req_cost) and total miss cost (miss_cost), and to compute and report the cost_miss_ratio alongside existing miss ratios.
  • Backward Compatibility and Logging: Ensured backward compatibility by setting req->obj_cost = req->obj_size as a fallback if no obj_cost_field is configured, and updated verbose logging to include the obj_cost for each read request.
  • Documentation and Examples: Updated README.md examples and the plotting script invocation to demonstrate how to use the new obj-cost-col parameter.
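
To make the obj-cost-col / cost-col parameter concrete, here is a rough sketch of how a single `key=value` token could be recognised. It is not the parsing code from this PR: `parse_cost_col` is a hypothetical helper, and treating the column index as a positive integer is an assumption based on the README examples (`obj-cost-col=6`, `obj-cost-col=4`).

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: accept "obj-cost-col=N" or "cost-col=N" and store N.
 * Returns false for any other key or for a value that is not a positive int. */
static bool parse_cost_col(const char *param, int *field_idx) {
  const char *eq = strchr(param, '=');
  if (eq == NULL) return false;

  size_t key_len = (size_t)(eq - param);
  bool key_ok =
      (key_len == strlen("obj-cost-col") &&
       strncmp(param, "obj-cost-col", key_len) == 0) ||
      (key_len == strlen("cost-col") &&
       strncmp(param, "cost-col", key_len) == 0);
  if (!key_ok) return false;

  char *end = NULL;
  errno = 0;
  long v = strtol(eq + 1, &end, 10);
  /* Reject empty values, trailing garbage, overflow, and non-positive indices. */
  if (end == eq + 1 || *end != '\0' || errno == ERANGE) return false;
  if (v <= 0 || v > INT_MAX) return false;

  *field_idx = (int)v;
  return true;
}
```

A caller would try this on each comma-separated token of the reader parameter string and leave the cost field unset when no token matches, which is the case where the size-based fallback applies.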
Changelog
  • README.md
    • Updated example cachesim command to include obj-cost-col=6.
    • Modified plot_mrc_size.py command to include obj-cost-col=4.
  • libCacheSim/bin/cachesim/sim.c
    • Added req_cost and miss_cost variables to track total and missed request costs.
    • Accumulated req_cost and miss_cost during the simulation loop.
    • Calculated and printed cost_miss_ratio in the final simulation output.
  • libCacheSim/bin/cli_reader_utils.c
    • Implemented parsing logic for obj-cost-col and cost-col parameters from CLI arguments.
  • libCacheSim/include/libCacheSim/reader.h
    • Added obj_cost_field to the reader_init_param_t structure to configure the cost column index.
  • libCacheSim/include/libCacheSim/request.h
    • Introduced obj_cost (int64_t) to the request_t structure.
    • Initialized obj_cost to 1 in the new_request function.
    • Included obj_cost in the print_request debug output.
  • libCacheSim/traceReader/generalReader/binary.c
    • Implemented setup for obj_cost_field_idx, obj_cost_format, and obj_cost_offset for binary traces.
    • Added debug output for binary reader obj_cost_field parameters.
    • Included logic to read obj_cost from binary trace files based on configured parameters.
  • libCacheSim/traceReader/generalReader/csv.c
    • Added logic to parse obj_cost from CSV fields during trace processing.
    • Mapped obj_cost_field from reader_init_param_t to internal CSV reader parameters.
  • libCacheSim/traceReader/reader.c
    • Initialized req->obj_cost to 1 for new requests.
    • Implemented a fallback mechanism to set req->obj_cost = req->obj_size if obj_cost_field is not specified.
    • Updated verbose logging to include obj_cost when reading a request.
  • libCacheSim/traceReader/readerInternal.h
    • Extended csvReaderInternalParams with obj_cost_field_idx.
    • Added obj_cost_offset, obj_cost_field_idx, and obj_cost_format to binaryReaderInternalParams.
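
For the binary reader, the combination of an offset and a format for the cost field could work roughly as sketched below. This is an illustration only: `read_cost_field` is a hypothetical function, the format characters `'I'` (unsigned 32-bit) and `'Q'` (unsigned 64-bit) are assumed for the example, and values are read in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the cost field from one fixed-size binary record.
 * Returns 0 on success, -1 if the format is unknown or the field would run
 * past the end of the record. */
static int read_cost_field(const uint8_t *record, size_t record_size,
                           size_t offset, char format, int64_t *cost) {
  if (offset > record_size) return -1;
  size_t remaining = record_size - offset;

  switch (format) {
    case 'I': { /* assumed: unsigned 32-bit integer */
      uint32_t v;
      if (remaining < sizeof(v)) return -1;
      memcpy(&v, record + offset, sizeof(v)); /* memcpy avoids unaligned loads */
      *cost = (int64_t)v;
      return 0;
    }
    case 'Q': { /* assumed: unsigned 64-bit integer */
      uint64_t v;
      if (remaining < sizeof(v)) return -1;
      memcpy(&v, record + offset, sizeof(v));
      *cost = (int64_t)v;
      return 0;
    }
    default:
      return -1;
  }
}
```

Checking the remaining length before each read keeps a misconfigured offset from reading outside the record.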
Activity
  • Attempted CMake configuration (cmake -S . -B build) which failed due to a missing system dependency glib-2.0.
  • No unit or integration tests were executed in the workspace.
  • Changes were validated via static edits and local file diffs.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for per-request cost (obj_cost) and cost-based miss ratio reporting. The changes are comprehensive, touching data structures, trace readers for both CSV and binary formats, and the main simulation logic. The implementation correctly adds the obj_cost field and integrates it into the simulation and reporting.

However, the pull request description states that no tests were run due to an environmental issue. Given the breadth of the changes, this introduces a significant risk of regressions or bugs. It is highly recommended to resolve the environment issue and run the existing test suite, as well as add new tests for the cost functionality, before merging.

I have also identified a couple of areas with code duplication that could be refactored to improve maintainability.

Comment on lines 90 to 107
   if (!ignore_obj_size) {
     snprintf(output_str, 1024,
-             "%s %s cache size %8s, %16lu req, miss ratio %.4lf, throughput "
+             "%s %s cache size %8s, %16lu req, miss ratio %.4lf, byte miss ratio %.4lf, cost miss ratio %.4lf, throughput "
              "%.2lf MQPS\n",
              reader->trace_path, detailed_cache_name, size_str,
              (unsigned long)req_cnt, (double)miss_cnt / (double)req_cnt,
+             byte_miss_ratio, cost_miss_ratio,
              (double)req_cnt / 1000000.0 / runtime);
   } else {
     snprintf(output_str, 1024,
-             "%s %s cache size %8lld, %16lu req, miss ratio %.4lf, throughput "
+             "%s %s cache size %8lld, %16lu req, miss ratio %.4lf, byte miss ratio %.4lf, cost miss ratio %.4lf, throughput "
              "%.2lf MQPS\n",
              reader->trace_path, detailed_cache_name,
              (long long)cache->cache_size, (unsigned long)req_cnt,
              (double)miss_cnt / (double)req_cnt,
+             byte_miss_ratio, cost_miss_ratio,
              (double)req_cnt / 1000000.0 / runtime);
   }

medium

The snprintf calls inside this if/else block are largely duplicated. This can be refactored to reduce code duplication and improve maintainability by building the output string in parts.

  int n;
  if (!ignore_obj_size) {
    n = snprintf(output_str, 1024, "%s %s cache size %8s",
             reader->trace_path, detailed_cache_name, size_str);
  } else {
    n = snprintf(output_str, 1024, "%s %s cache size %8lld",
             reader->trace_path, detailed_cache_name,
             (long long)cache->cache_size);
  }
  snprintf(output_str + n, 1024 - n,
           ", %16lu req, miss ratio %.4lf, byte miss ratio %.4lf, cost miss ratio %.4lf, throughput "
           "%.2lf MQPS\n",
           (unsigned long)req_cnt, (double)miss_cnt / (double)req_cnt,
           byte_miss_ratio, cost_miss_ratio,
           (double)req_cnt / 1000000.0 / runtime);
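
One caveat when building a string in parts: snprintf returns the length the output would have had, so after a truncated first call `n` can exceed the buffer size, and `1024 - n` then becomes a huge value once converted to size_t. A bounded append helper avoids this. The sketch below is one possible shape; `append_fmt` is a hypothetical name, not something in the repository.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append formatted text at buf[*pos] without ever
 * writing past buf[cap - 1]. Returns 0 on success, -1 on error or truncation.
 * On truncation the buffer stays NUL-terminated and *pos is set to cap - 1. */
static int append_fmt(char *buf, size_t cap, size_t *pos, const char *fmt, ...) {
  if (*pos >= cap) return -1;

  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf + *pos, cap - *pos, fmt, ap);
  va_end(ap);

  if (n < 0) return -1; /* encoding error */
  if ((size_t)n >= cap - *pos) {
    *pos = cap - 1; /* output was cut short */
    return -1;
  }
  *pos += (size_t)n;
  return 0;
}
```

The two-branch prefix and the shared suffix from the suggestion above would then become two `append_fmt` calls on the same buffer and position.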

Comment on lines +240 to +244
  } else if (csv_params->curr_field_idx == csv_params->obj_cost_field_idx) {
    req->obj_cost = (int64_t)strtoll((char *)s, &end, 0);
    if (req->obj_cost == 0 && end == s) {
      WARN("csvReader obj_cost is not a number: \"%s\"\n", (char *)s);
    }

medium

This logic for parsing a 64-bit integer from a CSV field is duplicated from the obj_size parsing logic on lines 235-239. To improve maintainability and reduce code duplication, consider extracting this logic into a static helper function that can be called for both obj_size and obj_cost.
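
A shared helper along these lines might look like the sketch below. It is only an illustration of the refactoring idea: `parse_int64_field` is a hypothetical name, and it keeps base 0 for strtoll to match the snippet under review.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a CSV field as a 64-bit integer.
 * Returns false (leaving *out untouched) when no digits were consumed,
 * so the caller can decide how to warn. */
static bool parse_int64_field(const char *s, int64_t *out) {
  char *end = NULL;
  long long v = strtoll(s, &end, 0); /* base 0: decimal, 0x hex, or 0 octal */
  if (end == s) return false;
  *out = (int64_t)v;
  return true;
}
```

The obj_size and obj_cost branches could then each call the helper and emit their own WARN message when it returns false.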

@1a1a11a 1a1a11a closed this Mar 15, 2026