
Conversation


@FBumann FBumann commented Jan 16, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

FBumann and others added 2 commits January 16, 2026 10:33
Add safe isel wrappers for dimension-independent operations

- Add _scalar_safe_isel_drop() to modeling.py for selecting from
  potentially reduced arrays (handles both scalar and array cases)
- Update Storage validation to use _scalar_safe_isel for
  relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
  reduced arrays properly with dimension checks

These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
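A minimal sketch of the scalar-safe selection pattern this commit describes, assuming an xarray-based helper (the actual `_scalar_safe_isel_drop` in modeling.py may have a different signature):

```python
import xarray as xr

def _scalar_safe_isel_drop(da: xr.DataArray, dim: str, index: int) -> xr.DataArray:
    """Select `index` along `dim` and drop the coordinate, but pass the
    array through unchanged if `dim` is absent (scalar/reduced data)."""
    if dim in da.dims:
        return da.isel({dim: index}, drop=True)
    return da

# Both a time series and an already-reduced scalar work:
ts = xr.DataArray([0.1, 0.2, 0.3], dims=['time'])
scalar = xr.DataArray(0.5)
assert float(_scalar_safe_isel_drop(ts, 'time', 0)) == 0.1
assert float(_scalar_safe_isel_drop(scalar, 'time', 0)) == 0.5
```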
Remove pre-broadcasting from DataConverter - data stays minimal

Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.

Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
  _validate_dataarray_dims in to_dataarray() method
- components.py: Fix _relative_charge_state_bounds to handle scalar
  inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking

Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size

Co-Authored-By: Claude Opus 4.5 <[email protected]>
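The validate-instead-of-broadcast idea, roughly (a hypothetical sketch; the real `_validate_dataarray_dims` in core.py may perform additional checks):

```python
import xarray as xr

def _validate_dataarray_dims(da: xr.DataArray, target_dims: tuple) -> xr.DataArray:
    """Reject unknown dimensions but keep the array in its minimal form:
    scalars stay 0-D, time series stay 1-D; no broadcasting happens here."""
    extra = set(da.dims) - set(target_dims)
    if extra:
        raise ValueError(f'Unexpected dimensions {sorted(extra)}; allowed: {target_dims}')
    return da

# A scalar and a 1D array both pass validation untouched:
_validate_dataarray_dims(xr.DataArray(0.9), ('time', 'period'))
_validate_dataarray_dims(xr.DataArray([1.0, 2.0], dims=['time']), ('time', 'period'))
```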

coderabbitai bot commented Jan 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.



FBumann and others added 5 commits January 16, 2026 10:56
Remove redundant dimension reduction - data is minimal from start

Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()

Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.

Update benchmark_io.py to reflect the simplified API.

Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
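The coordinate guarantee can be illustrated with a small sketch (`ensure_model_coords` is a hypothetical name; the values are made up):

```python
import pandas as pd
import xarray as xr

def ensure_model_coords(ds: xr.Dataset, coords: dict) -> xr.Dataset:
    """Attach any missing model coordinates, even if no data variable uses them."""
    missing = {k: v for k, v in coords.items() if k not in ds.coords}
    return ds.assign_coords(missing)

ds = xr.Dataset({'efficiency': xr.DataArray(0.9)})  # every variable is scalar
ds = ensure_model_coords(ds, {'time': pd.date_range('2026-01-01', periods=24, freq='h')})
assert 'time' in ds.coords                   # the coordinate exists on the Dataset...
assert 'time' not in ds['efficiency'].dims   # ...but the data stays scalar
```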
Done. I've removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Here's a summary of the changes:

  Files modified:

  1. flixopt/io.py:
    - Removed COLLAPSED_VAR_PREFIX constant
    - Removed _collapse_constant_arrays() function (~45 lines)
    - Removed _expand_collapsed_arrays() function (~40 lines)
    - Removed collapse_constants parameter from save_dataset_to_netcdf()
    - Removed expansion logic from load_dataset_from_netcdf()
  2. flixopt/flow_system.py:
    - Removed import and call to _expand_collapsed_arrays in from_dataset()
  3. benchmark_io.py:
    - Simplified benchmark_netcdf() to only benchmark single save/load (no more comparison)
    - Removed collapse_constants parameter from roundtrip function

  Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.

  Test results:
  - All 10 IO tests pass
  - All 4 integration tests pass
  - Benchmark runs successfully
Summary of Fixes

  1. _dataset_resample handling all-scalar data (transform_accessor.py)

  When all data variables are scalars (no time dimension), the resample function now:
  - Creates a dummy variable to resample the time coordinate
  - Preserves all original scalar data variables
  - Preserves all non-time coordinates (period, scenario, cluster)

  2. _scalar_safe_reduce helper (modeling.py)

  Added a new helper function that safely performs aggregation operations (mean, sum, etc.) over a dimension (see the sketch after this list):
  - Returns reduced data if the dimension exists
  - Returns data unchanged if the dimension doesn't exist (scalar data)

  3. Updated Storage intercluster linking (components.py)

  Used _scalar_safe_reduce for:
  - relative_loss_per_hour.mean('time')
  - timestep_duration.mean('time')

  4. Updated clustering expansion (transform_accessor.py)

  Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')

  5. Fixed dimension order in expand_data (clustering/base.py)

  When expanding clustered data back to original timesteps:
  - Added transpose to ensure (cluster, time) dimension order before numpy indexing
  - Fixes IndexError when dimensions were in different order
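A minimal sketch of the `_scalar_safe_reduce` pattern from item 2, following the same pass-through idea as the isel wrapper above (the real helper in modeling.py may differ):

```python
import xarray as xr

def _scalar_safe_reduce(da: xr.DataArray, method: str, dim: str) -> xr.DataArray:
    """Apply an aggregation such as 'mean' or 'sum' over `dim` if present;
    return scalar/reduced data unchanged so callers need not branch."""
    if dim in da.dims:
        return getattr(da, method)(dim)
    return da

losses = xr.DataArray([1.0, 3.0], dims=['time'])
constant = xr.DataArray(2.0)  # already reduced to a scalar
assert float(_scalar_safe_reduce(losses, 'mean', 'time')) == 2.0
assert float(_scalar_safe_reduce(constant, 'mean', 'time')) == 2.0
```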
Added reduction on loading old datasets

Summary of changes made:

  1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
  2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
  3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
  4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
  5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)

  The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.
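As a toy illustration of that last point (plain xarray, not flixopt or linopy code), a minimal array only expands when combined with other dimensions:

```python
import xarray as xr

bound = xr.DataArray([1.0, 2.0, 3.0], dims=['time'])       # stays ('time',) in the dataset
per_scenario = xr.DataArray([1.0, 1.0], dims=['scenario'])

# Expansion happens only at the point of use, analogous to what happens
# when variables are created from minimal bounds at the linopy interface:
full, _ = xr.broadcast(bound, per_scenario)
assert full.dims == ('time', 'scenario') and full.shape == (3, 2)
```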
@FBumann FBumann marked this pull request as ready for review January 16, 2026 11:01
@FBumann FBumann merged commit 72b437c into feature/speedup-io Jan 16, 2026
8 checks passed
FBumann added a commit that referenced this pull request Jan 16, 2026
* Add collapse to IO

* Temp

* 1. Added dimension reduction functions in io.py:
    - _reduce_constant_dims(): Reduces dimensions where all values are constant (see the sketch below)
    - _expand_reduced_dims(): Restores original dimensions with correct order
  2. Added reduce_constants parameter to FlowSystem.to_dataset():
    - Default: False (opt-in)
    - When True: Reduces constant dimensions for memory efficiency
  3. Updated FlowSystem.from_dataset():
    - Expands both collapsed arrays (from NetCDF) and reduced dimensions
  4. Key fixes:
    - Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
    - Skip solution variables in reduction (they're prefixed with solution|)

  The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.
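For context, the (since removed) reduction could be sketched like this, a hypothetical simplification that ignores the attrs bookkeeping for dimension order mentioned above:

```python
import xarray as xr

def _reduce_constant_dims(da: xr.DataArray) -> xr.DataArray:
    """Drop every dimension along which all values are identical,
    keeping the first slice in its place."""
    for dim in list(da.dims):
        first = da.isel({dim: 0}, drop=True)
        if bool((da == first).all()):
            da = first
    return da

# Constant over 'scenario' -> reduced to a plain ('time',) array:
da = xr.DataArray([[5.0, 5.0], [7.0, 7.0]], dims=['time', 'scenario'])
assert _reduce_constant_dims(da).dims == ('time',)
```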

* Set reduce_constants=True by default in to_dataset()

Benchmarks show 99%+ reduction in memory and file size for multi-period
models, with faster IO speeds. The optimization is now enabled by default.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Feature/speedup io+internals (#576)

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>