
Conversation


@FBumann FBumann commented Jan 16, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

FBumann and others added 2 commits January 16, 2026 10:33
Add safe isel wrappers for dimension-independent operations

- Add _scalar_safe_isel_drop() to modeling.py for selecting from
  potentially reduced arrays (handles both scalar and array cases)
- Update Storage validation to use _scalar_safe_isel for
  relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
  reduced arrays properly with dimension checks

These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
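A minimal sketch of the scalar-safe selection pattern this commit describes, assuming an xarray-based helper (the actual `_scalar_safe_isel_drop` in modeling.py may have a different signature):

```python
import xarray as xr

def _scalar_safe_isel_drop(da: xr.DataArray, dim: str, index: int) -> xr.DataArray:
    """Select `index` along `dim` and drop the coordinate, but pass the
    array through unchanged if `dim` is absent (scalar/reduced data)."""
    if dim in da.dims:
        return da.isel({dim: index}, drop=True)
    return da

# Both a time series and an already-reduced scalar work:
ts = xr.DataArray([0.1, 0.2, 0.3], dims=['time'])
scalar = xr.DataArray(0.5)
assert float(_scalar_safe_isel_drop(ts, 'time', 0)) == 0.1
assert float(_scalar_safe_isel_drop(scalar, 'time', 0)) == 0.5
```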
Remove pre-broadcasting from DataConverter - data stays minimal

Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.

Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
  _validate_dataarray_dims in to_dataarray() method
- components.py: Fix _relative_charge_state_bounds to handle scalar
  inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking

Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size

Co-Authored-By: Claude Opus 4.5 <[email protected]>
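The validate-instead-of-broadcast idea, roughly (a hypothetical sketch; the real `_validate_dataarray_dims` in core.py may perform additional checks):

```python
import xarray as xr

def _validate_dataarray_dims(da: xr.DataArray, target_dims: tuple) -> xr.DataArray:
    """Reject unknown dimensions but keep the array in its minimal form:
    scalars stay 0-D, time series stay 1-D; no broadcasting happens here."""
    extra = set(da.dims) - set(target_dims)
    if extra:
        raise ValueError(f'Unexpected dimensions {sorted(extra)}; allowed: {target_dims}')
    return da

# A scalar and a 1D array both pass validation untouched:
_validate_dataarray_dims(xr.DataArray(0.9), ('time', 'period'))
_validate_dataarray_dims(xr.DataArray([1.0, 2.0], dims=['time']), ('time', 'period'))
```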

coderabbitai bot commented Jan 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.



FBumann and others added 5 commits January 16, 2026 10:56
Remove redundant dimension reduction - data is minimal from start

Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()

Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.

Update benchmark_io.py to reflect the simplified API.

Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
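The coordinate guarantee can be illustrated with a small sketch (`ensure_model_coords` is a hypothetical name; the values are made up):

```python
import pandas as pd
import xarray as xr

def ensure_model_coords(ds: xr.Dataset, coords: dict) -> xr.Dataset:
    """Attach any missing model coordinates, even if no data variable uses them."""
    missing = {k: v for k, v in coords.items() if k not in ds.coords}
    return ds.assign_coords(missing)

ds = xr.Dataset({'efficiency': xr.DataArray(0.9)})  # every variable is scalar
ds = ensure_model_coords(ds, {'time': pd.date_range('2026-01-01', periods=24, freq='h')})
assert 'time' in ds.coords                   # the coordinate exists on the Dataset...
assert 'time' not in ds['efficiency'].dims   # ...but the data stays scalar
```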
Done. I've removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Here's a summary of the changes:

  Files modified:

  1. flixopt/io.py:
    - Removed COLLAPSED_VAR_PREFIX constant
    - Removed _collapse_constant_arrays() function (~45 lines)
    - Removed _expand_collapsed_arrays() function (~40 lines)
    - Removed collapse_constants parameter from save_dataset_to_netcdf()
    - Removed expansion logic from load_dataset_from_netcdf()
  2. flixopt/flow_system.py:
    - Removed import and call to _expand_collapsed_arrays in from_dataset()
  3. benchmark_io.py:
    - Simplified benchmark_netcdf() to only benchmark single save/load (no more comparison)
    - Removed collapse_constants parameter from roundtrip function

  Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.

  Test results:
  - All 10 IO tests pass
  - All 4 integration tests pass
  - Benchmark runs successfully
Summary of Fixes

  1. _dataset_resample handling all-scalar data (transform_accessor.py)

  When all data variables are scalars (no time dimension), the resample function now:
  - Creates a dummy variable to resample the time coordinate
  - Preserves all original scalar data variables
  - Preserves all non-time coordinates (period, scenario, cluster)

  2. _scalar_safe_reduce helper (modeling.py)

  Added a new helper function that safely performs aggregation operations (mean, sum, etc.) over a dimension (see the sketch after this list):
  - Returns reduced data if the dimension exists
  - Returns data unchanged if the dimension doesn't exist (scalar data)

  3. Updated Storage intercluster linking (components.py)

  Used _scalar_safe_reduce for:
  - relative_loss_per_hour.mean('time')
  - timestep_duration.mean('time')

  4. Updated clustering expansion (transform_accessor.py)

  Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')

  5. Fixed dimension order in expand_data (clustering/base.py)

  When expanding clustered data back to original timesteps:
  - Added transpose to ensure (cluster, time) dimension order before numpy indexing
  - Fixes IndexError when dimensions were in different order
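A minimal sketch of the `_scalar_safe_reduce` pattern from item 2, following the same pass-through idea as the isel wrapper above (the real helper in modeling.py may differ):

```python
import xarray as xr

def _scalar_safe_reduce(da: xr.DataArray, method: str, dim: str) -> xr.DataArray:
    """Apply an aggregation such as 'mean' or 'sum' over `dim` if present;
    return scalar/reduced data unchanged so callers need not branch."""
    if dim in da.dims:
        return getattr(da, method)(dim)
    return da

losses = xr.DataArray([1.0, 3.0], dims=['time'])
constant = xr.DataArray(2.0)  # already reduced to a scalar
assert float(_scalar_safe_reduce(losses, 'mean', 'time')) == 2.0
assert float(_scalar_safe_reduce(constant, 'mean', 'time')) == 2.0
```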
Added reduction on loading old datasets

Summary of changes made:

  1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
  2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
  3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
  4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
  5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)

  The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.
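As a toy illustration of that last point (plain xarray, not flixopt or linopy code), a minimal array only expands when combined with other dimensions:

```python
import xarray as xr

bound = xr.DataArray([1.0, 2.0, 3.0], dims=['time'])       # stays ('time',) in the dataset
per_scenario = xr.DataArray([1.0, 1.0], dims=['scenario'])

# Expansion happens only at the point of use, analogous to what happens
# when variables are created from minimal bounds at the linopy interface:
full, _ = xr.broadcast(bound, per_scenario)
assert full.dims == ('time', 'scenario') and full.shape == (3, 2)
```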
@FBumann FBumann marked this pull request as ready for review January 16, 2026 11:01
@FBumann FBumann merged commit 72b437c into feature/speedup-io Jan 16, 2026
8 checks passed
FBumann added a commit that referenced this pull request Jan 16, 2026
* Add collapse to IO

* Temp

* 1. Added dimension reduction functions in io.py:
    - _reduce_constant_dims(): Reduces dimensions where all values are constant (see the sketch below)
    - _expand_reduced_dims(): Restores original dimensions with correct order
  2. Added reduce_constants parameter to FlowSystem.to_dataset():
    - Default: False (opt-in)
    - When True: Reduces constant dimensions for memory efficiency
  3. Updated FlowSystem.from_dataset():
    - Expands both collapsed arrays (from NetCDF) and reduced dimensions
  4. Key fixes:
    - Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
    - Skip solution variables in reduction (they're prefixed with solution|)

  The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.
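For context, the (since removed) reduction could be sketched like this, a hypothetical simplification that ignores the attrs bookkeeping for dimension order mentioned above:

```python
import xarray as xr

def _reduce_constant_dims(da: xr.DataArray) -> xr.DataArray:
    """Drop every dimension along which all values are identical,
    keeping the first slice in its place."""
    for dim in list(da.dims):
        first = da.isel({dim: 0}, drop=True)
        if bool((da == first).all()):
            da = first
    return da

# Constant over 'scenario' -> reduced to a plain ('time',) array:
da = xr.DataArray([[5.0, 5.0], [7.0, 7.0]], dims=['time', 'scenario'])
assert _reduce_constant_dims(da).dims == ('time',)
```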

* Set reduce_constants=True by default in to_dataset()

Benchmarks show 99%+ reduction in memory and file size for multi-period
models, with faster IO speeds. The optimization is now enabled by default.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Feature/speedup io+internals (#576)

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>