Feature/speedup io+internals #576
* Add collapse to IO
* Temp
* 1. Added dimension reduction functions in io.py:
- _reduce_constant_dims(): Reduces dimensions where all values are constant
- _expand_reduced_dims(): Restores original dimensions with correct order (both are sketched after this note)
2. Added reduce_constants parameter to FlowSystem.to_dataset():
- Default: False (opt-in)
- When True: Reduces constant dimensions for memory efficiency
3. Updated FlowSystem.from_dataset():
- Expands both collapsed arrays (from NetCDF) and reduced dimensions
4. Key fixes:
- Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
- Skip solution variables in reduction (they're prefixed with solution|)
The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.
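The commit does not include the implementation itself; the following is a minimal sketch of what the reduce/expand pair could look like with xarray. The function names mirror the commit message, but the bodies, the _original_dims attribute, and the coords argument are illustrative assumptions (NaN handling and the solution|-prefix skip are omitted):
```python
import xarray as xr

def _reduce_constant_dims(da: xr.DataArray) -> xr.DataArray:
    """Drop every dimension along which all values are identical.

    Sketch only. The original dimension order is remembered in attrs
    so _expand_reduced_dims() can restore it later.
    """
    original_dims = list(da.dims)
    for dim in original_dims:
        first = da.isel({dim: 0}, drop=True)
        # If every slice along `dim` equals the first one, the dimension
        # carries no information and can be dropped.
        if bool((da == first).all()):
            da = first
    return da.assign_attrs(_original_dims=original_dims)

def _expand_reduced_dims(da: xr.DataArray, coords: dict) -> xr.DataArray:
    """Re-expand to the stored dimension order, given the model coordinates."""
    original_dims = da.attrs.get('_original_dims', list(da.dims))
    for dim in original_dims:
        if dim not in da.dims:
            da = da.expand_dims({dim: coords[dim]})
    # Restoring the stored order is what preserves (cluster, time)
    # vs (time, cluster), as the commit's key fixes note.
    return da.transpose(*original_dims)
```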
* Set reduce_constants=True by default in to_dataset()
Benchmarks show a 99%+ reduction in memory and file size for multi-period
models, with faster IO. The optimization is now enabled by default.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Feature/speedup io+internals (#576)
* Add safe isel wrappers for dimension-independent operations
- Add _scalar_safe_isel_drop() to modeling.py for selecting from
potentially reduced arrays (handles both scalar and array cases; sketched below)
- Update Storage validation to use _scalar_safe_isel for
relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
reduced arrays properly with dimension checks
These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
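The commit message describes the wrapper's behaviour but not its body; here is a minimal sketch consistent with that description (the signature, the drop semantics, and the usage line are assumptions):
```python
import xarray as xr

def _scalar_safe_isel_drop(data: xr.DataArray, dim: str, index: int = 0) -> xr.DataArray:
    """Select `index` along `dim` and drop the leftover scalar coordinate,
    or return the data unchanged when the dimension was already reduced away."""
    if dim in data.dims:
        return data.isel({dim: index}, drop=True)
    return data

# Hypothetical usage matching the Storage validation described above:
# initial_min = _scalar_safe_isel_drop(relative_minimum_charge_state, 'time', 0)
```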
* Remove pre-broadcasting from DataConverter - data stays minimal
Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.
Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
_validate_dataarray_dims in the to_dataarray() method (sketched below)
- components.py: Fix _relative_charge_state_bounds to handle scalar
inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking
Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size
Co-Authored-By: Claude Opus 4.5 <[email protected]>
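The commit names _validate_dataarray_dims without showing it; a sketch of the validate-instead-of-broadcast idea follows (the signature and error message are assumptions):
```python
import xarray as xr

def _validate_dataarray_dims(da: xr.DataArray, target_dims: tuple) -> xr.DataArray:
    """Accept data whose dims are a subset of the model's target dims.

    Unlike the old broadcasting path, the data is returned as-is:
    scalars stay scalar, 1D arrays stay 1D.
    """
    extra = set(da.dims) - set(target_dims)
    if extra:
        raise ValueError(
            f'DataArray has unexpected dimensions {sorted(extra)}; '
            f'allowed dimensions are {target_dims}'
        )
    return da
```
Deferring the broadcast to linopy is what produces the 24.1 MB to 137.1 KB drop reported above: the dataset stores one value per constant parameter instead of timesteps × periods copies.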
* Remove redundant dimension reduction - data is minimal from start
Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()
Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.
Update benchmark_io.py to reflect the simplified API.
Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Summary of the changes:
Files modified:
1. flixopt/io.py:
- Removed COLLAPSED_VAR_PREFIX constant
- Removed _collapse_constant_arrays() function (~45 lines)
- Removed _expand_collapsed_arrays() function (~40 lines)
- Removed collapse_constants parameter from save_dataset_to_netcdf()
- Removed expansion logic from load_dataset_from_netcdf()
2. flixopt/flow_system.py:
- Removed import and call to _expand_collapsed_arrays in from_dataset()
3. benchmark_io.py:
- Simplified benchmark_netcdf() to benchmark only a single save/load (no more comparison)
- Removed collapse_constants parameter from roundtrip function
Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.
Test results:
- All 10 IO tests pass
- All 4 integration tests pass
- Benchmark runs successfully
* Summary of Fixes
1. _dataset_resample handling all-scalar data (transform_accessor.py)
When all data variables are scalars (no time dimension), the resample function now:
- Creates a dummy variable to resample the time coordinate
- Preserves all original scalar data variables
- Preserves all non-time coordinates (period, scenario, cluster)
2. _scalar_safe_reduce helper (modeling.py)
Added a new helper function that safely performs aggregation operations (mean/sum/etc.) over a dimension (see the sketch after this list):
- Returns reduced data if the dimension exists
- Returns data unchanged if the dimension doesn't exist (scalar data)
3. Updated Storage intercluster linking (components.py)
Used _scalar_safe_reduce for:
- relative_loss_per_hour.mean('time')
- timestep_duration.mean('time')
4. Updated clustering expansion (transform_accessor.py)
Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')
5. Fixed dimension order in expand_data (clustering/base.py)
When expanding clustered data back to original timesteps:
- Added transpose to ensure (cluster, time) dimension order before numpy indexing
- Fixes IndexError when dimensions were in different order
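A sketch matching the behaviour described in item 2; the exact signature in modeling.py is not shown in the summary, so the op parameter and the call-site rewrites are illustrative:
```python
import xarray as xr

def _scalar_safe_reduce(data: xr.DataArray, dim: str, op: str = 'mean') -> xr.DataArray:
    """Apply an aggregation (e.g. 'mean' or 'sum') over `dim` if it exists;
    return the data unchanged when the dimension is absent (scalar data)."""
    if dim in data.dims:
        return getattr(data, op)(dim=dim)
    return data

# The two call sites from item 3, expressed with the helper:
# _scalar_safe_reduce(relative_loss_per_hour, 'time')  # was .mean('time')
# _scalar_safe_reduce(timestep_duration, 'time')       # was .mean('time')
```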
* Added reduction on loading old datasets
* Summary of changes made:
1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)
The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.
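To make the deferred broadcasting concrete, here is a small self-contained illustration using plain xarray. The variable names are hypothetical; linopy performs the equivalent alignment internally when FlowSystemModel.add_variables() receives such bounds:
```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2024-01-01', periods=4, freq='h')

# Stored in minimal form: only the dimension the data actually varies over.
lower = xr.DataArray(np.linspace(0.0, 1.0, 4), coords={'time': time}, dims=('time',))

# Full model coordinates, e.g. time x scenario.
model_coords = xr.Dataset(coords={'time': time, 'scenario': ['base', 'high']})

# Broadcasting happens only here, at variable-creation time.
lower_full = lower.broadcast_like(model_coords)
print(lower.dims, '->', lower_full.dims)  # ('time',) -> ('time', 'scenario')
```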
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>