
Conversation

@FBumann
Member

@FBumann FBumann commented Jan 15, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a utility function for combining multi-dimensional data slices into structured arrays.
    • Introduced a new property to retrieve timestep positions within segments.
    • Added semantic categorization system for optimization variables to improve classification and handling.
  • Refactor

    • Optimized segmented data interpolation with vectorized operations for better performance.
    • Improved multi-dimensional data assembly and expansion logic.
  • Tests

    • Added comprehensive test coverage for data slice combination utility with dimensional and coordinate handling.

✏️ Tip: You can customize this high-level summary in your review settings.

  Changes Made

  1. Added VariableCategory enum (structure.py:64-77)
    - STATE - For state variables like charge_state (interpolated within segments)
    - SEGMENT_TOTAL - For segment totals like effect contributions (divided by expansion divisor)
    - RATE - For rate variables like flow_rate (expanded as-is)
    - BINARY - For binary variables like status (expanded as-is)
    - OTHER - For uncategorized variables
  2. Added variable_categories registry to FlowSystemModel (structure.py:214)
    - Dictionary mapping variable names to their categories
  3. Modified add_variables() method (structure.py:388-396)
    - Added optional category parameter
    - Automatically registers variables with their category (see the sketch after this list)
  4. Updated variable creation calls:
    - components.py: Storage variables (charge_state as STATE, netto_discharge as RATE)
    - elements.py: Flow variables (flow_rate as RATE, status as BINARY)
    - features.py: Effect contributions (per_timestep as SEGMENT_TOTAL, temporal shares as SEGMENT_TOTAL, startup/shutdown as BINARY)
  5. Updated expand() method (transform_accessor.py:2074-2090)
    - Uses variable_categories registry to identify segment totals and state variables
    - Falls back to pattern matching for backwards compatibility with older FlowSystems
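
  A minimal sketch of the registration described in item 3 (hypothetical simplified class; the real add_variables() wraps linopy variable creation and lives on FlowSystemModel/Submodel):

  from flixopt.structure import VariableCategory  # import path as introduced in this PR

  class RegistrySketch:
      """Illustration only: map variable names to their semantic category at creation time."""

      def __init__(self):
          self.variable_categories: dict[str, VariableCategory] = {}

      def add_variables(self, name: str, category: VariableCategory | None = None, **kwargs) -> str:
          # ... the real method creates the linopy variable here ...
          if category is not None:
              self.variable_categories[name] = category
          return name

  model = RegistrySketch()
  model.add_variables('Storage|charge_state', category=VariableCategory.CHARGE_STATE)
  model.add_variables('Boiler(Q_th)|flow_rate', category=VariableCategory.FLOW_RATE)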

  Benefits

  - More robust categorization: Variables are categorized at creation time, not by pattern matching
  - Extensible: New variable types can easily be added with proper category
  - Backwards compatible: Old FlowSystems without categories still work via pattern matching fallback
  New Categories (structure.py:45-103)

  from enum import Enum, auto

  class VariableCategory(Enum):
      # Member values shown with auto() for brevity; the actual enum may use explicit values.

      # State variables
      CHARGE_STATE = auto()
      SOC_BOUNDARY = auto()

      # Rate/Power variables
      FLOW_RATE = auto()
      NETTO_DISCHARGE = auto()
      VIRTUAL_FLOW = auto()

      # Binary state
      STATUS = auto()
      INACTIVE = auto()

      # Binary events
      STARTUP = auto()
      SHUTDOWN = auto()

      # Effect variables
      PER_TIMESTEP = auto()
      SHARE = auto()
      TOTAL = auto()
      TOTAL_OVER_PERIODS = auto()

      # Investment
      SIZE = auto()
      INVESTED = auto()

      # Counting/Duration
      STARTUP_COUNT = auto()
      DURATION = auto()

      # Piecewise linearization
      INSIDE_PIECE = auto()
      LAMBDA0 = auto()
      LAMBDA1 = auto()
      ZERO_POINT = auto()

      # Other
      OTHER = auto()

  Logical Groupings for Expansion

  EXPAND_INTERPOLATE = {CHARGE_STATE}      # Interpolate between boundaries
  EXPAND_DIVIDE = {PER_TIMESTEP, SHARE}    # Divide by expansion factor
  # Default: repeat within segment
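
  A hedged sketch of how these groupings drive the per-variable expansion choice (illustrative dispatch only; assumes the PR's EXPAND_INTERPOLATE/EXPAND_DIVIDE sets are importable from flixopt.structure):

  from flixopt.structure import EXPAND_DIVIDE, EXPAND_INTERPOLATE, VariableCategory

  def expansion_rule(category: VariableCategory | None) -> str:
      """Map a variable's category to the expansion treatment described above."""
      if category in EXPAND_INTERPOLATE:
          return 'interpolate'  # state variables such as charge_state
      if category in EXPAND_DIVIDE:
          return 'divide'       # segment totals: repeat, then divide by the expansion divisor
      return 'repeat'           # default: repeat the clustered value within its segment

  print(expansion_rule(VariableCategory.CHARGE_STATE))  # interpolate
  print(expansion_rule(VariableCategory.PER_TIMESTEP))  # divide
  print(expansion_rule(VariableCategory.FLOW_RATE))     # repeat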

  Files Modified
  ┌───────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │         File          │                                                          Variables Updated                                                          │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ components.py         │ charge_state, netto_discharge, SOC_boundary                                                                                         │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ elements.py           │ flow_rate, status, virtual_supply, virtual_demand                                                                                   │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ features.py           │ size, invested, inactive, startup, shutdown, startup_count, inside_piece, lambda0, lambda1, zero_point, total, per_timestep, shares │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ effects.py            │ total, total_over_periods                                                                                                           │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ modeling.py           │ duration                                                                                                                            │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ transform_accessor.py │ Updated to use EXPAND_INTERPOLATE and EXPAND_DIVIDE groupings                                                                       │
  └───────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  Test Results

  - All 83 cluster/expand tests pass
  - Variable categories correctly populated and grouped
  Changes Made

  1. Added combine_slices() utility to flixopt/clustering/base.py (lines 52-107)
    - Simple function that stacks dict of {(dim_values): np.ndarray} into a DataArray
    - Much cleaner than the previous reverse-concat pattern
  2. Refactored 3 methods to use the new utility:
    - Clustering.expand_data() - reduced from ~25 to ~12 lines
    - Clustering.build_expansion_divisor() - reduced from ~35 to ~20 lines
    - TransformAccessor._interpolate_charge_state_segmented() - reduced from ~43 to ~27 lines
  3. Added 4 unit tests for combine_slices() in tests/test_cluster_reduce_expand.py

  Results
  ┌───────────────────────────────────┬──────────┬────────────────────────┐
  │              Metric               │  Before  │         After          │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Complex reverse-concat blocks     │ 3        │ 0                      │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Lines of dimension iteration code │ ~100     │ ~60                    │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Test coverage                     │ 83 tests │ 87 tests (all passing) │
  └───────────────────────────────────┴──────────┴────────────────────────┘
  The Pattern Change

  Before (complex reverse-concat):
  result_arrays = slices
  for dim in reversed(extra_dims):
      grouped = {}
      for key, arr in result_arrays.items():
          rest_key = key[:-1] if len(key) > 1 else ()
          grouped.setdefault(rest_key, []).append(arr)
      result_arrays = {k: xr.concat(v, dim=...) for k, v in grouped.items()}
  result = list(result_arrays.values())[0].transpose('time', ...)

  After (simple combine):
  return combine_slices(slices, extra_dims, dim_coords, 'time', output_coord, attrs)
  1. Fully Vectorized expand_data()

  Before (~65 lines with loops):
  for combo in np.ndindex(*[len(v) for v in dim_coords.values()]):
      selector = {...}
      mapping = _select_dims(timestep_mapping, **selector).values
      data_slice = _select_dims(aggregated, **selector)
      slices[key] = _expand_slice(mapping, data_slice)
  return combine_slices(slices, ...)

  After (~25 lines, fully vectorized):
  timestep_mapping = self.timestep_mapping  # Already multi-dimensional!
  cluster_indices = timestep_mapping // time_dim_size
  time_indices = timestep_mapping % time_dim_size
  expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
  # xarray handles broadcasting across period/scenario automatically
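
  A self-contained toy example of the vectorized indexing pattern above (illustrative data; in the PR the flat mapping comes from clustering.timestep_mapping):

  import numpy as np
  import xarray as xr

  # Toy clustered data: 2 clusters x 3 timesteps, plus an extra 'period' dimension.
  aggregated = xr.DataArray(
      np.arange(12.0).reshape(2, 3, 2),
      dims=('cluster', 'time', 'period'),
      coords={'cluster': [0, 1], 'time': [0, 1, 2], 'period': ['P1', 'P2']},
  )

  # Flat mapping of 6 original timesteps onto (cluster, within-cluster time),
  # encoded as cluster * time_dim_size + time.
  time_dim_size = 3
  timestep_mapping = xr.DataArray([3, 4, 5, 0, 1, 2], dims='original_time')

  cluster_indices = timestep_mapping // time_dim_size
  time_indices = timestep_mapping % time_dim_size

  # Both indexers share 'original_time', so isel performs vectorized (pointwise)
  # indexing; the 'period' dimension is broadcast along automatically.
  expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
  print(expanded.sizes)  # 6 original timesteps x 2 periods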

  2. build_expansion_divisor() and _interpolate_charge_state_segmented()

  These still use combine_slices() because they need per-result segment data (segment_assignments, segment_durations) which isn't available as concatenated Clustering properties yet.

  Current State
  ┌───────────────────────────────────────┬─────────────────┬─────────────────────────────────┐
  │                Method                 │   Vectorized?   │   Uses Clustering Properties    │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ expand_data()                         │ Yes             │ timestep_mapping (fully)        │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ build_expansion_divisor()             │ No (small loop) │ cluster_assignments (partially) │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ _interpolate_charge_state_segmented() │ No (small loop) │ cluster_assignments (partially) │
  └───────────────────────────────────────┴─────────────────┴─────────────────────────────────┘
  1. _interpolate_charge_state_segmented() - Fully vectorized from ~110 lines to ~55 lines
    - Uses clustering.timestep_mapping for indexing
    - Uses clustering.results.segment_assignments, segment_durations, and position_within_segment
    - Single xarray expression instead of triple-nested loops
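
  A rough, self-contained sketch of that single-expression idea (toy data and a simplified mid-point interpolation factor; the PR's exact factor and dimension handling may differ):

  import xarray as xr

  # One cluster with 2 segments: charge_state keeps 3 segment-boundary values.
  boundaries = xr.DataArray([[0.0, 4.0, 6.0]], dims=('cluster', 'segment_boundary'))

  # Per original timestep: its cluster, its segment, its position inside the
  # segment, and the segment's length in timesteps.
  cluster_idx = xr.DataArray([0, 0, 0, 0], dims='original_time')
  seg_indices = xr.DataArray([0, 0, 1, 1], dims='original_time')
  positions = xr.DataArray([0, 1, 0, 1], dims='original_time')
  durations = xr.DataArray([2, 2, 2, 2], dims='original_time')

  # Mid-point interpolation factor within each segment (0.5 for length-1 segments).
  factor = (positions + 0.5) / durations

  start = boundaries.isel(cluster=cluster_idx, segment_boundary=seg_indices)
  end = boundaries.isel(cluster=cluster_idx, segment_boundary=seg_indices + 1)
  interpolated = start + (end - start) * factor
  print(interpolated.values)  # 1.0, 3.0, 4.5, 5.5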

  Previously completed (from before context limit):
  - Added segment_assignments multi-dimensional property to ClusteringResults
  - Added segment_durations multi-dimensional property to ClusteringResults
  - Added position_within_segment property to ClusteringResults
  - Vectorized expand_data()
  - Vectorized build_expansion_divisor()

  Test results: All 130 tests pass (87 cluster/expand + 43 IO tests)

  The combine_slices utility function is still available in clustering/base.py if needed in the future, but all the main dimension-handling methods now use xarray's vectorized advanced indexing instead of the loop-based slice-and-combine pattern.
  Summary of Simplifications

  1. expand_da() in transform_accessor.py

  - Extracted duplicate "append extra timestep" logic into _append_final_state() helper
  - Reduced from ~50 lines to ~25 lines
  - Eliminated code duplication

  2. _build_multi_dim_array() → _build_property_array() in clustering/base.py

  - Replaced 6 conditional branches with unified np.ndindex() pattern
  - Now handles both simple and multi-dimensional cases in one method
  - Reduced from ~50 lines to ~25 lines
  - Preserves dtype (fixed integer indexing bug)
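
  A hedged sketch of that unified np.ndindex() assembly pattern (hypothetical helper mirroring _build_property_array, with simplified coordinates):

  import numpy as np
  import xarray as xr

  def build_property_array(results: dict[tuple, np.ndarray], dim_coords: dict[str, list], n_time: int) -> xr.DataArray:
      """Stack per-(period, scenario, ...) arrays into one DataArray, preserving dtype."""
      dim_names = list(dim_coords)
      dim_values = [dim_coords[d] for d in dim_names]
      first = next(iter(results.values()))
      data = np.empty([len(v) for v in dim_values] + [n_time], dtype=first.dtype)  # no float64 upcast
      for combo in np.ndindex(*data.shape[:-1]):  # one pass per extra-dimension combination
          key = tuple(dim_values[pos][idx] for pos, idx in enumerate(combo))
          data[combo] = results[key]
      return xr.DataArray(data, dims=dim_names + ['time'], coords={**dim_coords, 'time': list(range(n_time))})

  arr = build_property_array(
      {('P1',): np.array([0, 1, 2]), ('P2',): np.array([3, 4, 5])},
      dim_coords={'period': ['P1', 'P2']},
      n_time=3,
  )
  print(arr.dims, arr.dtype)  # ('period', 'time'), integer dtype preserved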

  3. Property boilerplate in ClusteringResults

  - 5 properties (cluster_assignments, cluster_occurrences, cluster_centers, segment_assignments, segment_durations) now use the unified _build_property_array()
  - Each property reduced from ~25 lines to ~8 lines
  - Total: ~165 lines → ~85 lines

  4. _build_timestep_mapping() in Clustering

  - Simplified to single call using _build_property_array()
  - Reduced from ~16 lines to ~9 lines

  Total lines removed: ~150+ lines of duplicated/complex code
…as fixed:

  Summary

  The IO roundtrip bug was caused by representative_weights (a variable with only ('cluster',) dimension) being copied as-is during expansion, which caused the cluster dimension to incorrectly persist in the expanded dataset.

  Fix applied in transform_accessor.py:2063-2065:
  # Skip cluster-only vars (no time dim) - they don't make sense after expansion
  if da.dims == ('cluster',):
      continue

  This skips variables that have only a cluster dimension (and no time dimension) during expansion, as these variables don't make sense after the clustering structure is removed.

  Test results:
  - All 87 tests in test_cluster_reduce_expand.py pass ✓
  - All 43 tests in test_clustering_io.py pass ✓
  - Manual IO roundtrip test passes ✓
  - Tests with different segment counts (3, 6) pass ✓
  - Tests with 2-hour timesteps pass ✓
  The check was subsequently generalized to cover cluster-only variables that carry additional dimensions:

  # Skip vars with cluster dim but no time dim - they don't make sense after expansion
  # (e.g., representative_weights with dims ('cluster',) or ('cluster', 'period'))
  if 'cluster' in da.dims and 'time' not in da.dims:
      continue

  This correctly handles:
  - ('cluster',) - simple cluster-only variables like cluster_weight
  - ('cluster', 'period') - cluster variables with period dimension
  - ('cluster', 'scenario') - cluster variables with scenario dimension
  - ('cluster', 'period', 'scenario') - cluster variables with both

  Variables with both cluster and time dimensions (like timestep_duration with dims ('cluster', 'time')) are correctly expanded since they contain time-series data that needs to be mapped back to original timesteps.
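
  A small runnable illustration of this filtering rule (toy dataset with made-up variable names; the actual expansion of the surviving variables is not shown):

  import numpy as np
  import xarray as xr

  ds = xr.Dataset({
      'representative_weights': ('cluster', np.array([3, 5])),                 # cluster-only -> skipped
      'cluster_weight_per_period': (('cluster', 'period'), np.ones((2, 2))),   # also skipped
      'timestep_duration': (('cluster', 'time'), np.ones((2, 4))),             # has 'time' -> expanded
  })

  to_expand = [name for name, da in ds.data_vars.items()
               if not ('cluster' in da.dims and 'time' not in da.dims)]
  print(to_expand)  # ['timestep_duration']
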
@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Introduces a VariableCategory enum for semantic classification of optimization variables, propagates categories through variable creation across multiple model components, and implements vectorized expansion/interpolation logic for segmented systems. New clustering utilities (combine_slices) and properties (position_within_segment) support multi-dimensional coordinate handling.

Changes

Cohort / File(s) Summary
Variable Categorization System
flixopt/structure.py
Introduces VariableCategory enum with semantic categories (STATE, FLOW_RATE, STATUS, CHARGE_STATE, etc.) and expansion-related groupings (EXPAND_INTERPOLATE, EXPAND_DIVIDE, EXPAND_FIRST_TIMESTEP). Updates FlowSystemModel and Submodel.add_variables to track and register variable categories.
Category Annotation Across Components
flixopt/components.py, flixopt/effects.py, flixopt/elements.py, flixopt/features.py
Imports VariableCategory and annotates variables with appropriate categories during creation: CHARGE_STATE, NETTO_DISCHARGE, SOC_BOUNDARY, TOTAL, FLOW_RATE, STATUS, VIRTUAL_FLOW, SIZE, INVESTED, STARTUP, etc.
Flow System Integration
flixopt/flow_system.py, flixopt/modeling.py
Imports VariableCategory; adds _variable_categories tracking in FlowSystem; implements serialization/deserialization of variable categories into dataset attributes; preserves categories through solve operations.
Clustering Utilities & Vectorization
flixopt/clustering/base.py
Adds combine_slices utility function for constructing multi-dimensional DataArrays from slice mappings. Introduces position_within_segment public property for computing zero-based timestep indices within segments. Refactors internal data-assembly with _build_property_array for flexible multi-dimensional support.
Expansion & Interpolation Logic
flixopt/transform_accessor.py
Replaces explicit loop-based interpolation with vectorized multi-dimensional approach for segmented charge state. Adds state variable detection and final-state appending; uses EXPAND_DIVIDE registry for segment-total division. Consolidates extra-dimension handling into unified vectorized operations.
Test Coverage
tests/test_cluster_reduce_expand.py
Adds TestCombineSlices test class with four test methods covering single/multi-dimensional cases, attribute propagation, and datetime coordinate handling for the new combine_slices utility.

Sequence Diagram(s)

sequenceDiagram
    participant VarCreation as Variable Creation<br/>(add_variables)
    participant FlowSystem as FlowSystem
    participant VarCatRegistry as Variable Category<br/>Registry
    participant Serialization as Serialization<br/>(to_dataset)
    participant Expansion as Expansion Logic<br/>(transform_accessor)
    
    VarCreation->>FlowSystem: add_variables(name, category=VC.FLOW_RATE)
    FlowSystem->>VarCatRegistry: Store mapping {name: category}
    VarCatRegistry-->>FlowSystem: Registered
    
    FlowSystem->>Serialization: to_dataset()
    Serialization->>Serialization: Extract variable_categories dict
    Serialization-->>Serialization: Serialize to dataset attrs
    
    Note over Serialization: Dataset persisted with<br/>variable category metadata
    
    Expansion->>VarCatRegistry: Lookup category for variable
    VarCatRegistry-->>Expansion: Return category (e.g., EXPAND_DIVIDE)
    Expansion->>Expansion: Apply vectorized expansion<br/>based on category
    Expansion-->>Expansion: Generate expanded DataArray

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Behold! Variables now wear their categories with pride,
Like labeled carrots sorted by size, shape, and type!
Slices combine, segments expand with vectorized grace,
And clustering utilities dance through multi-dim space.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description consists entirely of a template with uncompleted sections; no actual description of changes, objectives, or testing is provided by the author. Resolution: replace the template placeholder with a concrete description of the changes, including what was fixed, why the changes were needed, and a summary of test results.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title 'Fix/segmentation expansion+variable categories' is closely related to the changeset, referencing two main aspects (segmentation expansion and variable categories) that are prominent throughout the code changes.
  • Docstring Coverage (✅ Passed): Docstring coverage is 90.74%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@FBumann FBumann marked this pull request as ready for review January 15, 2026 14:43
@FBumann
Member Author

FBumann commented Jan 15, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@flixopt/clustering/base.py`:
- Around line 52-108: combine_slices currently can break dtypes and crash
silently on empty input and missing keys; fix by validating input non-empty at
the top (if not slices: raise ValueError("slices cannot be empty")), capture the
first array via first = next(iter(slices.values())) and use its dtype when
allocating the output buffer (data = np.empty(shape, dtype=first.dtype)), and
replace the raw dict lookup inside the loop with a try/except that raises a
clearer KeyError including the missing key and the extra_dims (e.g., try:
data[...] = slices[key] except KeyError: raise KeyError(f"Missing slice for key
{key} (extra_dims={extra_dims})")).

In `@flixopt/flow_system.py`:
- Around line 259-262: Clear the stale _variable_categories mapping when the
model is invalidated and when reset() is called: add
self._variable_categories.clear() (or reassign {}) inside _invalidate_model()
and reset() so segment expansion (transform.expand) can’t reuse old categories;
and harden enum restoration by wrapping VariableCategory(value) in a try/except
ValueError (or use a safe lookup) to map unknown/renamed values to a fallback
enum (e.g., VariableCategory.UNKNOWN or a default) instead of raising, applying
this safe restore wherever serialized values are converted back to
VariableCategory.

In `@flixopt/transform_accessor.py`:
- Around line 1844-1858: The timestep_mapping decode wrongly always uses
clustering.timesteps_per_cluster; for segmented systems use
clustering.n_segments instead. In the block that computes time_dim_size (around
where timestep_mapping, cluster_indices, and time_indices are computed), set
time_dim_size = clustering.n_segments when clustering.is_segmented and
clustering.n_segments is not None, otherwise use
clustering.timesteps_per_cluster so that subsequent indexing into
segment_assignments and position_within_segment uses the correct divisor
(matching the logic in expand_data() and build_expansion_divisor()).
🧹 Nitpick comments (3)
flixopt/structure.py (1)

1670-1698: Submodel.add_variables(..., category=...) wiring looks correct; tighten typing + consider overwrite semantics.

Two small tweaks:

  • Type hint as VariableCategory | None for clarity.
  • Decide whether re-registering the same variable.name with a different category should warn (today it silently overwrites).
flixopt/clustering/base.py (2)

326-405: _build_property_array refactor is a solid direction; one ordering edge-case to confirm.

This is much cleaner than hand-rolling multidim assembly per property, and preserving dtype via np.empty(..., dtype=...) is a good touch. Only thing to double-check: key construction assumes extra dims are ordered (period, scenario); if self._dim_names order ever differs, self._results[key] lookups can break.

Also applies to: 523-558


422-474: position_within_segment implementation looks correct; consider output_dtypes for apply_ufunc.

If this ever runs with dask-backed arrays, xr.apply_ufunc(..., vectorize=True) can benefit from explicit output_dtypes=[...] to avoid dtype inference surprises.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3336eb5 and 3bdcd1d.

📒 Files selected for processing (10)
  • flixopt/clustering/base.py
  • flixopt/components.py
  • flixopt/effects.py
  • flixopt/elements.py
  • flixopt/features.py
  • flixopt/flow_system.py
  • flixopt/modeling.py
  • flixopt/structure.py
  • flixopt/transform_accessor.py
  • tests/test_cluster_reduce_expand.py
🧰 Additional context used
🧬 Code graph analysis (4)
flixopt/modeling.py (1)
flixopt/structure.py (2)
  • Submodel (1646-1844)
  • VariableCategory (45-90)
flixopt/features.py (1)
flixopt/structure.py (9)
  • FlowSystemModel (149-413)
  • Submodel (1646-1844)
  • VariableCategory (45-90)
  • add_variables (1670-1698)
  • get_coords (358-393)
  • get_coords (1748-1753)
  • dims (293-295)
  • label_full (1214-1215)
  • label_full (1779-1780)
tests/test_cluster_reduce_expand.py (1)
flixopt/clustering/base.py (7)
  • combine_slices (52-107)
  • dims (205-207)
  • dims (685-687)
  • values (292-294)
  • values (1306-1313)
  • coords (215-221)
  • coords (690-700)
flixopt/transform_accessor.py (2)
flixopt/clustering/base.py (19)
  • timestep_mapping (802-808)
  • segment_assignments (368-382)
  • segment_assignments (841-849)
  • results (1221-1232)
  • segment_durations (385-404)
  • segment_durations (852-860)
  • position_within_segment (423-473)
  • timesteps_per_cluster (309-311)
  • timesteps_per_cluster (665-667)
  • isel (244-268)
  • is_segmented (752-758)
  • build_expansion_divisor (924-969)
  • items (284-286)
  • items (1291-1298)
  • values (292-294)
  • values (1306-1313)
  • dims (205-207)
  • dims (685-687)
  • expand_data (877-922)
flixopt/flow_system.py (3)
  • isel (2581-2607)
  • is_segmented (2243-2249)
  • dims (2137-2157)
🔇 Additional comments (24)
flixopt/structure.py (2)

158-164: FlowSystemModel.variable_categories is a good central registry—ensure it’s kept consistent in IO paths.

This is a clean “single source of truth” for expansion metadata; just ensure load/save roundtrips restore it (and don’t silently drop it).


16-104: No changes needed — serialization is already handled correctly.

The enum values are already serialization-friendly. The code explicitly converts VariableCategory enum members to their .value strings during JSON serialization (lines 751–754 in flow_system.py) and reconstructs them from strings on deserialization (lines 919–924). This is proper handling and requires no modification.

Using frozenset for the expansion constants is a reasonable optional improvement for immutability (no actual mutation risk exists today), but it is not necessary for correctness or serialization.

Likely an incorrect or invalid review comment.

flixopt/features.py (4)

14-15: Importing VariableCategory for tagging is straightforward.


67-81: Investment variables tagged sensibly (SIZE, INVESTED).


197-257: Status tracking tags (INACTIVE, STARTUP/SHUTDOWN, STARTUP_COUNT) align with the declared expansion groupings.

One thing to double-check: coords=self.get_coords() (Line 224/230) includes 'time' in all configurations where startup/shutdown constraints are built.


403-425: Piecewise-related tagging (INSIDE_PIECE, LAMBDA0/1, ZERO_POINT) looks consistent.

Also applies to: 515-520

flixopt/modeling.py (2)

7-9: Category import is fine.


296-307: Tagging the duration tracking variable as DURATION is consistent with the new registry.

Please verify duration_per_step is always an xr.DataArray here (it’s typed as scalar-or-DataArray, but sum(...) is used earlier), otherwise this path can still break irrespective of the new category.

flixopt/components.py (3)

20-21: Import update for tagging is fine.


935-949: Storage tagging matches intended expansion semantics; verify extra-timestep interpolation path.

charge_state is both extra_timestep=True and CHARGE_STATE; make sure segmented expansion handles the last boundary timestep correctly (no off-by-one / dropped final point).


1344-1351: Tagging SOC_boundary as SOC_BOUNDARY is consistent and should help expansion logic.

flixopt/effects.py (2)

20-28: Import block update is OK.


383-409: Effect total variables are categorized appropriately (TOTAL, TOTAL_OVER_PERIODS).

Just ensure expansion code doesn’t try to “expand” these when there’s no time/cluster dim (should be a no-op).

tests/test_cluster_reduce_expand.py (1)

1398-1478: Nice targeted coverage for combine_slices (dims/coords/attrs + DatetimeIndex).

These tests should catch most regressions in dimension ordering, coordinate wiring, and attribute propagation.

flixopt/elements.py (4)

19-25: Importing VariableCategory here is appropriate for centralized variable tagging.


671-677: Flow rate variable categorization looks correct (FLOW_RATE).


730-736: Status variables are consistently tagged (STATUS) for both FlowModel and ComponentModel.

Also applies to: 1043-1049


966-978: Virtual imbalance vars are correctly categorized as VIRTUAL_FLOW.

flixopt/clustering/base.py (1)

877-970: Vectorized expand_data / build_expansion_divisor is great—please sanity-check against pinned xarray.

The advanced-indexing approach should scale better and eliminates nested loops over period/scenario. The cleanup (drop_vars(['cluster','time',...]) + renaming original_time→time) is sensible, but behavior can differ subtly across xarray versions (coords retained vs. turned into non-dim coords).

Also applies to: 1072-1081

flixopt/transform_accessor.py (5)

19-20: LGTM!

Clean import of the new category constants used for registry-based expansion logic below.


1860-1877: Interpolation logic is mathematically sound.

The interpolation factor calculation and linear interpolation between segment boundaries is correct:

  • Uses mid-point positioning within segments (positions + 0.5)
  • Handles single-timestep segments with fallback to 0.5
  • Properly indexes start (seg_indices) and end (seg_indices + 1) boundary values

2058-2067: Good defensive handling for cluster-only variables.

The new skip logic at lines 2065-2066 correctly excludes variables that have a cluster dimension but no time dimension. These variables (e.g., representative_weights) are cluster-level metadata that don't make sense after expansion to original timesteps.


2031-2053: Clear control flow for variable expansion paths.

The expand_da function correctly handles the three expansion paths:

  1. Segmented state variables: Interpolate within segments, then append final boundary value
  2. Segment total variables: Expand and divide by segment duration to get hourly rates
  3. Other variables (including non-segmented state): Standard expansion with final state append for state vars

The mutual exclusivity between state variables and segment totals ensures no variable gets both treatments.


2002-2029: Good backward-compatible registry approach with pattern-matching fallback.

The implementation correctly:

  1. Attempts registry-based lookup first via variable_categories (lines 2006, 2014)
  2. Falls back to structural pattern matching for older FlowSystems without categories
  3. Properly extracts and appends final state values for state variables

The membership checks are valid: EXPAND_DIVIDE and EXPAND_INTERPOLATE are sets of VariableCategory values defined in structure.py (lines 96, 99), supporting the in operator used at lines 2006 and 2014.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 52 to 108
def combine_slices(
    slices: dict[tuple, np.ndarray],
    extra_dims: list[str],
    dim_coords: dict[str, list],
    output_dim: str,
    output_coord: Any,
    attrs: dict | None = None,
) -> xr.DataArray:
    """Combine {(dim_values): 1D_array} dict into a DataArray.

    This utility simplifies the common pattern of iterating over extra dimensions
    (like period, scenario), processing each slice, and combining results.

    Args:
        slices: Dict mapping dimension value tuples to 1D numpy arrays.
            Keys are tuples like ('period1', 'scenario1') matching extra_dims order.
        extra_dims: Dimension names in order (e.g., ['period', 'scenario']).
        dim_coords: Dict mapping dimension names to coordinate values.
        output_dim: Name of the output dimension (typically 'time').
        output_coord: Coordinate values for output dimension.
        attrs: Optional DataArray attributes.

    Returns:
        DataArray with dims [output_dim, *extra_dims].

    Example:
        >>> slices = {
        ...     ('P1', 'base'): np.array([1, 2, 3]),
        ...     ('P1', 'high'): np.array([4, 5, 6]),
        ...     ('P2', 'base'): np.array([7, 8, 9]),
        ...     ('P2', 'high'): np.array([10, 11, 12]),
        ... }
        >>> result = combine_slices(
        ...     slices,
        ...     extra_dims=['period', 'scenario'],
        ...     dim_coords={'period': ['P1', 'P2'], 'scenario': ['base', 'high']},
        ...     output_dim='time',
        ...     output_coord=[0, 1, 2],
        ... )
        >>> result.dims
        ('time', 'period', 'scenario')
    """
    n_output = len(next(iter(slices.values())))
    shape = [n_output] + [len(dim_coords[d]) for d in extra_dims]
    data = np.empty(shape)

    for combo in np.ndindex(*shape[1:]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo, strict=True))
        data[(slice(None),) + combo] = slices[key]

    return xr.DataArray(
        data,
        dims=[output_dim] + extra_dims,
        coords={output_dim: output_coord, **dim_coords},
        attrs=attrs or {},
    )

Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find project configuration files
fd -HI 'pyproject\.toml|setup\.cfg|setup\.py|requirements.*\.txt|environment\.yml|pixi\.toml' . -type f

Repository: flixOpt/flixopt

Length of output: 229


🏁 Script executed:

# Check the actual combine_slices implementation and surrounding context
cat -n flixopt/clustering/base.py | head -120

Repository: flixOpt/flixopt

Length of output: 5161


🏁 Script executed:

# Search for Python version requirements and dependency pins
rg -n "requires-python|python_requires|Programming Language :: Python ::|xarray|numpy" . --type toml --type cfg --type txt -A 2

Repository: flixOpt/flixopt

Length of output: 86


🏁 Script executed:

# Find config files
find . -maxdepth 2 -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "requirements*.txt" -o -name "pixi.toml" 2>/dev/null

Repository: flixOpt/flixopt

Length of output: 75


🏁 Script executed:

# Search for Python version and dependency information
rg "requires-python|python_requires|Programming Language :: Python" . -A 2

Repository: flixOpt/flixopt

Length of output: 537


🏁 Script executed:

# Look for tests of combine_slices
rg -n "combine_slices" . -B 2 -A 5

Repository: flixOpt/flixopt

Length of output: 9577


🏁 Script executed:

# Get full test class for combine_slices
sed -n '1398,1500p' ./tests/test_cluster_reduce_expand.py

Repository: flixOpt/flixopt

Length of output: 2955


🏁 Script executed:

# Check if there are more tests beyond line 1500
sed -n '1500,1600p' ./tests/test_cluster_reduce_expand.py

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Check xarray and numpy versions in pyproject.toml
cat ./pyproject.toml | grep -A 20 "dependencies"

Repository: flixOpt/flixopt

Length of output: 1738


🏁 Script executed:

# Check how combine_slices is actually used in the codebase
rg -B 5 -A 10 "combine_slices\(" ./flixopt --type py | head -60

Repository: flixOpt/flixopt

Length of output: 2088


🏁 Script executed:

# Check if there are any integer or boolean dtype usages in the codebase related to clustering
rg "dtype.*int|dtype.*bool" ./flixopt/clustering -A 2 -B 2

Repository: flixOpt/flixopt

Length of output: 449


🏁 Script executed:

# Check if any code passes integer arrays to combine_slices or similar
rg "_build_property_array|combine_slices" ./flixopt/clustering -B 3 -A 15 | grep -A 15 "np.array"

Repository: flixOpt/flixopt

Length of output: 5321


🏁 Script executed:

# Check test file for any integer array tests
grep -n "np.array.*\[.*[0-9]" ./tests/test_cluster_reduce_expand.py | grep -v "1.0\|2.0\|3.0\|4.0\|5.0\|6.0\|7.0\|8.0\|9.0\|10\|11\|12" | head -20

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Check if there are tests for edge cases like empty dict or missing keys
rg -A 5 "empty|missing|KeyError|StopIteration" ./tests/test_cluster_reduce_expand.py | grep -A 5 "TestCombineSlices"

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Look at _build_property_array to see what arrays it creates
rg -B 5 -A 20 "_build_property_array" ./flixopt/clustering/base.py | head -100

Repository: flixOpt/flixopt

Length of output: 3636


🏁 Script executed:

# Check the _cluster_occurrences function to confirm it returns integers
rg -B 2 -A 10 "_cluster_occurrences" ./flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 1035


🏁 Script executed:

# Get the full _build_property_array implementation
rg -A 30 "def _build_property_array" ./flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 1433


🏁 Script executed:

# Check what happens with integer arrays - specifically look for cluster_assignments and similar
rg "cluster_assignments|cluster_occurrences" ./flixopt/clustering/base.py -B 2 -A 8 | grep -A 8 "_build_property_array"

Repository: flixOpt/flixopt

Length of output: 673


🏁 Script executed:

# Check if combine_slices is actually imported or called in the codebase
rg "from.*combine_slices|import.*combine_slices" ./flixopt --type py

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Search for actual calls to combine_slices (not just definitions)
rg "combine_slices\(" ./flixopt --type py -B 3 | grep -v "def combine_slices"

Repository: flixOpt/flixopt

Length of output: 440


🏁 Script executed:

# Verify what _cluster_occurrences returns - is it definitely integers?
python3 << 'EOF'
from collections import Counter
import numpy as np

counts = Counter([1, 1, 2, 3, 3, 3])
result = np.array([counts.get(i, 0) for i in range(4)])
print(f"Type: {type(result[0])}")
print(f"Dtype: {result.dtype}")
print(f"Values: {result}")
EOF

Repository: flixOpt/flixopt

Length of output: 145


combine_slices: preserve dtype and add input validation.

The function uses np.empty(shape) which defaults to float64, causing integer and boolean arrays to lose their dtype. Additionally, next(iter(slices.values())) will raise StopIteration if the dict is empty, and missing keys will surface with an unhelpful KeyError.

Suggested fixes:

  • Preserve dtype: data = np.empty(shape, dtype=first.dtype) where first = next(iter(slices.values()))
  • Validate non-empty input: if not slices: raise ValueError("slices cannot be empty")
  • Enhance key lookup errors with context about which key was missing

Note: zip(..., strict=True) is valid here: strict= has been supported since Python 3.10, and the project requires Python ≥3.11.
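
A sketch applying the suggested hardening (not the merged implementation; same signature as the function above):

import numpy as np
import xarray as xr
from typing import Any

def combine_slices_hardened(
    slices: dict[tuple, np.ndarray],
    extra_dims: list[str],
    dim_coords: dict[str, list],
    output_dim: str,
    output_coord: Any,
    attrs: dict | None = None,
) -> xr.DataArray:
    if not slices:
        raise ValueError('slices cannot be empty')
    first = next(iter(slices.values()))
    shape = [len(first)] + [len(dim_coords[d]) for d in extra_dims]
    data = np.empty(shape, dtype=first.dtype)  # preserve integer/boolean dtypes
    for combo in np.ndindex(*shape[1:]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo, strict=True))
        try:
            data[(slice(None),) + combo] = slices[key]
        except KeyError:
            raise KeyError(f'Missing slice for key {key} (extra_dims={extra_dims})') from None
    return xr.DataArray(
        data,
        dims=[output_dim] + extra_dims,
        coords={output_dim: output_coord, **dim_coords},
        attrs=attrs or {},
    )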


Comment on lines +259 to +262
# Variable categories for segment expansion handling
# Populated when model is built, used by transform.expand()
self._variable_categories: dict[str, VariableCategory] = {}

Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n --hidden --no-ignore -S "_variable_categories" .

Repository: flixOpt/flixopt

Length of output: 645


🏁 Script executed:

# Examine the from_dataset restore at lines 920-930
sed -n '920,930p' flixopt/flow_system.py

# Examine to_dataset serialization at lines 745-765
sed -n '745,765p' flixopt/flow_system.py

# Examine solve() copy at lines 1630-1640
sed -n '1630,1640p' flixopt/flow_system.py

# Look for invalidate/reset methods
rg -n "(invalidate|reset|clear)" flixopt/flow_system.py | head -20

Repository: flixOpt/flixopt

Length of output: 3312


🏁 Script executed:

# Examine _invalidate_model() method to see if it clears _variable_categories
sed -n '1674,1710p' flixopt/flow_system.py

# Also check reset() method
rg -n "def reset" flixopt/flow_system.py -A 15

Repository: flixOpt/flixopt

Length of output: 2164


🏁 Script executed:

# Get the full reset() method implementation
sed -n '1697,1730p' flixopt/flow_system.py

# Also check if VariableCategory enum can raise ValueError
rg -n "class VariableCategory" . -A 10

Repository: flixOpt/flixopt

Length of output: 2103


🏁 Script executed:

# Examine the full from_dataset context around lines 920-925 to understand error handling
sed -n '900,935p' flixopt/flow_system.py

# Check if VariableCategory enum has specific values defined
sed -n '45,100p' flixopt/structure.py

Repository: flixOpt/flixopt

Length of output: 4034


Clear _variable_categories on model invalidation and harden enum restore against unknown values.

The _variable_categories mapping is not cleared by _invalidate_model() or reset(), allowing stale categories to persist across structural changes and cause incorrect behavior in segment expansion. Additionally, VariableCategory(value) raises ValueError on unknown/renamed enum values, making deserialization brittle across enum evolution.

Hardening for enum restore with graceful fallback and cleanup
-        if 'variable_categories' in reference_structure:
-            categories_dict = json.loads(reference_structure['variable_categories'])
-            # Convert string values back to VariableCategory enum
-            flow_system._variable_categories = {
-                name: VariableCategory(value) for name, value in categories_dict.items()
-            }
+        if 'variable_categories' in reference_structure:
+            categories_dict = json.loads(reference_structure['variable_categories'])
+            restored: dict[str, VariableCategory] = {}
+            for name, value in categories_dict.items():
+                try:
+                    restored[name] = VariableCategory(value)
+                except ValueError:
+                    logger.warning("Unknown VariableCategory value %r for variable %r; skipping.", value, name)
+            flow_system._variable_categories = restored

And in _invalidate_model() add:

         self._flow_carriers = None  # Invalidate flow-to-carrier mapping
+        self._variable_categories = {}  # Clear stale variable category mappings

Comment on lines 1844 to 1858
# Get multi-dimensional properties from Clustering
timestep_mapping = clustering.timestep_mapping
segment_assignments = clustering.results.segment_assignments
segment_durations = clustering.results.segment_durations
position_within_segment = clustering.results.position_within_segment

# Get start and end boundary values
start_val = charge_state_data[typical_cluster_idx, seg_idx]
end_val = charge_state_data[typical_cluster_idx, seg_idx + 1]
# Decode timestep_mapping into cluster and time indices
time_dim_size = clustering.timesteps_per_cluster
cluster_indices = timestep_mapping // time_dim_size
time_indices = timestep_mapping % time_dim_size

# Linear interpolation
result[orig_cluster_idx * timesteps_per_cluster + t] = start_val + (end_val - start_val) * factor
# Get segment index and position for each original timestep
seg_indices = segment_assignments.isel(cluster=cluster_indices, time=time_indices)
positions = position_within_segment.isel(cluster=cluster_indices, time=time_indices)
durations = segment_durations.isel(cluster=cluster_indices, segment=seg_indices)
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's find the file and examine the method in question
fd "transform_accessor.py" -type f

Repository: flixOpt/flixopt

Length of output: 229


🏁 Script executed:

# Check the actual _interpolate_charge_state_segmented method
sed -n '1821,1877p' flixopt/transform_accessor.py

Repository: flixOpt/flixopt

Length of output: 3019


🏁 Script executed:

# Find where timestep_mapping is built/assigned
rg -n "timestep_mapping\s*=" --type py -B 2 -A 2 | head -80

Repository: flixOpt/flixopt

Length of output: 1229


🏁 Script executed:

# Look for build_expansion_divisor to see the actual encoding logic
rg -n "def build_expansion_divisor" --type py -A 20

Repository: flixOpt/flixopt

Length of output: 1738


🏁 Script executed:

# Check expand_data method
rg -n "def expand_data" --type py -A 25 | head -60

Repository: flixOpt/flixopt

Length of output: 1934


🏁 Script executed:

# Examine segment_assignments definition and dimensions
rg -n "segment_assignments" --type py -B 1 -A 3 | head -120

Repository: flixOpt/flixopt

Length of output: 8202


🏁 Script executed:

# Look for how timestep_mapping is created initially
rg -n "self.timestep_mapping\s*=" --type py -B 5 -A 5

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Check position_within_segment definition
rg -n "position_within_segment" --type py -B 1 -A 3 | head -80

Repository: flixOpt/flixopt

Length of output: 2006


🏁 Script executed:

# Look at the full expand_data implementation, especially the segmented case
sed -n '877,920p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2161


🏁 Script executed:

# Find where timestep_mapping is actually created/built
sed -n '100,150p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2365


🏁 Script executed:

# Check the full build_expansion_divisor to see how it uses timestep_mapping
sed -n '924,980p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2670


🏁 Script executed:

# Let me trace through the actual usage context - check how _interpolate_charge_state_segmented is called
rg -n "_interpolate_charge_state_segmented" --type-add 'py:*.py' -B 5 -A 5 | head -80

Repository: flixOpt/flixopt

Length of output: 1757


🏁 Script executed:

# Check what is_segmented actually means and how _interpolate_charge_state_segmented relates
sed -n '2030,2045p' flixopt/transform_accessor.py

Repository: flixOpt/flixopt

Length of output: 826


🏁 Script executed:

# Let me verify the structure of segment_assignments more carefully
sed -n '367,385p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 827


🏁 Script executed:

# Check position_within_segment to understand its role
sed -n '422,475p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2163


Fix timestep_mapping decoding for segmented systems at line 1851.

The code unconditionally uses clustering.timesteps_per_cluster to decode timestep_mapping, but for segmented systems the mapping is encoded with n_segments (as confirmed by expand_data() and build_expansion_divisor()). This produces incorrect indices.

Since this method is only called for segmented systems (line 2039: if is_state and clustering.is_segmented), the decoding will always fail. The decoded time_indices will be segment indices (0..n_segments-1) rather than within-cluster time indices (0..timesteps_per_cluster-1), causing a dimension mismatch when indexing segment_assignments and position_within_segment.

Change line 1851 to:

if clustering.is_segmented and clustering.n_segments is not None:
    time_dim_size = clustering.n_segments
else:
    time_dim_size = clustering.timesteps_per_cluster

This aligns with the pattern used in expand_data() (lines 902-905) and build_expansion_divisor() (line 962).


FBumann and others added 12 commits January 15, 2026 16:01
  1. clustering/base.py - combine_slices() hardening (lines 52-118)

  - Added validation for empty input: if not slices: raise ValueError("slices cannot be empty")
  - Capture first array and preserve dtype: first = next(iter(slices.values())) → np.empty(shape, dtype=first.dtype)
  - Clearer error on missing keys with try/except: raise KeyError(f"Missing slice for key {key} (extra_dims={extra_dims})")

  2. flow_system.py - Variable categories cleanup and safe enum restoration

  - Added self._variable_categories.clear() in _invalidate_model() (line 1692) to prevent stale categories from being reused
  - Hardened VariableCategory restoration (lines 922-930) with try/except to handle unknown/renamed enum values gracefully with a warning instead of crashing

  3. transform_accessor.py - Correct timestep_mapping decode for segmented systems (lines 1850-1857)

  - For segmented systems, now uses clustering.n_segments instead of clustering.timesteps_per_cluster as the divisor
  - This matches the encoding logic in expand_data() and build_expansion_divisor()
…sion+variable-categories

# Conflicts:
#	flixopt/structure.py
  modeling.py:200-242 - Added category: VariableCategory = None parameter and passed it to both add_variables calls.

  Updated Callers
  ┌─────────────┬──────┬─────────────────────────┬────────────────────┐
  │    File     │ Line │        Variable         │      Category      │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ features.py │ 208  │ active_hours            │ TOTAL              │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ elements.py │ 682  │ total_flow_hours        │ TOTAL              │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ elements.py │ 709  │ flow_hours_over_periods │ TOTAL_OVER_PERIODS │
  └─────────────┴──────┴─────────────────────────┴────────────────────┘
  All expression tracking variables now properly register their categories for segment expansion handling. The pattern is consistent: callers specify the appropriate category based on what the tracked expression represents.
  variable_categories property (line 1672):
  @property
  def variable_categories(self) -> dict[str, VariableCategory]:
      """Variable categories for filtering and segment expansion."""
      return self._variable_categories

  get_variables_by_category() method (line 1681):
  def get_variables_by_category(
      self, *categories: VariableCategory, from_solution: bool = True
  ) -> list[str]:
      """Get variable names matching any of the specified categories."""

  Updated in statistics_accessor.py
  ┌───────────────┬──────────────────────────────────────────┬──────────────────────────────────────────────────┐
  │   Property    │                  Before                  │                      After                       │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ flow_rates    │ endswith('|flow_rate')                   │ get_variables_by_category(FLOW_RATE)             │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ flow_sizes    │ endswith('|size') + flow_labels check    │ get_variables_by_category(SIZE) + flow_labels    │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ storage_sizes │ endswith('|size') + storage_labels check │ get_variables_by_category(SIZE) + storage_labels │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ charge_states │ endswith('|charge_state')                │ get_variables_by_category(CHARGE_STATE)          │
  └───────────────┴──────────────────────────────────────────┴──────────────────────────────────────────────────┘
  Benefits

  1. Single source of truth - Categories defined once in VariableCategory enum
  2. Easier maintenance - Adding new variable types only requires updating one place
  3. Type safety - Using enum values instead of magic strings
  4. Flexible filtering - Can filter by multiple categories: get_variables_by_category(SIZE, INVESTED)
  5. Consistent naming - Uses rsplit('|', 1)[0] instead of replace('|suffix', '') for label extraction
  1. New SIZE Sub-Categories (structure.py)

  - Added FLOW_SIZE and STORAGE_SIZE to differentiate flow vs storage investments
  - Kept SIZE for backward compatibility

  2. InvestmentModel Updated (features.py)

  - Added size_category parameter to InvestmentModel.__init__()
  - Callers now specify the appropriate category

  3. Variable Registrations Updated

  - elements.py: FlowModel uses FLOW_SIZE
  - components.py: StorageModel uses STORAGE_SIZE (2 locations)

  4. Statistics Accessor Simplified (statistics_accessor.py)

  - flow_sizes: Now uses get_variables_by_category(FLOW_SIZE) directly
  - storage_sizes: Now uses get_variables_by_category(STORAGE_SIZE) directly
  - No more filtering by element labels after getting SIZE variables

  5. Backward-Compatible Fallback (flow_system.py)

  - get_variables_by_category() handles old files:
    - FLOW_SIZE → matches |size suffix + flow labels
    - STORAGE_SIZE → matches |size suffix + storage labels

  6. SOC Boundary Pattern Matching Replaced (transform_accessor.py)

  - Changed from endswith('|SOC_boundary') to get_variables_by_category(SOC_BOUNDARY)

  7. Effect Variables Verified

  - PER_TIMESTEP ✓ (features.py:659)
  - SHARE ✓ (features.py:700 for temporal shares)
  - TOTAL / TOTAL_OVER_PERIODS ✓ (multiple locations)

  8. Documentation Updated

  - _build_segment_total_varnames() marked as backwards-compatibility fallback

  Benefits

  - Cleaner code: No more string manipulation to filter by element type
  - Type safety: Using enum values instead of magic strings
  - Single source of truth: Categories defined once, used everywhere
  - Backward compatible: Old files still work via fallback logic
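
  A hedged sketch of the fallback idea from item 5 (hypothetical helper; in the PR this logic lives inside get_variables_by_category()):

  def flow_size_names_fallback(solution_names: list[str], flow_labels: set[str]) -> list[str]:
      """Old files without stored categories: match the '|size' suffix plus a known flow label."""
      return [name for name in solution_names
              if name.endswith('|size') and name.rsplit('|', 1)[0] in flow_labels]

  print(flow_size_names_fallback(['Boiler(Q_th)|size', 'Battery|size'], {'Boiler(Q_th)'}))
  # ['Boiler(Q_th)|size']
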
* Add collapse to IO

* Temp

* 1. Added dimension reduction functions in io.py:
    - _reduce_constant_dims(): Reduces dimensions where all values are constant
    - _expand_reduced_dims(): Restores original dimensions with correct order
  2. Added reduce_constants parameter to FlowSystem.to_dataset():
    - Default: False (opt-in)
    - When True: Reduces constant dimensions for memory efficiency
  3. Updated FlowSystem.from_dataset():
    - Expands both collapsed arrays (from NetCDF) and reduced dimensions
  4. Key fixes:
    - Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
    - Skip solution variables in reduction (they're prefixed with solution|)

  The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.

* Set reduce_constants=True by default in to_dataset()

Benchmarks show 99%+ reduction in memory and file size for multi-period
models, with faster IO speeds. The optimization is now enabled by default.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Feature/speedup io+internals (#576)

* Add safe isel wrappers for dimension-independent operations

- Add _scalar_safe_isel_drop() to modeling.py for selecting from
  potentially reduced arrays (handles both scalar and array cases)
- Update Storage validation to use _scalar_safe_isel for
  relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
  reduced arrays properly with dimension checks

These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Remove pre-broadcasting from DataConverter - data stays minimal

Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.

Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
  _validate_dataarray_dims in to_dataarray() method
- components.py: Fix _relative_charge_state_bounds to handle scalar
  inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking

Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size

Co-Authored-By: Claude Opus 4.5 <[email protected]>
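
A minimal sketch of the validate-instead-of-broadcast idea from this commit (hypothetical helper name; the real check is _validate_dataarray_dims inside DataConverter.to_dataarray()):

import numpy as np
import xarray as xr

def validate_dims(da: xr.DataArray, allowed_dims: set[str]) -> xr.DataArray:
    """Accept data whose dims are a subset of the model dims; leave broadcasting to linopy."""
    extra = set(da.dims) - allowed_dims
    if extra:
        raise ValueError(f'Unexpected dimensions {sorted(extra)}; allowed: {sorted(allowed_dims)}')
    return da  # data stays minimal: scalars stay scalar, 1D arrays stay 1D

profile = xr.DataArray(np.ones(8760), dims=('time',))
validate_dims(profile, allowed_dims={'time', 'period', 'scenario'})  # OK, kept as ('time',)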

* Remove redundant dimension reduction - data is minimal from start

Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()

Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.

Update benchmark_io.py to reflect the simplified API.

Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* ⏺ Done. I've removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Here's a summary of the changes:

  Files modified:

  1. flixopt/io.py:
    - Removed COLLAPSED_VAR_PREFIX constant
    - Removed _collapse_constant_arrays() function (~45 lines)
    - Removed _expand_collapsed_arrays() function (~40 lines)
    - Removed collapse_constants parameter from save_dataset_to_netcdf()
    - Removed expansion logic from load_dataset_from_netcdf()
  2. flixopt/flow_system.py:
    - Removed import and call to _expand_collapsed_arrays in from_dataset()
  3. benchmark_io.py:
    - Simplified benchmark_netcdf() to only benchmark single save/load (no more comparison)
    - Removed collapse_constants parameter from roundtrip function

  Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.

  Test results:
  - All 10 IO tests pass
  - All 4 integration tests pass
  - Benchmark runs successfully

* Summary of Fixes

  1. _dataset_resample handling all-scalar data (transform_accessor.py)

  When all data variables are scalars (no time dimension), the resample function now:
  - Creates a dummy variable to resample the time coordinate
  - Preserves all original scalar data variables
  - Preserves all non-time coordinates (period, scenario, cluster)

  2. _scalar_safe_reduce helper (modeling.py)

  Added a new helper function that safely performs aggregation operations (mean/sum/etc) over a dimension:
  - Returns reduced data if the dimension exists
  - Returns data unchanged if the dimension doesn't exist (scalar data)
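
  A minimal sketch of such a helper (hedged; the actual signature in modeling.py may differ):

  import xarray as xr

  def scalar_safe_reduce(data: xr.DataArray, dim: str, op: str = 'mean') -> xr.DataArray:
      """Aggregate over `dim` if present; pass scalar / already-reduced data through unchanged."""
      if dim in data.dims:
          return getattr(data, op)(dim)  # e.g. data.mean('time')
      return data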

  3. Updated Storage intercluster linking (components.py)

  Used _scalar_safe_reduce for:
  - relative_loss_per_hour.mean('time')
  - timestep_duration.mean('time')

  4. Updated clustering expansion (transform_accessor.py)

  Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')

  5. Fixed dimension order in expand_data (clustering/base.py)

  When expanding clustered data back to original timesteps:
  - Added transpose to ensure (cluster, time) dimension order before numpy indexing
  - Fixes IndexError when dimensions were in different order

* Added reduction on loading old datasets

* Summary of changes made:

  1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
  2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
  3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
  4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
  5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)

  The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>
# Conflicts:
#	flixopt/clustering/base.py
#	flixopt/components.py
…+variable-categories

# Conflicts:
#	flixopt/clustering/base.py
#	flixopt/components.py
#	flixopt/structure.py
#	flixopt/transform_accessor.py