
Conversation

@FBumann
Member

@FBumann FBumann commented Jan 15, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a utility function for combining multi-dimensional data slices into structured arrays.
    • Introduced a new property to retrieve timestep positions within segments.
    • Added semantic categorization system for optimization variables to improve classification and handling.
  • Refactor

    • Optimized segmented data interpolation with vectorized operations for better performance.
    • Improved multi-dimensional data assembly and expansion logic.
  • Tests

    • Added comprehensive test coverage for data slice combination utility with dimensional and coordinate handling.

✏️ Tip: You can customize this high-level summary in your review settings.

  Changes Made

  1. Added VariableCategory enum (structure.py:64-77)
    - STATE - For state variables like charge_state (interpolated within segments)
    - SEGMENT_TOTAL - For segment totals like effect contributions (divided by expansion divisor)
    - RATE - For rate variables like flow_rate (expanded as-is)
    - BINARY - For binary variables like status (expanded as-is)
    - OTHER - For uncategorized variables
  2. Added variable_categories registry to FlowSystemModel (structure.py:214)
    - Dictionary mapping variable names to their categories
  3. Modified add_variables() method (structure.py:388-396)
    - Added optional category parameter
    - Automatically registers variables with their category (see the sketch after this list)
  4. Updated variable creation calls:
    - components.py: Storage variables (charge_state as STATE, netto_discharge as RATE)
    - elements.py: Flow variables (flow_rate as RATE, status as BINARY)
    - features.py: Effect contributions (per_timestep as SEGMENT_TOTAL, temporal shares as SEGMENT_TOTAL, startup/shutdown as BINARY)
  5. Updated expand() method (transform_accessor.py:2074-2090)
    - Uses variable_categories registry to identify segment totals and state variables
    - Falls back to pattern matching for backwards compatibility with older FlowSystems
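
  A minimal sketch of the registration described in item 3 (hypothetical simplified class; the real add_variables() wraps linopy variable creation and lives on FlowSystemModel/Submodel):

  from flixopt.structure import VariableCategory  # import path as introduced in this PR

  class RegistrySketch:
      """Illustration only: map variable names to their semantic category at creation time."""

      def __init__(self):
          self.variable_categories: dict[str, VariableCategory] = {}

      def add_variables(self, name: str, category: VariableCategory | None = None, **kwargs) -> str:
          # ... the real method creates the linopy variable here ...
          if category is not None:
              self.variable_categories[name] = category
          return name

  model = RegistrySketch()
  model.add_variables('Storage|charge_state', category=VariableCategory.CHARGE_STATE)
  model.add_variables('Boiler(Q_th)|flow_rate', category=VariableCategory.FLOW_RATE)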

  Benefits

  - More robust categorization: Variables are categorized at creation time, not by pattern matching
  - Extensible: New variable types can easily be added with proper category
  - Backwards compatible: Old FlowSystems without categories still work via pattern matching fallback
  New Categories (structure.py:45-103)

  from enum import Enum, auto

  class VariableCategory(Enum):
      # Member values shown with auto() for brevity; the actual enum may use explicit values.

      # State variables
      CHARGE_STATE = auto()
      SOC_BOUNDARY = auto()

      # Rate/Power variables
      FLOW_RATE = auto()
      NETTO_DISCHARGE = auto()
      VIRTUAL_FLOW = auto()

      # Binary state
      STATUS = auto()
      INACTIVE = auto()

      # Binary events
      STARTUP = auto()
      SHUTDOWN = auto()

      # Effect variables
      PER_TIMESTEP = auto()
      SHARE = auto()
      TOTAL = auto()
      TOTAL_OVER_PERIODS = auto()

      # Investment
      SIZE = auto()
      INVESTED = auto()

      # Counting/Duration
      STARTUP_COUNT = auto()
      DURATION = auto()

      # Piecewise linearization
      INSIDE_PIECE = auto()
      LAMBDA0 = auto()
      LAMBDA1 = auto()
      ZERO_POINT = auto()

      # Other
      OTHER = auto()

  Logical Groupings for Expansion

  EXPAND_INTERPOLATE = {CHARGE_STATE}      # Interpolate between boundaries
  EXPAND_DIVIDE = {PER_TIMESTEP, SHARE}    # Divide by expansion factor
  # Default: repeat within segment
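
  A hedged sketch of how these groupings drive the per-variable expansion choice (illustrative dispatch only; assumes the PR's EXPAND_INTERPOLATE/EXPAND_DIVIDE sets are importable from flixopt.structure):

  from flixopt.structure import EXPAND_DIVIDE, EXPAND_INTERPOLATE, VariableCategory

  def expansion_rule(category: VariableCategory | None) -> str:
      """Map a variable's category to the expansion treatment described above."""
      if category in EXPAND_INTERPOLATE:
          return 'interpolate'  # state variables such as charge_state
      if category in EXPAND_DIVIDE:
          return 'divide'       # segment totals: repeat, then divide by the expansion divisor
      return 'repeat'           # default: repeat the clustered value within its segment

  print(expansion_rule(VariableCategory.CHARGE_STATE))  # interpolate
  print(expansion_rule(VariableCategory.PER_TIMESTEP))  # divide
  print(expansion_rule(VariableCategory.FLOW_RATE))     # repeat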

  Files Modified
  ┌───────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │         File          │                                                          Variables Updated                                                          │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ components.py         │ charge_state, netto_discharge, SOC_boundary                                                                                         │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ elements.py           │ flow_rate, status, virtual_supply, virtual_demand                                                                                   │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ features.py           │ size, invested, inactive, startup, shutdown, startup_count, inside_piece, lambda0, lambda1, zero_point, total, per_timestep, shares │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ effects.py            │ total, total_over_periods                                                                                                           │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ modeling.py           │ duration                                                                                                                            │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ transform_accessor.py │ Updated to use EXPAND_INTERPOLATE and EXPAND_DIVIDE groupings                                                                       │
  └───────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  Test Results

  - All 83 cluster/expand tests pass
  - Variable categories correctly populated and grouped
  Changes Made

  1. Added combine_slices() utility to flixopt/clustering/base.py (lines 52-107)
    - Simple function that stacks dict of {(dim_values): np.ndarray} into a DataArray
    - Much cleaner than the previous reverse-concat pattern
  2. Refactored 3 methods to use the new utility:
    - Clustering.expand_data() - reduced from ~25 to ~12 lines
    - Clustering.build_expansion_divisor() - reduced from ~35 to ~20 lines
    - TransformAccessor._interpolate_charge_state_segmented() - reduced from ~43 to ~27 lines
  3. Added 4 unit tests for combine_slices() in tests/test_cluster_reduce_expand.py

  Results
  ┌───────────────────────────────────┬──────────┬────────────────────────┐
  │              Metric               │  Before  │         After          │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Complex reverse-concat blocks     │ 3        │ 0                      │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Lines of dimension iteration code │ ~100     │ ~60                    │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Test coverage                     │ 83 tests │ 87 tests (all passing) │
  └───────────────────────────────────┴──────────┴────────────────────────┘
  The Pattern Change

  Before (complex reverse-concat):
  result_arrays = slices
  for dim in reversed(extra_dims):
      grouped = {}
      for key, arr in result_arrays.items():
          rest_key = key[:-1] if len(key) > 1 else ()
          grouped.setdefault(rest_key, []).append(arr)
      result_arrays = {k: xr.concat(v, dim=...) for k, v in grouped.items()}
  result = list(result_arrays.values())[0].transpose('time', ...)

  After (simple combine):
  return combine_slices(slices, extra_dims, dim_coords, 'time', output_coord, attrs)
  1. Fully Vectorized expand_data()

  Before (~65 lines with loops):
  for combo in np.ndindex(*[len(v) for v in dim_coords.values()]):
      selector = {...}
      mapping = _select_dims(timestep_mapping, **selector).values
      data_slice = _select_dims(aggregated, **selector)
      slices[key] = _expand_slice(mapping, data_slice)
  return combine_slices(slices, ...)

  After (~25 lines, fully vectorized):
  timestep_mapping = self.timestep_mapping  # Already multi-dimensional!
  cluster_indices = timestep_mapping // time_dim_size
  time_indices = timestep_mapping % time_dim_size
  expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
  # xarray handles broadcasting across period/scenario automatically
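
  A self-contained toy example of the vectorized indexing pattern above (illustrative data; in the PR the flat mapping comes from clustering.timestep_mapping):

  import numpy as np
  import xarray as xr

  # Toy clustered data: 2 clusters x 3 timesteps, plus an extra 'period' dimension.
  aggregated = xr.DataArray(
      np.arange(12.0).reshape(2, 3, 2),
      dims=('cluster', 'time', 'period'),
      coords={'cluster': [0, 1], 'time': [0, 1, 2], 'period': ['P1', 'P2']},
  )

  # Flat mapping of 6 original timesteps onto (cluster, within-cluster time),
  # encoded as cluster * time_dim_size + time.
  time_dim_size = 3
  timestep_mapping = xr.DataArray([3, 4, 5, 0, 1, 2], dims='original_time')

  cluster_indices = timestep_mapping // time_dim_size
  time_indices = timestep_mapping % time_dim_size

  # Both indexers share 'original_time', so isel performs vectorized (pointwise)
  # indexing; the 'period' dimension is broadcast along automatically.
  expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
  print(expanded.sizes)  # 6 original timesteps x 2 periods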

  2. build_expansion_divisor() and _interpolate_charge_state_segmented()

  These still use combine_slices() because they need per-result segment data (segment_assignments, segment_durations) which isn't available as concatenated Clustering properties yet.

  Current State
  ┌───────────────────────────────────────┬─────────────────┬─────────────────────────────────┐
  │                Method                 │   Vectorized?   │   Uses Clustering Properties    │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ expand_data()                         │ Yes             │ timestep_mapping (fully)        │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ build_expansion_divisor()             │ No (small loop) │ cluster_assignments (partially) │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ _interpolate_charge_state_segmented() │ No (small loop) │ cluster_assignments (partially) │
  └───────────────────────────────────────┴─────────────────┴─────────────────────────────────┘
  1. _interpolate_charge_state_segmented() - Fully vectorized from ~110 lines to ~55 lines
    - Uses clustering.timestep_mapping for indexing
    - Uses clustering.results.segment_assignments, segment_durations, and position_within_segment
    - Single xarray expression instead of triple-nested loops
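
  A rough, self-contained sketch of that single-expression idea (toy data and a simplified mid-point interpolation factor; the PR's exact factor and dimension handling may differ):

  import xarray as xr

  # One cluster with 2 segments: charge_state keeps 3 segment-boundary values.
  boundaries = xr.DataArray([[0.0, 4.0, 6.0]], dims=('cluster', 'segment_boundary'))

  # Per original timestep: its cluster, its segment, its position inside the
  # segment, and the segment's length in timesteps.
  cluster_idx = xr.DataArray([0, 0, 0, 0], dims='original_time')
  seg_indices = xr.DataArray([0, 0, 1, 1], dims='original_time')
  positions = xr.DataArray([0, 1, 0, 1], dims='original_time')
  durations = xr.DataArray([2, 2, 2, 2], dims='original_time')

  # Mid-point interpolation factor within each segment (0.5 for length-1 segments).
  factor = (positions + 0.5) / durations

  start = boundaries.isel(cluster=cluster_idx, segment_boundary=seg_indices)
  end = boundaries.isel(cluster=cluster_idx, segment_boundary=seg_indices + 1)
  interpolated = start + (end - start) * factor
  print(interpolated.values)  # 1.0, 3.0, 4.5, 5.5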

  Previously completed (from before context limit):
  - Added segment_assignments multi-dimensional property to ClusteringResults
  - Added segment_durations multi-dimensional property to ClusteringResults
  - Added position_within_segment property to ClusteringResults
  - Vectorized expand_data()
  - Vectorized build_expansion_divisor()

  Test results: All 130 tests pass (87 cluster/expand + 43 IO tests)

  The combine_slices utility function is still available in clustering/base.py if needed in the future, but all the main dimension-handling methods now use xarray's vectorized advanced indexing instead of the loop-based slice-and-combine pattern.
  Summary of Simplifications

  1. expand_da() in transform_accessor.py

  - Extracted duplicate "append extra timestep" logic into _append_final_state() helper
  - Reduced from ~50 lines to ~25 lines
  - Eliminated code duplication

  2. _build_multi_dim_array() → _build_property_array() in clustering/base.py

  - Replaced 6 conditional branches with unified np.ndindex() pattern
  - Now handles both simple and multi-dimensional cases in one method
  - Reduced from ~50 lines to ~25 lines
  - Preserves dtype (fixed integer indexing bug)
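
  A hedged sketch of that unified np.ndindex() assembly pattern (hypothetical helper mirroring _build_property_array, with simplified coordinates):

  import numpy as np
  import xarray as xr

  def build_property_array(results: dict[tuple, np.ndarray], dim_coords: dict[str, list], n_time: int) -> xr.DataArray:
      """Stack per-(period, scenario, ...) arrays into one DataArray, preserving dtype."""
      dim_names = list(dim_coords)
      dim_values = [dim_coords[d] for d in dim_names]
      first = next(iter(results.values()))
      data = np.empty([len(v) for v in dim_values] + [n_time], dtype=first.dtype)  # no float64 upcast
      for combo in np.ndindex(*data.shape[:-1]):  # one pass per extra-dimension combination
          key = tuple(dim_values[pos][idx] for pos, idx in enumerate(combo))
          data[combo] = results[key]
      return xr.DataArray(data, dims=dim_names + ['time'], coords={**dim_coords, 'time': list(range(n_time))})

  arr = build_property_array(
      {('P1',): np.array([0, 1, 2]), ('P2',): np.array([3, 4, 5])},
      dim_coords={'period': ['P1', 'P2']},
      n_time=3,
  )
  print(arr.dims, arr.dtype)  # ('period', 'time'), integer dtype preserved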

  3. Property boilerplate in ClusteringResults

  - 5 properties (cluster_assignments, cluster_occurrences, cluster_centers, segment_assignments, segment_durations) now use the unified _build_property_array()
  - Each property reduced from ~25 lines to ~8 lines
  - Total: ~165 lines → ~85 lines

  4. _build_timestep_mapping() in Clustering

  - Simplified to single call using _build_property_array()
  - Reduced from ~16 lines to ~9 lines

  Total lines removed: ~150+ lines of duplicated/complex code
…as fixed:

  Summary

  The IO roundtrip bug was caused by representative_weights (a variable with only ('cluster',) dimension) being copied as-is during expansion, which caused the cluster dimension to incorrectly persist in the expanded dataset.

  Fix applied in transform_accessor.py:2063-2065:
  # Skip cluster-only vars (no time dim) - they don't make sense after expansion
  if da.dims == ('cluster',):
      continue

  This skips variables that have only a cluster dimension (and no time dimension) during expansion, as these variables don't make sense after the clustering structure is removed.

  Test results:
  - All 87 tests in test_cluster_reduce_expand.py pass ✓
  - All 43 tests in test_clustering_io.py pass ✓
  - Manual IO roundtrip test passes ✓
  - Tests with different segment counts (3, 6) pass ✓
  - Tests with 2-hour timesteps pass ✓
  The check was subsequently generalized to cover cluster-only variables that carry additional dimensions:

  # Skip vars with cluster dim but no time dim - they don't make sense after expansion
  # (e.g., representative_weights with dims ('cluster',) or ('cluster', 'period'))
  if 'cluster' in da.dims and 'time' not in da.dims:
      continue

  This correctly handles:
  - ('cluster',) - simple cluster-only variables like cluster_weight
  - ('cluster', 'period') - cluster variables with period dimension
  - ('cluster', 'scenario') - cluster variables with scenario dimension
  - ('cluster', 'period', 'scenario') - cluster variables with both

  Variables with both cluster and time dimensions (like timestep_duration with dims ('cluster', 'time')) are correctly expanded since they contain time-series data that needs to be mapped back to original timesteps.
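
  A small runnable illustration of this filtering rule (toy dataset with made-up variable names; the actual expansion of the surviving variables is not shown):

  import numpy as np
  import xarray as xr

  ds = xr.Dataset({
      'representative_weights': ('cluster', np.array([3, 5])),                 # cluster-only -> skipped
      'cluster_weight_per_period': (('cluster', 'period'), np.ones((2, 2))),   # also skipped
      'timestep_duration': (('cluster', 'time'), np.ones((2, 4))),             # has 'time' -> expanded
  })

  to_expand = [name for name, da in ds.data_vars.items()
               if not ('cluster' in da.dims and 'time' not in da.dims)]
  print(to_expand)  # ['timestep_duration']
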
@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Introduces a VariableCategory enum for semantic classification of optimization variables, propagates categories through variable creation across multiple model components, and implements vectorized expansion/interpolation logic for segmented systems. New clustering utilities (combine_slices) and properties (position_within_segment) support multi-dimensional coordinate handling.

Changes

Cohort / File(s) Summary
Variable Categorization System
flixopt/structure.py
Introduces VariableCategory enum with semantic categories (STATE, FLOW_RATE, STATUS, CHARGE_STATE, etc.) and expansion-related groupings (EXPAND_INTERPOLATE, EXPAND_DIVIDE, EXPAND_FIRST_TIMESTEP). Updates FlowSystemModel and Submodel.add_variables to track and register variable categories.
Category Annotation Across Components
flixopt/components.py, flixopt/effects.py, flixopt/elements.py, flixopt/features.py
Imports VariableCategory and annotates variables with appropriate categories during creation: CHARGE_STATE, NETTO_DISCHARGE, SOC_BOUNDARY, TOTAL, FLOW_RATE, STATUS, VIRTUAL_FLOW, SIZE, INVESTED, STARTUP, etc.
Flow System Integration
flixopt/flow_system.py, flixopt/modeling.py
Imports VariableCategory; adds _variable_categories tracking in FlowSystem; implements serialization/deserialization of variable categories into dataset attributes; preserves categories through solve operations.
Clustering Utilities & Vectorization
flixopt/clustering/base.py
Adds combine_slices utility function for constructing multi-dimensional DataArrays from slice mappings. Introduces position_within_segment public property for computing zero-based timestep indices within segments. Refactors internal data-assembly with _build_property_array for flexible multi-dimensional support.
Expansion & Interpolation Logic
flixopt/transform_accessor.py
Replaces explicit loop-based interpolation with vectorized multi-dimensional approach for segmented charge state. Adds state variable detection and final-state appending; uses EXPAND_DIVIDE registry for segment-total division. Consolidates extra-dimension handling into unified vectorized operations.
Test Coverage
tests/test_cluster_reduce_expand.py
Adds TestCombineSlices test class with four test methods covering single/multi-dimensional cases, attribute propagation, and datetime coordinate handling for the new combine_slices utility.

Sequence Diagram(s)

sequenceDiagram
    participant VarCreation as Variable Creation<br/>(add_variables)
    participant FlowSystem as FlowSystem
    participant VarCatRegistry as Variable Category<br/>Registry
    participant Serialization as Serialization<br/>(to_dataset)
    participant Expansion as Expansion Logic<br/>(transform_accessor)
    
    VarCreation->>FlowSystem: add_variables(name, category=VC.FLOW_RATE)
    FlowSystem->>VarCatRegistry: Store mapping {name: category}
    VarCatRegistry-->>FlowSystem: Registered
    
    FlowSystem->>Serialization: to_dataset()
    Serialization->>Serialization: Extract variable_categories dict
    Serialization-->>Serialization: Serialize to dataset attrs
    
    Note over Serialization: Dataset persisted with<br/>variable category metadata
    
    Expansion->>VarCatRegistry: Lookup category for variable
    VarCatRegistry-->>Expansion: Return category (e.g., EXPAND_DIVIDE)
    Expansion->>Expansion: Apply vectorized expansion<br/>based on category
    Expansion-->>Expansion: Generate expanded DataArray

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Behold! Variables now wear their categories with pride,
Like labeled carrots sorted by size, shape, and type!
Slices combine, segments expand with vectorized grace,
And clustering utilities dance through multi-dim space.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description consists entirely of a template with uncompleted sections; no actual description of changes, objectives, or testing is provided by the author. Resolution: replace the template placeholder with a concrete description of the changes, including what was fixed, why the changes were needed, and a summary of test results.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title 'Fix/segmentation expansion+variable categories' is closely related to the changeset, referencing two main aspects (segmentation expansion and variable categories) that are prominent throughout the code changes.
  • Docstring Coverage (✅ Passed): Docstring coverage is 90.74%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@FBumann FBumann marked this pull request as ready for review January 15, 2026 14:43
@FBumann
Member Author

FBumann commented Jan 15, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@flixopt/clustering/base.py`:
- Around line 52-108: combine_slices currently can break dtypes and crash
silently on empty input and missing keys; fix by validating input non-empty at
the top (if not slices: raise ValueError("slices cannot be empty")), capture the
first array via first = next(iter(slices.values())) and use its dtype when
allocating the output buffer (data = np.empty(shape, dtype=first.dtype)), and
replace the raw dict lookup inside the loop with a try/except that raises a
clearer KeyError including the missing key and the extra_dims (e.g., try:
data[...] = slices[key] except KeyError: raise KeyError(f"Missing slice for key
{key} (extra_dims={extra_dims})")).

In `@flixopt/flow_system.py`:
- Around line 259-262: Clear the stale _variable_categories mapping when the
model is invalidated and when reset() is called: add
self._variable_categories.clear() (or reassign {}) inside _invalidate_model()
and reset() so segment expansion (transform.expand) can’t reuse old categories;
and harden enum restoration by wrapping VariableCategory(value) in a try/except
ValueError (or use a safe lookup) to map unknown/renamed values to a fallback
enum (e.g., VariableCategory.UNKNOWN or a default) instead of raising, applying
this safe restore wherever serialized values are converted back to
VariableCategory.

In `@flixopt/transform_accessor.py`:
- Around line 1844-1858: The timestep_mapping decode wrongly always uses
clustering.timesteps_per_cluster; for segmented systems use
clustering.n_segments instead. In the block that computes time_dim_size (around
where timestep_mapping, cluster_indices, and time_indices are computed), set
time_dim_size = clustering.n_segments when clustering.is_segmented and
clustering.n_segments is not None, otherwise use
clustering.timesteps_per_cluster so that subsequent indexing into
segment_assignments and position_within_segment uses the correct divisor
(matching the logic in expand_data() and build_expansion_divisor()).
🧹 Nitpick comments (3)
flixopt/structure.py (1)

1670-1698: Submodel.add_variables(..., category=...) wiring looks correct; tighten typing + consider overwrite semantics.

Two small tweaks:

  • Type hint as VariableCategory | None for clarity.
  • Decide whether re-registering the same variable.name with a different category should warn (today it silently overwrites).
flixopt/clustering/base.py (2)

326-405: _build_property_array refactor is a solid direction; one ordering edge-case to confirm.

This is much cleaner than hand-rolling multidim assembly per property, and preserving dtype via np.empty(..., dtype=...) is a good touch. Only thing to double-check: key construction assumes extra dims are ordered (period, scenario); if self._dim_names order ever differs, self._results[key] lookups can break.

Also applies to: 523-558


422-474: position_within_segment implementation looks correct; consider output_dtypes for apply_ufunc.

If this ever runs with dask-backed arrays, xr.apply_ufunc(..., vectorize=True) can benefit from explicit output_dtypes=[...] to avoid dtype inference surprises.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3336eb5 and 3bdcd1d.

📒 Files selected for processing (10)
  • flixopt/clustering/base.py
  • flixopt/components.py
  • flixopt/effects.py
  • flixopt/elements.py
  • flixopt/features.py
  • flixopt/flow_system.py
  • flixopt/modeling.py
  • flixopt/structure.py
  • flixopt/transform_accessor.py
  • tests/test_cluster_reduce_expand.py
🧰 Additional context used
🧬 Code graph analysis (4)
flixopt/modeling.py (1)
flixopt/structure.py (2)
  • Submodel (1646-1844)
  • VariableCategory (45-90)
flixopt/features.py (1)
flixopt/structure.py (9)
  • FlowSystemModel (149-413)
  • Submodel (1646-1844)
  • VariableCategory (45-90)
  • add_variables (1670-1698)
  • get_coords (358-393)
  • get_coords (1748-1753)
  • dims (293-295)
  • label_full (1214-1215)
  • label_full (1779-1780)
tests/test_cluster_reduce_expand.py (1)
flixopt/clustering/base.py (7)
  • combine_slices (52-107)
  • dims (205-207)
  • dims (685-687)
  • values (292-294)
  • values (1306-1313)
  • coords (215-221)
  • coords (690-700)
flixopt/transform_accessor.py (2)
flixopt/clustering/base.py (19)
  • timestep_mapping (802-808)
  • segment_assignments (368-382)
  • segment_assignments (841-849)
  • results (1221-1232)
  • segment_durations (385-404)
  • segment_durations (852-860)
  • position_within_segment (423-473)
  • timesteps_per_cluster (309-311)
  • timesteps_per_cluster (665-667)
  • isel (244-268)
  • is_segmented (752-758)
  • build_expansion_divisor (924-969)
  • items (284-286)
  • items (1291-1298)
  • values (292-294)
  • values (1306-1313)
  • dims (205-207)
  • dims (685-687)
  • expand_data (877-922)
flixopt/flow_system.py (3)
  • isel (2581-2607)
  • is_segmented (2243-2249)
  • dims (2137-2157)
🔇 Additional comments (24)
flixopt/structure.py (2)

158-164: FlowSystemModel.variable_categories is a good central registry—ensure it’s kept consistent in IO paths.

This is a clean “single source of truth” for expansion metadata; just ensure load/save roundtrips restore it (and don’t silently drop it).


16-104: No changes needed — serialization is already handled correctly.

The enum values are already serialization-friendly. The code explicitly converts VariableCategory enum members to their .value strings during JSON serialization (lines 751–754 in flow_system.py) and reconstructs them from strings on deserialization (lines 919–924). This is proper handling and requires no modification.

Using frozenset for the expansion constants is a reasonable optional improvement for immutability (no actual mutation risk exists today), but it is not necessary for correctness or serialization.

Likely an incorrect or invalid review comment.

flixopt/features.py (4)

14-15: Importing VariableCategory for tagging is straightforward.


67-81: Investment variables tagged sensibly (SIZE, INVESTED).


197-257: Status tracking tags (INACTIVE, STARTUP/SHUTDOWN, STARTUP_COUNT) align with the declared expansion groupings.

One thing to double-check: coords=self.get_coords() (Line 224/230) includes 'time' in all configurations where startup/shutdown constraints are built.


403-425: Piecewise-related tagging (INSIDE_PIECE, LAMBDA0/1, ZERO_POINT) looks consistent.

Also applies to: 515-520

flixopt/modeling.py (2)

7-9: Category import is fine.


296-307: Tagging the duration tracking variable as DURATION is consistent with the new registry.

Please verify duration_per_step is always an xr.DataArray here (it’s typed as scalar-or-DataArray, but sum(...) is used earlier), otherwise this path can still break irrespective of the new category.

flixopt/components.py (3)

20-21: Import update for tagging is fine.


935-949: Storage tagging matches intended expansion semantics; verify extra-timestep interpolation path.

charge_state is both extra_timestep=True and CHARGE_STATE; make sure segmented expansion handles the last boundary timestep correctly (no off-by-one / dropped final point).


1344-1351: Tagging SOC_boundary as SOC_BOUNDARY is consistent and should help expansion logic.

flixopt/effects.py (2)

20-28: Import block update is OK.


383-409: Effect total variables are categorized appropriately (TOTAL, TOTAL_OVER_PERIODS).

Just ensure expansion code doesn’t try to “expand” these when there’s no time/cluster dim (should be a no-op).

tests/test_cluster_reduce_expand.py (1)

1398-1478: Nice targeted coverage for combine_slices (dims/coords/attrs + DatetimeIndex).

These tests should catch most regressions in dimension ordering, coordinate wiring, and attribute propagation.

flixopt/elements.py (4)

19-25: Importing VariableCategory here is appropriate for centralized variable tagging.


671-677: Flow rate variable categorization looks correct (FLOW_RATE).


730-736: Status variables are consistently tagged (STATUS) for both FlowModel and ComponentModel.

Also applies to: 1043-1049


966-978: Virtual imbalance vars are correctly categorized as VIRTUAL_FLOW.

flixopt/clustering/base.py (1)

877-970: Vectorized expand_data / build_expansion_divisor is great—please sanity-check against pinned xarray.

The advanced-indexing approach should scale better and eliminates nested loops over period/scenario. The cleanup (drop_vars(['cluster','time',...]) + renaming original_time→time) is sensible, but behavior can differ subtly across xarray versions (coords retained vs. turned into non-dim coords).

Also applies to: 1072-1081

flixopt/transform_accessor.py (5)

19-20: LGTM!

Clean import of the new category constants used for registry-based expansion logic below.


1860-1877: Interpolation logic is mathematically sound.

The interpolation factor calculation and linear interpolation between segment boundaries is correct:

  • Uses mid-point positioning within segments (positions + 0.5)
  • Handles single-timestep segments with fallback to 0.5
  • Properly indexes start (seg_indices) and end (seg_indices + 1) boundary values

2058-2067: Good defensive handling for cluster-only variables.

The new skip logic at lines 2065-2066 correctly excludes variables that have a cluster dimension but no time dimension. These variables (e.g., representative_weights) are cluster-level metadata that don't make sense after expansion to original timesteps.


2031-2053: Clear control flow for variable expansion paths.

The expand_da function correctly handles the three expansion paths:

  1. Segmented state variables: Interpolate within segments, then append final boundary value
  2. Segment total variables: Expand and divide by segment duration to get hourly rates
  3. Other variables (including non-segmented state): Standard expansion with final state append for state vars

The mutual exclusivity between state variables and segment totals ensures no variable gets both treatments.


2002-2029: Good backward-compatible registry approach with pattern-matching fallback.

The implementation correctly:

  1. Attempts registry-based lookup first via variable_categories (lines 2006, 2014)
  2. Falls back to structural pattern matching for older FlowSystems without categories
  3. Properly extracts and appends final state values for state variables

The membership checks are valid: EXPAND_DIVIDE and EXPAND_INTERPOLATE are sets of VariableCategory values defined in structure.py (lines 96, 99), supporting the in operator used at lines 2006 and 2014.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 52 to 108
def combine_slices(
    slices: dict[tuple, np.ndarray],
    extra_dims: list[str],
    dim_coords: dict[str, list],
    output_dim: str,
    output_coord: Any,
    attrs: dict | None = None,
) -> xr.DataArray:
    """Combine {(dim_values): 1D_array} dict into a DataArray.

    This utility simplifies the common pattern of iterating over extra dimensions
    (like period, scenario), processing each slice, and combining results.

    Args:
        slices: Dict mapping dimension value tuples to 1D numpy arrays.
            Keys are tuples like ('period1', 'scenario1') matching extra_dims order.
        extra_dims: Dimension names in order (e.g., ['period', 'scenario']).
        dim_coords: Dict mapping dimension names to coordinate values.
        output_dim: Name of the output dimension (typically 'time').
        output_coord: Coordinate values for output dimension.
        attrs: Optional DataArray attributes.

    Returns:
        DataArray with dims [output_dim, *extra_dims].

    Example:
        >>> slices = {
        ...     ('P1', 'base'): np.array([1, 2, 3]),
        ...     ('P1', 'high'): np.array([4, 5, 6]),
        ...     ('P2', 'base'): np.array([7, 8, 9]),
        ...     ('P2', 'high'): np.array([10, 11, 12]),
        ... }
        >>> result = combine_slices(
        ...     slices,
        ...     extra_dims=['period', 'scenario'],
        ...     dim_coords={'period': ['P1', 'P2'], 'scenario': ['base', 'high']},
        ...     output_dim='time',
        ...     output_coord=[0, 1, 2],
        ... )
        >>> result.dims
        ('time', 'period', 'scenario')
    """
    n_output = len(next(iter(slices.values())))
    shape = [n_output] + [len(dim_coords[d]) for d in extra_dims]
    data = np.empty(shape)

    for combo in np.ndindex(*shape[1:]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo, strict=True))
        data[(slice(None),) + combo] = slices[key]

    return xr.DataArray(
        data,
        dims=[output_dim] + extra_dims,
        coords={output_dim: output_coord, **dim_coords},
        attrs=attrs or {},
    )

Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find project configuration files
fd -HI 'pyproject\.toml|setup\.cfg|setup\.py|requirements.*\.txt|environment\.yml|pixi\.toml' . -type f

Repository: flixOpt/flixopt

Length of output: 229


🏁 Script executed:

# Check the actual combine_slices implementation and surrounding context
cat -n flixopt/clustering/base.py | head -120

Repository: flixOpt/flixopt

Length of output: 5161


🏁 Script executed:

# Search for Python version requirements and dependency pins
rg -n "requires-python|python_requires|Programming Language :: Python ::|xarray|numpy" . --type toml --type cfg --type txt -A 2

Repository: flixOpt/flixopt

Length of output: 86


🏁 Script executed:

# Find config files
find . -maxdepth 2 -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" -o -name "requirements*.txt" -o -name "pixi.toml" 2>/dev/null

Repository: flixOpt/flixopt

Length of output: 75


🏁 Script executed:

# Search for Python version and dependency information
rg "requires-python|python_requires|Programming Language :: Python" . -A 2

Repository: flixOpt/flixopt

Length of output: 537


🏁 Script executed:

# Look for tests of combine_slices
rg -n "combine_slices" . -B 2 -A 5

Repository: flixOpt/flixopt

Length of output: 9577


🏁 Script executed:

# Get full test class for combine_slices
sed -n '1398,1500p' ./tests/test_cluster_reduce_expand.py

Repository: flixOpt/flixopt

Length of output: 2955


🏁 Script executed:

# Check if there are more tests beyond line 1500
sed -n '1500,1600p' ./tests/test_cluster_reduce_expand.py

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Check xarray and numpy versions in pyproject.toml
cat ./pyproject.toml | grep -A 20 "dependencies"

Repository: flixOpt/flixopt

Length of output: 1738


🏁 Script executed:

# Check how combine_slices is actually used in the codebase
rg -B 5 -A 10 "combine_slices\(" ./flixopt --type py | head -60

Repository: flixOpt/flixopt

Length of output: 2088


🏁 Script executed:

# Check if there are any integer or boolean dtype usages in the codebase related to clustering
rg "dtype.*int|dtype.*bool" ./flixopt/clustering -A 2 -B 2

Repository: flixOpt/flixopt

Length of output: 449


🏁 Script executed:

# Check if any code passes integer arrays to combine_slices or similar
rg "_build_property_array|combine_slices" ./flixopt/clustering -B 3 -A 15 | grep -A 15 "np.array"

Repository: flixOpt/flixopt

Length of output: 5321


🏁 Script executed:

# Check test file for any integer array tests
grep -n "np.array.*\[.*[0-9]" ./tests/test_cluster_reduce_expand.py | grep -v "1.0\|2.0\|3.0\|4.0\|5.0\|6.0\|7.0\|8.0\|9.0\|10\|11\|12" | head -20

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Check if there are tests for edge cases like empty dict or missing keys
rg -A 5 "empty|missing|KeyError|StopIteration" ./tests/test_cluster_reduce_expand.py | grep -A 5 "TestCombineSlices"

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Look at _build_property_array to see what arrays it creates
rg -B 5 -A 20 "_build_property_array" ./flixopt/clustering/base.py | head -100

Repository: flixOpt/flixopt

Length of output: 3636


🏁 Script executed:

# Check the _cluster_occurrences function to confirm it returns integers
rg -B 2 -A 10 "_cluster_occurrences" ./flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 1035


🏁 Script executed:

# Get the full _build_property_array implementation
rg -A 30 "def _build_property_array" ./flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 1433


🏁 Script executed:

# Check what happens with integer arrays - specifically look for cluster_assignments and similar
rg "cluster_assignments|cluster_occurrences" ./flixopt/clustering/base.py -B 2 -A 8 | grep -A 8 "_build_property_array"

Repository: flixOpt/flixopt

Length of output: 673


🏁 Script executed:

# Check if combine_slices is actually imported or called in the codebase
rg "from.*combine_slices|import.*combine_slices" ./flixopt --type py

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Search for actual calls to combine_slices (not just definitions)
rg "combine_slices\(" ./flixopt --type py -B 3 | grep -v "def combine_slices"

Repository: flixOpt/flixopt

Length of output: 440


🏁 Script executed:

# Verify what _cluster_occurrences returns - is it definitely integers?
python3 << 'EOF'
from collections import Counter
import numpy as np

counts = Counter([1, 1, 2, 3, 3, 3])
result = np.array([counts.get(i, 0) for i in range(4)])
print(f"Type: {type(result[0])}")
print(f"Dtype: {result.dtype}")
print(f"Values: {result}")
EOF

Repository: flixOpt/flixopt

Length of output: 145


combine_slices: preserve dtype and add input validation.

The function uses np.empty(shape) which defaults to float64, causing integer and boolean arrays to lose their dtype. Additionally, next(iter(slices.values())) will raise StopIteration if the dict is empty, and missing keys will surface with an unhelpful KeyError.

Suggested fixes:

  • Preserve dtype: data = np.empty(shape, dtype=first.dtype) where first = next(iter(slices.values()))
  • Validate non-empty input: if not slices: raise ValueError("slices cannot be empty")
  • Enhance key lookup errors with context about which key was missing

Note: zip(..., strict=True) is valid here: strict= has been supported since Python 3.10, and the project requires Python ≥3.11.
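
A sketch applying the suggested hardening (not the merged implementation; same signature as the function above):

import numpy as np
import xarray as xr
from typing import Any

def combine_slices_hardened(
    slices: dict[tuple, np.ndarray],
    extra_dims: list[str],
    dim_coords: dict[str, list],
    output_dim: str,
    output_coord: Any,
    attrs: dict | None = None,
) -> xr.DataArray:
    if not slices:
        raise ValueError('slices cannot be empty')
    first = next(iter(slices.values()))
    shape = [len(first)] + [len(dim_coords[d]) for d in extra_dims]
    data = np.empty(shape, dtype=first.dtype)  # preserve integer/boolean dtypes
    for combo in np.ndindex(*shape[1:]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo, strict=True))
        try:
            data[(slice(None),) + combo] = slices[key]
        except KeyError:
            raise KeyError(f'Missing slice for key {key} (extra_dims={extra_dims})') from None
    return xr.DataArray(
        data,
        dims=[output_dim] + extra_dims,
        coords={output_dim: output_coord, **dim_coords},
        attrs=attrs or {},
    )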


Comment on lines +259 to +262
# Variable categories for segment expansion handling
# Populated when model is built, used by transform.expand()
self._variable_categories: dict[str, VariableCategory] = {}

Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n --hidden --no-ignore -S "_variable_categories" .

Repository: flixOpt/flixopt

Length of output: 645


🏁 Script executed:

# Examine the from_dataset restore at lines 920-930
sed -n '920,930p' flixopt/flow_system.py

# Examine to_dataset serialization at lines 745-765
sed -n '745,765p' flixopt/flow_system.py

# Examine solve() copy at lines 1630-1640
sed -n '1630,1640p' flixopt/flow_system.py

# Look for invalidate/reset methods
rg -n "(invalidate|reset|clear)" flixopt/flow_system.py | head -20

Repository: flixOpt/flixopt

Length of output: 3312


🏁 Script executed:

# Examine _invalidate_model() method to see if it clears _variable_categories
sed -n '1674,1710p' flixopt/flow_system.py

# Also check reset() method
rg -n "def reset" flixopt/flow_system.py -A 15

Repository: flixOpt/flixopt

Length of output: 2164


🏁 Script executed:

# Get the full reset() method implementation
sed -n '1697,1730p' flixopt/flow_system.py

# Also check if VariableCategory enum can raise ValueError
rg -n "class VariableCategory" . -A 10

Repository: flixOpt/flixopt

Length of output: 2103


🏁 Script executed:

# Examine the full from_dataset context around lines 920-925 to understand error handling
sed -n '900,935p' flixopt/flow_system.py

# Check if VariableCategory enum has specific values defined
sed -n '45,100p' flixopt/structure.py

Repository: flixOpt/flixopt

Length of output: 4034


Clear _variable_categories on model invalidation and harden enum restore against unknown values.

The _variable_categories mapping is not cleared by _invalidate_model() or reset(), allowing stale categories to persist across structural changes and cause incorrect behavior in segment expansion. Additionally, VariableCategory(value) raises ValueError on unknown/renamed enum values, making deserialization brittle across enum evolution.

Hardening for enum restore with graceful fallback and cleanup
-        if 'variable_categories' in reference_structure:
-            categories_dict = json.loads(reference_structure['variable_categories'])
-            # Convert string values back to VariableCategory enum
-            flow_system._variable_categories = {
-                name: VariableCategory(value) for name, value in categories_dict.items()
-            }
+        if 'variable_categories' in reference_structure:
+            categories_dict = json.loads(reference_structure['variable_categories'])
+            restored: dict[str, VariableCategory] = {}
+            for name, value in categories_dict.items():
+                try:
+                    restored[name] = VariableCategory(value)
+                except ValueError:
+                    logger.warning("Unknown VariableCategory value %r for variable %r; skipping.", value, name)
+            flow_system._variable_categories = restored

And in _invalidate_model() add:

         self._flow_carriers = None  # Invalidate flow-to-carrier mapping
+        self._variable_categories = {}  # Clear stale variable category mappings

Comment on lines 1844 to 1858
# Get multi-dimensional properties from Clustering
timestep_mapping = clustering.timestep_mapping
segment_assignments = clustering.results.segment_assignments
segment_durations = clustering.results.segment_durations
position_within_segment = clustering.results.position_within_segment

# Get start and end boundary values
start_val = charge_state_data[typical_cluster_idx, seg_idx]
end_val = charge_state_data[typical_cluster_idx, seg_idx + 1]
# Decode timestep_mapping into cluster and time indices
time_dim_size = clustering.timesteps_per_cluster
cluster_indices = timestep_mapping // time_dim_size
time_indices = timestep_mapping % time_dim_size

# Linear interpolation
result[orig_cluster_idx * timesteps_per_cluster + t] = start_val + (end_val - start_val) * factor
# Get segment index and position for each original timestep
seg_indices = segment_assignments.isel(cluster=cluster_indices, time=time_indices)
positions = position_within_segment.isel(cluster=cluster_indices, time=time_indices)
durations = segment_durations.isel(cluster=cluster_indices, segment=seg_indices)
Copy link
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's find the file and examine the method in question
fd "transform_accessor.py" -type f

Repository: flixOpt/flixopt

Length of output: 229


🏁 Script executed:

# Check the actual _interpolate_charge_state_segmented method
sed -n '1821,1877p' flixopt/transform_accessor.py

Repository: flixOpt/flixopt

Length of output: 3019


🏁 Script executed:

# Find where timestep_mapping is built/assigned
rg -n "timestep_mapping\s*=" --type py -B 2 -A 2 | head -80

Repository: flixOpt/flixopt

Length of output: 1229


🏁 Script executed:

# Look for build_expansion_divisor to see the actual encoding logic
rg -n "def build_expansion_divisor" --type py -A 20

Repository: flixOpt/flixopt

Length of output: 1738


🏁 Script executed:

# Check expand_data method
rg -n "def expand_data" --type py -A 25 | head -60

Repository: flixOpt/flixopt

Length of output: 1934


🏁 Script executed:

# Examine segment_assignments definition and dimensions
rg -n "segment_assignments" --type py -B 1 -A 3 | head -120

Repository: flixOpt/flixopt

Length of output: 8202


🏁 Script executed:

# Look for how timestep_mapping is created initially
rg -n "self.timestep_mapping\s*=" --type py -B 5 -A 5

Repository: flixOpt/flixopt

Length of output: 41


🏁 Script executed:

# Check position_within_segment definition
rg -n "position_within_segment" --type py -B 1 -A 3 | head -80

Repository: flixOpt/flixopt

Length of output: 2006


🏁 Script executed:

# Look at the full expand_data implementation, especially the segmented case
sed -n '877,920p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2161


🏁 Script executed:

# Find where timestep_mapping is actually created/built
sed -n '100,150p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2365


🏁 Script executed:

# Check the full build_expansion_divisor to see how it uses timestep_mapping
sed -n '924,980p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2670


🏁 Script executed:

# Let me trace through the actual usage context - check how _interpolate_charge_state_segmented is called
rg -n "_interpolate_charge_state_segmented" --type-add 'py:*.py' -B 5 -A 5 | head -80

Repository: flixOpt/flixopt

Length of output: 1757


🏁 Script executed:

# Check what is_segmented actually means and how _interpolate_charge_state_segmented relates
sed -n '2030,2045p' flixopt/transform_accessor.py

Repository: flixOpt/flixopt

Length of output: 826


🏁 Script executed:

# Let me verify the structure of segment_assignments more carefully
sed -n '367,385p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 827


🏁 Script executed:

# Check position_within_segment to understand its role
sed -n '422,475p' flixopt/clustering/base.py

Repository: flixOpt/flixopt

Length of output: 2163


Fix timestep_mapping decoding for segmented systems at line 1851.

The code unconditionally uses clustering.timesteps_per_cluster to decode timestep_mapping, but for segmented systems the mapping is encoded with n_segments (as confirmed by expand_data() and build_expansion_divisor()). This produces incorrect indices.

Since this method is only called for segmented systems (line 2039: if is_state and clustering.is_segmented), the decoding will always fail. The decoded time_indices will be segment indices (0..n_segments-1) rather than within-cluster time indices (0..timesteps_per_cluster-1), causing a dimension mismatch when indexing segment_assignments and position_within_segment.

Change line 1851 to:

if clustering.is_segmented and clustering.n_segments is not None:
    time_dim_size = clustering.n_segments
else:
    time_dim_size = clustering.timesteps_per_cluster

This aligns with the pattern used in expand_data() (lines 902-905) and build_expansion_divisor() (line 962).


FBumann and others added 12 commits January 15, 2026 16:01
  1. clustering/base.py - combine_slices() hardening (lines 52-118)

  - Added validation for empty input: if not slices: raise ValueError("slices cannot be empty")
  - Capture first array and preserve dtype: first = next(iter(slices.values())) → np.empty(shape, dtype=first.dtype)
  - Clearer error on missing keys with try/except: raise KeyError(f"Missing slice for key {key} (extra_dims={extra_dims})")

  2. flow_system.py - Variable categories cleanup and safe enum restoration

  - Added self._variable_categories.clear() in _invalidate_model() (line 1692) to prevent stale categories from being reused
  - Hardened VariableCategory restoration (lines 922-930) with try/except to handle unknown/renamed enum values gracefully with a warning instead of crashing

  3. transform_accessor.py - Correct timestep_mapping decode for segmented systems (lines 1850-1857)

  - For segmented systems, now uses clustering.n_segments instead of clustering.timesteps_per_cluster as the divisor
  - This matches the encoding logic in expand_data() and build_expansion_divisor()
…sion+variable-categories

# Conflicts:
#	flixopt/structure.py
  modeling.py:200-242 - Added category: VariableCategory = None parameter and passed it to both add_variables calls.

  Updated Callers
  ┌─────────────┬──────┬─────────────────────────┬────────────────────┐
  │    File     │ Line │        Variable         │      Category      │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ features.py │ 208  │ active_hours            │ TOTAL              │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ elements.py │ 682  │ total_flow_hours        │ TOTAL              │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ elements.py │ 709  │ flow_hours_over_periods │ TOTAL_OVER_PERIODS │
  └─────────────┴──────┴─────────────────────────┴────────────────────┘
  All expression tracking variables now properly register their categories for segment expansion handling. The pattern is consistent: callers specify the appropriate category based on what the tracked expression represents.
  variable_categories property (line 1672):
  @property
  def variable_categories(self) -> dict[str, VariableCategory]:
      """Variable categories for filtering and segment expansion."""
      return self._variable_categories

  get_variables_by_category() method (line 1681):
  def get_variables_by_category(
      self, *categories: VariableCategory, from_solution: bool = True
  ) -> list[str]:
      """Get variable names matching any of the specified categories."""

  Updated in statistics_accessor.py
  ┌───────────────┬──────────────────────────────────────────┬──────────────────────────────────────────────────┐
  │   Property    │                  Before                  │                      After                       │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ flow_rates    │ endswith('|flow_rate')                   │ get_variables_by_category(FLOW_RATE)             │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ flow_sizes    │ endswith('|size') + flow_labels check    │ get_variables_by_category(SIZE) + flow_labels    │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ storage_sizes │ endswith('|size') + storage_labels check │ get_variables_by_category(SIZE) + storage_labels │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ charge_states │ endswith('|charge_state')                │ get_variables_by_category(CHARGE_STATE)          │
  └───────────────┴──────────────────────────────────────────┴──────────────────────────────────────────────────┘
  Benefits

  1. Single source of truth - Categories defined once in VariableCategory enum
  2. Easier maintenance - Adding new variable types only requires updating one place
  3. Type safety - Using enum values instead of magic strings
  4. Flexible filtering - Can filter by multiple categories: get_variables_by_category(SIZE, INVESTED)
  5. Consistent naming - Uses rsplit('|', 1)[0] instead of replace('|suffix', '') for label extraction
  1. New SIZE Sub-Categories (structure.py)

  - Added FLOW_SIZE and STORAGE_SIZE to differentiate flow vs storage investments
  - Kept SIZE for backward compatibility

  2. InvestmentModel Updated (features.py)

  - Added size_category parameter to InvestmentModel.__init__()
  - Callers now specify the appropriate category

  3. Variable Registrations Updated

  - elements.py: FlowModel uses FLOW_SIZE
  - components.py: StorageModel uses STORAGE_SIZE (2 locations)

  4. Statistics Accessor Simplified (statistics_accessor.py)

  - flow_sizes: Now uses get_variables_by_category(FLOW_SIZE) directly
  - storage_sizes: Now uses get_variables_by_category(STORAGE_SIZE) directly
  - No more filtering by element labels after getting SIZE variables

  5. Backward-Compatible Fallback (flow_system.py)

  - get_variables_by_category() handles old files:
    - FLOW_SIZE → matches |size suffix + flow labels
    - STORAGE_SIZE → matches |size suffix + storage labels

  6. SOC Boundary Pattern Matching Replaced (transform_accessor.py)

  - Changed from endswith('|SOC_boundary') to get_variables_by_category(SOC_BOUNDARY)

  7. Effect Variables Verified

  - PER_TIMESTEP ✓ (features.py:659)
  - SHARE ✓ (features.py:700 for temporal shares)
  - TOTAL / TOTAL_OVER_PERIODS ✓ (multiple locations)

  8. Documentation Updated

  - _build_segment_total_varnames() marked as backwards-compatibility fallback

  Benefits

  - Cleaner code: No more string manipulation to filter by element type
  - Type safety: Using enum values instead of magic strings
  - Single source of truth: Categories defined once, used everywhere
  - Backward compatible: Old files still work via fallback logic
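
  A hedged sketch of the fallback idea from item 5 (hypothetical helper; in the PR this logic lives inside get_variables_by_category()):

  def flow_size_names_fallback(solution_names: list[str], flow_labels: set[str]) -> list[str]:
      """Old files without stored categories: match the '|size' suffix plus a known flow label."""
      return [name for name in solution_names
              if name.endswith('|size') and name.rsplit('|', 1)[0] in flow_labels]

  print(flow_size_names_fallback(['Boiler(Q_th)|size', 'Battery|size'], {'Boiler(Q_th)'}))
  # ['Boiler(Q_th)|size']
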
* Add collapse to IO

* Temp

* 1. Added dimension reduction functions in io.py:
    - _reduce_constant_dims(): Reduces dimensions where all values are constant
    - _expand_reduced_dims(): Restores original dimensions with correct order
  2. Added reduce_constants parameter to FlowSystem.to_dataset():
    - Default: False (opt-in)
    - When True: Reduces constant dimensions for memory efficiency
  3. Updated FlowSystem.from_dataset():
    - Expands both collapsed arrays (from NetCDF) and reduced dimensions
  4. Key fixes:
    - Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
    - Skip solution variables in reduction (they're prefixed with solution|)

  The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.

* Set reduce_constants=True by default in to_dataset()

Benchmarks show 99%+ reduction in memory and file size for multi-period
models, with faster IO speeds. The optimization is now enabled by default.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Feature/speedup io+internals (#576)

* Add safe isel wrappers for dimension-independent operations

- Add _scalar_safe_isel_drop() to modeling.py for selecting from
  potentially reduced arrays (handles both scalar and array cases)
- Update Storage validation to use _scalar_safe_isel for
  relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
  reduced arrays properly with dimension checks

These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Remove pre-broadcasting from DataConverter - data stays minimal

Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.

Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
  _validate_dataarray_dims in to_dataarray() method
- components.py: Fix _relative_charge_state_bounds to handle scalar
  inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking

Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size

Co-Authored-By: Claude Opus 4.5 <[email protected]>
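
A minimal sketch of the validate-instead-of-broadcast idea from this commit (hypothetical helper name; the real check is _validate_dataarray_dims inside DataConverter.to_dataarray()):

import numpy as np
import xarray as xr

def validate_dims(da: xr.DataArray, allowed_dims: set[str]) -> xr.DataArray:
    """Accept data whose dims are a subset of the model dims; leave broadcasting to linopy."""
    extra = set(da.dims) - allowed_dims
    if extra:
        raise ValueError(f'Unexpected dimensions {sorted(extra)}; allowed: {sorted(allowed_dims)}')
    return da  # data stays minimal: scalars stay scalar, 1D arrays stay 1D

profile = xr.DataArray(np.ones(8760), dims=('time',))
validate_dims(profile, allowed_dims={'time', 'period', 'scenario'})  # OK, kept as ('time',)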

* Remove redundant dimension reduction - data is minimal from start

Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()

Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.

Update benchmark_io.py to reflect the simplified API.

Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* ⏺ Done. I've removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Here's a summary of the changes:

  Files modified:

  1. flixopt/io.py:
    - Removed COLLAPSED_VAR_PREFIX constant
    - Removed _collapse_constant_arrays() function (~45 lines)
    - Removed _expand_collapsed_arrays() function (~40 lines)
    - Removed collapse_constants parameter from save_dataset_to_netcdf()
    - Removed expansion logic from load_dataset_from_netcdf()
  2. flixopt/flow_system.py:
    - Removed import and call to _expand_collapsed_arrays in from_dataset()
  3. benchmark_io.py:
    - Simplified benchmark_netcdf() to only benchmark single save/load (no more comparison)
    - Removed collapse_constants parameter from roundtrip function

  Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.

  Test results:
  - All 10 IO tests pass
  - All 4 integration tests pass
  - Benchmark runs successfully

* Summary of Fixes

  1. _dataset_resample handling all-scalar data (transform_accessor.py)

  When all data variables are scalars (no time dimension), the resample function now:
  - Creates a dummy variable to resample the time coordinate
  - Preserves all original scalar data variables
  - Preserves all non-time coordinates (period, scenario, cluster)

  2. _scalar_safe_reduce helper (modeling.py)

  Added a new helper function that safely performs aggregation operations (mean/sum/etc) over a dimension:
  - Returns reduced data if the dimension exists
  - Returns data unchanged if the dimension doesn't exist (scalar data)
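
  A minimal sketch of such a helper (hedged; the actual signature in modeling.py may differ):

  import xarray as xr

  def scalar_safe_reduce(data: xr.DataArray, dim: str, op: str = 'mean') -> xr.DataArray:
      """Aggregate over `dim` if present; pass scalar / already-reduced data through unchanged."""
      if dim in data.dims:
          return getattr(data, op)(dim)  # e.g. data.mean('time')
      return data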

  3. Updated Storage intercluster linking (components.py)

  Used _scalar_safe_reduce for:
  - relative_loss_per_hour.mean('time')
  - timestep_duration.mean('time')

  4. Updated clustering expansion (transform_accessor.py)

  Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')

  5. Fixed dimension order in expand_data (clustering/base.py)

  When expanding clustered data back to original timesteps:
  - Added transpose to ensure (cluster, time) dimension order before numpy indexing
  - Fixes IndexError when dimensions were in different order

* Added reduction on loading old datasets

* Summary of changes made:

  1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
  2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
  3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
  4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
  5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)

  The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>
# Conflicts:
#	flixopt/clustering/base.py
#	flixopt/components.py
…+variable-categories

# Conflicts:
#	flixopt/clustering/base.py
#	flixopt/components.py
#	flixopt/structure.py
#	flixopt/transform_accessor.py