
Refactor Pipelines #251

Merged
JoshLoecker merged 42 commits into `develop` from `refactor-pipelines` on Mar 6, 2026



@JoshLoecker
Member

This pull request refactors and improves the logic for combining z-score distributions across batches, sources, and contexts in `main/como/combine_distributions.py`. The main changes replace the invalid weighting by replicate counts with equal weights (Stouffer's Z method), improve the handling of missing values, and simplify function signatures and type usage. The changes also make the code more robust and readable, with updated docstrings and error messages.

Statistical methodology improvements:

  • Updated `_combine_z_distribution_for_source` to use equal weights (Stouffer's Z method) instead of replicate counts, as replicate counts are not statistically valid weights for z-scores. This change ensures proper aggregation of standardized scores.
  • Improved handling of missing values and NaNs in z-score combination logic, including robust masking and safe division, across batch and source combination functions. [1] [2]
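A minimal sketch of equal-weight (Stouffer's Z) combination with NaN/Inf masking and safe division, assuming the z-scores arrive as a genes × replicates NumPy array. The name `stouffer_combine` is hypothetical; this is not the project's `_combine_z_distribution_for_source`, only an illustration of the formula Z = sum(z_i) / sqrt(k), where k counts the finite z-scores in a row.

```python
import numpy as np


def stouffer_combine(z: np.ndarray) -> np.ndarray:
    """Combine z-scores row-wise with equal weights (Stouffer's Z).

    Rows are genes; columns are the batches/replicates being combined.
    Non-finite entries (NaN, +/-inf) are ignored.
    """
    z = np.asarray(z, dtype=float)
    finite = np.isfinite(z)  # mask of usable z-scores

    # Sum only the finite z-scores in each row
    total = np.where(finite, z, 0.0).sum(axis=1)
    # k = number of finite z-scores that contributed to each row
    k = finite.sum(axis=1)

    # Safe division: rows with no finite z-scores stay NaN
    combined = np.full(z.shape[0], np.nan)
    has_data = k > 0
    combined[has_data] = total[has_data] / np.sqrt(k[has_data])
    return combined
```

For `np.array([[1.0, 1.0], [2.0, np.nan], [np.nan, np.nan]])`, the first row combines to 2/sqrt(2) ≈ 1.414, the second keeps its single value 2.0, and the all-NaN row stays NaN. Dividing by sqrt(k) keeps the combined score standard normal under the null when the inputs are independent standard normals.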

Code and interface simplification:

  • Removed unnecessary parameters (`weighted_z_floor`, `weighted_z_ceiling`) from batch combination functions and updated function signatures to reflect the new logic. [1] [2]
  • Simplified type usage and imports (including replacing `convert` with `get_remaining_identifiers`) and updated type hints and merge logic for Scanpy matrices. [1] [2]

Data handling and output improvements:

  • Ensured all output DataFrames have integer indices and are sorted, improving downstream consistency. [1] [2] [3]
  • Changed duplicate gene handling from averaging to taking the maximum value, which may better reflect the most significant signal.
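The duplicate handling and index normalization described above can be sketched in pandas as follows. The helper name `collapse_duplicate_genes` is hypothetical, and the sketch assumes gene identifiers live in the DataFrame index and can be cast to `int`.

```python
import pandas as pd


def collapse_duplicate_genes(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse duplicate gene rows by keeping the maximum value per column.

    Assumes the index holds gene identifiers that can be cast to int
    (for example, IDs that were read in as floats).
    """
    # Keep the largest value per gene instead of averaging duplicates
    collapsed = df.groupby(level=0).max()
    # Normalize the index to integers, then sort for a stable row order
    collapsed.index = collapsed.index.astype(int)
    return collapsed.sort_index()
```

Note that `max()` is applied column by column, so with several columns a gene's collapsed values may come from different duplicate rows; NaNs are skipped by default.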

Code readability and maintainability:

  • Updated docstrings and error messages for clarity and correctness. [1] [2] [3]
  • Improved formatting and line splitting for readability throughout the file. [1] [2]

Type and import updates:

  • Added new imports and type hints in main/como/data_types.py to support the refactored logic.
  • Removed unused classes and updated dictionary comprehensions for clarity. [1] [2] [3]

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sibility of where errors are being raised

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…better visibility of where errors are being raised

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…ervices`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
- Add `ModelBuildSettings` dataclass for solver parameters
- Remove unused PeakIdentificationParameters and _BuildResults
- Minor formatting cleanup

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
- Adds TPM dataframe output support
- !Converts all async functions to sync
- Refactors `_write_counts_matrix` to `_write_matrices`
- Add better validation for output filepaths
- No longer utilize STAR gene count files

[BREAKING-CHANGE]: Converts all async functions to sync

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
- !Converts from async functions to sync
- Change `fragment_lengths` from a per-sample value to a per-gene value (np.ndarray to pd.DataFrame)
- Refactor TPM, FPKM, and zFPKM functions
- Adds fpkm output CSV file support
- Improves type hints and validations

[BREAKING-CHANGE]: This removes all async functions for synchronous counterparts

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
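With `fragment_lengths` now a per-gene DataFrame, TPM and FPKM can be computed element-wise. The sketch below uses the standard textbook definitions, not COMO's own functions; it assumes counts and effective lengths (in bases) are genes × samples DataFrames sharing the same index and columns. The function names `tpm` and `fpkm` are illustrative.

```python
import pandas as pd


def tpm(counts: pd.DataFrame, lengths: pd.DataFrame) -> pd.DataFrame:
    """Transcripts per million; counts and lengths are genes x samples."""
    # Reads per base for each gene/sample pair (aligned on index and columns)
    rate = counts / lengths
    # Scale each sample (column) so its values sum to one million
    return rate / rate.sum(axis=0) * 1e6


def fpkm(counts: pd.DataFrame, lengths: pd.DataFrame) -> pd.DataFrame:
    """Fragments per kilobase of feature per million mapped fragments."""
    total = counts.sum(axis=0)  # mapped fragments per sample
    # counts * 1e9 / (length_in_bases * total_fragments), column-wise
    return counts * 1e9 / (lengths * total)
```

If lengths were instead a single per-gene Series, `counts.div(lengths, axis=0)` would give the same row-wise broadcasting. Zero lengths produce `inf` and would need to be masked first.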
- Removes `_get_transcriptomic_details` function as it was unused
- Renames `_merge_xomics` to `_trinarize_data` for -1/0/+1 binning
- Simplify missing data handling

[BREAKING-CHANGE]: Removes all async functions for synchronous counterparts

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
- Fixes Z-score combination formula to use equal weights/Stouffer's method
- Adds proper handling for NaN/Inf values
- Fixes output formatting

[BREAKING-CHANGE]: Converts all async functions to their synchronous counterparts

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
- Convert all async functions to synchronous
- Remove unused imports/types
- Replace `await _read_file` with `pd.read_csv` calls

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
- Change from threshold-based (low_thresh, high_thresh) to percentile-based (low_percentile, high_percentile)
- Return tuple with (reaction_expression, min_val, low_expr, high_expr) instead of just a dictionary
- Adjust expression values to have a minimum of 0 for consistency and for downstream model building with Troppo
- Use `np.nanpercentile` for threshold computation

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
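The percentile-based thresholding can be sketched as follows. The function name, the default percentiles, and the choice to shift (rather than clip) values so that the minimum becomes 0 are assumptions; the commit only specifies the returned tuple layout and the use of `np.nanpercentile`.

```python
import numpy as np


def percentile_thresholds(
    reaction_expression: np.ndarray,
    low_percentile: float = 25.0,
    high_percentile: float = 75.0,
) -> tuple[np.ndarray, float, float, float]:
    """Return (reaction_expression, min_val, low_expr, high_expr)."""
    values = np.asarray(reaction_expression, dtype=float)
    # Smallest observed value, ignoring NaNs
    min_val = float(np.nanmin(values))
    # Assumption: shift upward so the minimum becomes 0 when negatives exist
    if min_val < 0:
        values = values - min_val
    # NaN-aware percentile cutoffs on the adjusted values
    low_expr = float(np.nanpercentile(values, low_percentile))
    high_expr = float(np.nanpercentile(values, high_percentile))
    return values, min_val, low_expr, high_expr
```

Percentile cutoffs adapt to each dataset's distribution, whereas fixed `low_thresh`/`high_thresh` values depend on the scale of the input.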
- Uses cobamp for optimization handling
- Properly partition solution into high-bin and low-bin reactions
- Fix high/low index calculation to match Troppo's expected behavior
- Add validation for expected solution length
- Use 0.5 midpoint for binarizing reaction inclusion

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
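Partitioning a solver solution into high-bin and low-bin reactions with a 0.5 midpoint might look like the sketch below. It does not use cobamp or Troppo; the function name and the layout of the solution vector (high-bin indicators first, then low-bin indicators) are hypothetical assumptions made for illustration.

```python
from collections.abc import Sequence

import numpy as np


def partition_solution(
    solution: Sequence[float],
    high_reactions: Sequence[str],
    low_reactions: Sequence[str],
    midpoint: float = 0.5,
) -> tuple[list[str], list[str]]:
    """Return high-bin and low-bin reactions whose indicator exceeds the midpoint.

    Assumes (hypothetically) that the solution vector lists the high-bin
    indicators first, followed by the low-bin indicators.
    """
    values = np.asarray(solution, dtype=float)
    n_high = len(high_reactions)
    expected = n_high + len(low_reactions)
    # Validate the solution length before slicing it
    if values.shape[0] != expected:
        raise ValueError(f"Expected {expected} solution values, got {values.shape[0]}")
    # Binarize near-0/near-1 solver output at the midpoint
    active = values > midpoint
    high_on = [r for r, on in zip(high_reactions, active[:n_high]) if on]
    low_on = [r for r, on in zip(low_reactions, active[n_high:]) if on]
    return high_on, low_on
```

Thresholding at 0.5 tolerates the small numerical noise that MILP solvers leave on binary variables (for example, `0.9999` or `1e-9`).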
- Adds `reference_model` parameter to internal `_collect_boundary_reactions` to define constraints for all boundary reactions
- Validate compartments exist in reference model

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…exchange reactions not included in the boundary reactions input file

This option defaults to False to maintain compatibility with previous releases

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
… reactions

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…alization

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…ut handling

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…th pandas read_csv

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…rmatting

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
… in model creation

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…vert`

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
JoshLoecker marked this pull request as ready for review March 6, 2026 19:33
JoshLoecker merged commit 407219f into develop Mar 6, 2026
3 checks passed
JoshLoecker deleted the refactor-pipelines branch March 6, 2026 19:34