Refactor Pipelines by JoshLoecker · Pull Request #251 · HelikarLab/COMO

JoshLoecker · 2026-03-06T19:32:04Z

This pull request refactors and improves the logic for combining z-score distributions across batches, sources, and contexts in main/como/combine_distributions.py. The main changes include updating the weighting methodology to use equal weights (Stouffer's Z method), removing invalid weighting by replicate counts, improving handling of missing values, and simplifying function signatures and type usage. The code is also made more robust and readable, with updated docstrings and error messages.

Statistical methodology improvements:

Updated _combine_z_distribution_for_source to use equal weights (Stouffer's Z method) instead of replicate counts, as replicate counts are not statistically valid weights for z-scores. This change ensures proper aggregation of standardized scores.
Improved handling of missing values and NaNs in z-score combination logic, including robust masking and safe division, across batch and source combination functions. [1] [2]
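A minimal sketch of the equal-weight Stouffer combination described above, assuming z-scores arrive as a genes × replicates array. The name `stouffer_equal_weights` and the array layout are illustrative and are not taken from `combine_distributions.py`. Non-finite entries are masked out, and a gene with no valid scores stays NaN rather than triggering a division by zero.

```python
import numpy as np


def stouffer_equal_weights(z: np.ndarray) -> np.ndarray:
    """Combine z-scores per row with equal weights: Z = sum(z_i) / sqrt(k).

    Hypothetical helper: rows are genes, columns are replicates or batches,
    and k counts only the finite entries in each row.
    """
    z = np.asarray(z, dtype=float)
    valid = np.isfinite(z)                       # mask NaN and +/-Inf
    k = valid.sum(axis=1)                        # number of usable scores per gene
    total = np.where(valid, z, 0.0).sum(axis=1)  # sum over usable scores only
    combined = np.full(z.shape[0], np.nan)       # default: no valid data -> NaN
    # Safe division: only divide where at least one score is present.
    np.divide(total, np.sqrt(k), out=combined, where=k > 0)
    return combined


scores = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [2.0, np.nan, 2.0, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
])
print(stouffer_equal_weights(scores))
```

With equal weights the denominator depends only on how many scores are present, so a gene missing from some replicates is neither inflated nor penalized by the replicate count.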

Code and interface simplification:

Removed unnecessary parameters (weighted_z_floor, weighted_z_ceiling) from batch combination functions and updated function signatures to reflect new logic. [1] [2]
Simplified type usage and imports, including replacing convert with get_remaining_identifiers, and updating type hints and merge logic for Scanpy matrices. [1] [2]

Data handling and output improvements:

Ensured all output DataFrames have integer indices and are sorted, improving downstream consistency. [1] [2] [3]
Changed duplicate gene handling from averaging to taking the maximum value, which may better reflect the most significant signal.
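The two output changes above can be sketched with pandas, assuming gene identifiers sit in the index and may arrive as strings. The function name `collapse_duplicate_genes` and the column name `ctx` are hypothetical and only illustrate the behavior.

```python
import pandas as pd


def collapse_duplicate_genes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the maximum value per gene and return a sorted integer index.

    Hypothetical helper, not the function used in the pull request.
    """
    out = df.copy()
    out.index = out.index.astype(int)   # e.g. "7157" -> 7157
    # groupby(level=0).max() keeps the largest value among duplicate genes
    # (NaN is skipped); sort_index() makes the ordering explicit.
    return out.groupby(level=0).max().sort_index()


frame = pd.DataFrame({"ctx": [1.0, 0.5, 3.0]}, index=["7157", "672", "7157"])
print(collapse_duplicate_genes(frame))
```

Taking the maximum instead of the mean means a single strong duplicate entry is not diluted by weaker ones, which matches the rationale stated in the description.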

Code readability and maintainability:

Updated docstrings and error messages for clarity and correctness. [1] [2] [3]
Improved formatting and line splitting for readability throughout the file. [1] [2]

Type and import updates:

Added new imports and type hints in main/como/data_types.py to support the refactored logic.
Removed unused classes and updated dictionary comprehensions for clarity. [1] [2] [3]

- Adds TPM dataframe output support
- !Converts all async functions to sync
- Refactors `_write_counts_matrix` to `_write_matrices`
- Adds better validation for output filepaths
- No longer utilizes STAR gene count files

![BREAKING-CHANGE]: Converts all async functions to sync

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

- !Converts from async functions to sync
- Changes `fragment_lengths` from a per-sample value to a per-gene value (`np.ndarray` to `pd.DataFrame`)
- Refactors TPM, FPKM, and zFPKM functions
- Adds FPKM output CSV file support
- Improves type hints and validations

[BREAKING-CHANGE]: This replaces all async functions with synchronous counterparts

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
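As a rough illustration of why per-gene lengths stored in a DataFrame are convenient, here is a sketch of the standard TPM definition with lengths given per gene and per sample. This is the textbook formula, not COMO's refactored implementation, and the name `counts_to_tpm` is hypothetical.

```python
import pandas as pd


def counts_to_tpm(counts: pd.DataFrame, lengths: pd.DataFrame) -> pd.DataFrame:
    """Textbook TPM: length-normalize counts, then scale each sample to 1e6.

    Both frames are genes x samples. Hypothetical helper for illustration.
    """
    # Align lengths to the counts layout so the division is element-wise.
    lengths = lengths.reindex(index=counts.index, columns=counts.columns)
    rate = counts / lengths                  # reads per base, per gene and sample
    return rate / rate.sum(axis=0) * 1e6     # each column now sums to one million


counts = pd.DataFrame({"s1": [10, 20], "s2": [0, 30]}, index=["g1", "g2"])
lengths = pd.DataFrame({"s1": [1000, 2000], "s2": [1000, 3000]}, index=["g1", "g2"])
print(counts_to_tpm(counts, lengths))
```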

- Removes `_get_transcriptomic_details` function as it was unused
- Renames `_merge_xomics` to `_trinarize_data` for -1/0/+1 binning
- Simplifies missing data handling

[BREAKING-CHANGE]: Removes all async functions for synchronous counterparts

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

- Fixes Z-score combination formula to use equal weights/Stouffer's method
- Adds proper handling for NaN/Inf values
- Fixes output formatting

[BREAKING-CHANGE]: Converts all async functions to their synchronous counterparts

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

- Change from threshold-based (`low_thresh`, `high_thresh`) to percentile-based (`low_percentile`, `high_percentile`)
- Return a tuple of (`reaction_expression`, `min_val`, `low_expr`, `high_expr`) instead of just a dictionary
- Adjust expression values to have a minimum of 0 for consistency and downstream building with Troppo
- Use `np.nanpercentile` to compute thresholds

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
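A hedged sketch of the percentile-based thresholding listed above. The function name `percentile_thresholds`, the default percentiles of 25 and 75, and the choice to reach a minimum of 0 by subtracting the smallest value are all assumptions made for illustration. The commit only states that thresholds come from `np.nanpercentile` and that a four-element tuple is returned.

```python
import numpy as np


def percentile_thresholds(
    expr: dict[str, float],
    low_percentile: float = 25.0,    # assumed default, not from the commit
    high_percentile: float = 75.0,   # assumed default, not from the commit
) -> tuple[dict[str, float], float, float, float]:
    """Return (shifted expression, original minimum, low cutoff, high cutoff)."""
    values = np.array(list(expr.values()), dtype=float)
    min_val = float(np.nanmin(values))
    # Assumption: "minimum of 0" is achieved by shifting, not clipping.
    shifted = {rxn: value - min_val for rxn, value in expr.items()}
    shifted_values = values - min_val
    # nanpercentile ignores reactions without an expression value.
    low_expr = float(np.nanpercentile(shifted_values, low_percentile))
    high_expr = float(np.nanpercentile(shifted_values, high_percentile))
    return shifted, min_val, low_expr, high_expr


reaction_expression = {"R1": -1.0, "R2": 0.0, "R3": 1.0, "R4": float("nan"), "R5": 3.0}
print(percentile_thresholds(reaction_expression))
```

Percentile cutoffs adapt to the scale of each dataset, whereas fixed thresholds have to be retuned whenever the expression distribution changes.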

- Uses cobamp for optimization handling
- Properly partitions the solution into high-bin and low-bin reactions
- Fixes high/low index calculation to match Troppo's expected behavior
- Adds validation for expected solution length
- Uses a 0.5 midpoint for binarizing reaction inclusion

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

JoshLoecker added 30 commits March 6, 2026 10:24

chore: bump zFPKM version requirement

03dfc79

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: allow ruff to fix imports and __all__ sections

1e2e14f

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: bump bioservices requirement

a6b25e9

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: remove the log_and_raise_error helper function for better vi…

027f97a

…sibility of where errors are being raised Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: remove the internal log_and_raise_error helper function for …

fa75fc6

…better visibility of where errors are being raised Signed-off-by: Josh Loecker <joshloecker@icloud.com>

feat: add genomic conversion pipelines directly into COMO using `bios…

f59e21a

…ervices` Signed-off-by: Josh Loecker <joshloecker@icloud.com>

feat: add genomic conversion pipelines directly into COMO using `bios…

3911880

…ervices` Signed-off-by: Josh Loecker <joshloecker@icloud.com>

feat: add genomic conversion pipelines directly into COMO using `bios…

46f30cd

…ervices` Signed-off-by: Josh Loecker <joshloecker@icloud.com>

feat: add genomic conversion pipelines directly into COMO using `bios…

f2b3277

…ervices` Signed-off-by: Josh Loecker <joshloecker@icloud.com>

feat: extract gene-info building as a pipeline component

d4e7ea8

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: add types for ModelBuildSettings

b75e126

- Add `ModelBuildSettings` dataclass for solver parameters - Remove unused PeakIdentificationParameters and _BuildResults - Minor formatting cleanup Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: remove async/await and simplify I/O operations

2db1fc4

- Convert all async functions to synchronous - Remove unused imports/types - Replace `await _read_file` with `pd.read_csv` calls Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: inline setting boundary reaction conditions

5a99d13

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: add and update types for model build functions

41d1b56

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

fix: update boundary reaction collection logic

78f62f8

- Adds `reference_model` parameter to internal `_collect_boundary_reactions` to define constraints for all boundary reactions - Validate compartments exist in reference model Signed-off-by: Josh Loecker <joshloecker@icloud.com>

feat: add close_unlisted_exchanges to forcibly set bounds to 0 for …

bf30d2d

…exchange reactions not included in the boundary reactions input file. This option defaults to False to maintain compatibility with previous releases. Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: streamline GIMME model reconstruction and expression handling

b92b555

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: improve context-specific model creation logic

46b89aa

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: add type hints for reconstructing with tINIT

f2d7199

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: do not use match case; reduces indentation

98a5436

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

perf: pass in cobra.Model object to prevent reading multiple times

b83842c

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: enhance reaction expression mapping and logging for missing…

a419bce

… reactions Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: streamline reaction expression handling in model creation

3a8f131

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: improve reaction index handling and expression vector initi…

4cab288

…alization Signed-off-by: Josh Loecker <joshloecker@icloud.com>

JoshLoecker added 12 commits March 6, 2026 11:28

refactor: update reaction subsystem assignment and optimize flux outp…

e4b90b5

…ut handling Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: simplify DataFrame creation by replacing async file read wi…

2f6e09e

…th pandas read_csv Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: update parameter names for clarity and improve docstring fo…

d224639

…rmatting Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: improve error handling and streamline file path assignments…

e1b2e79

… in model creation Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: enhance model building parameters for clarity and consistency

f7de63b

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

refactor: streamline context model creation and improve logging output

d178491

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: use proper NumPy types for better compatibility

cae84ea

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: use mod.get_remaining_identifiers in replacement of `mod.con…

53b1830

…vert` Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: update built-in boundary reaction bounds to match expected format

b661349

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: ruff formatting

2628568

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

chore: update identifier pipeline tests

cd61f79

Signed-off-by: Josh Loecker <joshloecker@icloud.com>

Merge branch 'develop' into refactor-pipelines

54b4bc2

JoshLoecker marked this pull request as ready for review March 6, 2026 19:33

JoshLoecker merged commit 407219f into develop Mar 6, 2026
3 checks passed

JoshLoecker deleted the refactor-pipelines branch March 6, 2026 19:34

JoshLoecker merged 42 commits into develop from refactor-pipelines