FlowSets

Analysis and Visualization of Expression Patterns with Fuzzy Sets as FlowSets

FlowSets won the best poster award at ISMB/ECCB 2023 in the BioVis-Track!

Contact

Overview

FlowSets is a Python package for analyzing and visualizing gene expression patterns using fuzzy set theory. It identifies and visualizes gene expression flows across experimental conditions or clusters, and supports pathway enrichment analysis based on each gene's membership in specific expression patterns.

Install

You can install FlowSets using pip:

pip install flowsets

Installation Help

If you encounter issues installing polars, you may need to use the long-term support (LTS) CPU-only build:

pip uninstall polars
pip install polars-lts-cpu==1.19.0

For testing FlowSets, we recommend using a dedicated Conda environment:

conda create -n FlowSets_env python=3.12 
conda activate FlowSets_env
pip install flowsets

Quick Start Example

import polars as pl
from flowsets import *

# Read in data as polars dataframe
data = pl.read_csv(
    './data/deseq2_results_25deg_all_comparisons_cleaned.csv',
    null_values=['NA'],
    schema={
        "baseMean": pl.Float32,
        "log2FoldChange": pl.Float32,
        "lfcSE": pl.Float32,
        "stat": pl.Float32,
        "pvalue": pl.Float32,
        "padj": pl.Float32,
        "comparison": pl.Utf8,
        "gene_id": pl.Utf8
    }
)

# Fuzzify the log2FoldChange values for each gene and comparison
# Here all states are fuzzified with the same levels and centers
explDFWide, mfFuzzy = LegacyFuzzifier.fuzzify(
    data,  # input dataframe
    stepsize=0.01,
    symbol_column="gene_id",  # column name referring to the feature
    meancolName="log2FoldChange",  # column name referring to the signal
    clusterColName="comparison",  # column name referring to the state
    mfLevels=["strong_down", "down", "neutral", "up", "strong_up"],  # linguistic levels (fuzzy sets) to create
    centers=[-2, -1, 0, 1, 2],  # centers of the fuzzy sets
    sdcolName=None, exprcolName=None,  # not used here; intended for single-cell data
)

# Create a FlowAnalysis (FlowSets) object for the fuzzified data
# The series is defined by tuples of (name in the clusterColName column, display name in FlowSets)
def_series = (
    ("HSF1.KD vs Wildtype",'KO1 vs WT'), 
    ("Double.KDKO vs Wildtype",'KO1+2 vs WT'),
    ("MSN24.KO vs Wildtype",'KO2 vs WT')
)
fa = FlowAnalysis(explDFWide, "gene_id", def_series, mfFuzzy)

# Plot the flow memberships for all genes
fa.plot_flows(figsize=(15, 10),title="Data set overview \n Unrestricted FlowSets",outfile="./plots/complete_flow")
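
To make the fuzzification step above more concrete, the following is a minimal, standalone sketch of how a single log2 fold change could be turned into memberships for the five levels. It assumes triangular membership functions that peak at each center and fall to zero at the neighbouring centers; the shape actually used by LegacyFuzzifier is not stated here, and the helper `triangular_memberships` is hypothetical and not part of FlowSets.

```python
def triangular_memberships(value, centers):
    """Return one membership per center (assumed triangular shape).

    Hypothetical helper for illustration; not a FlowSets function.
    Values beyond the outermost centers are clamped, so they belong
    fully to the outermost level.
    """
    v = min(max(float(value), centers[0]), centers[-1])
    memberships = []
    for i, c in enumerate(centers):
        if v == c:
            m = 1.0
        elif i > 0 and centers[i - 1] < v < c:
            # Rising edge between the previous center and this one
            m = (v - centers[i - 1]) / (c - centers[i - 1])
        elif i < len(centers) - 1 and c < v < centers[i + 1]:
            # Falling edge between this center and the next one
            m = (centers[i + 1] - v) / (centers[i + 1] - c)
        else:
            m = 0.0
        memberships.append(m)
    return memberships


levels = ["strong_down", "down", "neutral", "up", "strong_up"]
centers = [-2, -1, 0, 1, 2]

# A log2 fold change of 0.5 lies halfway between "neutral" and "up"
example = dict(zip(levels, triangular_memberships(0.5, centers)))
print(example)
```

With this shape, each value belongs to at most two adjacent levels and its memberships sum to one, which is what allows a gene to contribute partially to several flows.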

Feature centric analysis

Visualize only Specific Gene Sets

solis_genes = ["YAL005C", "YBR101C", "YDR171W", "YDR214W", "YDR258C", "YFL016C", "YGR142W", "YLL024C", "YLL026W", "YLR216C", "YMR186W", "YNL007C", "YNL064C", "YNL281W", "YOR027W", "YOR298C-A", "YPL240C", "YPR158W"]

fa.plot_flows(genes=solis_genes, title="Solis et al. 2016 - KO1 dependent genes", figsize=(10, 8), outfile="./plots/geneset_flow.png")

Pattern centric analysis

Pattern Search and Pathway Analysis

# Find genes with specific flow patterns and visualize flow + memberships
relFlow = fa.flow_finder(
    ["?","?"], 
    minLevels=[None,None,"down"], 
    maxLevels=["down","down","up"], 
    verbose=False
    )

fa.plot_flows(use_edges=relFlow,title="Restricted FlowSets \n pattern centric analysis",outfile="./plots/pattern_flow")

fa.plot_flow_memberships(
    use_edges=relFlow, 
    color_genes=solis_genes, 
    outfile="./plots/pattern_memberships.png"
    )

# Perform pathway analysis using GOslim and additional gene sets

pw_file = "./data/goslim.gmt"

pwScores = fa.analyse_pathways(
    use_edges=relFlow, 
    genesets_file=pw_file, 
    additional_genesets=[("solis annotated genes", solis_genes)]
    )

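# Keep the 20 gene sets with the smallest p-values
pwScores_signif = pwScores.sort_values("pw_coverage_pval", ascending=True).head(20)
display(pwScores_signif)  # display() is provided by Jupyter/IPython; use print() in a plain script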

# Show as ORA plot
fa.plotORAresult(pwScores_signif, "GOslim", numResults=10, figsize=(6,6), outfile="./plots/goslim_pathway_analysis.png")
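
As a rough illustration of what the pattern search above selects, the sketch below enumerates all level paths over three states and keeps those that satisfy per-state lower and upper bounds. It is a standalone approximation under stated assumptions, not the flow_finder implementation: it assumes that minLevels and maxLevels bound the level in each state, that "?" places no constraint on a transition, and it adds "<", ">" and "=" purely as illustrative transition symbols. The function `matching_flows` is hypothetical.

```python
from itertools import product

LEVELS = ["strong_down", "down", "neutral", "up", "strong_up"]
RANK = {name: i for i, name in enumerate(LEVELS)}

# Assumed meaning of transition symbols (illustrative only)
CHECKS = {
    "?": lambda a, b: True,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "=": lambda a, b: a == b,
}


def matching_flows(n_states, transitions, min_levels, max_levels):
    """Enumerate level paths that satisfy the bounds and transitions.

    Hypothetical helper for illustration; not a FlowSets function.
    """
    result = []
    for path in product(range(len(LEVELS)), repeat=n_states):
        in_bounds = all(
            (lo is None or RANK[lo] <= r) and (hi is None or r <= RANK[hi])
            for r, lo, hi in zip(path, min_levels, max_levels)
        )
        steps_ok = all(
            CHECKS[t](a, b) for t, a, b in zip(transitions, path, path[1:])
        )
        if in_bounds and steps_ok:
            result.append(tuple(LEVELS[r] for r in path))
    return result


# Same constraints as in the flow_finder call above
flows = matching_flows(
    3,
    ["?", "?"],
    [None, None, "down"],
    ["down", "down", "up"],
)
for flow in flows:
    print(" -> ".join(flow))
```

Under these assumptions the pattern keeps flows that are at most "down" in the first two states and between "down" and "up" in the third.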

Paper Examples

Other Examples

Method Summary

(Differential) Expression data are read in for each gene and each cluster (or state).
Values are fuzzified by user-defined membership classes, min-max scaling, or quantiles.
Relevant flows are defined using a simple grammar with flow_finder, specifying desired differences between levels.
For each flow or group of flows, gene set enrichment analysis is performed. Gene sets are binned by size, and flow memberships are calculated for each bin. A z-score is computed for each gene set relative to the other gene sets in its bin; for gene sets with a positive z-score (overrepresented), the z-score is transformed into a p-value.
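
The last step can be sketched as follows. This is a simplified illustration, not the FlowSets implementation: it assumes that each gene set in a size bin already has a single membership-based score, and that the z-score is converted to a one-sided p-value using the upper tail of a standard normal distribution. The scores and the helper names `bin_zscores` and `upper_tail_pvalue` are hypothetical.

```python
import math
from statistics import mean, stdev


def bin_zscores(scores):
    """z-score of each gene set relative to all gene sets in the same bin."""
    mu = mean(scores.values())
    sd = stdev(scores.values())
    if sd == 0:
        # All gene sets score identically; no set stands out
        return {name: 0.0 for name in scores}
    return {name: (s - mu) / sd for name, s in scores.items()}


def upper_tail_pvalue(z):
    """P(Z >= z) for a standard normal Z (assumed null distribution)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical membership-based scores for gene sets in one size bin
bin_scores = {"set_a": 0.10, "set_b": 0.12, "set_c": 0.11, "set_d": 0.45}

zscores = bin_zscores(bin_scores)

# Only gene sets with a positive z-score (overrepresented) get a p-value
pvalues = {
    name: upper_tail_pvalue(z) for name, z in zscores.items() if z > 0
}
print(zscores)
print(pvalues)
```

Binning by size before computing z-scores means each gene set is compared only against gene sets of similar size.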

A more detailed description is available in the working copy of our manuscript.

License

This project is licensed under the MIT License.

Citation

If you use FlowSets in your research, please cite our manuscript (see WorkingVersionFlowsets.pdf).
