Single sample intracellular signalling network inference#

In this notebook we showcase how to use the advanced CARNIVAL implementation available in CORNETO. This implementation extends the capabilities of the original CARNIVAL method by enabling more flexible modelling and the injection of prior knowledge for hypothesis generation. We will use a dataset consisting of 6 samples of hepatic stellate cells (HSC), three of which were activated by the cytokine Transforming Growth Factor beta (TGF-β).

In the first part, we will show how to estimate transcription factor (TF) activities from gene expression data, following the Decoupler tutorial for functional analysis. Then, we will use the CARNIVAL method available in CORNETO to infer a signalling network linking receptors to the estimated TFs, assuming that we do not know which treatment was used. We will see how to do some basic modelling and how to add information from pathway enrichment with PROGENy to bias the inferred networks.

NOTE: This tutorial uses the new Decoupler v2. We also use the HiGHS solver, which can be installed with pip install highspy.

import gzip
import os
import shutil
import tempfile
import urllib.request
import pandas as pd

# https://decoupler-py.readthedocs.io
import decoupler as dc
import numpy as np

# https://omnipathdb.org/
import omnipath as op

# Pydeseq for differential expression analysis
from pydeseq2.dds import DefaultInference, DeseqDataSet
from pydeseq2.ds import DeseqStats

# https://corneto.org
import corneto as cn
dc.__version__
'2.0.4'
max_time = 300
seed = 0
# Load the dataset
adata = dc.ds.hsctgfb()
adata
AnnData object with n_obs × n_vars = 6 × 58674
    obs: 'condition', 'sample_id'

Data preprocessing#

We will use AnnData and PyDESeq2 to pre-process the data and compute differential expression between control and treatment.

# Obtain genes that pass the thresholds
dc.pp.filter_by_expr(
    adata=adata,
    group="condition",
    min_count=10,
    min_total_count=15,
    large_n=10,
    min_prop=0.7,
)
adata
AnnData object with n_obs × n_vars = 6 × 19510
    obs: 'condition', 'sample_id'
# Build DESeq2 object
inference = DefaultInference(n_cpus=8)
dds = DeseqDataSet(
    adata=adata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

# Compute LFCs
dds.deseq2()

# Extract contrast between conditions
stat_res = DeseqStats(dds, contrast=["condition", "treatment", "control"], inference=inference)

# Compute Wald test
stat_res.summary()
Using None as control genes, passed at DeseqDataSet initialization
Log2 fold change & Wald test p-value: condition treatment vs control
                  baseMean  log2FoldChange     lfcSE       stat  \
RP11-347I19.8    88.877997       -0.207119  0.252510  -0.820242   
TMEM170B        103.807801       -0.276960  0.210533  -1.315516   
RP11-91P24.6    185.514929       -0.305844  0.180907  -1.690620   
SLC33A1        2559.269047        0.595912  0.085251   6.990075   
SNORA61          66.168502        0.746772  0.266416   2.803032   
...                    ...             ...       ...        ...   
RP11-75I2.3     771.828248        3.812419  0.128104  29.760449   
SLC25A24       3525.523888        0.651017  0.087598   7.431841   
BTN2A2          674.018191       -0.107905  0.106212  -1.015945   
RP11-101E3.5    194.084339        0.092515  0.181943   0.508484   
RP11-91I8.3      10.761886       -3.804522  0.920080  -4.134989   

                      pvalue           padj  
RP11-347I19.8   4.120781e-01   5.540360e-01  
TMEM170B        1.883365e-01   3.052879e-01  
RP11-91P24.6    9.090941e-02   1.681656e-01  
SLC33A1         2.747400e-12   2.423226e-11  
SNORA61         5.062459e-03   1.380028e-02  
...                      ...            ...  
RP11-75I2.3    1.270300e-194  4.130591e-192  
SLC25A24        1.070963e-13   1.049974e-12  
BTN2A2          3.096555e-01   4.491264e-01  
RP11-101E3.5    6.111136e-01   7.290020e-01  
RP11-91I8.3     3.549714e-05   1.471318e-04  

[19510 rows x 6 columns]
results_df = stat_res.results_df
results_df
baseMean log2FoldChange lfcSE stat pvalue padj
RP11-347I19.8 88.877997 -0.207119 0.252510 -0.820242 4.120781e-01 5.540360e-01
TMEM170B 103.807801 -0.276960 0.210533 -1.315516 1.883365e-01 3.052879e-01
RP11-91P24.6 185.514929 -0.305844 0.180907 -1.690620 9.090941e-02 1.681656e-01
SLC33A1 2559.269047 0.595912 0.085251 6.990075 2.747400e-12 2.423226e-11
SNORA61 66.168502 0.746772 0.266416 2.803032 5.062459e-03 1.380028e-02
... ... ... ... ... ... ...
RP11-75I2.3 771.828248 3.812419 0.128104 29.760449 1.270300e-194 4.130591e-192
SLC25A24 3525.523888 0.651017 0.087598 7.431841 1.070963e-13 1.049974e-12
BTN2A2 674.018191 -0.107905 0.106212 -1.015945 3.096555e-01 4.491264e-01
RP11-101E3.5 194.084339 0.092515 0.181943 0.508484 6.111136e-01 7.290020e-01
RP11-91I8.3 10.761886 -3.804522 0.920080 -4.134989 3.549714e-05 1.471318e-04

19510 rows × 6 columns

data = results_df[["stat"]].T.rename(index={"stat": "treatment.vs.control"})
data
RP11-347I19.8 TMEM170B RP11-91P24.6 SLC33A1 SNORA61 THAP9-AS1 LIX1L TTC8 WBSCR22 LPHN2 ... STARD4-AS1 ZNF845 NIPSNAP3B ARL6IP5 MATN1-AS1 RP11-75I2.3 SLC25A24 BTN2A2 RP11-101E3.5 RP11-91I8.3
treatment.vs.control -0.820242 -1.315516 -1.69062 6.990075 2.803032 1.531275 2.164138 -0.579879 -0.945672 -14.351588 ... 14.484214 0.56727 -1.883272 -8.42519 -1.739921 29.760449 7.431841 -1.015945 0.508484 -4.134989

1 rows × 19510 columns

Prior knowledge with Decoupler and Omnipath#

# Retrieve CollecTRI gene regulatory network (through Omnipath).
# These are regulons that we will use for enrichment tests
# to infer TF activities
collectri = dc.op.collectri(organism="human")
collectri
source target weight resources references sign_decision
0 MYC TERT 1.0 DoRothEA-A;ExTRI;HTRI;NTNU.Curated;Pavlidis202... 10022128;10491298;10606235;10637317;10723141;1... PMID
1 SPI1 BGLAP 1.0 ExTRI 10022617 default activation
2 SMAD3 JUN 1.0 ExTRI;NTNU.Curated;TFactS;TRRUST 10022869;12374795 PMID
3 SMAD4 JUN 1.0 ExTRI;NTNU.Curated;TFactS;TRRUST 10022869;12374795 PMID
4 STAT5A IL2 1.0 ExTRI 10022878;11435608;17182565;17911616;22854263;2... default activation
... ... ... ... ... ... ...
42985 NFKB hsa-miR-143-3p 1.0 ExTRI 19472311 default activation
42986 AP1 hsa-miR-206 1.0 ExTRI;GEREDB;NTNU.Curated 19721712 PMID
42987 NFKB hsa-miR-21 1.0 ExTRI 20813833;22387281 default activation
42988 NFKB hsa-miR-224-5p 1.0 ExTRI 23474441;23988648 default activation
42989 AP1 hsa-miR-144-3p 1.0 ExTRI 23546882 default activation

42990 rows × 6 columns

# Run
tf_acts, tf_padj = dc.mt.ulm(data=data, net=collectri)

# Keep only TFs with significant adjusted p-values (padj < 0.05)
msk = (tf_padj.T < 0.05).iloc[:, 0]
tf_acts = tf_acts.loc[:, msk]

tf_acts
APEX1 ARID1A ARID4B ARNT ASXL1 ATF6 BARX2 BCL11A BCL11B BMAL2 ... TBX10 TCF21 TFAP2C TP53 VEZF1 WWTR1 ZBTB33 ZEB2 ZNF804A ZXDC
treatment.vs.control -2.65907 -3.917428 -2.808733 3.174004 3.747021 2.707775 5.35855 2.649554 -4.204618 3.869672 ... -3.705089 6.163829 -2.579863 -3.571649 4.208695 -3.968034 3.238391 5.791132 3.366525 -2.734602

1 rows × 143 columns

dc.pl.barplot(data=tf_acts, name="treatment.vs.control", top=25, figsize=(3, 5))
../_images/3cd9175d4f3ebf7e7c6c51b095a59c19be3d33c901d30d4bf1e0f5a2b1606c54.png
# We obtain ligand-receptor interactions from Omnipath, and we keep only the receptors
# This is our a priori list of candidate receptors from which we will infer the network
unique_receptors = set(op.interactions.LigRecExtra.get(genesymbols=True)["target_genesymbol"].values.tolist())
len(unique_receptors)
1201
df_de_receptors = results_df.loc[results_df.index.intersection(unique_receptors)]
df_de_receptors = df_de_receptors[df_de_receptors.pvalue <= 1e-3]
df_de_receptors = df_de_receptors.sort_values(by="stat", ascending=False)
df_de_receptors
baseMean log2FoldChange lfcSE stat pvalue padj
CDH6 28267.981856 3.020228 0.077434 39.003986 0.000000e+00 0.000000e+00
CRLF1 7921.375581 3.342105 0.088004 37.976732 0.000000e+00 0.000000e+00
FZD8 2720.368078 3.843472 0.102667 37.436438 9.984600e-307 1.025261e-303
CDH2 7112.916583 2.809906 0.081906 34.306464 6.285559e-258 3.606802e-255
DYSF 441.551809 4.593718 0.182110 25.225001 2.130572e-140 3.711381e-138
... ... ... ... ... ... ...
TNFRSF1B 646.930651 -3.540723 0.137121 -25.821870 5.038067e-147 9.542979e-145
CHRM2 5807.106139 -2.477402 0.086007 -28.804799 1.867752e-182 5.521189e-180
TNFRSF19 4210.866189 -3.038970 0.101233 -30.019698 5.430089e-198 1.858615e-195
LPAR1 12716.045776 -2.581318 0.070856 -36.430237 1.414312e-290 1.103729e-287
SMAD3 13965.722982 -3.073656 0.072182 -42.582053 0.000000e+00 0.000000e+00

283 rows × 6 columns

dc.pl.volcano(
    data=df_de_receptors,
    x="log2FoldChange",
    y="padj",
    top=20,
)
../_images/343bfb791c3f33ef6b5748cd9651501b46a8b449199d4fb9ad53151ba0eb7148.png
# We take the top 20 receptors whose expression increased after treatment
df_top_receptors = df_de_receptors.head(20)
df_top_receptors
baseMean log2FoldChange lfcSE stat pvalue padj
CDH6 28267.981856 3.020228 0.077434 39.003986 0.000000e+00 0.000000e+00
CRLF1 7921.375581 3.342105 0.088004 37.976732 0.000000e+00 0.000000e+00
FZD8 2720.368078 3.843472 0.102667 37.436438 9.984600e-307 1.025261e-303
CDH2 7112.916583 2.809906 0.081906 34.306464 6.285559e-258 3.606802e-255
DYSF 441.551809 4.593718 0.182110 25.225001 2.130572e-140 3.711381e-138
INHBA 24333.979039 2.599031 0.104354 24.905991 6.407468e-137 1.041748e-134
CELSR1 492.430456 3.517760 0.154404 22.782885 6.777783e-115 8.815637e-113
PDGFC 5482.100375 1.895284 0.083804 22.615764 3.032358e-113 3.917967e-111
VDR 1450.584956 2.555486 0.116713 21.895517 2.866339e-106 3.389228e-104
HHIP 379.147209 6.006281 0.287273 20.907896 4.538088e-97 4.426905e-95
IGF1 361.270588 6.463104 0.311704 20.734725 1.684142e-95 1.595030e-93
NPTN 12642.896518 1.419111 0.079336 17.887451 1.477122e-71 9.062470e-70
EFNB2 3320.622767 1.560757 0.087882 17.759772 1.448216e-70 8.640581e-69
ITGB3 6531.344110 1.365175 0.078855 17.312493 3.786612e-67 2.069378e-65
TGFB1 8949.673773 1.335719 0.077680 17.195118 2.888671e-66 1.552561e-64
IL21R 259.279854 3.228392 0.191940 16.819816 1.747011e-63 8.876088e-62
ITGA1 26227.719647 1.310776 0.079834 16.418868 1.401577e-60 6.653227e-59
FAP 7232.349199 1.751321 0.115213 15.200707 3.498272e-52 1.392883e-50
EGF 144.293093 3.733752 0.251374 14.853364 6.616320e-50 2.458751e-48
ITGA11 12473.737723 3.673720 0.250064 14.691129 7.347932e-49 2.630425e-47

Inferring intracellular signalling network with CARNIVAL and CORNETO#

CORNETO is a unified framework for knowledge-driven network inference. It includes a very flexible implementation of CARNIVAL that expands its original capabilities. We will see how to use it under different assumptions to extract a network from a prior knowledge network (PKN), a set of candidate receptors, and our estimated TF activities.

cn.info()
Installed version: v1.0.0.dev37+db611743
Available backends: CVXPY v1.6.5
Default backend (corneto.opt): CVXPY
Installed solvers: CLARABEL, GUROBI, HIGHS, OSQP, SCIP, SCIPY, SCS
Graphviz version: v0.20.3
Installed path: /Users/pablorodriguezmier/Documents/work/repos/pablormier/corneto/corneto
Repository: https://github.com/saezlab/corneto
from corneto.methods.future import CarnivalFlow

CarnivalFlow.show_references()
  • Pablo Rodriguez-Mier, Martin Garrido-Rodriguez, Attila Gabor, Julio Saez-Rodriguez. Unified knowledge-driven network inference from omics data. bioRxiv, pp. 10 (2024).
  • Anika Liu, Panuwat Trairatphisan, Enio Gjerga, Athanasios Didangelos, Jonathan Barratt, Julio Saez-Rodriguez. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL. NPJ systems biology and applications, Vol. 5, No. 1, pp. 40 (2019).
# We get only interactions from SIGNOR http://signor.uniroma2.it/
pkn = op.interactions.OmniPath.get(databases=["SIGNOR"], genesymbols=True)
pkn = pkn[pkn.consensus_direction == True]
pkn.head()
source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition curation_effort references sources n_sources n_primary_sources n_references references_stripped
0 Q13976 Q13507 PRKG1 TRPC3 True False True True False True 9 HPRD:14983059;KEA:14983059;ProtMapper:14983059... HPRD;HPRD_KEA;HPRD_MIMP;KEA;MIMP;PhosphoPoint;... 15 8 2 14983059;16331690
1 Q13976 Q9HCX4 PRKG1 TRPC7 True True False True True False 3 SIGNOR:21402151;TRIP:21402151;iPTMnet:21402151 SIGNOR;TRIP;iPTMnet 3 3 1 21402151
2 Q13438 Q9HBA0 OS9 TRPV4 True True True True True True 3 HPRD:17932042;SIGNOR:17932042;TRIP:17932042 HPRD;SIGNOR;TRIP 3 3 1 17932042
3 P18031 Q9H1D0 PTPN1 TRPV6 True False True True False True 11 DEPOD:15894168;DEPOD:17197020;HPRD:15894168;In... DEPOD;HPRD;IntAct;Lit-BM-17;SIGNOR;SPIKE_LC;TRIP 7 6 2 15894168;17197020
4 P63244 Q9BX84 RACK1 TRPM6 True False True True False True 2 SIGNOR:18258429;TRIP:18258429 SIGNOR;TRIP 2 2 1 18258429
pkn["interaction"] = pkn["is_stimulation"].astype(int) - pkn["is_inhibition"].astype(int)
sel_pkn = pkn[["source_genesymbol", "interaction", "target_genesymbol"]]
sel_pkn
source_genesymbol interaction target_genesymbol
0 PRKG1 -1 TRPC3
1 PRKG1 1 TRPC7
2 OS9 0 TRPV4
3 PTPN1 -1 TRPV6
4 RACK1 -1 TRPM6
... ... ... ...
61542 APC_PUF60_SIAH1_SKP1_TBL1X -1 CTNNB1
61543 MAP2K6 1 MAPK10
61544 PRKAA1 1 TP53
61545 CNOT1_CNOT10_CNOT11_CNOT2_CNOT3_CNOT4_CNOT6_CN... -1 NANOS2
61546 WNT16 1 FZD3

60921 rows × 3 columns

# We create the CORNETO graph by importing the edges and interaction
from corneto.io import load_graph_from_sif_tuples

G = load_graph_from_sif_tuples([(r[0], r[1], r[2]) for _, r in sel_pkn.iterrows() if r[1] != 0])
G.shape
(5442, 60034)
# As measurements, we take the estimated TF activities; we filter out TFs with adjusted p-value > 0.001
significant_tfs = tf_acts[tf_padj <= 0.001].T.dropna().sort_values(by="treatment.vs.control", ascending=False)
significant_tfs
treatment.vs.control
SRF 7.386495
MYOCD 7.347742
SMAD3 7.088954
TCF21 6.163829
ZEB2 5.791132
SMAD2 5.618762
JUNB 5.615222
FOXD1 5.569866
SOX5 5.389520
BARX2 5.358550
HMGA2 4.922228
POU3F1 4.907625
RORA 4.752798
SMAD4 4.444091
OVOL1 4.283683
MEF2A 4.237091
FGF2 4.210589
VEZF1 4.208695
SMAD5 -4.006207
CIITA -4.135701
NFKBIB -4.156185
REL -4.166855
PITX3 -4.200417
BCL11B -4.204618
IRF3 -4.274769
PITX1 -4.343406
PARK7 -4.447971
KLF8 -4.469644
DNMT3A -4.642777
RELB -4.684388
NFE2L2 -4.912650
MECOM -4.945147
KLF2 -5.097029
STAT1 -5.190882
HOXC6 -5.261961
IRF2 -5.272425
KLF11 -5.272849
SPI1 -5.424463
NR4A3 -5.726818
NR1H4 -6.058745
SKIL -7.437972
IRF1 -9.363009
# We keep only the ones in the PKN graph
measurements = significant_tfs.loc[significant_tfs.index.intersection(G.V)].to_dict()["treatment.vs.control"]
measurements
{'SRF': 7.386495312851475,
 'MYOCD': 7.3477418488593,
 'SMAD3': 7.088953922116064,
 'ZEB2': 5.7911322785963435,
 'SMAD2': 5.618762357005816,
 'JUNB': 5.615222387859854,
 'HMGA2': 4.922227935739035,
 'POU3F1': 4.907624790924594,
 'RORA': 4.752798433353922,
 'SMAD4': 4.444090714736604,
 'MEF2A': 4.237091117534177,
 'FGF2': 4.210588858175027,
 'SMAD5': -4.006207147710557,
 'CIITA': -4.1357011230543845,
 'NFKBIB': -4.156185003186268,
 'REL': -4.1668552776199554,
 'PITX3': -4.20041706737378,
 'IRF3': -4.274768908592069,
 'PITX1': -4.3434055546060915,
 'KLF8': -4.469643880727271,
 'DNMT3A': -4.64277743087254,
 'RELB': -4.684387581686191,
 'NFE2L2': -4.912649636784848,
 'MECOM': -4.945146818217912,
 'STAT1': -5.190882487350966,
 'HOXC6': -5.261961030902404,
 'IRF2': -5.272425389933752,
 'KLF11': -5.272848781428293,
 'SPI1': -5.424462691953871,
 'NR4A3': -5.726818220249099,
 'NR1H4': -6.058745290814488,
 'SKIL': -7.437972448180294,
 'IRF1': -9.363009055573007}
# We will infer the direction of the inputs, so we assign them a value of 0 (unknown direction)
inputs = {k: 0 for k in df_top_receptors.index.intersection(G.V).values}
inputs
{'CDH6': 0,
 'CRLF1': 0,
 'FZD8': 0,
 'CDH2': 0,
 'INHBA': 0,
 'VDR': 0,
 'IGF1': 0,
 'EFNB2': 0,
 'ITGB3': 0,
 'TGFB1': 0,
 'IL21R': 0,
 'EGF': 0,
 'ITGA11': 0}
# Create the dataset in standard format
carnival_data = dict()
for inp, v in inputs.items():
    carnival_data[inp] = dict(value=v, role="input", mapping="vertex")
for out, v in measurements.items():
    carnival_data[out] = dict(value=v, role="output", mapping="vertex")
data = cn.Data.from_cdict({"sample1": carnival_data})
data
Data(n_samples=1, n_feats=[46])

CARNIVAL#

We will now use the standard CARNIVAL formulation with a regularization penalty of 0.5. Typically, we would need additional data or information to select the optimal value of this parameter. For hypothesis exploration, we can try different values and manually inspect the solutions.
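
As a rough guide, the following optional sketch (using the graph G and data object built above) scans a few values of lambda_reg and records the size of each solution. Each run solves a full MILP, so it can take several minutes.

# Optional sketch: scan a few regularization strengths and compare network sizes
for lam in (0.1, 0.5, 1.0, 2.0):
    c = CarnivalFlow(lambda_reg=lam)
    prob = c.build(G, data)
    prob.solve(solver="highs", max_seconds=max_time)
    n_edges = int((np.abs(prob.expr.edge_value.value) > 0.5).sum())
    print(f"lambda_reg={lam}: {n_edges} selected edges")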

carnival = CarnivalFlow(lambda_reg=0.5)
P = carnival.build(G, data)
P.expr
Unreachable vertices for sample: 0
{'_flow': _flow: Variable((3279,), _flow),
 'edge_activates': edge_activates: Variable((3279, 1), edge_activates, boolean=True),
 'edge_inhibits': edge_inhibits: Variable((3279, 1), edge_inhibits, boolean=True),
 'const0x94da39a381f04fe': const0x94da39a381f04fe: Constant(CONSTANT, NONNEGATIVE, (947, 3279)),
 '_dag_layer': _dag_layer: Variable((947, 1), _dag_layer),
 'const0x1aa7a82deb9aa0b6': const0x1aa7a82deb9aa0b6: Constant(CONSTANT, NONNEGATIVE, (947, 3279)),
 'flow': _flow: Variable((3279,), _flow),
 'vertex_value': Expression(AFFINE, UNKNOWN, (947, 1)),
 'vertex_activated': Expression(AFFINE, NONNEGATIVE, (947, 1)),
 'vertex_inhibited': Expression(AFFINE, NONNEGATIVE, (947, 1)),
 'edge_value': Expression(AFFINE, UNKNOWN, (3279, 1)),
 'edge_has_signal': Expression(AFFINE, NONNEGATIVE, (3279, 1)),
 'vertex_max_depth': _dag_layer: Variable((947, 1), _dag_layer)}
P.solve(solver="highs", max_seconds=60, verbosity=1);
===============================================================================
                                     CVXPY                                     
                                     v1.6.5                                    
===============================================================================
(CVXPY) Jun 11 06:19:10 PM: Your problem has 10784 variables, 30852 constraints, and 1 parameters.
(CVXPY) Jun 11 06:19:10 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Jun 11 06:19:10 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Jun 11 06:19:10 PM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:10 PM: Compiling problem (target solver=HIGHS).
(CVXPY) Jun 11 06:19:10 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> HIGHS
(CVXPY) Jun 11 06:19:10 PM: Applying reduction Dcp2Cone
(CVXPY) Jun 11 06:19:10 PM: Applying reduction CvxAttr2Constr
(CVXPY) Jun 11 06:19:10 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Jun 11 06:19:10 PM: Applying reduction HIGHS
(CVXPY) Jun 11 06:19:10 PM: Finished problem compilation (took 6.457e-02 seconds).
(CVXPY) Jun 11 06:19:10 PM: (Subsequent compilations of this problem, using the same arguments, should take less time.)
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:10 PM: Invoking solver HIGHS  to obtain a solution.
Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 30852 rows; 10784 cols; 108719 nonzeros; 6558 integer variables (6558 binary)
Coefficient ranges:
  Matrix [1e+00, 9e+02]
  Cost   [5e-01, 8e+00]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+03]
Presolving model
17921 rows, 10381 cols, 91785 nonzeros  0s
13133 rows, 9253 cols, 70701 nonzeros  0s
12754 rows, 9174 cols, 70260 nonzeros  0s

Solving MIP model with:
   12754 rows
   9174 cols (5768 binary, 0 integer, 0 implied int., 3406 continuous, 0 domain fixed)
   70260 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 J       0       0         0   0.00%   -inf            0                  Large        0      0      0         0     0.4s
         0       0         0   0.00%   -100.0153165    0                  Large        0      0      0      1834     0.5s
 L       0       0         0   0.00%   -97.38159462    -90.88159462       7.15%     1663     70      0      3162     4.0s

1.5% inactive integer columns, restarting
Model after restart has 12621 rows, 9052 cols (5650 bin., 0 int., 0 impl., 3402 cont., 0 dom.fix.), and 69099 nonzeros

         0       0         0   0.00%   -97.38159462    -90.88159462       7.15%       27      0      0     12413     6.0s
         0       0         0   0.00%   -97.38159462    -90.88159462       7.15%       27     19     21     13585     6.1s
 L       0       0         0   0.00%   -97.38159462    -96.88159462       0.52%      365     58     21     13760     8.1s

11.6% inactive integer columns, restarting
Model after restart has 10990 rows, 8004 cols (4780 bin., 0 int., 0 impl., 3224 cont., 0 dom.fix.), and 57133 nonzeros

         0       0         0   0.00%   -97.38159462    -96.88159462       0.52%       27      0      0     18245     8.3s
         0       0         0   0.00%   -97.38159462    -96.88159462       0.52%       27     16      5     19546     8.3s

Symmetry detection completed in 0.0s
Found 19 generator(s)

        34       1        16  46.09%   -97.38159462    -96.88159462       0.52%      603     27   1383     43341    13.5s

Restarting search from the root node
Model after restart has 869 rows, 2591 cols (75 bin., 0 int., 0 impl., 2516 cont., 0 dom.fix.), and 6771 nonzeros

        50       0         0   0.00%   -97.37842673    -96.88159462       0.51%       10      0      0     55711    16.4s
        51       0         1 100.00%   -96.88159462    -96.88159462       0.00%       10      0      0     56356    16.4s

Solving report
  Status            Optimal
  Primal bound      -96.8815946237
  Dual bound        -96.8815946237
  Gap               0% (tolerance: 0.01%)
  P-D integral      0.338566345924
  Solution status   feasible
                    -96.8815946237 (objective)
                    0 (bound viol.)
                    2.22044604925e-16 (int. viol.)
                    0 (row viol.)
  Timing            16.42 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 8
  Nodes             51
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     56356 (total)
                    28833 (strong br.)
                    1751 (separation)
                    13653 (heuristics)
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:27 PM: Problem status: optimal
(CVXPY) Jun 11 06:19:27 PM: Optimal value: 2.650e+01
(CVXPY) Jun 11 06:19:27 PM: Compilation took 6.457e-02 seconds
(CVXPY) Jun 11 06:19:27 PM: Solver (including time spent in interface) took 1.643e+01 seconds
# The error for the sample is 0, meaning that all TFs reachable from the receptors are correctly explained
for o in P.objectives:
    print(o.name, "->", o.value)
error_sample1_0 -> 1.050898412285121e-15
regularization_edge_has_signal -> 53.0
# We extract the selected edges
sol_edges = np.flatnonzero(np.abs(P.expr.edge_value.value) > 0.5)
carnival.processed_graph.plot_values(
    vertex_values=P.expr.vertex_value.value,
    edge_values=P.expr.edge_value.value,
    edge_indexes=sol_edges,
)
../_images/a2e40127fd28fbe18a04e547913cfeda8b0c3f028fd9efc3a90dd9f72eb82137.svg
# Extracting the solution graph
G_sol = carnival.processed_graph.edge_subgraph(sol_edges)
G_sol.shape
(53, 53)
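
As a quick sanity check, the small sketch below (using objects already defined above) verifies which measured TFs and candidate receptors are present in the solution subgraph:

# Sketch: vertices of the solution subgraph vs. our inputs and measurements
sol_vertices = set(G_sol.V)
print("TFs in solution:", sorted(sol_vertices & set(measurements)))
print("Receptors in solution:", sorted(sol_vertices & set(inputs)))
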
pd.DataFrame(
    P.expr.vertex_value.value,
    index=carnival.processed_graph.V,
    columns=["node_activity"],
)
node_activity
PTPRC 0.0
GEM 0.0
MYO10 0.0
ARHGEF12 0.0
TP53RK 0.0
... ...
NUAK1 0.0
CAMKK2 0.0
P0C725_UBB 0.0
CSNK2B 0.0
ANAPC1_ANAPC10_ANAPC11_ANAPC13_ANAPC2_ANAPC4_ANAPC5_ANAPC7_CDC16_CDC23_CDC26_CDC27 0.0

947 rows × 1 columns

pd.DataFrame(P.expr.edge_value.value, index=carnival.processed_graph.E, columns=["edge_activity"])
edge_activity
(SMAD3) (MYOD1) 0.0
(GRK2) (BDKRB2) 0.0
(MAPK14) (MAPKAPK2) 0.0
(DEPTOR_EEF1A1_MLST8_MTOR_PRR5_RICTOR) (FBXW8) 0.0
(SLK) (MAP3K5) 0.0
... ... ...
() (FZD8) 0.0
(IL21R) 0.0
(EGF) 0.0
(TGFB1) 0.0
(CDH2) 0.0

3279 rows × 1 columns
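
Since most of the 3279 edges have a value of 0, a small sketch like the one below (reusing the sol_edges indices computed above) restricts the table to the selected edges and sorts them by their inferred activity:

# Sketch: keep only the edges selected in the solution, sorted by inferred activity
df_sol_edges = pd.DataFrame(
    P.expr.edge_value.value,
    index=carnival.processed_graph.E,
    columns=["edge_activity"],
).iloc[sol_edges]
df_sol_edges.sort_values("edge_activity")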

Changing modelling assumptions#

Thanks to CORNETO’s modelling capabilities, we can modify the optimization problem to add more knowledge, new objectives or different constraints. As an example, we are going to penalize the use of inhibitory interactions from the PKN to simplify the networks reconstructed by CARNIVAL.

edge_interactions = np.array(carnival.processed_graph.get_attr_from_edges("interaction"))
penalties = np.zeros_like(edge_interactions)
# We add a large penalty to each interaction which is inhibitory (-1)
penalties[edge_interactions == -1] = 10
penalties.shape
(3279,)
# We use the edge_has_signal variable (1 if the edge is selected to propagate signal, 0 otherwise)
# and multiply it by the penalties
(P.expr.edge_has_signal.T @ penalties).shape
(1,)
P.add_objectives(P.expr.edge_has_signal.T @ penalties)
P.solve(solver="highs", max_seconds=60, verbosity=1);
===============================================================================
                                     CVXPY                                     
                                     v1.6.5                                    
===============================================================================
(CVXPY) Jun 11 06:19:27 PM: Your problem has 10784 variables, 30852 constraints, and 1 parameters.
(CVXPY) Jun 11 06:19:27 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Jun 11 06:19:27 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Jun 11 06:19:27 PM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:27 PM: Compiling problem (target solver=HIGHS).
(CVXPY) Jun 11 06:19:27 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> HIGHS
(CVXPY) Jun 11 06:19:27 PM: Applying reduction Dcp2Cone
(CVXPY) Jun 11 06:19:27 PM: Applying reduction CvxAttr2Constr
(CVXPY) Jun 11 06:19:27 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Jun 11 06:19:27 PM: Applying reduction HIGHS
(CVXPY) Jun 11 06:19:27 PM: Finished problem compilation (took 5.865e-02 seconds).
(CVXPY) Jun 11 06:19:27 PM: (Subsequent compilations of this problem, using the same arguments, should take less time.)
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:27 PM: Invoking solver HIGHS  to obtain a solution.
Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 30852 rows; 10784 cols; 108719 nonzeros; 6558 integer variables (6558 binary)
Coefficient ranges:
  Matrix [1e+00, 9e+02]
  Cost   [5e-01, 2e+01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+03]
Presolving model
17921 rows, 10381 cols, 91785 nonzeros  0s
13026 rows, 9212 cols, 70234 nonzeros  0s
12645 rows, 9131 cols, 69798 nonzeros  0s

Solving MIP model with:
   12645 rows
   9131 cols (5727 binary, 0 integer, 0 implied int., 3404 continuous, 0 domain fixed)
   69798 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 J       0       0         0   0.00%   -inf            0                  Large        0      0      0         0     0.4s
         0       0         0   0.00%   -80.01008213    0                  Large        0      0      0      1331     0.5s
 C       0       0         0   0.00%   -72.82817579    -2.006207148    3530.14%      356     21      0      1844     1.2s
 L       0       0         0   0.00%   -72.41150912    -72.41150912       0.00%     1082     50      0      2307     2.5s
         1       0         1 100.00%   -72.41150912    -72.41150912       0.00%     1082     50      0      4414     2.5s

Solving report
  Status            Optimal
  Primal bound      -72.411509124
  Dual bound        -72.411509124
  Gap               0% (tolerance: 0.01%)
  P-D integral      46.7809634959
  Solution status   feasible
                    -72.411509124 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            2.53 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 5
  Nodes             1
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     4414 (total)
                    0 (strong br.)
                    976 (separation)
                    2107 (heuristics)
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:30 PM: Problem status: optimal
(CVXPY) Jun 11 06:19:30 PM: Optimal value: 5.097e+01
(CVXPY) Jun 11 06:19:30 PM: Compilation took 5.865e-02 seconds
(CVXPY) Jun 11 06:19:30 PM: Solver (including time spent in interface) took 2.544e+00 seconds
# Note that the network now has a larger error (many TFs are no longer explained)
for o in P.objectives:
    print(o.value)
26.970085499734868
48.0
[0.]
# We extract the selected edges. The network is simpler,
# and it again shows some activation of CDH2 and inhibition of IGF1
sol_edges = np.flatnonzero(np.abs(P.expr.edge_value.value) > 0.5)
carnival.processed_graph.plot_values(
    vertex_values=P.expr.vertex_value.value,
    edge_values=P.expr.edge_value.value,
    edge_indexes=sol_edges,
)
../_images/aebeead010ccc046a49ec68b0f4bb04c1f69bf064a8af0ac293906bc563827e7.svg

Other options#

We included an option to simplify the networks inferred by CARNIVAL by penalizing the selection of indirect rules. These rules are:

  • A -> B is in the PKN and we observe that B is downregulated: it is explained by downregulation of its upstream gene A

  • A -| B is in the PKN and we observe that B is upregulated: it is also explained by downregulation of A

NOTE: This is an experimental feature and was not benchmarked in the original CARNIVAL paper.
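
To make these two rules concrete, here is a toy, self-contained sketch (hypothetical edges and observed signs, independent of CORNETO) that flags which edges would be considered indirect:

# Toy example of the indirect rules: a sign mismatch between the interaction
# and the observed target sign means the target is explained by inhibiting the source
toy_pkn = [("A", 1, "B"), ("C", -1, "D"), ("E", 1, "F")]  # hypothetical edges
observed_sign = {"B": -1, "D": 1, "F": 1}  # hypothetical observed target signs
for src, interaction, tgt in toy_pkn:
    indirect = interaction * observed_sign[tgt] == -1
    arrow = "->" if interaction == 1 else "-|"
    print(f"{src} {arrow} {tgt}: {'indirect' if indirect else 'direct'} rule")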

carnival = CarnivalFlow(lambda_reg=0.5, indirect_rule_penalty=10)
P = carnival.build(G, data)
P.expr
Unreachable vertices for sample: 0
{'_flow': _flow: Variable((3279,), _flow),
 'edge_inhibits': edge_inhibits: Variable((3279, 1), edge_inhibits, boolean=True),
 'edge_activates': edge_activates: Variable((3279, 1), edge_activates, boolean=True),
 'const0x1735def7420c5858': const0x1735def7420c5858: Constant(CONSTANT, NONNEGATIVE, (947, 3279)),
 '_dag_layer': _dag_layer: Variable((947, 1), _dag_layer),
 'const0xb46ccbceabe3b17': const0xb46ccbceabe3b17: Constant(CONSTANT, NONNEGATIVE, (947, 3279)),
 'flow': _flow: Variable((3279,), _flow),
 'vertex_value': Expression(AFFINE, UNKNOWN, (947, 1)),
 'vertex_activated': Expression(AFFINE, NONNEGATIVE, (947, 1)),
 'vertex_inhibited': Expression(AFFINE, NONNEGATIVE, (947, 1)),
 'edge_value': Expression(AFFINE, UNKNOWN, (3279, 1)),
 'edge_has_signal': Expression(AFFINE, NONNEGATIVE, (3279, 1)),
 'vertex_max_depth': _dag_layer: Variable((947, 1), _dag_layer)}
P.solve(solver="highs", max_seconds=60, verbosity=1);
===============================================================================
                                     CVXPY                                     
                                     v1.6.5                                    
===============================================================================
(CVXPY) Jun 11 06:19:30 PM: Your problem has 10784 variables, 30852 constraints, and 1 parameters.
(CVXPY) Jun 11 06:19:30 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Jun 11 06:19:30 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Jun 11 06:19:30 PM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:30 PM: Compiling problem (target solver=HIGHS).
(CVXPY) Jun 11 06:19:30 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> HIGHS
(CVXPY) Jun 11 06:19:30 PM: Applying reduction Dcp2Cone
(CVXPY) Jun 11 06:19:30 PM: Applying reduction CvxAttr2Constr
(CVXPY) Jun 11 06:19:30 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Jun 11 06:19:30 PM: Applying reduction HIGHS
(CVXPY) Jun 11 06:19:30 PM: Finished problem compilation (took 7.756e-02 seconds).
(CVXPY) Jun 11 06:19:30 PM: (Subsequent compilations of this problem, using the same arguments, should take less time.)
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:30 PM: Invoking solver HIGHS  to obtain a solution.
Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 30852 rows; 10784 cols; 108719 nonzeros; 6558 integer variables (6558 binary)
Coefficient ranges:
  Matrix [1e+00, 9e+02]
  Cost   [5e-01, 2e+01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+03]
Presolving model
17921 rows, 10381 cols, 91785 nonzeros  0s
12937 rows, 9173 cols, 69770 nonzeros  0s
12555 rows, 9092 cols, 69332 nonzeros  0s

Solving MIP model with:
   12555 rows
   9092 cols (5685 binary, 0 integer, 0 implied int., 3407 continuous, 0 domain fixed)
   69332 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 J       0       0         0   0.00%   -inf            0                  Large        0      0      0         0     0.4s
         0       0         0   0.00%   -54.39752097    0                  Large        0      0     16       912     0.5s
 L       0       0         0   0.00%   -49.48644796    -48.98644796       1.02%      896     48     35      1408     3.0s

62.0% inactive integer columns, restarting
Model after restart has 4380 rows, 4101 cols (1192 bin., 0 int., 0 impl., 2909 cont., 0 dom.fix.), and 18621 nonzeros

         0       0         0   0.00%   -49.48644796    -48.98644796       1.02%       16      0      0      6197     3.4s
         0       0         0   0.00%   -49.48644796    -48.98644796       1.02%       16     13      7      6871     3.5s

Symmetry detection completed in 0.0s
Found 10 generator(s)

         5       0         3 100.00%   -48.98644796    -48.98644796       0.00%      726     19     46     13939     4.5s

Solving report
  Status            Optimal
  Primal bound      -48.9864479649
  Dual bound        -48.9864479649
  Gap               0% (tolerance: 0.01%)
  P-D integral      0.0145504962008
  Solution status   feasible
                    -48.9864479649 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            4.46 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 5
  Nodes             5
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     13939 (total)
                    771 (strong br.)
                    770 (separation)
                    10556 (heuristics)
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:35 PM: Problem status: optimal
(CVXPY) Jun 11 06:19:35 PM: Optimal value: 7.440e+01
(CVXPY) Jun 11 06:19:35 PM: Compilation took 7.756e-02 seconds
(CVXPY) Jun 11 06:19:35 PM: Solver (including time spent in interface) took 4.473e+00 seconds
sol_edges = np.flatnonzero(np.abs(P.expr.edge_value.value) > 0.5)
carnival.processed_graph.plot_values(
    vertex_values=P.expr.vertex_value.value,
    edge_values=P.expr.edge_value.value,
    edge_indexes=sol_edges,
)
../_images/4d627a4182090803f9c611b7f105aadb2afcface264d6e5a1471a8b74acbe181.svg

Using information from PROGENy#

In the previous examples, we used only information from TF activities and de-regulated receptors, and we tested different options. However, results can be highly variable, due to uncertainty in the intermediate signalling proteins (downstream of the receptors and upstream of the TFs) and to the existence of multiple alternative paths in the PKN.

Additional information can be used to bias the reconstruction towards more plausible signalling pathways. Here, we will use information from PROGENy, which infers pathway activities from gene expression data.

progeny = dc.op.progeny(organism="human")
progeny
source target weight padj
0 Androgen TMPRSS2 11.490631 2.384806e-47
1 Androgen NKX3-1 10.622551 2.205102e-44
2 Androgen MBOAT2 10.472733 4.632376e-44
3 Androgen KLK2 10.176186 1.944410e-40
4 Androgen SARG 11.386852 2.790210e-40
... ... ... ... ...
62456 p53 ENPP2 2.771405 4.993215e-02
62457 p53 ARRDC4 3.494328 4.996747e-02
62458 p53 MYO1B -1.148057 4.997905e-02
62459 p53 CTSC -1.784693 4.998864e-02
62460 p53 NAA50 -1.435013 4.998884e-02

62461 rows × 4 columns

# We will use the PyDESeq2 Wald statistic (stat) as the score for the genes
de_data = results_df[["stat"]].T.rename(index={"stat": "treatment.vs.control"})
de_data
RP11-347I19.8 TMEM170B RP11-91P24.6 SLC33A1 SNORA61 THAP9-AS1 LIX1L TTC8 WBSCR22 LPHN2 ... STARD4-AS1 ZNF845 NIPSNAP3B ARL6IP5 MATN1-AS1 RP11-75I2.3 SLC25A24 BTN2A2 RP11-101E3.5 RP11-91I8.3
treatment.vs.control -0.820242 -1.315516 -1.69062 6.990075 2.803032 1.531275 2.164138 -0.579879 -0.945672 -14.351588 ... 14.484214 0.56727 -1.883272 -8.42519 -1.739921 29.760449 7.431841 -1.015945 0.508484 -4.134989

1 rows × 19510 columns

# Run
pw_acts, pw_padj = dc.mt.ulm(data=de_data, net=progeny)

# Keep only pathways with significant adjusted p-values (padj < 0.05)
msk = (pw_padj.T < 0.05).iloc[:, 0]
pw_acts = pw_acts.loc[:, msk]

pw_acts
Androgen EGFR Estrogen Hypoxia JAK-STAT MAPK NFkB PI3K TGFb TNFa VEGF p53
treatment.vs.control 4.784375 17.220881 11.01434 5.69894 -14.308169 25.560896 -10.195328 24.644391 59.32932 -9.129273 11.437564 -12.589745
dc.pl.barplot(data=pw_acts, name="treatment.vs.control", top=25, figsize=(3, 3))
../_images/2cf50048be85c256034a32f66c5bfadae711f411a476c04560af44aa5a28c7b9.png

We can see that the TGFb pathway is predicted to be activated, based on the observed downstream changes in gene expression. Now that we have this information, we can bias the networks towards the selection of signalling proteins in that pathway. We first check the up- and down-regulated genes in that pathway using the PROGENy network:

dc.pl.source_targets(data=results_df, x="weight", y="stat", net=progeny, name="TGFb", top=15, figsize=(4, 4))
../_images/59bd9c7ee19f3f70180b948962cae58516b15cb27e3a6a70a76865180cff9ddf.png
df_progeny_with_de = progeny[(progeny.source == "TGFb")].set_index("target").join(results_df, rsuffix="_deseq")
df_progeny_with_de = df_progeny_with_de[(df_progeny_with_de.padj_deseq <= 1e-2) & (df_progeny_with_de.padj <= 1e-2)]

# We remove the TFs, keeping only nodes in the network without an activity score
df_progeny_with_de = df_progeny_with_de.loc[df_progeny_with_de.index.difference(tf_acts.columns)]
# We calculate an importance score based on the absolute weight from the PROGENy network (the coefficient of importance)
# and the absolute DESeq2 stat for that gene in our dataset
df_progeny_with_de["score"] = df_progeny_with_de["weight"].abs() * df_progeny_with_de["stat"].abs()
df_progeny_with_de
source weight padj baseMean log2FoldChange lfcSE stat pvalue padj_deseq score
ABCA6 TGFb -3.271537 3.259776e-08 21.312146 -3.113671 0.577035 -5.395978 6.815130e-08 4.058705e-07 17.653141
ABCA8 TGFb -1.709245 9.264710e-08 1241.936380 -2.025544 0.110001 -18.413859 1.017102e-75 6.681365e-74 31.473798
ABL1 TGFb 1.417354 6.789650e-03 11992.405040 0.654017 0.077665 8.421013 3.732418e-17 4.470195e-16 11.935557
ACOX1 TGFb 1.248375 1.198791e-04 1932.465662 0.501733 0.090449 5.547157 2.903517e-08 1.797196e-07 6.924934
ACOX3 TGFb 2.721529 3.738116e-09 1298.226294 1.249147 0.089501 13.956818 2.859170e-44 9.070311e-43 37.983882
... ... ... ... ... ... ... ... ... ... ...
ZMIZ1 TGFb 1.957381 6.461094e-05 7721.447043 0.596702 0.085546 6.975202 3.054313e-12 2.685428e-11 13.653126
ZNF281 TGFb 1.989687 3.593457e-04 2098.460557 0.586130 0.094963 6.172166 6.736069e-10 4.833421e-09 12.280681
ZNF365 TGFb 1.237670 4.895429e-03 719.298471 2.121362 0.124407 17.051793 3.389758e-65 1.797124e-63 21.104492
ZNF48 TGFb 1.745462 1.500992e-05 638.023400 0.728684 0.105677 6.895368 5.372537e-12 4.650319e-11 12.035601
ZNF592 TGFb 0.740719 5.914911e-03 1654.348595 0.258427 0.088101 2.933306 3.353729e-03 9.545039e-03 2.172756

600 rows × 10 columns

# We will now favor the inclusion of these genes. We
# increase lambda to get smaller, easier-to-interpret networks, and
# we add a penalty for indirect rules.
carnival = CarnivalFlow(lambda_reg=1.5, indirect_rule_penalty=1)
P = carnival.build(G, data)
Unreachable vertices for sample: 0
# We need to select the subset of genes present in the processed graph: after building CARNIVAL with the
# inputs (receptors) and TFs, the processed graph is pruned (unreachable genes are removed), so many
# of the genes we want to score are no longer in the graph
tgfb_genes = df_progeny_with_de.index.intersection(carnival.processed_graph.V).tolist()
len(tgfb_genes)
45
# We can visualize only the genes in the pruned PKN that CARNIVAL will use.
# NOTE: The processed graph is the pruned PKN containing all interactions
# that connect the selected receptors to the TFs, before inferring the signalling
# network
dc.pl.source_targets(
    data=results_df.loc[tgfb_genes], x="weight", y="stat", net=progeny, name="TGFb", top=15, figsize=(4, 4)
)
../_images/6902e78e030e16697622e671876a088a8a08fe0b81269c4e06e7654eed782868.png
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 4))
df_progeny_with_de.loc[tgfb_genes, "score"].sort_values(ascending=False).plot.bar(ax=ax)
fig.suptitle("Importance scores of PROGENY TGFb genes for the given dataset");
../_images/4702995f86e158e5fca583cf58cc1afeb6e0a562c9d82ea388030ee79c35e152.png
# Scale factor controlling the influence of this term in the overall CARNIVAL objective function
influence_factor = 0.1

# Create the custom obj function to include in CARNIVAL
vertex_scores = np.zeros(len(carnival.processed_graph.V))
vertex_selected_var = P.expr.vertex_activated + P.expr.vertex_inhibited
for i, v in enumerate(carnival.processed_graph.V):
    if v in tgfb_genes:
        score = df_progeny_with_de.loc[v].score
        vertex_scores[i] = score
        print(i, v, score)

# The objective is the dot product between the vertex-selection indicator and the scores
obj = vertex_selected_var.T @ vertex_scores

# We add a negative weight so that minimizing the objective favors the selection of high-scoring vertices
P.add_objective(obj, name="progeny_genes_score", weight=-influence_factor)
P.solve(solver="highs", max_seconds=60, verbosity=1);
21 NRAS 9.055651210075709
25 GNAI1 19.972682422333143
27 RAC2 25.75672169584716
44 STK38L 71.06848321367565
84 TGFBR1 19.78318967446914
101 CDH2 112.81029585047202
126 MAP3K4 18.633117866454263
148 SMURF1 5.687767369319471
189 SMAD9 29.108495622484853
208 CAMKK1 9.853578482249983
224 KIT 60.51896993721089
287 RARA 10.473371785649258
288 ULK1 16.78905409561297
295 HES1 27.774456156272613
314 UBE3A 4.198015442492376
324 SH3PXD2A 90.03028888776223
327 NOTCH2 6.0877870556111855
335 IGF1 29.197953109742727
370 NCOR2 5.253273919130532
412 ZMIZ1 13.653126134909561
434 RARG 4.459674733843175
438 IL13RA1 7.412771495771951
470 SPRY1 16.340142523123966
480 TSHZ3 10.746459209540498
533 HBEGF 25.871592750755017
538 LMNA 4.330586475862753
556 CDC25B 24.899175740820784
608 SHC3 11.348956411416047
643 DVL1 8.484789724643731
672 NRBF2 31.751626499871815
700 FN1 103.98952712958402
715 SMAD7 117.45746080491878
730 TGFBR2 45.037266815474084
749 SKI 9.483452356595494
758 GRB10 10.21623996260473
828 BDKRB2 28.78488734209394
881 AKT3 10.323123896941524
883 TGFB1 34.09092982693877
885 MAP3K5 46.20501733040189
919 KSR1 8.971244710059793
922 BMPR2 36.36346180289949
927 OXTR 57.57638716147753
937 FERMT2 57.35922796009637
940 ABL1 11.93555671963558
942 NUAK1 247.3297776135959
===============================================================================
                                     CVXPY                                     
                                     v1.6.5                                    
===============================================================================
(CVXPY) Jun 11 06:19:42 PM: Your problem has 10784 variables, 30852 constraints, and 1 parameters.
(CVXPY) Jun 11 06:19:42 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Jun 11 06:19:42 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Jun 11 06:19:42 PM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:42 PM: Compiling problem (target solver=HIGHS).
(CVXPY) Jun 11 06:19:42 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> HIGHS
(CVXPY) Jun 11 06:19:42 PM: Applying reduction Dcp2Cone
(CVXPY) Jun 11 06:19:42 PM: Applying reduction CvxAttr2Constr
(CVXPY) Jun 11 06:19:42 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Jun 11 06:19:42 PM: Applying reduction HIGHS
(CVXPY) Jun 11 06:19:42 PM: Finished problem compilation (took 6.605e-02 seconds).
(CVXPY) Jun 11 06:19:42 PM: (Subsequent compilations of this problem, using the same arguments, should take less time.)
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:42 PM: Invoking solver HIGHS  to obtain a solution.
Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 30852 rows; 10784 cols; 108719 nonzeros; 6558 integer variables (6558 binary)
Coefficient ranges:
  Matrix [1e+00, 9e+02]
  Cost   [1e-02, 2e+01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+03]
Presolving model
17921 rows, 10381 cols, 91785 nonzeros  0s
13093 rows, 9239 cols, 70459 nonzeros  0s
12721 rows, 9164 cols, 70013 nonzeros  0s

Solving MIP model with:
   12721 rows
   9164 cols (5762 binary, 0 integer, 0 implied int., 3402 continuous, 0 domain fixed)
   70013 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 J       0       0         0   0.00%   -inf            0                  Large        0      0      0         0     0.4s
         0       0         0   0.00%   -117.3104258    0                  Large        0      0      0      1299     0.5s
 C       0       0         0   0.00%   -113.9766863    -16.59196353     586.94%      568     24      0      1571     1.6s
 L       0       0         0   0.00%   -113.5162162    -113.5162162       0.00%      841     49      0      1753     2.4s
         1       0         1 100.00%   -113.5162162    -113.5162162       0.00%      841     49      0      2589     2.4s

Solving report
  Status            Optimal
  Primal bound      -113.51621621
  Dual bound        -113.51621621
  Gap               0% (tolerance: 0.01%)
  P-D integral      4.86057835395
  Solution status   feasible
                    -113.51621621 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            2.39 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 2
  Nodes             1
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     2589 (total)
                    0 (strong br.)
                    454 (separation)
                    836 (heuristics)
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:44 PM: Problem status: optimal
(CVXPY) Jun 11 06:19:44 PM: Optimal value: 9.865e+00
(CVXPY) Jun 11 06:19:44 PM: Compilation took 6.605e-02 seconds
(CVXPY) Jun 11 06:19:44 PM: Solver (including time spent in interface) took 2.406e+00 seconds
for o in P.objectives:
    print(o.name, o.value)
error_sample1_0 32.97458407452457
penalty_indirect_rules_0 [8.]
regularization_edge_has_signal 59.0
progeny_genes_score [1196.09205661]
sol_edges = np.flatnonzero(np.abs(P.expr.edge_value.value) > 0.5)
carnival.processed_graph.plot_values(
    vertex_values=P.expr.vertex_value.value,
    edge_values=P.expr.edge_value.value,
    edge_indexes=sol_edges,
)
../_images/11df107bb031e48f480da8287b6c92d653cca5c6963ae6d9669377a80eab76fc.svg

Advanced#

Finally, we will combine CVXPY with CORNETO to add a custom objective function that uses cvxpy.max to penalize the maximum depth of vertices in the inferred network. Penalizing the depth indirectly increases the number of different receptors that are used.

import cvxpy as cp

carnival = CarnivalFlow(lambda_reg=0, indirect_rule_penalty=10)
P = carnival.build(G, data)
P.add_objective(obj, name="progeny_genes_score", weight=-influence_factor)
P.add_objective(P.expr.vertex_max_depth.apply(cp.max), weight=0.1)
P.solve(solver="highs", verbosity=1)
Unreachable vertices for sample: 0
===============================================================================
                                     CVXPY                                     
                                     v1.6.5                                    
===============================================================================
(CVXPY) Jun 11 06:19:45 PM: Your problem has 17342 variables, 30852 constraints, and 1 parameters.
(CVXPY) Jun 11 06:19:45 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Jun 11 06:19:45 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Jun 11 06:19:45 PM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:45 PM: Compiling problem (target solver=HIGHS).
(CVXPY) Jun 11 06:19:45 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> HIGHS
(CVXPY) Jun 11 06:19:45 PM: Applying reduction Dcp2Cone
(CVXPY) Jun 11 06:19:45 PM: Applying reduction CvxAttr2Constr
(CVXPY) Jun 11 06:19:45 PM: Applying reduction ConeMatrixStuffing
(CVXPY) Jun 11 06:19:45 PM: Applying reduction HIGHS
(CVXPY) Jun 11 06:19:45 PM: Finished problem compilation (took 7.298e-02 seconds).
(CVXPY) Jun 11 06:19:45 PM: (Subsequent compilations of this problem, using the same arguments, should take less time.)
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:45 PM: Invoking solver HIGHS  to obtain a solution.
Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 31799 rows; 17343 cols; 110613 nonzeros; 13116 integer variables (13116 binary)
Coefficient ranges:
  Matrix [1e+00, 9e+02]
  Cost   [1e-01, 2e+01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+03]
Presolving model
18862 rows, 10395 cols, 93709 nonzeros  0s
13892 rows, 9200 cols, 71597 nonzeros  0s
13441 rows, 9121 cols, 71009 nonzeros  0s

Solving MIP model with:
   13441 rows
   9121 cols (5690 binary, 0 integer, 0 implied int., 3431 continuous, 0 domain fixed)
   71009 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 J       0       0         0   0.00%   -inf            -908.2539388       Large        0      0      0         0     0.4s
         0       0         0   0.00%   -980.5254038    -908.2539388       7.96%        0      0      0      1121     0.5s
 C       0       0         0   0.00%   -974.7403867    -941.8118251       3.50%      523     89      0      1403     1.4s
 L       0       0         0 100.00%   -974.7403867    -974.7403867       0.00%      892     59      0      1957     2.0s
         1       0         1 100.00%   -974.7403867    -974.7403867       0.00%      892     59      0      2666     2.0s

Solving report
  Status            Optimal
  Primal bound      -974.740386722
  Dual bound        -974.740386722
  Gap               0% (tolerance: 0.01%)
  P-D integral      0.0936236829513
  Solution status   feasible
                    -974.740386722 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            2.04 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 1
  Nodes             1
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     2666 (total)
                    0 (strong br.)
                    836 (separation)
                    709 (heuristics)
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Jun 11 06:19:47 PM: Problem status: optimal
(CVXPY) Jun 11 06:19:47 PM: Optimal value: -8.514e+02
(CVXPY) Jun 11 06:19:47 PM: Compilation took 7.298e-02 seconds
(CVXPY) Jun 11 06:19:47 PM: Solver (including time spent in interface) took 2.060e+00 seconds
Problem(Minimize(Expression(CONVEX, UNKNOWN, (1,))), [Inequality(Constant(CONSTANT, ZERO, (3279,))), Inequality(Variable((3279,), _flow)), Inequality(Constant(CONSTANT, ZERO, (947, 1))), Inequality(Variable((947, 1), _dag_layer)), Equality(Expression(AFFINE, UNKNOWN, (947,)), Constant(CONSTANT, ZERO, ())), Inequality(Expression(AFFINE, NONNEGATIVE, (3279, 1))), Inequality(Expression(AFFINE, NONNEGATIVE, (947, 1))), Inequality(Expression(AFFINE, UNKNOWN, (3247,))), Inequality(Expression(AFFINE, UNKNOWN, (3247,))), Inequality(Constant(CONSTANT, NONNEGATIVE, (912,))), Inequality(Expression(AFFINE, NONNEGATIVE, (3279, 1))), Inequality(Expression(AFFINE, NONNEGATIVE, (3271, 1))), Inequality(Expression(AFFINE, NONNEGATIVE, (3271, 1)))])
for o in P.objectives:
    print(o.name, o.value)
error_sample1_0 56.895146658866466
penalty_indirect_rules_0 [0.]
regularization_edge_has_signal 42.0
progeny_genes_score [9094.53938757]
 12.0
sol_edges = np.flatnonzero(np.abs(P.expr.edge_value.value) > 0.5)
carnival.processed_graph.plot_values(
    vertex_values=P.expr.vertex_value.value,
    edge_values=P.expr.edge_value.value,
    edge_indexes=sol_edges,
)
../_images/3f8da1ddfd28c96b8f1aae7c0feab7a399c4ffceb4b1622506cc89488540b273.svg

Saving the processed dataset#

from corneto._data import GraphData

GraphData(G, data).save("carnival_transcriptomics_dataset")