Single sample intracellular signalling network inference#

In this notebook we showcase the advanced CARNIVAL implementation available in CORNETO. This implementation extends the original CARNIVAL method by enabling more flexible modelling and the injection of prior knowledge for hypothesis generation. We will use a dataset consisting of 6 samples of hepatic stellate cells (HSCs), three of which were activated with the cytokine Transforming Growth Factor beta (TGF-β).

We will first estimate Transcription Factor (TF) activities from gene expression data, following the Decoupler tutorial for functional analysis. Then we will use RIDDEN to estimate receptor activities in a similar fashion. Finally, we will use the CARNIVAL method available in CORNETO to infer a signalling network connecting receptors to TFs, assuming that we do not know which treatment was applied.

Please keep in mind that CARNIVAL, like any other method, is designed to facilitate systematic exploration of intracellular signalling hypotheses by leveraging curated knowledge-base networks. It is not intended to produce a single, definitive solution.

  • Parameter Sensitivity
    Alterations in solver settings, threshold values, or choice of prior-knowledge networks can materially affect the inferred network.

  • Data Dependence
    The characteristics of your dataset—experimental design, sample size, preprocessing, and normalization—will directly influence model outputs.

  • Knowledge-Base Limitations
    Underlying pathway repositories may exhibit incomplete coverage and annotation biases.

Each network generated by CARNIVAL should be regarded as a potential, simplified hypothesis about the underlying signalling. Users are encouraged to compare multiple configurations and to validate the most promising models through further analysis and experiments.

NOTE: This tutorial uses the new Decoupler v2 API. We also use the free, open-source HiGHS solver, which can be installed with pip install highspy.
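Before running the rest of the notebook, you can verify that the solver backend is importable (an optional sanity check, not part of the original cells):

# Optional: check that the HiGHS backend used later by P.solve(solver="highs") is installed
try:
    import highspy  # noqa: F401
    print("HiGHS solver available")
except ImportError:
    print("HiGHS not found; install it with `pip install highspy`")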

import gzip
import os
import shutil
import tempfile
import urllib.request
import pandas as pd

# https://decoupler-py.readthedocs.io
import decoupler as dc
import numpy as np

# https://omnipathdb.org/
import omnipath as op

# PyDESeq2 for differential expression analysis
from pydeseq2.dds import DefaultInference, DeseqDataSet
from pydeseq2.ds import DeseqStats

# https://corneto.org
import corneto as cn
dc.__version__
'2.0.6'
max_time = 300
seed = 0
# Load the dataset
adata = dc.ds.hsctgfb()
adata
AnnData object with n_obs × n_vars = 6 × 58674
    obs: 'condition', 'sample_id'
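For reference, the sample annotations can be inspected to see which samples are controls and which were treated (a small optional check, not in the original cells):

# Show the condition and sample_id annotations for each of the 6 samples
adata.obs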

Data preprocessing#

We will use AnnData and PyDESeq2 to pre-process the data and compute differential expression between the control and treatment conditions.

# Obtain genes that pass the thresholds
dc.pp.filter_by_expr(
    adata=adata,
    group="condition",
    min_count=10,
    min_total_count=15,
    large_n=10,
    min_prop=0.7,
)
adata
AnnData object with n_obs × n_vars = 6 × 19510
    obs: 'condition', 'sample_id'
# Build DESeq2 object
inference = DefaultInference(n_cpus=8)
dds = DeseqDataSet(
    adata=adata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

# Compute LFCs
dds.deseq2()

# Extract contrast between conditions
stat_res = DeseqStats(dds, contrast=["condition", "treatment", "control"], inference=inference)

# Compute Wald test
stat_res.summary()
Using None as control genes, passed at DeseqDataSet initialization
Log2 fold change & Wald test p-value: condition treatment vs control
                  baseMean  log2FoldChange     lfcSE       stat  \
RP11-347I19.8    88.877997       -0.207125  0.252557  -0.820110   
TMEM170B        103.807801       -0.276960  0.210575  -1.315260   
RP11-91P24.6    185.514929       -0.305848  0.180937  -1.690352   
SLC33A1        2559.269047        0.595912  0.085253   6.989890   
SNORA61          66.168502        0.746770  0.266470   2.802454   
...                    ...             ...       ...        ...   
RP11-75I2.3     771.828248        3.812419  0.128113  29.758231   
SLC25A24       3525.523888        0.651017  0.087599   7.431750   
BTN2A2          674.018191       -0.107905  0.106223  -1.015833   
RP11-101E3.5    194.084339        0.092515  0.181973   0.508402   
RP11-91I8.3      10.761886       -3.804532  0.920191  -4.134502   

                      pvalue           padj  
RP11-347I19.8   4.121536e-01   5.540987e-01  
TMEM170B        1.884227e-01   3.054162e-01  
RP11-91P24.6    9.096068e-02   1.682470e-01  
SLC33A1         2.751015e-12   2.426415e-11  
SNORA61         5.071542e-03   1.382504e-02  
...                      ...            ...  
RP11-75I2.3    1.357073e-194  4.487540e-192  
SLC25A24        1.071704e-13   1.050701e-12  
BTN2A2          3.097088e-01   4.492492e-01  
RP11-101E3.5    6.111714e-01   7.290708e-01  
RP11-91I8.3     3.557244e-05   1.474126e-04  

[19510 rows x 6 columns]
results_df = stat_res.results_df
results_df
baseMean log2FoldChange lfcSE stat pvalue padj
RP11-347I19.8 88.877997 -0.207125 0.252557 -0.820110 4.121536e-01 5.540987e-01
TMEM170B 103.807801 -0.276960 0.210575 -1.315260 1.884227e-01 3.054162e-01
RP11-91P24.6 185.514929 -0.305848 0.180937 -1.690352 9.096068e-02 1.682470e-01
SLC33A1 2559.269047 0.595912 0.085253 6.989890 2.751015e-12 2.426415e-11
SNORA61 66.168502 0.746770 0.266470 2.802454 5.071542e-03 1.382504e-02
... ... ... ... ... ... ...
RP11-75I2.3 771.828248 3.812419 0.128113 29.758231 1.357073e-194 4.487540e-192
SLC25A24 3525.523888 0.651017 0.087599 7.431750 1.071704e-13 1.050701e-12
BTN2A2 674.018191 -0.107905 0.106223 -1.015833 3.097088e-01 4.492492e-01
RP11-101E3.5 194.084339 0.092515 0.181973 0.508402 6.111714e-01 7.290708e-01
RP11-91I8.3 10.761886 -3.804532 0.920191 -4.134502 3.557244e-05 1.474126e-04

19510 rows × 6 columns

data = results_df[["stat"]].T.rename(index={"stat": "treatment.vs.control"})
data
RP11-347I19.8 TMEM170B RP11-91P24.6 SLC33A1 SNORA61 THAP9-AS1 LIX1L TTC8 WBSCR22 LPHN2 ... STARD4-AS1 ZNF845 NIPSNAP3B ARL6IP5 MATN1-AS1 RP11-75I2.3 SLC25A24 BTN2A2 RP11-101E3.5 RP11-91I8.3
treatment.vs.control -0.82011 -1.31526 -1.690352 6.98989 2.802454 1.531175 2.164055 -0.579835 -0.945628 -14.351065 ... 14.484178 0.567176 -1.882852 -8.425175 -1.739571 29.758231 7.43175 -1.015833 0.508402 -4.134502

1 rows × 19510 columns

Estimation of Transcription Factor activities with Decoupler and CollecTRI#

# Retrieve CollecTRI gene regulatory network (through Omnipath).
# These are regulons that we will use for enrichment tests
# to infer TF activities
collectri = dc.op.collectri(organism="human")
collectri
source target weight resources references sign_decision
0 MYC TERT 1.0 DoRothEA-A;ExTRI;HTRI;NTNU.Curated;Pavlidis202... 10022128;10491298;10606235;10637317;10723141;1... PMID
1 SPI1 BGLAP 1.0 ExTRI 10022617 default activation
2 SMAD3 JUN 1.0 ExTRI;NTNU.Curated;TFactS;TRRUST 10022869;12374795 PMID
3 SMAD4 JUN 1.0 ExTRI;NTNU.Curated;TFactS;TRRUST 10022869;12374795 PMID
4 STAT5A IL2 1.0 ExTRI 10022878;11435608;17182565;17911616;22854263;2... default activation
... ... ... ... ... ... ...
42985 NFKB hsa-miR-143-3p 1.0 ExTRI 19472311 default activation
42986 AP1 hsa-miR-206 1.0 ExTRI;GEREDB;NTNU.Curated 19721712 PMID
42987 NFKB hsa-miR-21 1.0 ExTRI 20813833;22387281 default activation
42988 NFKB hsa-miR-224-5p 1.0 ExTRI 23474441;23988648 default activation
42989 AP1 hsa-miR-144-3p 1.0 ExTRI 23546882 default activation

42990 rows × 6 columns

# Run
tf_acts, tf_padj = dc.mt.ulm(data=data, net=collectri)

# Keep only TFs with significant adjusted p-values (padj < 0.05)
msk = (tf_padj.T < 0.05).iloc[:, 0]
tf_acts = tf_acts.loc[:, msk]

tf_acts
APEX1 ARID1A ARID4B ARNT ASXL1 ATF6 BARX2 BCL11A BCL11B BMAL2 ... TBX10 TCF21 TFAP2C TP53 VEZF1 WWTR1 ZBTB33 ZEB2 ZNF804A ZXDC
treatment.vs.control -2.659861 -3.917514 -2.807826 3.172495 3.746244 2.70642 5.357512 2.64909 -4.204186 3.868746 ... -3.7051 6.162749 -2.580217 -3.576216 4.207967 -3.967803 3.237428 5.790396 3.365598 -2.734615

1 rows × 141 columns

dc.pl.barplot(data=tf_acts, name="treatment.vs.control", top=25, figsize=(3, 5))
[Figure: bar plot of the top 25 TF activities for the treatment.vs.control contrast]

Estimation of Receptor activities with RIDDEN#

Now, we will use RIDDEN to estimate receptor activities. Here we know that the cells were treated with TGF-β, but in general this information might not be known. Let’s use RIDDEN with Decoupler to find out, from gene expression alone, which receptors seem to be responsible for the downstream changes in gene expression.

Barsi, S., Varga, E., Dimitrov, D., Saez-Rodriguez, J., Hunyady, L., & Szalai, B. (2025). RIDDEN: Data-driven inference of receptor activity from transcriptomic data. PLOS Computational Biology, 21(6), e1013188. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013188

# We obtain ligand-receptor interactions from Omnipath, and we keep only the receptors
# This is our a priori list of potential receptors from which we will infer the network
unique_receptors = set(op.interactions.LigRecExtra.get(genesymbols=True)["target_genesymbol"].values.tolist())
len(unique_receptors)
1201
# We load the RIDDEN matrix
df_ridden = pd.read_csv(
    "https://github.com/basvaat/RIDDEN_tool/raw/aa3c9d9880f135e95a079865ee20ebec984cf54d/ridden_model/ridden_model_matrix.csv",
    index_col=0,
)
df_ridden
GNPDA1 CDH3 HDAC6 PARP2 MAMLD1 DNAJB6 SMC4 ABCC5 ABCB6 DNM1L ... NCAPD2 PAN2 LPGAT1 KIF14 CDC25A CDC25B OXSR1 MVP CDC42 DMTF1
CALCRL -0.009687 -0.007085 0.161641 -0.088799 0.141184 0.206650 -0.077551 0.089438 0.052901 -0.097550 ... -0.029438 0.093616 -0.111194 0.055189 -0.267147 -0.132507 0.075775 0.206272 -0.086620 0.089023
IGF1R 0.186149 -0.019531 -0.226755 0.241395 -0.006143 -0.195346 -0.074546 -0.110967 -0.093491 0.144233 ... -0.092025 -0.249430 -0.090110 -0.148717 0.507459 -0.383608 0.080747 -0.325099 0.122792 -0.085241
TEK -0.184470 0.103735 -0.178644 0.217038 -0.030282 -0.098075 0.457752 0.060260 0.101412 0.249132 ... -0.022321 -0.144688 -0.008170 0.418425 0.345027 0.257878 0.283288 -0.180285 0.361584 0.121627
GCGR 0.154128 -0.062489 0.193690 0.018437 0.037480 0.053541 0.120568 0.101133 0.055264 -0.231345 ... -0.074465 0.546507 -0.077048 0.118103 0.106227 -0.102328 -0.023940 -0.072013 0.026697 0.201512
AXL -0.575718 0.078760 -0.014661 0.292999 0.070583 -0.012008 0.039746 0.006026 0.060853 0.119499 ... 0.236579 -0.145577 -0.109871 0.046791 0.301887 -0.046162 -0.022249 -0.197795 0.319452 0.002319
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
LAT -0.090900 -0.085613 -0.050680 0.071466 -0.020198 -0.005307 0.152670 0.605020 0.339265 -0.044672 ... 0.077842 0.541145 -0.112621 0.378760 -0.020636 -0.188372 -0.031565 -0.447521 0.152605 0.233283
KIR3DL2 -0.113160 -0.078494 0.076545 -0.078937 0.243577 -0.189946 0.286184 0.039433 -0.174081 -0.109293 ... -0.148066 0.118362 0.199176 0.109609 0.091345 0.243448 0.109597 0.032553 0.254494 0.058646
ROR1 -0.090646 0.012197 -0.271497 -0.158861 0.256878 -0.078914 -0.364736 -0.012256 -0.192445 -0.670970 ... -0.005142 -0.096561 -0.262883 -0.399374 -0.109447 -0.239205 -0.257403 0.030898 0.098747 0.123366
AGER 0.020215 0.288864 0.214776 -0.022412 -0.038031 -0.152819 0.157402 0.367379 -0.038862 -0.294257 ... -0.198762 0.227099 0.167598 0.154016 0.059484 0.124841 0.020183 0.083234 0.290658 0.194672
LPAR4 -0.207898 -0.260550 0.091162 0.236077 0.242604 -0.122191 0.661206 -0.262016 0.296903 -0.370393 ... -0.144404 -0.132523 1.057903 0.216050 0.265133 0.621863 0.151718 -0.158971 0.196567 -0.274069

229 rows × 978 columns

df_receptor_de = data.loc[:, data.columns.intersection(df_ridden.columns)]
df_receptor_de
CDC25B CCNB2 ABCC5 NPDC1 AGL GMNN BLCAP ISOC1 SLC35B1 PAK1 ... HMGCR LAMA3 FAM57A SSBP2 CXCL2 GTPBP8 TRIB3 HDAC2 LGALS8 BIRC5
treatment.vs.control -8.391308 6.554248 -4.139615 -2.506713 0.299272 6.582295 -2.918605 -9.658493 3.163861 3.969934 ... 7.125527 -3.862645 7.445228 -8.977077 -6.753168 -5.144084 2.988088 5.638864 -1.495224 6.237914

1 rows × 923 columns

ridden_pkn = df_ridden.reset_index().melt(id_vars="index")
ridden_pkn.columns = ["source", "target", "weight"]
ridden_pkn
source target weight
0 CALCRL GNPDA1 -0.009687
1 IGF1R GNPDA1 0.186149
2 TEK GNPDA1 -0.184470
3 GCGR GNPDA1 0.154128
4 AXL GNPDA1 -0.575718
... ... ... ...
223957 LAT DMTF1 0.233283
223958 KIR3DL2 DMTF1 0.058646
223959 ROR1 DMTF1 0.123366
223960 AGER DMTF1 0.194672
223961 LPAR4 DMTF1 -0.274069

223962 rows × 3 columns

# Number of receptors in RIDDEN
len(ridden_pkn.source.unique())
229
# Run receptor enrichment with ULM
pw_acts, pw_padj = dc.mt.ulm(data=df_receptor_de, net=ridden_pkn)

df_ridden_stats = pd.concat([pw_acts, pw_padj]).T
df_ridden_stats.columns = ["activity", "padj"]
df_ridden_stats
activity padj
ABCA1 -3.200361 0.005805
ACKR3 0.021177 0.987421
ACTR2 -0.719527 0.610665
ACVR1 1.328386 0.322313
ACVR1B 3.479168 0.002621
... ... ...
TRAF2 -3.720026 0.001273
TYRO3 -2.529414 0.034332
UTS2R -4.141094 0.000332
VIPR1 1.310146 0.327957
XCR1 -1.241386 0.341352

229 rows × 2 columns

# We inferred from gene expression that TGFBR1 is the most active receptor
df_ridden_stats[df_ridden_stats.padj <= 0.0001][["activity"]].sort_values(by="activity", ascending=True).plot.barh(
    figsize=(4, 8)
)
<Axes: >
[Figure: horizontal bar plot of receptor activities with padj <= 0.0001; TGFBR1 shows the highest activity]

Inferring intracellular signalling network with CARNIVAL and CORNETO#

CORNETO is a unified framework for knowledge-driven network inference. It includes a very flexible implementation of CARNIVAL that expands its original capabilities. We will see how to use it, under different assumptions, to extract a network from a prior knowledge network (PKN), a set of potential receptors, and our estimated TF activities.

cn.info()
Installed version: v1.0.0.dev43+3455818f
Available backends: CVXPY v1.6.6
Default backend (corneto.opt): CVXPY
Installed solvers: CLARABEL, HIGHS, OSQP, SCIPY, SCS
Graphviz version: v0.21
Installed path: /Users/pablorodriguezmier/Documents/work/repos/pablormier/corneto/corneto
Repository: https://github.com/saezlab/corneto
from corneto.methods.future import CarnivalFlow

CarnivalFlow.show_references()
  • Pablo Rodriguez-Mier, Martin Garrido-Rodriguez, Attila Gabor, Julio Saez-Rodriguez. Unified knowledge-driven network inference from omics data. bioRxiv, pp. 10 (2024).
  • Anika Liu, Panuwat Trairatphisan, Enio Gjerga, Athanasios Didangelos, Jonathan Barratt, Julio Saez-Rodriguez. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL. NPJ systems biology and applications, Vol. 5, No. 1, pp. 40 (2019).
# We get only interactions from SIGNOR http://signor.uniroma2.it/
pkn = op.interactions.OmniPath.get(databases=["SIGNOR"], genesymbols=True)
pkn = pkn[pkn.consensus_direction == True]
pkn.head()
source target source_genesymbol target_genesymbol is_directed is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition curation_effort references sources n_sources n_primary_sources n_references references_stripped
0 Q13976 Q13507 PRKG1 TRPC3 True False True True False True 9 HPRD:14983059;KEA:14983059;ProtMapper:14983059... HPRD;HPRD_KEA;HPRD_MIMP;KEA;MIMP;PhosphoPoint;... 15 8 2 14983059;16331690
1 Q13976 Q9HCX4 PRKG1 TRPC7 True True False True True False 3 SIGNOR:21402151;TRIP:21402151;iPTMnet:21402151 SIGNOR;TRIP;iPTMnet 3 3 1 21402151
2 Q13438 Q9HBA0 OS9 TRPV4 True True True True True True 3 HPRD:17932042;SIGNOR:17932042;TRIP:17932042 HPRD;SIGNOR;TRIP 3 3 1 17932042
3 P18031 Q9H1D0 PTPN1 TRPV6 True False True True False True 11 DEPOD:15894168;DEPOD:17197020;HPRD:15894168;In... DEPOD;HPRD;IntAct;Lit-BM-17;SIGNOR;SPIKE_LC;TRIP 7 6 2 15894168;17197020
4 P63244 Q9BX84 RACK1 TRPM6 True False True True False True 2 SIGNOR:18258429;TRIP:18258429 SIGNOR;TRIP 2 2 1 18258429
pkn["interaction"] = pkn["is_stimulation"].astype(int) - pkn["is_inhibition"].astype(int)
sel_pkn = pkn[["source_genesymbol", "interaction", "target_genesymbol"]]
sel_pkn
source_genesymbol interaction target_genesymbol
0 PRKG1 -1 TRPC3
1 PRKG1 1 TRPC7
2 OS9 0 TRPV4
3 PTPN1 -1 TRPV6
4 RACK1 -1 TRPM6
... ... ... ...
61542 APC_PUF60_SIAH1_SKP1_TBL1X -1 CTNNB1
61543 MAP2K6 1 MAPK10
61544 PRKAA1 1 TP53
61545 CNOT1_CNOT10_CNOT11_CNOT2_CNOT3_CNOT4_CNOT6_CN... -1 NANOS2
61546 WNT16 1 FZD3

60921 rows × 3 columns

# We create the CORNETO graph by importing the edges and interaction signs
from corneto.io import load_graph_from_sif_tuples

G = load_graph_from_sif_tuples([(r[0], r[1], r[2]) for _, r in sel_pkn.iterrows() if r[1] != 0])
G.shape
(5442, 60034)
# As measurements, we take the estimated TF activities, keeping only TFs with adjusted p-value <= 0.001
significant_tfs = tf_acts[tf_padj <= 0.001].T.dropna().sort_values(by="treatment.vs.control", ascending=False)
significant_tfs
treatment.vs.control
SRF 7.384275
MYOCD 7.346561
SMAD3 7.088023
TCF21 6.162749
ZEB2 5.790396
SMAD2 5.617299
JUNB 5.613301
FOXD1 5.569131
SOX5 5.388260
BARX2 5.357512
HMGA2 4.920654
POU3F1 4.906394
RORA 4.751216
SMAD4 4.442445
OVOL1 4.282917
MEF2A 4.234915
FGF2 4.210021
VEZF1 4.207967
EGR4 4.112604
SMAD5 -4.006291
CIITA -4.135616
NFKBIB -4.156422
REL -4.167478
PITX3 -4.200405
BCL11B -4.204186
IRF3 -4.275299
PITX1 -4.343037
PARK7 -4.447875
KLF8 -4.469565
DNMT3A -4.642689
RELB -4.684594
NFE2L2 -4.913963
MECOM -4.945115
KLF2 -5.096991
STAT1 -5.192426
HOXC6 -5.261645
IRF2 -5.272422
KLF11 -5.272551
SPI1 -5.425817
NR4A3 -5.726681
NR1H4 -6.059037
SKIL -7.437322
IRF1 -9.362766
# We keep only the ones in the PKN graph
measurements = significant_tfs.loc[significant_tfs.index.intersection(G.V)].to_dict()["treatment.vs.control"]
measurements
{'SRF': 7.384274933753678,
 'MYOCD': 7.346560585692566,
 'SMAD3': 7.088022567967861,
 'ZEB2': 5.790395603340423,
 'SMAD2': 5.617299171561388,
 'JUNB': 5.613301413598558,
 'HMGA2': 4.920654205164735,
 'POU3F1': 4.9063937479035795,
 'RORA': 4.751215803623851,
 'SMAD4': 4.442444690531756,
 'MEF2A': 4.234915033987421,
 'FGF2': 4.210020918905277,
 'SMAD5': -4.006290782113948,
 'CIITA': -4.135615580532372,
 'NFKBIB': -4.156422100656065,
 'REL': -4.167478417685659,
 'PITX3': -4.200405283105377,
 'IRF3': -4.275299453731299,
 'PITX1': -4.343036884101653,
 'KLF8': -4.469565287654229,
 'DNMT3A': -4.642688562638021,
 'RELB': -4.684594362576203,
 'NFE2L2': -4.913963130388188,
 'MECOM': -4.9451147439384044,
 'STAT1': -5.19242581697902,
 'HOXC6': -5.261644928875645,
 'IRF2': -5.272422064497416,
 'KLF11': -5.272550840567249,
 'SPI1': -5.425816853074546,
 'NR4A3': -5.726680520718154,
 'NR1H4': -6.059037198788599,
 'SKIL': -7.437321830844388,
 'IRF1': -9.362766415720282}
"TGFBR1" in G.V
True
# Here we pass the receptors. We're going to use the top receptor inferred by RIDDEN
# (TGFBR1)
inputs = {"TGFBR1": 1}  # 1=up, -1=down
inputs
{'TGFBR1': 1}
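As an alternative (a sketch only, assuming CARNIVAL accepts several candidate inputs at once; not run in this notebook), one could pass all receptors with significantly positive RIDDEN activity instead of fixing a single one:

# Hypothetical alternative: use all significantly active receptors present in the PKN as candidate inputs
top_receptors = df_ridden_stats[(df_ridden_stats.padj <= 1e-4) & (df_ridden_stats.activity > 0)]
alt_inputs = {r: 1 for r in top_receptors.index if r in G.V}
alt_inputs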
# Create the dataset in standard format
carnival_data = dict()
for inp, v in inputs.items():
    carnival_data[inp] = dict(value=v, role="input", mapping="vertex")
for out, v in measurements.items():
    carnival_data[out] = dict(value=v, role="output", mapping="vertex")
data = cn.Data.from_cdict({"sample1": carnival_data})
data
Data(n_samples=1, n_feats=[34])

CARNIVAL#

We will now use CARNIVAL, as implemented in CORNETO, with a penalty term to regularize the solution. For hypothesis exploration, we can try different values and manually inspect the solutions (see the sketch after the list below). We will also penalize the selection of indirect rules, in order to simplify the solution. These rules are:

  • A -> B is in the PKN and we observe that B is downregulated: we explain it by downregulation of its upstream gene A

  • A -| B is in the PKN and we observe that B is upregulated: we explain it by downregulation of A
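As an illustration of the parameter-sensitivity caveat discussed above, here is a minimal sketch (not part of the original tutorial) that scans a few regularization strengths and compares the size of the resulting networks; it reuses the CarnivalFlow parameters and the solve call shown below:

# Hypothetical exploration: compare network size across regularization strengths
for lam in (0.1, 1.0, 10.0):
    carnival_test = CarnivalFlow(lambda_reg=lam, indirect_rule_penalty=10)
    P_test = carnival_test.build(G, data)
    P_test.solve(solver="highs", max_seconds=60, verbosity=0)
    n_edges = int((np.abs(P_test.expr.edge_value.value) > 0.5).sum())
    print(f"lambda_reg={lam}: {n_edges} selected edges")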

carnival = CarnivalFlow(lambda_reg=1.0, indirect_rule_penalty=10)
P = carnival.build(G, data)
P.expr
Unreachable vertices for sample: 0
{'_flow': _flow: Variable((3251,), _flow),
 'const0x41332f448f64593': const0x41332f448f64593: Constant(CONSTANT, NONNEGATIVE, (937, 3251)),
 'const0xd5dbd53619f708b': const0xd5dbd53619f708b: Constant(CONSTANT, NONNEGATIVE, (937, 3251)),
 'edge_inhibits': edge_inhibits: Variable((3251, 1), edge_inhibits, boolean=True),
 'edge_activates': edge_activates: Variable((3251, 1), edge_activates, boolean=True),
 '_dag_layer': _dag_layer: Variable((937, 1), _dag_layer),
 'flow': _flow: Variable((3251,), _flow),
 'vertex_value': Expression(AFFINE, UNKNOWN, (937, 1)),
 'vertex_activated': Expression(AFFINE, NONNEGATIVE, (937, 1)),
 'vertex_inhibited': Expression(AFFINE, NONNEGATIVE, (937, 1)),
 'edge_value': Expression(AFFINE, UNKNOWN, (3251, 1)),
 'edge_has_signal': Expression(AFFINE, NONNEGATIVE, (3251, 1)),
 'vertex_max_depth': _dag_layer: Variable((937, 1), _dag_layer)}
P.solve(solver="highs", max_seconds=60, verbosity=1);
===============================================================================
                                     CVXPY                                     
                                     v1.6.6                                    
===============================================================================
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 30607 rows; 10690 cols; 108040 nonzeros; 6502 integer variables (6502 binary)
Coefficient ranges:
  Matrix [1e+00, 9e+02]
  Cost   [1e+00, 2e+01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+03]
Presolving model
17788 rows, 10299 cols, 91260 nonzeros  0s
12833 rows, 9086 cols, 69235 nonzeros  0s
12455 rows, 9011 cols, 68752 nonzeros  0s

Solving MIP model with:
   12455 rows
   9011 cols (5625 binary, 0 integer, 0 implied int., 3386 continuous, 0 domain fixed)
   68752 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

 J       0       0         0   0.00%   -inf            1                  Large        0      0      0         0     0.6s
         0       0         0   0.00%   -35.76172711    1               3676.17%        0      0      0       924     0.7s
 C       0       0         0   0.00%   -31.47852215    -1.758607953    1689.97%      169     24      0      1062     1.5s
 L       0       0         0   0.00%   -31.47852215    -30.33661578       3.76%      280     35      0      1112     2.5s

62.0% inactive integer columns, restarting
Model after restart has 1325 rows, 2777 cols (204 bin., 0 int., 0 impl., 2573 cont., 0 dom.fix.), and 8468 nonzeros

         0       0         0   0.00%   -31.47852215    -30.33661578       3.76%       15      0      0      5669     3.1s
         0       0         0   0.00%   -31.47852215    -30.33661578       3.76%       16      5      0      6412     3.1s

57.8% inactive integer columns, restarting
Model after restart has 930 rows, 2595 cols (81 bin., 0 int., 0 impl., 2514 cont., 0 dom.fix.), and 6944 nonzeros

         0       0         0   0.00%   -31.00328245    -30.33661578       2.20%        8      0      0      8365     3.3s
         0       0         0   0.00%   -31.00328245    -30.33661578       2.20%        8      5      1      8932     3.3s

7.4% inactive integer columns, restarting
Model after restart has 878 rows, 2570 cols (68 bin., 0 int., 0 impl., 2502 cont., 0 dom.fix.), and 6788 nonzeros

         0       0         0   0.00%   -30.47852215    -30.33661578       0.47%       16      0      0     10659     3.5s
         0       0         0   0.00%   -30.47852215    -30.33661578       0.47%       16     14      2     10964     3.5s

13.2% inactive integer columns, restarting
Model after restart has 829 rows, 2548 cols (52 bin., 0 int., 0 impl., 2496 cont., 0 dom.fix.), and 6636 nonzeros

         0       0         0   0.00%   -30.47852215    -30.33661578       0.47%        9      0      0     11860     3.6s
         0       0         0   0.00%   -30.47852215    -30.33661578       0.47%        9      5      2     12178     3.6s
         1       0         1 100.00%   -30.33661578    -30.33661578       0.00%      313     36     15     13592     3.8s

Solving report
  Status            Optimal
  Primal bound      -30.3366157837
  Dual bound        -30.3366157837
  Gap               0% (tolerance: 0.01%)
  P-D integral      47.8825450825
  Solution status   feasible
                    -30.3366157837 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            3.76 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 5
  Nodes             1
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     13592 (total)
                    0 (strong br.)
                    1033 (separation)
                    9661 (heuristics)
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
# Inspect the value of each objective term (fitting error, indirect-rule penalty, edge regularization)
for o in P.objectives:
    print(o.name, "->", o.value)
error_sample1_0 -> 65.03395318114363
penalty_indirect_rules_0 -> [0.]
regularization_edge_has_signal -> 28.0
# We extract the selected edges
sol_edges = np.flatnonzero(np.abs(P.expr.edge_value.value) > 0.5)
carnival.processed_graph.plot_values(
    vertex_values=P.expr.vertex_value.value,
    edge_values=P.expr.edge_value.value,
    edge_indexes=sol_edges,
)
[Figure: inferred CARNIVAL signalling network from TGFBR1 to the measured TFs, with node and edge values]
# Extracting the solution graph
G_sol = carnival.processed_graph.edge_subgraph(sol_edges)
G_sol.shape
(28, 28)
pd.DataFrame(
    P.expr.vertex_value.value,
    index=carnival.processed_graph.V,
    columns=["node_activity"],
)
node_activity
P03231_UBA52 0.0
GNAI1 0.0
CDC42 0.0
RPS6KA5 0.0
P03186_UBB 0.0
... ...
SET 0.0
SYNJ1 0.0
MAP2K5 0.0
IFITM3 0.0
CDK2 0.0

937 rows × 1 columns

pd.DataFrame(P.expr.edge_value.value, index=carnival.processed_graph.E, columns=["edge_activity"])
edge_activity
(SMAD3) (MYOD1) 0.0
(GRK2) (BDKRB2) 0.0
(MAPK14) (MAPKAPK2) 1.0
(DEPTOR_EEF1A1_MLST8_MTOR_PRR5_RICTOR) (FBXW8) 0.0
(SLK) (MAP3K5) 0.0
... ... ...
(JUNB) () 0.0
(SRF) () 0.0
(ZEB2) () 0.0
(SMAD2) () 0.0
() (TGFBR1) 1.0

3251 rows × 1 columns

Saving the processed dataset#

cn.GraphData(G, data).save("data/carnival_transcriptomics_dataset")