Towards Reliable Gene Regulatory Network Inference

Stockholm University

April 2019

Daniel Morgan

Why Networks?

  • Relational Insights:

Influential/Physical Binding

  • Hypothesis Generation

Intervention: Therapeutic Targets

  • High level overview, consolidation & general understanding

Biology

  • Cells replicate, grow and die
  • Express genes to confer trait via proteins
  • Responses to environment are robust but not infinite

Measure

  • microarray
  • qPCR
  • RNA-Seq/ transcriptome
  • survival assay

Directed Perturbation

  • siRNA
  • shRNA
  • targeted drug
  • eQTL
  • CRISPRi

Hecker 2009

More

Smet 2010

Protein-Protein Interaction

Metabolic pathways

Signaling pathways

Transcription factors / gene regulatory networks

Pathway diagrams

Protein-compound interactions

Protein sequence focused

Genetic interaction networks

Guthke 2015

>35 "top" methods

Tie-In to Biochemistry

 

  • Directed links but not necessarily direct binding
  • Offers regulatory information
    • (more than coexpression)

Hypothesis

Simulation

Learn network parameters

Experiment

Analysis

Infer Network

Knowledge & Hypothesis

inspired by Bellot 2015

Feature Selection

  • Hierarchical clustering groups elements based on how close they are to one another. The result is a tree structure, referred to as a dendrogram.
    • Genie3 determines those genes directly influencing expression patterns of other genes, ranking such features via the tree building ensemble process.
  • K-means clustering groups elements into a specified number of clusters, which can be useful when you know or suspect the number of clusters in the data.
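The k-means idea can be sketched in a few lines of NumPy (illustrative only: the two-group data below is made up, and in practice a library implementation would be used):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, recompute."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid: shape (n_points, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # keep the old centroid if a cluster ends up empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# two well-separated groups standing in for expression profiles
X = np.vstack([np.zeros((5, 3)), np.full((5, 3), 10.0)])
labels, _ = kmeans(X, 2)
```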

Mutual Info

 

  • The area contained by both circles is the joint entropy H(X,Y)
  • An individual circle is the individual entropy H(X) or H(Y)
    •  the unique area is the conditional entropy H(X|Y) or H(Y|X)
  • The violet is the mutual information I(X;Y)
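These identities can be checked numerically from any joint probability table; a small NumPy sketch (the two toy distributions are made up for illustration):

```python
import numpy as np

def entropies(pxy):
    """Entropy quantities (in bits) from a joint probability table p(x, y)."""
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    H_X, H_Y, H_XY = h(px), h(py), h(pxy.ravel())
    I_XY = H_X + H_Y - H_XY        # mutual information (the overlap)
    H_X_given_Y = H_XY - H_Y       # conditional entropy (the unique area)
    return H_X, H_Y, H_XY, I_XY, H_X_given_Y

p_indep = np.full((2, 2), 0.25)               # two independent fair coins
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])   # perfectly correlated coins
```

For independent coins I(X;Y) = 0; for perfectly correlated coins the circles overlap entirely and I(X;Y) = 1 bit.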

Boolean

YES or NO

  • State space of a Boolean Network
  • Thin (black) arrows symbolise the inputs of the Boolean function.
  • The thick (grey) arrows show what a synchronous update does.
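A synchronous update is a deterministic map on the state space; a minimal Python sketch (the three-gene rules are hypothetical, chosen only to show that every state has exactly one successor):

```python
# Hypothetical 3-gene Boolean network (rules invented for illustration):
#   A' = NOT C,  B' = A AND C,  C' = A OR B
def step(state):
    a, b, c = state
    return (not c, a and c, a or b)

def trajectory(state, n=8):
    """Synchronous updates: every state has exactly one successor."""
    states = [state]
    for _ in range(n):
        state = step(state)
        states.append(state)
    return states

traj = trajectory((True, False, False))
```

Following the grey arrows from (1,0,0) traces out an attractor cycle of length 5 for these rules.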

Bayesian

conditional probabilities per link → DAGs (directed acyclic graphs)

I(A; E)

I(B; D | A, E)

I(C; A, D, E | B)

I(D; B, C, E | A)

I(E; A, D)

Friedman 2000
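The independence statements above correspond to the factorization P(A,B,C,D,E) = P(A)·P(E)·P(B|A,E)·P(D|A)·P(C|B). A sketch with made-up conditional probability tables can verify, for instance, that A and E come out marginally independent:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Made-up CPTs for binary variables wired as: parents(B) = {A, E},
# parents(D) = {A}, parents(C) = {B}; A and E have no parents.
pA, pE = 0.3, 0.6
pB = rng.random((2, 2))   # P(B=1 | A=a, E=e)
pD = rng.random(2)        # P(D=1 | A=a)
pC = rng.random(2)        # P(C=1 | B=b)

joint = np.zeros((2,) * 5)           # axes: A, B, C, D, E
for a, b, c, d, e in itertools.product((0, 1), repeat=5):
    term = (pA if a else 1 - pA) \
         * (pE if e else 1 - pE) \
         * (pB[a, e] if b else 1 - pB[a, e]) \
         * (pD[a] if d else 1 - pD[a]) \
         * (pC[b] if c else 1 - pC[b])
    joint[a, b, c, d, e] = term

# I(A; E) = 0 means the (A, E) marginal factorizes into P(A) * P(E)
pAE = joint.sum(axis=(1, 2, 3))      # marginalize out B, C, D
```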

System of Equations

coupled ODEs relate the [mRNA] of each gene to that of all other genes

Penfold 2011
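A linearized version of such a coupled system, dx/dt = A·x + p, can be sketched with forward Euler integration (toy 2-gene values, not from the source); its steady state x* = -A⁻¹p mirrors the perturbation model used later:

```python
import numpy as np

# Linearized GRN dynamics: dx/dt = A @ x + p, with A holding regulatory
# strengths and p a constant perturbation (toy 2-gene values).
A = np.array([[-1.0, 0.5],
              [0.0, -1.0]])          # negative diagonal: self-degradation
p = np.array([1.0, 0.5])

def simulate(A, p, x0, dt=0.01, steps=5000):
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (A @ x + p)     # forward Euler step
    return x

x_ss = simulate(A, p, np.zeros(2))
# at steady state dx/dt = 0, hence x* = -A^{-1} p
```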

Neural Nets

(x) hidden layers connecting input and output

 

supervised

  • infer the mapping implied by the (training) data

unsupervised

  • inferring a function to describe hidden structure from "unlabeled" data

Mechanistic

identify direct interactions

Influence

capture information flow to understand control system 

 

 

 

 

*(not necessarily measured or direct)

Two General, Distinct Aims

Gardner 2005

Network Inference

Gardner 2005

I. GeneSPIDER

II. NestBoot        

III. Perturbation based GRNI

IV. Large-Scale Comparison

Mathematical Benchmark Environment

Restrict False Links

Knock Down Dataset & Novel Interaction

Large Scale Inference & Submodules

Scoring Inferred Network(s) vs GS (gold standard)

(Figure: benchmarking workflow linking experimental/data collection, data properties, inference methods, the inferred network, and scoring measures)

  • GRN inference methods: LASSO (Glmnet), (T)LSCO, RNI, ARACNe, Genie3, CLR
  • Experimental/data collection: perturbation (si/shRNA)
  • Data properties: SNR, IAA, Rank
  • Scoring measures: MCC, AUROC, wRSS

GeneSpider

Generation and Simulation Package for Informative Data ExploRation

GS case study

via some 200 networks & 600 expression sets, consisting of 4 different topologies, with varied SNR, IAA degrees, and sizes


RNICO

Robust Network Inference: decouples the model selection problem from parameter estimation; very conservative, but among the best methods when noise is low

ARACNe

scores mutual information link by link rather than over the entire system as a whole; also disregards self-regulating elements

LASSO/Glmnet

Least Absolute Shrinkage & Selection Operator: minimizes the RSS while penalizing the absolute values of coefficients rather than their squares; the harshest penalty (exact zeros possible)

(T)LSCO

fits a regression line to the data; the total least squares variant (TLSCO) minimizes differences along both the X and Y axes
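LASSO's "zeros possible" property comes from soft-thresholding: in the special case of an orthonormal design, the lasso solution is simply the soft-thresholded least-squares estimate (the coefficients below are made up for illustration):

```python
import numpy as np

def soft_threshold(z, lam):
    """Prox operator of the L1 penalty: shrink toward zero, exact zeros possible."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# For an orthonormal design the lasso solution is just the soft-thresholded
# ordinary-least-squares estimate (made-up coefficients):
beta_ols = np.array([3.0, -0.2, 0.5, -2.5])
beta_lasso = soft_threshold(beta_ols, lam=0.6)   # small coefficients -> exactly 0
```

The penalty on |coefficient| shrinks every estimate by the same amount and sets the small ones exactly to zero, which is why LASSO performs link selection.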


with self loops, ∴ ARACNe returns a null network

NestBoot

A Generalized Framework for Controlling FDR in Gene Regulatory Network Inference



 

LSCO / TLSCO

Glmnet / LASSO

RNI

 

Genie3,

ARACNe, CLR


5%

45 gene in-silico  network performance

https://gitlab.com/Xparx/scikit-grni.git

MATLAB -> Python

Real Data Performance


N10 Performance


CLR

applies normal distribution statistics to mutual information scores in order to identify network links

LASSO, LSCO, RNICO

utilize a defined perturbation design to map expression to network topologies

Genie3

uses a tree ensemble approach, drawing on relationships between input gene expression patterns to predict those of target genes, building trees on bootstrapped samples to return one inferred network with ranked link strengths
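CLR's background-correction idea can be sketched as follows (a simplified variant: z-score each MI value against its row and column background, clip negatives, and combine in quadrature; the toy MI matrix is made up):

```python
import numpy as np

def clr_scores(mi):
    """Simplified CLR: z-score each MI value against its row and column
    background, clip negatives, combine in quadrature."""
    z_row = (mi - mi.mean(axis=1, keepdims=True)) / mi.std(axis=1, keepdims=True)
    z_col = (mi - mi.mean(axis=0, keepdims=True)) / mi.std(axis=0, keepdims=True)
    z_row, z_col = np.maximum(z_row, 0.0), np.maximum(z_col, 0.0)
    return np.sqrt(z_row**2 + z_col**2)

# toy symmetric MI matrix: genes 0 and 1 share a strong signal
mi = np.array([[0.0, 0.9, 0.1],
               [0.9, 0.0, 0.1],
               [0.1, 0.1, 0.0]])
scores = clr_scores(mi)
```

Comparing each MI value to its background distribution suppresses links that only look strong because one gene has high MI with everything.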



Myc

Perturbation-based gene regulatory network inference to reliably predict oncogenic mechanisms

qPCR of 40 genes, singly & doubly knocked down via siRNA

  • 3 biological replicates
  • 2 technical replicates

→ gene fold change & variance of expression measures

Y: expression data
A: network
P: perturbation matrix
E: input noise estimate
F: output noise estimate

Y = -A^{-1}(P + F) + E
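The data model can be simulated directly (toy network size and noise levels are assumptions, not the study's settings):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                     # toy number of genes
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # hypothetical network
P = -np.eye(n)                            # one targeted knockdown per column
F = np.zeros((n, n))                      # input noise, ignored in this sketch
E = 0.01 * rng.standard_normal((n, n))    # measurement (output) noise

Y = -np.linalg.inv(A) @ (P + F) + E       # expression response per perturbation
```

Each column of Y is the steady-state expression response to one perturbation experiment, which is exactly what the inference methods are given as input.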


  1. Shuffle Topology
  2. Fit parameter to data
  3. Estimate Goodness of Fit
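The shuffle-and-score loop above can be sketched as below; for brevity the parameter refit (step 2) is omitted and the shuffled topologies are scored directly against the data:

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_topology(A, rng):
    """Permute link positions while preserving the number of links."""
    flat = A.ravel().copy()
    rng.shuffle(flat)
    return flat.reshape(A.shape)

def rss(A, Y, P):
    """Goodness of fit of the steady-state model A @ Y = -P."""
    return float(np.sum((A @ Y + P) ** 2))

# toy ground truth with perfectly matching data
A_true = np.array([[-1.0, 0.5],
                   [0.0, -1.0]])
P = -np.eye(2)
Y = -np.linalg.inv(A_true) @ P

rss_true = rss(A_true, Y, P)                       # ~0 by construction
rss_null = [rss(shuffle_topology(A_true, rng), Y, P) for _ in range(100)]
```

Comparing the fit of the inferred topology against the null distribution of shuffled-topology fits indicates whether the inferred structure explains the data better than chance.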


JQ1 inhibits BRD4

Novel Biochemical Regulatory Mechanism

known oncogene: regulates transcription of various targets

encodes G2/M transition regulatory protein

targets chromatin during mitosis

expansion/proliferation proceeds via activation of MYC, not just through MYC-mediated protein expression


https://dcolin.shinyapps.io/NestBoot-Viz/


comparison to random networks


Large-Scale Network Comparison,

i.e. making sense of the Landmark 978 genes (L1000)

Subramanian et al. 2017


HEATMAP=OK

PCA=OK

Degree=OK ...ish

Normalizing Melanoma Data: Data → Network

(Figure: results across SNR values 0.005, 0.0076, 0.008, 0.009, 0.01, 0.02, 0.05)

Subset Selection for Noisy Data

Inference Accuracy on Less-Noisy Subsets


Melanoma Network Degree

Conclusion:

Improved the confidence of inferred GRN links by evaluating their significance through comparison to null models

Thanks!

Current Lab Members:

  • Dimitri Guala
  • Stefanie Friedrich
  • Deniz Seçilmiş
  • Thomas Hillerton
  • Emma Persson
  • Miguel Castresana Aguirre

 

Team Members:

  • Andreas Tjärnberg
  • Torbjörn Nordling

Past Lab Members:

  • Christoph Ogris
  • Mateusz Kaduk

Erik Sonnhammer

  1. Match drug target to genes
    • infer multi-P nets
  2. Time-series, delta networks
  3. match DE genes to GRN hubs
  4. Pathway Enrichment

Next Step

Data


Overlap

Comparison


subset selection method

4 RUV methods' performance quantified by 7 endpoint measures, compared to 4 standard normalization methods:

(1) MAD - mean absolute deviation from zero (for reference)

heatmap patterns:
(2) SlopeVerti & (3) SlopeHoriz

knockdown controls:
(4) AdistKS - Kolmogorov-Smirnov distance between 2 subsets

(5) Q3P - third quartile of p-values differentiating targeted knockdowns from zero

p-values:
(6) UnifKS - Kolmogorov-Smirnov distance between P>0.001 subsets
(7) Lambda - inflation of the median p-value

all vs all

aim: high AdistKS; low Lambda, UnifKS, SlopeHoriz, SlopeVerti & MAD

calculating fold change, increasing SNR

  • analyse fold-change profiles estimated by an RUV (remove unwanted variation) method
  • RUV separates signal from unwanted effects
  • this greatly improves signal-to-noise over standard methods
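Fold change and a crude per-gene signal-to-noise estimate from replicates can be computed as follows (toy replicate values; this is a generic sketch, not the paper's RUV procedure itself):

```python
import numpy as np

# hypothetical replicate measurements (rows: replicates, columns: genes)
treated = np.array([[8.1, 4.0, 2.1],
                    [7.9, 4.2, 1.9],
                    [8.0, 3.8, 2.0]])
control = np.array([[2.0, 4.1, 2.0],
                    [2.1, 3.9, 2.1],
                    [1.9, 4.0, 1.9]])

log2_fc = np.log2(treated.mean(axis=0) / control.mean(axis=0))

# crude per-gene signal-to-noise: mean difference over pooled replicate spread
signal = treated.mean(axis=0) - control.mean(axis=0)
noise = np.sqrt(treated.var(axis=0, ddof=1) + control.var(axis=0, ddof=1))
snr = signal / noise
```

Removing unwanted variation before this step shrinks the replicate spread (the denominator), which is how RUV raises the effective SNR.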

Platform
