Towards Reliable Gene Regulatory Network Inference

Stockholm University

April 2019

Daniel Morgan

Why Networks?

  • Relational Insights:

Influential/Physical Binding

  • Hypothesis Generation

Intervention: Therapeutic Targets

  • High level overview, consolidation & general understanding

Biology

  • Cells replicate, grow and die
  • Express genes to confer trait via proteins
  • Responses to environment are robust but not infinite

Measure

  • microarray
  • qPCR
  • RNA-Seq/ transcriptome
  • survival assay

Directed Perturbation

  • siRNA
  • shRNA
  • targeted drug
  • eQTL
  • CRISPRi

Hecker 2009

More

Smet 2010

Protein-Protein Interaction

Metabolic pathways

Signaling pathways

Transcription factors / gene regulatory networks

Pathway diagrams

Protein-compound interactions

Protein sequence focused

Genetic interaction networks

Guthke 2015

>35 "top" methods

Tie-In to Biochemistry

 

  • Directed links but not necessarily direct binding
  • Offers regulatory information
    • (more than coexpression)

Hypothesis

Simulation

Learn network parameters

Experiment

Analysis

Infer Network

Knowledge & Hypothesis

inspired by Bellot 2015

Feature Selection

  • Hierarchical clustering groups elements based on how close they are to one another. The result is a tree structure, referred to as a dendrogram.
    • Genie3 determines those genes directly influencing expression patterns of other genes, ranking such features via the tree building ensemble process.
  • K-means clustering groups elements into a specified number of clusters, which can be useful when you know or suspect the number of clusters in the data.
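The k-means idea can be sketched in a few lines of NumPy (illustrative only: the two-group data below is made up, and in practice a library implementation would be used):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, recompute."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid: shape (n_points, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # keep the old centroid if a cluster ends up empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# two well-separated groups standing in for expression profiles
X = np.vstack([np.zeros((5, 3)), np.full((5, 3), 10.0)])
labels, _ = kmeans(X, 2)
```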

Mutual Info

 

  • The area contained by both circles is the joint entropy H(X,Y)
  • An individual circle is the individual entropy H(X) or H(Y)
    •  the unique area is the conditional entropy H(X|Y) or H(Y|X)
  • The violet is the mutual information I(X;Y)
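These identities can be checked numerically from any joint probability table; a small NumPy sketch (the two toy distributions are made up for illustration):

```python
import numpy as np

def entropies(pxy):
    """Entropy quantities (in bits) from a joint probability table p(x, y)."""
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    H_X, H_Y, H_XY = h(px), h(py), h(pxy.ravel())
    I_XY = H_X + H_Y - H_XY        # mutual information (the overlap)
    H_X_given_Y = H_XY - H_Y       # conditional entropy (the unique area)
    return H_X, H_Y, H_XY, I_XY, H_X_given_Y

p_indep = np.full((2, 2), 0.25)               # two independent fair coins
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])   # perfectly correlated coins
```

For independent coins I(X;Y) = 0; for perfectly correlated coins the circles overlap entirely and I(X;Y) = 1 bit.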

Boolean

YES or NO

  • State space of a Boolean Network
  • Thin (black) arrows symbolise the inputs of the Boolean function.
  • The thick (grey) arrows show what a synchronous update does.
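A synchronous update is a deterministic map on the state space; a minimal Python sketch (the three-gene rules are hypothetical, chosen only to show that every state has exactly one successor):

```python
# Hypothetical 3-gene Boolean network (rules invented for illustration):
#   A' = NOT C,  B' = A AND C,  C' = A OR B
def step(state):
    a, b, c = state
    return (not c, a and c, a or b)

def trajectory(state, n=8):
    """Synchronous updates: every state has exactly one successor."""
    states = [state]
    for _ in range(n):
        state = step(state)
        states.append(state)
    return states

traj = trajectory((True, False, False))
```

Following the grey arrows from (1,0,0) traces out an attractor cycle of length 5 for these rules.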

Bayesian

conditional probabilities per link → DAGs (directed acyclic graphs)

I(A; E)

I(B; D | A, E)

I(C; A, D, E | B)

I(D; B, C, E | A)

I(E; A, D)

Friedman 2000
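The independence statements above correspond to the factorization P(A,B,C,D,E) = P(A)·P(E)·P(B|A,E)·P(D|A)·P(C|B). A sketch with made-up conditional probability tables can verify, for instance, that A and E come out marginally independent:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Made-up CPTs for binary variables wired as: parents(B) = {A, E},
# parents(D) = {A}, parents(C) = {B}; A and E have no parents.
pA, pE = 0.3, 0.6
pB = rng.random((2, 2))   # P(B=1 | A=a, E=e)
pD = rng.random(2)        # P(D=1 | A=a)
pC = rng.random(2)        # P(C=1 | B=b)

joint = np.zeros((2,) * 5)           # axes: A, B, C, D, E
for a, b, c, d, e in itertools.product((0, 1), repeat=5):
    term = (pA if a else 1 - pA) \
         * (pE if e else 1 - pE) \
         * (pB[a, e] if b else 1 - pB[a, e]) \
         * (pD[a] if d else 1 - pD[a]) \
         * (pC[b] if c else 1 - pC[b])
    joint[a, b, c, d, e] = term

# I(A; E) = 0 means the (A, E) marginal factorizes into P(A) * P(E)
pAE = joint.sum(axis=(1, 2, 3))      # marginalize out B, C, D
```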

System of Equations

coupled ODEs relate the [mRNA] of each gene to that of all other genes

Penfold 2011
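A linearized version of such a coupled system, dx/dt = A·x + p, can be sketched with forward Euler integration (toy 2-gene values, not from the source); its steady state x* = -A⁻¹p mirrors the perturbation model used later:

```python
import numpy as np

# Linearized GRN dynamics: dx/dt = A @ x + p, with A holding regulatory
# strengths and p a constant perturbation (toy 2-gene values).
A = np.array([[-1.0, 0.5],
              [0.0, -1.0]])          # negative diagonal: self-degradation
p = np.array([1.0, 0.5])

def simulate(A, p, x0, dt=0.01, steps=5000):
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (A @ x + p)     # forward Euler step
    return x

x_ss = simulate(A, p, np.zeros(2))
# at steady state dx/dt = 0, hence x* = -A^{-1} p
```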

Neural Nets

(x) hidden layers connecting input and output

 

supervised

  • infer the mapping implied by the (training) data

unsupervised

  • inferring a function to describe hidden structure from "unlabeled" data

Mechanistic

identify direct interactions

Influence

capture information flow to understand control system 

 

 

 

 

*(not necessarily measured or direct)

Two General, Distinct Aims

Gardner 2005

Network Inference

Gardner 2005

I. GeneSPIDER

II. NestBoot        

III. Perturbation based GRNI

IV. Large-Scale Comparison

Mathematical Benchmark Environment

Restrict False Links

Knock Down Dataset & Novel Interaction

Large Scale Inference & Submodules

Scoring Inferred Network(s) vs GS (gold standard)

(Figure: benchmarking workflow linking experimental/data collection, data properties, inference methods, the inferred network, and scoring measures)

  • GRN inference methods: LASSO (Glmnet), (T)LSCO, RNI, ARACNe, Genie3, CLR
  • Experimental/data collection: perturbation (si/shRNA)
  • Data properties: SNR, IAA, Rank
  • Scoring measures: MCC, AUROC, wRSS

GeneSpider

Generation and Simulation Package for Informative Data ExploRation

GS case study

via some 200 networks & 600 expression sets, consisting of 4 different topologies, with varied SNR, IAA degrees, and sizes


RNICO

Robust Network Inference: decouples the model selection problem from parameter estimation; very conservative, but among the best methods when noise is low

ARACNe

scores mutual information link by link rather than over the entire system as a whole; also disregards self-regulating elements

LASSO/Glmnet

Least Absolute Shrinkage & Selection Operator: minimizes the RSS while penalizing the absolute values of coefficients rather than their squares; the harshest penalty (exact zeros possible)

(T)LSCO

fits a regression line to the data; the total least squares variant (TLSCO) minimizes differences along both the X and Y axes
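LASSO's "zeros possible" property comes from soft-thresholding: in the special case of an orthonormal design, the lasso solution is simply the soft-thresholded least-squares estimate (the coefficients below are made up for illustration):

```python
import numpy as np

def soft_threshold(z, lam):
    """Prox operator of the L1 penalty: shrink toward zero, exact zeros possible."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# For an orthonormal design the lasso solution is just the soft-thresholded
# ordinary-least-squares estimate (made-up coefficients):
beta_ols = np.array([3.0, -0.2, 0.5, -2.5])
beta_lasso = soft_threshold(beta_ols, lam=0.6)   # small coefficients -> exactly 0
```

The penalty on |coefficient| shrinks every estimate by the same amount and sets the small ones exactly to zero, which is why LASSO performs link selection.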


with self loops, ∴ ARACNe returns a null network

NestBoot

A Generalized Framework for Controlling FDR in Gene Regulatory Network Inference



 

LSCO / TLSCO

Glmnet / LASSO

RNI

 

Genie3,

ARACNe, CLR


5%

45 gene in-silico  network performance

https://gitlab.com/Xparx/scikit-grni.git

MATLAB -> Python

Real Data Performance


N10 Performance


CLR

applies normal distribution statistics to mutual information scores in order to identify network links

LASSO, LSCO, RNICO

utilize a defined perturbation design to map expression to network topologies

Genie3

uses a tree ensemble approach, drawing on relationships between input gene expression patterns to predict those of target genes, building trees on bootstrapped samples to return one inferred network with ranked link strengths
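CLR's background-correction idea can be sketched as follows (a simplified variant: z-score each MI value against its row and column background, clip negatives, and combine in quadrature; the toy MI matrix is made up):

```python
import numpy as np

def clr_scores(mi):
    """Simplified CLR: z-score each MI value against its row and column
    background, clip negatives, combine in quadrature."""
    z_row = (mi - mi.mean(axis=1, keepdims=True)) / mi.std(axis=1, keepdims=True)
    z_col = (mi - mi.mean(axis=0, keepdims=True)) / mi.std(axis=0, keepdims=True)
    z_row, z_col = np.maximum(z_row, 0.0), np.maximum(z_col, 0.0)
    return np.sqrt(z_row**2 + z_col**2)

# toy symmetric MI matrix: genes 0 and 1 share a strong signal
mi = np.array([[0.0, 0.9, 0.1],
               [0.9, 0.0, 0.1],
               [0.1, 0.1, 0.0]])
scores = clr_scores(mi)
```

Comparing each MI value to its background distribution suppresses links that only look strong because one gene has high MI with everything.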



Myc

Perturbation-based gene regulatory network inference to reliably predict oncogenic mechanisms

qPCR of 40 genes, singly & doubly knocked down via siRNA

  • 3 biological replicates
  • 2 technical replicates

→ gene fold change & variance of expression measures

Y: expression data
A: network
P: perturbation matrix
E: input noise estimate
F: output noise estimate

Y = -A^{-1}(P + F) + E
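The data model can be simulated directly (toy network size and noise levels are assumptions, not the study's settings):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                     # toy number of genes
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # hypothetical network
P = -np.eye(n)                            # one targeted knockdown per column
F = np.zeros((n, n))                      # input noise, ignored in this sketch
E = 0.01 * rng.standard_normal((n, n))    # measurement (output) noise

Y = -np.linalg.inv(A) @ (P + F) + E       # expression response per perturbation
```

Each column of Y is the steady-state expression response to one perturbation experiment, which is exactly what the inference methods are given as input.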


  1. Shuffle Topology
  2. Fit parameter to data
  3. Estimate Goodness of Fit
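The shuffle-and-score loop above can be sketched as below; for brevity the parameter refit (step 2) is omitted and the shuffled topologies are scored directly against the data:

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_topology(A, rng):
    """Permute link positions while preserving the number of links."""
    flat = A.ravel().copy()
    rng.shuffle(flat)
    return flat.reshape(A.shape)

def rss(A, Y, P):
    """Goodness of fit of the steady-state model A @ Y = -P."""
    return float(np.sum((A @ Y + P) ** 2))

# toy ground truth with perfectly matching data
A_true = np.array([[-1.0, 0.5],
                   [0.0, -1.0]])
P = -np.eye(2)
Y = -np.linalg.inv(A_true) @ P

rss_true = rss(A_true, Y, P)                       # ~0 by construction
rss_null = [rss(shuffle_topology(A_true, rng), Y, P) for _ in range(100)]
```

Comparing the fit of the inferred topology against the null distribution of shuffled-topology fits indicates whether the inferred structure explains the data better than chance.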


JQ1 inhibits BRD4

Novel Biochemical Regulatory Mechanism

known oncogene: regulates transcription of various targets

encodes G2/M transition regulatory protein

targets chromatin during mitosis

expansion/proliferation proceeds via activation of MYC, not just through MYC-mediated protein expression


https://dcolin.shinyapps.io/NestBoot-Viz/


comparison to random networks


Large-Scale Network Comparison,

i.e. making sense of the Landmark 978 genes (L1000)

Subramanian et al. 2017


HEATMAP=OK

PCA=OK

Degree=OK ...ish

Normalizing Melanoma Data: Data → Network

(Figure: results across SNR values 0.005, 0.0076, 0.008, 0.009, 0.01, 0.02, 0.05)

Subset Selection for Noisy Data

Inference Accuracy on Less-Noisy Subsets


Melanoma Network Degree

Conclusion:

Improved the confidence of inferred GRN links by evaluating their significance through comparison to null models

Thanks!

Current Lab Members:

  • Dimitri Guala
  • Stefanie Friedrich
  • Deniz Seçilmiş
  • Thomas Hillerton
  • Emma Persson
  • Miguel Castresana Aguirre

 

Team Members:

  • Andreas Tjärnberg
  • Torbjörn Nordling

Past Lab Members:

  • Christoph Ogris
  • Mateusz Kaduk

Erik Sonnhammer

  1. Match drug target to genes
    • infer multi-P nets
  2. Time-series, delta networks
  3. match DE genes to GRN hubs
  4. Pathway Enrichment

Next Step

Data


Overlap

Comparison


subset selection method

4 RUV methods' performance quantified by 7 endpoint measures, compared to 4 standard normalization methods:

(1) MAD - mean absolute deviation from zero (for reference)

heatmap patterns:
(2) SlopeVerti & (3) SlopeHoriz

knockdown controls:
(4) AdistKS - Kolmogorov-Smirnov distance between 2 subsets

(5) Q3P - third quartile of p-values differentiating targeted knockdowns from zero

p-values:
(6) UnifKS - Kolmogorov-Smirnov distance between P>0.001 subsets
(7) Lambda - inflation of the median p-value

all vs all

aim: high AdistKS; low Lambda, UnifKS, SlopeHoriz, SlopeVerti & MAD

calculating fold change, increasing SNR

  • analyse fold-change profiles estimated by an RUV (remove unwanted variation) method
  • RUV separates signal from unwanted effects
  • this greatly improves signal-to-noise over standard methods
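Fold change and a crude per-gene signal-to-noise estimate from replicates can be computed as follows (toy replicate values; this is a generic sketch, not the paper's RUV procedure itself):

```python
import numpy as np

# hypothetical replicate measurements (rows: replicates, columns: genes)
treated = np.array([[8.1, 4.0, 2.1],
                    [7.9, 4.2, 1.9],
                    [8.0, 3.8, 2.0]])
control = np.array([[2.0, 4.1, 2.0],
                    [2.1, 3.9, 2.1],
                    [1.9, 4.0, 1.9]])

log2_fc = np.log2(treated.mean(axis=0) / control.mean(axis=0))

# crude per-gene signal-to-noise: mean difference over pooled replicate spread
signal = treated.mean(axis=0) - control.mean(axis=0)
noise = np.sqrt(treated.var(axis=0, ddof=1) + control.var(axis=0, ddof=1))
snr = signal / noise
```

Removing unwanted variation before this step shrinks the replicate spread (the denominator), which is how RUV raises the effective SNR.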

Platform
