Towards Reliable Gene Regulatory Network Inference

Stockholm University

March 2019

Daniel Morgan

A disorder has been identified... what next?

Full genome sequencing?

exome, proteome, interactome?

 

... why not infer its regulatory network?

Why Networks?

  • Relational Insights: Regulatory/Influential
  • Hypothesis Generation
  • Knowledge Consolidation & General Understanding
  • Intervention: Therapeutic Targets

Hypothesis

Simulation

Learn network parameters

Experiment

Analysis

Infer Network

Knowledge & Hypothesis

inspired by Bellot 2015

Biology

  • Cells replicate, grow and die
  • Express genes to confer traits
  • Responses to environment are robust but not infinite

Post-Perturbation "Digitization"

  • microarray
  • qPCR
  • RNA-Seq/ transcriptome
  • survival/phenotypic assay

Hecker 2009

More

Smet 2010

Tie-In to Biochemistry

 

  • Directed Links
  • Not necessarily direct binding, though
  • Still offers regulatory information
    • more than coexpression anyway

Feature Selection

  • Hierarchical clustering groups elements based on how close they are to one another. The result is a tree structure, referred to as dendrogram.
    • Genie3 determines those genes directly influencing expression patterns of other genes, ranking such features via the tree building ensemble process.
  • K-means clustering groups elements into a specified number of clusters, which can be useful when you know or suspect the number of clusters in the data.
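A minimal sketch of k-means on toy expression profiles, assuming the number of clusters is known in advance. The profiles, the starting centroids and all function names are hypothetical, chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-center."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_point(members)
    return labels, centroids

# Hypothetical fold changes of six genes under two conditions
profiles = [(0.1, 0.2), (0.0, -0.1), (0.3, 0.1), (2.1, 1.9), (1.8, 2.2), (2.0, 2.0)]
labels, centroids = kmeans(profiles, centroids=[profiles[0], profiles[-1]])
print(labels)
```

Fixing the starting centroids keeps the toy example deterministic; in practice they are usually drawn at random and the run is repeated.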

Mutual Info

 

  • The area contained by both circles is the joint entropy H(X,Y)
  • An individual circle is the individual entropy H(X) or H(Y)
    •  the unique area is the conditional entropy H(X|Y) or H(Y|X)
  • The violet is the mutual information I(X;Y)
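The Venn diagram relations above can be checked numerically. This is a small sketch on a hypothetical joint distribution of two binarized genes; the probabilities are made up for illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X, Y) of two binarized genes
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal distributions of X and Y
px = [sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)]
py = [sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)]

h_x = entropy(px)               # one circle
h_y = entropy(py)               # the other circle
h_xy = entropy(joint.values())  # area covered by both circles
mi = h_x + h_y - h_xy           # overlap: I(X;Y)
h_x_given_y = h_xy - h_y        # part of X's circle outside the overlap
h_y_given_x = h_xy - h_x        # part of Y's circle outside the overlap

print(round(mi, 4), round(h_x_given_y, 4), round(h_y_given_x, 4))
```

The three disjoint areas add up to the joint entropy: H(X|Y) + I(X;Y) + H(Y|X) = H(X,Y).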

Boolean

YES or NO

  • State space of a Boolean Network
  • Thin (black) arrows symbolise the inputs of the Boolean function.
  • The thick (grey) arrows show what a synchronous update does.
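A minimal sketch of a synchronous update and the resulting state space, using a hypothetical three-gene network. The rules are invented for illustration and do not come from the figure.

```python
from itertools import product

# Hypothetical Boolean rules: each maps the current state to a gene's next value
rules = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A AND B activate C
}
genes = sorted(rules)

def synchronous_update(state):
    """Update every gene at once from the same current state."""
    current = dict(zip(genes, state))
    return tuple(bool(rules[g](current)) for g in genes)

# Full state space: every state (A, B, C) mapped to its successor
state_space = {s: synchronous_update(s) for s in product((False, True), repeat=len(genes))}

def attractor(start):
    """Follow updates from a start state until a state repeats; return the cycle."""
    seen = []
    s = start
    while s not in seen:
        seen.append(s)
        s = state_space[s]
    return seen[seen.index(s):]

print(len(state_space), len(attractor((False, False, False))))
```

With these rules all eight states drain into a single cycle of five states, which is the attractor of this toy network.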

Bayesian

conditional probabilities per link --> directed acyclic graphs (DAGs)

I (A; E)

I (B; D | A,E)

I (C; A,D,E | B)

I (D; B,C,E | A)

I (E; A, D)

Friedman 2000
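Reading each statement above as "a variable is independent of its non-descendants given its parents", the list is consistent with a DAG in which A and E are the parents of B, B is the parent of C, and A is the parent of D. That structure is inferred here from the statements, not taken from the figure. Under that assumption the joint distribution factorizes as:

```latex
P(A,B,C,D,E) = P(A)\,P(E)\,P(B \mid A,E)\,P(C \mid B)\,P(D \mid A)
```

Each factor is one conditional probability table, which is what "conditional probabilities per link" amounts to in practice.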

System of Equations

coupled ODEs relate the mRNA concentration of each gene to those of all other genes

Penfold 2011
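A minimal sketch of such a coupled system, assuming a linear model dx/dt = A x + p for two genes, integrated with forward Euler. The matrix A, the perturbation p and the step size are hypothetical values, and x is read as deviation from baseline expression.

```python
# Hypothetical 2-gene linear system: dx/dt = A x + p
A = [[-1.0, 0.5],
     [-0.8, -1.0]]  # A[i][j]: influence of gene j on gene i; negative diagonal = decay
p = [1.0, 0.0]      # constant perturbation applied to gene 0

def derivative(x):
    """Right-hand side of the ODE system at state x."""
    return [sum(A[i][j] * x[j] for j in range(2)) + p[i] for i in range(2)]

def simulate(x0, dt=0.01, steps=5000):
    """Forward-Euler integration from x0 over steps * dt time units."""
    x = list(x0)
    for _ in range(steps):
        dx = derivative(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

steady = simulate([0.0, 0.0])

# Analytic steady state for comparison: x* = -A^{-1} p (2x2 inverse written out)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
inv = [[A[1][1] / det, -A[0][1] / det],
       [-A[1][0] / det, A[0][0] / det]]
expected = [-(inv[i][0] * p[0] + inv[i][1] * p[1]) for i in range(2)]

print(steady, expected)
```

The simulated trajectory settles at the analytic steady state, so steady-state expression data carry the network matrix through its inverse.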

Neural Nets

(x) hidden layers connecting input and output

 

supervised

  • infer the mapping implied by the (training) data

unsupervised

  • inferring a function to describe hidden structure from "unlabeled" data

Mechanistic

identify direct interactions

Influence

capture information flow to understand control system

*(not necessarily measured or direct)

Two General, Distinct Aims

Gardner 2005

Network Inference

Gardner 2005

  • inferred network: GRN
  • inference methods: LASSO (Glmnet), (T)LSCO, RNI, ARACNe, Genie3, CLR
  • experimental / data collection: Perturbation (si/shRNA)
  • data properties: SNR, IAA, Rank
  • scoring measures: MCC, AUROC, wRSS

Scoring Inferred(s) vs GS
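A minimal sketch of one scoring measure, the Matthews correlation coefficient (MCC), comparing the link structure of an inferred network with a reference network. It assumes "GS" here means a gold-standard network; both matrices are hypothetical.

```python
from math import sqrt

def confusion(inferred, gold):
    """Count link-level TP, FP, TN, FN; any nonzero entry counts as a link."""
    tp = fp = tn = fn = 0
    for row_inf, row_gold in zip(inferred, gold):
        for a, g in zip(row_inf, row_gold):
            predicted, actual = a != 0, g != 0
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical 3-gene networks (rows: targets, columns: regulators)
gold = [[-1, 0, 1],
        [1, -1, 0],
        [0, 0, -1]]
inferred = [[-1, 0, 0],
            [1, -1, 1],
            [0, 0, -1]]

counts = confusion(inferred, gold)
score = mcc(*counts)
print(counts, score)
```

This sketch scores link presence only; link signs and weights are ignored.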

GeneSPIDER
NestBoot
MYC

L1000

GeneSPIDER

Generation and Simulation Package for Informative Data ExploRation

GS case study

via some 200 networks & 600 expression sets

consisting of 4 different topologies

with varied SNR,

IAA degrees,

and sizes

RNICO

Robust Network Inference: decouples the model selection problem from parameter estimation; very harsh, but among the best methods when noise is low

ARACNe

scores mutual information between gene pairs link by link rather than treating the system as a whole; also disregards self-regulating elements

LASSO/Glmnet

Least Absolute Shrinkage & Selection Operator: minimizes RSS while penalizing the absolute value of each coefficient rather than its square, making it the harshest penalty (coefficients can be exactly zero)
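Why an absolute-value penalty produces exact zeros can be seen in the one-coefficient case. This sketch assumes an orthonormal design, where the LASSO solution is a soft-thresholded least-squares estimate and the squared (ridge) penalty only rescales it. It is not the Glmnet algorithm, and the estimates and penalty weight are hypothetical.

```python
def soft_threshold(b, lam):
    """Minimizer of 0.5 * (x - b)**2 + lam * abs(x): shrink toward zero, clip at zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Minimizer of 0.5 * (x - b)**2 + 0.5 * lam * x**2: rescale, never exactly zero."""
    return b / (1.0 + lam)

# Hypothetical least-squares link estimates and penalty weight
ols = [2.0, -0.3, 0.05, -1.2]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
print(lasso)
print(ridge)
```

The two weak links are removed outright by the absolute-value penalty, while the squared penalty keeps every link with a smaller weight.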

(T)LSCO

(Total) Least Squares with Cut-Off: fits a regression line by minimizing residuals along a single axis (LSCO) or along both the X and Y axes (TLSCO)

with self-loops --> ARACNe returns null (it disregards self-regulating elements)

NestBoot

A Generalized Framework for Controlling FDR in Gene Regulatory Network Inference

???

LSCO / TLSCO

Glmnet / LASSO

RNI

 

Genie3,

ARACNe, CLR

N45 Performance

Real Data Performance

N10 Performance

CLR

applies normal distribution statistics to mutual information scores in order to identify network links

LASSO, LSCO, RNICO

utilize defined perturbation design to map expression to network topologies

Genie3

uses a tree ensemble approach drawing on relationships between input gene expression patterns to predict those of target genes, building trees based on bootstrapped samples to return one inferred network with ranked link strengths

Myc

Perturbation-based gene regulatory network inference to reliably predict oncogenic mechanisms

qRT-PCR of 40 genes, knocked down singly and doubly via siRNA

  • 3 biological replicates
  • 2 technical replicates

= gene fold change & variance of expression measures

Y: expression data
A: network
P: perturbation matrix
E: output noise estimate
F: input noise estimate

Y = -A^{-1}(P + F) + E
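A minimal sketch of the model in the noise-free case (E = F = 0), where Y = -A^{-1}P and the network can be recovered as A = -P Y^{-1}. The 2-gene network, the perturbation design and the helper names are hypothetical. With real, noisy data this direct inversion is replaced by the inference methods discussed earlier.

```python
def matmul(X, Z):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Z[k][j] for k in range(len(Z))) for j in range(len(Z[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def scale(M, s):
    """Multiply every entry of M by the scalar s."""
    return [[s * v for v in row] for row in M]

# Hypothetical network A and design P (one single-gene perturbation per experiment)
A = [[-1.0, 0.5],
     [-0.8, -1.0]]
P = [[1.0, 0.0],
     [0.0, 1.0]]

# Noise-free steady-state response: Y = -A^{-1} P
Y = scale(matmul(inverse_2x2(A), P), -1.0)

# Recover the network from data and design: A = -P Y^{-1}
A_recovered = scale(matmul(P, inverse_2x2(Y)), -1.0)
print(A_recovered)
```

Exact recovery depends on the absence of noise and on having as many independent perturbations as genes.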

Perturb

Novel Biochemical Regulatory Mechanism

phosphoprotein which binds E box DNA consensus sequence & regulates transcription of various targets

encodes G2/M transition regulatory protein

2 bromodomain proteins --> target chromatin during mitosis

proliferation/oncogenesis proceed via activation of MYC

comparison to random networks

Large-Scale Network Comparison,

i.e., Making sense of Landmark 978 (L1000)

Subramanian et al. 2017

Data

To account for:

screening plate, bead arrays, cell passage, drug batch, equipment units, personnel

  • 4 RUV methods' performance quantified by
  • 7 endpoint measures compared to
  • 4 standard normalization methods

Fold Change from (expression + systematic noise + Gaussian noise) / control

Lönnstedt et al. 2017

Overlap

Comparison

  1. Match drug P to STITCH db and infer multi-P nets
  2. Delta drug expression and empty vector (control)
    • match DE genes to GRN hubs
  3. Pathway Enrichment??

Next Step

4 RUV methods' performance quantified by 7 endpoint measures compared to 4 standard normalization methods

(1) MAD - mean absolute deviation from zero (for reference)

heatmap patterns

(2) SlopeVerti & (3) SlopeHoriz

knockdown controls

(4) AdistKS - Kolmogorov-Smirnov distance between 2 subsets

(5) Q3P -  third quartile of p-values differentiating targeted knockdowns from zero

p-values

(6) UnifKS -  Kolmogorov-Smirnov distance between P>0.001 subsets

(7) Lambda - inflation of median p-value

all vs all

aim: high AdistKS; low Lambda, UnifKS, SlopeHoriz, SlopeVerti & MAD
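Two of the building blocks behind these endpoints can be sketched directly: the mean absolute deviation from zero and a two-sample Kolmogorov-Smirnov distance. The choice of subsets and all values below are hypothetical; this is not the published implementation of MAD or AdistKS.

```python
from bisect import bisect_right

def mad_from_zero(values):
    """Mean absolute deviation from zero."""
    return sum(abs(v) for v in values) / len(values)

def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical log fold changes: untargeted genes vs targeted knockdowns
untargeted = [-0.1, 0.0, 0.1, 0.2]
targeted = [-2.0, -1.5, -1.2, -1.0]

print(mad_from_zero(untargeted))
print(ks_distance(untargeted, targeted))
print(ks_distance(untargeted, untargeted))
```

A distance near 1 means the two subsets are fully separated, matching the aim of a high AdistKS; a distance near 0 means the distributions are indistinguishable.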

Platform

calculating fold change, increasing SNR

  • analysis of fold change profiles estimated by a RUV method
  • remove unwanted variation
  • separates signal from unwanted effects
  • This greatly improves signal-to-noise over standard methods

Conclusion

GeneSPIDER benchmarking environment

NestBoot FDR-informed reliable inference

Perturbation based GRNI

siRNA dataset & novel interaction

Large-Scale Comparison

large scale inference & module comparison

Thanks!

Current Lab Members:

  • Dimitri Guala
  • Stefanie Friedrich
  • Miguel Castresana
  • Deniz Seçilmiş

 

Team Members:

  • Andreas Tjärnberg
  • Torbjörn Nordling

Past Lab Members:

  • Christoph Ogris
  • Mateusz Kaduk

Erik Sonnhammer

https://dcolin.shinyapps.io/NestBoot-Viz/