Network Inference

Half-Time Seminar 2017

Daniel Morgan

GeneSpider
NestBoot
MYC

"Inference" -> variable selection

GeneSPIDER

tune, run, and evaluate inference algorithms:

Regularization helps fit a linear model:

Glmnet/LASSO/ElasticNet

(T)LSCO

RNICO, ARACNe (mutual information)

bootstrap -> resampling method: estimates the variance of a statistic by resampling the data with replacement, approximating its sampling distribution

--> build confidence/support/reliability between runs
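A minimal sketch of the bootstrap idea in Python (NumPy assumed; `bootstrap_se` is a hypothetical helper, not part of GeneSPIDER): resample the data with replacement many times, recompute the statistic on each resample, and take the spread of those replicates as the uncertainty estimate.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of `statistic` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each replicate: same size as the data, drawn with replacement
    replicates = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    return replicates.std(ddof=1)
```

Link support between runs follows the same logic: instead of a number, each resample yields an inferred network, and a link's support is the fraction of resamples in which it is selected.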

SNR, K, IAA

MCC, AUROC -> performance measures based on various ratios of TP, TN, FP, FN
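Both measures can be computed directly from link-level counts, where every possible link of the true network is a positive or a negative. A minimal sketch in plain Python (function names are illustrative; returning 0 when the MCC denominator vanishes is a common convention, not a universal one):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auroc(scores, labels):
    """AUROC as the probability that a random true link outscores a
    random non-link (ties count 1/2), i.e. the Mann-Whitney statistic."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

MCC summarizes a single thresholded network (from -1 to 1), while AUROC scores the whole ranking of link confidences without fixing a threshold.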

Terms:

GeneSpider

Generation and Simulation Package for Informative Data ExploRation

GS case study

via some 200 networks & 600 expression sets

consisting of 4 different topologies

with varied SNR,

IAA degrees,

and sizes
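Interampatteness is commonly quantified by the condition number of the network matrix A, the ratio of its largest to smallest singular value; assuming that definition, a minimal NumPy sketch (the function name and the two example networks are hypothetical):

```python
import numpy as np

def interampatteness_degree(A):
    """Condition number of network matrix A (largest / smallest singular value)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    return s[0] / s[-1]

# Two hypothetical 2-gene networks: weak vs. strong mutual coupling
low_iaa = np.array([[-1.0, 0.1], [0.1, -1.0]])
high_iaa = np.array([[-1.0, 0.9], [0.9, -1.0]])
```

Here the singular values are 1 + c and 1 - c for coupling c, so the strongly coupled network scores 19 against roughly 1.2 for the weak one; `np.linalg.cond(A)` gives the same value.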

RNICO

Robust Network Inference: decouples the model selection problem from parameter estimation; very harsh, but among the best methods when noise is low

ARACNe

focuses on the mutual information between pairs of genes, evaluated link by link rather than on the system as a whole. Also disregards self-regulating elements (self-loops)
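ARACNe-style inference can be sketched in two steps: estimate pairwise mutual information (MI) between genes, then apply the data processing inequality (DPI), dropping the weakest edge of every fully connected gene triplet as a likely indirect link. This is a simplified illustration with histogram MI estimates and hypothetical function names, not the ARACNe implementation, which uses more careful MI estimators, a significance threshold, and a DPI tolerance.

```python
import numpy as np

def mutual_information(x, y, bins=5):
    """Histogram (plug-in) estimate of MI between two expression profiles, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def aracne_like(data, bins=5, mi_threshold=0.0):
    """data: genes x samples. Returns the MI matrix and the links kept after DPI."""
    n = data.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(data[i], data[j], bins)
    candidate = mi > mi_threshold
    np.fill_diagonal(candidate, False)  # no self-loops: self-regulation is ignored
    keep = candidate.copy()
    for i in range(n):
        for j in range(i + 1, n):
            if not candidate[i, j]:
                continue
            for k in range(n):
                if k in (i, j) or not (candidate[i, k] and candidate[j, k]):
                    continue
                # DPI: in a connected triplet, the weakest edge is treated as indirect
                if mi[i, j] < min(mi[i, k], mi[j, k]):
                    keep[i, j] = keep[j, i] = False
                    break
    return mi, keep
```

The resulting network is undirected and unsigned, which is one reason ARACNe cannot be scored on self-loops or link direction.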

LASSO/Glmnet

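Least Absolute Shrinkage & Selection Operator: minimizes the RSS plus a penalty on the absolute values of the coefficients (L1) rather than on their squares (L2, ridge), making it the harshest of these penalties: coefficients can shrink exactly to zero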
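The LASSO penalty, and the elastic net that mixes it with a squared (ridge) penalty, can be fit by cyclic coordinate descent with soft-thresholding, the strategy glmnet is built on. The sketch below minimizes (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2), with `alpha=1` giving LASSO. Function names are illustrative, not glmnet's API, and there is no standardization or regularization path.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the source of exact zeros."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2).
    alpha=1 is LASSO, alpha=0 is ridge. Assumes X has no all-zero columns."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()        # residual y - X @ beta (beta starts at 0)
    z = (X ** 2).sum(axis=0) / n      # per-coordinate curvature
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ r / n + z[j] * old   # correlation with partial residual
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            if new != old:
                r -= X[:, j] * (new - old)       # keep residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta
```

For network inference each target gene would get its own regression, and the nonzero coefficients mark its selected regulators; larger `lam` gives sparser networks.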

(T)LSCO

Fit cases to a regression line, minimizing residuals along the Y axis only (LS) or along both the X and Y axes (TLS)
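A minimal sketch contrasting the two fits for a generic linear problem D·B ≈ R: LS attributes all error to R, while TLS, via the SVD of the stacked matrix [D R], spreads it over D and R. The "CO" step is modelled here as zeroing coefficients below a hypothetical `threshold`. For the network model one could take D = Y^T and R = -P^T so that A = B^T; that mapping and all names are assumptions for illustration.

```python
import numpy as np

def ls_fit(D, R):
    """Ordinary least squares for D @ B ~ R: errors assumed only in R."""
    return np.linalg.lstsq(D, R, rcond=None)[0]

def tls_fit(D, R):
    """Total least squares for D @ B ~ R: errors allowed in both D and R.
    Uses the right singular vectors of [D R]; needs rows >= columns of [D R]."""
    p = D.shape[1]
    _, _, vt = np.linalg.svd(np.hstack([D, R]), full_matrices=False)
    v = vt.T
    v12 = v[:p, p:]   # top-right block
    v22 = v[p:, p:]   # bottom-right block (assumed invertible)
    return -v12 @ np.linalg.inv(v22)

def cut_off(B, threshold):
    """Zero all coefficients with magnitude below `threshold` (sparsity step)."""
    B = B.copy()
    B[np.abs(B) < threshold] = 0.0
    return B
```

Sweeping `threshold` trades sparsity for fit; its value is a free parameter here, not one taken from the slides.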

GS case study

with self-loops: ARACNe null

N10 Performance

NestBoot

Nested Bootstrapping

for reliable GRN Inference

Threshold

(T)LSCO

Glmnet - LASSO

ARACNe

Shuffle comparison

N10, high K: max MCC = 0.24

middle density
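The nested bootstrap with a shuffled-data comparison can be sketched as follows. This is a simplified illustration under stated assumptions, not the published NestBoot procedure: `infer` is any hypothetical function returning a boolean adjacency matrix (e.g. a wrapped LASSO or LSCO run), experiments are resampled with replacement in the inner loop, supports are averaged over outer runs, the null is built by permuting each gene's values across experiments, and the support threshold is the smallest one whose estimated FDR (null links / real links above the threshold) is at most 5%.

```python
import numpy as np

def link_support(Y, P, infer, n_boot, rng):
    """Inner bootstrap: fraction of resamples in which each link is selected."""
    n, m = Y.shape
    counts = np.zeros((n, n))
    for _ in range(n_boot):
        idx = rng.integers(0, m, size=m)  # resample experiments with replacement
        counts += infer(Y[:, idx], P[:, idx])
    return counts / n_boot

def nested_support(Y, P, infer, n_outer=5, n_inner=20, seed=0):
    """Outer loop: repeat the inner bootstrap and average, for real and shuffled data."""
    rng = np.random.default_rng(seed)
    real = np.mean([link_support(Y, P, infer, n_inner, rng)
                    for _ in range(n_outer)], axis=0)
    # Null data: permute each gene's values across experiments (assumption)
    Y_null = np.array([rng.permutation(row) for row in Y])
    null = np.mean([link_support(Y_null, P, infer, n_inner, rng)
                    for _ in range(n_outer)], axis=0)
    return real, null

def support_threshold(real, null, fdr=0.05):
    """Smallest support cut-off whose estimated FDR is at most `fdr`."""
    for t in np.linspace(0.0, 1.0, 101):
        n_real = np.count_nonzero(real >= t)
        if n_real == 0:
            break
        if np.count_nonzero(null >= t) / n_real <= fdr:
            return float(t)
    return None
```

Links with real support at or above the returned threshold form the final network, which is how a fixed FDR cutoff can give comparable network sizes across inference methods.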

B. subtilis

biological time series dataset

  • 1/3 expt duplicates
  • 1/3 4 time points
  • 1/3 3 time points
  • use first and last as background & steady state

collapse from 16k genes to 28

  • w/ single perturbations and replicates
  • via Schur method to maintain network properties
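Assuming "Schur method" refers to the Schur complement, the reduction can be sketched like this: with A·y = -p and no perturbation on the eliminated genes, substituting y_drop = -A22^{-1}·A21·y_keep gives (A11 - A12·A22^{-1}·A21)·y_keep = -p_keep, so the reduced network reproduces the kept genes' steady-state responses exactly. The function name is hypothetical.

```python
import numpy as np

def schur_reduce(A, keep):
    """Reduce network A to the genes in `keep` via the Schur complement.
    Valid for steady states where the eliminated genes are not perturbed."""
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(A.shape[0]), keep)
    A11 = A[np.ix_(keep, keep)]
    A12 = A[np.ix_(keep, drop)]
    A21 = A[np.ix_(drop, keep)]
    A22 = A[np.ix_(drop, drop)]  # must be invertible
    return A11 - A12 @ np.linalg.solve(A22, A21)
```

Indirect paths through eliminated genes are folded into the reduced links, so the reduced network is generally denser than the corresponding block of the full one.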

B. subtilis

Bolasso

Bolsco

size: 28 
links: 89 
density: 0.1135204
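The reported density matches counting links over all N^2 possible directed links, self-loops included (inferred from the numbers; the slide does not state the definition):

```latex
\text{density} = \frac{\text{links}}{\text{size}^2} = \frac{89}{28^2} = \frac{89}{784} \approx 0.1135
```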

MYC project

Inferring interactions of 40 genes gravitating around the MYC oncogene

  1. Unique single/double knockdown dataset
  2. Linear model

MYC project

1. qRT-PCR of 40 genes, singly & doubly knocked down via siRNA

  • 3 biological replicates
  • 2 technical replicates

--> gene fold change & variance of expression measures
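One simple way to turn this replicate structure into a fold change and its variance is sketched below, assuming the measurements are already relative expression levels (qRT-PCR normalization is not shown) and that biological replicates are paired between knockdown and control: technical replicates are averaged, then the mean and variance of the log2 fold change are taken across biological replicates. The function name and data are hypothetical.

```python
import numpy as np

def log2_fold_change(knockdown, control):
    """Mean and variance of log2 fold change across biological replicates.

    knockdown, control: arrays of shape (biological reps, technical reps)
    holding relative expression levels; biological reps assumed paired.
    """
    kd = np.asarray(knockdown, dtype=float).mean(axis=1)  # average technical reps
    ctrl = np.asarray(control, dtype=float).mean(axis=1)
    lfc = np.log2(kd / ctrl)  # one log2 fold change per biological replicate
    return lfc.mean(), lfc.var(ddof=1)

# Hypothetical values for one gene: 3 biological x 2 technical replicates
kd = [[0.52, 0.48], [0.61, 0.59], [0.45, 0.47]]
ctrl = [[1.01, 0.99], [1.10, 1.06], [0.95, 0.97]]
mean_lfc, var_lfc = log2_fold_change(kd, ctrl)
```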

2. Linear Model

Y: expression data
A: network
P: perturbation matrix
E: output (measurement) noise estimate
F: input (perturbation) noise estimate

Y = -A^{-1}P + A^{-1}F + E
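To make the model concrete, a minimal NumPy sketch that simulates data from Y = -A^{-1}P + A^{-1}F + E and, in the noise-free case, recovers A by least squares from A·Y = -P, i.e. A = -P·Y^+ (valid when Y has full row rank). The network, the perturbation design, and the noise levels are hypothetical; in the MYC design each column of P would encode a single or double knockdown.

```python
import numpy as np

def simulate(A, P, f_sd=0.0, e_sd=0.0, seed=0):
    """Steady-state data Y = -A^{-1}P + A^{-1}F + E with Gaussian noise."""
    rng = np.random.default_rng(seed)
    A_inv = np.linalg.inv(A)
    F = f_sd * rng.standard_normal(P.shape)  # noise entering through A^{-1}, like P
    E = e_sd * rng.standard_normal(P.shape)  # noise added directly to Y
    return -A_inv @ P + A_inv @ F + E

def estimate_A(Y, P):
    """Least-squares network estimate from A Y = -P (ignores noise)."""
    return -P @ np.linalg.pinv(Y)

# Hypothetical 3-gene network; columns of P: three single knockdowns, one double
A = np.array([[-1.0, 0.5, 0.0], [0.0, -1.0, 0.3], [0.2, 0.0, -1.0]])
P = np.array([[-1.0, 0.0, 0.0, -1.0],
              [0.0, -1.0, 0.0, -1.0],
              [0.0, 0.0, -1.0, 0.0]])
Y = simulate(A, P)
A_hat = estimate_A(Y, P)
```

With noise, this plain estimate is dense, which is why the regularized and cut-off methods above are needed to select links.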

MYC project

Inferring Link & Sign

consensus of high-confidence links based on nested bootstrap

MYC project

Bolasso

Bolsco

Different methods return networks of similar size at a 5% FDR cutoff

In Conclusion

GeneSPIDER

MYC, B. subtilis, synthetic data

Future Direction

NestBoot

MYC and B. subtilis data

Benchmark

add wrappers to GS