Half-Time Seminar 2017

Daniel Morgan

NestBoot

MYC

"Inference" -> variable selection

**GeneSPIDER**

tune, run, eval inference algs:

**Regularization** helps fit linear model:

Glmnet/**LASSO**/ElasticNet,

**T/LSCO**

RNI, ARACNe (mutual info)


**bootstrap** -> resampling method estimating variance from an approximate (empirical) distribution

--> build confidence/support/reliability btwn runs
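A minimal Python/NumPy sketch of the bootstrap idea above: resample the observations with replacement, recompute a statistic on each resample, and take the spread of those replicates as the variance estimate. `bootstrap_variance` is a hypothetical helper for illustration, not a GeneSPIDER or NestBoot function.

```python
import numpy as np

def bootstrap_variance(data, statistic, n_boot=1000, seed=0):
    """Estimate the variance of `statistic` by resampling `data` with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each replicate draws n observations from the empirical distribution,
    # which stands in for the unknown true distribution.
    reps = np.array([statistic(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])
    return reps.var(ddof=1), reps

x = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=50)
var_mean, reps = bootstrap_variance(x, np.mean)
print(var_mean)  # should be near sigma^2 / n = 1/50 for the sample mean
```

Applying the same loop to an inference run instead of `np.mean` gives, per link, how often it reappears across runs: the support/reliability mentioned above.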

**SNR, K, IAA**

**MCC, AUROC** -> performance measures based on various ratios of TP, TN, FP, FN
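A short sketch of both measures, assuming links are called present/absent (for MCC) and ranked by a numeric confidence score (for AUROC). The helper names `mcc` and `auroc` are hypothetical, not GeneSPIDER functions.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal sum is zero
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney) formulation.

    Probability that a random true link outscores a random absent link;
    ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

print(mcc(tp=8, tn=80, fp=2, fn=10))
print(auroc([0.9, 0.8, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0]))
```

MCC uses all four counts at a single cut-off, while AUROC summarizes the ranking over every possible cut-off.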

Terms:

**Gene**ration and **S**imulation **P**ackage for **I**nformative **D**ata **E**xplo**R**ation

via some 200 networks & 600 expression sets

consisting of 4 different topologies

with varied SNR,

IAA degrees,

and sizes

Robust Network Inference: decouples the model selection problem from parameter estimation; very harsh, but among the best methods when noise is low

focuses on mutual information between gene pairs, evaluated link by link rather than on the entire system as a whole. Also disregards self-regulating elements (self-loops)
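A rough sketch of link-by-link mutual-information scoring, assuming a simple histogram (plug-in) estimator. ARACNe itself uses a different MI estimator and additional pruning that are not shown; `mutual_information` and `mi_matrix` are hypothetical names.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (nats) from a 2-D histogram of two expression profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # empty cells contribute 0 (0 * log 0 := 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_matrix(X, bins=8):
    """Score every gene pair independently; diagonal stays 0 (no self-loops)."""
    n_genes = X.shape[0]
    M = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            M[i, j] = M[j, i] = mutual_information(X[i], X[j], bins)
    return M

rng = np.random.default_rng(0)
g1 = rng.normal(size=200)
# gene 2 tracks gene 1; gene 3 is independent (hypothetical data)
X = np.vstack([g1, g1 + 0.1 * rng.normal(size=200), rng.normal(size=200)])
M = mi_matrix(X)
print(M.round(2))
```

Because each pair is scored on its own, the matrix is symmetric and carries no direction or sign.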

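Least Absolute Shrinkage & Selection Operator: minimizes RSS plus a penalty on the sum of |coefficients| (L1) rather than their squares (L2, as in ridge), so coefficients can shrink exactly to zero; hence the harshest (sparsest) of these penalties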
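A minimal sketch of LASSO by cyclic coordinate descent, assuming centered data with no intercept and the (1/2n)·RSS scaling glmnet uses for its pure-L1 case. `lasso_cd` is a hypothetical helper, not the glmnet implementation; the soft-thresholding step is what produces exact zeros.

```python
import numpy as np

def soft_threshold(z, gamma):
    # Shrinks z toward 0, and sets it exactly to 0 when |z| <= gamma
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*RSS + lam*sum|b_j| by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coefficient j's contribution added back
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_b = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_b + 0.1 * rng.normal(size=100)
b_hat = lasso_cd(X, y, lam=0.1)
print(b_hat)  # irrelevant coefficients come out exactly 0
```

Increasing `lam` zeroes more coefficients; in a network setting each zero is a removed link.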

Fit cases to a regression line minimizing the difference along the Y axis (LSCO) or along both the X and Y axes (TLSCO)
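A two-variable sketch of the difference, assuming ordinary least squares for the Y-only case and SVD-based total least squares for the X-and-Y case. This only illustrates the fitting criteria; it is not the (T)LSCO code, and the data are simulated.

```python
import numpy as np

def ols_slope(x, y):
    # Ordinary least squares: only y assumed noisy (vertical residuals)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def tls_slope(x, y):
    # Total least squares: x and y both noisy (orthogonal residuals).
    # The right singular vector with the smallest singular value is the
    # normal of the best-fit line through the centroid.
    Z = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(Z)
    nx, ny = vt[-1]
    return -nx / ny

rng = np.random.default_rng(0)
t = rng.normal(size=500)
x = t + 0.5 * rng.normal(size=500)   # noise on X as well
y = 2.0 * t + 0.5 * rng.normal(size=500)
print(ols_slope(x, y), tls_slope(x, y))
```

With noise on X, OLS is biased toward zero (here toward about 1.6 instead of 2). TLS recovers the slope in this example because the X and Y noise variances are equal, which is the case it assumes.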

with self-loops ∴ null ARACNe

Nested Bootstrapping

for **reliable** GRN Inference

__Threshold__

(T)LSCO

Glmnet - LASSO

ARACNe

middle density

biological time series dataset

- 1/3 expt duplicates
- 1/3 4 time points
- 1/3 3 time points
- use first and last as background & steady state

collapse from 16k genes to 28

- w/ single perturbations and replicates
- via Schur method to maintain network properties
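A sketch of Schur-complement reduction for a linear steady-state model, assuming A y = -p with no perturbation applied to the removed genes. Whether this matches the exact reduction used for the 16k -> 28 collapse is an assumption; `schur_reduce` and the random network are hypothetical.

```python
import numpy as np

def schur_reduce(A, keep):
    """Reduce network A to the genes in `keep` via the Schur complement.

    Assumes A @ y = -p with p = 0 on the removed genes, so their effect is
    folded into A11 - A12 @ inv(A22) @ A21.
    """
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(A.shape[0]), keep)
    A11 = A[np.ix_(keep, keep)]
    A12 = A[np.ix_(keep, drop)]
    A21 = A[np.ix_(drop, keep)]
    A22 = A[np.ix_(drop, drop)]
    return A11 - A12 @ np.linalg.solve(A22, A21)

rng = np.random.default_rng(0)
n = 6
A = -np.eye(n) + 0.2 * rng.normal(size=(n, n))  # hypothetical network
keep = [0, 1, 2]
A_red = schur_reduce(A, keep)

# Perturb only kept genes; compare full vs reduced steady states
p = np.zeros(n)
p[keep] = rng.normal(size=len(keep))
y_full = np.linalg.solve(A, -p)
y_red = np.linalg.solve(A_red, -p[keep])
print(np.allclose(y_full[keep], y_red))
```

The reduced network reproduces the kept genes' steady-state response exactly, which is the sense in which network properties are maintained.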

Bolasso

Bolsco

size: 28, links: 89, density: 0.1135204 (= 89 / 28²)

Inferring interactions of 40 genes gravitating around **MYC oncogene**

- Unique single/double knockdown dataset
- Linear model

1. qRT-PCR of 40 genes, singly & doubly knocked down via siRNA

- 3 biological replicates
- 2 technical replicates

--> gene fold change & variance of expression measures
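A sketch of turning replicate qRT-PCR measurements into a fold change and a variance. It assumes the standard 2^(-ΔΔCt) method (on the log2 scale) and that technical replicates are averaged before taking the variance over the 3 biological replicates; both are assumptions, and the Ct and fold-change values are hypothetical.

```python
import numpy as np

def log2_fold_change(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    # 2^(-ddCt) method on the log2 scale: log2 FC = -ddCt
    d_kd = ct_target_kd - ct_ref_kd
    d_ctrl = ct_target_ctrl - ct_ref_ctrl
    return -(d_kd - d_ctrl)

def summarize(log2fc):
    """log2fc: array of shape (n_bio, n_tech).

    Technical replicates are averaged first; mean and variance are then
    taken over biological replicates.
    """
    per_bio = log2fc.mean(axis=1)
    return per_bio.mean(), per_bio.var(ddof=1)

fc = log2_fold_change(26.0, 18.0, 24.0, 18.0)  # -2.0, i.e. 4-fold down
# 3 biological x 2 technical replicates (hypothetical values)
log2fc = np.array([[-1.0, -1.2], [-0.8, -1.0], [-1.1, -0.9]])
mean, var = summarize(log2fc)
print(fc, mean, var)
```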

2. Linear Model

Y: expression data

A: network

P: perturbation matrix

E: output (measurement) noise estimate

F: input (perturbation) noise estimate

$Y = -A^{-1}P +A^{-1}F + E$
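A sketch that simulates data from this model and recovers A by least squares with a cut-off. The network, noise level, and the negative sign for knockdowns in P are hypothetical choices; `lsco` is an illustrative helper, not the GeneSPIDER function.

```python
import numpy as np

def lsco(Y, P, cutoff):
    """Hypothetical LSCO-style step: A_hat = -P Y^+, then zero entries below cutoff."""
    A_hat = -P @ np.linalg.pinv(Y)
    A_hat[np.abs(A_hat) < cutoff] = 0.0
    return A_hat

rng = np.random.default_rng(0)
n = 5                                   # genes (hypothetical size)

# Hypothetical sparse network; diagonal -1 models self-degradation
A = -np.eye(n)
A[0, 1], A[2, 0], A[3, 4] = 0.5, -0.4, 0.3

# Single-gene knockdowns in duplicate (negative sign assumed for knockdown)
P = -np.hstack([np.eye(n), np.eye(n)])

F = 0.01 * rng.normal(size=P.shape)     # input (perturbation) noise
E = 0.01 * rng.normal(size=P.shape)     # output (measurement) noise
A_inv = np.linalg.inv(A)
Y = -A_inv @ P + A_inv @ F + E          # Y = -A^{-1}P + A^{-1}F + E

A_hat = lsco(Y, P, cutoff=0.1)
print(np.array_equal(A_hat != 0, A != 0))
```

Without noise, P = -AY, so -P Y^+ = A Y Y^+ = A whenever Y has full row rank; the cut-off then removes the small nonzero entries that noise leaves on absent links.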

**Inferring Link & Sign**

consensus of high-confidence links based on nested bootstrap
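A sketch of the nested-bootstrap consensus idea for link and sign. This is not NestBoot's implementation: the inference step (`lsco`), the sign-agreement support, the 0.9 threshold, and the simulated 5-gene network are all assumptions. The inner loop resamples experiments; the outer loop repeats it to check how stable the support is. Eight replicates per knockdown keep a resample from dropping a perturbation entirely.

```python
import numpy as np

def lsco(Y, P, cutoff=0.1):
    # Hypothetical inference step: least squares with a cut-off
    A_hat = -P @ np.linalg.pinv(Y)
    A_hat[np.abs(A_hat) < cutoff] = 0.0
    return A_hat

def sign_support(Y, P, infer, n_boot, rng):
    """Inner bootstrap over experiments (columns of Y and P together).

    Returns, per link, the fraction of runs agreeing on its majority sign,
    and that sign (+1, -1, or 0 if never selected).
    """
    n, m = Y.shape
    pos = np.zeros((n, n))
    neg = np.zeros((n, n))
    for _ in range(n_boot):
        idx = rng.integers(0, m, size=m)
        A_hat = infer(Y[:, idx], P[:, idx])
        pos += A_hat > 0
        neg += A_hat < 0
    return np.maximum(pos, neg) / n_boot, np.sign(pos - neg)

def nested_bootstrap(Y, P, infer, n_outer=10, n_inner=50, seed=0):
    """Outer loop repeats the inner bootstrap; returns mean/std support and sign."""
    rng = np.random.default_rng(seed)
    runs = [sign_support(Y, P, infer, n_inner, rng) for _ in range(n_outer)]
    supports = np.stack([s for s, _ in runs])
    signs = np.stack([g for _, g in runs])
    # Consensus sign = majority over outer runs
    return supports.mean(axis=0), supports.std(axis=0), np.sign(signs.sum(axis=0))

# Hypothetical 5-gene network, 8 replicate knockdowns per gene
rng = np.random.default_rng(1)
n, reps, noise = 5, 8, 0.05
A = -np.eye(n)
A[0, 1], A[2, 0], A[3, 4] = 0.5, -0.4, 0.3
P = -np.tile(np.eye(n), reps)
A_inv = np.linalg.inv(A)
Y = (-A_inv @ P
     + A_inv @ (noise * rng.normal(size=P.shape))   # input noise
     + noise * rng.normal(size=P.shape))            # output noise

mean_sup, sd_sup, cons_sign = nested_bootstrap(Y, P, lsco)
consensus = (mean_sup >= 0.9) * cons_sign  # signed high-confidence links
print(consensus)
```

The standard deviation across outer runs (`sd_sup`) indicates how much a link's support itself varies between bootstrap runs.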

Bolasso

Bolsco

Different methods return networks of similar size based on 5% FDR cutoff
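A sketch of choosing a support cut-off at a target FDR, assuming support values from real data are compared with support values from a null (e.g. shuffled-data) run. How NestBoot builds its null and estimates FDR may differ; `support_cutoff` and the support values are hypothetical.

```python
import numpy as np

def support_cutoff(real_support, null_support, fdr=0.05):
    """Smallest support threshold whose estimated FDR is <= `fdr`.

    FDR at threshold s is estimated as
    (#null links with support >= s) / (#real links with support >= s),
    with the null count rescaled to the number of real candidate links.
    """
    real = np.ravel(real_support)
    null = np.ravel(null_support)
    scale = real.size / null.size
    for s in np.unique(real):  # ascending candidate thresholds
        n_real = np.sum(real >= s)
        n_null = scale * np.sum(null >= s)
        if n_real > 0 and n_null / n_real <= fdr:
            return s
    return None  # no threshold reaches the target FDR

real = np.array([1.0, 1.0, 0.98, 0.95, 0.9, 0.4, 0.3, 0.2, 0.1, 0.05, 0.0, 0.0])
null = np.array([0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0])
s = support_cutoff(real, null)
print(s, np.sum(real >= s))  # cut-off and resulting network size
```

Since the cut-off is set by the FDR target rather than by a method-specific parameter, different inference methods end up with comparably sized networks.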

MYC, B. subtilis, synthetic data

MYC and B. subtilis data

add wrappers to GeneSPIDER (GS)