Hi, I'm Daniel, how are you?

I love data in all shapes and sizes.

Recently, Graphistry commissioned me to rebuild skrub (formerly known as dirty_cat), a scikit-learn-associated package, as a version we call cu_cat. It can encode "dirty" data into machine-learnable formats up to 50x faster than the original code by running CUDA code through the cupy, cuml, and cudf Python libraries. The work involved rewriting the NMF optimization mathematics, making the code robust to varying GPU memory availability, and staying within the datatypes CUDA currently supports (no datetime support, for example).
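
To give a flavor of what encoding "dirty" data means, here is a minimal CPU-only sketch in plain Python. It is not cu_cat's or skrub's API: the function names (char_ngrams, ngram_similarity, similarity_encode) are hypothetical, and it swaps the NMF-based machinery for a simple character n-gram similarity so that misspelled categories land close to their clean counterparts.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by all n-grams (multiset Jaccard), in [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((grams_a & grams_b).values())
    total = sum((grams_a | grams_b).values())
    return shared / total if total else 0.0


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its similarity to every prototype category."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


if __name__ == "__main__":
    # Hypothetical dirty column: a clean entry and a typo of it.
    rows = similarity_encode(["London", "Londn"], ["london", "paris"])
    for row in rows:
        print(row)
```

Each dirty string becomes a numeric row, one column per prototype, so a typo such as "Londn" still scores closer to "london" than to "paris".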

I trained as a systems biologist with a focus on network inference, with applications in integrating multi-omics data to untangle drivers of disease and aging. In June 2019 I joined the Channing Division of Network Medicine at Brigham and Women's Hospital at Harvard Medical School. In the spring of 2019 I finished my doctorate in the Department of Biochemistry & Biophysics at Stockholm University, in the Sonnhammer Lab at SciLifeLab. I received my Master's in Bioinformatics from The Ohio State University and my Bachelor's in Molecular Biology from Miami University.

This page is updated regularly to include current & past research.

Follow links for side research and personal interests.

Work

Research Engineer
August 2023 -
Developing capabilities to make knowledge graphs and do machine learning on data of all shapes and sizes
Research Fellow
Aug 2021 - August 2022
Investigating gut microbiome metagenomics in Southeast Asians and single-cell sequencing in metastasis
June 2019 - June 2021
Developing a framework for integrating epigenetics with other -omics in a network framework to trace trajectories over subject pseudo-time
Visiting Scholar
Feb 2017 - May 2017
Designing a novel systems approach to controlling and balancing input and output noise in linear models used for network reconstruction
Bioinformatics Data Analyst
May 2013 - September 2014
Remote testing, data analysis, and data extraction using a novel web application with an R back end

Specialties

Data Viz
Integrative -omics
Network Science/Inference
Systems Dynamics

Skills

Data Analysis
Algorithm Dev
Molecular Biology
Machine Learning

Languages

Python
keras/scipy
bash
Spark
git/Travis


Education

Stockholm University
Doctor of Philosophy
January 2015 - December 2019
I worked on several research projects on the reverse engineering, or inference, of gene regulatory networks, with an interest in downstream drug repositioning, in the Sonnhammer Lab at SciLifeLab in conjunction with Torbjörn Nordling at National Cheng Kung University.
The Ohio State University
Master of Science
August 2012 - December 2014
Studied and worked with interest in drug repositioning, with research investigating primary bladder and lung cancer samples. Thesis: Gene Co-Expression Network Mining Approach for Differential Expression Analysis.
Miami University
Bachelor of Science
August 2006 - January 2011
Worked in the Fisk Lab, as well as two summers in the Letterio Lab at Case Western Reserve University

Research

An oversight in how linear ODE models infer GRNs has led to scaling issues that introduce mega-hubs, i.e., would-be mega-regulators that have not been observed in nature and are very probably an artifact of the inference method.

02 Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms bioRxiv, Dataset and Code for use in this ShinyApp

Motivation: Cancer is known to stem from multiple, independent mutations, the effects of which aggregate to drive the cell into a cancerous state. To understand the complex interplay between affected genes, their gene regulatory network (GRN) needs to be uncovered, revealing detailed insights of regulatory mechanisms. We therefore decided to infer a reliable GRN from perturbation responses of 40 genes known or suspected to have a role in human cancers yet whose regulatory interactions are poorly known. Results: siRNA knockdown experiments of each gene were done in a human squamous carcinoma cell line, after which the transcriptomic response was measured. From these data GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. The best GRN was shown to be significantly more predictive than the null model, both in crossvalidated benchmarks and for an independent dataset of the same genes but subjected to double perturbations. It agrees with many known links in addition to predicting a large number of novel interactions, a subset of which were experimentally validated. The inferred GRN captures regulatory interactions central to cancer-relevant processes and thus provides mechanistic insights that are useful for future cancer research.

03 NestBoot: A Generalized Framework for Controlling FDR in Gene Regulatory Network Inference Publication and Code

Motivation: Inference of Gene Regulatory Networks (GRNs) from perturbation data can give detailed mechanistic insights of a biological system. Many GRN inference methods exist, but the topology of their estimates tends to be sensitive to changes in method specific parameters. Even though the inferred network is optimal given the parameters, it has been shown that many links are wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. Results: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data property. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, and RNI inference methods. An improved inference accuracy was observed in almost all situations. The method is part of the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.
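
The step of translating bootstrap support into a false discovery rate can be sketched roughly as below. This is a simplified illustration in plain Python, not the GeneSPIDER/NestBoot implementation: the function names (estimated_fdr, support_cutoff) and the example support values are hypothetical, and it assumes the measured and shuffled runs score the same number of candidate links.

```python
def estimated_fdr(real_support, shuffled_support, threshold):
    """Estimate FDR at a support threshold.

    Links that pass the threshold in shuffled data stand in for false
    positives; links that pass in the measured data are the discoveries.
    """
    n_real = sum(s >= threshold for s in real_support)
    n_null = sum(s >= threshold for s in shuffled_support)
    if n_real == 0:
        return 0.0  # nothing is called, so nothing can be falsely called
    return min(1.0, n_null / n_real)


def support_cutoff(real_support, shuffled_support, target_fdr=0.05):
    """Return the lowest support threshold meeting the target FDR, or None."""
    for threshold in sorted(set(real_support)):
        if estimated_fdr(real_support, shuffled_support, threshold) <= target_fdr:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical per-link bootstrap support (fraction of runs with the link).
    real = [0.95, 0.9, 0.9, 0.6, 0.3, 0.1]
    shuffled = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1]
    print(support_cutoff(real, shuffled, target_fdr=0.05))
```

Links whose support is at or above the returned cutoff would be kept; a result of None means no threshold reaches the requested FDR on these values.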

04 GeneSPIDER: gene regulatory network inference benchmarking with controlled network and data properties Publication and Code

I collaborated with other students, notably Andreas Tjärnberg, on the GeneSPIDER package for MATLAB, which aims to tackle a few key issues in modern network inference. Inference of gene regulatory networks (GRNs) is a central goal in systems biology. It is therefore important to evaluate the accuracy of GRN inference methods in the light of network and data properties. Although several packages are available to model, simulate, and analyse GRN inference, they offer limited control of network topology together with system dynamics, experimental design, data properties, and noise characteristics. Independent control of these properties in simulations is key to drawing conclusions about which inference method to use in a given condition and what performance to expect from it, as well as to obtaining properties representative of real biological systems.

Tutorials

01A LSTM model to predict individual TF-gene links

via lung tissue GRN across averaged subject pseudo-time

01B LSTM model to predict entire networks

via COPD and control GRN across aggregated subject pseudo-time

02 MILIPEED Tutorial

03 GPUPanda Tutorial

04 Split Violin Plots in Python

Interests

Infinite Powers: How Calculus Reveals the Secrets of the Universe
Collapse: How Societies Choose to Fail or Succeed
Fathoms: The World in the Whale
Trade Wars Are Class Wars: How Rising Inequality Distorts the Global Economy and Threatens International Peace
Unworthy Republic: The Dispossession of Native Americans and the Road to Indian Territory
The Invention of Nature: Alexander von Humboldt's New World
Destined for War: Can America and China Escape Thucydides’s Trap?
From Darwin to Derrida: Selfish Genes, Social Selves, and the Meanings of Life
How China is Reshaping the Global Economy: Development Impacts in Africa and Latin America
If We Can Keep It: How the Republic Collapsed and How it Might Be Saved
Behave: The Biology of Humans at Our Best and Worst
The Tangled Tree: A Radical New History of Life
AI Superpowers: China, Silicon Valley, and the New World Order
Capital in the Twenty-First Century
On China
China's Asian Dream: Empire Building along the New Silk Road
Who Rules the World?


What's New?

D.C. Morgan Portfolio