- Hsieh A, Morton SU, Willcox JAL, ... Srivastava D, Tristani-Firouzi M, Bruckner M, Lifton RP, Goldmuntz E, Gelb BD, Chung WK, Seidman CE, Seidman JG, Shen Y. Early post-zygotic mutations contribute to congenital heart disease. bioRxiv (Under Review) (2019)
- Potus F, Pauciulo M, Cook EK, Zhu N, Hsieh A, Welch CL, Shen Y, Tian L, Lima P, Mewburn J, D’Arsigny C, Lutz KA, Coleman AW, Damico R, Hassoun PM, Nichols WC, Chung WK, Rauh MJ. Novel mutations and decreased expression of the epigenetic regulator TET2 in pulmonary arterial hypertension. Circulation (Under Review) (2019)
- Zhou X, Astrovskaya I, Turner T, Wang T, Brueggeman L, Barnard R, Hsieh A, Snyder LG, Muzny D, Sabo A, The SPARK Consortium, Gibbs R, Eichler E, O'Roak B, Michaelson J, Volfovsky N, Shen Y, Chung WK. Exome sequencing of 457 autism families recruited online provides evidence for ASD genes. Nature Communications (2019)
- Lewis MJ, Qi H, Hsieh A, Rosenbaum M, Shen Y, Chung W. Impact of damaging de novo variants on clinical outcomes in congenital heart disease. Journal of the American College of Cardiology: Young Investigator Awards Competition (2017)
- Doan S, Lin K-W, Conway M, Ohno-Machado L, Hsieh A, Feupe SF, ... Kim H-E. PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. Journal of the American Medical Informatics Association (2014)
- Lin K-W, Tharp M, Conway M, Hsieh A, Ross M, Kim J, & Kim H-E. Feasibility of Using Clinical Element Models (CEM) to Standardize Phenotype Variables in the Database of Genotypes and Phenotypes (dbGaP). PLoS ONE (2013)
Please see my Google Scholar page for more details.
Detection of mosaic single nucleotide variants and implications for congenital heart disease
Mosaicism, or genetic mutations arising after oocyte fertilization, has been implicated in developmental disorders such as overgrowth syndromes and structural brain malformations, but its role in congenital heart disease (CHD) is not yet well understood. Further, estimates of the frequency of mosaic mutations, a basic genetic question, are inconsistent across recent studies because of differences in sequencing depth and in the accuracy and power of variant calling methods. To address these issues, we developed a new computational method to accurately detect mosaic mutations from exome or genome sequencing data. We applied the method to exome sequencing data from 2530 CHD proband-parent trios to estimate the number of mosaic mutations detectable in blood samples and to characterize the contribution of mosaicism to CHD.
Our main findings are summarized below:
- We developed a new method (EM-Mosaic) that jointly estimates the overall frequency of mosaic mutations using an Expectation-Maximization approach and identifies mosaic mutations from the data using a pseudo-Bayesian framework, with a 90% validation rate (among the highest of all recent major publications on mosaics).
- We estimate that each case carries about 0.14 protein-coding mosaic mutations in the blood with allele fraction above 10%, representing about a tenth of new mutations per generation.
- In CHD cases, likely-damaging mosaics have higher allele fraction than benign mutations, strongly supporting a role of mosaics in the disease.
- Analysis of a limited number of subjects (n=66) with matched blood and heart tissue available supports the notion that mosaic mutations in blood samples with relatively high allele fraction are more likely to also be found in heart tissues.
(Research Advisors: Dr. Yufeng Shen, Dr. Wendy Chung)
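To illustrate the Expectation-Maximization idea mentioned above, here is a minimal sketch under simplifying assumptions. It is not the EM-Mosaic implementation. It assumes each candidate variant is either germline heterozygous (alternate reads drawn from a binomial with allele fraction 0.5) or mosaic (binomial with a single shared allele fraction `theta`), and it estimates the mosaic fraction `pi` together with a per-variant posterior. The function names, starting values, and simulated data are all hypothetical.

```python
import math
import random


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def em_mosaic_fraction(variants, n_iter=200, tol=1e-9):
    """Fit a two-component binomial mixture to (alt_reads, depth) pairs.

    Returns (pi, theta, posteriors), where pi is the estimated fraction of
    mosaic variants, theta the estimated mosaic allele fraction, and
    posteriors[i] the probability that variant i is mosaic.
    """
    pi, theta = 0.1, 0.2  # hypothetical starting values
    eps = 1e-6            # keeps parameters away from 0 and 1 (avoids log(0))
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each variant is mosaic.
        post = []
        for k, n in variants:
            log_m = math.log(pi) + binom_logpmf(k, n, theta)
            log_g = math.log(1 - pi) + binom_logpmf(k, n, 0.5)
            top = max(log_m, log_g)  # subtract the max for numerical stability
            w_m = math.exp(log_m - top)
            w_g = math.exp(log_g - top)
            post.append(w_m / (w_m + w_g))
        # M-step: re-estimate pi and theta from the posterior weights.
        weighted_depth = sum(r * n for r, (_, n) in zip(post, variants))
        if weighted_depth == 0:
            break  # no variant has any mosaic weight; keep current estimates
        new_pi = min(max(sum(post) / len(variants), eps), 1 - eps)
        new_theta = sum(r * k for r, (k, _) in zip(post, variants)) / weighted_depth
        new_theta = min(max(new_theta, eps), 1 - eps)
        converged = abs(new_pi - pi) < tol and abs(new_theta - theta) < tol
        pi, theta = new_pi, new_theta
        if converged:
            break
    return pi, theta, post


def simulate_variants(n_germline, n_mosaic, depth, theta, seed=0):
    """Simulate read counts for illustration only (not real data)."""
    rng = random.Random(seed)

    def draw(p):
        return sum(rng.random() < p for _ in range(depth))

    return ([(draw(0.5), depth) for _ in range(n_germline)]
            + [(draw(theta), depth) for _ in range(n_mosaic)])


if __name__ == "__main__":
    data = simulate_variants(n_germline=900, n_mosaic=100, depth=100, theta=0.15)
    pi_hat, theta_hat, posteriors = em_mosaic_fraction(data)
    print(f"estimated mosaic fraction: {pi_hat:.3f}, allele fraction: {theta_hat:.3f}")
```

A plain binomial ignores overdispersion in real read counts, and a single shared `theta` ignores the spread of mosaic allele fractions, so this sketch only conveys the alternation between the E-step and the M-step.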
Predicting neurodevelopmental disorder risk in congenital heart disease patients
January 2015 - November 2015
Rates of neurodevelopmental disorder (NDD) are disproportionately high in congenital heart disease (CHD) patients compared to the general population (up to 10-fold higher prevalence), presumably due to disruption of pleiotropic genes central to many key developmental pathways. We sought to answer the question: Can we predict which CHD patients will develop NDD and which will not?
We applied several statistical and machine learning approaches (logistic regression, random forest, SVM, boosting) to a cohort of 2530 congenital heart disease patients. Our outcome variable was a binary NDD diagnosis, and we used a combination of genetic and clinical features as predictors. Models were evaluated using 10-fold cross-validation. Although predictive performance was middling, we found the following:
- Damaging de novo mutations showed the strongest association with NDD, with loss-of-function and damaging-missense mutations in published NDD risk genes contributing the most information.
- Complex CHD cases, defined as having a CHD diagnosis and other extracardiac manifestations, were at higher risk of NDD than isolated CHD cases (no extracardiac manifestations).
- Male CHD cases were enriched for NDD compared to female cases in our cohort.
- Neurological, skeletal, and heart-morphology-related extracardiac diagnoses had the strongest association with NDD.
(Rotation Advisor: Dr. Yufeng Shen)
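The 10-fold cross-validation used for model evaluation above can be sketched as follows. This is a generic illustration, not the project's code: the helper names and the toy one-feature classifier (a stand-in for logistic regression, random forest, and the other models) are hypothetical, and the sketch assumes at least as many samples as folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return (mean accuracy, per-fold accuracies) over k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Train only on samples outside the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k, accuracies


def fit_stump(X, y):
    """Toy classifier: majority label for each value of one binary feature."""
    counts = {0: [0, 0], 1: [0, 0]}
    for row, label in zip(X, y):
        counts[row[0]][label] += 1
    return {value: int(c[1] > c[0]) for value, c in counts.items()}


def predict_stump(model, row):
    return model[row[0]]


if __name__ == "__main__":
    # Hypothetical toy data: one binary feature (e.g. carries a damaging
    # de novo variant) and a binary outcome that follows it exactly.
    X = [[i % 2] for i in range(100)]
    y = [i % 2 for i in range(100)]
    mean_acc, per_fold = cross_validate(X, y, fit_stump, predict_stump)
    print(mean_acc, per_fold)
```

Because every sample is held out exactly once, the averaged score reflects performance on data the model did not see during fitting.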
Studying the effect of mRNA methylation on ribosome translation efficiency
September 2014 - January 2015
Using ribosome profiling and methylation data, I wrote a tool that compares mRNA methylation sites against mRNA sites bound by ribosomes to investigate the effect of methylation on translation efficiency. (Rotation Advisor: Dr. Peter Sims)
Bioinformatics: Competitive Analysis Group
October 2013 - July 2014
I analyzed next-generation sequencing QC data from competing platforms to compare their performance metrics (alignment, assembly, variant calling) against those of Illumina's platforms. I also reviewed and evaluated both in-house and external applications submitted to BaseSpace.
Genomic analysis of variants in Kawasaki Disease patients and families
June 2013 - July 2014
Using whole genome sequencing data of families (trios) affected by Kawasaki Disease, I developed an analysis pipeline to discover risk variants. (Project under supervision of Jihoon Kim and Dr. Jane Burns)
Phenotype Finder IN Data Resources (PFINDR)
August 2011 - June 2013
Using phenotype datasets from dbGaP, I developed a tool called DIVER to extract and format demographic information (later integrated into PhenDisco). I also worked on PhenDisco, a system that fits existing data in dbGaP to an information model so that users can run free-text queries against dbGaP to find studies of interest. (Project under supervision of Dr. Hyeoneui Kim)
Analysis of unidentified carbohydrate complexes in the PDB
June 2011 - August 2011
As part of a summer REU (NSF), with a team of 4 other students, I searched, categorized, and annotated unidentified carbohydrate complexes in the PDB.