Summer Student Program

Teaching Students the Nature of Research


The nationally renowned Jackson Laboratory Summer Student Program provides college and high school students with an opportunity to conduct independent research under the guidance of Jackson faculty. More than 2,000 students, including three Nobel laureates, have participated in the program. Students in this program have performed research in the Center since 2006.

Students work closely with a faculty mentor and begin the summer with a written proposal for their planned work. They participate in active research groups and conduct their work in a highly interactive and team-oriented atmosphere under the guidance of their mentor. At the end of the summer, the students present their work at the program’s one-day symposium and prepare a written research report.


Students - 2012


Jasmine Johnson
Rockdale Magnet School of Science and Technology, Conyers, GA
Sponsor: Gary Churchill

Identification of a Genetic Network Linking Sleep and Adiposity
Over the past few decades, obesity has become one of the most detrimental public health problems of the 21st century. In 2008, approximately 500 million adults were diagnosed as obese. By 2015, scientists predict that 700 million adults will be diagnosed as obese worldwide (World Health Organization, 2011). Past studies have shown that there may be a relationship between amount of sleep and adipose tissue, but the intricacies of this relationship have not been thoroughly studied (University of Chicago Medical Center, 2010). Circadian rhythms, governed by the suprachiasmatic nucleus in the brain, control both the sleep-wake cycle and the feeding cycle and may be the underlying mechanism linking sleep to adipose tissue (Laposky et al., 2007). A method for studying underlying mechanisms, quantitative trait loci (QTL) analysis, will be conducted on gene expression data from different regions of the brain and multiple sleep-wake traits from the B6/BALB cross in order to determine the relationships among the sleep-wake traits as well as the relationships between the expression data and the sleep traits. After linkage analysis has been conducted, significant genes identified through the gene expression data as well as MGI will be used to create a genetic network linking adiposity and metabolic genes to sleep QTL. Preliminary results show that chromosomes 5 and 13 may be chromosomes of interest in the phenotypic data. Using the confidence intervals of chromosome 13 in the phenotypic data, 600 genes have been identified for chromosome 13 using MGI. In the expression data, there were 120 significant genes, with 5 of these genes overlapping between the two. Further analysis of the relationship between the genes and sleep-wake traits will be conducted using conditional genome scans. Additionally, we will apply strategic equation mapping analysis to the sleep-wake traits to determine any causal, reactive or independent relationships.

Justin Hendrick
North Carolina School of Science and Mathematics, Durham, NC
Sponsor: Greg Carter

Using Pleiotropy to Model Genetic Interaction in the B6xC3H F2 Intercross and Visualization of Results
Pleiotropy is a phenomenon in which one locus affects multiple phenotypes. Lean mass and fat mass are partially pleiotropic phenotypes because they have some overlapping QTL. Using a method that exploits the complementary information in these phenotypes, we can infer and interpret genetic interactions from QTL to QTL and QTL to phenotype. From these interactions we derived a network. This mouse data set had about 2000 mice, including both male and female. Each mouse was genotyped at 98 locations in the genome. This gave us computational power but low resolution. Therefore we modeled interactions for markers and were unable to resolve genes; however, this method could also be applied to genes. Our work will be organized into an R package and distributed so that the scientific community can also use this method. We are currently writing scripts for network visualization, using a force-directed layout algorithm to position the nodes. To address the problem of large, dense graphs we clustered the results and graphed interactions between clusters. This package will allow inference and fast visualization of complex genetic networks.

Jean Juang
Corona del Sol High School, Tempe, AZ
Sponsor: Joel Graber

A computational biology analysis of the genetic linkage between adrenal gland weight and anxiety behavioral assays
Chronic Lymphocytic Leukemia (CLL) is the most common form of leukemia, affecting six percent of the population ages 65-85. Currently, there are no clear genetic markers for clinical diagnostic use. Large-scale measurement of the type and relative abundance of all of the mRNA transcripts in a sample (the transcriptome) presents a potential means of developing new diagnostic approaches. In this pilot study, we analyzed transcriptome data from sixteen CLL patients, representing a breadth of clinical and prognostic parameters. Since most genes in the human genome can be expressed in more than one version (isoform), our analysis included characterization of the total abundance as well as the relative abundance of each isoform for each gene. Modern transcriptome measurements are generated through high throughput sequencing technology, generating tens of millions of short sequence reads (RNA-seq) that must be analyzed in conjunction with the genome through large-scale computational approaches. We describe these computational approaches and summarize the results of our expression analysis and the implications for distinction among classes of CLL.

Pooja Potharaju
North Carolina School of Science & Math, Durham, NC
Sponsors: Beverly Paigen, Ron Korstanje

Identifying novel genes that affect HDL Cholesterol Levels by verifying and testing point mutations of ENU Mutagenized HLB388/HLB446 mice
The main goal of genetic research is to achieve a complete understanding of the links between genotype and phenotype of a certain organism in order to better treat and understand it. The development of ENU mutagenesis has made it significantly easier to take a “phenotype-driven” approach. By targeting a specific phenotypic change in mice, scientists can identify candidate genes involved in that phenotype. This project uses the method of mutagenesis to identify mutations that cause increased HDL cholesterol levels in two mutant strains (HLB388 and HLB446). Through exome sequencing, the analysis of sequenced exome data, the verification of mutations in HLB388 and HLB446, and the testing of the mutation in a segregating population to ensure the co-segregation of the mutation with HDL, specific candidate genes were targeted and tested. Although a few flaws in primer design prevented any concrete conclusions from being drawn about which genes co-segregated with HDL cholesterol, the data were organized and filtered so that they can be examined in the future with a better understanding of what to expect and from which genes.

Gabe Vela
Rockdale Magnet School of Science and Technology, Conyers, GA
Sponsor: Gary Churchill

Identification of QTL and Candidate Genes Underlying Sleep
Sleep, a fundamental behavior that occupies about one-third of the human lifespan, has been linked to various physical and behavioral diseases such as obesity or metabolic syndrome. While there is much known about sleep-wake properties, little is known about the identity or nature of the individual genes behind sleep regulation. We have set out to identify individual genes that regulate sleep traits in order to further our understanding of the mechanisms underlying sleep and its effect on our bodies. We performed quantitative trait locus (QTL) mapping using the R statistical programming language in order to determine regions of the genome that influence sleep traits. Results from the analysis revealed a large number of sleep QTL on chromosome 17. We then set out to find genes that shared a QTL with sleep traits, and to correlate expression data from those genes with sleep traits using R in order to compile a list of genes that regulate the sleep QTL on chromosome 17. We identified Lta, a gene known to be a cause of abnormal sleep patterns, and Twsg1, which may be further researched as a gene involved in sleep mechanisms.
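
As an illustration of the QTL mapping step described above, the following is a minimal Python sketch of a single-marker scan. It is not the R analysis used in the project: the marker names, genotype codes, and trait values are hypothetical, and the LOD score is computed with the standard regression form, (n/2) · log10(RSS_null / RSS_marker), where the marker model fits one mean per genotype class.

```python
import math

def rss(values, fitted):
    """Residual sum of squares between observed and fitted values."""
    return sum((v - f) ** 2 for v, f in zip(values, fitted))

def lod_score(genotypes, phenotypes):
    """LOD score for one marker: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = rss(phenotypes, [grand_mean] * n)
    # Marker model: one mean per genotype class.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    rss_marker = rss(phenotypes, [means[g] for g in genotypes])
    if rss_marker == 0:
        return math.inf  # the marker explains the trait perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)

def scan(markers, phenotypes):
    """Return {marker name: LOD score} for every marker."""
    return {name: lod_score(g, phenotypes) for name, g in markers.items()}

# Hypothetical toy data: eight mice, one trait, two markers.
trait = [10.0, 11.0, 9.5, 14.0, 15.0, 13.5, 12.0, 12.5]
markers = {
    "marker_a": ["B", "B", "B", "C", "C", "C", "H", "H"],  # tracks the trait
    "marker_b": ["B", "C", "H", "B", "C", "H", "B", "C"],  # does not
}
lods = scan(markers, trait)
best = max(lods, key=lods.get)
print(best, {name: round(score, 2) for name, score in lods.items()})
```

In a real analysis the significance threshold for a LOD score would be set by permutation testing rather than read off a toy example like this one.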

David Wang
North Carolina School of Science and Mathematics, Durham, NC
Sponsor: Elissa Chesler

Analysis of Time Series Gene Expression Following Nerve Injury
Nerve injury and neuropathic pain affect millions of people each year. The current treatment options are limited and often ineffective. Understanding the transcriptional networks behind injury and recovery may lead to new treatment avenues. Mice serve as excellent models to study nerve injury in humans, and both have a similar transcriptional response to nerve injury. However, the time course of this response is poorly characterized. We analyzed data consisting of expression levels for many time points over a period of 10 days, allowing us to reconstruct the temporal network of transcriptional events after injury. C57BL/6J and C3H/HeJ mice were chosen as the subjects for their difference in responses after nerve injury. Data were gathered from two tissues, the dorsal horn and dorsal root ganglion, using microarrays. Using many R statistical software packages and the generalized logical network (GLN) program, data were filtered, dimensionally reduced and then used to create a causal network linking together gene expression events with recovery of sensory function. Results indicate that the gene clusters most proximal to recovery in mice have roles in neurological and immunological biological processes.

Lisa Zheng
North Carolina School of Science & Math, Durham, NC
Sponsor: Joel Graber

Analysis of RNA-seq Data in Fasting of C57BL/6J and C3H/HeJ strains of mice
The availability of food regulates basal metabolism and affects the progression of many diseases. The molecular mechanisms that mediate the response to dietary changes, and specifically fasting, are not well understood. In this study, we examined gene expression changes in two tissues (muscle and bone) drawn from two homozygous strains of laboratory mouse (C57BL/6J and C3H/HeJ) under three distinct stages of food availability. The current state-of-the-art in large-scale measurement of gene expression is RNA-seq, a procedure that produces millions of short sequence reads, with each representing a part of an RNA transcript that was generated from an actively expressed gene. Interpretation of these data requires large-scale computation in conjunction with genomic sequence and annotation. Comparison of samples among tissues, strains, and dietary conditions includes determination of changes in gene expression and significance, including both the total abundance of each gene’s transcripts as well as the relative use of the different transcript versions (isoforms) produced by each gene. We will describe the computational tools and approaches used for this analysis, as well as summarize the identified differences in the gene expression program among tissues, strains and stage of fasting.

 


Students - 2011


Robert Costa
Maine School of Science & Math, Limestone, ME
Sponsors: Joel Graber and Daniela Kamir

Visualizing Differential Expression of RNA Isoforms
The focus of this project has been the creation of a visualization program to flexibly and dynamically display evidence of transcript isoform variation. Isoforms are different transcripts generated by a common gene, and are the result of alternative processing, such as splicing, polyadenylation, or transcription initiation. A more complete understanding of isoforms is necessary for a more complete understanding of genetic disease. The visualization program was developed with the language Processing, an open-source object-oriented computer programming language designed for rapid prototyping and generation of graphical displays. The new program displays a gene’s arrangement on the chromosome, along with all known isoforms, and evidence of alternative expression of isoforms, based on probe-level analysis of microarray data. Dynamic capabilities of the program include interactive features such as zooming, panning, and control of strand direction. Use of this program will allow scientists to better study isoforms and their connection to genetic diseases, and will lead to the development of improved treatments for such diseases.

Carter Harwood
Groton School, Groton, MA
Sponsors: Elissa Chesler and Raymond Robledo

The Search for Depression-related Genes in Diversity Outbred Mice
Depression affects nine percent of American adults. However, the genetic causes of depression are relatively unknown and many patients do not respond to existing medications. To identify novel mechanisms of depression-related behavior through the identification of genetic sources of variation, we initiated a genetic analysis of depression-related behavior in the new Diversity Outbred population. Previous studies have utilized recombinant inbred mouse populations to find quantitative trait loci (QTL) for the tail suspension test, and automated scoring methods are being developed. Video observations of this test are analyzed for duration, frequency and latencies of mobility, immobility and tail climbing. Results were analyzed to evaluate automated scoring accuracy relative to manual scoring. An automated mobility threshold was determined that correlated with real-time manual scoring. Differences between inbred strains and sex differences were also analyzed. No sex differences on these measures were detected, but strain differences were extensive. Data obtained in the Diversity Outbred mice were subject to QTL mapping. Preliminary QTL mapping of duration of immobility shows potential loci on chromosomes 7 and X for the immobility measure, and an additional locus was detected for tail climbing.

Lillian Kang
North Carolina School of Science & Math, Durham, NC
Sponsor: Elissa Chesler

Time-series Analysis of Neuropathic Pain
Time series gene expression analysis was performed to evaluate the temporal response to nerve damage and resulting neuropathic pain. Dense time series expression data from two strains of mice were used to evaluate genetic differences in the response to injury: C57BL/6J, which exhibit higher pain sensitivity; and C3H/HeJ mice, which have lower pain sensitivity after nerve injury. These data were used to identify temporal-causal relationships among expressed genes and their ultimate effect on recovery of function and the development of neuropathic pain. R statistical software was used for microarray ANOVA and co-expression clustering, along with additional software for Generalized Logical Network construction and comparison. Results indicate that distinct sub-networks of genes are co-expressed across the post nerve injury process. Moreover, these gene clusters exhibit causal interactions amenable to validation studies.

Sangeetha Kumar
North Carolina School of Science & Math, Durham, NC
Sponsor: Greg Carter

A computational biology analysis of the genetic linkage between adrenal gland weight and anxiety behavioral assays
The adrenal glands, mainly the inner medullas, are responsible for the secretion of stress-related hormones. We used computational biology tools to find a genetic linkage between adrenal gland weight and behavioral anxiety assays, representations of the fight or flight response, in order to investigate whether indirectly related phenotypes may be influenced by the same genetic factors. This analysis was completed using singular value decomposition (SVD), a method to reduce complexity in data sets and find common patterns in expression, paired with scans of quantitative trait loci (QTL). To identify biological processes that contribute to this linkage, gene expression data analysis was done on the BXD inbred mouse cross for cerebellum mRNA. In the end, a set of genes with its corresponding gene ontology annotations was formed that represented common patterns found by SVD and related to the QTL of the phenotype traits of interest. We showed that in the male BXD mice, QTL on chromosome 15 have positive effects on adrenal weight, entries in closed quadrants of a zero maze, and entries in open quadrants of a zero maze. These traits are correlated with increased expression of genes involved in the regulation of the MAPKKK cascade and regulation of neurogenesis. These results suggest that these specific biological processes underlie the processes by which the QTL affect the adrenal weight and anxiety traits.

Rani Patel

North Carolina School of Science & Math, Durham, NC
Sponsor: Joel Graber

Visualization of Position-Dependent Regulatory Sequences
Proper gene expression and processing into mature mRNA transcripts requires control by molecular apparatus. This control is typically mediated by short sequences, referred to as regulatory elements or motifs. These motifs can be constrained by both sequence content and positioning relative to a functional site. Our research group recently developed a novel means of identifying motifs with constrained positioning based on non-negative matrix factorization (NMF). NMF generates large datasets that are not easily interpreted in their raw form. Therefore, this project focused on the creation of a visualization program which effectively displayed the NMF output in a graphical design. This visualization program was developed with Processing, an open source language tool that is focused on rapid development of graphical interfaces. The program displays the summary output of the net analysis of a group of related sequences and allows mapping of the output back to the individual sequences in order to identify the most likely functional elements for each gene. We followed a trial and error method to allow dynamic flexibility in how the data was displayed and ultimately create the ideal program. The development of this program will allow for improved understanding of regulatory sequences and more specifically the consequences of mutation and normal sequence variability in such sequences.

Lauren Reagin

Rockdale Magnet School for Science & Technology, Conyers, GA
Sponsors: Gary Churchill and Susan McClatchy

QTL Mapping Analysis of a Diabetic Mouse Backcross (NZO x NON)
An estimated 23.6 million people in the United States have diabetes (ADA, 2007). Previous studies have shown a genetic link to this devastating disease. The main purpose of this experiment is to identify quantitative trait loci (QTL) that have an effect on the phenotypes of diabetes and to make graphical models. The phenotypic effects of diabetes include leptin resistance, insulin resistance, and high glucose levels. Because obesity and diabetes are often correlated with each other, body weight and the fat pad traits (mesenteric, inguinal, peritoneal, and gonadal fat) are also used as phenotypes. The research hypothesis is that a correlation can be found between the phenotypes and the QTL in the data set, and that a graphical model can be made. The analysis of the data set uses R, a code-based statistical program, with the R/qtl package, which allows genetic data to be analyzed. Using the results from the analysis, two separate models have been crafted. One model covers the interactions among the fat pad traits, leptin, and the QTL. The second model covers body weight and the fat pad traits and their interactions with glucose and the QTL.

Kavya Sekar
University of North Carolina, Chapel Hill, NC
Sponsor: Greg Carter

Inferring Genetic Interactions in a Mouse Model for Diabesity
Type II diabetes is a complex metabolic syndrome mediated by genetic and environmental interactions. To identify relevant genetic loci and their interactions, we examined data from a backcross of the F1 offspring of the NZO (New Zealand Obese)/Hilt and NON/Lt (Nonobese Nondiabetic) strains to the NON parental strain, resulting in a continuous distribution of diabetes- and obesity-related phenotypes in the BC1 population. To infer the validity, magnitude and direction of genetic interactions we used the genetic influences decomposition method. Genetic influences decomposition was established in yeast cultures to infer interactions between QTL of interest using linear decomposition of gene expression data. Our preliminary results in the BC1 mouse population identify main effect QTL for multiple diabetic traits including body weight, adiposity and plasma glucose levels on chromosomes 1, 12, 13 and 18, which interact with QTL on chromosomes 4, 5, 6, 7, 15 and 17. Through analyzing subnetworks of interacting loci we can build a model of how multiple genes affect diabetes-related traits.

Casey Thornton
Maine School of Science & Math, Limestone, ME
Sponsors: Gary Churchill and Susan McClatchy

Quantitative Trait Loci analysis of insulin and gene expression in pancreatic β-cells in a BTBRxB6BL/6J intercross
Previous research indicates that chronic activation of endoplasmic reticulum (ER) stress leads to insulin resistance and diabetes in obesity. My objective is to continue this research by developing graphical models for the type II diabetes related ER stress response pathway. These models relate specific genes in the pancreas to the clinical phenotypes insulin, leptin and LIF (Leukemia Inhibitory Factor). QTL analysis was conducted on 40,574 transcripts, though my research narrowed to 13 candidate genes of interest that were derived from ER stress literature. To do this I used R, an open-source, command-line statistical software environment. In R, QTL, or Quantitative Trait Loci, analysis determines the extent to which a region of the genome influences a quantitative phenotype. In addition, chromosomes 2, 5, 6, 16, and 19 were marked as QTL of interest for the clinical phenotypes insulin, leptin and LIF. BIC (Bayesian Information Criterion) analysis was then used in the R/qtl package as a predictor of likely causal models that relate the QTL, genes, and clinical phenotypes. From this mapping I was able to identify likely candidate relationships that can be linked to the direct cause of insulin resistance in type II diabetes.

Kali Xu
North Carolina School of Science & Math, Durham, NC
Sponsors: Gary A. Churchill and Susan McClatchy

Quantitative trait loci analysis of the relationship between atherosclerotic indicators and gene expression in an MRL/MpJxSM/J F2 intercross
Atherosclerosis is a chronic disease deriving from a combination of inflammatory and lipid metabolism pathways. It is a precursor of cardiovascular disease, the number one cause of death and disability in North America. This study evaluated the genetic factors underlying variation in key atherosclerotic risk factors such as total cholesterol. Using the statistical software R, I conducted a quantitative trait loci (QTL) analysis of liver genotype, phenotype, and transcript data from an MRL/MpJxSM/J F2 mouse intercross. This analysis yielded cholesterol QTL peaks on chromosomes 1, 4, 7, and 18, and after examining the QTL confidence interval on chromosome 18 for cis expression QTL, I found five transcripts to be significant in regulating this QTL. I created linear regression models for these gene, phenotype, and QTL relationships based on the results of Bayesian information criterion analysis and then built a graphical model to represent these relationships. I confirmed one previously known candidate gene affecting cholesterol level (Apoa2) and identified a new candidate gene (Isoc1) whose role in cholesterol regulation may be further explored in future studies. I also found a relationship between cholesterol and bone mineral density that, if validated, may bring up interesting new implications for cholesterol lowering treatments.

 

Students - 2010


Michael Jones

University of North Carolina at Chapel Hill
Sponsor: Matt Hibbs

The accumulation of a wide range of Quantitative Trait Loci (QTL) and Genome Wide Association Study (GWAS) data for a variety of phenotypes makes the analysis and interpretation of these data difficult. This is particularly true when attempting to graphically represent phenotype-genotype association data. In such cases, the importance of a system that can automatically and selectively display specified information becomes evident. We have developed a tool to address the challenges of visualizing multiple QTL and GWAS datasets. The result is QGV, a QTL study and GWAS results viewer, which allows users to easily interact with QTL and GWAS data. Using the click of a button, users of QGV can rapidly compare the LOD score curves of multiple QTL studies, alongside GWAS results in order to identify common and unique patterns, and to elucidate genetic phenomena that might not otherwise be discovered.

Lukas Jordan

University of Maine at Orono, ME
Sponsor: Joel Graber

The activity of any cell is to a large extent determined by the specific set of genes that are activated in that cell. One means of characterizing the molecular network in a cell is the characterization of the genes that are actively producing transcripts. Modern techniques allow for the simultaneous determination of the relative transcript abundance of tens of thousands of genes. Expression of a single gene can vary either in the total number of transcripts created or in the selection between different versions, or isoforms, of that gene. Existing analysis of large transcript experiments has focused primarily on the relative activity of each gene, counting the total number of transcripts, but largely neglecting the role of relative usage of the variant isoforms. Standard visualization tools also do not highlight this information, hindering analysis and interpretation of this type of data. Improved and automated isoform visualization will facilitate greater understanding of mRNA processing and the biological roles of variant isoforms and how they can be disrupted in disease processes. We have generated a C++ class that is designed to make visual representations of isoforms that are related by exon structure, and that can be further extended to display experimental transcript measurements.

Olutoyosi Oyelowo and Renée Symonds

North Carolina School of Science & Math, Durham, NC; Bowdoin College, Brunswick, ME
Sponsors: Ricardo Verdugo and Gary Churchill

Natural variation is an important component underlying multiple traits of medical and commercial relevance. The objective of this study was to use genotype and microarray data to predict unobserved phenotypes. Methodology/Principal Findings: Linear models representing trait networks were built in soybeans (Glycine max) from the training dataset, which consists of genotype data at 941 markers and microarray data for 28,395 genes. The approach used natural genetic variation to infer statistical models of complex networks for genetic and genomic data. Numerous models that incorporate genotype data, microarray data, and a combination of both data sets were estimated to predict complex disease phenotypes related to a major soybean pathogen (Phytophthora sojae). Linear models were fitted by two methods: least squares regression and a regularized multivariate regression approach called Elastic Net. Their predictive ability was assessed by 10-fold cross-validation. For genotype data, QTL analysis followed by step-wise variable selection outperformed Elastic Net in terms of the prediction accuracy of the two phenotypes (Err=0.823, 0.670 vs. Err=1.00, 0.982). For microarray data, Elastic Net produced the most predictive model (0.332 and 0.327 vs. 1.210 and 0.980). Combining genotype and microarray data showed that Elastic Net produces the best predictive model for Phenotype 1, while linear modeling produced the best model for Phenotype 2 (Err=0.926, 0.982 vs. Err=1.79, 0.769). Results were consistent for two disease resistance phenotypes analyzed separately. Conclusions: Genome-wide gene expression data best captures information about genetic effects on disease resistance in soybean. The model was not improved by the addition of QTL genotypes in the model.
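
The cross-validation procedure used above to compare models can be sketched in a few lines of Python. This is an illustrative toy, not the study's code: it compares a mean-only baseline with a one-predictor least squares fit rather than Elastic Net and stepwise selection, and the data, function names, and fold assignment (sample index modulo k) are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def fit_mean(xs, ys):
    """Baseline model that ignores the predictor and predicts the mean."""
    return 0.0, sum(ys) / len(ys)

def cv_error(xs, ys, fit, k=10):
    """Mean squared prediction error estimated by k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Hold out every sample whose index falls in this fold.
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        slope, intercept = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out)
    return total / n

# Hypothetical data: a linear signal plus small deterministic noise.
noise = [0.3, -0.2, 0.1, -0.4, 0.2] * 4
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

err_line = cv_error(xs, ys, fit_line)
err_mean = cv_error(xs, ys, fit_mean)
print(f"least squares: {err_line:.3f}  mean-only: {err_mean:.3f}")
```

Each model is always scored on samples it was not fitted to, which is why the abstract can compare models of very different complexity on the same error scale.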

 

Students - 2009


Alan Bohn

North Carolina School of Science & Math, Durham, NC
Sponsors: Rachael Hageman and Gary Churchill

NSAIDs such as ibuprofen are commonly used to control inflammation by inhibiting COX-2. However, the side effects of COX-2 inhibition are not fully understood. Transcriptional effects of COX-2 inhibition were explored by comparing the liver and adipose tissues of genetically altered COX-1 > COX-2 exchange mice (B6.129(FVB)-Ptgs2tm2.1(Ptgs1)Fun/J) to C57BL/6NJ controls. ANOVA models were used to identify differentially expressed genes between strains for each tissue. Significantly enriched pathways were determined using Gene Set Enrichment information. GenMAPP was used to visualize enriched pathways and patterns of differentially expressed genes. Results show no significant difference in the Ptgs1 or the Ptgs2 locus between the COX-1 > COX-2 mice and the controls, nor are there downstream consequences in prostaglandin synthesis. Results indicate up-regulation in the cholesterol biosynthesis pathway in both liver and adipose tissues. Liver tissue has up-regulated bile acid metabolism and down-regulated gluconeogenesis. In the Reverse Cholesterol Transport pathway, Scarb (SR-B1) and Cel are significantly up-regulated. This suggests an increase in selective uptake of cholesterol esters from HDL. In adipose tissue, there is up-regulation in fatty acid metabolism, ketone degradation, and glycolysis. Therefore, several pathways that produce Acetyl-Coenzyme A are up-regulated, yet cholesterol biosynthesis is the only enriched pathway that uses Acetyl-Coenzyme A as a substrate.

Sarah Benjamin

Maine School of Science & Math, Limestone, ME
Sponsors: Peter Vedell and Gary Churchill

Type II diabetes is a disease that affects over 17.9 million people in the United States alone. The goal of this study is to develop a gene expression model that shows the relationships between specific genes in the liver, and how they affect the phenotypes insulin and glucose. The first part of my project involved analyzing all the transcript data for liver tissue that I was given. I used R, a statistical software environment, to identify quantitative trait loci, or regions of the genome that have a strong relationship with the clinical trait in question, and correlated the transcripts with insulin and glucose. This produced a gene list that I was then able to use to produce a graphical model showing how these genes interact with each other. I identified chromosomes 1, 17, and 6 as chromosomes of interest using QTL analysis of insulin and glucose. I also identified approximately forty genes of interest. The next step in this project is to map these genes to the insulin signaling pathway to determine their effect on insulin resistance, and identify strong candidates that can be linked to the direct cause of insulin resistance and type II diabetes.

Minna Chen

Wayzata High School, Plymouth, MN
Sponsors: Joel Graber and Nicole Leahy

We demonstrated that there are significant differences between genes and biological processes associated with conserved and non-conserved gene deserts. Using a novel means of determining gene deserts based on local protein-coding gene density, we tested the hypothesis that genes and biological processes associated with conserved gene deserts are different from those associated with non-conserved gene deserts. We separated our deserts into conserved and non-conserved groups based upon the level of overlap between our deserts and syntenic blocks, which are regions of the genome that contain the same genes and gene order in evolutionarily diverged organisms. We investigated two different classes of conservation: deep conservation, represented by conservation between mouse and zebrafish genomes, and mammalian conservation, represented by conservation between mouse and human genomes. We used the gene ontology to analyze and compare genes located in both types of deserts to identify the overrepresented biological processes. Initial evaluations indicated an over-representation of genes involved in the regulation of transcription and gene regulation in deeply conserved deserts. Analysis of mammalian conservation revealed that non-conserved deserts have an over-representation of genes involved in cell-cell communication and nervous system development.

Justin Huang

North Carolina School of Science & Math, Durham, NC
Sponsors: Ricardo Verdugo and Gary Churchill

Many cases of obesity are caused by a sedentary lifestyle, but research has also shown that there is a significant genetic factor in the onset of obesity. The goal of this study was to use a systems biology approach to identify candidate genes causing obesity in mice that can be tested as drug targets for the treatment of obesity. Quantitative trait loci (QTL) analysis was complemented with genome-wide gene expression profiling to discover networks of gene co-expression that are associated with the Fat Percentage (FP) phenotype in an F2 cross between the C57BL/6J and C3H/HeJ mouse strains. An over-representation test was performed to identify biochemical pathways that are enriched for genes that correlate highly with FP and that share one or more QTL with this phenotype. Co-expression networks were then enriched with positional candidate genes that were members of the most significant pathways. Networks were also enriched with genes whose LOD score profiles are highly similar to that of FP. As a result, I propose Crla2 and Icos as the best candidates for the FP QTL on chromosome 1, and the topology of the co-expression networks suggests Rab27b and Sult1e1 as the best candidates for drug manipulation for the treatment of obesity.

Sheetal Rajagopal

Univ. of Pennsylvania, Philadelphia, PA
Sponsor: Matt Hibbs

Given the recent production of massive amounts of data, we can utilize computational methods to accurately predict genes associated with phenotypes or diseases. Machine learning methods can make novel inferences of relationships between genes and phenotypes by training on existing data. We trained a Bayesian network (BN) based on a compendium of microarray data and a “gold standard” constructed from all experimentally proven gene-phenotype associations in mice. We used this BN to infer the probabilities that gene pairs are phenotypically related. We then constructed a phenotypic relationship network (PRN), in which the probability of phenotypic relationship is the weight that connects each gene to every other gene. We mined our PRN to make novel predictions of genes associated with the phenotype “abnormal DNA repair.” We also created a second PRN focused on the key phenotypes of ovarian cancer. The BN for this second network was trained on an ovarian cancer-specific gold standard, and the resulting PRN was mined to determine which genes are most associated with ovarian cancer’s key phenotypes. Initial evaluations indicate that our predictions are promising candidates for further experimental research and could potentially be used to research the causes and treatment of ovarian cancer and abnormal DNA repair.

 

Students - 2008


Elizabeth Deerhanke

Marion Elizabeth Deerhanke

The North Carolina School of Science and Mathematics, Durham, NC
Sponsors: Gary Churchill and Randy Von Smith

Elizabeth investigated hypertension as a complex phenotype and searched for the genetic basis of this widespread disease using quantitative trait loci analysis. In her study she conducted a QTL analysis of systolic blood pressure in F2 males from eight intercrosses comprising fourteen inbred mouse strains. The use of multiple crosses allows for greater precision in narrowing QTL regions through identification of concordant peaks. The search for individual and interacting pairs of loci affecting systolic blood pressure indicated fourteen significant QTL, three epistatic interactions, and linked QTL on Chr 3. Through multiple regression analysis, she developed multiple QTL models for each of the eight intercrosses accounting for as much as 35% of phenotypic variance. These novel loci affecting blood pressure contributed to our understanding of the complex genetic basis of hypertension.

Alex Ellison

Connecticut College
Sponsor: Joel Graber

Alex Ellison identified genes within the deserts mapped by Cheryl Zapata (Summer Student in 2007) and tested them for over-representation of biological processes and molecular functions within the deserts. The research identified genes involved in cell-cell adhesion and neurogenesis occurring more frequently than expected under a random model.
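
A common way to ask whether a category occurs "more frequently than expected under a random model" is a hypergeometric upper-tail test. The sketch below shows that test in general form. It is not necessarily the test used in this project, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    total = comb(population, sample)
    max_hits = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favorable / total


# Hypothetical numbers: 20 genes in the genome, 5 annotated with a term,
# 4 genes found in deserts, 3 of which carry the term.
p_value = hypergeom_upper_tail(population=20, annotated=5, sample=4, hits=3)
```

A small p-value suggests the term appears among desert genes more often than random sampling would explain. In a real analysis the result would also need correction for the many terms tested.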

Ryan Keating

The Maine School of Science and Mathematics, Limestone, ME
Sponsors: Gary Churchill and Randy Von Smith

Ryan investigated chronic kidney disease (CKD). CKD, a complex trait, is affected by developmental, environmental, and genetic factors. In his study, eight intercrosses from fourteen inbred mouse strains were analyzed for genetic factors that influence CKD. He performed quantitative trait loci (QTL) analysis of kidney weight with a covariate of body weight to identify regions of the mouse genome that affect CKD. From these eight crosses, he identified twenty significant (P < 0.05) QTL and eight interacting QTL pairs, which accounted for 51.7 to 73.0 percent of the variance in kidney weight. Identification of candidate genes from the significant QTL could then be used to locate orthologous regions in humans.

 

Students - 2007

Arielle Torres

Brandeis University
Sponsor: Joel Graber

Arielle created an interface in Perl that allowed her to input gene lists and interaction thresholds. It then searched a pre-existing data set detailing patterns of correlated inheritance (represented by linkage disequilibrium of pairs of single nucleotide polymorphisms, or SNPs) and extracted interacting pairs. Such interaction webs can later be integrated with gene expression profiles and other data sets, enabling a broad network and profile analysis.

David Witmer

The Maine School of Science and Mathematics, Limestone, ME
Sponsor: Gary Churchill

David investigated gene expression patterns in multi-factorial DNA microarray data. The core of his study was the utilization of ANOVA-based statistical tests to test general and focused research hypotheses through overall F-tests and specific contrasts. Groups of co-expressed genes were resolved through hierarchical and k-means clustering analysis. Important biological processes associated with key factors were determined by statistical tests for association with gene ontology (GO) terms. Finally, he investigated relationships between gene expression levels and phenotypic response patterns.
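
The overall F-test mentioned above can be illustrated with a one-way ANOVA F statistic for a single gene. This is a deliberately simplified sketch: the study used multi-factorial designs and specific contrasts, which are not reproduced here, and the function name and expression values are hypothetical.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, means) for x in group
    )
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression values for one gene under three conditions.
f_stat = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F statistic indicates that the differences between condition means are large relative to the variation within each condition. Running the test for every gene on an array would require a multiple-testing correction.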

Cheryl Zapata

North Carolina State University
Sponsor: Joel Graber

Cheryl Zapata constructed maps of gene deserts under the definition of gene sparsity. Five definitions of “gene” were used: complete transcript, transcription start sites, exons, coding exons, and exons plus introns. She found that the maps produced by the different definitions resembled either the coding-exon map or the exons-plus-introns map. She then demonstrated that deserts are not significantly more or less conserved than the rest of the genome, although some individual chromosomes had greater than expected conservation.

 

Students - 2006

Arielle Torres

Brandeis University
Sponsor: Joel Graber

Arielle developed a web-based browser for viewing networks. This tool was developed in the context of the linkage disequilibrium data generated in the Center, but will be generalized to be independent of the underlying data type. This web-based tool allows a user to submit one or more “seed” regions or genes and returns tabular and graphical representations of the sub-network surrounding the seeds. Work is being performed to cast this analysis into a form that can be viewed with existing interfaces such as Cytoscape and N-browse.

Luis Zapata

North Carolina School of Science and Math
Sponsor: Gary Churchill

Luis worked on analyzing, quantifying, and assessing gene expression in 12 inbred mouse strains of each sex raised on high fat or chow diets using GeneChip arrays. Luis assessed probe level quality, normalized probe intensity, adjusted background, and performed graphical diagnostics on microarray images. He then assessed the influence and interactions of strain, sex, and diet on the overall intensity value of each gene on the array. Finally, he generated lists of genes that will be studied for functional associations using available genomic annotation software tools.


Summer Students White Water Rafting

The Summer Student Program is designed to help students understand the nature of research science. The emphasis of this program is on methods of discovery and communication of knowledge, not the mastery of established facts.

Stipend

Students are awarded a stipend while experiencing real science and research.

Independent Research

Under the guidance of a mentor, students develop an independent research project, implement their plan, analyze the data, and report their results. At the end of the summer, students present their findings to researchers, peers, and parents.

Dynamic Students

Each year, the program consists of about thirty students from around the United States, from both high school and undergraduate institutions. Their varied interests and backgrounds create a lively, well-rounded atmosphere at the lab.

Stimulating Environment

Nestled on the outskirts of Acadia National Park, The Jackson Lab is surrounded by possibilities for outdoor adventure. Between hiking, swimming, biking, and bird watching, lab employees and locals are continuously inspired by the pristine landscape.