Summer Student Program
Teaching Students the Nature of Research
The nationally renowned Jackson Laboratory Summer Student Program provides college and high school students with an opportunity to conduct independent research under the guidance of Jackson faculty. More than 2,000 students, including three Nobel laureates, have participated in the program. Students in this program have performed research in the Center since 2006.
Students work closely with a faculty mentor and begin the summer with a written proposal for their planned work. They participate in active research groups and conduct their work in a highly interactive, team-oriented atmosphere under the guidance of their mentor. At the end of the summer, students present their work at the program’s one-day symposium and prepare a written research report.
Students - 2011
Robert Costa
Maine School of Science & Math, Limestone
Sponsors: Joel Graber and Daniela Kamir
Visualizing Differential Expression of RNA Isoforms
The focus of this project has been the creation of a visualization program to flexibly and dynamically display evidence of transcript isoform variation. Isoforms are different transcripts generated by a common gene; they result from alternative processing such as splicing, polyadenylation, or transcription initiation. A better understanding of isoforms is needed for a more complete understanding of genetic disease. The visualization program was developed in Processing, an open-source, object-oriented programming language designed for rapid prototyping and generation of graphical displays. The new program displays a gene’s arrangement on the chromosome, all known isoforms of the gene, and evidence of alternative isoform expression based on probe-level analysis of microarray data. Its interactive features include zooming, panning, and control of strand direction. This program will allow scientists to better study isoforms and their connection to genetic diseases, and may ultimately contribute to improved treatments for such diseases.
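Zooming, panning, and strand control in a genome display all come down to mapping genomic coordinates to screen pixels. The Python sketch below shows one minimal way to maintain such a view window; the class and its methods are hypothetical illustrations, not code from the Processing program described above.

```python
from dataclasses import dataclass


@dataclass
class GenomeView:
    """Hypothetical view window mapping a genomic interval onto `width` pixels."""
    start: float           # leftmost genomic coordinate in view (bp)
    end: float             # rightmost genomic coordinate in view (bp)
    width: int             # drawing width in pixels
    reverse: bool = False  # True mirrors the axis so a minus-strand gene reads 5' to 3' left to right

    def to_pixel(self, pos: float) -> float:
        """Map a genomic coordinate to an x position in pixels."""
        frac = (pos - self.start) / (self.end - self.start)
        if self.reverse:
            frac = 1.0 - frac
        return frac * self.width

    def zoom(self, factor: float, center: float) -> None:
        """Zoom in (factor > 1) or out (factor < 1), keeping `center` in the middle."""
        half_span = (self.end - self.start) / (2 * factor)
        self.start, self.end = center - half_span, center + half_span

    def pan(self, shift_bp: float) -> None:
        """Slide the window by `shift_bp` base pairs (negative values move toward lower coordinates)."""
        self.start += shift_bp
        self.end += shift_bp

    def flip_strand(self) -> None:
        self.reverse = not self.reverse


view = GenomeView(start=1000, end=2000, width=500)
view.zoom(2, center=1500)   # window becomes 1250-1750
view.pan(100)               # window becomes 1350-1850
```

Keeping the window as a (start, end) pair means every drawing routine only needs `to_pixel`, so exons, probes, and isoform tracks stay aligned under any zoom or pan.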
Carter Harwood
Groton School, Groton, MA
Sponsors: Elissa Chesler and Raymond Robledo
The Search for Depression-related Genes in Diversity Outbred Mice
Depression affects nine percent of American adults. However, the genetic causes of depression are largely unknown, and many patients do not respond to existing medications. To identify novel mechanisms of depression-related behavior through the identification of genetic sources of variation, we initiated a genetic analysis of depression-related behavior in the new Diversity Outbred mouse population. Previous studies have used recombinant inbred mouse populations to find quantitative trait loci (QTL) for the tail suspension test, and automated scoring methods for this test are being developed. Video recordings of the test were analyzed for the duration, frequency, and latency of mobility, immobility, and tail climbing. The results were used to evaluate the accuracy of automated scoring relative to manual scoring, and an automated mobility threshold was determined that correlated with real-time manual scoring. Differences between inbred strains and between sexes were also analyzed: no sex differences were detected on these measures, but strain differences were extensive. Data obtained from the Diversity Outbred mice were subjected to QTL mapping. Preliminary mapping of immobility duration shows potential loci on chromosomes 7 and X, and an additional locus was detected for tail climbing.
Lillian Kang
North Carolina School of Science & Math, Durham, NC
Sponsor: Elissa Chesler
Time-series Analysis of Neuropathic Pain
Time-series gene expression analysis was performed to evaluate the temporal response to nerve damage and the resulting neuropathic pain. Dense time-series expression data from two mouse strains were used to evaluate genetic differences in the response to injury: C57BL/6J mice, which exhibit higher pain sensitivity, and C3H/HeJ mice, which exhibit lower pain sensitivity after nerve injury. These data were used to identify temporal-causal relationships among expressed genes and their ultimate effect on recovery of function and the development of neuropathic pain. The R statistical software was used for microarray ANOVA and co-expression clustering, along with additional software for Generalized Logical Network construction and comparison. Results indicate that distinct sub-networks of genes are co-expressed over the course of the response to nerve injury. Moreover, these gene clusters exhibit causal interactions amenable to validation studies.
Sangeetha Kumar
North Carolina School of Science & Math, Durham, NC
Sponsor: Greg Carter
A computational biology analysis of the genetic linkage between adrenal gland weight and anxiety behavioral assays
The adrenal glands, mainly their inner medullae, are responsible for the secretion of stress-related hormones. We used computational biology tools to search for a genetic linkage between adrenal gland weight and behavioral anxiety assays, which represent the fight-or-flight response, in order to investigate whether indirectly related phenotypes may be influenced by the same genetic factors. The analysis combined singular value decomposition (SVD), a method that reduces the complexity of data sets and finds common patterns in expression, with scans for quantitative trait loci (QTL). To identify biological processes that contribute to this linkage, we analyzed cerebellum mRNA expression data from the BXD recombinant inbred mouse panel. The result was a set of genes, with corresponding Gene Ontology annotations, that represents common patterns found by SVD and relates to the QTL for the phenotypes of interest. We showed that in male BXD mice, QTL on chromosome 15 have positive effects on adrenal weight, entries into closed quadrants of a zero maze, and entries into open quadrants of a zero maze. These traits are correlated with increased expression of genes involved in regulation of the MAPKKK cascade and regulation of neurogenesis. These results suggest that these biological processes are part of the mechanism by which the QTL affect adrenal weight and anxiety traits.
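SVD summarizes a genes × samples expression matrix as a small number of orthogonal patterns across samples. The NumPy sketch below, run on a made-up matrix with one shared trend, shows how the leading pattern (sometimes called an eigengene), each gene's loading on it, and the fraction of variance each pattern explains can be computed. It is a generic illustration under these assumptions, not the pipeline used in this project.

```python
import numpy as np


def svd_patterns(expr: np.ndarray):
    """Decompose a genes x samples matrix into orthogonal expression patterns.

    Rows are centered so each gene has mean zero across samples; rows of Vt
    are then patterns across samples and columns of U give gene loadings.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)   # fraction of total variance per pattern
    return U, s, Vt, explained


rng = np.random.default_rng(0)
pattern = np.linspace(-1, 1, 12)                  # simulated shared trend over 12 samples
expr = np.outer(rng.normal(size=50), pattern) + 0.1 * rng.normal(size=(50, 12))
U, s, Vt, explained = svd_patterns(expr)
# Genes with the largest |U[:, 0]| follow the leading pattern Vt[0] most closely.
top_genes = np.argsort(-np.abs(U[:, 0]))[:5]
```

The sign of each singular vector is arbitrary, so comparisons with a trait (for example, correlating `Vt[0]` with adrenal weight across strains) should use absolute values or fix the sign first.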
Rani Patel
North Carolina School of Science & Math, Durham, NC
Sponsor: Joel Graber
Visualization of Position-Dependent Regulatory Sequences
Proper gene expression and processing into mature mRNA transcripts require control by the cell’s molecular machinery. This control is typically mediated by short sequences referred to as regulatory elements or motifs. These motifs can be constrained both in sequence content and in position relative to a functional site. Our research group recently developed a novel means of identifying motifs with constrained positioning based on non-negative matrix factorization (NMF). NMF generates large datasets that are not easily interpreted in their raw form. This project therefore focused on creating a visualization program that displays the NMF output graphically. The program was developed with Processing, an open-source language focused on rapid development of graphical interfaces. It displays the summary output of the overall analysis of a group of related sequences and allows the output to be mapped back to the individual sequences in order to identify the most likely functional elements for each gene. We used an iterative, trial-and-error process to allow dynamic flexibility in how the data are displayed and to refine the program. This program should allow improved understanding of regulatory sequences, and more specifically of the consequences of mutation and normal sequence variability in such sequences.
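For readers unfamiliar with NMF, the sketch below shows the standard multiplicative-update algorithm of Lee and Seung, which factors a nonnegative matrix V into nonnegative factors W and H. The example matrix (hypothetically, sequences × positions) is simulated; this is a generic NumPy illustration, not the research group's position-specific motif method.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-10):
    """Factor a nonnegative matrix V (m x n) as W (m x rank) @ H (rank x n).

    Lee and Seung's multiplicative updates keep W and H nonnegative and do not
    increase the squared Frobenius error ||V - WH||^2 (eps only guards
    against division by zero).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical example: 30 sequences x 40 positions built from 2 nonnegative parts.
rng = np.random.default_rng(1)
V = rng.random((30, 2)) @ rng.random((2, 40))
W0, H0 = nmf(V, rank=2, n_iter=0)   # the random starting point
W, H = nmf(V, rank=2)               # same seed, after 500 updates
```

Each row of `H` is a nonnegative profile across positions and each row of `W` says how strongly a sequence uses each profile; matrices like these are the raw output that a visualization would need to summarize and map back to individual sequences.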
Lauren Reagin
Rockdale Magnet School for Science & Technology, Conyers, GA
Sponsors: Gary Churchill and Susan McClatchy
QTL Mapping Analysis of a Diabetic Mouse Backcross (NZO x NON)
In the United States, 23.6 million people have diabetes (ADA, 2007). Previous studies have shown a genetic link to this devastating disease. The main purpose of this experiment is to identify quantitative trait loci (QTL) that affect diabetes phenotypes and to build graphical models of these relationships. The phenotypic effects of diabetes include leptin resistance, insulin resistance, and high glucose levels. Because obesity and diabetes are often correlated, body weight and fat pad traits (mesenteric, inguinal, peritoneal, and gonadal fat) were also used as phenotypes. The research hypothesis was that correlations between the phenotypes and QTL can be found in the data set and represented in a graphical model. The data were analyzed with the statistical programming environment R, using the R/qtl package for genetic mapping. Based on the results of this analysis, two separate models were built. The first describes the interactions among the fat pad traits, leptin, and the QTL. The second describes body weight and the fat pad traits and their interactions with glucose and the QTL.
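QTL scans report LOD scores. For a single marker with normally distributed residuals, the LOD compares a model in which each genotype class has its own mean with a model that has one overall mean: LOD = (n/2) log10(RSS0/RSS1). The NumPy sketch below computes this marker-regression LOD across markers on simulated backcross-free F2-style data; it is a simplified illustration, not R/qtl's interval-mapping implementation.

```python
import numpy as np


def marker_lod(genotypes: np.ndarray, phenotype: np.ndarray) -> float:
    """LOD score for one marker, fitting a separate mean per genotype class.

    LOD = (n / 2) * log10(RSS0 / RSS1), where RSS0 is the residual sum of
    squares around the overall mean and RSS1 around the genotype-class means.
    """
    n = len(phenotype)
    rss0 = np.sum((phenotype - phenotype.mean()) ** 2)
    rss1 = 0.0
    for g in np.unique(genotypes):
        group = phenotype[genotypes == g]
        rss1 += np.sum((group - group.mean()) ** 2)
    return (n / 2) * np.log10(rss0 / rss1)


def scan(genotype_matrix: np.ndarray, phenotype: np.ndarray) -> np.ndarray:
    """LOD at every marker (one column of genotype_matrix per marker)."""
    return np.array([marker_lod(genotype_matrix[:, j], phenotype)
                     for j in range(genotype_matrix.shape[1])])


rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 5))       # 200 simulated mice x 5 markers, coded 0/1/2
y = 1.0 * G[:, 2] + rng.normal(size=200)    # simulated QTL at marker index 2
lods = scan(G, y)
```

A real scan would also set a significance threshold, for example by permuting the phenotype and recording the maximum LOD over many permutations.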
Kavya Sekar
University of North Carolina, Chapel Hill, NC
Sponsor: Greg Carter
Inferring Genetic Interactions in a Mouse Model for Diabesity
Type II diabetes is a complex metabolic syndrome mediated by genetic and environmental interactions. To identify relevant genetic loci and their interactions, we examined data from a backcross of F1 offspring of the NZO/HlLt (New Zealand Obese) and NON/Lt (Nonobese Nondiabetic) strains to the NON parental strain, which produced a continuous distribution of diabetes- and obesity-related phenotypes in the BC1 population. To infer the validity, magnitude, and direction of genetic interactions, we used the genetic influences decomposition method. This method was established in yeast to infer interactions between QTL of interest using linear decomposition of gene expression data. Our preliminary results in the BC1 mouse population identify main-effect QTL for multiple diabetic traits, including body weight, adiposity, and plasma glucose levels, on chromosomes 1, 12, 13, and 18, which interact with QTL on chromosomes 4, 5, 6, 7, 15, and 17. By analyzing subnetworks of interacting loci, we can build a model of how multiple genes affect diabetes-related traits.
Casey Thornton
Maine School of Science & Math, Limestone
Sponsors: Gary Churchill and Susan McClatchy
Quantitative Trait Loci analysis of insulin and gene expression in pancreatic β-cells in a BTBR x C57BL/6J intercross
Previous research indicates that chronic activation of endoplasmic reticulum (ER) stress leads to insulin resistance and diabetes in obesity. My objective was to extend this research by developing graphical models for the ER stress response pathway associated with type II diabetes. These models relate specific genes expressed in the pancreas to the clinical phenotypes insulin, leptin, and LIF (leukemia inhibitory factor). QTL analysis was conducted on 40,574 transcripts, and my research then narrowed to 13 candidate genes of interest derived from the ER stress literature. For this work I used R, an open-source statistical software environment run from the command line. QTL (quantitative trait loci) analysis determines the extent to which a region of the genome influences a quantitative phenotype. Loci on chromosomes 2, 5, 6, 16, and 19 were identified as QTL of interest for the clinical phenotypes insulin, leptin, and LIF. BIC (Bayesian Information Criterion) analysis with the R/qtl package was then used to predict likely causal models relating the QTL, genes, and clinical phenotypes. From this mapping I identified likely candidate relationships that may be linked to the direct causes of insulin resistance in type II diabetes.
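One common way to use BIC for this kind of causal inference is to fit each candidate model as a set of linear regressions and prefer the model with the lowest total BIC. The sketch below compares three hypothetical models relating a QTL genotype Q, a transcript G, and a clinical phenotype P: causal (Q → G → P), reactive (Q → P → G), and independent (G ← Q → P). It assumes Gaussian residuals and uses simulated data; it is a generic NumPy illustration, not the R/qtl-based procedure used in the project.

```python
import numpy as np


def regression_bic(y: np.ndarray, X: np.ndarray) -> float:
    """BIC of an ordinary least squares fit of y on X plus an intercept.

    Uses the Gaussian form n*log(RSS/n) + k*log(n), dropping constants that
    are identical for every model fitted to the same n observations.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    k = design.shape[1] + 1   # regression coefficients plus the residual variance
    return n * np.log(rss / n) + k * np.log(n)


def compare_models(q, g, p):
    """Total BIC for each candidate model; lower is better."""
    return {
        "causal Q->G->P": regression_bic(g, q) + regression_bic(p, g),
        "reactive Q->P->G": regression_bic(p, q) + regression_bic(g, p),
        "independent G<-Q->P": regression_bic(g, q) + regression_bic(p, q),
    }


# Simulated data in which the QTL acts on the phenotype through the transcript.
rng = np.random.default_rng(0)
q = rng.integers(0, 3, size=500).astype(float)
g = q + rng.normal(size=500)
p = g + rng.normal(size=500)
bics = compare_models(q, g, p)
best = min(bics, key=bics.get)
```

Here all three models have the same number of parameters, so the comparison rests on fit alone; the log(n) penalty matters when candidate models differ in size.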
Kali Xu
North Carolina School of Science & Math, Durham, NC
Sponsors: Gary A. Churchill and Susan McClatchy
Quantitative trait loci analysis of the relationship between atherosclerotic indicators and gene expression in an MRL/MpJxSM/J F2 intercross
Atherosclerosis is a chronic disease arising from a combination of inflammatory and lipid metabolism pathways. It is a precursor of cardiovascular disease, the leading cause of death and disability in North America. This study evaluated the genetic factors underlying variation in key atherosclerotic risk factors such as total cholesterol. Using the statistical software R, I conducted a quantitative trait loci (QTL) analysis of liver genotype, phenotype, and transcript data from an MRL/MpJxSM/J F2 mouse intercross. This analysis yielded cholesterol QTL peaks on chromosomes 1, 4, 7, and 18. After examining the chromosome 18 QTL confidence interval for cis expression QTL, I found five transcripts that were significant candidates for regulating this QTL. Based on Bayesian information criterion analysis, I created linear regression models for these gene, phenotype, and QTL relationships and then built a graphical model to represent them. I confirmed one previously known candidate gene affecting cholesterol level (Apoa2) and identified a new candidate gene (Isoc1) whose role in cholesterol regulation may be explored in future studies. I also found a relationship between cholesterol and bone mineral density that, if validated, may have interesting implications for cholesterol-lowering treatments.
Students - 2010

Michael Jones
University of North Carolina at Chapel Hill
Sponsor: Matt Hibbs
The accumulation of a wide range of Quantitative Trait Loci (QTL) and Genome Wide Association Study (GWAS) data for a variety of phenotypes makes the analysis and interpretation of these data difficult. This is particularly true when attempting to represent phenotype-genotype association data graphically. In such cases, the value of a system that can automatically and selectively display specified information becomes evident. We have developed a tool to address the challenges of visualizing multiple QTL and GWAS datasets. The result is QGV, a QTL study and GWAS results viewer, which allows users to interact easily with QTL and GWAS data. With the click of a button, users of QGV can rapidly compare the LOD score curves of multiple QTL studies alongside GWAS results in order to identify common and unique patterns and to elucidate genetic phenomena that might not otherwise be discovered.
Lukas Jordan
University of Maine at Orono, ME
Sponsor: Joel Graber
The activity of any cell is to a large extent determined by the specific set of genes that are activated in that cell. One way to characterize the molecular network in a cell is to identify the genes that are actively producing transcripts. Modern techniques allow the relative transcript abundance of tens of thousands of genes to be determined simultaneously. Expression of a single gene can vary either in the total number of transcripts created or in the selection among different versions, or isoforms, of that gene. Existing analyses of large transcript experiments have focused primarily on the relative activity of each gene, counting the total number of transcripts while largely neglecting the relative usage of variant isoforms. Standard visualization tools also do not highlight this information, hindering analysis and interpretation of this type of data. Improved, automated isoform visualization will facilitate greater understanding of mRNA processing, of the biological roles of variant isoforms, and of how these roles can be disrupted in disease. We have written a C++ class designed to make visual representations of isoforms that are related by exon structure, which can be further extended to display experimental transcript measurements.

Olutoyosi Oyelowo and Renée Symonds
North Carolina School of Science & Math, Durham, NC; Bowdoin College, Brunswick, ME
Sponsors: Ricardo Verdugo and Gary Churchill
Natural variation is an important component underlying multiple traits of medical and commercial relevance. The objective of this study was to use genotype and microarray data to predict unobserved phenotypes. Methodology/Principal Findings: Linear models representing trait networks were built in soybean (Glycine max) from the training dataset, which consists of genotype data at 941 markers and microarray data for 28,395 genes. The approach used natural genetic variation to infer statistical models of complex networks from genetic and genomic data. Numerous models incorporating genotype data, microarray data, or a combination of both were estimated to predict complex disease phenotypes related to a major soybean pathogen (Phytophthora sojae). Linear models were fitted by two methods: least squares regression and a regularized multivariate regression approach called Elastic Net. Predictive ability was assessed by 10-fold cross-validation. For genotype data, QTL analysis followed by stepwise variable selection outperformed Elastic Net in prediction accuracy for both phenotypes (Err=0.823, 0.670 vs. Err=1.00, 0.982). For microarray data, Elastic Net produced the most predictive model (0.332 and 0.327 vs. 1.210 and 0.980). When genotype and microarray data were combined, Elastic Net produced the best predictive model for Phenotype 1, while linear modeling produced the best model for Phenotype 2 (Err=0.926, 0.982 vs. Err=1.79, 0.769). Results were consistent for two disease resistance phenotypes analyzed separately. Conclusions: Genome-wide gene expression data best capture information about genetic effects on disease resistance in soybean. Adding QTL genotypes did not improve the model.
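The comparison above rests on k-fold cross-validation: each fold is held out in turn, the model is fitted on the remaining data, and prediction error is averaged over folds. The NumPy-only sketch below pairs a basic coordinate-descent elastic net with ordinary least squares on simulated data with more features than training samples, loosely mimicking a microarray setting. The penalty values, the simulated data, and the use of mean squared error (rather than the Err measure reported above, whose exact definition is not given here) are all assumptions.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net(X, y, alpha=0.5, l1_ratio=0.9, n_iter=100):
    """Coordinate descent for the elastic net objective
    (1/2n)||y - Xb||^2 + alpha * (l1_ratio*||b||_1 + (1 - l1_ratio)/2 * ||b||^2).
    Assumes the columns of X and y are already centered (no intercept fitted).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                       # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]     # remove feature j's current contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            resid -= X[:, j] * beta[j]     # add back the updated contribution
    return beta


def least_squares(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-norm solution when p > n


def kfold_mse(X, y, fit, k=10, seed=0):
    """Mean held-out squared error over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        x_mu, y_mu = X[train].mean(axis=0), y[train].mean()
        beta = fit(X[train] - x_mu, y[train] - y_mu)   # center with training data only
        pred = (X[test] - x_mu) @ beta + y_mu
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100))                 # 100 samples, 100 simulated features
y = X[:, :5] @ np.full(5, 3.0) + rng.normal(size=100)
en_err = kfold_mse(X, y, elastic_net)
ols_err = kfold_mse(X, y, least_squares)
```

Centering inside each fold with training-set means keeps information from the held-out samples out of the fit, which is what makes the cross-validated error an honest estimate of prediction on unobserved data.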
Students - 2009

Alan Bohn
North Carolina School of Science & Math, Durham, NC
Sponsors: Rachael Hageman and Gary Churchill
NSAIDs such as ibuprofen are commonly used to control inflammation by inhibiting COX-2. However, the side effects of COX-2 inhibition are not fully understood. Transcriptional effects of COX-2 inhibition were explored by comparing the liver and adipose tissues of genetically altered COX-1 > COX-2 exchange mice (B6.129(FVB)-Ptgs2tm2.1(Ptgs1)Fun/J) to C57BL/6NJ controls. ANOVA models were used to identify genes differentially expressed between strains in each tissue. Significantly enriched pathways were identified using gene set enrichment analysis, and GenMAPP was used to visualize the enriched pathways and patterns of differentially expressed genes. Results show no significant difference at the Ptgs1 or Ptgs2 locus between the COX-1 > COX-2 mice and the controls, and no downstream consequences for prostaglandin synthesis. Results indicate up-regulation of the cholesterol biosynthesis pathway in both liver and adipose tissues. Liver tissue shows up-regulated bile acid metabolism and down-regulated gluconeogenesis. In the Reverse Cholesterol Transport pathway, Scarb1 (SR-B1) and Cel are significantly up-regulated, suggesting increased selective uptake of cholesterol esters from HDL. In adipose tissue, fatty acid metabolism, ketone degradation, and glycolysis are up-regulated. Thus, several pathways that produce acetyl-CoA are up-regulated, yet cholesterol biosynthesis is the only enriched pathway that uses acetyl-CoA as a substrate.
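A per-gene one-way ANOVA compares between-group and within-group variability. The NumPy sketch below computes the one-way ANOVA F statistic for every gene in a genes × samples matrix at once; the group labels and data are invented, and in practice p-values and a multiple-testing correction (for example, false discovery rate control) would follow.

```python
import numpy as np


def anova_f(expr: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """One-way ANOVA F statistic for each row (gene) of a genes x samples matrix.

    F = [SS_between / (k - 1)] / [SS_within / (n - k)], where k is the number
    of groups and n the number of samples.
    """
    labels = np.unique(groups)
    n, k = expr.shape[1], len(labels)
    grand = expr.mean(axis=1, keepdims=True)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for lab in labels:
        block = expr[:, groups == lab]
        mean = block.mean(axis=1, keepdims=True)
        ss_between += block.shape[1] * ((mean - grand) ** 2).ravel()
        ss_within += ((block - mean) ** 2).sum(axis=1)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Two hypothetical genes measured in three exchange-mouse and three control samples.
expr = np.array([[1, 2, 3, 4, 5, 6],
                 [1, 2, 3, 1, 2, 3]], dtype=float)
groups = np.array(["exchange"] * 3 + ["control"] * 3)
f_stats = anova_f(expr, groups)
```

For a single gene, `scipy.stats.f_oneway` computes the same statistic along with its p-value; the vectorized form above simply avoids looping over tens of thousands of probes.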
Sarah Benjamin
Maine School of Science & Math, Limestone, ME
Sponsors: Peter Vedell and Gary Churchill
Type II diabetes affects over 17.9 million people in the United States alone. The goal of this study is to develop a gene expression model that shows the relationships between specific genes in the liver and how they affect the phenotypes insulin and glucose. The first part of my project involved analyzing the full set of liver transcript data provided to me. I used the statistical software R to identify quantitative trait loci (QTL), regions of the genome that have a strong relationship with the clinical trait in question, and to correlate the transcripts with insulin and glucose. This produced a gene list that I then used to build a graphical model showing how these genes interact with one another. Using QTL analysis of insulin and glucose, I identified chromosomes 1, 6, and 17 as chromosomes of interest, along with approximately forty genes of interest. The next step in this project is to map these genes to the insulin signaling pathway to determine their effect on insulin resistance, and to identify strong candidates that may be linked to the direct causes of insulin resistance and type II diabetes.

Minna Chen
Wayzata High School, Plymouth, MN
Sponsors: Joel Graber and Nicole Leahy
We demonstrated that the genes and biological processes associated with conserved gene deserts differ significantly from those associated with non-conserved gene deserts. Using a novel means of determining gene deserts based on local protein-coding gene density, we tested the hypothesis that genes and biological processes associated with conserved gene deserts are different from those associated with non-conserved gene deserts. We separated our deserts into conserved and non-conserved groups based on the level of overlap between our deserts and syntenic blocks, which are regions of the genome that contain the same genes in the same order in evolutionarily diverged organisms. We investigated two classes of conservation: deep conservation, represented by conservation between the mouse and zebrafish genomes, and mammalian conservation, represented by conservation between the mouse and human genomes. We used the Gene Ontology to analyze and compare genes located in both types of deserts and to identify over-represented biological processes. Initial evaluations indicated an over-representation of genes involved in the regulation of transcription and gene regulation in deeply conserved deserts. Analysis of mammalian conservation revealed that non-conserved deserts have an over-representation of genes involved in cell-cell communication and nervous system development.
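The density-based desert definition is not spelled out here, so the Python sketch below substitutes a simple, assumed rule: an intergenic gap of at least `min_gap` base pairs between protein-coding genes on one chromosome counts as a desert, and a desert is called conserved when at least `min_overlap` of its length overlaps syntenic blocks. Both thresholds, the function names, and the toy coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into sorted, disjoint intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def find_deserts(genes, min_gap):
    """Gaps of at least min_gap bp between consecutive (merged) gene intervals."""
    blocks = merge_intervals(genes)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
            if next_start - prev_end >= min_gap]


def synteny_fraction(region, syntenic_blocks):
    """Fraction of a region covered by the union of syntenic blocks."""
    start, end = region
    covered = sum(max(0, min(end, b_end) - max(start, b_start))
                  for b_start, b_end in merge_intervals(syntenic_blocks))
    return covered / (end - start)


def classify_deserts(genes, syntenic_blocks, min_gap, min_overlap=0.5):
    """Label each desert as conserved or non-conserved by syntenic overlap."""
    return [(d, "conserved" if synteny_fraction(d, syntenic_blocks) >= min_overlap
             else "non-conserved")
            for d in find_deserts(genes, min_gap)]


genes = [(100, 200), (150, 400), (2000, 2100), (2200, 2300)]   # toy gene coordinates
deserts = classify_deserts(genes, [(0, 1000), (900, 1500)], min_gap=1000)
```

Merging overlapping genes and blocks first keeps nested or overlapping annotations from creating spurious gaps or double-counting syntenic coverage.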

Justin Huang
North Carolina School of Science & Math, Durham, NC
Sponsors: Ricardo Verdugo and Gary Churchill
Many cases of obesity are caused by a sedentary lifestyle, but research has also shown that there is a significant genetic factor in the onset of obesity. The goal of this study was to use a systems biology approach to identify candidate genes causing obesity in mice that can be tested as drug targets for the treatment of obesity. Quantitative Trait Loci (QTL) analysis was complemented with genome-wide gene expression profiling to discover networks of gene co-expression that are associated with the Fat Percentage (FP) phenotype in an F2 cross between the C57BL/6J and C3H/HeJ mouse strains. An over-representation test was performed to identify biochemical pathways that are over-represented among genes that are highly correlated with FP and that share one or more QTL with this phenotype. Co-expression networks were then enriched with positional candidate genes that were members of the most significant pathways. Networks were also enriched with genes whose LOD score profiles are highly similar to that of FP. As a result, I propose Crla2 and Icos as the best candidates for the FP QTL on chromosome 1, and the topology of the co-expression networks suggests Rab27b and Sult1e1 as the best candidates for drug manipulation in the treatment of obesity.

Sheetal Rajagopal
Univ. of Pennsylvania, Philadelphia, PA
Sponsor: Matt Hibbs
Given the recent production of massive amounts of data, computational methods can be used to predict genes associated with phenotypes or diseases. Machine learning methods can make novel inferences about relationships between genes and phenotypes by training on existing data. We trained a Bayesian network (BN) on a compendium of microarray data and a “gold standard” constructed from all experimentally proven gene-phenotype associations in mice. We used this BN to infer the probability that each pair of genes is phenotypically related. We then constructed a phenotypic relationship network (PRN), in which the probability of phenotypic relationship is the weight of the edge connecting each gene to every other gene. We mined our PRN to make novel predictions of genes associated with the phenotype “abnormal DNA repair.” We also created a second PRN focused on the key phenotypes of ovarian cancer: a second BN was trained on an ovarian cancer-specific gold standard, and the resulting PRN was mined to determine which genes are most associated with ovarian cancer’s key phenotypes. Initial evaluations indicate that our predictions are promising candidates for further experimental research and could help in studying the causes and treatment of ovarian cancer and abnormal DNA repair.
Students - 2008

Marion Elizabeth Deerhanke
The North Carolina School of Science and Mathematics, Durham, NC
Sponsors: Gary Churchill and Randy Von Smith
Elizabeth investigated hypertension as a complex phenotype and searched for the genetic basis of this widespread disease using quantitative trait loci (QTL) analysis. In her study, she conducted a QTL analysis of systolic blood pressure in F2 males from eight intercrosses involving fourteen inbred mouse strains. Using multiple crosses allows greater precision in narrowing QTL regions through the identification of concordant peaks. The search for individual loci and interacting pairs of loci affecting systolic blood pressure identified fourteen significant QTL, three epistatic interactions, and linked QTL on Chr 3. Through multiple regression analysis, she developed multiple-QTL models for each of the eight intercrosses that accounted for as much as 35% of the phenotypic variance. These novel loci affecting blood pressure contribute to our understanding of the complex genetic basis of hypertension.
Alex Ellison
Connecticut College
Sponsor: Joel Graber
Alex Ellison identified genes within the deserts mapped by Cheryl Zapata (Summer Student in 2007) and tested these genes for over-representation of biological processes and molecular functions. The research found that genes involved in cell-cell adhesion and neurogenesis occur in the deserts more frequently than expected under a random model.
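Over-representation against a random model is commonly assessed with a one-sided hypergeometric test: given N annotated genes, K of which carry a Gene Ontology term, the test asks how surprising it is to see at least k term-carrying genes among the n genes located in deserts. A minimal sketch using only Python's standard library follows; the example counts are invented, and a real analysis would also correct for testing many GO terms.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: genes considered; K: genes annotated with the GO term;
    n: genes in deserts; k: desert genes annotated with the term.
    """
    # Exact integer arithmetic until the final division.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 10 genes, 5 annotated, 5 in deserts, all 5 annotated.
p_value = hypergeom_upper_tail(5, 10, 5, 5)
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same value (SciPy orders the parameters as total, successes, draws).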

Ryan Keating
The Maine School of Science and Mathematics, Limestone, ME
Sponsors: Gary Churchill and Randy Von Smith
Ryan investigated chronic kidney disease (CKD), a complex trait affected by developmental, environmental, and genetic factors. In his study, eight intercrosses derived from fourteen inbred mouse strains were analyzed for genetic factors that influence CKD. He performed quantitative trait loci (QTL) analysis of kidney weight, with body weight as a covariate, to identify regions of the mouse genome that affect CKD. From these eight crosses, he identified twenty significant (P < 0.05) QTL and eight interacting QTL pairs, and his models accounted for 51.7 to 73.0 percent of the variance in kidney weight. Candidate genes identified from the significant QTL could then be used to locate orthologous regions in humans.
Students - 2007
Arielle Torres
Brandeis University
Arielle created an interface in Perl that allowed her to input gene lists and interaction thresholds. The interface then searched a pre-existing data set detailing patterns of correlated inheritance (represented by linkage disequilibrium between pairs of single nucleotide polymorphisms, or SNPs) and extracted interacting pairs. Such interaction webs can later be integrated with gene expression profiles and other data sets, enabling broad network and profile analysis.
David Witmer
The Maine School of Science and Mathematics, Limestone, ME
David investigated gene expression patterns in multi-factorial DNA microarray data. The core of his study was the utilization of ANOVA-based statistical tests to test general and focused research hypotheses through overall F-tests and specific contrasts. Groups of co-expressed genes were resolved through hierarchical and k-means clustering analysis. Important biological processes associated with key factors were determined by statistical tests for association with gene ontology (GO) terms. Finally, he investigated relationships between gene expression levels and phenotypic response patterns.
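The k-means step can be sketched with Lloyd's algorithm: alternately assign each gene's expression profile to the nearest centroid and move each centroid to the mean of its cluster. The NumPy sketch below uses a deterministic farthest-point initialization for reproducibility; the initialization choice, the toy rising and falling profiles, and the function names are assumptions, not details from the study.

```python
import numpy as np


def _assign(profiles, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    dist = ((profiles[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(profiles: np.ndarray, k: int, n_iter: int = 100):
    """Lloyd's algorithm on the rows of a genes x conditions matrix."""
    # Farthest-point initialization: start at row 0, then repeatedly add the row
    # farthest from every centroid chosen so far.
    centroids = [profiles[0]]
    for _ in range(1, k):
        d = np.min([((profiles - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(profiles[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        labels = _assign(profiles, centroids)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        new = np.array([profiles[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(profiles, centroids), centroids


rng = np.random.default_rng(0)
rising = np.linspace(0, 5, 6) + rng.normal(0, 0.1, size=(20, 6))    # 20 toy genes
falling = np.linspace(5, 0, 6) + rng.normal(0, 0.1, size=(20, 6))   # 20 toy genes
labels, centroids = kmeans(np.vstack([rising, falling]), k=2)
```

Because k-means depends on its starting centroids and on k, it is often run alongside hierarchical clustering, as in the study above, to check that the co-expression groups are stable.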
Cheryl Zapata
North Carolina State University
Cheryl Zapata constructed maps of gene deserts based on gene sparsity, using five definitions of “gene”: complete transcript, transcription start sites, exons, coding exons, and exons plus introns. She found that the maps produced by the different definitions were similar either to the coding-exon map or to the exons-plus-introns map. She then demonstrated that deserts are not significantly more or less conserved than the rest of the genome, although some individual chromosomes had greater than expected conservation.
Students - 2006
Arielle Torres
Brandeis University
Arielle developed a web-based browser for viewing networks. The tool was developed in the context of the linkage disequilibrium data generated in the Center but will be generalized to be independent of the underlying data type. It allows a user to submit one or more “seed” regions or genes and returns tabular and graphical representations of the sub-network surrounding the seeds. Work is under way to export this analysis in a form that can be viewed with existing interfaces such as Cytoscape and N-browse.
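The seed-based query can be sketched as a breadth-first search: starting from the submitted seeds, follow edges whose weight (for example, a linkage disequilibrium measure) meets a threshold, up to a fixed number of steps, and return the edges among the nodes reached. The Python sketch below assumes an undirected, weighted edge list; the threshold, radius, node names, and function name are hypothetical, not the web tool's actual interface.

```python
from collections import defaultdict, deque


def seed_subnetwork(edges, seeds, min_weight, radius=1):
    """Return edges among nodes within `radius` steps of any seed.

    edges: iterable of (node_a, node_b, weight); only edges with
    weight >= min_weight are traversed or returned.
    """
    kept = [(a, b, w) for a, b, w in edges if w >= min_weight]
    adj = defaultdict(set)
    for a, b, _ in kept:
        adj[a].add(b)
        adj[b].add(a)

    # Breadth-first search records each reached node's distance from the seeds.
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)

    return [(a, b, w) for a, b, w in kept if a in depth and b in depth]


edges = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.95), ("A", "E", 0.2)]
one_step = seed_subnetwork(edges, ["A"], min_weight=0.5, radius=1)
two_steps = seed_subnetwork(edges, ["A"], min_weight=0.5, radius=2)
```

Returning a plain edge list keeps the result easy to render as a table or to export to network viewers.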
Luis Zapata
North Carolina School of Science and Math
Luis analyzed, quantified, and assessed gene expression, measured with GeneChip arrays, in males and females of 12 inbred mouse strains raised on high-fat or chow diets. He assessed probe-level quality, normalized probe intensities, adjusted for background, and performed graphical diagnostics on microarray images. He then assessed the influence of strain, sex, and diet, and their interactions, on the overall intensity value of each gene on the array. Finally, he generated lists of genes to be studied for functional associations using available genomic annotation software tools.
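The normalization method is not specified above; quantile normalization, which RMA applies to Affymetrix probe intensities, is one common choice. The NumPy sketch below gives every array (column) the same intensity distribution by replacing each value with the mean, across arrays, of the values at the same rank. Ties are broken by order of appearance, a simplification of what production implementations do, and the small matrix is invented.

```python
import numpy as np


def quantile_normalize(X: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix."""
    order = np.argsort(X, axis=0, kind="stable")   # row indices that sort each column
    reference = np.sort(X, axis=0).mean(axis=1)    # mean value at each rank across arrays
    out = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        out[order[:, j], j] = reference            # the r-th smallest value gets reference[r]
    return out


X = np.array([[5, 4, 3],
              [2, 1, 4],
              [3, 4, 6],
              [4, 2, 8]], dtype=float)             # 4 probes x 3 arrays
Q = quantile_normalize(X)
```

After normalization every column contains the same set of values, so remaining differences between arrays reflect the ranking of probes within each array rather than overall intensity or scanner effects.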


