read.madata Function

Read Microarray data

Description

Reads microarray experiment data from a TAB delimited text file or a matrix object.

Usage

read.madata(datafile=datafile, designfile=designfile, covM = covM, arrayType=c("oneColor", "twoColor"), header=TRUE, spotflag=FALSE, n.rep=1, avgreps=0, log.trans=FALSE, metarow, metacol, row, col, probeid, intensity, matchDataToDesign=FALSE, ...)

Arguments

Value

An object of class madata, which is a list of following components:

Preparing data file

Before using the package, the user needs to prepare the input data file.

  1. The data file can be a matrix type R object, such as the output of exprs() from the affy or beadarray package. By default, the intensities are assumed to start in the first column and the row names are assumed to be the probe IDs. Otherwise, specify the column numbers that contain the probe ID and the intensities.

  2. The data file can be a TAB delimited text file. In this file, each row corresponds to a gene. The leading columns can hold gene specific information, e.g., the Probe ID, GenBank ID, etc., and the grid location of the spot. The intensity data must come after these columns. Most microarray gridding software generates one file per slide, so you need to combine these files manually into a single data file. You also need to decide which values to use in the analysis, e.g., mean versus median, background subtracted or not, etc. For an N-dye array, the intensity data should have N columns for each array, and these N columns need to be adjacent to each other. You can put the spot flag in a column after the intensity data for each array. (Note that if you have a flag, you will have N+1 columns of data for each array.) If you have replicates, replicated measurements of the same probe (clone) on the same array should appear in adjacent rows.

For example, suppose that for a 2-dye cDNA array you have four slides scanned by GenePix, giving four files. First, open your favorite spreadsheet editor, e.g., MS Excel. Copy your probe ID and Cluster ID into the first 2 columns. Then open one of the files generated by GenePix and copy the grid location into the next 4 columns (you only need to do this once because it is the same for all four slides). Then, for each of the four files, copy the two columns of foreground median values (if that is what you want to use) and the one column of flags into the file in the order Cy5, Cy3, flag. Then select the whole file and sort the rows by probe ID. Save the file as a TAB delimited text file and you are done.
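The spreadsheet steps above can also be scripted. The following is a minimal base R sketch, not part of the package: the per-slide tables are simulated, and the column names (F635.Median, F532.Median, Flags), the flag values, and the grid layout are assumptions made only for illustration.

```r
# Hypothetical per-slide tables standing in for scanner output; the column
# names and flag values are assumptions for illustration only.
set.seed(1)
n.probe <- 6
make.slide <- function() {
  data.frame(F635.Median = round(runif(n.probe, 100, 5000)),
             F532.Median = round(runif(n.probe, 100, 5000)),
             Flags = sample(c(0, -50), n.probe, replace = TRUE))
}
slides <- lapply(1:4, function(i) make.slide())

# Gene specific annotation and grid location, entered once for all slides.
annot <- data.frame(ProbeID = paste0("P", c(3, 1, 6, 2, 5, 4)),
                    ClusterID = paste0("C", 1:n.probe),
                    metarow = 1, metacol = 1,
                    row = 1, col = 1:n.probe)

# For each slide keep Cy5, Cy3, flag in that order, adjacent to each other.
per.slide <- lapply(seq_along(slides), function(i) {
  s <- slides[[i]][, c("F635.Median", "F532.Median", "Flags")]
  names(s) <- paste0(c("Cy5.", "Cy3.", "flag."), i)
  s
})
combined <- cbind(annot, do.call(cbind, per.slide))

# Sort the rows by probe ID and write a TAB delimited file without row names.
combined <- combined[order(combined$ProbeID), ]
out <- file.path(tempdir(), "combined.txt")
write.table(combined, out, sep = "\t", quote = FALSE, row.names = FALSE)
```

With this layout the annotation occupies the first six columns and each array contributes three adjacent columns (N+1 for a 2-dye array with a flag).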

The data file must be "full", that is, all rows must have the same number of fields. If your data file has missing data, you need to check the data or use fill.missing to fill in the missing values.
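One way to check that every row has the same number of fields before reading the file is count.fields() from base R. This is only a sketch on a small file written to a temporary directory; the file name and its contents are made up for illustration.

```r
# Write a small TAB delimited file in which the last row is one field short.
path <- file.path(tempdir(), "ragged.txt")
writeLines(c("ProbeID\tCy5.1\tCy3.1",
             "P1\t1200\t950",
             "P2\t800\t1100",
             "P3\t640"), path)

# count.fields() returns the number of fields found on each line.
n.fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Lines whose field count differs from the header line need attention.
bad.lines <- which(n.fields != n.fields[1])
bad.lines
```

Any line reported in bad.lines should be inspected and repaired in the text file before it is passed on.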

Leading and trailing TABs in the text file can sometimes cause problems, depending on the operating system, so the user needs to be careful to avoid them.

Preparing design file

The design file can be a data.frame or matrix R object, or a TAB delimited text file. The number of rows equals the number of arrays times N (the number of dyes), plus one for the column header if the design file is a TAB delimited file and header = T. The rows of the design file *MUST* follow the order of the data file unless the matchDataToDesign parameter is set to TRUE. For example, if the data file stores the intensities from array1, array11, array2,..., then the rows of the design file must follow this order. The number of columns depends on the experimental design. For example, you can have "Strain", "Diet", "Sex", etc. in your design file. You *MUST* have a column named "Array" in the design file. For a two-color array, in addition to the "Array" column, you must have "Sample" and "Dye" columns (case sensitive). "Sample" should contain integers representing biological individuals. Reference samples should have a Sample number of zero (0). A reference sample is always treated as a fixed factor in the mixed model and is not involved in any test.

You must NOT have "Spot", "Label" and "covM" columns. They are reserved for spotting, labeling and covariance effects.
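The design file rules for a two-color experiment can be sketched in base R as follows. The number of arrays, the dye labels, the sample numbering, and the Strain values (including the "Ref" level) are assumptions chosen for illustration, not values required by the package.

```r
n.array <- 4   # number of arrays (assumed for illustration)
n.dye   <- 2   # two-color array

# One row per array-dye combination, in the same order as the intensity
# columns of the data file (array 1 Cy5, array 1 Cy3, array 2 Cy5, ...).
design <- data.frame(
  Array  = rep(seq_len(n.array), each = n.dye),
  Dye    = rep(c("Cy5", "Cy3"), times = n.array),   # illustrative labels
  Sample = c(1, 0, 2, 0, 0, 3, 0, 4),               # 0 marks the reference
  Strain = c("AJ", "Ref", "B6", "Ref", "Ref", "AJ", "Ref", "B6")
)

# Checks that mirror the rules stated above.
stopifnot(nrow(design) == n.array * n.dye)
stopifnot(all(c("Array", "Sample", "Dye") %in% names(design)))
stopifnot(!any(c("Spot", "Label", "covM") %in% names(design)))

# Written as a TAB delimited file, the header adds one line to the row count.
design.path <- file.path(tempdir(), "design.txt")
write.table(design, design.path, sep = "\t", quote = FALSE, row.names = FALSE)
```

The data.frame itself could be passed as the design instead of the file, since both forms are accepted.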

Note that you DO NOT have to use all factors in the design file. You can put all factors in the design file but turn them on/off in the formula in fitmaanova.

Preparing covariate file

If you have an array specific covariate, it should be included in the design matrix. If you have a gene specific covariate, you need to prepare a matrix type R object or a TAB delimited text file, "covM". The size of "covM" must equal the size of the intensity data (a TAB delimited text file must have a column header if header = T, but NO row names). Specify covM only if you have a gene specific covariate. The covariate must be numeric and needs to be specified in fitmaanova.
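The size requirement for a gene specific covariate can be checked before reading the data. The sketch below uses simulated values; the matrix dimensions and the names intensity and cov.path are hypothetical.

```r
# Hypothetical intensity matrix: 5 probes by 4 arrays.
set.seed(2)
intensity <- matrix(runif(20, 100, 5000), nrow = 5,
                    dimnames = list(paste0("P", 1:5), paste0("array", 1:4)))

# Gene specific covariate: one numeric value per intensity value.
covM <- matrix(rnorm(length(intensity)), nrow = nrow(intensity),
               dimnames = list(NULL, colnames(intensity)))

# The covariate must be numeric and match the intensity data in size.
stopifnot(is.numeric(covM), identical(dim(covM), dim(intensity)))

# As a TAB delimited file: column header kept, row names dropped.
cov.path <- file.path(tempdir(), "covM.txt")
write.table(covM, cov.path, sep = "\t", quote = FALSE, row.names = FALSE)
```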

Author(s)

Hao Wu

Examples

# Note that .CEL files are not distributed with the package, so the following
# code does not run as is. It shows how to read data from the affy (or
# beadarray) package when a TAB delimited design file is ready.
library(affy)
beforeRma <- ReadAffy()
rmaData <- rma(beforeRma)
datafile <- exprs(rmaData)
abf1 <- read.madata(datafile=datafile, designfile="design.txt")

# make and read designfile (data.frame type R object) from R
design.table <- data.frame(Array=row.names(pData(beforeRma)))
Strain <- rep(c("Aj", "B6", "B6xAJ"), each=6)
Sample <- rep(c(1:9), each=2)
designfile <- cbind(design.table, Strain, Sample)
abf1 <- read.madata(datafile, designfile=designfile)

# read in a TAB delimited file with spot flag - for a two color array
# HAVE TO SPECIFY that the data is from a two color array
kidney.raw <- read.madata("kidney.txt", designfile="kidneydesign.txt", metarow=1, metacol=2, col=3, row=4, probeid=6, intensity=7, arrayType="twoColor", log.trans=TRUE, spotflag=TRUE)