Sequence specific probe signals on SNP microarrays

Glomb, Torsten
Abstract in English: 
Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. The technology exploits the specific binding affinity of fragmented DNA for surface-attached oligonucleotide DNA probes. This thesis examines the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence in order to identify relevant sequence motifs that potentially cause systematic biases in genotyping and copy number estimates. The probe design of GeneChip SNP arrays allows different sources of intensity modulation to be identified, such as the number of mismatches per duplex; perfect match and mismatch base pairings, including nearest neighbors and base triples; and the position of these motifs along the probe sequence. Probe sequence effects are estimated in terms of triple motifs with central matches and mismatches, including all combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions that correlate well with free energy terms of DNA/DNA interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatch pairings formed in cross-allelic probe/target duplexes constitute sources of systematic bias in the probe signals, with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes, which facilitates complex formation between neighboring probes with at least three adjacent G’s in their sequence. The applied method of “triple averaging” represents a model-free approach to estimating the mean intensity contributions of different sequence motifs, which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate corrections of the probe intensities are suggested.
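The abstract does not spell out how “triple averaging” is computed, so the following is only a minimal sketch of one plausible reading: for every position along the probe, the log-intensities of all probes sharing the same base triple centered at that position are averaged and expressed relative to the overall mean. The function names `triple_averaging` and `has_g_run`, the use of log10 intensities, and the centering on the overall mean are assumptions made for illustration and are not taken from the thesis; mismatch pairings and any normalization are left out.

```python
from collections import defaultdict
from math import log10


def triple_averaging(probes):
    """Average log-intensity deviation per (position, base triple).

    probes: iterable of (sequence, intensity) pairs with positive intensities.
    Returns a dict mapping (center_position, triple) to the mean deviation of
    log10(intensity) from the overall mean log-intensity.
    Hypothetical helper: an illustrative reading, not the thesis's exact method.
    """
    probes = list(probes)
    log_signals = [log10(intensity) for _, intensity in probes]
    overall_mean = sum(log_signals) / len(log_signals)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for (seq, _), log_signal in zip(probes, log_signals):
        # Slide a window of three bases; pos is the index of the central base.
        for pos in range(1, len(seq) - 1):
            key = (pos, seq[pos - 1:pos + 2])
            sums[key] += log_signal - overall_mean
            counts[key] += 1

    return {key: sums[key] / counts[key] for key in sums}


def has_g_run(seq, min_length=3):
    """True if seq contains at least min_length adjacent G's (assumed poly-G flag)."""
    return "G" * min_length in seq


if __name__ == "__main__":
    # Toy data for illustration only; these are not measured intensities.
    toy_probes = [("ACGT", 100.0), ("ACGA", 1000.0), ("TCGT", 10.0)]
    for key, value in sorted(triple_averaging(toy_probes).items()):
        print(key, round(value, 3))
```

In a calibration step, such motif averages could be subtracted from the log-signal of each probe, and a flag like `has_g_run` could mark probes for separate treatment; whether the thesis proceeds in exactly this way is not stated in the abstract.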
GlombTorsten_Thesis.pdf (5.93 MB)