BIOLOGICAL SEQUENCE ANALYSIS
Math/Stats 547 (Lecture) and 548 (Lab) | MWF, 9-10 (1084 East Hall); Fri, 10-11 (B743 East Hall)
[These are the background notes for the end-of-term project for 2003. They will be updated for 2004, but are left here for the general information of students in 2004.] Recall the basic outline of the project:
You may certainly use a paper for your project that is not online, but then copies will have to be made for the other participants beforehand.
Suggested Topics and Papers:
a) Scoring and statistical methods:
1. S.Karlin and S.F.Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes, Proc. Nat. Acad. Sci. USA 87(1990), 2264-2268.
2. R.Schwartz and Y.-L.Chow, The N-best algorithm: an efficient and exact procedure for finding the n most likely hypotheses. In Proceedings of ICASSP '90, 81-84.
3. L.R.Cardon, C.Burge, D.A.Clayton and S.Karlin, Pervasive CpG suppression in animal mitochondrial genomes, Proc. Nat. Acad. Sci. USA 91(1994), 3799-3803.
4. K.Sjölander, K.Karplus, M.Brown, R.Hughey, A.Krogh, I.S.Mian and D.Haussler, Dirichlet mixtures: A method for improved detection of weak but significant protein sequence homology, CABIOS 12(1996), 327-345.
5. S.Govindarajan, R.Recabarren and R.A.Goldstein, Estimating the total number of protein folds, Proteins 35(1999), 408-414.
6. Bin Qian and R.A.Goldstein, Distribution of indel lengths, Proteins 45(2001), 102-104.
b) Multiple sequence alignment and analysis:
1. G.Hertz and G.Stormo, Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics 15(1999), 563-577.
2. A.Bateman, E.Birney, L.Cerruti, R.Durbin, L.Etwiller, S.Eddy, S.Griffiths-Jones, K.L.Howe, M.Marshall and E.L.Sonnhammer, The Pfam protein families database, Nucleic Acids Research 30(2002), 276-280.
c) Gene finding, gene structure:
1. N.Miyajima, C.Burge and T.Saito, Computational and experimental analysis identifies many novel human genes, Biochemical & Biophysical Research Communications 272(2000), 801-807.
2. A.Krogh, Using database matches with HMMgene for automated gene detection in Drosophila, Genome Research 10(2000), 523-528.
3. M.Das, C.Burge, E.Park, J.Colinas and J.Pelletier, Assessment of the total number of human transcription units, Genomics 77(2001), 71-78.
4. L.P.Lim and C.Burge, A computational analysis of sequence features involved in recognition of short introns, Proc. Nat. Acad. Sci. USA 98(2001), 11193-11198.
5. M.Skovgaard, L.J.Jensen, S.Brunak, D.Ussery and A.Krogh,
6. W.Fairbrother, R.Yeh, P.Sharp and C.Burge, Predictive identification of exonic splicing enhancers in human genes, Science 297(2002), 1007-1013.
7. L.J.Jensen, R.Gupta, N.Blom, D.Devos, J.Tamames, C.Kesmir, H.Nielsen, H.H.Staerfeldt, K.Rapacki, C.Workman, C.A.Andersen, S.Knudsen, A.Krogh, A.Valencia and S.Brunak, Prediction of human protein function from post-translational modifications and localization features, Journal of Molecular Biology 319(2002), 1257-1265.
d) Protein structure:
1. P.L.Martelli, P.Fariselli, A.Krogh and R.Casadio.
2. E.L.L.Sonnhammer, G.von Heijne and A.Krogh, A hidden Markov model for predicting transmembrane helices in protein sequences.
e) Phylogeny:
1. J.H.Huelsenbeck and R.Nielsen, Variation in the pattern of nucleotide substitution rate, Journal of Molecular Evolution 48(2000), 86-93.
2. T.Mueller and M.Vingron, Modeling amino acid replacement, Journal of Computational Biology 7(2000), 761-776.
3. J.A.Eisen, Phylogenomics: improving functional predictions by evolutionary analysis, Genome Research 8(1998), 163-167.
f) Finding RNA genes:
1. T.Lowe and S.Eddy, A computational screen for methylation guide snoRNAs in yeast, Science 283(1999), 1168-1171.
2. E.Rivas and S.R.Eddy, Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs, Bioinformatics 16(2000), 583-605.
3. E.Rivas, R.J.Klein, T.A.Jones and S.R.Eddy, Computational identification of noncoding RNAs in E. coli by comparative genomics, Current Biology 11(2001), 1369-1373.
4. R.J.Klein, Z.Misulovin and S.Eddy, Noncoding RNA genes identified in AT-rich hyperthermophiles, Proc. Nat. Acad. Sci. USA 99(2002), 7542-7547.
g) Attempted medical applications:
1. S.Karlin and C.Burge, Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development, Proc. Nat. Acad. Sci. USA 93(1996), 1560-1565.