BIOLOGICAL SEQUENCE ANALYSIS

Math/Stats 547 (Lecture) and 548 (Lab) MWF, 9-10 (1084 East Hall); Fri, 10-11 (B743 East Hall)

 


[These are the background notes for the end of term project for 2003. They will be updated for 2004, but are left here for the general information of students in 2004.]

Recall the basic outlines of the project:

  • You can work in teams of two, or alone, if you prefer.
  • The papers and topics below are suggestions; you are not required to choose one of them.
  • Let me know by March 17 which topic you want to treat, so that we have only one report on any topic.
  • Your team (or you) will present the topic to the class. I hope we can achieve two presentations for each class hour.
  • We will discuss the presentation requirements later. Please let me know if you will not have access to a laptop equipped with PowerPoint for your presentation.
  • You should write a summary of about one page for the other students who will be listening to your presentation. This should be distributed to the group before you begin your presentation.
  • We will post a schedule of the talks and topics linked to this page beforehand, with links to your papers, if they are recent enough to be available online.
  • I have made links below for most papers. Some, however, were listed through ScienceDirect, in which case I have linked to the search menu for the journal in question. Similarly, some papers are on laboratory servers that did not permit outside links directly to the reprint. In these cases you will be left at the closest link for which I could get permission.
  • I have linked to the PDF version of a paper whenever available. This is for the convenience of the class as a whole. The speakers will likely want an HTML version of the paper so they can download appropriate figures from the article to illustrate their presentations.

You may certainly use a paper for your project that is not online, but then we will need to have copies made for the other participants beforehand.

 

Suggested Topics and Papers:

a) Scoring and statistical methods:

1. S.Karlin and S.F. Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes, Proc. Nat. Acad. Sci. USA 87(1990), 2264-2268.

2. R.Schwartz and Y.-L.Chow, The N-best algorithm: an efficient and exact procedure for finding the n most likely hypotheses. In Proceedings of ICASSP '90, 81-84.

3. L.R.Cardon, C.Burge, D.A.Clayton and S.Karlin, Pervasive CpG suppression in animal mitochondrial genomes, Proc. Nat. Acad. Sci. USA 91(1994), 3799-3803.

4. K. Sjölander, K. Karplus, M. Brown, R. Hughey, A. Krogh, I. S. Mian and D. Haussler, Dirichlet mixtures: A method for improved detection of weak but significant protein sequence homology, CABIOS, 12(1996), 327-345.

5. S.Govindarajan, R.Recabarren and R.A.Goldstein, Estimating the total number of protein folds, Proteins 35(1999), 408-414.

6. Bin Qian and R.A.Goldstein, Distribution of indel lengths, Proteins 45(2001), 102-104.

 

b) Multiple sequence alignment and analysis:

1. G.Hertz and G. Stormo, Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics 15(1999), 563-577.

2. A.Bateman, E.Birney, L.Cerruti, R.Durbin, L.Etwiller, S.Eddy, S.Griffiths-Jones, K.L.Howe, M.Marshall and E.L.Sonnhammer, The Pfam protein families database, Nucleic Acids Research 30(2002), 276-280.

 

c) Gene finding, gene structure:

1. N.Miyajima, C.Burge and T.Saito, Computational and experimental analysis identifies many novel human genes, Biochemical & Biophysical Research Communications 272(2000), 801-807.

2. A. Krogh, Using database matches with HMMgene for automated gene detection in Drosophila, Genome Research 10(2000), 523-528.

3. M.Das, C.Burge, E.Park, J.Colinas and J. Pelletier, Assessment of the total number of human transcription units, Genomics 77(2001), 71-78.

4. L.P.Lim and C.Burge, A computational analysis of sequence features involved in recognition of short introns, Proc. Nat. Acad. Sci. USA 98(2001), 11193-11198.

5. M. Skovgaard, L. J. Jensen, S. Brunak, D. Ussery and A. Krogh, On the total number of genes and their length distribution in complete microbial genomes, Trends in Genetics 17(2001), 425-428.

6. W.Fairbrother, R.Yeh, P.Sharp and C.Burge, Predictive identification of exonic splicing enhancers in human genes, Science 297(2002), 1007-1013.

7. L. J. Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H. H. Staerfeldt, K. Rapacki, C. Workman, C. A. Andersen, S. Knudsen, A. Krogh, A. Valencia and S. Brunak, Prediction of human protein function from post-translational modifications and localization features, Journal of Molecular Biology, 319(2002), 1257-1265.

d) Protein structure:

1. P. L. Martelli, P. Fariselli, A. Krogh and R. Casadio, A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins, Bioinformatics, 18(2002), S46-S53.

2. E. L. L. Sonnhammer, G. von Heijne and A. Krogh, A hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press.

 

e) Phylogeny:

1. J.H.Huelsenbeck and R.Nielsen, Variation in the pattern of nucleotide substitution rate, Journal of Molecular Evolution, 48(2000), 86-93.

2. T.Mueller and M.Vingron, Modeling amino acid replacement, Journal of Computational Biology, 7(2000), 761-776.

3. J.A.Eisen, Phylogenomics: improving functional predictions by evolutionary analysis, Genome Research, 8(1998), 163-167.

 

f) Finding RNA genes:

1. T.Lowe and S.Eddy, A computational screen for methylation guide snoRNAs in yeast, Science 283(1999), 1168-1171.

2. E.Rivas and S.R.Eddy, Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs, Bioinformatics 16(2000), 583-605.

3. E.Rivas, R.J.Klein, T.A.Jones and S.R.Eddy, Computational identification of noncoding RNAs in E. coli by comparative genomics, Current Biology 11(2001), 1369-1373.

4. R.J.Klein, Z.Misulovin and S.Eddy, Noncoding RNA genes identified in AT-rich hyperthermophiles, Proc. Nat. Acad. Sci. USA 99(2002), 7542-7547.

 

g) Attempted medical applications:

1. S.Karlin and C.Burge, Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development, Proc. Nat. Acad. Sci. USA 93(1996), 1560-1565.

 
