CS580 Bioinformatics Fall 2001
Sept
10.
1.
Introduction SIMON
·
What
is Bioinformatics ? How does it relate
to Computational Biology?
Why study it?
·
Overview
of the course
Computational
Problems to be solved:
1. Homology, alignments and phylogeny
2. Mapping
3. Gene Finding
4. Determinants of protein structure
5. Identification of genetic
structural/metabolic lesions from gene expression data
Strategies
Computational Molecular Biology vs
BioInformatics
Strings vs Large Data Arrays
Review of Probability LAZAR
One event- Simple probability
* Densities and
distributions
Uniform,
Exponential family (including
Exponential, Gaussian,
Dirichlet/Beta)
* The Shannon entropy,
Minimal description length
More than one event- Joint probability
* Conditional and Unconditional (independent) Probability
* The Markov assumption
Finite Markov
chains, Transition probabilities, Simplifying
properties of the
Markov assumption
* Bayes theorem
·
Missing
parameters- Expectation Maximization
2.
OPTIONAL Biology MiniCourse Part
1. MOSAVI
OPTIONAL:
Mon Sept 17 6:30-7:30pm Biology MiniCourse Part 2 MOSAVI
STRINGS
Sept
17. Begin
7:30 pm
3. BIOLOGY REVISIT
Review of key problems in biology requiring solutions based on algorithms
on strings SIMON
·
The
genetic code
·
The
informational structure of prokaryotes and eukaryotes
o Introns,Exons,ORFs
o Starts, stops, splices, CpG regions
4. Algorithms in this course SIMON
·
The
computational complexities/considerations of algorithms particularly those
applied to Computational Molecular Biology
·
Nondeterministic
Polynomial time solutions of problems
·
The
Satisfiability Problem –NP Complete
·
Cook’s
Theorem: Polynomial time mapping to SAT
·
NP
Hard Problems: the TSP, the partition problem, the shortest
common superstring
o Mapping the
double-digest problem to an NP complete problem
·
Review
NP Hard techniques -Solution vs.
Heuristic vs.
Approximation
+ Branch and
Bound
+ Minimum
Spanning trees
+ Genetic
algorithms
+ Simulated
Annealing
-Special Cases-Dynamic Programming
GENOMICS
Sept
24.
5. Algorithms on Strings SIMON
·
Pattern
recognition as applied to strings – matching vs alignment
-Finite Automaton models of string searching
-Hashing functions, prime numbers and hashing schemes for
searching
Regular Expressions and RE processing engines
-Suffix Trees (very brief intro)
BIOLOGICAL
APPLICATION OF STRING ALGORITHMS:
ALIGNMENT-SIMILARITY
AND HOMOLOGY SEARCHING
6.
Sequence Alignment SIMON
* Rationale for aligning
* Dynamic Programming
Needleman Wuncsh and Smith Waterman
algorithms –dynamic
programming in action
* Sequence alignment
o Global alignment
o Local alignment and gaps
Multiple sequence alignment
* Sum of pairs scoring scheme
* Heuristics- Star alignment
Oct
1
7. BIOLOGY REVISIT
SCHILLER
Cloning
* Molecular Cloning-uses
* Practicalities of cloning DNA and cDNA
o Libraries
o Amplifying DNA- the polymerase chain
reaction
o Electronic Cloning
* Subcloning MOSAVI
o Vectors
o Enzymes
o Purification
o DNA sequencing
8. Gene Expression
* Promotors, Regulation
* Mutations
* Low Complexity/Repetitive Sequences
Oct
8
9.
Homology SIMON
*
* Introduction to substitution matrices
Annotated sequences
* Why annotate and how?
* Survey of ‘the big’ annotated databases
* Searching the big databases-Introduction
to BLAST and FastA
10.BLAST SIMON
* What BLAST does
* How it works
* Substitution matrices- continued
o Point Accepted Mutations
o Blocks and Prosite
·
Extreme
Value Distribution and BLAST statistics
·
Oct
15
11.
FASTA-how it works and why use it SIMON
12
Pattern Discovery SIMON
Suffix
Trees: Some thoughts about a new searching tool
MEME
Oct
22
13. PHYLOGENY SIMON
* About trees
* Character Based Phylogenies-Parsimony,
Fitch's algorithm
Distance Based Phylogenies- Unweighted
Pair Group Method with
Arithmetic Mean
MAPPING
GENOMES – SEQUENCING DNA
14.
Genetic Maps
* Physical Maps SIMON
o Restriction mapping: an NP complete algorithm
o Hybridization mapping.
Oct
29
15. Genetic Maps continued -MOSAVI
* DNA Sequencing
* Fragment Assembly-Shotgun sequencing
* The Human Genome Project
DETECTING
GENES IN SEQUENCES OF DNA
16.
Gene finding SIMON
* Prokaryotes: open reading frames
* ORF finding-problems in Eukaryotes
o automata (HMMs) vs dynamic
programming
* Markov chains for sequence recognition,
intro to Hidden Markov Model
o HMMs for splice and gene finding
Nov
5
17. The
Hidden Markov Model for motif recognition SIMON
* Finite state nondeterministic automata
o The Viterbi algorithm
o EM-the Baum Welch algorithm
PROTEINS
18
Machine learning – overview KJELL
o
Representation
of Data
o
Decision
Tree as an example
Detection vs discovery
o Supervised vs unsupervised machine
learning
Nov
12
17. BIOLOGY REVISIT Protein structure MOSAVI
* Review of primary, secondary, tertiary
and quaternary structure
* Sequence vis à vis structure
o Homology
o Analogy
* Protein folding –overview MOSAVI,SIMON
·
The
importance of a motif in structure prediction
·
Algorithms
for motif recognition-Hidden Markov models, joint
probability models, decision trees
·
Schemes
based on description length/ algorithmic discovery
18.
Classification of Protein Folds KING
* types of protein folds and protein
domains
* protein fold databases (SCOP, FSSP, CATH,
PDB)
* searching structural databases for 3D
homologs (DALI)
Nov
19
19.
Principle Component Analysis
JIN
BIOINFORMATICS AND LARGE DATA
ARRAYS OF BIOLOGICAL INFORMATION
20.
Analysis of Gene Expression - Gene Arrays
ROWE
·
BIOLOGY REVISIT cDNA, ESTs and the human genome
·
commercial
chips(Affymetrix et al), home made
chips
·
Nov
28
21.
Clustering and Pattern Discovery SIMON
Agglomeration vs Division
·
Canonical
Discriminant Analysis
·
Other
Pattern Analysis schemes
o
K-means
(k-medioids)
22.
Pattern Discovery in clusters
LAZAR
·
Neural
Nets
·
Self-organizing
maps
·
Genetic
programming for pattern discovery
Dec
3
23.
Bioinformatics and the classification of disease SIMON
·
Pattern
Recognition: the leukaemia model
24. Bioinformatics and drug discovery -MOSAVI
·
Drug
Discovery: Docking Models
Dec
10 Oral
Presentation of Final Projects
OPTIONAL
BIOLOGY MINICOURSE
Mon
Sept 10, 8:30-9:30pm
Part 1:
Proteins
·
Chemical
Bonds
·
Proteins
Amino acids: structure and
properties. Protein primary
structure, secondary structure,
secondary superstructure,
folding, tertiary structure,
quaternary structure
Mon
Sept 17, 6:30-7:30 pm
Part 2:
Classical and molecular theories of inheritance
·
Chromosomes
and classical inheritance
·
Mitosis
and Meiosis
·
DNA
and RNA structure
The
Central Dogma of Molecular Genetics
·
Replication
·
Transcription
- TATA Box
·
Ribosomes,
Splicing
·
Translation
·
The
genetic code
Required
Texts:
Primer
on Molecular Genetics (Download form Dept of Energy)
Introduction to Computational
Molecular Biology,
J.Setubal and J.Meidanis, PWS Publishing Company, ISBN: 0534952623
Data Mining Practical Machine Learning Tools and
Techniques with JAVA Implementation. IH Witten and E Frank, Morgan Kaufmann Publishers, ISBN
1-55860-552-5