(Please refer to your own copy of Introduction to Bioinformatics, by Arthur M. Lesk. All the following examples can be found in his book.)
Example 1.1
Retrieve the amino acid sequence of horse pancreatic ribonuclease.
Use the ExPASy server at the Swiss Institute for Bioinformatics. The URL is: http://www.expasy.ch/cgi-bin/sprotsearch-ful
Type in the keywords horse pancreatic ribonuclease followed by the ENTER key. Select RNP_HORSE and then FASTA format (see Box: FASTA format). This will produce the following (the first line has been truncated):
>sp|P00674|RNP_HORSE RIBONUCLEASE PANCREATIC (EC 3.1.27.5) (RNASE 1) ...
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEP
LADVQAICLQKNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTS
QKERHIIVACEGNPYVPVHFDASVEVST
which can be cut and pasted into other programs.
For example, we could retrieve several sequences and align them (see Box: Sequence Alignment). Patterns of similarity among aligned sequences are useful in assessing closeness of relationships.
FASTA format
A very common format for sequence data is derived from conventions of FASTA, a program for FAST Alignment by W.R. Pearson. Many programs use FASTA format for reading sequences, or for reporting results.
A sequence in FASTA format:
Begins with a single-line description. A > must appear in the first column. The rest of the title line is arbitrary but should be informative.
Subsequent lines contain the sequence, one character per residue.
Use one-letter codes for nucleotides or amino acids specified by the International Union of Biochemistry and International Union of Pure and Applied Chemistry (IUB/IUPAC). See http://www.chem.qmw.ac.uk/iupac/misc/naabb.html and http://www.chem.qmw.ac.uk/iupac/AminoAcid/
Use Sec and U as the three-letter and one-letter codes for selenocysteine: http://www.chem.qmw.ac.uk/iubmb/newsletter/1999/item3.html
Lines can have different lengths; that is, 'ragged right' margins.
Most programs will accept lower case letters as amino acid codes.
An example of FASTA format: Bovine glutathione peroxidase
>gi|121664|sp|P00435|GSHC_BOVIN GLUTATHIONE PEROXIDASE
MCAAQRSAAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASLUGTTVRDYTQMNDLQRRLG
PRGLVVLGFPCNQFGHQENAKNEEILNCLKYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPS
DDATALMTDPKFITWSPVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA
The title line contains the following fields:
> is obligatory in column 1
gi|121664 is the geninfo number, an identifier assigned by the US National Center for Biotechnology Information (NCBI) to every sequence in its ENTREZ databank. The NCBI collects sequences from a variety of sources, including primary archival data collections and patent applications. Its gi numbers provide a common and consistent 'umbrella' identifier, superimposed on different conventions of source databases. When a source database updates an entry, the NCBI creates a new entry with a new gi number if the changes affect the sequence, but updates and retains its entry if the changes affect only non-sequence information, such as a literature citation.
sp|P00435 indicates that the source database was SWISS-PROT, and that the accession number of the entry in SWISS-PROT was P00435.
GSHC_BOVIN GLUTATHIONE PEROXIDASE is the SWISS-PROT identifier of the sequence and species (GSHC_BOVIN), followed by the name of the molecule.
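The format rules above are simple enough to read by program. The following is a minimal Python sketch, not a reference implementation: the function names `parse_fasta` and `split_title` are illustrative, and the title splitter assumes an NCBI-style line in which tags and values alternate between `|` characters, as in the example above. Blank lines are skipped, which is an assumption of this sketch rather than part of the format.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs from FASTA-formatted text."""
    title: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            # Lines may be ragged; lower-case codes are folded to upper case.
            chunks.append(line.upper())
    if title is not None:
        yield title, "".join(chunks)


def split_title(title: str) -> Dict[str, str]:
    """Split a title such as 'gi|121664|sp|P00435|GSHC_BOVIN NAME' into fields."""
    parts = title.split("|")
    # Tags (gi, sp, ...) and their values alternate; the last part is free text.
    fields = dict(zip(parts[0:-1:2], parts[1:-1:2]))
    identifier, _, description = parts[-1].partition(" ")
    fields["id"] = identifier
    fields["description"] = description
    return fields


EXAMPLE = """>gi|121664|sp|P00435|GSHC_BOVIN GLUTATHIONE PEROXIDASE
MCAAQRSAAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASLUGTTVRDYTQMNDLQRRLG
PRGLVVLGFPCNQFGHQENAKNEEILNCLKYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPS
DDATALMTDPKFITWSPVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA
"""

for fasta_title, fasta_sequence in parse_fasta(EXAMPLE):
    info = split_title(fasta_title)
    print(info["gi"], info["sp"], info["id"], info["description"])
    print(len(fasta_sequence), "residues; contains selenocysteine:", "U" in fasta_sequence)
```

A title without any `|` characters, such as `RNP_HORSE`, yields only the `id` and an empty `description`.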
Sequence alignment
Sequence alignment is the assignment of residue-residue correspondences. We may wish to find:
a Global match: align all of one sequence with all of the other.
And.--so,.from.hour.to.hour,.we.ripe.and.ripe
||||    |||||||||||||||||||||||||   ||||||
And.then,.from.hour.to.hour,.we.rot-.and.rot-
This illustrates mismatches, insertions and deletions.
a Local match: find a region in one sequence that matches a region of the other.
  My.care.is.loss.of.care,.by.old.care.done,
    |||||||||    |||||||||||||   |||||| ||
Your.care.is.gain.of.care,.by.new.care.won
For local matching, overhangs at the ends are not treated as gaps. In addition to mismatches, seen in this example, insertions and deletions within the matched region are also possible.
a Motif match: find matches of a short sequence in one or more regions internal to a long one. In this case one mismatching character is allowed. Alternatively one could demand perfect matches, or allow more mismatches or even gaps.
        match
         ||||
for the watch to babble and to talk is most tolerable
or:
                               match
                                ||||
Any thing that's mended is but patched: virtue that transgresses is
    match                                        match
     ||||                                         ||||
but patched with sin; and sin that amends is but patched with virtue
a Multiple alignment: a mutual alignment of many sequences.
no.sooner.---met.---------but.they.-look'd
no.sooner.look'd.---------but.they.-lo-v'd
no.sooner.lo-v'd.---------but.they.-sigh'd
no.sooner.sigh'd.---------but.they.--asked.one.another.the.reason
no.sooner.knew.the.reason.but.they.-------------sought.the.remedy
no.sooner.               .but.they.
The last line shows characters conserved in all sequences in the alignment.
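Of the four kinds of match described in this box, the motif match is the easiest to sketch in code. The Python fragment below is a hedged illustration only: `find_motif` is a made-up name, it slides the short sequence along the long one and counts mismatching characters, and it does not consider gaps. The default of one allowed mismatch mirrors the example above.

```python
from typing import List


def find_motif(text: str, motif: str, max_mismatches: int = 1) -> List[int]:
    """Return 0-based start positions where `motif` matches `text`
    with at most `max_mismatches` mismatching characters (no gaps)."""
    hits: List[int] = []
    width = len(motif)
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


line = "but patched with sin; and sin that amends is but patched with virtue"
print(find_motif(line, "match"))                    # → [4, 49]
print(find_motif(line, "match", max_mismatches=0))  # → []
```

Demanding perfect matches (`max_mismatches=0`) finds nothing here, because "patch" differs from "match" in its first letter.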
Example 1.2
Determine, from the sequences of pancreatic ribonuclease from horse (Equus caballus), minke whale
(Balaenoptera acutorostrata) and red kangaroo (Macropus rufus), which two of these species are most closely related.
Knowing that horse and whale are placental mammals and kangaroo is a marsupial, we expect horse and whale to be the closest pair. Retrieving the three sequences as in the previous example and pasting the following:
>RNP_HORSE
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEP
LADVQAICLQKNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTS
QKERHIIVACEGNPYVPVHFDASVEVST
>RNP_BALAC
RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHES
LEDVKAVCSQKNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTS
QKEKHIIVACEGNPYVPVHFDNSV
>RNP_MACRU
ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPK
SVVDAVCHQENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSN
LNKQIIVACEGQYVPVHFDAYV
into the multiple-sequence alignment program CLUSTAL-W http://www.ebi.ac.uk/clustalw/
(or alternatively, T-coffee: http://www.ch.embnet.org/software/TCoffee.html)
produces the following:
CLUSTAL W (1.8) multiple sequence alignment

RNP_HORSE KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
RNP_BALAC RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
RNP_MACRU -ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ 59
           *:** **:*****:  :......*** **  *.**.* ***:***:**.   *.*:* *

RNP_HORSE KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
RNP_BALAC KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
RNP_MACRU ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
          :*: ****::***:*.* : **:** *..****** *:**: :::*******  ******

RNP_HORSE DASVEVST 128
RNP_BALAC DNSV---- 124
RNP_MACRU DAYV---- 122
          *  *
In this table, an * under the sequences indicates a position that is conserved (the same in all sequences), and : and . indicate positions at which all sequences contain residues of very similar physicochemical character (:), or somewhat similar physicochemical character (.).
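The * marks can be recomputed directly from the aligned rows. The sketch below is an assumption-laden illustration, not the CLUSTAL-W code: `conserved_marks` is a hypothetical name, it reproduces only the * marks, and it treats a column containing a gap as not conserved. The : and . marks depend on groupings of physicochemically similar residues, which are not reproduced here.

```python
from typing import List

GAP = "-"


def conserved_marks(rows: List[str]) -> str:
    """Return a line with '*' under every column in which all rows
    carry the same residue, and a space elsewhere."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width, GAP) for row in rows]  # short rows end in gaps
    marks = []
    for column in zip(*padded):
        first = column[0]
        same = first != GAP and all(residue == first for residue in column)
        marks.append("*" if same else " ")
    return "".join(marks)


# Last block of the ribonuclease alignment shown above.
last_block = ["DASVEVST", "DNSV----", "DAYV----"]
print(repr(conserved_marks(last_block)))  # → '*  *    '
```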
Large patches of the sequences are identical. There are numerous substitutions but only one internal deletion.
By comparing the sequences in pairs, the number of identical residues shared among pairs in this alignment (not the same as counting *s) is:
Number of identical residues in aligned Ribonuclease A sequences (out of a total of 122–128 residues)
Horse and minke whale          95
Minke whale and red kangaroo   82
Horse and red kangaroo         75
Horse and whale share the most identical residues. The result appears significant, and therefore confirms our expectations.
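The pairwise counts can be checked from the aligned rows of the CLUSTAL-W output. The following Python sketch assumes that a column counts as identical only when both sequences carry the same residue and neither has a gap; `count_identities` is an illustrative name, and the strings are the three rows of the alignment above, block by block.

```python
from itertools import combinations

# Aligned rows copied from the CLUSTAL-W output, one string per block.
ALIGNED = {
    "horse": (
        "KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ"
        "KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF"
        "DASVEVST"
    ),
    "whale": (
        "RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ"
        "KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF"
        "DNSV----"
    ),
    "kangaroo": (
        "-ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ"
        "ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF"
        "DAYV----"
    ),
}


def count_identities(a: str, b: str, gap: str = "-") -> int:
    """Count aligned columns in which both rows hold the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y and x != gap)


for (name1, row1), (name2, row2) in combinations(ALIGNED.items(), 2):
    print(name1, name2, count_identities(row1, row2))
# horse whale 95
# horse kangaroo 75
# whale kangaroo 82
```

Columns in which both rows have a gap are deliberately not counted; otherwise the trailing gaps shared by whale and kangaroo would inflate their score.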
Warning: Or is the logic really the other way round?
Let's try a harder one:
Example 1.3
The two living genera of elephant are represented by the African elephant (Loxodonta africana) and the Indian elephant (Elephas maximus). It has been possible to sequence the mitochondrial cytochrome b from a specimen of the Siberian woolly mammoth (Mammuthus primigenius) preserved in the Arctic permafrost. To which modern elephant is this mammoth more closely related?
Retrieving the sequences and running CLUSTAL-W:
African elephant MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
Siberian mammoth MTHIRKSHPLLKILNKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
Indian elephant  MTHTRKSHPLFKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTM 60
                 *** ******:**:**********************************************

African elephant TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
Siberian mammoth TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
Indian elephant  TAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLL 120
                 ************************************************************

African elephant LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPCIGTNLVEWIWGGFSVDKATLNRFFA 180
Siberian mammoth LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFA 180
Indian elephant  LITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA 180
                 ********************************** ***:*********************

African elephant LGLMPLLHTSKHRSMMLRPLSQVLFWTLTMDLLTLTWIGSQPVEYPYIIIGQMASILYFS 360
Siberian mammoth LGIMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEYPYIIIGQMASILYFS 360
Indian elephant  LGLMPFLHTSKHRSMMLRPLSQVLFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFS 360
                 **:**:**********************: *** ************* ************

African elephant IILAFLPIAGMIENYLIK 378
Siberian mammoth IILAFLPIAGMIENYLIK 378
Indian elephant  IILAFLPIAGMIENYLIK 378
                 **********:*******
The mammoth and African elephant sequences have 10 mismatches, and the mammoth and Indian elephant sequences have 14 mismatches. It appears that mammoth is more closely related to African elephants.
However, this result is less satisfying than the previous one. There are fewer differences. Are they significant? (It is harder to decide whether the differences are significant because we have no preconceived idea of what the answer should be.)
This example raises a number of questions:
1. We 'know' that African and Indian elephants and mammoths must be close relatives - just look at them. But could we tell from these sequences alone that they are from closely related species?
2. Given that the differences are small, do they represent evolutionary divergence arising from selection, or merely random noise or drift? We need sensitive statistical criteria for judging the significance of the similarities and differences.
As background to such questions, let us emphasize the distinction between similarity and homology. Similarity is the observation or measurement of resemblance and difference, independent of the source of the resemblance. Homology means, specifically, that the sequences and the organisms in which they occur are descended from a common ancestor, with the implication that the similarities are shared ancestral characteristics. Similarity of sequences (or of macroscopic biological characters) is observable in data collectable now, and involves no historical hypotheses. In contrast, assertions of homology are statements of historical events that are almost always unobservable. Homology must be an inference from observations of similarity. Only in a few special cases is homology directly observable; for instance in family pedigrees showing unusual phenotypes such as the Hapsburg lip, or in laboratory populations, or in clinical studies that follow the course of viral infections at the sequence level in individual patients.
The assertion that the cytochromes b from African and Indian elephants and mammoths are homologous means that there was a common ancestor, presumably containing a unique cytochrome b, that by alternative mutations gave rise to the proteins of mammoths and modern elephants. Does the very high degree of similarity of the sequences justify the conclusion that they are homologous; or are there other explanations?
It might be that a functional cytochrome b requires so many conserved residues that cytochromes b from all animals are as similar to one another as the elephant and mammoth proteins are. We can test this by looking at cytochrome b sequences from other species. The result is that cytochromes b from other animals differ substantially from those of elephants and mammoths.
A second possibility is that there are special requirements for a cytochrome b to function well in an elephant-like animal, that the three cytochrome b sequences started out from independent ancestors, and that common selective pressures forced them to become similar. (Remember that we are asking what can be deduced from cytochrome b sequences alone.)
The mammoth may be more closely related to the African elephant, but since the time of the last common ancestor the cytochrome b sequence of the Indian elephant has evolved faster than that of the African elephant or the mammoth, accumulating more mutations.
Still a fourth hypothesis is that all common ancestors of elephants and mammoths had very dissimilar cytochromes b, but that living elephants and mammoths gained a common gene by transfer from an unrelated organism via a virus.
Suppose, however, that we take the similarity of the elephant and mammoth sequences to be high enough to be evidence of homology. What then about the ribonuclease sequences in the previous example? Are the larger differences among the pancreatic ribonucleases of horse, whale and kangaroo evidence that they are not homologues?
How can we answer these questions? Specialists have undertaken careful calibrations of sequence similarities and divergences, among many proteins from many species for which the taxonomic relationships have been worked out by classical methods. In the example of pancreatic ribonucleases, the reasoning from similarity to homology is justified. The question of whether mammoths are closer to African or Indian elephants is still too close to call, even using all available anatomical and sequence evidence. Analyses of sequence similarities are now sufficiently well established that they are considered the most reliable methods for establishing phylogenetic relationships, even though sometimes - as in the elephant example - the results may not be significant, while in other cases they even give incorrect answers. There are a lot of data available, effective tools for retrieving what is necessary to bring to bear on a specific question, and powerful analytic tools. None of this replaces the need for thoughtful scientific judgement.
Kamat's Online teaching resources
Saturday, July 17, 2010
M.Sc. Part I >BOO-102 Bioinformatics and Chemoinformatics syllabus
By Dr. Nandkumar M. Kamat
Bioinformatics and Chemoinformatics :-Theory and practice
(Two modules, 2 Theory + 2 Practical credits, Total 30+30 =60 hours)
Objectives: Module I Bioinformatics
1. Understand the nature of biological data and the need for biological databases.
2. Explore the major biomolecular sequence databases (organization and contents) and their respective search engines.
3. Appreciate the need for and significance of sequence analysis, and the bioinformatics approaches to it.
4. Apply software analysis tools to sequence data.
5. Understand the levels of structural organization of macromolecules and the related methods of structure determination.
6. Know the various methods of structure prediction.
7. Understand the interactions between macromolecules.
8. Know the approaches to structure analysis.
9. Gain a basic understanding of molecular modeling and its applications.
Objectives: Module II Chemoinformatics
1. Introduce major aspects of chemoinformatics, with particular emphasis on applications in modern drug discovery.
2. Provide hands-on experience in chemical enumeration, creation of databases and analysis of chemicals.
3. Develop tools that aid in the design of drugs
Theory Two modules, (15 X 2=30 hours) Two credits
Syllabus:
Module I
Bioinformatics Theory (15 hours) , One credit
1. Introduction to Bioinformatics, Nature of biological data, Overview of available Bioinformatics resources on the web, NCBI/EBI/EXPASY etc. (3)
2. Biological Databases: Nucleic acid sequence databases, GenBank/EMBL/DDBJ; Protein sequence databases, SwissProt, UniProtKB; Genome databases, OMIM; structural databases, PDB, NDB, CCSD; derived databases, Prosite, BLOCKS, Pfam/Prodom; Database search engines, Entrez, SRS (3)
3. Overview/concepts in sequence analysis, Pairwise sequence alignment algorithms, Scoring matrices for nucleic acids and proteins, Database similarity searches: BLAST, FASTA; Multiple sequence alignment: PRAS, CLUSTALW (3)
4. DNA and Protein Microarrays (1)
5. Macromolecular Structure and Overview of molecular modeling Protein - Primary, Secondary, Supersecondary, Tertiary and Quaternary structure, Nucleic acid – DNA and RNA, Carbohydrates, 3D Viral structures, Methods to study 3D structure, Analysis of 3D structures (2)
6. Principles of protein folding and methods to study protein folding (1)
7. Macromolecular interactions, Protein – Protein, Protein – Nucleic acids, Protein – carbohydrates (1)
8. Introduction to Molecular modelling methods (1)
Module II
Chemoinformatics:-Theory 15 hours, one credit
1. Role of Chemoinformatics in pharmaceutical/chemical research, Integrated databases, HTS analysis, Ligand based design of compounds, Structure based design of compounds (2)
2. Overview of Structure representation systems, 2D and 3D structures, General introduction to chemical structure-hybridization, tetrahedron geometry etc, The degeneracy of isomeric SMILES and introduction to unique SMILES, Internal co-ordinates and introduction to calculation of Z matrix of simple small organic molecules. (2)
3. Chemical Databases – Design, Storage and Retrieval methods (1)
4. Introduction to database filters, property based & (drug-like)-Lipinski Rule of Five (1)
5. Search techniques, similarity searches and clustering (1)
6. Modeling of small molecules and methods for interaction mapping (1)
7. Characterization of chemicals by Class & by Pharmacophore, application in HTS Analysis (1)
8. Introduction to pharmacophore, Identification of pharmacophore features, Building pharmacophore hypothesis, Searching databases using pharmacophores (2)
9. Overview of Quantitative Structure Activity Relationship & application to Hit to lead optimization (1)
10. Chemoinformatics tools for drug discovery: Integration of active drugs, Optimization techniques, Filtering chemicals, In silico ADMET; QSAR approach, Knowledge-based approach (3)
Practicals (15 X 2 =30 hours) Total Two modules, Two credits
Bioinformatics Practicals
Module I:- 15 hours, One credit
Syllabus:
1. Exploring NCBI database system, querying the PUBMED and GenBank databases , EBI server and searching the EMBL Nucleotide database, Exploring & querying SWISSPROT & UniProtKB (2)
2. Pair-wise global alignments of protein and DNA sequences using Needleman-Wunsch algorithm & interpretation of results to deduce homology between the sequences, use of scoring matrices, Pair-wise local alignments of protein and DNA sequences using Smith-Waterman algorithm and interpretation of results (2)
3. Database (homology) searches using different versions of BLAST and FASTA and interpretation of the results to derive the biologically significant relationships of the query sequences (proteins/DNA) with the database sequences (3)
4. Multiple sequence alignments of sets of sequences using web-based and stand-alone version of CLUSTAL. Interpretation of results to identify conserved and variable regions and correlate them with physico-chemical & structural properties (2)
5. Exploring and using the derived databases: PROSITE, PRINTS, BLOCKS, Pfam and Prodom for pattern searching, domain searches etc. (1)
6. Search & retrieval: genomic and OMIM data at NCBI server, Exploring the Database & searches on PDB and CSD, WHATIF, Interpreting DNA and Protein microarray data (1)
7. Studying the format & content of structural databases, Molecular visualization tools: Visualization of tertiary structures, quaternary structures, architectures and topologies of proteins and DNA using molecular visualization software such as RasMol, Cn3D, SPDBV, Chime, Mol4D, etc. (2)
8. Structure prediction tools and homology modeling, Comparison of the performance of the different methods for various classes of proteins, Prediction of tertiary structures of proteins using Homology Modeling approach: SWISSMODEL, SWISS-PDB Viewer (2)
Chemoinformatics Practicals
Module II:- 15 hours, One credit
Syllabus:
1. Introduction to basic chemoinformatics software/tools: ACDsketch, Chemsketch, JChem, VegaZZ etc., NCL’s moltable, Chembiofinder (3)
2. Importance of storing chemicals in the form of graphs, linear notation (SMILES, WLN, ROSDAL, with special emphasis on SMILES and stereochemistry, both optical and geometrical isomerism), connection tables: sd and mol files. (2)
3. Importance of 3D structure and methods available for 3D structure generation- CORINA and CONCORD (2)
4. A brief introduction to database (ISIS Base) with special emphasis on the storage of chemical in the database format. (2)
5. Substructure searching and general property calculation-rotatable bonds, hydrogen bond donor, hydrogen bond acceptor, molecular weight, molecular refractivity, molecular volume, surface area and polar surface area. (3)
6. Representing SMARTS, Recursive and Component level SMARTS and linear representation of chemical concepts like pKa, pH, Zwitterions, Functional Groups, Aromaticity (2)
7. Molecular docking-Drug docking basics (1)
References:
Module I
Bioinformatics
1. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (2nd Ed.) by Baxevanis, A.D. & Ouellette, B.F.F., New York, John Wiley & Sons, Inc. Publications, 2002.
2. Introduction to Bioinformatics by Attwood, T.K. & Parry-Smith, D.J., Delhi, Pearson Education (Singapore) Pte.Ltd., 2001.
3. Bioinformatics: Sequence and Genome Analysis by Mount, David, New York, Cold Spring Harbor Laboratory Press, 2004.
4. Current Protocols in Bioinformatics by Baxevanis, A.D., Davison, D.B., Page, R. D. M. & Petsko, G.A., New York, John Wiley & Sons Inc., 2004.
5. Structural Bioinformatics - Methods of biochemical Analysis V. 44 by Philip E. Bourne (Editor), Helge Weissig (Editor) New Jersey. Wiley-Liss, 2003.
6. Principles of protein X-ray Crystallography by Jan Drenth, Springer-Verlag, 1994.
7. Introduction to Protein Structure by Branden, Carl & Tooze, John, Garland Publishing, 1991.
8. Molecular Modeling: Principles and Applications by Andrew Leach, Prentice Hall, 2001.
9. Computational Methods for Protein Folding: Advances in Chemical Physics, Vol. 120 by Friesner, R.A. (Ed.), Prigogine, I. (Ed.) & Rice, S.A. (Ed.), New York, John Wiley & Sons, Inc., 2002.
10. Dynamics of Proteins and Nucleic Acids by J.A. McCammon and S.C. Harvey, Cambridge University Press, 1987.
11. Protein Structure: A Practical approach by Creighton T. E., 1989.
12. Protein Folding by Creighton T., 1992.
13. Protein Structure Prediction: A practical approach by Sternberg M.J.E., 1996.
14. Molecular Modeling: Basic Principles and Applications by Hans-Dieter Höltje, Wolfgang Sippl, Didier Rognan and Gerd Folkers, Wiley-VCH GmbH & Co. KGaA, 2003.
15. Prediction of protein structure and the principles of protein conformation by Fasman, G.D. New York. Plenum Press, 1989.
16. Protein modules in cellular signaling edited by Heilmeyer, L. & Friedrich, P. Amsterdam . IOS Press, 2001.
17. Metal sites in proteins and models by Hill, H.A.O., Sadler, P.J. & Thomson, A.J Berlin. Springer, 1999.
18. Protein structure prediction: methods and protocols by Webster, D. M., Ed. Totowa Humana Press, 2000.
19. Modular Protein Domains by Cesareni, G., Gimona, M., Sudol, M. & Yaffe, M. (Eds.), Wiley-VCH Verlag GmbH & Co., ISBN 3-527-30813-X, Aug. 2004.
20. Molecular modeling: basic principles and applications by Holtje, H.D. & Folkers, G., Weinheim, VCH, 1997.
21. Molecular Modeling: Basic Principles and Applications by Hans-Dieter Höltje, Wolfgang Sippl, Didier Rognan and Gerd Folkers, Wiley-VCH GmbH & Co. KGaA, 2003.
22. Arthur M. Lesk ( 2003) Introduction to Bioinformatics, Oxford University Press, Indian edition
23. Des Higgins and Willie Taylor (2000). Bioinformatics, Sequence, structure and databanks. A practical approach. Oxford University Press, Indian edition, Second impression, New Delhi
24. Imtiaz Alam Khan (2005). Elementary bioinformatics. Pharma Book Syndicate, Hyderabad
25. Irfan Ali Khan and Attiya Khanum (eds.) ( 2002). Emerging trends in Bioinformatics. Ukaaz Publications, Hyderabad
26. Irfan Ali Khan and Attiya Khanum (eds.) (2005). Basic concepts of Bioinformatics, Ukaaz Publications, Hyderabad
27. Irfan Ali Khan and Attiya Khanum (eds.) (2004). Introductory Bioinformatics. Ukaaz Publications, Hyderabad
28. Krane Dan, E. and Raymer M.L. (2004). Fundamental concepts of Bioinformatics. Pearson education. New Delhi. Second Indian reprint.
29. Rastogi, S.C., Mendiratta, N., Rastogi, P. (2004), Bioinformatics, methods and applications, genomics, proteomics and drug discovery, Prentice Hall of India Pvt. Ltd., New Delhi
30. Stephen Misener and Stephen Krawetz (eds.) (2004) Bioinformatics, methods and protocols, methods in molecular biology, Volume 132, Humana Press, New Jersey, Third Indian reprint
31. T K Attwood and D J Parry-Smith (2004) Introduction to Bioinformatics, Pearson Education, New Delhi
32. Xiong, Jin (2006) Essential bioinformatics, Cambridge university press
References
Module II
Chemoinformatics
1. Chemoinformatics: Theory, Practice & Products (2009) Barry A. Bunin , Brian Siesel, Guillermo Morales Jürgen Bajorath, Springer
2. Pharmaceutical Data Mining: Approaches and Applications for Drug Discovery (2009) Konstantin V. Balakin Sean Ekins (Series Editor) , Wiley
3. Chemoinformatics: An Approach to Virtual Screening (2008), Alexandre Varnek Alexander Tropsha (Editor) , Royal Society of Chemistry
4. Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery (Methods in Molecular Biology), (2004) J. Bajorath (ed.), Humana Press
5. Chemoinformatics (2004) Johann Gasteiger and Thomas Engel.
6. An introduction to Chemoinformatics (2003) Andrew R. Leach and Valerie J. Gillet, Kluwer Academic Publisher,
7. Handbook of Chemoinformatics: From Data to Knowledge (2003), Johann Gasteiger.
8. Chemometrics and Chemoinformatics (2005) Barry K. Lavine, ACS Symposium series 894.
9. Molecular modelling and prediction of bioactivity (2000) by Gundertofte, K. & Jorgensen, F.S. New York. Kluwer academic publishers.
Addresses of public domain database/tools/resources/ free ware websites
1. DBGET-http://www.genome.jp/dbget/
2. LinkDB-http://www.genome.jp/dbget/linkdb.html
3. Fgenes-http://www.softberry.com/berry.phtml?topic=products
4. GeneBuilder-http://www.itb.cnr.it/sun/webgene/
5. GENSCAN-http://genes.mit.edu/GENSCAN.html
6. GRAIL-http://compbio.ornl.gov/Grail-1.3/
7. CLC Free Workbench http://www.clcbio.com/index.php?id=28
8. BioEditor-http://bioeditor.sdsc.edu/
9. CN3D 4.1 -http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
10. Protein-Explorer-http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm
11. Chimera-http://www.cgl.ucsf.edu/chimera/
12. Yasara-http://www.yasara.com
13. Ribosome builder-http://rbuilder.sourceforge.net/
14. ArrayExpress-www.ebi.ac.uk/arrayexpress/
15. EPICLUST-http://ep.ebi.ac.uk/EP/
16. RasMOL-http://www.umass.edu/microbio/rasmol/
17. CHIME-www.mdl.com/chime/
Bioinformatics and Chemoinformatics :-Theory and practice
(Two modules, 2 Theory + 2 Practical credits, Total 30+30 =60 hours)
Objectives: Module I Bioinformatics
Understand the nature of biological data and need for Biological databases , explore the major biomolecular sequence databases (organization and contents) and their respective search engines and database appreciate the need and significance of sequence analysis and the bioinformatics approaches for the same, application of software analysis tools to sequence data, Understanding the levels of structure organization of macromolecules and related methods of structure determination, Knowledge of various methods of structure prediction, Understanding the interaction between macromolecules, To know the approaches for structure analysis, To get basic understanding of molecular modeling and its application, Understanding the levels of structure organization of macromolecules and related methods of structure determination, Understanding the interaction between macromolecules.
Objectives: Module II Chemoinformatics
1. Introduce major aspects of chemoinformatics, with particular emphasis on applications in modern drug discovery.
2. Provide hands-on experience in chemical enumeration, creation of databases and analysis of chemicals.
3. Develop tools that aid in the design the drugs
Theory Two modules, (15 X 2=30 hours) Two credits
Syllabus:
Module I
Bioinformatics Theory (15 hours) , One credit
1. Introduction to Bioinformatics , Nature of biological data, Overview of available Bioinformatics resources on the web, NCBI/EBI/EXPASY etc (3)
2. Biological Databases: Nucleic acid sequence databases, GenBank/EMBL/DDBJ
Protein sequence databases, SwissProt, UniProtKB, Genome databases-OMIM,
structural databases, PDB, NDB, CCSD, drived databases Prosite, BLOCKS, Pfam/Prodom, Database search engines, Entrez , SRS (3)
3. Overview/concepts in sequence analysis, Pairwise sequence alignment algorithms, Scoring matrices for Nucleic acids and proteins ,Database Similarity Searches –BLAST, FASTA
Multiple sequence alignment, PRAS, CLUSTALW (3)
4. DNA and Protein Microarrays (1)
5. Macromolecular Structure and Overview of molecular modeling Protein - Primary, Secondary, Supersecondary, Tertiary and Quaternary structure, Nucleic acid – DNA and RNA, Carbohydrates, 3D Viral structures, Methods to study 3D structure, Analysis of 3D structures (2)
6. Principles of protein folding and methods to study protein folding (1)
7. Maromolecular interactions , Protein – Protein, Protein – Nucleic acids , Protein – carbohydrates (1)
8.Introduction to Molecular modelling methods (1)
Module II
Chemoinformatics:-Theory 15 hours, one credit
1. Role of Chemoinformatics in pharmaceutical/chemical research, Integrated databases, HTS analysis, Ligand based design of compounds, Structure based design of compounds (2)
2. Overview of Structure representation systems, 2D and 3D structures, General introduction to chemical structure-hybridization, tetrahedron geometry etc, The degeneracy of isomeric SMILES and introduction to unique SMILES, Internal co-ordinates and introduction to calculation of Z matrix of simple small organic molecules. (2)
3. Chemical Databases – Design, Storage and Retrieval methods (1)
4. Introduction to database filters, property based & (drug-like)-Lipinski Rule of Five (1)
5. Search techniques, similarity searches and clustering (1)
6. Modeling of small molecules and methods for interaction mapping (1)
7. Characterization of chemicals by Class & by Pharmacophore, application in
HTS Analysis (1)
8. Introduction to pharmocophore, Identification of pharmacophore features, Building pharmacophore hypothesis, Searching databases using pharmacophores (2)
9. Overview of Quantitative Structure Activity Relationship & application to Hit to lead optimization (1)
10. Chemoinformatics tools for drug discovery-Integration of active drugs ,Optimization techniques , Filtering chemicals, In silico ADMET; QSAR approach, Knowledge-based approach (3)
Practicals (15 X 2 =30 hours) Total Two modules, Two credits
Bioinformatics Practicals
Module I:- 15 hours, One credit
Syllabus:
1. Exploring NCBI database system, querying the PUBMED and GenBank databases , EBI server and searching the EMBL Nucleotide database, Exploring & querying SWISSPROT & UniProtKB (2)
2. Pair-wise global alignments of protein and DNA sequences using Needleman-Wunsch algorithm & interpretation of results to deduce homology between the sequences, use of scoring matrices, Pair-wise local alignments of protein and DNA sequences using Smith-Waterman algorithm and interpretation of results (2)
3. Database (homology) searches using different versions of BLAST and FASTA and interpretation of the results to derive the biologically significant relationships of the query sequences (proteins/DNA) with the database sequences (3)
4. Multiple sequence alignments of sets of sequences using web-based and stand-alone version of CLUSTAL. Interpretation of results to identify conserved and variable regions and correlate them with physico-chemical & structural properties (2)
5. Exploring and using the derived databases: PROSITE, PRINTS, BLOCKS, Pfam and Prodom for pattern searching, domain searches etc. (1)
6. Search & retrieval: genomic and OMIM data at NCBI server, Exploring the Database & searches on PDB and CSD, WHATIF, Interpreting DNA and Protein microarray data (1)
7. Studying the format & content of structural databases, Molecular visualization tools: visualization of tertiary structures, quaternary structures, architectures and topologies of proteins and DNA using molecular visualization software such as RasMol, Cn3D, SPDBV, Chime, Mol4D, etc. (2)
8. Structure prediction tools and homology modeling, Comparison of the performance of the different methods for various classes of proteins, Prediction of tertiary structures of proteins using Homology Modeling approach: SWISSMODEL, SWISS-PDB Viewer (2)
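To make practical 2 concrete, here is a minimal sketch of Needleman-Wunsch global alignment. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) in place of a substitution matrix such as PAM or BLOSUM. The function name and default scores are illustrative choices, not taken from any particular tool used in the course.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def pair_score(x, y):
        return match if x == y else mismatch

    # Fill the matrix row by row.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap       # gap inserted in b
            left = score[i][j - 1] + gap     # gap inserted in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


best, top, bottom = needleman_wunsch("ACGT", "ACT")
print(best)
print(top)
print(bottom)
```

Smith-Waterman local alignment differs mainly in that cell scores are never allowed to fall below zero and the traceback starts from the highest-scoring cell instead of the corner.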
Chemoinformatics Practicals
Module II:- 15 hours, One credit
Syllabus:
1. Introduction to basic chemoinformatics software/tools: ACDsketch, Chemsketch, Jchem, VegaZZ, NCL’s moltable, Chembiofinder, etc. (3)
2. Importance of storing chemical structures in the form of graphs, linear notations (SMILES, WLN, ROSDAL, with special emphasis on SMILES and stereochemistry, both optical and geometrical isomerism), connection tables (SD and MOL files) (2)
3. Importance of 3D structure and methods available for 3D structure generation- CORINA and CONCORD (2)
4. A brief introduction to databases (ISIS Base) with special emphasis on the storage of chemical structures in the database format (2)
5. Substructure searching and general property calculation-rotatable bonds, hydrogen bond donor, hydrogen bond acceptor, molecular weight, molecular refractivity, molecular volume, surface area and polar surface area. (3)
6. Representing SMARTS, Recursive and Component level SMARTS and linear representation of chemical concepts like pKa, pH, Zwitterions, Functional Groups, Aromaticity (2)
7. Molecular docking-Drug docking basics (1)
References:
Module I
Bioinformatics
1. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (2nd Ed.) by Baxevanis, A.D. & Ouellette, B.F.F., New York, John Wiley & Sons, Inc. Publications, 2002.
2. Introduction to Bioinformatics by Attwood, T.K. & Parry-Smith, D.J., Delhi, Pearson Education (Singapore) Pte. Ltd., 2001.
3. Bioinformatics: Sequence and Genome Analysis by Mount, David, New York, Cold Spring Harbor Laboratory Press, 2004.
4. Current Protocols in Bioinformatics by Baxevanis, A.D., Davison, D.B., Page, R. D. M. & Petsko, G.A., New York, John Wiley & Sons Inc., 2004.
5. Structural Bioinformatics - Methods of biochemical Analysis V. 44 by Philip E. Bourne (Editor), Helge Weissig (Editor) New Jersey. Wiley-Liss, 2003.
6. Principles of protein X-ray Crystallography by Jan Drenth, Springer-Verlag, 1994.
7. Introduction to Protein Structure by Branden, Carl & Tooze, John, Garland Publishing, 1991.
8. Molecular Modeling: Principles and Applications by Andrew Leach, Prentice Hall, 2001.
9. Computational Methods for Protein Folding: Advances in Chemical Physics, Vol. 120, by Friesner, R.A. (Ed.), Prigogine, I. (Ed.) & Rice, S.A., New York, John Wiley & Sons, Inc. Publication, 2002.
10. Dynamics of Proteins and Nucleic Acids by J.A. McCammon and S.C. Harvey, Cambridge University Press, 1987.
11. Protein Structure: A Practical approach by Creighton T. E., 1989.
12. Protein Folding by Creighton T., 1992.
13. Protein Structure Prediction: A practical approach by Sternberg M.J.E., 1996.
14. Molecular Modeling: Basic Principles and Applications by Hans-Dieter Höltje and Didier Rognan, Wiley-VCH GmbH & Co. KGaA, 2003.
15. Prediction of protein structure and the principles of protein conformation by Fasman, G.D. New York. Plenum Press, 1989.
16. Protein modules in cellular signaling edited by Heilmeyer, L. & Friedrich, P., Amsterdam, IOS Press, 2001.
17. Metal sites in proteins and models by Hill, H.A.O., Sadler, P.J. & Thomson, A.J., Berlin, Springer, 1999.
18. Protein structure prediction: methods and protocols by Webster, D.M., Ed., Totowa, Humana Press, 2000.
19. Modular protein domains by Cesareni, G., Gimona, M., Sudol, M. & Yaffe, M. (Eds.), USA, Wiley-VCH Verlag GmbH & Co., ISBN 3-527-30813-X, Aug. 2004.
20. Molecular modeling: basic principles and applications by Holtje, H.D. & Folkers, G., Weinheim, VCH, 1997.
21. Molecular Modeling: Basic Principles and Applications by Hans-Dieter Höltje & Didier Rognan, Wiley-VCH GmbH & Co. KGaA, 2003.
22. Arthur M. Lesk (2003) Introduction to Bioinformatics, Oxford University Press, Indian edition
23. Des Higgins and Willie Taylor (2000). Bioinformatics, Sequence, structure and databanks. A practical approach. Oxford University Press, Indian edition, Second impression, New Delhi
24. Imtiaz Alam Khan (2005). Elementary bioinformatics. Pharma Book Syndicate, Hyderabad
25. Irfan Ali Khan and Attiya Khanum (eds.) (2002). Emerging trends in Bioinformatics. Ukaaz Publications, Hyderabad
26. Irfan Ali Khan and Attiya Khanum (eds.) (2005). Basic concepts of Bioinformatics, Ukaaz Publications, Hyderabad
27. Irfan Ali Khan and Attiya Khanum (eds.) (2004). Introductory Bioinformatics. Ukaaz Publications, Hyderabad
28. Krane Dan, E. and Raymer M.L. (2004). Fundamental concepts of Bioinformatics. Pearson education. New Delhi. Second Indian reprint.
29. Rastogi, S.C., Mendiratta, N., Rastogi, P. (2004), Bioinformatics, methods and applications, genomics, proteomics and drug discovery, Prentice-Hall of India Pvt. Ltd., New Delhi
30. Stephen Misener and Stephen Krawetz (eds.) (2004) Bioinformatics, methods and protocols, methods in molecular biology, Volume 132, Humana Press, New Jersey, Third Indian reprint
31. T K Attwood and D J Parry-Smith (2004) Introduction to Bioinformatics, Pearson Education, New Delhi
32. Xiong, Jin (2006) Essential bioinformatics, Cambridge University Press
References
Module II
Chemoinformatics
1. Chemoinformatics: Theory, Practice & Products (2009) Barry A. Bunin, Brian Siesel, Guillermo Morales, Jürgen Bajorath, Springer
2. Pharmaceutical Data Mining: Approaches and Applications for Drug Discovery (2009) Konstantin V. Balakin, Sean Ekins (Series Editor), Wiley
3. Chemoinformatics: An Approach to Virtual Screening (2008), Alexandre Varnek, Alexander Tropsha (Editors), Royal Society of Chemistry
4. Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery (Methods in Molecular Biology), (2004) J. Bajorath (ed.), Humana Press
5. Chemoinformatics (2004) Johann Gasteiger and Thomas Engel.
6. An Introduction to Chemoinformatics (2003) Andrew R. Leach and Valerie J. Gillet, Kluwer Academic Publishers.
7. Handbook of Chemoinformatics: From Data to Knowledge (2003). Johann Gasteiger.
8. Chemometrics and Chemoinformatics (2005) Barry K. Lavine, ACS Symposium series 894.
9. Molecular modelling and prediction of bioactivity (2000) by Gundertofte, K. & Jorgensen, F.S. New York. Kluwer academic publishers.
Addresses of public domain databases/tools/resources/freeware websites
1. DBGET-http://www.genome.jp/dbget/
2. LinkDB-http://www.genome.jp/dbget/linkdb.html
3. Fgenes-http://www.softberry.com/berry.phtml?topic=products
4. GeneBuilder-http://www.itb.cnr.it/sun/webgene/
5. GeneSCAN-http://genes.mit.edu/GENSCAN.html
6. GRAIL-http://compbio.ornl.gov/Grail-1.3/
7. CLC Free Workbench-http://www.clcbio.com/index.php?id=28
8. BioEditor-http://bioeditor.sdsc.edu/
9. CN3D 4.1 -http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
10. Protein-Explorer-http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm
11. Chimera-http://www.cgl.ucsf.edu/chimera/
12. Yasara-http://www.yasara.com
13. Ribosome builder-http://rbuilder.sourceforge.net/
14. ArrayExpress-www.ebi.ac.uk/arrayexpress/
15. EPICLUST-http://ep.ebi.ac.uk/EP/
16. RasMOL-http://www.umass.edu/microbio/rasmol/
17. CHIME-www.mdl.com/chime/
Electronic journals subscribed to by the University Grants Commission (India) under the INFLIBNET facility, accessible at Goa University