Data Mining and Text Mining for Bioinformatics: Proceedings of the European Workshop
Held in Conjunction with
ECML/PKDD-2003
in Dubrovnik, Croatia
22 September 2003
In recent years, research in molecular biology and molecular medicine
has accumulated enormous amounts of data. This includes genomic sequences
gathered by the Human Genome Project, gene expression data from microarray
experiments, protein identification and quantification data from proteomics
experiments, and SNP data from high-throughput SNP arrays. However, our
understanding of the biological processes underlying these data lags far
behind. There is a strong interest in employing methods of knowledge discovery
and data mining to generate models of biological systems. Mining biological
databases poses challenges that knowledge discovery and data mining must
address, and these form the focus of the European Workshop on Data Mining
and Text Mining for Bioinformatics.
This volume contains the papers presented at the European Workshop on
Data Mining and Text Mining for Bioinformatics, held at the European Conference
on Machine Learning and the European Conference on Principles and Practice
of Knowledge Discovery in Databases, in Dubrovnik, Croatia, on September
22, 2003. Three invited and ten contributed papers were presented at the workshop;
invited presentations were given by
- Luc Dehaspe, PharmaDM: “Great Expectations: A To-Do List for the Biologist’s
in Silico Research Assistant”,
- Udo Hahn, Freiburg University: “Challenging Natural Language Processors
– Prospects for Bioinformatics in the Natural Language Engineering Age”,
- Steven J. Barrett, GlaxoSmithKline Research and Development: “Recurring
Analytical Problems within Drug Discovery and Development”.
We would like to thank the members of the program committee, listed below.
Additional reviews were written by Jörg Hakenberg. Based on these
reviews, we selected nine research papers and one research note for presentation
at the workshop.
- Sourav Bhowmick, Nanyang Technological University, Singapore.
- Christian Blaschke, Centro Nacional de Biotecnologia.
- Vladimir Brusic, Biodiscovery Group, Institute for Infocomm Research,
- Mark Craven, University of Wisconsin.
- Saso Dzeroski, Jozef Stefan Institute.
- George Forman, Hewlett Packard.
- Jiawei Han, University of Illinois at Urbana Champaign.
- Ross King, University of Wales, Aberystwyth, and PharmaDM.
- Adam Kowalczyk, Telstra.
- Stefan Kramer, Technische Universität München.
- Knut Reinert, Free University, Berlin.
- Steffen Schulze-Kremer, Max-Planck-Institute, Berlin.
- Myra Spiliopoulou, University of Magdeburg.
- Alfonso Valencia, Centro Nacional de Biotecnologia, Spain.
- David Vogel, AI Insight.
- Stefan Wrobel, Fraunhofer AiS and University of Bonn.
- Mohammed Zaki, Rensselaer Polytechnic Institute.
We gratefully acknowledge support from the European Network of Excellence
in Knowledge Discovery, KD-Net. We particularly wish to thank Codrina Lauth
for her support.
Berlin, 14 July 2003
Humboldt University, Berlin
Table of Contents
Francisco Couto, Mario Silva, and Pedro Coutinho: “Improving Information
Extraction through Biological Correlation”
Thanh-Nghi Do and François Poulet: “Incremental SVM and Visualization
Tools for Biomedical Data Mining”
Lukas Faulstich, Peter Stadler, Caroline Thurner, and Christina Witwer:
“litsift: Automated Text Categorization in Bibliographic Search”
Adam Kowalczyk and Bhavani Raskutti: “Fringe SVM Settings and Aggressive
Simon Lin, Sandip Patel, Andrew Duncan, and Linda Goodwin: “Using Decision
Trees and Support Vector Machines to Classify Genes by Names”
Michael Schroeder and Cecilia Eyre: “Visualization and Analysis of Bibliographic
Networks in the Biomedical Literature: A Case Study”
Alexander Seewald: “Towards Recognizing Domain and Species from MEDLINE
Alexander Seewald: “Evaluating Protein Name Recognition: An Automatic Approach”
Špela Vintar, Ljupčo Todorovski, Daniel Sonntag, and Paul Buitelaar: “Evaluating
Context Features for Medical Relation Mining”
Larry Yu, Fu-lai Chung, and Stephen Chan: “Emerging Pattern Based Projected
Clustering for Gene Expression Data”