Machine Learning Research Group
Research
We work on machine learning and its applications in information retrieval and language processing. Our current work focuses on the following research topics.
- Transfer learning, learning when training and test distributions differ.
- Adversarial learning.
- Exploratory text analysis and visualization.
- Discriminative, structured prediction.
Current Funded Projects
Prediction Games
Funding: German Science Foundation DFG
Duration: 2010-2013
Most results in machine learning rely on the assumption that training data reflect the future behavior of the system under investigation. This assumption over-simplifies reality when an active adversary can exercise some control over the future behavior of the system. This is the case, for instance, with the identification of phishing attacks or credit card fraud. Here, model building becomes a game between learner and adversary. Game theory models such interactions as interleaved optimization problems. Since data-dependent optimization criteria are not a focus of game theory, many questions remain open today. Based on game-theoretic paradigms that model various patterns of interaction between players, the project aims at analyzing prediction games. In particular, the project will investigate learning models that constitute optimal solutions to prediction games under defined circumstances.
Multimedia Retrieval
Funding: STRATO AG
Duration: since 2009
The goal of this project is to evaluate and develop technology that enables intuitive and intelligent ways of navigating large photo and video collections.
Security
Funding: Strato Rechenzentrum AG
Duration: since 7/2005
Strato is a European provider of webspace and server hosting services.
We analyze the adversarial classification problem of spam identification. Spam filtering is a game between two opponents, spam sender and spam filter, that react to each other's moves. We seek to identify a winning strategy that cannot easily be dodged by spam senders.
In cooperation with Strato AG, we have developed a spam filter that now processes roughly 1 percent of all emails sent and received worldwide.
Intrusions are attempted on a daily basis. Usually, attackers seek to exploit insecure web sites in order to send huge amounts of spam emails via Strato's email servers. We are developing an intelligent monitoring system that tracks HTTP requests and discriminates legitimate use of a web site from attempts to exploit insecure scripts.
Modelling and Optimization of Dialysis Treatment
Funding: Fresenius-affiliate NephroCare e-Services GmbH
Duration: since 04/2008
We investigate model-building and the generation of actionable knowledge from records of dialysis treatments.
Scalable Ranking of Online Ads
Funding: nugg.ad AG
Duration: since 02/2007
In this project, we investigate efficient algorithms that predict which ad a user is most likely to click on, based on that user's past clicking behavior and all other available information.
Completed Projects
Differing Training and Test Distributions in Active Learning
Funding: Google Research Award
Duration: 2009
Active learning reduces the labeling effort incurred by applying machine learning algorithms. Active learning procedures direct the attention of a labeler towards examples whose label is believed to convey a maximum of information. Labeled samples in active learning are governed by a distribution that differs from the natural test distribution for multiple reasons. An initial labeled sample may be compiled from auxiliary data sources; the natural input distribution may change over time, or may be altered by an adversary. In addition, and specific to active learning, an active instance selection procedure creates a labeled sample that is biased by the selection criterion. Treating the artificially selected sample in active learning as if it were governed by the test distribution is not necessarily the best course of action. We will understand, develop, and evaluate systematic approaches to active learning that account for this discrepancy between labeled training and test distributions.
Mining Jazz Data to Assess Development Processes
Funding: IBM, Jazz Faculty Grant
Duration: 01/2008-12/2008
Principal Investigators: Andreas Zeller, Tobias Scheffer
What is it that makes a good development process? We want to develop a plug-in that learns from collaboration and defect data as tracked by Jazz, relates features of the collaborative development process to the defect density of individual components, and thereby automatically predicts code quality. For instance, the plug-in might advise that package P should be reviewed more, because a new dependency on compiler internals has been added shortly before the release date by a developer who is new to the team.
Text Mining: Knowledge Discovery in Text Databases and Efficient Document Processing
Funding: German Science Foundation DFG
Duration: June 2003 through June 2008
(German project title: Text Mining: Wissensentdeckung in Textsammlungen und Effizienz von Dokumentenverarbeitungsprozessen)
The goal of the "Text Mining" project is to develop and study text mining algorithms that discover knowledge in large document archives and utilize this knowledge to support future text manipulation processes.
The number of documents available in archives and on the web is growing exponentially. This growth induces a demand for methods that automatically analyze large volumes of documents and discover and utilize the valuable knowledge contained in them. A substantial part of our working processes consists of processing (i.e., reading, writing, manipulating) documents. Many tools support the administration of text documents, such as file systems, databases, or document management systems. Much greater effort (and expense), however, is required by the actual document manipulation processes, such as writing documents. Any support of document manipulation processes requires substantial knowledge; it is therefore much more difficult to support document processing than document administration.
Data and Text Mining in Quality and Service
Funding: DaimlerChrysler AG
Duration: 08/2005-07/2008
We study the problem of discovering trends and new developments in production and warranty databases as well as in workshop reports. We develop technologies that automatically identify such trends and discover their hidden causes. The goal of this project is the constructive analysis of data mining methods that lead to improved service processes by integrating and analyzing textual information and data from multiple, heterogeneous and distributed databases.