Chargement...
 

Bioinformatics

Domaine
Bioinformatics
Domain - extra
Knowledge management and Algorithmics
Année
2012
Starting
September
État
Open
Sujet
Ranking biological and biomedical data: algorithms and applications
Thesis advisor
DENISE Alain
Co-advisors
COHEN BOULAKIA Sarah (LRI and INRIA AMIB)
Laboratory
Collaborations
Ulf Leser (Professor for Knowledge Management in Bioinformatics at Humboldt Universität in Berlin)
The design and evaluation of the methods will be done in close collaboration with oncologists from the Institut Curie and the Children's Hospital of Philadelphia.
Abstract
The extremely rapid growth in the number of public biomedical data sources associated with the exponential growth of data available in these sources makes the research phase of biomedical information increasingly complex and tedious. Several query systems of biomedical sources are now available and used by physicians but none of them offers to rank the data obtained in response to a query. This is particularly problematic in a context where simple queries can return thousands of results.
Many classical ranking techniques can be considered for biomedical data, from very simple (eg, focus on the most recent data) to complex. The aim of this thesis is to propose a uniform framework able to represent and compare (conceptually and on real data) ranking methods and bring up new approaches for ranking large masses of biological data.
Context
Many classical ranking techniques can be considered for biomedical data, varying from simple (eg, most recent data first) to complex (e.g., techniques such as PageRank exploiting links between data) approaches. These algorithms exploit two key aspects of biological data: (i) linked to each other’s, forming an intricate network (graph-based algorithms) and (ii) plain-text file (text mining like algorithms). Ranking algorithms have then very different time and space complexities.


On the other hand, techniques able to calculate consensus rankings (aka median) have been proposed Fagin et al, PODS 2004, Ailon, Algorithmica 2010,Cohen-Boulakia et al, SSDBM 2011. They propose to generate a new ranking highlighting the common points of a set of rankings while minimizing their disagreements. The computation of the median of a set of m rankings is then known to be NP-hard when m>4. Several (exact and approximated) algorithms and heuristics have then been proposed.



Objectives
The aim of this thesis is to introduce a uniform framework able to represent and compare (conceptually and on real data) both classical and consensual ranking methods and bring up new approaches for ranking large masses of biological data.
Work program
(1) Formal study of the ranking algorithms. Criteria of comparison will be introduced to describe all algorithms in a uniform way. The complexity of each algorithm will be evaluated. As for NP-hard problems, assumptions and parameters causing the complexity will be underlined.

(2) Evaluation of the ranking algorithms. Data obtained by the various ranking algorithms will be compared to expected data and random data. Several metrics will be considered in collaboration with Prof. Ulf Leser.

(3) Reducing the search space (graph-topology). Study the size of the search space to be explored for the results to be still reliable while obtained in a reduced time and using a reduced space.

4) Mixing graphs-based and text mining-based techniques to rank biological data
This part of the thesis will introduce text-mining techniques into ranking algorithms to exploit the textual aspect of biological data, following research directions described in Cohen-Boulakia and Leser, ICDE 2011.
Extra information
Prerequisite
Knowledge of the structures of biological data sources and skills in algorithmics and in object-oriented programming are expected.
Détails
Télécharger BioRanking.pdf
Expected funding
Institutional funding
Status of funding
Expected
Candidates
BRANCOTTE Bryan (étudiant M2 Recherche Informatique spécialité IAC, Orsay)
Utilisateur
alain.denise
Créé
Dimanche 03 juin 2012 16:44:41 CEST
dernière modif.
Lundi 11 juin 2012 21:39:20 CEST

Fichiers joints

 filenamecrééhitsfilesize 
BioRanking.pdf 11 Jun 2012 21:37119680.61 Kb


Ecole Doctorale Informatique Paris-Sud


Directrice
Nicole Bidoit
Assistante
Stéphanie Druetta
Conseiller aux thèses
Dominique Gouyou-Beauchamps

ED 427 - Université Paris-Sud
UFR Sciences Orsay
Bat 650 - aile nord - 417
Tel : 01 69 15 63 19
Fax : 01 69 15 63 87
courriel: ed-info à lri.fr