Hubness-Based Fuzzy Measures for High-Dimensional k-Nearest Neighbor Classification

Publication Date: 30/08/2011
Authors: Nenad Tomasev, Milos Radovanovic, Dunja Mladenic, and Mirjana Ivanovic

This paper was published in the Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM 2011), New York, NY, USA, August 30 - September 3, 2011. The paper was nominated for the Best Paper Award 2011.

Abstract: 

High-dimensional data are by their very nature often difficult to handle by conventional machine-learning algorithms, which is usually characterized as an aspect of the curse of dimensionality. However, it was shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier.
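
The idea of hubness-based fuzzy voting can be illustrated with a minimal Python sketch. It is not the set of measures proposed in the paper: the function names, the Laplace-style `smoothing` parameter, and the specific membership formula are assumptions made for illustration only. The sketch counts, for every training point, how often it appears in the k-neighbor sets of points from each class (its class-specific k-occurrences), turns those counts into soft class memberships, and lets the k nearest neighbors of a query vote with those memberships instead of their crisp labels.

```python
import numpy as np

def k_neighbors(X, k):
    """Indices of the k nearest neighbors of each row of X (Euclidean, self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def class_k_occurrences(X, y, k, n_classes):
    """occ[i, c] = number of class-c points that have point i among their k neighbors."""
    nn = k_neighbors(X, k)
    occ = np.zeros((len(X), n_classes))
    for j, neighbors in enumerate(nn):
        # point j (of class y[j]) contributes one occurrence to each of its neighbors
        occ[neighbors, y[j]] += 1
    return occ

def fuzzy_hubness_predict(X, y, query, k, n_classes, smoothing=1.0):
    """Soft class scores for `query`; the smoothing scheme is an illustrative assumption."""
    occ = class_k_occurrences(X, y, k, n_classes)
    # Soft membership of each training point: smoothed share of its occurrences per class.
    fuzzy = (occ + smoothing) / (occ.sum(axis=1, keepdims=True) + smoothing * n_classes)
    # The k training points nearest to the query vote with their soft memberships.
    nearest = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    votes = fuzzy[nearest].sum(axis=0)
    return votes / votes.sum()
```

Points with large row sums in `occ` are the hubs. The normalized vote vector can be read as a rough confidence in each class, which is the kind of information a crisp vote discards.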

Attachment: 2011-mldm-fuzzy-hubs.pdf (184.07 KB)
