Publications Repository - Gdańsk University of Technology

Evaluation of Path Based Methods for Conceptual Representation of the Text

Typical text clustering methods use the bag-of-words (BoW) representation to describe the content of documents. However, this representation is known to have several limitations. Using Wikipedia as a lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of this trend exploit the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based measures for calculating document relatedness in such a conceptual space and compare them with the widely used Path Length approach. We evaluate them using the OPTICS clustering algorithm to categorize keyword-based search results. The results show that our method outperforms the Path Length approach.
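The abstract does not define the three proposed measures, so the sketch below illustrates only the general idea behind the Path Length baseline it compares against: the relatedness of two categories is derived from the number of edges on a shortest path between them in the category graph. It is a minimal sketch under stated assumptions, not the authors' implementation. The toy graph, the undirected treatment of category links, the conversion `1 / (1 + d)`, and the best-match averaging used to lift concept similarity to whole documents are all hypothetical choices made for illustration, as are the names `make_undirected`, `path_length`, `concept_similarity` and `document_relatedness`.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical adjacency map: category -> neighbouring categories.
Graph = Dict[str, Set[str]]

# Hypothetical toy fragment of a category hierarchy (child, parent).
# Not real Wikipedia data; links are treated as undirected below.
EDGES: List[Tuple[str, str]] = [
    ("Machine learning", "Artificial intelligence"),
    ("Data mining", "Machine learning"),
    ("Cluster analysis", "Data mining"),
    ("Search engines", "Information retrieval"),
    ("Information retrieval", "Computer science"),
    ("Artificial intelligence", "Computer science"),
]


def make_undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (child, parent) pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph: Graph, src: str, dst: str) -> Optional[int]:
    """Edge count of a shortest path between two categories (breadth-first search).

    Returns None when the categories are not connected.
    """
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def concept_similarity(graph: Graph, a: str, b: str) -> float:
    """Assumed conversion of a path length d into a similarity 1 / (1 + d)."""
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def document_relatedness(graph: Graph, doc_a: List[str], doc_b: List[str]) -> float:
    """Symmetrised average of best-match concept similarities (an assumed aggregation)."""
    if not doc_a or not doc_b:
        return 0.0

    def one_way(xs: List[str], ys: List[str]) -> float:
        return sum(max(concept_similarity(graph, x, y) for y in ys) for x in xs) / len(xs)

    return 0.5 * (one_way(doc_a, doc_b) + one_way(doc_b, doc_a))


if __name__ == "__main__":
    g = make_undirected(EDGES)
    print(path_length(g, "Cluster analysis", "Artificial intelligence"))
    print(document_relatedness(g, ["Cluster analysis"], ["Machine learning"]))
```

In this toy graph, "Cluster analysis" reaches "Artificial intelligence" in three edges, while "Search engines" is six edges away, so the first pair scores as more related. A matrix of pairwise document relatedness values like this could then be turned into distances and passed to a density-based clustering algorithm such as OPTICS; how the paper performs that step is not stated in the abstract.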

Additional information

DOI
10.1007/978-3-319-08326-1_44
Category
Conference activity
Type
Publication in a peer-reviewed edited volume (including conference proceedings)
Language
English
Publication year
2014

Source: MOSTWiedzy.pl - publication "Evaluation of Path Based Methods for Conceptual Representation of the Text"