Towards Effective Processing of Large Text Collections

Julian Szymański; Henryk Krawczyk

doi:10.1109/intech.2012.6457784

In this article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and of the k-means algorithm. The test datasets were prepared as graphs built from Wikipedia articles connected by links. When creating the clustering data packages, we compute eigenvalue/eigenvector pairs for visualizations of the datasets. We describe the method used to evaluate clustering quality. Finally, we discuss the achieved results and point out some improvements and perspectives for future development.
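The abstract does not state which similarity measure, work-partitioning scheme, or k-means initialization the authors used. As a minimal sketch under assumed choices, the Python below represents each article by the set of articles it links to (hypothetical toy data in `ARTICLES`). It computes a cosine similarity matrix by splitting the rows into contiguous blocks processed concurrently, then clusters the matrix rows with Lloyd's k-means using deterministic farthest-first seeding. All names (`similarity_matrix`, `kmeans`, and so on) are illustrative and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical toy data: each article is represented by the set of articles it links to.
ARTICLES = {
    "Cat": {"Mammal", "Pet", "Felidae"},
    "Dog": {"Mammal", "Pet", "Canidae"},
    "Lion": {"Mammal", "Felidae", "Africa"},
    "Python": {"Programming", "Software", "Guido"},
    "Java": {"Programming", "Software", "JVM"},
}


def cosine(a, b):
    """Cosine similarity of two binary link vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similarity_rows(names, rows):
    """One work package: the matrix rows whose indices are in `rows`."""
    return [[cosine(ARTICLES[names[i]], ARTICLES[other]) for other in names] for i in rows]


def similarity_matrix(names, workers=2):
    """Split rows into contiguous blocks and compute the blocks concurrently.

    Threads keep the sketch portable; in CPython the GIL limits their speedup on
    pure-Python work, so a process pool would be needed for real CPU parallelism.
    """
    n = len(names)
    step = max(1, math.ceil(n / workers))
    blocks = [range(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: similarity_rows(names, rows), blocks))
    # pool.map preserves block order, so concatenation restores the row order.
    return [row for part in parts for row in part]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    names = sorted(ARTICLES)
    S = similarity_matrix(names, workers=2)
    print(dict(zip(names, kmeans(S, k=2))))
    # → {'Cat': 0, 'Dog': 0, 'Java': 1, 'Lion': 0, 'Python': 1}
```

Clustering the rows of the similarity matrix (rather than raw link vectors) is one possible design; it lets the same row-block partitioning serve both steps. The seeding rule, the distance, and the toy graph are assumptions made for illustration only.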

Additional information

DOI: 10.1109/intech.2012.6457784
Category: Conference activity
Type: conference proceedings indexed in Web of Science
Language: English
Publication year: 2012

Source: MOSTWiedzy.pl - publication "Towards Effective Processing of Large Text Collections"

Publications Repository - Gdańsk University of Technology

Treść strony