Title: | Dynamic threshold selection method for multi-label newspaper topic identification |
Authors: | Skorkovská, Lucie |
Source document citation: | SKORKOVSKÁ, Lucie. Dynamic threshold selection method for multi-label newspaper topic identification. In: International conference on image analysis and recognition. Berlin: Springer, 2013, p. 209-216. (Lecture notes in computer science; 8082). ISBN 978-3-642-40584-6. |
Date issued: | 2013 |
Publisher: | Springer |
Document type: | article |
URI: | http://www.kky.zcu.cz/cs/publications/SkorkovskaL_2013_DynamicThreshold http://hdl.handle.net/11025/16982 |
ISBN: | 978-3-642-40584-6 |
Keywords (Czech): | identifikace tématu;multi-label klasifikace textu;jazykové modelování;naivní bayesovská klasifikace |
Keywords (English): | topic identification;multi-label text classification;language modelling;naive bayes classification |
Abstract (English): | Multi-label classification is increasingly required in modern categorization systems, and it is essential for identifying the topics of newspaper articles. This paper presents a method, based on general topic model normalisation, for finding a threshold that defines the boundary between the "correct" and the "incorrect" topics of a newspaper article. The proposed method is used to improve the topic identification algorithm that is part of a complex system for acquiring and storing large volumes of text data. The topic identification module uses a Naive Bayes classifier for the multiclass, multi-label classification problem and assigns each article topics from a fairly extensive predefined topic hierarchy containing about 450 topics and topic categories. The paper presents the results of experiments with the improved topic identification algorithm. |
