"Optimization in Document Classification"



by Salvador Nieto Sanchez and Evangelos Triantaphyllou

Encyclopedia of Optimization, (P.M. Pardalos and C. Floudas, Editors), Kluwer Academic Publishers, Boston, MA, U.S.A., Vol. 4, pp. 182-189, (2001).


Abstract:
Since the 1950s, many research efforts have sought computerized tools and mathematical models that can speed up the classification of large collections of documents according to some underlying context. A current example is the Internet. In this worldwide conglomerate of databases, one can easily see the speed at which documents on a topic such as 'basketball' are retrieved from among the millions of documents produced daily. Document classification is also of paramount importance in many information retrieval applications, such as news routing [7], classification/declassification of official documents [15], email filtering [27], and context derivation of electronic meetings [3].

Keywords and Phrases: Document classification, computational linguistics, indexing terms, context descriptors, text classification, keywords, document surrogate, principle of least effort (PLE), vector space model (VSM), One Clause At a Time (OCAT) algorithm, indexing vocabulary, optimal indexing vocabulary, semantic analysis methodologies, word patterns, conjunctive normal form (CNF), disjunctive normal form (DNF).


Index:
Classification of large collections of documents, Internet, document classification, news routing, e-mail filtering, computational linguistics, automatic document classification, indexing terms, context descriptors, text classification, keywords, common and rare words, document surrogate, surrogate, optimization in document classification, principle of least effort (PLE), vector space model (VSM), One Clause At a Time (OCAT) algorithm, indexing vocabulary, meaningful words, optimal vocabulary, similarity of all the surrogates, optimal indexing vocabulary, binary surrogates, semantic analysis methodologies, word patterns, conjunctive normal form (CNF), disjunctive normal form (DNF).






