by Salvador Nieto Sanchez and Evangelos Triantaphyllou
Encyclopedia of Optimization,
(P.M. Pardalos and C. Floudas, Editors), Kluwer Academic Publishers,
Boston, MA, U.S.A., Vol. 4, pp. 182-189, (2001).
Abstract:
From the 1950s onwards, many research efforts have focused on computerized tools and
mathematical models that can speed up the classification of large collections of documents
according to some underlying context. The Internet is a current example of this need. In this
worldwide conglomerate of databases, one can readily observe how quickly documents on a given
topic, say 'basketball', are retrieved from among the millions of documents produced daily.
Document classification is also of paramount importance in many information retrieval
applications, such as news routing [7], classification/declassification of official documents [15],
e-mail filtering [27], and context derivation of electronic meetings [3].
Keywords and Phrases:
Document classification, computational linguistics, indexing
terms, context descriptors, text classification, keywords,
document surrogate, principle of least effort (PLE), vector space
model (VSM), One Clause At a Time (OCAT) algorithm, indexing
vocabulary, optimal indexing vocabulary, semantic analysis
methodologies, word patterns, conjunctive normal form (CNF),
disjunctive normal form (DNF).
Index:
Classification of large collections of documents, Internet,
document classification, news routing, e-mail filtering,
computational linguistics, automatic document classification,
indexing terms, context descriptors, text classification,
keywords, common and rare words, document surrogate, surrogate,
optimization in document classification, principle of least
effort (PLE), vector space model (VSM), One Clause At a Time
(OCAT) algorithm, indexing vocabulary, meaningful words, optimal
vocabulary, similarity of all the surrogates, optimal indexing
vocabulary, binary surrogates, semantic analysis methodologies,
word patterns, conjunctive normal form (CNF), disjunctive normal
form (DNF).