Title: Algorithmic Knowledge Extraction and Representation
from Sources of English Text
Date: April 9th, 2008
Time: 11:00 AM - 12:30 PM
Location: Room 256, Coates Hall
Abstract:
Fifty years have passed since the scientific study of natural human
languages gained prominence in two distinct fields: theoretical linguistics
and the computationally grounded Natural Language Processing (NLP), the
latter of which is our primary interest. Over this time, NLP software and
algorithms have advanced, and a wide selection of highly specialized
research is now available. Fewer are the full systems that unite pieces of
this specialized research for a shared purpose. I would like to introduce
Ehlic, a user-interactive system of knowledge exchange. Ehlic incorporates
both open-source code from the NLP community and new code based on concepts
in theoretical linguistics. The
user interacts with Ehlic entirely in English: the user may provide
statements of knowledge and pose questions, and when Ehlic is uncertain, it
may pose a question to the user in return. The
current system operates on a subset of English grammar: interactions are
conducted in the present tense, with nouns, verbs, and adjectives, which may
be compounded into phrases, relative clauses, and sentences. Words and
phrases that represent concepts appear in the knowledge model, a semantic
network, as nodes of either instance or class type. An initial reference to a
concept introduces a new node into the semantic network, with any
relationships represented in the mentioning statement encoded as typed links
to other nodes. Subsequent references resolve to the same node, and new
knowledge is incorporated into its vicinity. Future versions of Ehlic will
continue to incorporate newly coded algorithms from theoretical linguistics
and open-source software from the NLP community, increasing Ehlic's
capabilities as a research tool and, eventually, as a working interactive
knowledge exchange system.
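
The node-and-link behavior described in the abstract can be illustrated with
a minimal sketch. This is not Ehlic's code: the class names, the method
names (resolve, relate), the link types (is_a, has_property, chases), and
the case-insensitive name matching are all assumptions made for
illustration, and no English parsing is attempted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "instance" or "class" (the two node types named in the abstract)
    links: set = field(default_factory=set)  # (link_type, target_name) pairs


class SemanticNetwork:
    """Hypothetical toy model: concepts are nodes joined by typed links."""

    def __init__(self):
        self.nodes = {}

    def resolve(self, name, kind="class"):
        # The first reference creates a node; later references return the
        # same node. Matching on the lowercased name is an assumption.
        key = name.lower()
        if key not in self.nodes:
            self.nodes[key] = Node(name=key, kind=kind)
        return self.nodes[key]

    def relate(self, source, link_type, target):
        # Encode a relationship from a statement as a typed link.
        src = self.resolve(source)
        dst = self.resolve(target)
        src.links.add((link_type, dst.name))
        return src


net = SemanticNetwork()
net.resolve("Rex", kind="instance")         # first mention creates the node
net.relate("Rex", "is_a", "dog")            # "Rex is a dog."
net.relate("dog", "has_property", "loyal")  # "Dogs are loyal."
net.relate("Rex", "chases", "cat")          # later mention reuses the Rex node
```

In this sketch, the second statement about Rex attaches a new link to the
existing node rather than creating a duplicate, which is the sense in which
new knowledge accumulates in a node's vicinity.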