I found this paper particularly interesting:
"Knowledge Discovery from Semi-Structured Data for Conceptual Organization"
The authors talk about
- creating a concept map (a graph of co-occurring [in other terms, related] concepts [noun phrases]) from a corpus,
- then extracting the cliques containing a particular concept [using a topological sort],
- and then assigning the documents to that particular clique
What you get at the end is a mapping from each document to a set of concepts. In short: the noun phrases that describe the document.
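As I read it, the pipeline can be sketched roughly like this. The corpus, the noun phrases and the helper names below are my own toy stand-ins, and I use a plain Bron-Kerbosch clique search rather than whatever the authors actually do with their topological sort, so treat this as an illustration of the idea, not the paper's method:

```python
from itertools import combinations

# Toy corpus: each document reduced to its extracted noun phrases
# (the paper uses the Stanford Parser for that extraction step).
docs = {
    "d1": {"rose", "red soil", "garden"},
    "d2": {"rose", "garden", "thorn"},
    "d3": {"school kids", "chocolate", "garden"},
    "d4": {"school kids", "chocolate"},
}

# 1. Build the concept map: an undirected graph whose nodes are noun
#    phrases and whose edges link phrases that co-occur in a document.
graph = {}
for phrases in docs.values():
    for a, b in combinations(sorted(phrases), 2):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

# 2. Extract maximal cliques (Bron-Kerbosch, a stand-in for the
#    paper's clique-extraction step).
def bron_kerbosch(r, p, x, cliques):
    if not p and not x:
        cliques.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & graph[v], x & graph[v], cliques)
        p.remove(v)
        x.add(v)

cliques = []
bron_kerbosch(set(), set(graph), set(), cliques)

# 3. Assign a document to the clique its phrases overlap most with;
#    the clique's members are then the concepts describing it.
def best_clique(doc_phrases):
    return max(cliques, key=lambda c: len(c & doc_phrases))
```

Note how `best_clique({"school kids", "chocolate"})` pulls in "garden" as well: the document gets described by a concept it never mentioned, exactly the effect discussed below.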
These noun phrases may not include all the noun phrases that occur in the document [e.g. "red soil" may have occurred in one of the documents containing the concept "rose", but it did not co-occur with "rose" in any other document].
On the other hand, these noun phrases may include some concepts which were not obvious from that particular document [e.g. a document containing "school kids" may not contain the concept "chocolate", but these two concepts have co-occurred in a significant number of documents in the corpus].
I liked the paper; the idea is very similar to what we do in real life.
We have concepts stored inside our brain [connections between neurons?]. To extract the concept map stored in your brain, you just need to think about all the things that come to mind when you think about a concept, say "s e x".
Whenever we come across something new (a new concept), we just associate it with an already existing concept map in our brain. These concepts are in turn linked to information about them (related documents?).
We can even take this a step further by assigning weights to the edges [based on how frequently the concepts co-occur] and labelling the edges [and nodes] with all possible verbs that connect them.
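The weighting part is the easy half: count, for every pair of noun phrases, how many documents they co-occur in. A minimal sketch with a made-up three-document corpus (verb labelling would need a parser and is not shown):

```python
from collections import Counter
from itertools import combinations

# Toy corpus: documents reduced to their noun phrases.
docs = [
    {"school kids", "chocolate"},
    {"school kids", "chocolate", "playground"},
    {"school kids", "playground"},
]

# Edge weight = number of documents in which the two phrases co-occur.
# Sorting each pair gives a canonical key for the undirected edge.
weights = Counter()
for phrases in docs:
    for a, b in combinations(sorted(phrases), 2):
        weights[(a, b)] += 1

# ("chocolate", "school kids") co-occurs in two documents,
# ("chocolate", "playground") in only one.
```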
btw, the authors use the Stanford Parser to extract the noun phrases.
you can try it online here.
Moving from syntactic to semantic, aren't we?