Letupan

Catch the next wave!

June 9, 2008 at 4:32pm

Guess content semantic with SOLR

I don’t know whether this can properly be called semantic. What I want to do is create a Lucene index with two fields: word and category (multiValued). Then we can pass a chunk of text to it and get back a score for every indexed word it contains. It should output something like this:

Word/Category/Score
election/politics/1
obama/politics/1
microsoft/technology/1
microsoft/business/1
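A minimal sketch of the lookup step in Python. It assumes a plain dictionary (the hypothetical `WORD_CATEGORIES`) standing in for the Lucene index, with one entry per word and a list playing the role of the multiValued category field, plus a simple lowercase `\w+` tokenizer. A real setup would query Lucene/SOLR instead. Each occurrence of an indexed word scores 1 for each of its categories, as in the table above.

```python
import re

# Hypothetical stand-in for the Lucene index: word -> list of categories
# (the list plays the role of the multiValued "category" field).
WORD_CATEGORIES = {
    "election": ["politics"],
    "obama": ["politics"],
    "microsoft": ["technology", "business"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def lookup(text, index=WORD_CATEGORIES):
    """Return (word, category, score) rows for every indexed word in text.

    Each occurrence of an indexed word scores 1 for each of its categories.
    """
    rows = []
    for token in tokenize(text):
        for category in index.get(token, []):
            rows.append((token, category, 1))
    return rows


if __name__ == "__main__":
    for word, category, score in lookup("Election news: Obama and Microsoft"):
        print(f"{word}/{category}/{score}")
```

Running it prints the same four Word/Category/Score rows as the table.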

Then we sum up the scores per category: politics/2, technology/1, business/1.

From that we might guess that the content is about politics :D. LOL. Could we do something like that?
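The summing and guessing steps are simple enough to sketch directly. This is a hedged illustration in Python, not a finished classifier: `sum_scores` and `guess_topic` are hypothetical names, and the input is assumed to be the (word, category, score) rows from the table. The guess is simply the category with the highest total; ties go to the category encountered first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def sum_scores(rows):
    """Add up the scores of (word, category, score) rows per category."""
    totals = Counter()
    for _word, category, score in rows:
        totals[category] += score
    return totals


def guess_topic(rows):
    """Return the category with the highest total, or None if nothing matched."""
    totals = sum_scores(rows)
    if not totals:
        return None
    # most_common(1) returns [(category, total)] for the top category.
    return totals.most_common(1)[0][0]


if __name__ == "__main__":
    rows = [
        ("election", "politics", 1),
        ("obama", "politics", 1),
        ("microsoft", "technology", 1),
        ("microsoft", "business", 1),
    ]
    totals = sum_scores(rows)
    print(", ".join(f"{c}/{n}" for c, n in totals.items()))  # politics/2, technology/1, business/1
    print(guess_topic(rows))  # politics
```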

How are we going to do that? Lucene+Hadoop
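The post only names Lucene+Hadoop. One way Hadoop could fit is to score many documents in parallel as a MapReduce job: the map step emits ((doc_id, category), 1) for each indexed word, and the reduce step sums those values per key. The sketch below simulates that flow locally in plain Python, with a sort standing in for Hadoop's shuffle phase. It does not use any Hadoop API, and `WORD_CATEGORIES`, `map_phase`, `reduce_phase`, `run_job` and `best_per_document` are hypothetical names chosen for illustration.

```python
import re
from itertools import groupby
from operator import itemgetter

# Hypothetical word -> categories table; in the real setup this would come
# from the Lucene index.
WORD_CATEGORIES = {
    "election": ["politics"],
    "obama": ["politics"],
    "microsoft": ["technology", "business"],
}


def map_phase(doc_id, text):
    """Emit ((doc_id, category), 1) for every indexed word in one document."""
    for token in re.findall(r"\w+", text.lower()):
        for category in WORD_CATEGORIES.get(token, []):
            yield (doc_id, category), 1


def reduce_phase(key, values):
    """Sum all partial scores that share the same (doc_id, category) key."""
    return key, sum(values)


def run_job(documents):
    """Run map, a sort-based shuffle, and reduce locally over {doc_id: text}."""
    pairs = [kv for doc_id, text in documents.items() for kv in map_phase(doc_id, text)]
    pairs.sort(key=itemgetter(0))  # simulates Hadoop's shuffle/sort by key
    results = {}
    for key, group in groupby(pairs, key=itemgetter(0)):
        k, total = reduce_phase(key, (value for _, value in group))
        results[k] = total
    return results


def best_per_document(results):
    """Pick the highest-scoring category for each document (first wins on ties)."""
    best = {}
    for (doc_id, category), score in results.items():
        if doc_id not in best or score > best[doc_id][1]:
            best[doc_id] = (category, score)
    return {doc_id: category for doc_id, (category, _) in best.items()}


if __name__ == "__main__":
    docs = {"a": "Election news: Obama and Microsoft", "b": "Microsoft earnings"}
    print(best_per_document(run_job(docs)))
```

On a real cluster the map and reduce functions would run as separate tasks; the per-key summing logic stays the same.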

Notes