Friday, August 1, 2008

Mining Topic-Specific Concepts and Definitions on the Web

I just finished reading the paper of the same name as this post, by Bing Liu, Chee Wee Chin and Hwee Tou Ng. Two thoughts struck me while reading it. First, though we all know that search engines are a one-size-fits-all application, I had never really thought about what that means. Is it not interesting that an 8-year-old and a biologist who has been doing research for 20 years can both type the term "ant" into Google and receive the same results (Apache's Ant project for building Java applications)?

Sure, without signing in to Google, what could it know about the individual making the search? What is really interesting is that we give Google and other search engines practically nothing to work with and expect them to read our minds, and for most of my searches they do pretty well. (Try searching for "mormons bosnia" and see what you get. By the way, LDS is an abbreviation often used to refer to Mormons.)

Second (as in the second idea I had from this paper), it would be really nice if we could summarize 100 web pages of results into a single table. What to include and what to leave out are the obvious questions. But imagine if our search for "data mining", as discussed in this paper, returned a summary of the definitions of data mining found on those pages. For example, "_" (53 pages), "_" (6 pages) and "_" (2 pages), where each "_" would be a definition and each page count in parentheses would link to a results page listing those pages. It would be nice if some intelligent analysis of the definitions grouped them, so that the 53 pages behind the first definition, though not worded identically, held the same meaning for our purposes. Certainly each individual has different purposes, so whether two definitions are equal is up for debate in a lot of cases, depending on who you talk to. This idea could definitely use a lot of refinement.
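To make the summary-table idea a little more concrete, here is a rough sketch in Python. It is not the method from the paper. It assumes we already have one definition string per page, and it stands in for "intelligent analysis" with plain word overlap (Jaccard similarity) and a greedy grouping pass. The function names, the 0.5 threshold and the example.com URLs are all made up for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase words in a definition, skipping very short words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2}


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_definitions(pages, threshold=0.5):
    """Group (url, definition) pairs whose definitions share enough words.

    Greedy assumption: a definition joins the first group whose
    representative (the first definition seen in that group) is at least
    `threshold` similar; otherwise it starts a new group.
    """
    groups = []
    for url, definition in pages:
        toks = tokens(definition)
        for group in groups:
            if jaccard(toks, group["tokens"]) >= threshold:
                group["urls"].append(url)
                break
        else:
            groups.append({"definition": definition, "tokens": toks, "urls": [url]})
    # Largest groups first, like "(53 pages), (6 pages), (2 pages)".
    groups.sort(key=lambda g: len(g["urls"]), reverse=True)
    return groups


# Hypothetical input: three pages, two of which say nearly the same thing.
sample_pages = [
    ("http://example.com/a", "Data mining is the process of discovering patterns in large data sets."),
    ("http://example.com/b", "Data mining is the process of discovering useful patterns in large data sets."),
    ("http://example.com/c", "Data mining means extracting gold from databases using shovels."),
]

for group in group_definitions(sample_pages):
    print(f'"{group["definition"]}" ({len(group["urls"])} pages)')
```

Word overlap will happily split two definitions that mean the same thing but use different vocabulary, which is exactly the "definition equality is up for debate" problem, so the similarity function is the part that would need the most refinement.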
