June 12, 2005

Reading Web Tea Leaves

IBM recently launched WebFountain, an ambitious commercial service that mines valuable information from about half of the web's content, including informal communication from weblogs, newsgroups, and chat rooms.

IBM's supercomputer can process about 14,000 web pages per second. The system reads each page, extracts its content, then automatically annotates the material. The tagged pages, often many times the length of the originals, go into a huge data storage array. About 3 billion pages are already in the system.
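
For a sense of scale, the two figures above can be combined in a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes a steady rate of 14,000 pages per second and a single pass over each page, neither of which the source states, and the function name is made up for illustration.

```python
# Back-of-the-envelope check using only the two figures quoted above.
# Assumption (not from the source): the rate is constant and each page
# is processed exactly once.
PAGES_PER_SECOND = 14_000        # quoted processing rate
PAGES_STORED = 3_000_000_000     # quoted size of the collection so far


def full_pass_seconds(pages: int, rate: int) -> float:
    """Seconds needed to process `pages` at a steady `rate` pages/second."""
    return pages / rate


seconds = full_pass_seconds(PAGES_STORED, PAGES_PER_SECOND)
days = seconds / 86_400  # 86,400 seconds in a day
print(f"{seconds:,.0f} seconds, or about {days:.1f} days")
```

Under those assumptions, working through 3 billion pages would take roughly two and a half days of continuous processing.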

All this painstakingly labeled information is then available to anyone interested in looking for trends or other valuable insights into what's going on. Users can deploy various software tools, including their own, to analyze the data and dig out relevant patterns and relationships.

[Source: Science News]
