Revealing the Logic of the Hidden Web

Michael Benedikt

While search engines have made great strides in retrieving static documents in response to keyword queries, an enormous amount of web content consists of structured data available only via browser-oriented interfaces (e.g., web forms). This content, often referred to as the "Deep Web" or "Hidden Web", is estimated to dwarf the information content of the static pages in the surface web. In this talk I discuss foundational research issues in creating federated query interfaces on top of the Hidden Web. One issue particularly relevant to the Hidden Web is the limited interfaces that its sources expose: drop-down lists or text boxes, each representing a query whose parameters must be filled in with a tuple of values. We give methods for analyzing extraction and query answering on top of these interfaces, relating them to implication and equivalence problems for various logics. We will also touch upon other issues in Hidden Web processing, including form discovery and language support. The talk includes joint work with Georg Gottlob and Pierre Senellart.
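
To make the notion of a limited interface concrete, the following is a minimal sketch rather than the method presented in the talk. It assumes a hypothetical two-column relation that can be read only through a form requiring a value for the first column, and it computes the rows reachable from some seed values by feeding each returned value back into the form. The names form_lookup and accessible_rows, and the sample data, are illustrative assumptions.

```python
from typing import Callable, Iterable, Set, Tuple

Row = Tuple[str, str]  # (author, coauthor): a hypothetical two-column relation

# Hypothetical hidden data; a real source would sit behind a web form.
_HIDDEN: Set[Row] = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("dave", "erin"),
}


def form_lookup(author: str) -> Set[Row]:
    """Simulated form: a value for the first column must be supplied."""
    return {row for row in _HIDDEN if row[0] == author}


def accessible_rows(
    seeds: Iterable[str], lookup: Callable[[str], Set[Row]]
) -> Set[Row]:
    """Collect every row reachable by feeding known values back into the form."""
    known = set(seeds)
    pending = list(known)
    rows: Set[Row] = set()
    while pending:
        value = pending.pop()
        for row in lookup(value):
            rows.add(row)
            for v in row:
                if v not in known:  # a newly learned value is a new form input
                    known.add(v)
                    pending.append(v)
    return rows


if __name__ == "__main__":
    print(sorted(accessible_rows(["alice"], form_lookup)))
```

Starting from the seed "alice", the row ("dave", "erin") is never retrieved, because no reachable value can be entered into the form to obtain it. This gap between the stored data and what the interface lets a client see is what makes query answering over such sources nontrivial.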