Data extraction and label assignment for Web databases

  • "Most current work is deficient in providing users the meaning of the attributes of the extracted data." - this also applies to our work, to RoadRunner, and to Extracting Structured Data (SIGMOD03). The problem is addressed in "Automatic annotation of data extraction from large Web sites", but the solution proposed there is not general enough.
  • "Sometimes several attributes of the data object are encoded together into one text string that is not separated by HTML tags." - our work has not solved this problem yet.
  • They deal only with pages from web sites that provide complex HTML search forms. This is a serious limitation: most web sites do not provide such complex search forms.
  • Their wrapper generator is inspired by, and similar to, previous work on IEPAD.
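
To make the second point above concrete, here is a minimal sketch of the problem, not taken from the paper or from our system. The record HTML, the field names, and the regular expression are all hypothetical. It shows that an extractor working at the level of HTML tags sees title, author, and year as a single text node, and that separating them needs an extra, site-specific step.

```python
from html.parser import HTMLParser
import re


class TextNodeCollector(HTMLParser):
    """Collect non-empty text nodes, the unit a tag-based extractor sees."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(text)


# Hypothetical record: title, author, and year share one text node.
RECORD_HTML = "<tr><td>Data Mining by J. Han, 2000</td><td>$55.00</td></tr>"

# Hypothetical, site-specific pattern; a general solution would have to
# learn such separators rather than hard-code them.
FIELD_PATTERN = re.compile(r"^(?P<title>.+) by (?P<author>.+), (?P<year>\d{4})$")


def text_nodes(html):
    """Return the text nodes of an HTML fragment in document order."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def split_fields(text):
    """Split one text node into attributes, or return None if it does not match."""
    match = FIELD_PATTERN.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for node in text_nodes(RECORD_HTML):
        print(repr(node), "->", split_fields(node))
```

The tags alone yield two fields here, although the record carries four attributes. The hand-written pattern recovers the missing ones only for this one layout, which is why the problem is still open for us.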
last update: April 4, 2005