Data extraction and label assignment for Web databases
"Most current work is deficient in providing users the meaning
of the attributes of the extracted data." - this also applies to our
work, RoadRunner and Extracting Structured Data (SIGMOD03). The problem
is addressed in "Automatic annotation of data extraction from large Web
sites", but the solution proposed there is not general enough.
"Sometimes several attributes of the data object are encoded together
into one text string that is not separated by HTML tags." - our work
has not solved this problem yet.
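To make the problem concrete, here is a minimal sketch of one text node
holding three attributes. The sample string, the pattern, and the name
split_attributes are hypothetical and not taken from any of the papers;
a hand-written pattern like this covers one layout only and is not a
general solution.

```python
import re

# Hypothetical text node: three attributes share one string with no HTML
# tags between them, so tag-based segmentation sees a single field.
TEXT = "Hardcover, 352 pages, $24.95"

# Assumed pattern for this one illustrative layout only.
PATTERN = re.compile(
    r"(?P<binding>[A-Za-z]+),\s*"
    r"(?P<pages>\d+)\s+pages,\s*"
    r"\$(?P<price>\d+\.\d{2})"
)


def split_attributes(text):
    """Return a dict of named attributes, or None if the layout differs."""
    match = PATTERN.fullmatch(text.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(split_attributes(TEXT))
```

Any string that deviates from the assumed layout returns None, which is
why a fixed pattern does not count as solving the problem.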
They only deal with pages from web sites that provide complex HTML
search forms. This is a serious limitation: most web sites do not
provide such complex search forms.
Their wrapper generator is inspired by and similar to previous work
on IEPAD.