Web Content Mining
ACM SIGKDD Inaugural Webcast, Nov 29, 2006
The webcast presentation covered three main topics:
- Structured Data Extraction
- Information Integration
- Opinion Mining
You can download the live recording of the Webcast from here. A link to the recording is also posted at www.KDD.org. The slides are available here.
Questions and Answers
I have answered all the questions that the audience posted during the Webcast. The answers appeared in the Dec 5, 2006 issue of KDnuggets News.
References
The relevant works are cited below. For the actual references,
please download this,
which is the reference section of my Web Data Mining book. (If you can
wait for a few days, I will extract all the references from the file and
put them here.)
Structured Data Extraction
- Liu, Web Data Mining book, 2006 - Chapter 9
- Wrapper induction systems:
WL2 by Cohen et al. [108]
Thresher by Hogue and Karger [241]
Softmealy by Hsu and Dung [244]
The system reported by Irmak and Suel [250]
WIEN by Kushmerick et al. [296]
Stalker by Muslea et al. [399]
IDE by Zhai and Liu [599]
- Automatic extraction systems, e.g.,
EXALG by Arasu and Garcia-Molina [26]
IEPAD by Chang and Lui [91] (semi-automatic)
RoadRunner by Crescenzi et al. [117]
The system by Lerman et al. [312]
MDR by Liu et al. [341]
NET by Liu and Zhai [351]
DeLa by Wang and Lochovsky [530]
DEPTA by Zhai and Liu [600]
The system by Zhao et al. [612]
The systems by Zhu et al. [620, 621]
Information Integration
- Liu, Web Data Mining book, 2006 - Chapter 10
- Database integration
Batini et al. [40]
Dhamankar et al. [133]
Clifton et al. [105]
Cohen [107]
Embley et al. [162]
Do and Rahm [144]
Doan and Halevy [146]
Kalfoglou and Schorlemmer [265]
Kashyap and Sheth [269]
Larson et al. [306]
Madhavan et al. [358]
Rahm and Bernstein [455]
Sheth and Larson [488]
Shvaiko and Euzenat [491]
Xu and Embley [563]
Yan et al. [566]
- Web interface integration
Dragut et al. [153, 154]
He and Chang [227, 229]
He et al. [230, 231]
Wang et al. [531]
Wu et al. [559]
Zhang et al. [609]
- Ontology integration
Agrawal and Srikant [13]
Doan et al. [147]
Gal et al. [190]
Zhang and Lee [602]
Opinion Mining
- Liu, Web Data Mining book, 2006 - Chapter 11
- Work before 2006
Andreevskaia and Bergler [22]
Beineke et al. [44]
Carenini et al. [80]
Dave et al. [122]
Gamon [191]
Gamon et al. [192]
Hatzivassiloglou and McKeown [224]
Hatzivassiloglou and Wiebe [225]
Hearst [232]
Hu and Liu [245]
Kim and Hovy [276]
Kobayashi et al. [284]
Ku et al. [291]
Liu et al. [347]
Nigam and Hurst [412]
Pang et al. [427, 428]
Popescu and Etzioni [447]
Riloff and Wiebe [462]
Turney [521]
Wiebe and Riloff [545]
Wilson et al. [548]
Yi et al. [577]
Yu and Hatzivassiloglou [584]
- Some works in 2006
Carenini et al. [81]
Eguchi and Lavrenko [159]
Jindal and Liu [255, 256]
Kaji and Kitsuregawa [264]
Kanayama and Nasukawa [267]
Kim and Hovy [277]
Ku et al. [290]
Ng et al. [409]
Stoyanov and Cardie [507]
Wiebe and Mihalcea [544]
Wilson et al. [546]
Zhuang et al. [622]
First draft by Bing Liu, Nov 29, 2006.