
CS586 Data and Web Semantics


Welcome (back) to UIC!

The following is preliminary. There may be adjustments and updates.


Semantics is an essential component of Big Data, including Web Data. The all-encompassing name is Semantic Web. You can cluster, extract, transform, or reduce data, but without associated semantics, those data are practically useless. Would you trust an engine that extracts information from some very large repository to diagnose your disease (e.g., acute compartment syndrome) and decide on the course of action (e.g., amputating your arm) without your signed consent, because to that engine amputation is as "important" as influenza?

Another very important benefit of semantics is the ability for data to be computer processable and for that processing to be scalable.

Semantic Web enables Linked Data by connecting data and knowledge across different web repositories. It is also an important tool for data integration.
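As a rough illustration of the linking idea above, the following minimal sketch models two repositories as sets of (subject, predicate, object) triples and follows a shared identifier from one to the other. Everything in it is an assumption made for illustration: the example.org URIs, the vocabulary terms, the sample data, and the helper names `describe` and `follow` are hypothetical, and a real system would use RDF tooling rather than plain Python sets.

```python
# Hypothetical identifiers; none of these URIs refer to a real vocabulary.
PATIENT = "http://example.org/clinic/patient42"
INFLUENZA = "http://example.org/terms/Influenza"
DIAGNOSED_WITH = "http://example.org/vocab/diagnosedWith"
LABEL = "http://example.org/vocab/label"
SEVERITY = "http://example.org/vocab/severity"

# Two independent "repositories", each a set of (subject, predicate, object) triples.
clinic_repo = {
    (PATIENT, DIAGNOSED_WITH, INFLUENZA),
}
terminology_repo = {
    (INFLUENZA, LABEL, "Influenza"),
    (INFLUENZA, SEVERITY, "moderate"),
}


def describe(subject, *repos):
    """Collect every (predicate, object) pair stated about `subject` in any repository."""
    return {(p, o) for repo in repos for (s, p, o) in repo if s == subject}


def follow(subject, predicate, *repos):
    """Return the objects that `subject` is linked to through `predicate`."""
    return {o for (p, o) in describe(subject, *repos) if p == predicate}


# Start in the clinic repository, then look up each diagnosis in the terminology
# repository. The shared URI is what connects the two data sets.
diagnoses = follow(PATIENT, DIAGNOSED_WITH, clinic_repo)
facts = {d: describe(d, terminology_repo) for d in diagnoses}
```

Because both repositories use the same identifier for influenza, the patient record and the terminology entry can be joined without either repository knowing about the other, which is the sense in which shared semantics supports data integration.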


Semantic Web is an important subject: the main conferences on the Web (WWW, now named The Web Conference) and on AI (AAAI, IJCAI) have a substantial component on the topic of semantics. At The Web Conference (the top conference on data, according to Google), the track on "Web content analysis, semantics and knowledge" was the largest of the 11 tracks in 2018. ISWC (International Semantic Web Conference) is the main conference for Semantic Web work.


Google has been adding semantics to their data for several years, for example in their Knowledge Graph. They have also been hiring top AI researchers (e.g., from IBM Watson and Stanford). Banks (e.g., Bank of America) have been using Semantic Web languages and techniques for many years. The healthcare industry (e.g., the Mayo Clinic) uses the Semantic Web for organizing medical and patient data, and SNOMED CT (clinical terms) uses Semantic Web techniques to provide a terminology for the Electronic Health Record. NASA hired a Semantic Web consulting company to manage the space shuttle parts inventory.

A current data specialist position at Apple lists as its first two key qualifications:

  • Strong analytical skills; Data modeling, semantic development

  • Expert at connecting to data sources and normalizing data for reporting

And for education:

  • BS required, advanced degree in Statistics and/or Computer Science a plus.


Data Science consists of the following disciplines: Data Modeling, Data Management, Data Extraction, Data Visualization, and Data Analytics. Cross-cutting disciplines are: Semantic Web, Machine Learning, Probability and Statistics. This course touches on all of these areas, with the particular aim of preparing students to conduct research in the important subjects that comprise the Semantic Web research area and to add Semantic Web techniques to other Data Science disciplines. Material will be formally covered following the textbook, and will be presented in class or assigned as reading. There may be course projects representative of current research and development in the Semantic Web area, especially designed to further the students' understanding of the main research topics. The exam will cover all of the formal material (book and research papers).

Recommended Background

Be a PhD student at UIC. This course can be used as part of the course requirements for the PhD qualifier.

As for courses, at least one, and preferably two or more, of the following at the senior/graduate level:

  • Database Management Systems

  • Data Science

  • Information Retrieval

  • Data Mining

  • Artificial Intelligence

  • Visual Analytics

  • Machine Learning

Readings and References

There is a recommended book for the course:

A Semantic Web Primer by Grigoris Antoniou, Paul Groth, Frank van Harmelen, and Rinke Hoekstra (The MIT Press, 3rd edition, 2012).

Further readings will be posted as the semester unfolds. Specific readings for the survey and projects will be suggested.


Grading

Exams (and/or quizzes): 30%

Topic survey (written component, presentation, slides, questions): 20%

Smaller projects (if any): 0-20%

Final Project (written component, presentation, slides, questions): 25-45%

Class participation: 5%

All of the above components will be judged for their quality in content (at a grad school level) and presentation (e.g., style, proper use of definitions and terminology, typos).

Course Policy

Cheating will not be tolerated in this course and the penalties imposed by the department and the university will be followed. In particular, individual work must be performed by the student alone and group projects must be performed only by the members of the group. Note that plagiarism, including copying information from the web, is a form of cheating.

Students are urged to check with the instructor on what constitutes proper and improper use of references and software, whether available in printed or electronic form, and on what constitutes proper and improper forms of collaboration and authoring. Understanding such distinctions will be extremely useful in a student's research or professional career.


Class participation is strongly encouraged and graded. Students are required to stay until the end (Friday) of exam week.

Instructor and Office Hours

Professor Isabel Cruz

Office hours: By appointment

Auxiliary mentoring (TBD)

-- ifcruz - 2018-08-30

Topic revision: r1 - 2018-08-30 - 05:00:56 - Main.ifcruz