This page is still under construction

Data is only valuable after it has been discovered

This page is intended to collect as many publicly available datasets as we know of. We hope more exciting research will start from these data.
A general, cross-topic data collection can be found here. More collections on specific topics are listed below.

Social Network Data

  • Stanford Large Network Data Collection
    It hosts many network datasets, e.g., user relations from Twitter, Wikipedia, Amazon, and Epinions; web graphs from Google and Stanford; and citation networks from high-energy physics and US patents. These data drive interesting research and publications at top conferences including KDD, WWW, ICDM, SDM, WSDM, and NIPS.
  • Social Media Dataset
    It contains BlogCatalog data with user blogs, tags, categories, etc. It also has Flickr and YouTube data in user-network form. More social media data from the same group can be found here. These data drive the research of Dr. Huan Liu's group at Arizona State University.
  • Academic Social Network Dataset
    This page contains citation graphs with author, paper, venue, and time information. It is a good resource for research on topic models, social influence, and graph mining.



Collaborative Filtering