This page is still under construction.
Data is only valuable after it has been discovered.
This page is intended to list as many publicly available datasets as we know of.
More exciting research can start from these data.
A combined data collection can be found here. More collections on specific topics
are listed below.
Social Network Data
- Stanford Large Network Dataset Collection
It hosts many network datasets, e.g., user relations from Twitter, Wikipedia, Amazon, and Epinions; web graphs
from Google and Stanford; and citation networks from high-energy physics and US patents. These data have driven
research published at top conferences including KDD, WWW, ICDM, SDM, WSDM, and NIPS.
- Social Media Dataset
It contains BlogCatalog data with user blogs, tags, and categories, as well as
Flickr and YouTube data in the form of user networks. More social media data from the same group can be
found here. These data support the research of
Dr. Huan Liu's group at Arizona State University.
- Academic Social Network Dataset
This page contains a citation graph with information about authors, papers, venues, and publication time. It is a good resource for research on
topic modeling, social influence, and graph mining.
Natural Language Data
- Spirit Project from CMU