
Big Data and Social Computing (BDSC) Lab

Our research focuses on next-generation data management and mining for big data, with special attention to heterogeneous information networks and social computing.

Projects:

-Graph Mining: Graphs are increasingly important in modeling real-world data with complex structures. Our research on graph mining includes the following topics:

-Large graph database management: Graph search and indexing on a database of large graphs, and on a single large network [1].

-Scalable mining on large graph(s): Running machine learning and data mining algorithms on large graph(s), including community detection, link inference, and collective classification [2-13, 19-21, 28].

-Subgraph Pattern Mining: Finding and extracting useful information from graph-structured data sets (e.g., molecular structure graphs) to discover significant features [14-18].

-Social Network Mining: Powered by cloud and MapReduce infrastructure, social network platforms are gathering data on many aspects of our daily lives. Motivated by this trend, our research addresses interesting phenomena on social networks, including the following topics:

-Network structure and macro social pattern mining, such as magnet community detection and social influence evaluation [6].

-Influence propagation and social activity mining, including temporal patterns of social sharing, spam detection, and social advertising [41-44].

-Role discovery (e.g., finding the most influential nodes [29]).
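
As a rough illustration of what "finding the most influential nodes" can mean, the sketch below ranks the nodes of a small undirected graph by degree. Degree is only a simple stand-in for influence and is not claimed to be the method of the cited work; the toy edge list and the function name `top_influential` are hypothetical.

```python
from collections import Counter

def top_influential(edges, k=1):
    """Return the k highest-degree nodes of an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by degree (descending), then by node name for a deterministic order.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    return ranked[:k]

# Hypothetical toy network in which "a" is connected to every other node.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(top_influential(edges, k=2))
```

In practice, influence measures that account for propagation over the network would replace the plain degree count used here.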

-Learning from multiple data sources: Multiple related data sources containing different types of features may be available for a given task. For instance, users' profiles can be used to build recommendation systems; in addition, a model can use users' historical behaviors and social networks to infer their interests in related products. It is desirable to use all available heterogeneous data sources collectively in order to build effective learning models:

-Transfer Learning [30-40].

-Crowdsourcing [45].

-Heterogeneous Learning (e.g., a gradient boosting framework for learning with heterogeneous data [26], a method to cluster and sharpen text with side/auxiliary attributes [7, 13]).

-Multi-label Learning: Many real-world classification tasks involve multiple concepts instead of a single concept, and each data object can be assigned multiple concepts (class labels) simultaneously. Multi-label learning aims at building accurate classification models that can predict multiple concepts collectively for each object [22-28].
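
To make the multi-label setting concrete, the sketch below encodes each object's label set as a binary indicator vector and scores a prediction with Hamming loss, a common multi-label evaluation measure. The label names, the toy data, and the helper names `to_indicator` and `hamming_loss` are illustrative assumptions, not taken from the cited papers.

```python
def to_indicator(label_sets, all_labels):
    """Encode each label set as a 0/1 vector over the ordered label vocabulary."""
    return [[1 if lab in s else 0 for lab in all_labels] for s in label_sets]

def hamming_loss(y_true, y_pred):
    """Fraction of (object, label) pairs on which truth and prediction disagree."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total

# Hypothetical example: two documents, three possible topics.
labels = ["politics", "sports", "tech"]
truth = to_indicator([{"sports"}, {"politics", "tech"}], labels)
pred = to_indicator([{"sports", "tech"}, {"politics"}], labels)
print(hamming_loss(truth, pred))
```

Each object may carry several labels at once, which is why the prediction is scored per (object, label) pair rather than per object.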

-Stream mining [9, 10, 12]: Design efficient real-time algorithms for continuous data streams, especially for graph streams.
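
One generic building block for processing a stream in a single pass with bounded memory is reservoir sampling, sketched below on a stream of edges. This is a textbook technique shown under assumed inputs, not a description of the algorithms in [9, 10, 12]; the function name `reservoir_sample` and the synthetic edge stream are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Hypothetical edge stream: a path graph revealed one edge at a time.
edge_stream = ((u, u + 1) for u in range(1000))
sample = reservoir_sample(edge_stream, k=5)
print(sample)
```

The reservoir never holds more than k edges, so memory use stays constant no matter how long the stream runs.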

-Heterogeneous Information Networks: Many real-world networks, such as social networks and information systems, involve a large number of components and multiple types of entities interconnected by different types of relations. We call these networks heterogeneous information networks; they are critical for modern information infrastructure [4, 5, 19, 21].

-Mining Uncertain and Incomplete Data: Most real data today are neither certain nor complete, which poses a great challenge for applying conventional data mining methods. We aim at designing effective models to perform knowledge discovery from data with uncertainty and incompleteness [11, 20, 42].

-Review spam detection: As well-organized spammers adopt smart strategies for spamming review websites (e.g., amazon.com, yelp.com), traditional language-based and feature-extraction-based spam detection methods become less effective. We designed a time series pattern correlation method [41] and a graph-based relational reinforcement model [43] to catch the most prevalent and crafty spam reviews.

-Privacy-preserving data publishing: Methods and tools for publishing useful information while preserving data privacy.

References:

[1] Yan Xie, Philip S. Yu, “CP-index: on the efficient indexing of large graphs”, Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), 2011.

[2] Yan Xie, Philip S. Yu, “Max-Clique: A Top-down Graph-based Approach to Frequent Pattern Mining”, Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), 2010.

[3] Charu Aggarwal, Yan Xie, Philip S. Yu, “GConnect: A Connectivity Index for Massive Disk-resident Graphs”, Proceedings of the 35th International Conference on Very Large Data Bases (VLDB 2009), 2009.

[4] Charu Aggarwal, Yan Xie, Philip S. Yu, “Towards Community Detection in Locally Heterogeneous Networks”, Proceedings of the 11th SIAM International Conference on Data Mining (SDM 2011), 2011.

[5] Charu Aggarwal, Yan Xie, Philip S. Yu, “On Dynamic Link Inference in Heterogeneous Networks”, Proceedings of the 12th SIAM International Conference on Data Mining (SDM 2012), 2012.

[6] Guan Wang, Yuchen Zhao, Xiaoxiao Shi, Philip S. Yu. “Magnet Community Identification on Social Networks”, in Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2012.

[7] Charu C. Aggarwal, Yuchen Zhao, Philip S. Yu. “On Text Clustering with Side Information”, in Proceedings of the 28th International Conference on Data Engineering, 2012.

[8] Yuchen Zhao, Xiangnan Kong, Philip S. Yu. “Positive and Unlabeled Learning for Graph Classification”, in Proceedings of the 11th IEEE International Conference on Data Mining, 2011.

[9] Charu C. Aggarwal, Yuchen Zhao, Philip S. Yu. “Outlier Detection in Graph Streams”, in Proceedings of the 27th International Conference on Data Engineering, 2011.

[10] Charu C. Aggarwal, Yuchen Zhao, Philip S. Yu. “On Clustering Graph Streams”, in Proceedings of the SDM Conference, 2010.

[11] Yuchen Zhao, Charu C. Aggarwal, Philip S. Yu. “On Wavelet Decomposition of Uncertain Time Series Data Sets”, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), 2010.

[12] Charu C. Aggarwal, Yuchen Zhao, Philip S. Yu. “A Framework for Clustering Massive Graph Streams”, in Statistical Analysis and Data Mining, Volume 3, Issue 6, pages 399-416, December 2010.

[13] Charu C. Aggarwal, Yuchen Zhao, Philip S. Yu. “On the use of Side Information for Mining Text Data”, TKDE, 2012.

[14] Xiaoxiao Shi, Xiangnan Kong and Philip S. Yu. Transfer Significant Subgraphs across Graph Databases. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM'12), 2012.

[15] Xiangnan Kong, Wei Fan and Philip S. Yu. Dual Active Feature and Sample Selection for Graph Classification. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'11), 2011.

[16] Xiangnan Kong and Philip S. Yu. gMLC: a Multi-label Feature Selection Framework for Graph Classification. Knowledge and Information Systems (KAIS), 2011.

[17] Xiangnan Kong and Philip S. Yu. Multi-label Feature Selection for Graph Classification. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM'10), 2010.

[18] Xiangnan Kong and Philip S. Yu. Semi-supervised Feature Selection for Graph Classification. In Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'10), 2010.

[19] Chuan Shi, Chong Zhou, Xiangnan Kong, Philip Yu, Gang Liu. HeteRecom: A Semantic Recommendation System in Heterogeneous Networks. In Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'12) (system demo), 2012.

[20] Wangqun Lin, Xiangnan Kong, Philip Yu, Quanyuan Wu, Yan Jia and Chuan Li. Community Detection in Incomplete Information Networks. In Proceedings of the 21st International World Wide Web Conferences (WWW'12), 2012.

[21] Chuan Shi, Xiangnan Kong, Philip S. Yu, Sihong Xie, and Bin Wu. Relevance Search in Heterogeneous Networks. In Proceedings of the 15th International Conference on Extending Database Technology (EDBT'12), 2012.

[22] Chuan Shi, Xiangnan Kong, Philip S. Yu, and Bai Wang. Multi-Objective Multi-Label Classification. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM'12), 2012.

[23] Xiangnan Kong and Philip S. Yu. An Ensemble-based Approach to Fast Classification of Multi-label Data Streams. In Proceedings of the 7th IEEE International Conference on Collaborative Computing: Networking, Application and Worksharing (CollaborateCom '11), 2011.

[24] Chuan Shi, Xiangnan Kong, Philip S. Yu and Bai Wang. Multi-label Ensemble Learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD'11), 2011.

[25] Xiangnan Kong, Xiaoxiao Shi and Philip S. Yu. Multi-label Collective Classification. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM'11), 2011.

[26] Xiangnan Kong, Michael K. Ng and Zhi-Hua Zhou. Transductive Multi-label Learning via Label Set Propagation. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011.

[27] Xiangnan Kong and Philip S. Yu. gMLC: a Multi-label Feature Selection Framework for Graph Classification. Knowledge and Information Systems (KAIS), 2011.

[28] Xiangnan Kong and Philip S. Yu. Multi-label Feature Selection for Graph Classification. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM'10), 2010.

[29] Xiaoxiao Shi, Jean-Francois Paiement, David Grangier, and Philip S. Yu, "Learning from Heterogeneous Sources via Gradient Boosting Consensus", In Proceedings of the 12th SIAM International Conference on Data Mining (SDM'12), 2012.

[30] Xiaoxiao Shi, Qi Liu, Wei Fan, and Philip S. Yu, "Transfer across Completely Different Feature Spaces via Spectral Embedding", accepted by Transactions on Knowledge and Data Engineering (TKDE).

[31] Xiaoxiao Shi, Yao Li, and Philip S. Yu, "Collective Prediction with Latent Graphs", ACM Conference on Information and Knowledge Management (CIKM'11), 2011.

[32] Xiaoxiao Shi, Wei Fan, Jianping Zhang, and Philip S. Yu, "Discovering Shaker from Evolving Entities via Cascading Graph Inference", 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'11), 2011.

[33] Xiaoxiao Shi and Philip S. Yu, "Limitations of Matrix Completion via Trace Norm Minimization", SIGKDD Explorations, 12(2):16-20, 2011.

[34] Xiaoxiao Shi, Wei Fan, and Philip S. Yu, "Efficient Semi-supervised Spectral Co-clustering with Constraints", 2010 IEEE International Conference on Data Mining (ICDM'10), 2010.

[35] Xiaoxiao Shi, Qi Liu, Wei Fan, Philip S. Yu, and Ruixin Zhu, "Transfer Learning on Heterogenous Feature Spaces via Spectral Transformation", 2010 IEEE International Conference on Data Mining (ICDM'10), 2010.

[36] Xiaoxiao Shi, Kevin Chang, Vijay K. Narayanan, Vanja Josifovski and Alex J. Smola, "A Compression Framework for Generating User Profiles", 2010 ACM SIGIR workshop on feature generation and selection for information retrieval, 2010.

[37] Xiaoxiao Shi, Qi Liu, Wei Fan, Qiang Yang and Philip S. Yu, "Predictive Modeling with Heterogeneous Sources", 2010 SIAM International Conference on Data Mining (SDM'10), 2010.

[38] Xiaoxiao Shi, Wei Fan, Qiang Yang and Jiangtao Ren, "Relaxed Transfer of Different Classes via Spectral Partition", 2009 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD09), 2009.

[39] Xiaoxiao Shi, Wei Fan, and Jiangtao Ren, "Actively Transfer Domain Knowledge", 2008 European Conference on Machine Learning and Principles and Practices of Knowledge Discovery in Databases (ECML/PKDD08), 2008.

[40] Jiangtao Ren, Xiaoxiao Shi, Wei Fan, and Philip S. Yu, "Type Independent Correction of Sample Selection Bias via Structural Discovery and Re-balancing", 2008 SIAM International Conference on Data Mining (SDM'08), 2008.

[41] Sihong Xie, Guan Wang, Shuyang Lin, and Philip S. Yu, “Review Spam Detection via Time Series Pattern Discovery”, KDD'12, 2012.

[42] Sihong Xie, Guan Wang, Shuyang Lin, Philip S. Yu, “Review spam detection via time series pattern discovery”, WWW (Companion Volume) 2012: 635-636.

[43] Guan Wang, Sihong Xie, Bing Liu, and Philip S. Yu, “Review Graph based Online Store Review Spammer Detection”, ICDM'11, 2011.

[44] Guan Wang, Sihong Xie, Bing Liu, and Philip S. Yu, “Identify Online Store Review Spammers via Social Review Graph”, ACM Transactions on Intelligent Systems and Technology (TIST'11).

[45] Sihong Xie, Wei Fan, Philip S. Yu, "An Iterative and Re-weighting Framework for Rejection and Uncertainty Resolution in Crowdsourcing", In: Proceedings of SIAM International Conference on Data Mining (SDM'12), Anaheim, California, USA.

[46] Sihong Xie, Wei Fan, Olivier Verscheure, and Jiangtao Ren, "Efficient and Numerically Stable Sparse Learning", 2010 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD'2010), Barcelona, Spain.

[47] Sihong Xie, Wei Fan, Jing Peng, Olivier Verscheure, and Jiangtao Ren, "Latent Space Domain Transfer between High Dimensional Overlapping Distributions", 2009 18th International World Wide Web Conference (WWW'09), Madrid, Spain.

[48] Erheng Zhong, Sihong Xie, Wei Fan, Jiangtao Ren, Jing Peng, and Kun Zhang, "Graph-based Iterative Hybrid Feature Selection", 2008 IEEE International Conference on Data Mining (ICDM'08), Pisa, Italy.

Topic revision: r7 - 2016-04-14 - 04:18:24 - Main.caobokai
 