The research focuses on next-generation data management and mining issues on big data, with special attention to heterogeneous information networks and social computing. As data size continues to grow at an exponential rate, we need to develop scalable algorithms to manage and mine petascale data. Traditional data mining focuses on record-oriented data. However, many new applications involve graph-oriented data, where the linkage relationships among the entities need to be captured, analyzed, and mined. Data stream technology offers an alternative paradigm for continuous real-time filtering, analysis, and mining of high-volume data that is continuously being generated and collected. Privacy-preserving data publishing is critically needed for data sharing. Cloud computing offers a new platform to manage and mine data.

Social, natural, and information systems usually consist of a large number of interacting components. Examples of such systems include communication and computer systems, the Internet, biological networks, transportation systems, epidemic networks, criminal rings, and hidden terrorist networks. All of the above systems share an important common feature: they are networked systems, i.e., individual agents or components interact with a specific set of components, forming large, interconnected, and heterogeneous networks. Without loss of generality, we call such interconnected networks or systems information networks. Clearly, information networks are ubiquitous and form a critical component of modern information infrastructure. Hidden in these networks are the answers to important questions.

Some of the main research projects are:
Graph and link mining
Heterogeneous information network mining, indexing and querying
OLAP on information networks
Data stream mining
Mining heterogeneous data sources
Privacy-preserving data publishing, especially on network data
Social computing
Transfer learning
Mining under cloud computing
Domain-specific mining
Recent Books:
1. P.S. Yu, J. Han, and C. Faloutsos, "Link Mining: Models, Algorithms and Applications", Springer, 2010.
2. B. Fung, K. Wang, A. Fu, and P.S. Yu, "Privacy-Preserving Data Publishing: Concepts and Techniques", Chapman & Hall, 2010.
3. B. Long, Z. Zhang, and P.S. Yu, "Relational Data Clustering: Models, Algorithms, and Applications", Chapman & Hall, 2010.
4. H. Kargupta, J. Han, P.S. Yu, and R. Motwani, "Next Generation of Data Mining", Chapman & Hall, 2009.
5. J. Tsai and P.S. Yu, "Machine Learning in Cyber Trust: Security, Privacy, and Reliability", Springer, 2009.
6. C. Aggarwal and P.S. Yu, "Privacy-Preserving Data Mining: Models and Algorithms" (Advances in Database Systems), Springer, 2008.
7. L. Cao, P.S. Yu, C. Zhang, and H. Zhang, "Data Mining for Business Applications", Springer, 2008.

Funding: The projects of this group have been funded by various grants from NSF, NIH, Google Mobile 2014 Program, and MITRE.