The research focuses on next-generation data management and mining issues on big data, with special attention to heterogeneous information networks and social computing.

- As data size continues to grow at an exponential rate, we need to develop scalable and parallelizable algorithms to manage and mine petascale data.

- Traditional data mining focuses on record-oriented data. However, many new applications involve graph-oriented data, where the linkage relationships among entities need to be captured, analyzed, and mined.

- Data stream technology offers an alternative paradigm for continuous, real-time filtering, analysis, and mining of high-volume data that is continuously being generated and collected.

- With abundant data available from heterogeneous sources, often in different modalities (structured data, text, graph/network data, images, audio, etc.), data fusion provides great opportunities to gain further insight in knowledge discovery.

- With massive amounts of text data available on the web, scalable and fast mining techniques are needed to understand the semantics of text, which is especially challenging for Q&A systems and short text messages.

- In the big data era, data veracity is an increasing concern. For example, sensor data can be noisy and incomplete, while spam is a major problem in social network and review data.

- Advances in medical imaging technology, such as MRI, create new opportunities to mine brain images to detect neurological disorders at early stages. More generally, health care is an emerging area of big data challenges and opportunities.

- Privacy-preserving data publishing is critically needed for data sharing.

- Social, natural, and information systems usually consist of a large number of interacting components. Examples of such systems include communication and computer systems, the Internet, biological networks, transportation systems, epidemic networks, criminal rings, and hidden terrorist networks. All of these systems share an important common feature: they are "networked systems", i.e., individual agents or components interact with a specific set of other components, forming large, interconnected, and heterogeneous networks. We refer to such interconnected networks or systems generically as information networks. Information networks are ubiquitous and form a critical component of modern information infrastructure. Hidden in these networks are the answers to important questions.

Some of the main research projects are:

  • Graph and link mining
  • Heterogeneous information network mining, indexing and querying
  • Mining and fusion of heterogeneous data sources
  • Brain informatics
  • Social computing
  • Deep learning and wide fusion
  • Recommendation
  • Data stream mining
  • Text mining
  • Transfer learning
  • Crowdsourcing
  • Smart city
  • Privacy-preserving data publishing, especially on network data
  • Domain-specific mining, e.g., e-commerce, health care, IoT, Q&A, mobile applications, and location-based services

Recent Books

1. P.S. Yu, J. Han, and C. Faloutsos, "Link Mining: Models, Algorithms and Applications", Springer, 2010.

2. B. Fung, K. Wang, A. Fu, and P.S. Yu, "Privacy-Preserving Data Publishing: Concepts and Techniques", Chapman & Hall, 2010.

3. B. Long, Z. Zhang, and P.S. Yu, "Relational Data Clustering: Models, Algorithms, and Applications", Chapman & Hall, 2010.

4. H. Kargupta, J. Han, P.S. Yu, and R. Motwani, "Next Generation of Data Mining", Chapman & Hall, 2009.

5. J. Tsai and P.S. Yu, "Machine Learning in Cyber Trust: Security, Privacy, and Reliability", Springer, 2009.

6. C. Aggarwal and P.S. Yu, "Privacy-Preserving Data Mining: Models and Algorithms" (Advances in Database Systems), Springer, 2008.

7. L. Cao, P.S. Yu, C. Zhang, and H. Zhang, "Data Mining for Business Applications", Springer, 2008.

The group's projects have been funded by grants from NSF, NIH, the Google Mobile 2014 Program, and MITRE.


Copyright 2016 The Board of Trustees
of the University of