Data Mining
Prerequisites
One or more upper-level undergraduate courses in algorithms and data structures, and a basic course on probability and statistics. This is a first course on data mining, and no prior knowledge of data mining or machine learning is assumed. Homework assignments will require programming in Java, which can sometimes be substituted with C++.
Textbook
There is no single textbook for the course. For each topic, we will list the relevant chapters from various books and papers, and we will make sure that you can access those book chapters. In addition, students will scribe two-page summaries of the lectures in class.
Reading Material
Chapters from the following textbooks could be useful:
- [Bis07]
- Christopher Bishop. Pattern Recognition and Machine Learning.
- [Mit97]
- T. Mitchell. Machine Learning. McGraw-Hill, 1997.
- [HTF]
- Hastie, Tibshirani, Friedman. The Elements of Statistical Learning. Springer Verlag.
- [Han00]
- Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.
- [MVS]
- Johnson and Wichern. Applied Multivariate Statistical Analysis, 3rd Edition. PHI.
Lecture Slides
These are old slides, made available as is. Most of the instruction will be done on the board, and the material covered may differ substantially from what is in these slides. Also, remember that it is not very useful to study from slides.
- Introduction
- Data warehousing and OLAP
- Overview of mining operations
- Decision tree classifiers
- Instance-based learners
- Bayesian classifiers
- Learning hyperplanes
- Meta learning
- Classifier evaluation
- KDD Cup Case study
- Clustering
- Active learning
- Duplicate elimination
- Similarity measures
- Cohen's similarity functions
- Min hash
- Set joins
- Sequence mining
- Hidden Markov Models
- Collaborative Filtering
- Association rule mining
- Surprising itemset mining
- Temporal itemset mining
- Feature selection methods
- Intrusion detection
- Forecasting (Prof. Bernard)
- Concluding remarks