## Teaching - Postgraduate

Course material for ID5059 Knowledge Discovery and Data Mining, Semester 2 of 2014-15. The coursework was co-developed with Carl Donovan.

Lecture notes

- Lecture 01 - Introduction - Slides - Notes - Breiman paper
- Lecture 02 - Basis Functions - Slides - Notes - Kondor paper
- Lecture 03 - Model Fit Measures - Slides - Notes - Stokes paper
- Lecture 04 - Model Selection - Slides - Notes - Efron paper
- Lecture 05 - Tree-based Methods - Slides - Notes - Wang et al. paper
- Lecture 06 - Regression and Decision Trees - Slides - Notes - Anderson et al. paper
- Lecture 08 - Regression Trees - Slides
- Lecture 09 - GLM & GAM Regression - Slides
- Lecture 10 - Classification Trees - Slides - Notes
- Lecture 11 - Decision Tree Worked Example - Slides - Notes
- Lecture 12 - Complexity and Numerics - Slides - Notes
- Lecture 13 - Neural Nets I - Slides - Notes
- Lecture 14 - Neural Nets II - Slides - Notes
- Lecture 15 - Bayesian Classification - Slides - Notes - Hsu et al. paper - Leung slides
- Lecture 16 - Classification Evaluation & ROC - Slides - Notes
- Lecture 17 - ROC, AUC & Lift - Slides - Notes
- Lecture 18 - Bootstrapping - Slides - Notes
- Lecture 19 - Bagging - Slides - Notes
- Lecture 20 - Boosting - Slides - Notes - Freund & Schapire paper
- Lecture 21 - Support Vector Machines I - Slides - Notes
- Lecture 22 - Support Vector Machines II - Slides - Notes

Practicals

- Practical 01 - Auto MPG - Spec. - Resources - Data
- Practical 02 - Purchase Probability - Specification, due dates and tips

Tutorials

- Tutorial 01 - Titanic Survival - zip file
- Tutorial 02 - SVM with linear kernel - R code
- Tutorial 02 - another SVM with linear kernel - R code
- Tutorial 02 - SVM with radial kernel - R code

The Elements of Statistical Learning by Hastie, Tibshirani & Friedman is available from Stanford University. There are other useful resources at the same location, including sample R files.

R and Data Mining: Examples and Case Studies by Yanchang Zhao. This is the PDF of a textbook containing many worked examples (in R), together with detailed explanations of the theory and the technicalities involved.

R material for non-geeks is available from the University of California at Davis.

As it does not appear to be in the old exam paper repository, I've made the 2010-20 MT5759 exam available. **Warning!** Both the structure and the content are likely to change this year; this document gives you some insight into the type of question set for Maths masters students.

Online R tutorials are available from DataCamp, Code School, and RStudio.