## CS423: Data Mining (2/2018)

- Class meeting: 14.30-16.00 Tuesday, Friday @ CSB 202
- Instructor: Jakramate Bootkrajang
- Email: jakramate.b@cmu.ac.th
- Office hours: 16.00-17.00 Tuesday, Friday @ CSB 107

- Homework checklist
- Logistic regression lab moved to Tuesday 9 April
- Midterm scores
- No class on 15 March 2019 due to severe PM2.5 conditions
- Updated Basic concepts slides
- Welcome to the course homepage

- I. Introduction and Basic Concepts
- II. Data Preprocessing
  - Data cleaning
  - Data integration
  - Data transformation
- III. Dimensionality Reduction Techniques
  - PCA
  - Feature subset selection
  - Random projection
- IV. Mining Association Rules and Recommendation Systems
  - Association rule and frequent subgraph mining
  - Recommendation systems
- V. Classification
  - Bayesian learning
  - Linear discriminant analysis
  - Logistic regression
  - K-nearest neighbours
  - Classifier evaluation
- VI. Clustering
  - Partitional clustering: k-means, k-medoids
  - Hierarchical clustering: agglomerative, divisive

| Date | Lecture (Tue) | Date | Lecture (Fri) |
|---|---|---|---|
| 8 Jan | First hour | 11 Jan | Introduction [Slides] |
| 15 Jan | Basic concepts 1 [Slides] | 18 Jan | Basic concepts 2 |
| 22 Jan | Data preprocessing 1 [Slides] | 25 Jan | Data preprocessing 2 |
| 29 Jan | Association rule mining [Slides] [Apriori paper] | 1 Feb | Frequent subgraph mining [Slides] [FSG paper] |
| 5 Feb | Content-based recommendation [Slides] | 8 Feb | Collaborative filtering [Slides] |
| 12 Feb | Intro to Julia [Julia notebook] [MovieLens small] [MovieLens] | 15 Feb | Dimensionality reduction (PCA) [Slides] |
| 19 Feb | Makha Bucha Day (holiday) | 22 Feb | Dimensionality reduction (PCA) [2] |
| 26 Feb | PCA lab [Slides] [Dataset] [Notebook] | 1 Mar | Feature subset selection [Slides] |
| 5 Mar | Midterm exam 15.30-18.30 @ CSB100 | | |
| 12 Mar | Bayesian classification 1 [Slides] | 15 Mar | No class |
| 19 Mar | Bayesian classification 2 [Slides] | 22 Mar | Bayesian classification 3 [Slides] |
| 26 Mar | Normal Discriminant Analysis 1 [Slides] | 29 Mar | Normal Discriminant Analysis 2 [Slides] |
| 2 Apr | Logistic regression [Slides] | 5 Apr | KNN [Slides] and classifier evaluation [Slides] |
| 9 Apr | Logistic regression lab [Slides] [data] | 12 Apr | Songkran holiday |
| 16 Apr | Songkran holiday | 19 Apr | k-means, k-medoids [Slides] |
| 23 Apr | Hierarchical clustering [Slides] | 26 Apr | Hierarchical clustering [Slides] |
| 13 May | Final exam 12:00-15:00 @ CSB-TBC | | |

- Assignment 1: [Instruction] [Files] [New classifier notebook] (due 15 Mar 2019)
- Assignment 2: [Instruction] [Files] (due 29 Mar 2019)
- Assignment 3: [Instruction] (due 19 Apr 2019)
- Assignment 4: [Instruction] (due 5 May 2019)

- Midterm exam: 20%
- Final exam: 30%
- Assignments: 4 x 10% = 40%
- Quiz: 10%
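The course total is a weighted sum of the four components above. The following is a minimal sketch, not the official grading script. It assumes every component score is on a 0-100 scale and that each of the four assignments is worth 10%. The function name `course_total` and the sample scores are hypothetical.

```python
# Hypothetical sketch: combine component scores (each assumed to be on a
# 0-100 scale) into a course total using the weights listed above.
WEIGHTS = {
    "midterm": 0.20,
    "final": 0.30,
    "quiz": 0.10,
}
ASSIGNMENT_WEIGHT = 0.10  # each of the 4 assignments counts 10%
NUM_ASSIGNMENTS = 4

# Sanity check: all weights together account for 100% of the grade.
assert abs(sum(WEIGHTS.values()) + NUM_ASSIGNMENTS * ASSIGNMENT_WEIGHT - 1.0) < 1e-9


def course_total(midterm, final, quiz, assignments):
    """Return the weighted course total on a 0-100 scale, rounded to 2 decimals."""
    if len(assignments) != NUM_ASSIGNMENTS:
        raise ValueError(f"expected exactly {NUM_ASSIGNMENTS} assignment scores")
    total = (
        WEIGHTS["midterm"] * midterm
        + WEIGHTS["final"] * final
        + WEIGHTS["quiz"] * quiz
        + ASSIGNMENT_WEIGHT * sum(assignments)
    )
    return round(total, 2)
```

For example, hypothetical scores of 70 (midterm), 60 (final), 80 (quiz) and 90, 85, 100, 75 (assignments) give 14 + 18 + 8 + 35 = 75.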

- Books
  - Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
  - Data Mining: Concepts, Models, Methods, and Algorithms, Mehmed Kantardzic, Wiley-Interscience
  - Pattern Classification, Richard O. Duda, Peter E. Hart, and David G. Stork, Wiley
  - Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman [PDF]
- Lecture notes (in Thai)
  - Data mining [PDF]
- Papers
  - I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, JMLR (2003)
  - C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the Surprising Behavior of Distance Metrics in High Dimensional Space, ICDT (2001)
- Online material
  - Julia language
  - Julia language quick introduction
  - Summary of distance measures