
CS423: Data Mining (2/2018)


Class meeting: 14.30-16.00 Tuesday, Friday @ CSB 202

Instructor: Jakramate Bootkrajang
Email: jakramate.b@cmu.ac.th
Office hours: 16.00-17.00 Tuesday, Friday @ CSB 107

Announcements
Homework checklist
Logistic regression lab moved to Tue 9 Apr
Midterm scores
No class on 15 March 2019 due to severe PM2.5 condition
Basic concepts slides updated
Welcome to the course homepage
Course outline
I. Introduction + Basic concepts
II. Data preprocessing
Data cleaning
Data integration
Data transformation
III. Dimensionality Reduction Techniques
PCA
Feature subset selection
Random projection
IV. Mining Association Rules and Recommendation Systems
Association rule and frequent subgraph mining
Recommendation systems
V. Classification
Bayesian Learning
Linear Discriminant Analysis
Logistic Regression
K-nearest neighbours
Classifier evaluation
VI. Clustering
Partitioning clustering -- k-means, k-medoids
Hierarchical clustering -- agglomerative, divisive

Class schedule

Date Lecture (Tue) Date Lecture (Fri)
8 Jan First hour 11 Jan Introduction [Slides]
15 Jan Basic concept 1 [Slides] 18 Jan Basic concept 2
22 Jan Data preprocessing 1 [Slides] 25 Jan Data preprocessing 2
29 Jan Association rules mining [Slides] [Apriori paper] 1 Feb Frequent subgraph mining [Slides] [FSG paper]
5 Feb Content-based recommendation [Slides] 8 Feb Collaborative filtering [Slides]
12 Feb Intro to Julia [Julia notebook] [MovieLens small] [MovieLens] 15 Feb Dimensionality reduction (PCA) [Slides]
19 Feb Makha Bucha Day (holiday) 22 Feb Dimensionality reduction (PCA) [2]
26 Feb PCA Lab [Slides] [Dataset] [Notebook] 1 Mar Feature subset selection [Slides]
5 Mar Midterm exam 15.30-18.30 @ CSB 100
12 Mar Bayesian classification 1 [Slides] 15 Mar No class
19 Mar Bayesian classification 2 [Slides] 22 Mar Bayesian classification 3 [Slides]
26 Mar Normal Discriminant Analysis 1 [Slides] 29 Mar Normal Discriminant Analysis 2 [Slides]
2 Apr Logistic regression [Slides] 5 Apr KNN [Slides] and Classifier evaluation [Slides]
9 Apr Logistic Regression Lab [Slides] [Data] 12 Apr Songkran holiday
16 Apr Songkran holiday 19 Apr k-means, k-medoids [Slides]
23 Apr Hierarchical clustering [Slides] 26 Apr Hierarchical clustering [Slides]
13 May Final exam 12.00-15.00 @ CSB-TBC

Assignments
Assignment 1: Instruction, Files, New classifier notebook (Due 15 Mar 2019)
Assignment 2: Instruction, Files (Due 29 Mar 2019)
Assignment 3: Instruction (Due 19 Apr 2019)
Assignment 4: Instruction (Due 5 May 2019)

Grading
Midterm 20%
Final 30%
Assignments 4 x 10% = 40%
Quiz 10%

Useful resources
Books
Data Mining, Concepts and Techniques: Jiawei Han and Micheline Kamber, MORGAN KAUFMANN.
Data Mining, Concepts, Models, Methods, and Algorithms: Mehmed Kantardzic, WILEY-INTERSCIENCE
Pattern Classification: Richard O. Duda, Peter E. Hart, and David G. Stork, WILEY
Mining of Massive Datasets: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman [PDF]
Lecture notes (in Thai)
Data mining [PDF]
Papers
I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, JMLR (2003)
C. C. Aggarwal, A. Hinneburg, D. A. Keim, On the Surprising Behavior of Distance Metrics in High Dimensional Space, ICDT (2001)
Online Material
Julia language
Julia language quick introduction
Summary of distance measures