
CS423: Data Mining (1/2017)


Class meeting: 9.30-11.00 Tuesday, Friday @ CSB 202

Lecturer: Jakramate Bootkrajang
Email: jakramate.b@cmu.ac.th
Office hours: 17.00-18.00 Tuesday, Friday @ CSB 107

Announcements
Class marks (Quiz and HWs) (as of 10 Nov) [click]
Due to numerical instability, the dataset for HW2 has been changed from 'biodeg.mat' to 'boston.mat'.
Midterm scores [click]
Midterm exam: 6 October 2017, 12.00-15.00, exam room CSB 209
Good news! The deadline for assignment #1 has been extended to 24 Sep.
Welcome to the course homepage
New slides for Ch03-Preprocessing uploaded
Course outline
I. Introduction + Basic concepts
II. Data preprocessing
    Data cleaning
    Data integration
    Data transformation
III. Dimensionality Reduction Techniques
    PCA
    Feature subset selection
    Random projection
IV. Mining Association Rules and Recommendation Systems
    Association rule mining
    Recommendation systems
V. Classification
    Bayesian Learning
    Linear Discriminant Analysis
    Logistic Regression
    K-nearest neighbours
    Classifier evaluation
VI. Clustering
    Partitioning clustering -- k-means, k-medoids
    Hierarchical clustering -- agglomerative, divisive

Class schedule

Date | Lecture (Tue) | Date | Lecture (Fri)
8 Aug | First hour | 11 Aug | Introduction [Slides]
15 Aug | Basic concept 1 [Slides] | 18 Aug | Science Week
22 Aug | Basic concept 2 | 25 Aug | Data preprocessing 1 [Slides]
29 Aug | Data preprocessing 2 [Supplementary reading] | 1 Sep | Dimensionality reduction (PCA) [Slides]
5 Sep | Dimensionality reduction (PCA) [Slides] | 8 Sep | PCA Lab [Slides] [Dataset]
12 Sep | Feature subset selection and Random projection [Slides] | 15 Sep | Association rules [Slides] [Original paper]
19 Sep | Content-based recommendation [Slides] | 22 Sep | Collaborative filtering [Slides]
26 Sep | Recommendation system lab | 29 Sep | Pre-exam review
3 Oct | Midterm exam | 6 Oct | Midterm exam
10 Oct | Bayesian learning 1 [Slides] | 13 Oct | Holiday
17 Oct | Bayesian learning 2 [Slides] | 20 Oct | Linear Discriminant [Slides]
24 Oct | Discriminant Analysis [Slides] | 27 Oct | Logistic regression [Slides]
30 Oct | No class | 2 Nov | KNN [Slides]
6 Nov | Logistic Regression Lab [data] | 9 Nov | Classifier evaluation [Slides]
13 Nov | k-Means [Slides] | 16 Nov | k-Medoids
20 Nov | Hierarchical clustering 1 [Slides] | 23 Nov | Hierarchical clustering 2
27 Nov | Final exam | 30 Nov | Final exam

Assignments
Assignment 1: Instruction, Files (due 19 Sep 2017, extended to 24 Sep)
Assignment 2: Instruction, Files (due 7 Nov 2017)
Assignment 3: Instruction, Datasets (due 26 Nov 2017)
Assignment 4: Instruction (due 4 Dec 2017)

Grading
Midterm 20 %
Final 30 %
Assignments 4 x 10 = 40 %
Quiz 10 %
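
As a rough illustration of how the weights above combine, the sketch below computes an overall mark out of 100. It is not an official grade calculator: the function name `course_total`, the assumption that every component is scored as a percentage between 0 and 100, and the assumption that the four assignments are weighted equally at 10 % each are all illustrative.

```python
# Weights taken from the Grading section (percent of the overall mark).
WEIGHTS = {"midterm": 20, "final": 30, "assignments": 40, "quiz": 10}
NUM_ASSIGNMENTS = 4  # assumed equal weighting: 40 / 4 = 10 % each


def course_total(midterm, final, assignments, quiz):
    """Return the overall mark out of 100.

    Assumption: every input is a percentage score in [0, 100];
    `assignments` is a list of four such scores.
    """
    if len(assignments) != NUM_ASSIGNMENTS:
        raise ValueError("expected exactly four assignment scores")
    per_assignment = WEIGHTS["assignments"] / NUM_ASSIGNMENTS
    total = WEIGHTS["midterm"] * midterm / 100
    total += WEIGHTS["final"] * final / 100
    total += sum(per_assignment * score / 100 for score in assignments)
    total += WEIGHTS["quiz"] * quiz / 100
    return total


# The four components should account for the whole mark.
assert sum(WEIGHTS.values()) == 100

# Hypothetical scores, for illustration only.
print(course_total(midterm=70, final=80, assignments=[90, 100, 80, 70], quiz=60))
```

With these hypothetical scores the midterm contributes 14 points, the final 24, the assignments 34 and the quiz 6.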

Useful resources
Books
Data Mining, Concepts and Techniques: Jiawei Han and Micheline Kamber, MORGAN KAUFMANN.
Data Mining, Concepts, Models, Methods, and Algorithms: Mehmed Kantardzic, WILEY-INTERSCIENCE
Pattern Classification: Peter E. Hart, David G. Stork, and Richard O. Duda, WILEY
Mining of Massive Datasets: Anand Rajaraman, Jure Leskovec, Jeffrey D. Ullman [PDF]
Lecture notes (in Thai)
Data mining [PDF]
Papers
I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, JMLR (2003)
C. C. Aggarwal, A. Hinneburg, D. A. Keim, On the Surprising Behavior of Distance Metrics in High Dimensional Space, ICDT (2001)
Online Material
Julia language
Julia language quick introduction
Distance measures summarisation