DS540 Data Mining / Spring 2023
Course Instructor: Prashant Shekhar, PhD
Contact:
- Email: prashant.shekhar@erau.edu
- Office: Bldg COAS Rm. 301.26
Updates
- New Lecture is up: Lecture 24 K-means clustering [Notes]
- New Lecture is up: Lecture 23 Ensemble Models II (Boosting)
- New Lecture is up: Lecture 22 Ensemble Models I (Bagging) [Notes]
- New Lecture is up: Lecture 20 Support Vector Machines III
- New Lecture is up: Lecture 19 Support Vector Machines II [Notes]
- New Lecture is up: Lecture 18 Support Vector Machines [Notes]
- New Lecture is up: Lecture 17 Imbalanced Classes
Class Details
- Class time: Tu, Th 12:45 PM to 2:00 PM
- Class venue: Bldg COAS Rm. 108
- Office hours: Tu, Th 11:30 AM to 12:30 PM
- Office hours venue: Bldg COAS Rm. 301.26
- Course description (pdf)
- Tentative schedule (pdf)
Course Description
This is a project-based course. Broadly, major topics covered in this course include:
- Approximately first 60% of the course
  - Fundamentals of Data Mining
  - Classification
- Remaining course
  - Association
  - Clustering
  - Anomaly Detection
The concepts you learn in this course can be used to solve problems across machine learning and data science. Starting from the basics of data mining, we will go deeper into classification, covering several widely used algorithms with applications in fields ranging from medicine to engineering. The remainder of the course addresses the unsupervised side of data mining. The concepts learned in that part are useful for finding and analyzing patterns in data that have no labels (classes), often because of practical constraints such as the cost of data collection.
Text Book
The study material for the course will be provided in the form of Jupyter notebooks, PDFs, and handwritten notes. Additionally, students are encouraged to refer to the textbooks mentioned below for a much deeper understanding.
- Main Text: Pang-Ning Tan, et al., Introduction to Data Mining, Pearson Education, second edition, 2019
- Additional References:
  - James, Gareth, et al. An introduction to statistical learning, with Applications in R, Second Edition, 2021.
  - Murphy, Kevin P. Machine learning: a probabilistic perspective. MIT press, 2012.
- Python and Machine Learning: Aurelien Geron, Hands-on Machine Learning with Scikit-learn, Keras & TensorFlow. O’Reilly, second edition, September 2019.
In class, I will broadly follow the excellent chapter-wise presentation slides prepared by the authors of the main text and available at Link. Additionally, I will provide notes and Jupyter notebooks related to the concepts discussed in class.
Attendance
I will take attendance in every class. I encourage you to participate in class activities, since attendance tends to correlate strongly with the course grade. Additionally, a portion of the course grade depends on class participation, which makes attendance very important.
Grading
Your course grade will be determined as follows:
- Homeworks (4): 40%
- Exams (2): 30%
- Class participation and attendance: 5%
- Project: 25%
The grading is expected to follow the standard scale:
- A: 90% - 100%
- B: 80% - 89.5%
- C: 70% - 79.5%
- D: 60% - 69.5%
- F: <60%
However, based on the performance of the entire class, I might curve the grading scale later.
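As a worked illustration of the weights and scale listed in this section, here is a minimal Python sketch that computes a weighted course percentage and maps it to a letter. It assumes each component is recorded as a percentage out of 100, that each letter band starts at its stated lower bound (so a score falling between two listed bands, such as 89.7%, receives the lower letter), and that no curve is applied. The function names and the sample scores are hypothetical.

```python
# Component weights from the grading breakdown (they sum to 1.0).
WEIGHTS = {
    "homework": 0.40,
    "exams": 0.30,
    "participation": 0.05,
    "project": 0.25,
}

# Lower bound of each letter band; assumes the uncurved standard scale.
LETTER_FLOORS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def course_percentage(scores):
    """Weighted average of component scores, each given as a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a course percentage to a letter using the uncurved scale."""
    for floor, letter in LETTER_FLOORS:
        if percentage >= floor:
            return letter
    return "F"


# Hypothetical student scores, for illustration only.
example = {"homework": 88.0, "exams": 76.0, "participation": 100.0, "project": 92.0}
total = course_percentage(example)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

If the scale is curved later, only the thresholds in `LETTER_FLOORS` would need to change.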
Exam
You will have 2 exams (tentative dates are listed in the course schedule document). Make-up exams may be allowed only for valid extenuating circumstances, and only when I am informed before the test takes place. Please see me about conflicts as soon as they arise. If you miss an exam, it is your responsibility to schedule a make-up exam with me within one week of the original exam date. After that, a make-up exam is not possible.
Project and Presentation
During the semester, you will work under supervision on a project that combines classroom material with real-world applications. This is a group project, and I will work with each group separately to identify a topic of interest and find a relevant project in that domain. I will announce project topics, guidelines, and the rubric soon.
Homework
Your homework grade will be based on 4 programming-oriented homeworks. You are required to use Python (Jupyter notebooks) to solve the homework problems, which test your ability to apply the concepts learned in class to real-life problems. Please note that homework submissions are accepted only on Canvas, not by email. The course will implement the following late submission policy:
- Late by less than 1 day, i.e. 24 hours: -20 points
- Late between 1 day and 2 days: -40 points
- Late between 2 days and 3 days: -60 points
- Late between 3 days and 4 days: -80 points
- Late by more than 4 days: Not accepted
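The late submission bands can be sketched in Python as follows. This is only an illustration under several assumptions that the policy does not state: each homework is scored out of 100 points, a submission that lands exactly on a band boundary (for example, exactly 24 hours late) falls into the later band, a penalized score never drops below 0, and a submission that is not accepted counts as 0. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def late_penalty(deadline, submitted):
    """Return the points deducted, or None if the submission is too late to accept."""
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0  # on time, no deduction
    if delay > timedelta(days=4):
        return None  # more than 4 days late: not accepted
    # Whole days elapsed select the band: 0 -> 20, 1 -> 40, 2 -> 60, 3 -> 80.
    # Exactly 4 days late is capped into the last band (assumption).
    full_days = min(delay // timedelta(days=1), 3)
    return 20 * (full_days + 1)


def adjusted_score(raw_score, deadline, submitted):
    """Apply the late penalty to a raw score out of 100 (assumed scale)."""
    penalty = late_penalty(deadline, submitted)
    if penalty is None:
        return 0  # assumption: a rejected submission earns no points
    return max(raw_score - penalty, 0)


# Hypothetical deadline and submission time, for illustration only.
deadline = datetime(2023, 3, 1, 23, 59)
submitted = deadline + timedelta(hours=30)
print(late_penalty(deadline, submitted), adjusted_score(95, deadline, submitted))
```

Under these assumptions, a submission 30 hours late falls in the "between 1 day and 2 days" band.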
Academic Integrity
Embry-Riddle Aeronautical University maintains high standards of academic honesty and integrity in higher education. To preserve academic excellence and integrity, the University prohibits academic dishonesty in any form, including, but not limited to, cheating and plagiarism. More specific definitions of these violations and their consequences are described in detail in the Dean of Students’ Honor Codes and Student Policies.
Disability Services
DSS Administration Office: Bldg 500; Contact: (386) 226-7916; email: dbdss@erau.edu
- Student Disability Services: Students with disabilities who believe that they may need accommodations in this class are encouraged to contact the Office of Disability Services. Professors cannot make appropriate disability accommodations. Students are encouraged to register with DSS at the beginning of the term to better ensure that such accommodations are implemented in a timely fashion. Accommodations are not granted until official notice is received from DSS.
- It is the responsibility of the student to notify DSS of the date and time of a test once s/he has been made aware of the scheduled test. DSS requires: (a) 2 business days minimum notification for tests and quizzes and (b) 5 business days minimum notification for final exams. Professors cannot make appropriate testing modifications without notification from DSS.