DS440 Data Mining / Fall 2022
Course Instructor: Prashant Shekhar, PhD
Contact:
- Email: prashant.shekhar@erau.edu
- Office: Bldg COAS Rm. 301.26
Updates
- New Lecture is up: Lecture 31 K-Means Clustering [Notebook]
- New Lecture is up: Lecture 27 Ensemble models 2 [Notebook]
- New Lecture is up: Lecture 26 Ensemble models [Notebook]
- New Lecture is up: Lecture 25 K-Nearest Neighbor Classifier [Notebook]
- New Lecture is up: Lecture 24 Classifier Comparison [Notebook]
- New Lecture is up: Lecture 23 Classifier Comparison [Notebook]
- New Assignment released: [Homework 3]
Class Details
- Class time: MWF 10:00 AM to 10:50 AM
- Class venue: Bldg COAS Rm. 304
- Office hours: MWF 12:00 PM to 1:00 PM
- Office hours Venue: Bldg COAS Rm. 301.26
- Course description (pdf)
- Tentative schedule (pdf)
Course Description
The goal of this course is to learn how to use advanced mathematical language and computational tools to solve real-world problems. The course covers broad interdisciplinary problems whose solutions depend heavily on data mining and visualization. Students will gain hands-on experience using Python-based software tools to analyze large data sets. The major topics covered in this course include:
- Introduction to data mining
- Python for numerical and scientific computations
- Supervised learning methods
- Unsupervised learning methods
The concepts that you learn in this course can be used to solve problems in the general area of machine learning and data science. Starting from the basics of data mining, we will go deeper into classification, covering several widely used algorithms with applications in fields ranging from medicine to engineering. The remainder of the course addresses the unsupervised component of data mining. The concepts learned in that part are useful for finding and analyzing patterns in data that lack labels (classes) because of practical constraints such as the cost of data collection.
Textbook
The study material for the course will be provided to students in the form of Jupyter notebooks, PDFs, and handwritten notes. Additionally, students are encouraged to refer to the textbooks mentioned below for a much deeper understanding.
- Main Text: Pang-Ning Tan, et al., Introduction to Data Mining, Pearson Education, second edition, 2019
- Additional References:
- James, Gareth, et al. An Introduction to Statistical Learning, with Applications in R. Second edition, 2021.
- Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
- Python and Machine Learning: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. O’Reilly, second edition, September 2019.
Attendance
I will try to take attendance in every class, and I encourage you to participate in class activities. Attendance has been found to correlate strongly with the course grade, and attending class every day ensures that you will not miss any important announcements.
Grading
Your course grade will be determined as follows:
- Homeworks: 40%
- Tests: 20%
- Class participation and attendance: 10%
- Project: 30%
The grading is expected to follow the standard scale:
- A: 90% - 100%
- B: 80% - 89.5%
- C: 70% - 79.5%
- D: 60% - 69.5%
- F: <60%
However, based on the performance of the entire class, I might curve the grading scale later.
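As a rough illustration of how the weights listed above combine, the sketch below computes a weighted course percentage in Python. The function name, dictionary keys, and example scores are hypothetical and chosen only for illustration. The sketch assumes each component is already expressed as a percentage from 0 to 100, and it does not assign a letter grade because the scale may be curved.

```python
# Component weights taken from the grading breakdown above.
# The key names are assumptions made for this illustration.
WEIGHTS = {
    "homeworks": 0.40,
    "tests": 0.20,
    "participation": 0.10,
    "project": 0.30,
}


def weighted_course_score(scores):
    """Combine per-component percentages (0-100) into one course percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example_scores = {
    "homeworks": 92.0,
    "tests": 78.0,
    "participation": 100.0,
    "project": 85.0,
}

print(round(weighted_course_score(example_scores), 2))
```

Because homeworks carry 40% of the weight, a change of 10 points in the homework average moves the course percentage by 4 points, while the same change in participation moves it by only 1 point.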
Test
You will have one main test (tentative date: November 16). A make-up test may be allowed only for valid extenuating circumstances, and only if I am informed before the test takes place. Please see me about conflicts as soon as they arise.
Project and Presentation
During the semester, you will work under supervision on a project that combines classroom material with real-world applications. The project, together with the presentation, is the final deliverable for the course. It is a group project, with teams of 2-4 students. I will work with each team separately to identify a topic of interest and find a relevant project in that domain. If you are already working on a research problem related to the topics discussed in class, that can also be considered. I will announce the project guidelines and rubric in due course.
Homeworks
Your homework grade will be based on four programming-oriented homework assignments. You are required to use Python (specifically Jupyter notebooks) to solve the homework problems. These problems test your ability to apply the concepts learned in class to real-life problems.
Academic Integrity
Embry-Riddle Aeronautical University maintains high standards of academic honesty and integrity in higher education. To preserve academic excellence and integrity, the University prohibits academic dishonesty in any form, including, but not limited to, cheating and plagiarism. More specific definitions of these violations and their consequences are described in detail in the Dean of Students’ Honor Codes and Student Policies.
Disability Services
DSS Administration Office: Bldg 500; Contact: (386) 226-7916; email: dbdss@erau.edu
- Student Disability Services: Students with disabilities who believe that they may need accommodations in this class are encouraged to contact the Office of Disability Services. Professors cannot make appropriate disability accommodations. Students are encouraged to register with DSS at the beginning of the term to better ensure that such accommodations are implemented in a timely fashion. Accommodations are not granted until official notice is received from DSS.
- It is the responsibility of the student to notify DSS of the date and time of a test once they have been made aware of the scheduled test. DSS requires: (a) a minimum of 2 business days' notification for tests and quizzes and (b) a minimum of 5 business days' notification for final exams. Professors cannot make appropriate testing modifications without notification from DSS.