Description: This course provides a broad introduction to the practical side of machine learning and data analysis. Topics covered include supervised vs. unsupervised learning, decision trees, logistic regression, linear discriminant analysis, linear and non-linear regression, and support vector machines. An introduction to neural networks is provided toward the end of the class.
Prerequisite: Students must be enrolled in the Data Science Program. Other students may be admitted with instructor permission.
Course Learning Objectives: Upon completion, students will
- Understand the basic concepts of machine learning, such as hypothesis spaces, probability, classifiers, dimensionality reduction, and cross-validation.
- Be introduced to basic unsupervised learning methods, such as clustering.
- Learn key supervised learning techniques including decision trees, linear and logistic regression, Bayesian classifiers, and support vector machines.
- Be introduced to neural networks, deep learning, and reinforcement learning.
- Apply the learned techniques to a data analytics problem through a semester project.
References
- Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Müller and Sarah Guido (2016)
- Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
Recommended Software
The course will use Python 3 with the following libraries: numpy, sklearn, pandas, matplotlib, and Jupyter. If you would like to install the environment locally, Anaconda is a Python distribution that includes all of the required libraries. The recommended option is Google Colab, which is available through your UMBC account.
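To confirm that your local Anaconda installation or Colab notebook has everything the course needs, a short sanity-check script like the one below can help. This is a minimal sketch, not part of any assignment: it imports each required library, prints its version, and fits a toy scikit-learn classifier on a built-in dataset.

```python
# Environment smoke test: import each course library and fit a toy model.
import numpy as np
import pandas as pd
import matplotlib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Print library versions so setup problems are easy to spot.
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("matplotlib:", matplotlib.__version__)
print("scikit-learn:", sklearn.__version__)

# Fit a simple classifier on the built-in iris dataset as a quick check.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

If the script runs and prints a test accuracy, the environment is ready for the homework assignments.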
Course Format and Assignments
Students will complete 3-5 homework assignments, a semester-long project, a midterm, and a final exam. The assignments give students an opportunity to gain hands-on experience with specific machine learning methods. The project gives students the opportunity to work through the full life-cycle and processing pipeline of a machine learning task for a data science application.
Tentative Syllabus
Week 1 – Course overview: What is Machine Learning?
Week 2 – Overview: Hypotheses spaces, Linear Algebra, Probability and Statistics
Week 3 – Supervised Learning: Linear vs. Logistic Regression
Week 4 – Decision Trees and Naive Bayes
Week 5 – Model Validation: Cross-Validation, Performance Measures, Diagnosing Over-/Underfitting
Week 6 – Feature Engineering: text, categorical data, binning
Week 7 – Support Vector Machines, Nearest Neighbor, Linear Discriminant Analysis
Week 8 – Bagging, Boosting and Ensemble Methods, and Random Forests
Week 9 – Experiment Design, Decisions in Model Selection, Productizing Models
Week 10 – Unsupervised learning: agglomerative and divisive clustering, k-means, DBSCAN
Week 11 – Dimensionality reduction and visualization: principal component analysis
Week 12 – Bayesian Networks
Week 13 – Introduction to Neural Networks and Deep Learning
Week 14 – Introduction to Reinforcement Learning
Week 15 – Final Exam/Project presentations