DATA 602 Introduction to Data Analysis and Machine Learning

Description: This course provides a broad introduction to the practical side of machine learning and data analysis. Topics covered include supervised vs. unsupervised learning, decision trees, logistic regression, linear discriminant analysis, linear and non-linear regression, and support vector machines. An introduction to neural networks is provided towards the end of the class.

Prerequisite: Students must be enrolled in the Data Science Program. Other students may be admitted with instructor permission.

Course Learning Objectives: Upon completion, students will

Understand the basic concepts of machine learning, such as hypothesis spaces, probability, classifiers, dimensionality reduction, and cross-validation.
Be introduced to basic unsupervised learning methods, such as clustering.
Learn key supervised learning techniques including decision trees, linear and logistic regression, Bayesian classifiers, and support vector machines.
Be introduced to neural networks, deep learning, and reinforcement learning.
Apply the learned techniques to an analytics problem through a project.

References

Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas C. Müller and Sarah Guido (2016)
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Recommended Software
The course uses Python 3 with the following libraries: NumPy, scikit-learn (imported as sklearn), pandas, Matplotlib, and Jupyter. If you’d like to install the environment locally, Anaconda is a Python distribution that includes all of the required libraries. The recommended option is Google Colab, which is available through your UMBC account.
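For a local installation, a quick check that the listed libraries are present can save time before the first assignment. The following is a minimal sketch, not an official course script: the function name check_environment and the list COURSE_PACKAGES are hypothetical, and the package names are assumed to be the standard PyPI distribution names for the libraries listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the course libraries.
# Note that scikit-learn is imported in code as `sklearn`.
COURSE_PACKAGES = ["numpy", "scikit-learn", "pandas", "matplotlib", "jupyter"]


def check_environment(packages=COURSE_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package so missing libraries are easy to spot.
    for name, installed in check_environment().items():
        print(f"{name:<14} {installed or 'NOT INSTALLED'}")
```

Running the script in a terminal or a notebook cell prints one line per library. Any entry reported as NOT INSTALLED can then be added with the package manager of your chosen distribution. On hosted notebooks, the jupyter entry may be reported as missing even though notebooks run normally.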

Course Format and Assignments
Students will complete 3-5 homework assignments, a semester-long project, a midterm, and a final exam. The assignments will give students an opportunity to gain practical experience with specific machine learning methods. The project will give students the opportunity to practice the whole life cycle and processing pipeline of machine learning tasks for data science applications.

Tentative Syllabus
Week 1 – Course overview: What is Machine Learning?

Week 2 – Overview: Hypothesis Spaces, Linear Algebra, Probability and Statistics

Week 3 – Supervised Learning: Linear vs. Logistic Regression

Week 4 – Decision Trees and Naive Bayes

Week 5 – Model Validation: Cross-Validation, Performance Measures, Diagnosing Over/Underfitting

Week 6 – Feature Engineering: text, categorical data, binning

Week 7 – Support Vector Machines, Nearest Neighbor, Linear Discriminant Analysis

Week 8 – Bagging, Boosting and Ensemble Methods, and Random Forests

Week 9 – Experiment Design, Decisions in Model Selection, Productizing Models

Week 10 – Unsupervised learning: agglomerative, divisive, k-means, DBSCAN

Week 11 – Dimensionality reduction and visualization with principal component analysis

Week 12 – Bayesian Networks

Week 13 – Introduction to Neural Networks and Deep Learning

Week 14 – Introduction to Reinforcement Learning

Week 15 – Final Exam/Project presentations

Graduate Data Science Programs: Information Hub

College of Engineering and Information Technology
