Description: This class gives students an introduction to, and hands-on experience with, all phases of the data science process using real data and modern tools. Topics include data formats, loading, and cleaning; statistical and exploratory data analysis using Python; the basics of data visualization; and some fundamental ethical issues in data science.
Prerequisite: Students must be enrolled in the Data Science Program. Other students may be admitted with instructor permission. Students are expected to have experience with Python programming.
Course Learning Objectives: Upon completion, students will
- Understand issues relating to acquisition, cleaning and loading of data,
- Be able to perform exploratory data analysis using Python,
- Understand the basics of how data can be presented and visualized,
- Become familiar with some fundamental ethical issues in data science.
Optional Texts (not required)
- “Python Data Science Handbook” by Jake VanderPlas. O’Reilly Media
- “Data Wrangling with Python: Tips and Tools to Make Your Life Easier” by Jacqueline Kazil and Katharine Jarmul. O’Reilly Media
- “Think Like a Data Scientist: Tackle the data science process step-by-step” by Brian Godsey. Manning Publications
Please review options at the UMBC library. You do not need to buy any books.
Recommended Software and Hardware: All software used in this course is free.
- Web browser capable of running Jupyter Notebooks.
- Docker for running containerized applications.
- VirtualBox for running virtual machines.
- A laptop. Electrical outlets are available in the classroom. UMBC Wi-Fi is available.
Course Format and Assignments
- Students will complete assigned homework, readings, essays, quizzes, two projects, and a final project. This course incorporates a variety of hands-on labs and practical exercises to engage students and prepare them for challenges they may encounter in the workplace.
- Students will occasionally present their solutions to homework assignments in class. Projects will also involve presentations.
- The final project will provide students an opportunity to showcase what they have learned in a format similar to what they will encounter in a professional work setting.
Tentative Syllabus
- Week 1 – Course overview and introduction to data science and Python
- Week 2 – Basic Python programming
- Week 3 – Introduction to NumPy
- Week 4 – Introduction to pandas and DataFrames
- Week 5 – Object-oriented programming and automation
- Week 6 – Data loading, cleaning, and summarization
- Week 7 – Data aggregation and transformation
- Week 8 – Data visualization
- Week 9 – Review of basic statistics
- Week 10 – Statistical and exploratory data analysis and outlier detection
- Week 11 – Linear algebra review
- Week 12 – Linear and logistic regression
- Week 13 – Feature selection
- Week 14 – Data ethics
- Week 15 – Project presentations