DATA 601 Introduction to Data Science

Description: The goal of this class is to give students an introduction to, and hands-on experience with, all phases of the data science process using real data and modern tools. Topics covered include data formats, loading, and cleaning; statistical and exploratory data analysis using Python; basics of data visualization; and an introduction to scaling up for Big Data.

Prerequisite: Students must be enrolled in the Data Science Program. Other students may be admitted with instructor permission. Students are expected to have experience with Python programming.
Course Learning Objectives: Upon completion, students will
  • Understand issues relating to acquisition, cleaning and loading of data,
  • Be able to perform exploratory data analysis using Python,
  • Understand the basics of how data can be presented and visualized,
  • Understand issues involved when the analysis scales up to Big Data, and
  • Become familiar with some very fundamental ethical issues in data science.
Optional Texts (not required)
  • “Python Data Science Handbook” by Jake VanderPlas. O’Reilly Media
  • “Data Wrangling with Python: Tips and Tools to Make Your Life Easier” by Jacqueline Kazil and Katharine Jarmul. O’Reilly Media
  • “Think Like a Data Scientist: Tackle the data science process step-by-step” by Brian Godsey. Manning Publications

Please review options at the UMBC library. You do not need to buy any books.

Recommended Software and Hardware: All software used in this course is free.
  • Web browser capable of running Jupyter Notebooks.
  • Docker for running containerized applications.
  • VirtualBox for running virtual machines.
  • A laptop. Electrical outlets are available in the classroom. UMBC Wi-Fi is available.

Course Format and Assignments

  • Students will complete assigned homework, readings, essays, quizzes, two projects, and a final project. This course incorporates a variety of hands-on labs and practical exercises to engage students and prepare them for challenges they may encounter in the workplace.
  • Students will occasionally present their solutions to homework assignments in class. Projects will also involve presentations.
  • The final project will give students an opportunity to showcase what they have learned in a format similar to what they will encounter in a professional work setting.

Tentative Syllabus

  • Week 1 – Course overview and introduction to data science and Python
  • Week 2 – Basic Python programming
  • Week 3 – Introduction to Pandas and DataFrames
  • Week 4 – Data loading, cleaning, summarization
  • Week 5 – Statistical and exploratory data analysis and outlier detection
  • Week 6 – Statistical and exploratory data analysis and outlier detection (cont.)
  • Week 7 – Data visualization
  • Week 8 – SQL, NoSQL, key/value stores
  • Week 9 – Automation
  • Week 10 – Linear Regression
  • Week 11 – Data Ethics and Legality
  • Week 12 – Cloud computing
  • Week 13 – Scaling Up
  • Week 14 – Project presentations