DATA 601 Introduction to Data Science

Description: The goal of this class is to give students an introduction to and hands-on experience with all phases of the data science process using real data and modern tools. Topics that will be covered include data formats, loading, and cleaning; statistical and exploratory data analysis using Python; basics of data visualization; and some fundamental ethical issues in data science.

Prerequisite: Students must be enrolled in the Data Science Program. Other students may be admitted with instructor permission. Students are expected to have experience with Python programming.
Course Learning Objectives: Upon completion, students will
  • Understand issues relating to acquisition, cleaning and loading of data,
  • Be able to perform exploratory data analysis using Python,
  • Understand the basics of how data can be presented and visualized,
  • Become familiar with some fundamental ethical issues in data science.
Optional Texts (not required)
  • “Python Data Science Handbook” by Jake VanderPlas. O’Reilly Media
  • “Data Wrangling with Python: Tips and Tools to Make Your Life Easier” by Jacqueline Kazil and Katharine Jarmul. O’Reilly Media
  • “Think Like a Data Scientist: Tackle the data science process step-by-step” by Brian Godsey. Manning Publications

Please review options at the UMBC library. You do not need to buy any books.

Recommended Software and Hardware: All software used in this course is free.
  • Web browser capable of running Jupyter Notebooks.
  • Docker for running containerized applications.
  • VirtualBox for running virtual computers.
  • A laptop. Electrical outlets are available in the classroom. UMBC Wi-Fi is available.

Course Format and Assignments

  • Students will complete assigned homework, readings, essays, quizzes, two projects, and a final project. This course incorporates a variety of hands-on labs and practical exercises to engage students and prepare them for challenges they may encounter in the workplace.
  • Students will occasionally present their solutions to homework assignments in class. Projects will also involve presentations.
  • The final project will provide students an opportunity to showcase what they have learned in a format similar to what they will encounter in a professional work setting.

Tentative Syllabus

  • Week 1 – Course overview and introduction to data science and Python
  • Week 2 – Basic Python programming
  • Week 3 – Introduction to NumPy
  • Week 4 – Introduction to Pandas and DataFrames
  • Week 5 – Object-oriented programming and automation
  • Week 6 – Data loading, cleaning, summarization
  • Week 7 – Data aggregation and transformation
  • Week 8 – Data visualization
  • Week 9 – Review of basic statistics
  • Week 10 – Statistical and exploratory data analysis and outlier detection
  • Week 11 – Linear algebra review
  • Week 12 – Linear and logistic regression
  • Week 13 – Feature selection
  • Week 14 – Data ethics
  • Week 15 – Project presentations