DATA 604 Data Management

Description: This course is designed to prepare students for the range of complex data challenges data practitioners face today, from optimizing relational database systems to managing big data. Students will get an overview of relational database management systems, SQL programming, and emerging Big Data NoSQL database technologies.

Prerequisite: Students must be enrolled in the Data Science Program. Other students may be admitted with instructor permission.

Course Learning Objectives: Upon completion of this course, students will have learned:

  • the fundamentals of relational and big data database systems
  • key concepts related to database design and administration
  • how to acquire and store both structured and unstructured data
  • how to write SQL queries to transform and process data
  • fundamentals of data warehouse design for relational and big data systems

References:

  • MDM: Modern Database Management, 13th Edition, Hoffer, Venkataraman & Topi, ISBN 9780134773650. This book is currently available only as an e-book, which can be ordered from http://www.mypearsonstore.com/bookstore/modern-database-management-9780134773650
  • TSQL: T-SQL Fundamentals, 3rd Edition, Itzik Ben-Gan, ISBN 1-5093-0200-X
  • PPPC: Python Parallel Programming Cookbook, 2015, Giancarlo Zaccone, ISBN 9781785289583

Recommended Software and Hardware

  • SQL Server 2017 Express Edition https://www.microsoft.com/en-us/sql-server/sql-server-editions-express
  • SQL Server Management Studio (SSMS) https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017
  • Cloudera Quickstart Sandbox https://www.cloudera.com/downloads/quickstart_vms/5-13.html. Because of the sandbox's size, it will be provided to each student on a USB drive. Please return the drive to the instructor once the sandbox is installed.
  • Anaconda Python https://www.anaconda.com/download/

Course Format and Assignments: Students will complete three homework assignments, seven online learning labs, a midterm exam, and a team project. The course incorporates a variety of hands-on labs and practical exercises to engage students and prepare them for challenges they may encounter in the workplace. Topics include relational and big data systems, writing SQL queries, and using distributed/parallel programming methods to optimize data processing.

The team project will give students an opportunity to showcase what they have learned, working in a team format similar to what they will encounter in a professional work setting.

Tentative Syllabus

  • Introduction & History of Databases
  • Computer Setup – SQL Express and Management Studio
  • Database Modeling
  • Database Administration
  • SQL Programming
  • Database Connectivity
  • ETL
  • Reporting and Visualization
  • Data Warehouse and OLAP
  • Parallel and GPU Computing
  • PyCUDA
  • Big Data and Cloud Computing
  • Hadoop Ecosystem
  • MPP and NoSQL Databases
  • Big Data Warehouse: MapReduce, Tez, and Spark
  • Team Project Preparation