DATA 690 Special Topics: Introduction to Natural Language Processing

Course Description: This course aims to teach the use of natural language processing (NLP) as a set of methods for exploring and reasoning about text as data. The focus will be on the applied side of NLP. Students will use existing NLP methods and libraries in Python to solve textual problems. Topics include language modeling, text classification, sentiment analysis, summarization, and machine translation.
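
As a taste of the applied style of the course, the sketch below classifies short texts with an off-the-shelf Python library. It is a minimal illustration only, assuming scikit-learn is installed; the toy sentences and labels are hypothetical, not course material.

    # Minimal text-classification sketch (assumes scikit-learn is installed).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy labeled corpus (illustrative only).
    texts = [
        "I loved this movie",
        "what a great book",
        "terrible plot and acting",
        "I hated every minute",
    ]
    labels = ["pos", "pos", "neg", "neg"]

    # Bag-of-words features feeding a Naive Bayes classifier.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)

    print(model.predict(["a great movie", "a terrible book"]))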

Prerequisites: DATA 602.

References:

  • Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,” Prentice Hall, 2008 (2nd edition)
  • Christopher D. Manning and Hinrich Schütze, “Foundations of Statistical Natural Language Processing,” MIT Press, 2000
  • Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, “Introduction to Information Retrieval,” Cambridge University Press, 2008
  • Nitin Indurkhya and Fred J. Damerau, editors, “Handbook of Natural Language Processing,” CRC Press, 2010 (2nd edition)

Learning Outcomes: After this course, students should be able to

  • Understand the key concepts of NLP for describing and analyzing language
  • Describe the typical problems and processing layers in NLP
  • Analyze NLP problems to decompose them into independent components
  • Choose appropriate solutions for solving typical NLP problems (tokenizing, tagging, parsing)
  • Assess and evaluate NLP-based systems

Tentative Schedule

  • Introduction to NLP
  • Basic text processing: tokenization and segmentation; word normalization (stemming, lemmatization, morphological analyzers); regular expressions; edit distance
  • N-grams, perplexity, and methods of smoothing (a short worked sketch appears after this schedule)
  • Language models and their applications: input prediction, error correction, speech recognition, and text generation
  • Tagging: POS tagging and named entity recognition
  • Hidden Markov models and the Viterbi algorithm
  • Midterm Exam/Project
  • Text classification, sentiment analysis, and the Naive Bayes classifier
  • Performance measures: Accuracy, precision, recall, and F-measure
  • Parsing: Trees, context-free grammars, probabilistic approaches to parsing, lexicalized PCFGs, and the CKY algorithm
  • Machine Translation: Direct, transfer-based, interlingual, and statistical MT
  • Computational Semantics: Word senses and meanings; WordNet; semantic similarity measures: thesaurus-based and distributional methods.
  • Text Summarization: Extractive and abstractive summarization, multiple-document summarization, and query-based summarization
  • Unsupervised Text Summarization and Evaluation of Summarization Systems
  • Final Exam/Project
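
A short worked sketch for the language-modeling unit is given below: it estimates a bigram model with add-one (Laplace) smoothing from a toy corpus and computes perplexity on a held-out sentence. The corpus and the held-out sentence are illustrative only, and the code is a minimal plain-Python sketch rather than a required implementation.

    import math
    from collections import Counter

    # Toy training corpus (illustrative only), with sentence-boundary markers.
    corpus = [
        ["<s>", "the", "cat", "sat", "</s>"],
        ["<s>", "the", "dog", "sat", "</s>"],
        ["<s>", "the", "cat", "ran", "</s>"],
    ]

    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
    V = len(unigrams)  # vocabulary size, used by add-one smoothing

    def prob(prev, word):
        # Add-one (Laplace) smoothed bigram probability P(word | prev).
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

    def perplexity(sentence):
        # Perplexity is the exponential of the average negative log-probability per bigram.
        pairs = list(zip(sentence, sentence[1:]))
        log_prob = sum(math.log(prob(a, b)) for a, b in pairs)
        return math.exp(-log_prob / len(pairs))

    # Held-out sentence containing an unseen bigram ("dog", "ran").
    print(perplexity(["<s>", "the", "dog", "ran", "</s>"]))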