Spring 2020 Update [ARCHIVED CATALOG]
Mar 29, 2024

MTH (0144) 601 - Data Science


Credits: 3.00

Students learn the data design, management, and manipulation tools and processes commonly used by data scientists. Students gain an overview of the basic techniques of data science, including data analysis, statistical modeling, data engineering, relational databases, manipulation of big data, algorithms for data mining, data quality, remediation, and consistency operations.

Free Note: Open only to students in the MS in Computer Science and the MS in Mathematics.

Students will:
●    Describe what Data Science is and the skill sets needed to be a data scientist. This will be assessed by Quiz 1 and the mid-term examination.
●    Explain in basic terms what Statistical Inference means. This will be assessed by Quiz 2 and the mid-term examination.
●    Identify probability distributions commonly used as foundations for statistical modeling. This will be assessed by Quiz 3 and the mid-term examination. 
●    Fit a model to data. This will be assessed by Quiz 3 and the mid-term examination.
●    Use a programming language to carry out basic statistical modeling and analysis. This will be assessed by all course assignments. 
●    Explain the significance of exploratory data analysis (EDA) in data science. This will be assessed by Quiz 4 and the mid-term examination.
●    Apply basic tools (plots, graphs, summary statistics) to carry out EDA. This will be assessed by Quiz 5 and the mid-term examination.
●    Describe the Data Science Process and how its components interact. This will be assessed by all course assignments. 
●    Use application programming interfaces (APIs) and other tools to scrape the Web and to collect data. This will be assessed by Quiz 6 and the final examination.
●    Apply basic machine learning algorithms (Linear Regression, k-Nearest Neighbors (k-NN), k-means, Naive Bayes) to predictive modeling. This will be assessed by Quiz 7 and the final examination.
●    Explain why Linear Regression and the k-Nearest Neighbors (k-NN) algorithm are poor choices for filtering spam. This will be assessed by Quiz 8 and the final examination.
●    Explain why Naive Bayes is a better alternative for filtering spam. This will be assessed by Quiz 9 and the final examination.
●    Identify and explain fundamental mathematical and algorithmic ingredients that constitute a Recommendation Engine (dimensionality reduction, singular value decomposition, principal component analysis). This will be assessed by Quiz 10 and the final examination.
●    Explain and describe ethical and privacy issues in the conduct of data science, and apply ethical practices. This will be assessed by all course assessments.