Syllabus
New York University Tandon School of Engineering
Computer Science & Engineering
Course Outline

CS-GY 6923 Machine Learning
Spring 2022, 1/24/2022 – 5/9/2022
Raman Kannan
Monday 8:00 PM, Zoom call
To contact professor: rk1750@nyu.edu
Office hours: Monday 4:00 – 6:00 PM, by appointment ONLY

Course Pre-requisites

Course Description
This course is an introduction to the field of machine learning, covering fundamental techniques for classification, regression, dimensionality reduction, clustering, and model selection. The course has three parts. In Part 1 (EDA), students will choose a dataset with at least 100K observations and 15 or more dimensions suitable for classification, perform comprehensive exploratory data analysis, and write a detailed report on the dataset and its suitability for further analysis using classification techniques. In Part 2 (Classification), students will apply three or more single classifiers and write a critical review and analysis of their performance and the plausible causes for the differences observed. In Part 3 (Ensemble), students will use three of the many ensemble methods to improve performance and write a report comparing the individual classifiers from Part 2 with the meta-classifiers from Part 3.

Throughout the 14-week semester, students will submit a weekly report enumerating progress made, problems encountered, and the plan for the following week. The instructor will introduce many supervised techniques, optimization methods, unsupervised methods, and several ensemble techniques. The instructor will share working R implementations, and students are challenged to recast them in reusable and parallel versions for the three deliverables in Parts 1, 2, and 3.

The EDA report is due the last week of February. The comparative classifier report is due the last week of March. The ensemble method report is due the last week of April. There will be two tests, one in the 3rd week of March and one in the 3rd week of April.
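The instructor's R implementations are the reference for this workflow. As an illustration only, here is a minimal base-R sketch of the kind of Part 1 (EDA) feature screening the course calls for (constant features, correlated pairs, and variance inflation factors), using the built-in iris measurements as a stand-in for the student's chosen dataset; all variable names here are placeholders, not course-provided code.

```r
# Sketch of Part 1 (EDA) feature checks in base R, using the built-in
# iris measurements as a stand-in for your own dataset.
X <- iris[, 1:4]  # numeric features only

# Constant (useless) features: a single distinct value across all rows
constant <- names(X)[sapply(X, function(col) length(unique(col)) <= 1)]

# Highly correlated feature pairs (|r| > 0.9), upper triangle only
r <- cor(X)
pairs_idx <- which(abs(r) > 0.9 & upper.tri(r), arr.ind = TRUE)

# Variance inflation factor: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
# from regressing feature j on all remaining features.
vif <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[[j]] ~ ., data = X[, -j, drop = FALSE]))$r.squared
  1 / (1 - r2)
})
names(vif) <- names(X)
```

A common rule of thumb flags VIF above 5–10 as problematic collinearity; such features are candidates for removal or consolidation before the Part 2 classification experiments.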
Course Objectives
Students are expected to attain:
1. Conceptual understanding of both supervised and unsupervised learning techniques: the statistical/algebraic foundations of these techniques, their relative strengths and weaknesses, and the theoretical and practical criteria for adopting a model.
2. Understanding of the process discipline: collect, describe, model, explore, and verify data.
3. Engineering: use an industry-standard environment and process to conduct repeatable and reproducible classification experiments.
4. Experimentation and analysis: run a prescribed process to optimize models using multiple classification algorithms, and evaluate them using standard performance metrics.
5. Delivery: summarize the results of the experiments and explain the key decisions made in designing the model and interpreting the model output.

Course Structure
This is an online course. All lectures and meetings are held over Zoom, accessed through brightspace.nyu.edu. We meet on Mondays at 8:00 PM. Office hours are Mondays 4:00 – 6:00 PM, by appointment via email by Sunday 5:00 PM. For participation, students must post something original and comment on two posts made by other students before the next class.
Readings
Required text:
URL: https://statlearning.com/
AUTHORS: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
TITLE: An Introduction to Statistical Learning (ISLR)

An optional and recommended text:
URL: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
AUTHORS: Trevor Hastie, Robert Tibshirani, Jerome Friedman
TITLE: The Elements of Statistical Learning
CODE: ESL

Online resources: https://www.quora.com/What-is-the-best-book-to-learn-ML, Stackoverflow, and other sources

Greene01: Required Reading
pages.stern.nyu.edu/~wgreene/Text/Greene-EA-7&8ed-Appendices.pdf
math.nyu.edu/~cfgranda/pages/DSGA1002_fall16/material/linear_algebra.pdf
https://www.cns.nyu.edu/~eero/math-tools/Handouts/linalg_jordan_86.pdf

Fisher01: Required Reading
Fisher's Discriminant Analysis: Linear Discriminant Analysis, Multivariate QDA [ISLR: 4.6.3, 4.6.4]

NN: Neural Nets (any introductory material will suffice)
Chapter 11, Introduction to Machine Learning, 3rd Edition, Ethem Alpaydin
Chapters 4 and 5, Miroslav Kubat (Springer)

Grade Distribution
Quality of Performance                       Letter Grade   Range %
Excellent – work is of exceptional quality   A+             97 – 100
                                             A              93 – 96.9
                                             A-             90 – 92.9
Good – work is above average                 B+             87 – 89.9
Satisfactory                                 B              83 – 86.9
Below Average                                B-             80 – 82.9
Poor                                         C+             77 – 79.9
                                             C              70 – 76.9
Failure                                      F              < 70

Not everyone can get an A or A-. There will be a distribution of grades.

Grade Calculation
Grades in this course are determined by the percentage of points obtained.

Course assignments (percentage of final grade / points):

Homework 1 (HW01 – EDA): 15% / 15 points. Due 03/05/2022. Analyze the structure of your dataset, identifying redundant, correlated, and constant (useless) features; assess variable importance and VIF.

Homework 2 (HW02 – Individual Classifiers): 20% / 20 points. Due 04/09/2022. Experiments with 3 or more supervised learning techniques. Estimate variance and bias. Tabulate critical/appropriate metrics. Write a critical review of the observed performance differences.

Engagement: 10% / 10 points. You must participate in weekly forums and discussions. Discussions are applied analysis from the texts. You must post a response by Sunday midnight (ET), submit your weekly engagement, and provide meaningful feedback on the analysis.

Homework 3 (HW03 – Ensemble Techniques): 25% / 25 points. Due 05/07/2022. Improve your work in HW02 using CV/bagging/boosting/random forest/stacking – 15%. Analyze and summarize, compare and contrast these techniques and their utility in improving performance – 15%.

Open book, open notes review test – t1, 03/25/22: 10% / 10 points.
Open book, open notes review test – t2, 04/25/22: 10% / 10 points.
Quizzes, once every two weeks: 10% / 10 points.
Total: 100% / 100 points.

Course Requirements
Submit all assignments before 11:59 PM on the due date specified above.

Course Outline
Please note that this schedule is subject to change depending on progress, questions, requests, etc.

Week 1: Machine Learning; Supervised Learning (Classification); Unsupervised Learning (Clustering). Reinforcement Learning is not in scope.
Supervised learning generic concepts applicable to all supervised learners: Occam's Razor, No Free Lunch Theorem, induction/generalization, loss function minimization (aka optimization), bias/variance, inability to learn, inability to generalize. Reading: ISLR Chap 1; 2.1, 2.2.2, 2.2.3; 4.1

Week 2: Refresher. Probability [expected value] and linear algebra [matrices, vectors]; statistics [i.i.d., CLT, LLN, descriptive/summary statistics, moments]; dataset manipulation in R; datasets; numerical/categorical data; scale and models. Reading: ISLR Chap 2; Greene01

Week 3: Supervised learning: regression, logistic regression. Antidote to overfitting – regularization techniques: Ridge (L2), Lasso (L1). Classifier performance: TP, FP, TN, FN, ROC, AUC, accuracy, specificity, sensitivity, precision, recall. Reading: ISLR 4.1, 4.3, 4.6.2

Week 4: Sigmoid function, activation functions, the perceptron (aka neural nets or ANNs); extending perceptrons with back propagation, hidden layers, and other activation functions; explainability; regularization: Ridge, Lasso, and ElasticNet. Reading: ISLR

Week 5: Uncorrelated features: Naïve Bayes; generative vs. discriminative classifiers

Week 6: Instance-based techniques (no assumptions about the distribution, aka non-parametric): distance, nearest neighbor, kNN. Reading: ISLR 4.6.5

Week 7: Curse of dimensionality; Mahalanobis distance; dimensionality reduction: LDA as feature selection, LASSO as feature selection, PCA (Cholesky, Eigen, SVD). Reading: ISLR 4.4.2; 6.3

Week 8: Decision trees (no assumptions about the distribution, aka non-parametric); entropy; information gain. Reading: ISLR Chap 8

Week 9: Support vectors (no assumptions about the distribution, aka non-parametric); support vector machines: margins, kernels, radial basis, Gaussian. Reading: ISLR Chap 9

Week 10: What causes inferior performance; techniques for performance improvement. Resampling (varying the dataset): bootstrapping, cross validation, bagging, boosting. Reading: ISLR Chap 5; 8.2

Week 11: Combining classifiers: aggregating variants of one classifier; combining heterogeneous classifiers; stacking. Reading: article (instructor will provide)

Week 12: Relevance of stacking/CV/bagging to parallelism and NFL. Reading: article (instructor will provide)

Week 13: Unsupervised learning: clustering; topic modeling in text analytics; SOMs. Reading: ISLR Chap 10; survey article (instructor will provide)

Week 14: Semi-supervised learning – leveraging the strengths of unsupervised and supervised methods. Reading: article (instructor will provide)

Week 15: Final Exam. No class.

Moses Center Statement of Disability
If you are a student with a disability who is requesting accommodations, please contact New York University's Moses Center for Students with Disabilities (CSD) at 212-998-4980 or mosescsd@nyu.edu. You must be registered with CSD to receive accommodations. Information about the Moses Center can be found at www.nyu.edu/csd. The Moses Center is located at 726 Broadway on the 3rd floor.

NYU School of Engineering Policies and Procedures on Academic Misconduct – complete Student Code of Conduct here
A. Introduction: The School of Engineering encourages academic excellence in an environment that promotes honesty, integrity, and fairness, and students at the School of Engineering are expected to exhibit those qualities in their academic work. It is through the process of submitting their own work and receiving honest feedback on that work that students may progress academically. Any act of academic dishonesty is seen as an attack upon the School and will not be tolerated. Furthermore, those who breach the School's rules on academic integrity will be sanctioned under this Policy. Students are responsible for familiarizing themselves with the School's Policy on Academic Misconduct.
B. Definition: Academic dishonesty may include misrepresentation, deception, dishonesty, or any act of falsification committed by a student to influence a grade or other academic evaluation. Academic dishonesty also includes intentionally damaging the academic work of others or assisting other students in acts of dishonesty.
Common examples of academically dishonest behavior include, but are not limited to, the following:
1. Cheating: intentionally using or attempting to use unauthorized notes, books, electronic media, or electronic communications in an exam; talking with fellow students or looking at another person's work during an exam; submitting work prepared in advance for an in-class examination; having someone take an exam for you or taking an exam for someone else; violating other rules governing the administration of examinations.
2. Fabrication: including, but not limited to, falsifying experimental data and/or citations.
3. Plagiarism: intentionally or knowingly representing the words or ideas of another as one's own in any academic exercise; failure to attribute direct quotations, paraphrases, or borrowed facts or information.
4. Unauthorized collaboration: working together on work meant to be done individually.
5. Duplicating work: presenting for grading the same work for more than one project or in more than one class, unless express and prior permission has been received from the course instructor(s) or research adviser involved.
6. Forgery: altering any academic document, including, but not limited to, academic records, admissions materials, or medical excuses.

NYU School of Engineering Policies and Procedures on Excused Absences – complete policy here
A. Introduction: An absence can be excused if you have missed no more than 10 days of school. If an illness or special circumstance has caused you to miss more than two weeks of school, please refer to the section labeled Medical Leave of Absence.
B. Students may request special accommodations for an absence to be excused in the following cases:
1. Medical reasons
2. Death in immediate family
3. Personal qualified emergencies (documentation must be provided)
4.
Religious Expression or Practice

Deanna Rayment, deanna.rayment@nyu.edu, is the Coordinator of Student Advocacy, Compliance and Student Affairs and handles excused absences. She is located in 5 MTC, LC240C and can assist you should it become necessary.

NYU School of Engineering Academic Calendar – complete list here
The last day of the final exam period is _____. Final exam dates for undergraduate courses will not be determined until later in the semester. Final exams for graduate courses will be held on the last day of class during the week of _____. If you have two final exams at the same time, report the conflict to your professors as soon as possible. Do not make any travel plans until the exam schedule is finalized. Also, please pay attention to notable dates such as Add/Drop, Withdrawal, etc. For confirmation of dates or further information, please contact Susana: sgarcia@nyu.edu

