Introduction to Machine Learning

Course Type	Course Code	No. of Credits
Discipline Elective	SLS2EC241	4

Course coordinator and team: Jyotirmoy Bhattacharya

Does the course connect to, build on or overlap with any other courses offered in AUD?

This course builds on the core MA Economics courses “Statistics and Data Exploration” and “Econometrics”.

Specific requirements on the part of students who can be admitted to this course: (Pre-requisites; prior knowledge level; any others – please specify)

Completion of the above core MA Economics courses or equivalent maturity in statistics and mathematics. No prior programming experience is required.

No. of students to be admitted (with justification if lower than usual cohort size is proposed):

As per University norms.

Course scheduling (semester; semester-long/half-semester course; workshop mode; seminar mode; any other – please specify):

Semester.

How does the course link with the vision of AUD?

Machine learning techniques have an important role to play in the analysis of the large and growing volume of administrative and survey data on social issues. Proficiency in these techniques would therefore improve students’ ability to develop a reasoned and evidence-based understanding of social issues, as envisaged in AUD’s vision.

How does the course link with the specific programme(s) where it is being offered?

This course further develops students’ proficiency in empirical analysis, introducing them to emerging computationally intensive techniques that go beyond traditional econometrics.

Pre-requisites: An understanding of statistics and data analysis at the level of the core MA Economics courses.

Objective: This course is a first introduction to machine learning algorithms for postgraduate students of economics. Economists in both academia and industry are handling ever larger datasets, with many observations on many variables, whose potential cannot be fully exploited by traditional econometric methods. Machine learning models hold the promise of enabling economists to make better use of these big datasets. Exposure to these models also promises to expand economists’ conceptual horizons, taking them beyond the traditional confines of small parametric statistical models. At the same time, economics has something crucial to offer to machine learning. For the past few decades economists have devoted much thought and ingenuity to causal inference, i.e. the prediction of the effect of interventions on a system (also known as ‘treatments’, by analogy with medical research), as opposed to merely predicting outcomes from a system kept at arm’s length. Economists are now at the forefront of combining the insights gained from this research with machine learning methods to produce techniques that can use large datasets to answer nuanced causal questions.

This course aims to introduce students of the MA Economics programme to these exciting developments. Starting with an introduction to programming in the Python language, the course will go on to develop hands-on skills in applying a variety of common machine learning algorithms for prediction, and will finally introduce students to the emerging area of estimating causal effects using machine learning methods.

Course outcomes:

On successful completion of this course students will be able to:

Describe the major classes of machine learning algorithms and identify appropriate algorithms for given problems.
Use Python notebooks and popular libraries such as scikit-learn and keras (or their equivalents in other languages) to apply machine learning algorithms to given datasets, using CPUs and GPUs, for both prediction and causal analysis tasks.
Evaluate the performance of machine learning algorithms and troubleshoot common issues in the use of these models.
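As a rough illustration of the kind of prediction task described in the outcomes above, the sketch below builds a scikit-learn pipeline that standardizes the features, fits a lasso whose penalty is chosen by cross-validation, and evaluates the fitted model on a held-out test set. It assumes scikit-learn and numpy are installed; the synthetic data from `make_regression` and every parameter value are illustrative choices, not part of the syllabus.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic data (an assumption, not course material):
# 500 observations, 50 features, only 5 of which affect the outcome.
X, y = make_regression(n_samples=500, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# Hold out a test set so performance is judged on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Pipeline: standardize features, then fit a lasso whose penalty
# (alpha) is chosen by 5-fold cross-validation on the training data only.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X_train, y_train)

# make_pipeline names each step after its lower-cased class name.
lasso = model.named_steps["lassocv"]
test_mse = mean_squared_error(y_test, model.predict(X_test))
n_selected = int(np.sum(lasso.coef_ != 0))

print(f"chosen penalty (alpha): {lasso.alpha_:.3f}")
print(f"non-zero coefficients: {n_selected} of {X.shape[1]}")
print(f"test MSE: {test_mse:.1f}")
```

Keeping the scaler inside the pipeline means it is refitted on each cross-validation training fold, so no information from the validation or test data leaks into the preprocessing step.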

Brief description of modules/ Main modules:

Citations in square brackets such as [PDA] refer to the Main Reading list below.

Session (2 hours each)	Topics	Description	Core Readings	Additional Readings	Python libraries
Module 1. Introduction to Python programming This module introduces students to Python, the programming language used in this course. The module does not attempt to cover all features of Python; it is restricted to the features and libraries most useful for data analysis.
1-4	Introduction to the Python programming language.	Introduction to the notion of programming languages and programming language implementation. The notebook interface. Basic data types and collections. Use of variables. Defining functions.	[PDA], Ch. 1-3
5	Multidimensional arrays	Basic manipulations and computations with multidimensional arrays using the numpy library	[PDA], Ch. 4		numpy
6-8	Data Frames	Basic manipulation of data frames using the pandas library	[PDA], Ch. 5-7		pandas
9	Visualization	Basic plotting with the matplotlib and seaborn libraries	[PDA], Ch. 9		matplotlib, seaborn
Module 2. Machine learning for prediction This module introduces the use of machine learning for predictive tasks, currently the most developed area of machine learning. From the wide variety of models available, we pick those that dominate current applied work and that also provide a basis for causal machine learning in economics, the subject of the next module.
10	Introduction to machine learning	Introduction to machine learning and its relation to traditional statistical practice. Supervised vs. unsupervised learning, regression vs. classification. The bias-variance tradeoff.	[ISL2], Ch. 1, 2, 5
11	Linear models	Review of linear and logistic regression.	[ISL2], Ch. 3.1-3.2, 4.1-4.3		scikit-learn
12-13	The machine learning pipeline	Using machine learning pipelines to transform data, fit linear models and evaluate their performance.	[SKT], Ch. 4
14	Regularization for linear models	Introduction to the problem of overfitting and the need for regularization. Ridge regression and the lasso.	[ISL2], Ch. 6.2, 6.4; [SKT], Ch. 4		scikit-learn
15	Decision trees	Introduction to classification and regression trees.	[ISL2], Ch. 8.1; [SKT], Ch. 6		scikit-learn
16-18	Ensembles of trees	Bagging, random forests, boosting.	[ISL2], Ch. 8.2; [SKT], Ch. 7		scikit-learn, xgboost, lightgbm
19-21	Introduction to deep learning	Solving basic regression and classification tasks using deep neural networks.	[DLP], Ch. 3-4	[SKT], Ch. 10	keras
22-23	Further deep learning	Overfitting and underfitting; regularization; hyperparameter tuning.	[DLP], Ch. 5	[SKT], Ch. 11	keras
Module 3. Machine learning for causal effects Empirical research in economics is increasingly oriented towards estimating causal effects of different interventions. This module looks at the emerging research in econometrics on the use of machine learning methods in causal inference tasks.
24	The causal inference problem	Introducing the potential outcomes framework for causal inference and the fundamental problem that individual-level causal effects are unobservable.	Angrist, J.D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics, Ch. 1
25-26	Double/debiased machine learning	Using the double/debiased machine learning framework of Chernozhukov et al. (2018), which allows machine learning methods to estimate the nuisance components of econometric models while correcting for the regularization and overfitting biases these methods would otherwise introduce.	https://docs.doubleml.org/stable/guide/basics.html	Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018). “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal, 21(1), C1-C68.	doubleml, econml
27-28	Causal forests	Use of an extension of the random forest model to estimate heterogeneous causal effects.	https://towardsdatascience.com/causal-machine-learning-for-econometrics-causal-forests-5ab3aec825a7	Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.	econml

Note: Depending on technical developments and student and faculty interests, other languages or libraries may be substituted for those mentioned above, while covering broadly the same topics.
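To make the double/debiased machine learning session more concrete, here is a minimal hand-written sketch of the partialling-out estimator with cross-fitting in the partially linear model Y = θD + g(X) + ε, D = m(X) + v. It is not the doubleml or econml API; it uses only numpy and scikit-learn, and the simulated data-generating process, the true effect θ = 0.5, the random-forest learners and their settings are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)

# Simulated partially linear model (assumed for illustration):
#   D = m(X) + v,   Y = theta * D + g(X) + e,   true theta = 0.5
n, p, theta_true = 2000, 10, 0.5
X = rng.normal(size=(n, p))
m_X = np.sin(X[:, 0]) + 0.5 * X[:, 1]   # controls drive treatment (confounding)
g_X = np.cos(X[:, 0]) + X[:, 1] ** 2    # nonlinear effect of controls on outcome
D = m_X + rng.normal(size=n)
Y = theta_true * D + g_X + rng.normal(size=n)

# Cross-fitting: the nuisance functions E[Y|X] and E[D|X] are fitted on
# four folds and used to compute residuals on the held-out fifth fold.
res_Y = np.empty(n)
res_D = np.empty(n)
folds = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in folds.split(X):
    ml_l = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
    ml_m = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
    ml_l.fit(X[train_idx], Y[train_idx])
    ml_m.fit(X[train_idx], D[train_idx])
    res_Y[test_idx] = Y[test_idx] - ml_l.predict(X[test_idx])
    res_D[test_idx] = D[test_idx] - ml_m.predict(X[test_idx])

# Final stage: regress outcome residuals on treatment residuals.
theta_hat = np.sum(res_D * res_Y) / np.sum(res_D ** 2)

# Heteroskedasticity-robust standard error based on the orthogonal score.
psi = (res_Y - theta_hat * res_D) * res_D
se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(res_D ** 2)

print(f"theta_hat = {theta_hat:.3f} (true {theta_true}), s.e. = {se:.3f}")
```

Residualizing both Y and D makes the final estimate insensitive, to first order, to errors in the forest fits, and fitting each nuisance model on folds other than the one it predicts keeps overfitting from contaminating the residuals.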

Main Reading list:

[PDA] McKinney, W. (2022) Python for Data Analysis, 3rd ed., O’Reilly. https://wesmckinney.com/book/

[ISL2] James, G. et al. (2021) An Introduction to Statistical Learning, 2nd ed., Springer. https://www.statlearning.com/

[SKT] Géron, A. (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed., O’Reilly.

[DLP] Chollet, F. (2021) Deep Learning with Python, 2nd ed., Manning.

Additional reference list:

Athey, S., & Imbens, G. W. (2019). “Machine learning methods that economists should know about.” Annual Review of Economics, 11, 685-725.

Breiman, L. (2001). “Statistical modeling: The two cultures (with comments and a rejoinder by the author).” Statistical Science, 16(3), 199-231.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

Imbens, G., & Athey, S. (2021). “Breiman's two cultures: A perspective from econometrics.” Observational Studies, 7(1), 127-133.

Mullainathan, S., & Spiess, J. (2017). “Machine learning: an applied econometric approach.” Journal of Economic Perspectives, 31(2), 87-106.

VanderPlas, J. (2023) Python Data Science Handbook, 2nd ed., O’Reilly.

Zerilli, J. et al. (2021) A Citizen’s Guide to Artificial Intelligence, MIT Press.

Pedagogy:

Instructional design: Classroom lectures supplemented by independent work by students.
Special needs (facilities, requirements in terms of software, studio, lab, clinic, library, classroom/others instructional space; any other – please specify): Classroom with audio-visual equipment and internet connection.
Expertise in AUD faculty or outside: expertise exists within the AUD faculty to teach this course.
Linkages with external agencies (e.g., with field-based organizations, hospital; any others): None

Assessment structure (modes and frequency of assessments)

Two mini-projects: 25% each.
Summative project: 50%.

Pedagogy:

Instructional strategies: lectures
Special needs (facilities, requirements in terms of software, studio, lab, clinic, library, classroom/others instructional space; any other – please specify): Classroom equipped with projector.
Expertise in AUD faculty or outside: AUD Faculty
Linkages with external agencies (e.g., with field-based organizations, hospital; any others): NA