
Introduction to Machine Learning

Course type: Discipline Elective
Course code: SLS2EC241
Number of credits: 4

Course coordinator and team: Jyotirmoy Bhattacharya

Does the course connect to, build on or overlap with any other courses offered in AUD?

This course builds on the core MA Economics courses “Statistics and Data Exploration” and “Econometrics”.

Specific requirements on the part of students who can be admitted to this course: (Pre-requisites; prior knowledge level; any others – please specify)

Completion of the above core MA Economics courses or equivalent maturity in statistics and mathematics. No prior programming experience is required.

No. of students to be admitted (with justification if lower than usual cohort size is proposed):

As per University norms.

Course scheduling (semester; semester-long/half-semester course; workshop mode; seminar mode; any other – please specify):

Semester.

How does the course link with the vision of AUD?

Machine learning techniques have an important role to play in analysing the large and increasing volume of administrative and survey data on social issues that is becoming available. Proficiency in these techniques would therefore improve students' ability to develop a reasoned and evidence-based understanding of social issues, as envisaged in AUD’s vision.

How does the course link with the specific programme(s) where it is being offered?

This course further develops students' proficiency in empirical analysis by introducing them to emerging computationally intensive techniques that go beyond traditional econometrics.

Pre-requisites: An understanding of statistics and data analysis at the level of the core MA Economics courses.

Objective: This course is a first introduction to machine learning algorithms for postgraduate students of economics. Economists in both academia and industry are handling ever-larger datasets, with a large number of observations on a large number of variables, whose potential cannot be fully exploited by traditional econometric methods. Machine learning models hold the promise of enabling economists to make better use of these big datasets. Exposure to these models also promises to expand economists’ conceptual horizons, taking them beyond the traditional confines of small parametric statistical models. At the same time, economics has something crucial to offer to machine learning. For the past few decades economists have devoted much thought and ingenuity to causal inference, i.e. predicting the effect of interventions on a system (also known as ‘treatments’, by analogy with medical research), as opposed to merely predicting outcomes from a system kept at arm’s length. Economists are now at the forefront of combining the insights gained from this research with machine learning methods to produce techniques that can use large datasets to answer nuanced causal questions.

This course aims to introduce students of the MA Economics programme to these exciting developments. Starting with an introduction to programming in Python, the course goes on to develop hands-on skills in applying a variety of common machine learning algorithms for prediction, and finally introduces students to the emerging area of estimating causal effects using machine learning methods.

Course outcomes:

On successful completion of this course students will be able to:

  • Describe the major classes of machine learning algorithms and identify appropriate algorithms for given problems.
  • Use Python notebooks and popular libraries such as scikit-learn and keras (or their equivalents in other languages) to apply machine learning algorithms to given datasets, using CPUs and GPUs, for both prediction and causal analysis tasks.
  • Evaluate the performance of machine learning algorithms and troubleshoot common issues in the use of these models.
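As a hedged illustration of the second and third outcomes, the following is a minimal sketch of applying and evaluating models with scikit-learn. The synthetic data (200 observations on 100 features, only 5 of which affect the outcome) and the comparison of plain OLS with a cross-validated lasso are assumptions chosen for illustration, not course material. Each model is wrapped in a pipeline that standardizes the features before fitting, and performance is measured by the mean out-of-sample R² from 5-fold cross-validation.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed synthetic data: 200 observations on 100 features,
# of which only the first 5 affect the outcome.
rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ beta + rng.normal(size=n)

# Each pipeline standardizes the features and then fits a linear model.
ols = make_pipeline(StandardScaler(), LinearRegression())
# LassoCV chooses the penalty by an inner 5-fold cross-validation on the training folds.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, max_iter=10_000))

# Out-of-sample performance: mean R^2 over an outer 5-fold cross-validation.
r2_ols = cross_val_score(ols, X, y, cv=5, scoring="r2").mean()
r2_lasso = cross_val_score(lasso, X, y, cv=5, scoring="r2").mean()

print(f"OLS   mean cross-validated R^2: {r2_ols:.3f}")
print(f"Lasso mean cross-validated R^2: {r2_lasso:.3f}")
```

Because the scaler sits inside the pipeline, it is refitted on each training fold, so no information from the held-out fold leaks into preprocessing. With many irrelevant features relative to the sample size, the unregularized fit is expected to generalize worse than the lasso, which is the overfitting problem that motivates regularization.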

Brief description of modules/main modules:

Citations in square brackets such as [PDA] refer to the main reading list below.

Each session lasts 2 hours. For each session, the topic, a description, the core and additional readings, and the Python libraries used are given where applicable.

Module 1. Introduction to Python programming

This module introduces students to Python, the programming language used in this course. It does not attempt to cover all features of Python; it is restricted to those features and libraries that are most useful for data analysis.

Sessions 1-4: Introduction to the Python programming language
Description: The notion of programming languages and programming language implementations. The notebook interface. Basic data types and collections. Use of variables. Defining functions.
Core readings: [PDA], Ch. 1-3

Session 5: Multidimensional arrays
Description: Basic manipulations and computations with multidimensional arrays using the numpy library.
Core readings: [PDA], Ch. 4
Python libraries: numpy

Sessions 6-8: Data frames
Description: Basic manipulation of data frames using the pandas library.
Core readings: [PDA], Ch. 5-7
Python libraries: pandas

Session 9: Visualization
Description: Basic plotting with the matplotlib and seaborn libraries.
Core readings: [PDA], Ch. 9
Python libraries: matplotlib, seaborn

Module 2. Machine learning for prediction

This module introduces the use of machine learning for predictive tasks, currently the most developed area of machine learning. From the wide variety of models available, we pick those that dominate current applied work and that also provide a basis for causal machine learning in economics, the subject of the next module.

Session 10: Introduction to machine learning
Description: Machine learning and its relation to traditional statistical practice. Supervised vs. unsupervised learning; regression vs. classification. The bias-variance tradeoff.
Core readings: [ISL2], Ch. 1, 2, 5

Session 11: Linear models
Description: Review of linear and logistic regression.
Core readings: [ISL2], Ch. 3.1-3.2, 4.1-4.3
Python libraries: scikit-learn

Sessions 12-13: The machine learning pipeline
Description: Using machine learning pipelines to transform data, fit linear models and evaluate their performance.
Core readings: [SKT], Ch. 4

Session 14: Regularization for linear models
Description: The problem of overfitting and the need for regularization. Ridge regression and the lasso.
Core readings: [ISL2], Ch. 6.2, 6.4
Additional readings: [SKT], Ch. 4
Python libraries: scikit-learn

Session 15: Decision trees
Description: Introduction to classification and regression trees.
Core readings: [ISL2], Ch. 8.1
Additional readings: [SKT], Ch. 6
Python libraries: scikit-learn

Sessions 16-18: Ensembles of trees
Description: Bagging, random forests, boosting.
Core readings: [ISL2], Ch. 8.2
Additional readings: [SKT], Ch. 7
Python libraries: scikit-learn, xgboost, lightgbm

Sessions 19-21: Introduction to deep learning
Description: Solving basic regression and classification tasks using deep neural networks.
Core readings: [DLP], Ch. 3-4
Additional readings: [SKT], Ch. 10
Python libraries: keras

Sessions 22-23: Further deep learning
Description: Overfitting and underfitting; regularization; hyperparameter tuning.
Core readings: [DLP], Ch. 5
Additional readings: [SKT], Ch. 11
Python libraries: keras

Module 3. Machine learning for causal effects

Empirical research in economics is increasingly oriented towards estimating the causal effects of different interventions. This module looks at emerging research in econometrics on the use of machine learning methods for causal inference.

Session 24: The causal inference problem
Description: The potential outcomes framework for causal inference and the fundamental problem that individual-level causal effects cannot be observed.
Core readings: Angrist, J.D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics, Ch. 1

Sessions 25-26: Double/debiased machine learning
Description: The double/debiased machine learning framework introduced by Chernozhukov et al. (2018), which allows machine learning methods to be used in econometric models while controlling the bias these methods introduce.
Core readings: https://docs.doubleml.org/stable/guide/basics.html
Additional readings: Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018). “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal.
Python libraries: doubleml, econml

Sessions 27-28: Causal forests
Description: Using an extension of the random forest model to estimate heterogeneous causal effects.
Core readings: https://towardsdatascience.com/causal-machine-learning-for-econometrics-causal-forests-5ab3aec825a7
Additional readings: Wager, S. and Athey, S. (2018). “Estimation and inference of heterogeneous treatment effects using random forests.” Journal of the American Statistical Association, 113(523), 1228-1242.
Python libraries: econml

Note: Depending on technical developments and on student and faculty interests, other languages or libraries may be substituted for those mentioned above while covering broadly the same topics.
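For the double/debiased machine learning sessions (25-26), the following is a minimal sketch of the partialling-out form of the estimator for a partially linear model, Y = θD + g(X) + U with D = m(X) + V. It is written directly with numpy and scikit-learn rather than with the doubleml or econml packages, so that each step is visible. The simulated data-generating process, the random forest settings and the helper names simulate and dml_plr are hypothetical choices for illustration, not course material. Both nuisance functions, E[Y | X] and E[D | X], are estimated by random forests with 5-fold cross-fitting; θ is then estimated by regressing the outcome residuals on the treatment residuals, and a naive slope of Y on D is computed for comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def simulate(n=2000, p=5, theta=0.5, seed=0):
    """Hypothetical partially linear model: Y = theta*D + g(X) + U, D = m(X) + V."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    m = np.sin(X[:, 0]) + 0.5 * X[:, 1]      # E[D | X]: treatment depends on X
    g = 2.0 * np.sin(X[:, 0]) + X[:, 1]      # direct effect of X on Y (confounding)
    D = m + rng.normal(size=n)
    Y = theta * D + g + rng.normal(size=n)
    return X, D, Y


def dml_plr(X, D, Y, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    res_y = np.empty_like(Y)
    res_d = np.empty_like(D)
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        # Nuisance models are fitted on the other folds only ...
        model_y = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=seed)
        model_d = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=seed)
        model_y.fit(X[train], Y[train])          # estimates E[Y | X]
        model_d.fit(X[train], D[train])          # estimates E[D | X]
        # ... and used to form residuals on the held-out fold.
        res_y[test] = Y[test] - model_y.predict(X[test])
        res_d[test] = D[test] - model_d.predict(X[test])
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    theta = np.sum(res_d * res_y) / np.sum(res_d**2)
    # Standard error from the orthogonal (partialling-out) score.
    psi = (res_y - theta * res_d) * res_d
    se = np.sqrt(np.mean(psi**2) / np.mean(res_d**2) ** 2 / len(Y))
    return theta, se


X, D, Y = simulate()
naive = np.polyfit(D, Y, 1)[0]               # OLS slope of Y on D, ignoring X
theta_hat, se = dml_plr(X, D, Y)
print(f"naive slope:  {naive:.3f}")
print(f"DML estimate: {theta_hat:.3f} (s.e. {se:.3f})")
```

Because the same features drive both D and Y in this simulation, the naive slope is expected to be well above the true value of 0.5, while the cross-fitted estimate should be much closer to it. Cross-fitting means each residual comes from nuisance models fitted without that observation, which guards against bias from the forests overfitting; the residual-on-residual (orthogonal) form makes the estimate less sensitive to errors in the nuisance estimates.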

Main reading list:

[PDA] McKinney, W. (2022). Python for Data Analysis, 3rd ed. O’Reilly. https://wesmckinney.com/book/

[ISL2] James, G. et al. (2021). An Introduction to Statistical Learning, 2nd ed. Springer. https://www.statlearning.com/

[SKT] Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed. O’Reilly.

[DLP] Chollet, F. (2021). Deep Learning with Python, 2nd ed. Manning.

Additional reference list:

Athey, S. and Imbens, G. W. (2019). “Machine learning methods that economists should know about.” Annual Review of Economics, 11, 685-725.

Breiman, L. (2001). “Statistical modeling: The two cultures (with comments and a rejoinder by the author).” Statistical Science, 16(3), 199-231.

Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.

Imbens, G. and Athey, S. (2021). “Breiman's two cultures: A perspective from econometrics.” Observational Studies, 7(1), 127-133.

Mullainathan, S. and Spiess, J. (2017). “Machine learning: an applied econometric approach.” Journal of Economic Perspectives, 31(2), 87-106.

VanderPlas, J. (2023). Python Data Science Handbook, 2nd ed. O’Reilly.

Zerilli, J. et al. (2021). A Citizen’s Guide to Artificial Intelligence. MIT Press.

Pedagogy:

  • Instructional design: Classroom lectures supplemented by independent work by students.
  • Special needs (facilities, requirements in terms of software, studio, lab, clinic, library, classroom/others instructional space; any other – please specify): Classroom with audio-visual equipment and internet connection.
  • Expertise in AUD faculty or outside: expertise exists within the AUD faculty to teach this course.
  • Linkages with external agencies (e.g., with field-based organizations, hospital; any others): None.

Assessment structure (modes and frequency of assessments):

  • Two mini-projects: 25% each.
  • Summative project: 50%.

Pedagogy:

  • Instructional strategies: Lectures.
  • Special needs (facilities, requirements in terms of software, studio, lab, clinic, library, classroom/others instructional space; any other – please specify): Classroom equipped with projector.
  • Expertise in AUD faculty or outside: AUD faculty.
  • Linkages with external agencies (e.g., with field-based organizations, hospital; any others): NA