Course schedule

Topic | Estimated Date | Time
👋 Introduction + Texevier Test | 2025-03-14 (Friday) | 11:30-12:30
⛃ Databases and SQL | 2025-03-17 (Monday) | 16:30-18:30
⏰ Recess | 2025-03-21 to 2025-04-07 |
🎲 Basic Bayesian | 2025-04-07 (Monday) | 11:30-13:30
📚 NLP: Text Analysis | 2025-04-14 (Monday) | 16:30-18:30
🧑‍🏫 ML: Supervised | 2025-04-25 (Friday) | 11:30-13:30
⚇ ML: Unsupervised | 2025-05-02 (Friday) | 11:30-13:30
🤖 Deep Learning: RNN | 2025-05-05 (Monday) | 16:30-18:30
📝 Exam | 2025-06-17 to 2025-06-19 |
📋 Project | 2025-06-20 |

Materials

Session Learning Outcomes
⛃ Databases and SQL
  • Basic understanding of different database designs
  • Query a database from inside R using {dbasic}
  • Exploratory data description and analysis through SQL
  • Slides 🖼 PDF
🎲 Basic Bayesian
  • Difference between the Bayesian and frequentist approaches
  • Understand Bayesian jargon: prior, likelihood and posterior
  • Understand how MCMC sampling works
  • Bayesian estimation of the Phillips curve
  • Slides 🖼 PDF
  • Readme 🖼 Readme
  • CLICK HERE FOR RDS ECONOMIC FILE
📚 NLP: Text Analysis
🧑‍🏫 ML: Supervised
⚇ ML: Unsupervised
  • Working without Y ~ f(X)
  • Practical PCA and MCA
  • Clustering: hclust, kmeans and density-based
  • Slides 🖼 PDF
  • Website 🖼 Practical
🤖 Deep Learning: RNN
  • DL vs ML
  • Foundation of DL
  • GRU vs LSTM
  • Generative AI: Interacting via API
  • Slides 🖼 PDF
  • Website 🖼 Practical
🎯 Targets
  • Intro to Data Engineering
  • Slides 🖼 PDF

Project

Please fill in your slot: HERE

The assessment for this part of the module takes the form of a semester project. Students are expected to craft their own research question using one of four unique data sets. The submission date for the project is the 20th of June at midnight. Details of the project will be provided at the start of the semester. To pass the module, a final mark of at least 50% has to be obtained. To obtain a distinction in this module, a minimum final mark of 75% is required.

⚠️ All projects need to use `R Projects` and {texevier} (or {elsevier}) to knit to a final PDF in order to qualify for a mark greater than 50%. So make sure to practice this skill throughout! Please ask for help sooner rather than later.

🤖 What about using AI? You are welcome to use this new technology, BUT be aware that writing style will still matter. Generative models tend to be very verbose, so their output is easy to spot (especially for someone like me who uses them often). Use them as a tool to bounce ideas off, but write in your own words and from your own understanding! I will be checking to see how much of the paper was AI generated.

📋 Project
The final project is a write-up of 2000 words (think Economics Letters). The project must contain:
  • Motivation and economic question
  • A literature review section
  • Exploratory data description and analysis
  • Statistical Modeling
  • Conclusion
I will be providing four datasets to choose from. Please find samples of them below:
  • Wine Reviews
    • Dataset of Vivino reviews for Calitzdorp, South Africa for the years 2014 to 2016.
    • Available in your database in the table "wine"
    • CLICK HERE FOR RDS FILE
  • Job Ads
  • Global Macro Database
    • This dataset complements the paper by Müller, Xu, Lehbib, and Chen (2025), which introduces a panel dataset of 46 macroeconomic variables across 243 countries from historical records beginning in the year 1086 to projections through the year 2030.
    • CLICK HERE FOR LINK TO DATA
  • CBS Speeches
    • The dataset features 35,487 unique speeches from 131 central banks, for the period going from the beginning of January 1986 (date of the first online speech by a central bank) to the end of December 2023.
    • CLICK HERE FOR RDS FILE

Readings

Topic | Name | Description
Fundamentals | R for Data Science (2e) | This book will teach you how to do data science with R: You'll learn how to get your data into R, get it into the most useful structure, transform it and visualize.
Fundamentals | Basic Course (W3Schools) | SQL is a standard language for storing, manipulating and retrieving data in databases. This SQL tutorial will teach you how to use SQL in: MySQL, SQL Server, MS Access, Oracle, Postgres, and other database systems.
Bayesian | Bayesian Data Analysis | Winner of the 2016 DeGroot Prize from the International Society for Bayesian Analysis. Now in its third edition, this classic book is widely considered the leading text on Bayesian methods, lauded for its accessible, practical approach to analyzing data and solving research problems.
Bayesian | Statistical Rethinking | Winner of the 2024 DeGroot Prize awarded by the International Society for Bayesian Analysis (ISBA). This book builds your knowledge of and confidence in making inferences from data. Reflecting the need for scripting in today's model-based statistics, the book pushes you to perform step-by-step calculations that are usually automated. This unique computational approach ensures that you understand enough of the details to make reasonable choices and interpretations in your own modeling work. The text presents causal inference and generalized linear multilevel models from a simple Bayesian perspective that builds on information theory and maximum entropy.
Bayesian | Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan | Provides an accessible approach for conducting Bayesian data analysis, as material is explained clearly with concrete examples.
NLP | | This book serves as an introduction of text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what is important are the possible applications.
Machine Learning | Elements of Statistical Learning | The bible of machine and deep learning.
Machine Learning | Introduction to Statistical Learning | See Elements of Statistical Learning.
Deep Learning | Deep Learning with R | Shows you how to put deep learning into action using first principles.