UCL Module Catalogue

Quantitative Text Analysis for Social Science (PUBL0099)

Key information

Faculty
Faculty of Social and Historical Sciences
Teaching department
Political Science
Credit value
15
Restrictions
Open to MSc students in the Department of Political Science, as well as students on the MSc Economics and MSc Economics and Finance. The A7U delivery is also open to BSc PIR students who have attended POLS0083 Quantitative Data Analysis. Co-requisite: ECON0127 Statistical Learning for Public Policy, PUBL0055 Introduction to Quantitative Methods, or equivalent. Pre-requisite for PIR students: POLS0083 Quantitative Data Analysis. Attendance of POLS0012 is sufficient to opt out of PUBL0050 and into PUBL0099.
Alternative credit options

There are no alternative credit options available for this module.

Description

Some of the most interesting and important concepts in the social sciences are observable predominantly (or sometimes even exclusively) in written form. This is because much of social life occurs through the language that we use: laws are written; speeches are spoken; historical events are transcribed; correspondence is shared; and so on.

Historically, social scientists have analysed the texts produced as the output of these social processes in qualitative ways, for two main reasons. First, for many research questions, researchers only had access to relatively small numbers of documents, meaning that qualitative approaches were both attractive and feasible for such analyses. Second, when large collections of documents were available, analysing those texts tended to be expensive and technically challenging.

In recent years, however, the volume of text-based data available to social scientists has grown at an extraordinary rate, largely thanks to the huge collections of texts that have been made available online. The increasing availability of digitised texts has also prompted social scientists to develop a wide array of new methods for analysing and extracting meaning from those texts. As a result, research in the field of “quantitative text analysis”, or “text-as-data”, has grown enormously both inside and outside academia.

This course provides an overview of text-as-data methods for social science students. The key goals of the course are:
1. to introduce the foundational models and approaches used to analyse large-scale collections of texts in modern social science;
2. to develop students’ abilities to critically evaluate existing text-as-data work in the discipline; and
3. to provide the practical skills required to conduct an original research project which uses quantitative text analysis methods.

Throughout the course, we will think carefully about what we can (and cannot) measure reliably through the quantitative analysis of text; examine how treating text as data requires making assumptions, which can be consequential; learn tools to collect and manipulate large collections of text; and develop a suite of practical computational skills for applying text-as-data analyses to data of widely varying forms. All methods on the module will be implemented using the R programming language.

The topics covered on the course include representing text as data; dictionary methods; similarity and distance metrics; measures of textual complexity; supervised learning for text; collecting text data from the internet (web-scraping); text scaling models; topic models; word-embedding models; causal inference with text; and an introduction to large language models. These topics are illustrated with a wide variety of examples from political science, economics, and public policy.

This is an advanced module intended for students who have already had some training in quantitative methods for data analysis. At least one previous course in quantitative methods, statistics, or econometrics is required for all students taking this module. Students should therefore have a working knowledge of the methods covered in typical introductory quantitative methods courses (i.e. at least to the level of PUBL0055 or equivalent). At a minimum, this should include experience with hypothesis testing and multiple linear regression.

Module deliveries for 2024/25 academic year

Intended teaching term: Term 2 Undergraduate (FHEQ Level 7)

Teaching and assessment

Mode of study
In person
Methods of assessment
100% Coursework
Mark scheme
Numeric Marks

Other information

Number of students on module in previous year
1
Module leader
Dr Jack Blumenau

Intended teaching term: Term 2 Postgraduate (FHEQ Level 7)

Teaching and assessment

Mode of study
In person
Methods of assessment
100% Coursework
Mark scheme
Numeric Marks

Other information

Number of students on module in previous year
33
Module leader
Dr Jack Blumenau

Last updated

This module description was last updated on 19th August 2024.