
Data Mining for Political and Social Sciences using R


Lecturer Dr. Andrea De Angelis
Course type Master's seminar
Code FS251550
Semester Spring Semester 2025
Offering department Political Science
Level of study Master
Date(s) Thu, 20.02.2025, 09:15 - 17:00, 3.B58
Fri, 21.02.2025, 09:15 - 17:00, 4.B51
Fri, 07.03.2025, 09:15 - 17:00, 4.B51
Thu, 13.03.2025, 09:15 - 17:00, E.509
Fri, 14.03.2025, 09:15 - 17:00, 4.B54
Workload 2 weekly contact hours per semester
Schedule Block course
Content

NOTE: Please register for this course on the UniPortal by 10th February 2025. If too many students have registered, places will be allocated by a draw, and you will be notified on 11th February whether you can attend the course. LUMACSS students are prioritized as this course is mandatory for them.

CONTENT:

Data analysis increasingly involves mining data from the Internet and using innovative tools to handle large datasets. With the rise of Large Language Models (LLMs) such as ChatGPT, data mining practices are undergoing a significant transformation. This course bridges traditional data mining techniques and the potential of LLMs, equipping students with essential skills to automate and enhance their research workflows. 

The course employs a self-learning approach in which students use LLMs to explore, learn, and apply data mining tools. Under the guidance of the instructor, students gain hands-on experience in collecting and handling web data, developing reproducible workflows, and critically evaluating LLM outputs. They build both technical and analytical skills in a collaborative learning environment.

The course is structured in three blocks:

1. An introductory block covers the essential knowledge for working with big data (basics of R programming, developing reproducible code, reporting in automated notebooks, version control with Git/GitHub, secondary datasets for social science research, and MySQL).

2. A data access block focuses on web scraping and related tools (introduction to regular expressions, HTML, and XML and JSON data structures).

3. A third block introduces more advanced data access concepts, such as API interaction, and allows students to practice with live coding sessions in class.

E-Learning https://lms.uzh.ch/url/RepositoryEntry/17675780113
Learning objectives By the end of the course, active participants will:
1. gain proficiency in data analysis, learning to analyze data efficiently and reproducibly. [Data analysis]
2. develop critical skills to evaluate LLM outputs and integrate them into research workflows. [Incorporating LLMs]
3. learn how to develop and debug complex code throughout the data analysis cycle (mining, tidying, analyzing, reporting). [Programming and statistical skills]
4. develop feasible big data research designs. [Research and analytical skills]
Prerequisites An intrinsic motivation to learn.
Language English
Restrictions Master's students only; priority is given to LUMACSS students.
Registration ***Important*** To earn credits, registration for the course via the UniPortal is mandatory. Registration is open from two weeks before until two weeks after the start of the semester. Registration and deregistration are not possible after this period. The exact registration dates can be found here: www.unilu.ch/ksf/semesterdaten
Assessment Active participation and final capstone project
Form of completion / Credits Active participation, essay (graded) / 4 credits
Notes Restriction: priority is given to LUMACSS students. If too many students from other disciplines register, a draw will decide who may remain in the course. The draw takes place on 11th February, so registration before this date is essential.
Auditors By arrangement
Contact andrea.deangelis@unimi.it
Literature - QSS: Imai, K. (2017). Quantitative Social Science: An Introduction. Princeton: Princeton University Press.
- R4DS: Wickham, H., and G. Grolemund (2014). R for Data Science. O’Reilly Media. The book is also freely available online: https://r4ds.had.co.nz/
- ADCR: Munzert et al. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. London: Wiley & Sons.