Projects

Human-originated data distributions from crowdsourced labels

Crowdsourced labels are a unique source of label errors, occupying a sweet spot between adversarial errors (too hard) and the errors produced by many synthetic models (too easy). This project uses crowdsourced labels as a source of evaluation data for large language models.

Chong, Hong, and Manning (2022). Detecting Label Errors by using Pre-Trained Language Models. EMNLP 2022.

Project Recon

Project Recon aims to champion a new approach for machine learning in the criminal justice system, with a pilot of 35,000 parole cases from California’s Board of Parole Hearings. I spearheaded this project along with my colleague Catalin Voss from 2017 to 2023. Kristen Bell now leads Project Recon.

Law Publications

Bell, Hong, Voss, Todd, Ugander, McKeown, Alvero, and Searcy (2023). Using Machine Learning to Investigate Attorneys and Commissioners at Parole-Release Hearings. Working Paper.

Bell, Hong, McKeown, and Voss (2021). The Recon Approach: A New Direction for Machine Learning in Criminal Law. Berkeley Technology Law Journal, Vol. 37.

Bell (2019). A Stone of Hope: Legal and Empirical Analysis of California Juvenile Lifer Parole Decisions. Harvard Civil Rights-Civil Liberties Law Review, Vol. 54, No. 1, Spring 2019.

Natural Language Processing

Hong, Chong, and Manning (2021). Learning from Limited Labels for Long Legal Dialogue. Natural Legal Language Processing Workshop at EMNLP 2021.

Hong, Voss, and Manning (2021). Challenges for Information Extraction from Dialogue in Criminal Law. NLP for Positive Impact at ACL 2021.

Todd, Voss, and Hong (2020). Unsupervised Anomaly Detection in Parole Hearings using Language Models. NLP and Computational Social Science at EMNLP 2020.

Current and Past Team Members

AJ Alvero, Han Lin Aung, Kristen Bell, Travis Chen, Jenny Hong, Christopher Manning, Nick McKeown, Manny Paredes, Jake Searcy, Graham Todd, Johan Ugander, Catalin Voss.

Oral History Text Analysis Project

With Natalie Jean Marine-Street and Estelle Freedman. The Oral History Text Analysis Project is a study of the history and memory of sexual violence through an analysis of over 4,000 oral history transcripts drawn from over 20 collections. I directed research on computational linguistics approaches and advised undergraduate students on building software, such as annotation tools used by other oral historians.

Graph-based multi-word embeddings

With Christopher Potts. A framework for simultaneously learning node embeddings and word embeddings over graphs in which node names are multi-word expressions conveying information that complements the graph structure.

Convex.jl

With Stephen Boyd and Madeleine Udell. An open-source modeling environment for disciplined convex programming in Julia. Convex.jl is one of the most starred Julia repositories and has been presented at ISMP, JuliaCon, HPTCDL, the Bay Area Julia User Meetup, and elsewhere. [Paper] [Documentation] [GitHub]