Projects
Human-originated data distributions from crowdsourced labels
Crowdsourced labels are a unique source of label errors, a sweet spot between adversarial errors (too hard) and the errors produced by many synthetic models (too easy). This project leverages crowdsourced labels as a source of evaluation for large language models.
Chong, Hong, and Manning (2022). Detecting Label Errors by using Pre-Trained Language Models. EMNLP 2022.
Project Recon
Project Recon aims to champion a new approach for machine learning in the criminal justice system, with a pilot of 35,000 parole cases from California’s Board of Parole Hearings. I spearheaded this project along with my colleague Catalin Voss from 2017 to 2023. Kristen Bell now leads Project Recon.
Law Publications
Bell, Hong, Voss, Todd, Ugander, McKeown, Alvero, and Searcy (2023). Using Machine Learning to Investigate Attorneys and Commissioners at Parole-Release Hearings. Working Paper.
Bell, Hong, McKeown, and Voss (2021). The Recon Approach: A New Direction for Machine Learning in Criminal Law. Berkeley Technology Law Journal, Vol. 37.
Bell (2019). A Stone of Hope: Legal and Empirical Analysis of California Juvenile Lifer Parole Decisions. Harvard Civil Rights-Civil Liberties Law Review, Vol. 54, No. 1, Spring 2019.
Natural Language Processing
Hong, Chong, and Manning (2021). Learning from Limited Labels for Long Legal Dialogue. Natural Legal Language Processing Workshop at EMNLP 2021.
Hong, Voss, and Manning (2021). Challenges for Information Extraction from Dialogue in Criminal Law. NLP for Positive Impact at ACL 2021.
Todd, Voss, and Hong (2020). Unsupervised Anomaly Detection in Parole Hearings using Language Models. NLP and Computational Social Science at EMNLP 2020.
Current and Past Team Members
AJ Alvero, Han Lin Aung, Kristen Bell, Travis Chen, Jenny Hong, Christopher Manning, Nick McKeown, Manny Paredes, Jake Searcy, Graham Todd, Johan Ugander, Catalin Voss.
Oral History Text Analysis Project
With Natalie Jean Marine-Street and Estelle Freedman. The Oral History Text Analysis Project is a study of the history and memory of sexual violence through an analysis of over 4,000 oral history transcripts drawn from over 20 collections. Directed research on computational linguistics approaches and advised undergraduate students on building software, such as annotation tools used by other oral historians.
Graph-based multi-word embeddings
With Christopher Potts. A framework for simultaneously learning node embeddings and word embeddings over graphs where node names are multi-word expressions conveying information that complements the graph structure.
Convex.jl
With Stephen Boyd and Madeleine Udell. An open-source modeling environment for disciplined convex programming in Julia. Convex.jl is one of the most starred Julia repositories and has been presented at ISMP, JuliaCon, HPTCDL, the Bay Area Julia User Meetup, and elsewhere. [Paper] [Documentation] [GitHub]