Papers
I primarily publish in two areas of computer science: data management and human-computer interaction. Top data management venues are VLDB, SIGMOD, and CIDR. Top HCI venues for my research are UIST, CHI, and CSCW.
- Multi-Objective Agentic Rewrites for Unstructured Data Processing. Preprint. Co-first author is my undergrad mentee.
- RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines. Under revision at CHI 2026. Co-first author is my undergrad mentee.
- Task Cascades for Efficient Unstructured Data Processing. To appear at SIGMOD 2026.
- Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees. To appear at SIGMOD 2026.
- Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First. To appear at CIDR 2026.
- Steering Semantic Data Processing with DocWrangler. UIST 2025. 🏆 Best Paper Honorable Mention.
- Rethinking Dataset Discovery with DataScout. UIST 2025. I contributed as a mentor.
- DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing. VLDB 2025.
- LLM-Powered Proactive Data Systems. IEEE Data Engineering Bulletin 2025.
- Querying Templatized Document Collections with Large Language Models. ICDE 2025.
- PromptEvals: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines. NAACL 2025. Co-first author is my undergrad mentee.
- Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences. UIST 2024.
- SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines. VLDB 2024.
- What We've Learned From a Year of Building with LLMs. O'Reilly Radar.
- Building Reactive Large Language Model Pipelines with Motion. SIGMOD 2024 (Demo).
- It Took Longer Than I Was Expecting: Why Is Dataset Search Still So Hard? HILDA 2024 (Workshop on Human-in-the-Loop Data Analytics).
- Revisiting Prompt Engineering via Declarative Crowdsourcing. CIDR 2024.
- Operationalizing Machine Learning: An Interview Study. CSCW 2024.
- Towards Observability for Production Machine Learning Pipelines. VLDB 2023.
- Bolt-on, Compact, and Rapid Program Slicing for Notebooks. VLDB 2023.
- Automatic and Precise Data Validation for Machine Learning. CIKM 2023.
- Rethinking Streaming Machine Learning Evaluation. ICLR 2022: Workshop on ML Evaluation Standards.
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. NeurIPS 2020.
- Adversarial examples that fool both computer vision and time-limited humans. NIPS 2018.
- No classification without representation: Assessing geodiversity issues in open data sets for the developing world. NIPS 2017: Workshop on Machine Learning for the Developing World.