Tianze Shi Home Publications

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

Tao Yu, Rui Zhang, Heyang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki and Dragomir Radev

In EMNLP (2019)

Abstract
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.

[pdf] [arXiv]

Bibtex

@inproceedings{yu-etal-2019-cosql,
    title = "{C}o{SQL}: A Conversational Text-to-{SQL} Challenge Towards Cross-Domain Natural Language Interfaces to Databases",
    author = "Yu, Tao  and
    Zhang, Rui  and
    Er, Heyang  and
    Li, Suyi  and
    Xue, Eric  and
    Pang, Bo  and
    Lin, Xi Victoria  and
    Tan, Yi Chern  and
    Shi, Tianze  and
    Li, Zihan  and
    Jiang, Youxuan  and
    Yasunaga, Michihiro  and
    Shim, Sungrok  and
    Chen, Tao  and
    Fabbri, Alexander  and
    Li, Zifan  and
    Chen, Luyao  and
    Zhang, Yuwen  and
    Dixit, Shreya  and
    Zhang, Vincent  and
    Xiong, Caiming  and
    Socher, Richard  and
    Lasecki, Walter  and
    Radev, Dragomir",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1204",
    doi = "10.18653/v1/D19-1204",
    pages = "1962--1979",
}

Tianze Shi @ Cornell University. Built with jekyll