CoAuthor: a human-AI collaborative writing dataset to improve language tools

Large language models (LMs) offer new opportunities for interface design. They have arguably advanced to the point where their output can be compared to that of a human writer, and they appear to grasp subject matter well. Recent LMs such as GPT-2 and GPT-3 can generate a wide range of prose and dialogue with unprecedented fluency. These models can also be fine-tuned to become more proficient at specific tasks, such as composing emails or conducting health consultations.

Language models can greatly help people in their writing processes. People have already started integrating these tools into their workflows, and some published posts have been written with their help. With this in mind, Stanford researchers have created CoAuthor: an interface, a dataset and an experiment all in one.

According to the researchers, these technologies work best when they complement rather than replace human writing. The goal was not to build a system that helps users write better and faster, but to support the writing process and to study where these systems succeed and fail. As users work, CoAuthor records each writing session keystroke by keystroke, building an extensive dataset. At any point while typing, the writer can press the Tab key, and the system offers five suggestions generated by GPT-3. The researchers recruited over 60 people to write over 1,440 stories and essays, each with CoAuthor's assistance.

The writer can then accept, amend or reject the suggestions as they see fit. A survey followed each writing session to rate writer satisfaction. Writers said they often appreciated CoAuthor's suggestions as new and valuable. Suggestions were sometimes ignored because they would have taken the writer in an unexpected direction. Writers also sometimes found the suggestions too repetitive or vague, adding little value to their stories or essays.

Large language models were found to help people compose text quickly, with fewer grammatical errors and a richer vocabulary. In each session, the interface gives writers a prompt (black text) alongside suggestions from GPT-3. They write freely (brown) and can request suggestions from GPT-3 (blue); they can then accept or reject suggestions and edit accepted suggestions or earlier text in any order they want (see diagram 1).
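The request, accept, amend and reject flow described above can be sketched in a few lines. This is only an illustrative sketch, not CoAuthor's implementation: the function names `request_suggestions` and `resolve` are hypothetical, the model is replaced by a stub that returns canned text instead of calling GPT-3, and the chosen suggestion is assumed to be appended at the end of the draft rather than at a cursor position.

```python
from typing import Callable, Optional

N_SUGGESTIONS = 5  # the article says each request returns five suggestions


def request_suggestions(text: str, generate: Callable[[str], str],
                        n: int = N_SUGGESTIONS) -> list[str]:
    """Ask the (stubbed) model for n candidate continuations of text."""
    return [generate(text) for _ in range(n)]


def resolve(text: str, suggestions: list[str], choice: Optional[int],
            edit: Optional[Callable[[str], str]] = None) -> str:
    """Apply the writer's decision: reject (choice=None), accept, or amend."""
    if choice is None:
        return text                    # rejected: draft is unchanged
    picked = suggestions[choice]
    if edit is not None:
        picked = edit(picked)          # amended before being kept
    return text + picked               # assumption: append at end of draft


# Stub standing in for the language model (hypothetical canned output).
canned = iter([" and smiled.", " and left.", " in silence.",
               " at dawn.", " again."])
stub = lambda text: next(canned)

draft = "She opened the door"
options = request_suggestions(draft, stub)

rejected = resolve(draft, options, None)
accepted = resolve(draft, options, 1)
amended = resolve(draft, options, 1,
                  edit=lambda s: s.replace("left", "ran"))
```

Keeping the generator as a parameter makes it easy to swap the stub for a real model call without touching the accept or reject logic.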

All interactions between the writers and the system were time-stamped and logged keystroke by keystroke. This keystroke-level replay lets researchers study the same sessions from many perspectives, using the rich, fine-grained interaction dataset to better understand the generative potential of large language models.
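To make the idea of a time-stamped, replayable log concrete, here is a minimal sketch of how a session could be reconstructed from its events. The `Event` record and its fields (`timestamp`, `kind`, `pos`, `text`, `length`) are assumptions made for illustration; the real CoAuthor log format may differ.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One logged edit (hypothetical schema, not CoAuthor's actual format)."""
    timestamp: float   # seconds since session start (assumed unit)
    kind: str          # "insert" or "delete"
    pos: int           # character offset where the edit applies
    text: str = ""     # text added by an "insert"
    length: int = 0    # number of characters removed by a "delete"


def replay(prompt: str, events: list[Event],
           until: float = float("inf")) -> str:
    """Rebuild the document as it looked at time `until`."""
    doc = prompt
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.timestamp > until:
            break
        if ev.kind == "insert":
            doc = doc[:ev.pos] + ev.text + doc[ev.pos:]
        elif ev.kind == "delete":
            doc = doc[:ev.pos] + doc[ev.pos + ev.length:]
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
    return doc


prompt = "Once upon a time,"
events = [
    Event(1.0, "insert", 17, " a fox"),
    Event(2.0, "insert", 23, " ran away."),
    Event(3.0, "delete", 23, length=10),
    Event(4.0, "insert", 23, " slept."),
]

final_text = replay(prompt, events)
midway_text = replay(prompt, events, until=2.0)
```

Because the log is ordered by timestamp, the same function yields both the final text and any intermediate state, which is what allows one session to be examined from several angles.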


Some basic statistics of the generated data:

  • Average length of stories and essays: 418 words
  • Requests for suggestions: 11.8 per writing session
  • Suggestion acceptance rate: 72.3%
  • Percentage of text written by humans: 72.6%
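Figures like these can be derived from per-session records. The sketch below shows one plausible way to compute them; the `Session` fields are hypothetical, the numbers are toy values rather than CoAuthor data, and the rates are assumed to be pooled over all sessions (averaging per-session rates would be an equally valid choice that gives different numbers).

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Per-session counts (hypothetical fields, toy values below)."""
    words: int         # length of the final text in words
    requests: int      # how many times suggestions were requested
    accepted: int      # how many of those requests ended in an acceptance
    human_chars: int   # characters typed by the writer
    model_chars: int   # characters that came from accepted suggestions


def summarize(sessions: list[Session]) -> dict[str, float]:
    """Compute pooled summary statistics over a list of sessions."""
    if not sessions:
        raise ValueError("need at least one session")
    n = len(sessions)
    total_requests = sum(s.requests for s in sessions)
    total_accepted = sum(s.accepted for s in sessions)
    total_human = sum(s.human_chars for s in sessions)
    total_chars = total_human + sum(s.model_chars for s in sessions)
    return {
        "mean_words": sum(s.words for s in sessions) / n,
        "mean_requests": total_requests / n,
        # Guard against sessions with no requests or no text at all.
        "acceptance_rate": total_accepted / total_requests if total_requests else 0.0,
        "human_share": total_human / total_chars if total_chars else 0.0,
    }


toy = [
    Session(words=400, requests=10, accepted=7, human_chars=1500, model_chars=500),
    Session(words=440, requests=14, accepted=11, human_chars=1800, model_chars=600),
]
stats = summarize(toy)
```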

The dataset and an interface for replaying writing sessions are publicly available.

In the paper, the researchers argue that understanding the generative capabilities of LMs is critical for interface design. They suggest that collecting and analyzing large interaction datasets is a viable approach because such data can cover a wide variety of interaction contexts and support different interpretations of what makes a collaboration good. They hope more researchers will build on CoAuthor and explore its potential.

This article is based on the research paper 'CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities'. All credit for this research goes to the researchers of this project. Check out the paper, blog and project.


Scott R. Banks