Can I have your data?

We’re often asked, “I have a cool idea for a project, but I need tons of NYT crossword data. Can I have yours?”

Thanks to this blog, instead of writing long explanations, I can just point people to this post. Score!

The short answer is “no.”

The long answer is, “No. Well, maybe. But probably not.”

The longer answer is that we’ve put many thousands of person-hours into collecting and correcting this data and we’d rather not just give it away, but also, we can’t. Not without explicit permission from the Times.

This raises many questions about data ownership and fair use, as well as some ethical ones. I’ll leave the latter to your spiritual advisor and your conscience, but here’s how we understand the legal issues. (I’m not a lawyer.)

XWord Info is not part of the New York Times. We operate under a contractual agreement that lets us use their data in specific ways. We have an obligation to take reasonable precautions to prevent it from leaking into the public domain. Yes, you can screen-scrape the site, and there’s nothing we can do about that, but we can’t just hand over our data files. (Please don’t screen-scrape our site!)

The “well, maybe” is there because, on occasion, the NYT has relented. Each time, though, it was for legitimate research purposes. If you get permission, you’ll have to sign a Non-Disclosure Agreement, promise to share your research results when published, and so on.

The NYT claims ownership of each puzzle and marks each with a copyright notice, which we dutifully duplicate on our site. The whole puzzle belongs to them. You can’t erase your answers, photocopy the puzzles, and sell them yourself. That much seems obvious.

But what exactly is protected? Here’s my understanding based on discussions with lawyers over the years.

  • The shape of the grid (arrangement of blocks)? No.
  • The collection of clues in a puzzle? Yes.
  • An individual clue? Uncertain. Not litigated.
  • The list of answer words in a puzzle? No.

How do blogs get away with posting complete answer grids? Is that fair use? Probably yes, but that question is now moot. Bloggers have been posting answers for well over a decade, and publishers have not taken steps to stop them.

Another common question: “Why does XWord Info only show NYT puzzles?” Years ago, I tried to get permission from some other venues. All declined. In a way, it’s too bad, but by concentrating entirely on the NYT, we’ve been able to assemble a comprehensive archive of that one publisher.

We’ll look at other issues around constructor/editor/publisher rights in a future post. Stay tuned.

Your thoughts?