GenAI developments pose existential questions for publishers

By Sonali Verma

INMA

Toronto, Ontario, Canada


Greetings, everyone! I hope you have had a chance to enjoy a bit of a break around this time of the year before plunging back into the fast-changing world of GenAI. 

I’d like to tell you today about two interesting things that have caught my eye recently. Both carry potentially serious implications for our business.

One is a threat that we perhaps did not anticipate when we paywalled our content and assumed it was safe from bots. The other, much more in line with the sunny weather we are enjoying in the northern hemisphere, is a ray of hope for any news brand trying to create new products — and new revenue streams — with AI.

Thanks for reading,

Sonali

Chatbots can replicate paywalled content

ChatGPT, Perplexity, and Grok can provide summaries of paywalled content, according to a recent investigation. They replicate content from paywalled news articles without actually accessing the content behind the paywalls.

Instead, they use publicly available fragments, such as quoted snippets, social media reposts, archived copies, and metadata, to reconstruct the full text or accurate summaries.

This approach often works even for publications such as The Atlantic, The New York Times, and The Financial Times, news brands that have invested heavily in hard paywalls that are difficult to circumvent.

Are the findings of this report by Henk Van Ess, an internationally recognised expert in online research methods, accurate? I decided to test them. The blog post I linked to above is paywalled. So, purely for the purpose of verification, I pasted the link into ChatGPT and asked for a summary.

The summary I got roughly matches what Van Ess himself has said about the investigation. Perplexity told me much the same, adding: “While older debates focused on AI training on paywalled or copyrighted content, the latest and more urgent concern is that AI bots are exploiting the public digital ecosystem to deliver premium content.”

Screenshot of query to Perplexity and its response.

As Van Ess says: We see “AI systems performing real-time searches to actively reconstruct paywalled articles from live, untrained data sources — content they’ve never encountered during their original training.

“Most chatbots have rules not to break paywalls, and say so loudly, but the internal reasoning documents obtained during this investigation show they’re systematically planning and executing these circumvention operations while maintaining plausible deniability about their methods.”

The report found AI systems successfully reconstructed about half of paywalled content across a sampling of top-tier publications, especially popular stories that have already been widely discussed online. 

OK, fine, you say. It’s only half of the content. But take a moment to think about how much content your publication produces — and how much of that the typical reader reads anyway. Is it really more than 50%?

This comes just days after Cloudflare, a prominent content delivery network, said it would block AI bots from accessing publishers’ content by default and instead ask AI companies to “pay per crawl.”

Cloudflare’s supporters include Condé Nast, Dotdash Meredith (now People), Ziff Davis, The Associated Press, Gannett, The Atlantic, Fortune, and Time. This development has the potential to hinder AI chatbots’ ability to harvest data for training and search purposes.

But as the investigation shows, the tech companies do not need to access an article directly to know what it said. Instead, they crawl social media sites and other publicly available forums, then use the power of generative AI to reassemble the gist of the articles from screenshots and reader comments.

And there is more: It appears ChatGPT can now autonomously bypass Cloudflare’s “I am not a robot” test, one of the most common security measures sites use to block automated traffic. If an LLM can now get past online verification systems, Web sites will need to reevaluate how they tell humans from machines. (To be fair, LLM agents are perhaps not robots in the strictest sense of the word, but the test is really meant to determine whether a visitor is a human or a machine.)

Screenshot from Reddit, where ChatGPT explained the process of getting around Cloudflare’s anti-bot verification measures.

How does one deal with this?

We will be addressing these and other existential questions facing the news media industry at our Media and Tech Week in San Francisco in October, where we will hear from the vice-president of product at Cloudflare and from executives at Scalepost and Prorata, two companies that give publishers ways to charge Big Tech for using their content in LLMs.

If you have read this far, this is a week that you’ll want to spend with us.  

GenAI as the engine for new products — and revenue

Speaking of excellent conferences: One of the many outstanding speakers at INMA’s Mumbai conference in July was the head of digital design and product at The Times of India, Rohit Garg, who outlined dozens of ways AI is helping one of India’s largest news publishers go faster and further than ever before.

“We always had diverse user needs,” Garg said. “We always had volume. We always had velocity. We always had consistency that we were creating at scale. What AI is helping us do is doing all of these things more efficiently. It helps us ensure that we don't crack under chaos.”

That sounds like an efficiency play, right? But at Times Internet, AI is actually helping the team create new personalised products, with an eye to unlocking new revenue.

For example, they have created an AI anchor (which is, incidentally, quite common in the Indian news industry) for a streaming financial news show.

The next step is personalisation.

“What we think and what we are researching is a future wherein, imagine you as a person put in your five, seven, 10 stocks and you pay for it. And then there is a live anchor which generates this summary, this synopsis specifically for you,” he said. “This can run 24/7. This minimises editorial workload.”

The Times is also using AI to drive personalised push notifications — where different people receive different alerts based on their preferences. The headlines are also rewritten for different types of audiences to increase engagement.

What about news products that meet a different user need? The Times’ team has created a completely AI-generated satirical take on the news in a scroll-first format for social media, where it has garnered more than 1.5 million views in a month.

Taken from a presentation by Rohit Garg, head of product and design at The Times of India (Digital).

Another creative use of GenAI: a crossword puzzle based on the previous day’s news, as well as a new game called Connect.

“The advent of AI has helped us create some of these games at a much faster pace than we were able to do before. Till some time back, we were only able to create games like Sudoku or crosswords using AI. Now, we are using AI to think of new games. A recent example is the game called Connect, which went live last week. It took us 15 days to ideate and 15 days to implement it.

“Within a month, we created a game which never existed earlier, and we have 1,000 copies of it stored in our database because the AI algorithm has helped us create 1,000 copies which are unique to each other. And we are already seeing great traction on that game. It is, in fact, the best-performing game right now on our platform.”

The Times has also used GenAI to create a product with strong audio-visual elements to reach new audiences.

“Can we have news rewritten for young minds between 8 to 11 years old wherein the news is written to make sure that it resonates with them? Not the Gen Zs but somebody who is in school right now. It’s a kids-first AI rewriting engine. All the stories are rewritten keeping kids in mind,” Garg said.

In advertising, The Times has further improved its ad chat product, which answers readers’ questions about the product being advertised instead of simply displaying a box ad or a banner. It is now integrated with WhatsApp, a platform used by more than 850 million people in India.

Taken from a presentation by Rohit Garg, head of product and design at The Times of India (Digital).

“This also leads to automated lead capture,” Garg said. Meanwhile, the back end captures insights from the conversations.

About this newsletter

Today’s newsletter is written by Sonali Verma, the Toronto-based lead for the INMA Generative AI Initiative. Sonali will share research, case studies, and thought leadership on the topic of generative AI and how it relates to all areas of news media.

This newsletter is a public face of the Generative AI Initiative by INMA, outlined here. E-mail Sonali at sonali.verma@inma.org or connect with her on INMA’s Slack channel with thoughts, suggestions, and questions.
