Generative Text AI Will Flood The Web With Spam
AI is going to give superpowers to blackhat SEO, breaking online search as we know it. RIP magic box at the top of the browser where we all reflexively type “how to [blank]”, “[blank] near me”, “best [blank] Reddit”, and anything else we seek a semi-reliable answer to at any given moment. Online search results are about to get flooded with astroturf and spam content at a level never before possible.
People have discussed Google Search results going downhill for years. A lot of factors influence why Google Search lost some of its magic; I’ve written thousands of words on the topic. Experts agree almost universally on two factors behind the decline in search quality: the volume of things on the internet (quantity) and the lower average caliber of all those things (quality). They are not the only factors, but everyone from former Google execs to an ad agency demon like myself agrees they have a major impact on Google’s search product.
If this were only a deluge of content from real human people posting on Twitter, TikTok, and poorly edited blogs, the companies in the business of sorting and parsing could handle it. The problem is attempted manipulation of ranking within those companies’ indexes of the web. AI is about to scale those blackhat systems.
AI Will Clog The Internet’s Toilets
“Typeface, a startup developing an AI-powered dashboard for drafting marketing copy and images, emerged from stealth this week with $65 million in venture equity backing”, so begins the TechCrunch article about the Typeface, Inc. funding round. Typeface is just one of many companies that offer AI tools to create content like job listings, blog posts, and social media posts. These tools are going to clog the web’s toilet, filling the digital world with crap.
Generative text AI is impressive in that it can “create” content that looks unique, and is in perfect English, but it’s all plagiarized, and often inaccurate. I’m not saying ChatGPT, Jasper, or Typeface are setting out to enable blackhat SEO. But the tools will be used to generate volumes of pages that will fill websites that exist only to influence purchasing decisions with no concern for user value.
Into Blackhat SEO
Quick note: while I have dabbled in blackhat SEO, Push ROI, the agency I work for, never crossed into that world. Push ROI explicitly no longer offers SEO as a service, because the future of AI will make it impossible to meet client expectations of SEO without crossing the line from optimization and strategy to blatantly deceptive manipulation.
I point all this out to avoid people arguing that I don’t know what I’m talking about, or that I’m trying to frame myself as a good actor among sewer rats. I’m not a guy named Erik from a Gaston Leroux novel, I won’t be performing in the subterranean sponge.
Let’s talk about the dark magic used to game search results, primarily private blog networks (PBNs) and click-through rate (CTR) manipulation. I’ll be vague enough about specific tactics that this doesn’t become a guide to wrecking the internet for personal gain. Also, since I don’t want a war with a bunch of the worst blackhats in the SEO industry, I’m not naming names unless they’ve been covered by prior reporting.
What Is CTR Manipulation
CTR manipulation is a tactic that uses bots or real humans to coordinate clicks on a search engine results page (SERP) to influence a website’s ranking for a given term. Think of it this way: when you Google search “Facebook”, just as several hundred million people in the U.S. do every month, Google knows most of those people are looking for Facebook.com. User signals are not the only data point Google uses to order search results, but they are an important factor.
Google has a clear grasp of what percentage of users will click a given result based on position. Let’s say the 8th position on a SERP is expected to get about 6.13% of clicks, but starts getting 11.5% of the clicks for the search. That page may hop around a bit in the results so Google can gather data on how users interact with the result. But if the search result is being engaged with more than expected it will climb in the rankings.
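To put numbers on that example, here is a minimal sketch of the arithmetic, not a description of how Google actually scores results. The 6.13% and 11.5% figures come from the paragraph above; the 10,000 impressions, the function names, and the use of a simple one-proportion z-score are my own assumptions for illustration.

```python
import math


def ctr_lift(expected_ctr: float, observed_ctr: float) -> float:
    """Relative lift of observed CTR over the expected CTR for a position."""
    return (observed_ctr - expected_ctr) / expected_ctr


def ctr_z_score(expected_ctr: float, clicks: int, impressions: int) -> float:
    """One-proportion z-score: how far the observed CTR sits from the
    expected CTR, measured in standard errors."""
    observed_ctr = clicks / impressions
    std_error = math.sqrt(expected_ctr * (1 - expected_ctr) / impressions)
    return (observed_ctr - expected_ctr) / std_error


# Figures from the example above: position 8, 6.13% expected, 11.5% observed.
EXPECTED = 0.0613
IMPRESSIONS = 10_000  # hypothetical volume, chosen only for illustration
CLICKS = 1_150        # 11.5% of the hypothetical impressions

lift = ctr_lift(EXPECTED, CLICKS / IMPRESSIONS)
z = ctr_z_score(EXPECTED, CLICKS, IMPRESSIONS)
print(f"lift over expected: {lift:.1%}, z-score: {z:.1f}")
```

Under those assumed volumes the result is getting nearly twice the expected share of clicks, far outside random noise. That same gap is what makes heavy-handed manipulation easy to spot.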
I’ve seen sites where an SEO vendor clearly used CTR manipulation, often badly. Here’s a pro tip: if I or any other marketing person can glance at analytics and see that CTR manipulation was used to try to game rankings, Google knows the same. But a lighter touch can still be effective, and difficult to detect.
I cannot talk much more about this tactic without creating a guide to using it. CTR manipulation isn’t primarily an AI issue, but it is a catalyst that will enable unfettered use of AI for marketing material.
What Are PBNs
A PBN is a network of websites that exists primarily to link to other websites to help rank them in Google and other search engines. These networks have been on the internet for a very long time, and they vary tremendously in how spammy they appear. If you need an overview of how search engines historically used links to index and organize the web, read this.
PBNs can be built in a few ways. Typically by buying expired domains with some reasonable number of inbound links. Those domains are launched as websites. Once the sites are indexed by Google it’s just a matter of adding the outbound links to the newly created website.
I’m not speaking based on reading a blog post. I’ve created PBNs, and they can rank websites quickly in Google. If someone knows what they are doing, the networks are also nearly undetectable by search engines, and people outside of search engines have no idea the sites exist. At one time, if you stumbled onto a PBN, it was pretty obvious.
Even now many PBNs are set up at a low cost, without the effort to hide them from those who know how to look. They are slow sites, using simple WordPress or HTML themes, with a list of blog posts usually composed of spun content from other websites with inserted links. Historically, PBN sites were rarely found unless someone went looking for them; being read by people is not why they existed.
The above screenshot shows a site from a pretty low effort PBN I found. The awkward phrasing like, “Triple cots are very helpful for those families where there is restricted space accessible and there are three youngsters sharing a room.” is the result of content spinning. That page exists only to link out to another website with the anchor text “triple bunk beds”.
This is part of a network of sites with a rather notable footprint. I could show you graphs of how these sites are linked together or I can just tell you that the about pages of many sites on this network list an email address to buy links. Since the latter takes fewer words to explain, have another image.
I assume links from these sites are devalued by search engines, as this PBN has a huge algorithmically detectable footprint. The overlap of the outbound links from the sites is the biggest giveaway. Third party tools used to track websites’ backlinks (Ahrefs, Majestic, and SEMrush as examples) are often blocked from crawling PBNs, making it difficult for researchers to track how effective these networks are without buying links from them.
With effort, PBNs can be nearly undetectable algorithmically. Caution around hosting, registration, how links are added to the sites, and blocking crawlers for tools that track links leaves only content and user behavior to give the nature of the sites away. With enough content and fake user behavior, these sites become real enough to compete with human run websites.
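For a sense of what an outbound link footprint looks like, here is a rough sketch that scores how much any two sites’ outbound link targets overlap. This is my own illustration using Jaccard similarity; I am not claiming any search engine works this way. The domain names, the 0.5 threshold, and the function names are all made up.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of link targets two sites have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(
    outbound: dict[str, set[str]], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return site pairs whose outbound link targets overlap at or above
    the threshold, highest overlap first."""
    pairs = []
    for site_a, site_b in combinations(sorted(outbound), 2):
        score = jaccard(outbound[site_a], outbound[site_b])
        if score >= threshold:
            pairs.append((site_a, site_b, score))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical crawl data: each site mapped to the domains it links out to.
outbound_links = {
    "site-a.example": {"shop1.example", "shop2.example", "shop3.example"},
    "site-b.example": {
        "shop1.example", "shop2.example", "shop3.example", "shop4.example",
    },
    "site-c.example": {"news.example", "shop1.example"},
}

suspicious = overlapping_pairs(outbound_links)
for site_a, site_b, score in suspicious:
    print(f"{site_a} and {site_b} share {score:.0%} of link targets")
```

Unrelated sites rarely link to the same handful of commercial pages, so high overlap across many pairs suggests one operator. A real analysis would need crawl data, which is exactly what blocking third party crawlers denies to outside researchers.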
PBN sites with the goal of getting real human traffic are a relatively new phenomenon. But the perception is that the traffic of websites is a factor for the value of outbound links, so many of these PBN sites attempt to attract at least some traffic, often augmenting that with CTR manipulation.
Some in the SEO world already have networks of hundreds of Google News indexed PBN websites, access to which can be acquired for a monthly fee. To avoid promoting anyone, or starting a war, here’s a nameless example.
Being in the Google News index doesn’t guarantee ranking for any given term, even within Google News, but the preponderance of evidence suggests these links do help with search engine rankings.
Running all those sites costs money and takes content. Both problems are solved by charging SEO providers to place articles. With enough content and a bit of real traffic, these sites gain more than just the power to rank other websites. Each page has the power to influence real humans.
Manipulating Search Becomes Manipulating People
Jon Christian busted several writers for selling links in major publications a few years ago. That report hardly covered the extent of premium link selling. Months after Christian’s article got writers booted from Forbes, Inc., Entrepreneur, and others, I got emails from a new company. That company, staffed with many of the same people, offered the same old link selling service.
A PBN breaks Google’s guidelines, but buying media coverage crosses a line into deceiving humans. I know people who bought links in Forbes. They wanted the logo on a website and to message investors, and customers that they were in Forbes. It wasn’t about the link. All this has facilitated a mass rebranding of mediocre SEO “experts” to mediocre PR “experts”.
The deliberately nameless site shown above offers guaranteed media coverage in 30 publications with a 24-hour turnaround time. The site’s about page says the mission is to keep PR firms alive. They do so by allowing paid placement in prestigious sounding outlets like Wall Street Times and Chicago Journal. Don’t confuse those with Wall Street Journal or Chicago Business Journal.
While some of the publications this site sells are technically controlled circulation magazines, at this moment most of the publications it sells access to are little more than a way to pad a monthly PR report. These sites are likely to be built up over time into a category that is difficult to name. Not exactly a PBN, or pink-slime journalism, but also not a typical controlled circulation magazine. Certainly not the journalistic outlets they impersonate.
It’s easy to imagine sites like these being propped up with AI generated content under fake bylines. Some major publications are already using AI to create and publish steaming piles of… content. Men’s Journal and CNET have been panned for using AI to publish inaccurate articles. It’s doubtful the marketers who’ve long launched sites with pages of spun content each containing exactly one link are going to care more about accuracy than the owners of major publications.
Sites like Reviewed for USA Today, and Wirecutter for The New York Times make money from affiliate programs. But I’ve never felt that the reviews are impacted by whatever affiliate program is best. Real testing goes into those reviews. When published those reviews are sucked into generative chat AIs, and can be regenerated as original content.
Mass creation of review sites where the only thing that matters is the affiliate program is just a few clicks away. The PBNs that once hid in the trafficless part of the web can seem as real as the reviews from USA Today. AI can spit out unlimited pages of content that looks unique, and is in perfect English, each offering a chance to earn a little cash.
To preview what that will look like, try to find the best web hosting company. Seriously, searching for “best web hosting” is the fastest way to learn which web hosts have the best affiliate commissions. Imagine that but for reviews of everything.
Don’t Forget Politics
Earlier I lamented a future of sites created only to influence purchasing decisions. But other decisions can be influenced. I offered some data analysis for two articles by Steven Monacelli published in The Texas Observer (1, 2) covering several dark money political groups in Dallas.
Since some of these groups appear litigious (and vengeful), I want to clarify: dark money means it’s not clear who funds the groups. I’m not accusing any of these groups of using PBNs, AI generated content, or CTR manipulation. This is about the tools that exist and the possibility of groups with unknown funding and unknown intent being able to influence political discourse.
In the current state of the world, marketers build huge networks of websites to influence sales. Major publications are publishing AI generated content containing factual errors. Links in household name media outlets can be bought and sold.
I don’t think anyone will like what happens when a very small group can build and maintain thousands of sites, propped up by nearly cost-free AI generated content, funded by affiliate commissions, and able to muddy the waters of any political issue at will. That is the future these generative AI tools are on pace to create. A vomit-filled snow globe of information where accuracy and origin are unknowable.
Article by Mason Pelt of Push ROI. First published in MasonPelt.com on March 17, 2023.