3/4/2024: BIG UPDATE! I just heard back from Brandon Kraft, who describes himself on Mastodon as “Director of Engineering with Automattic working on the Jetpack plugin for #Wordpress.” Kraft had been very helpful answering people’s questions last week, and today he posted, “we’ve audited and confirmed that LLMs did not get Jetpack data for training.” (Bold face mine.) In other words, Kraft is saying that Automattic did not authorize the use of WordPress posts via Jetpack and Firehose for training generative “A.I.” He further explains, “we do not grant access to companies for model training,” and “the AI we’d okay is for ‘trend analysis’ or sentiment analysis things….” You can read Kraft’s entire post here.
This is a great relief to me — if accurate, it means that my WordPress posts were not directly sold to generative “A.I.” companies by Automattic. I am still glad I’ve disabled the Firehose setting in my Jetpack installation – I don’t see any benefit to giving up any data for any reason. And of course I disabled the “A.I.” sharing options on my WordPress.com and Tumblr blogs. (You can find instructions on how to do this at the end of this post.) But at least I don’t have to worry that my data was already sold.
For now, I’ve left the post below intact but inserted commentary noting where this new information changes things. I have also changed “debacle” to “situation” in the headline of this article.
TL;DR: Automattic does seem to be planning to provide WordPress.com and Tumblr posts to generative “A.I.” companies for model training. (And I’m still not happy about that!) But an Automattic executive has said that they did not provide WordPress posts via Jetpack’s Firehose to generative “A.I.” companies for model training.
_____________________________________
I’m angry.
I started blogging in the late 1990s, hand-coding posts in chronological order on my HTML 1.1 website before I even knew the word “blog.” I upgraded to MovableType around 1999 to better manage my sites and then upgraded to a WordPress.org installation years later when it seemed like that was the best way forward. I kept blogging at gregpak.com throughout the long lean years when social media seemed to take over, and I’ve become a born-again public champion of blogging, starting a new personal blog at gregpak.com/gpn/ as social media has become overwhelmed with disinformation, harassment, and generative “A.I.” garbage.
So I felt sick when I read a 404Media report that Automattic, the company that runs WordPress.com and Tumblr, allegedly had plans to feed user posts to “A.I.” companies for use in building their databases.
To be clear on my biases, I strongly dislike generative “A.I.” for what I view as its hijacking of the work of actual humans without permission, its obscene power consumption, its spread of falsehood and nonsense, and its destruction of livelihoods and viable services. So I hate the idea of my own writing and images feeding so-called “A.I.” services.
Thankfully, after some searching, I found out how to opt out of Automattic’s “A.I.” sharing program for my WordPress.com personal blog (this very site) and my seldom-used Tumblr account. (Please scroll to the bottom of this post if you’re looking for the details of how I did my best to opt out.)
I was also heartened by Automattic’s own assurance that “[w]e are not including content from sites hosted elsewhere even if they use Automattic plugins like Jetpack or WooCommerce.” My gregpak.com business site seems to fit that bill — it’s hosted elsewhere, using a WordPress installation from WordPress.org, and it uses Jetpack, which apparently means that Automattic will not be serving my posts to “A.I.” companies. Great!
But 404Media just reported that through a separate program called the “Firehose,” Automattic has apparently already been providing “A.I.” companies with posts from WordPress.com blogs and other blogs that use Automattic’s Jetpack plugin.
03/04/2024 UPDATE: Please see the note at the top of this post with the comment from Brandon Kraft that says that Automattic hasn’t actually provided any data for “A.I.” training via Jetpack and Firehose. A big relief, if accurate!
A seemingly official WordPress.com Developer page states, “Firehose is a stream of posts — averaging 1 million/day — from the tens of millions of websites published on WordPress.com. Posts are also available for Jetpack-powered WordPress(.org) sites, through a separate feed.” According to 404Media, Automattic “quietly changed the language of a developer page” in September 2023 to note, “These streams are intended for partners like search engines, artificial intelligence (AI) products and market intelligence providers who would like to ingest a real-time stream of new content from a wide spectrum of publishers.”
I had no idea that Jetpack was feeding my posts into this “Firehose” for sale to other companies. So I did some searching and found this page that describes how to opt out of Jetpack’s “Enhanced Distribution” module that feeds the Firehose. Obviously, I opted out immediately.
Automattic says, “We’re not accepting any new customers for Firehose, and are in the process of winding down the service for all current customers.” But it doesn’t say which “A.I.” companies might have already bought the posts of WordPress and Jetpack users or whether discontinuing the service means those posts will be removed from their databases.
I think most of us have been generally aware that “A.I.” companies have likely been vacuuming up everything they can without permission or payment. But I had no idea that Automattic, a company I had never thought to have any worries about, could be actively selling my posts to them.
The entire thing feels unethical and immoral at best.
03/04/2024 UPDATE: Once again, please see the note at the top of this post with the comment from Brandon Kraft that says that Automattic hasn’t actually provided any data for “A.I.” training via Jetpack and Firehose. A big relief, if accurate!
Several key points, just my opinion!
1. Automattic should never have sold anyone’s posts to “A.I.” companies to begin with. I can’t imagine any user wants this. It feels like a terrible trick to discover it’s apparently been done to us without our knowledge. UPDATE: As noted above, Brandon Kraft from Automattic says the company has not sold posts for “A.I.” model training, so this point is less pointed.
2. Any policy that enables sales of posts to “A.I.” companies should be opt-in instead of opt-out, meaning by default it should NOT be allowed. By default, it IS allowed. Again, that feels like a terrible trick that preys on the uninformed and otherwise occupied. UPDATE: This point still stands regarding the first plan discussed in this post.
3. Automattic should tell everyone whose posts might have been sold under the “Firehose” program which, if any, A.I. companies bought their posts and what posts they bought. Did the Firehose give up our entire blog histories? Or just the posts made since September 2023? These are important details and we deserve answers. UPDATE: As noted above, Brandon Kraft from Automattic says the company has not sold posts for “A.I.” model training, so this point no longer applies. Whew!
4. Automattic should ensure that once users opt out of the Firehose, their posts will be removed from the databases of any “A.I.” companies that might have bought them. UPDATE: As noted above, Brandon Kraft from Automattic says the company has not sold posts for “A.I.” model training, so this point no longer applies. Whew!
How to Opt Out of These “Services”
This is what I did to opt out of everything I could opt out of. Please know that this is just the best I could figure out as of this writing. I may have made errors, and the procedures might change over time, so I recommend you verify these methods on your own before doing this yourself.
To opt out of Automattic’s plan to share Tumblr posts with “A.I.” companies:
I went to “Settings,” then clicked on the icon for my blog under “Blogs,” then scrolled to “Visibility” and found an option to click the setting that reads “Prevent third-party sharing.”
To opt out of Automattic’s plan to share WordPress.com posts with “A.I.” companies:
I went to “Settings” and clicked “Prevent third-party sharing” under “Public.”
To opt out of Automattic’s “Firehose” program that affects blogs that use Jetpack:
I followed the instructions at this Jetpack.com link to opt out for my externally hosted site.
For good measure:
I also followed Neil Clarke’s helpful instructions for updating the robots.txt file on my self-hosted website to discourage “A.I.” bots from scraping my posts. This is no guarantee — apparently bots are not required to follow the instructions in robots.txt files. But it’s something.
UPDATE 3/4/24: Walter Haydock lists some additional “A.I.” bots that could be added to a robots.txt file. But he also states, “It’s likely you will be penalized in search result or social media rankings as a result of this type of blocking,” citing as evidence a Substack tool for blocking “A.I.” bots that includes the caveat “blocking training may limit your publication’s discoverability in tools and search engines that return AI-generated results.” I don’t have the expertise to fully evaluate that claim, but I’ve made the personal decision for now that it’s more valuable for me to block “A.I.” bots than be included in “A.I.” driven search results. Anyone considering this procedure should do their own research and make their own choices.
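For anyone who wants to double-check their own robots.txt edits, here is a minimal Python sketch of the idea. It is not taken from Neil Clarke’s or Walter Haydock’s instructions. The bot names in AI_BOTS are only illustrative examples of crawler user agents, the example.com URL is a placeholder, and the helper names build_rules and is_blocked are made up for this sketch. The script builds one “disallow everything” rule per bot and then uses Python’s standard urllib.robotparser module to confirm how a well-behaved crawler would read those rules. As noted above, bots are not required to obey robots.txt, so this only checks the file itself.

```python
from urllib.robotparser import RobotFileParser

# Illustrative crawler names only (an assumption, not a complete list).
# Check each company's own documentation for its current user agents.
AI_BOTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_rules(bots):
    """Return robots.txt text that disallows the whole site for each bot."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(blocks) + "\n"


def is_blocked(robots_txt, bot, url="https://example.com/some-post/"):
    """Report whether a rule-following crawler named `bot` may not fetch `url`.

    The default URL is a placeholder; substitute a page from your own site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(bot, url)


rules = build_rules(AI_BOTS)
print(rules)  # text you could paste into (or append to) a robots.txt file

for name in AI_BOTS + ["Googlebot"]:
    status = "blocked" if is_blocked(rules, name) else "allowed"
    print(f"{name}: {status}")
```

The listed bots should come back as blocked, while a crawler that is not named in the rules (Googlebot in this example) should still be allowed. If you already have a robots.txt file, add the generated rules to it rather than replacing it, so your existing rules stay in place.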
Thanks for reading and all the best to everyone similarly affected by all of this. I’ll continue to keep an eye out for any developments and will update this post accordingly. And if you see any inaccuracies with this post, please feel free to reach out to me via Mastodon or Bluesky and I will do my best to make any necessary corrections.
