Google had said that its Panda and Penguin spam-fighting filters were running constantly. Now it says there are still delays between data pushes.
Google had suggested that having to wait months between Panda and Penguin updates would be a thing of the past, since these were supposed to be happening on an ongoing basis now. But the company flip-flopped on this last week. Both still work on a periodic basis, with months elapsing between updates.
How Panda & Penguin Work
Before getting to the flip-flop, let’s do a reminder of how Panda and Penguin have traditionally worked.
Panda is a filter Google uses to identify low-quality or “thin” content that manages to rank well despite Google’s regular algorithms. To combat this, Google effectively runs the entire web through the Panda filter on a periodic basis. Sites that Panda catches will no longer perform as well as before.
The filter hasn’t been run on a fixed schedule. It hasn’t operated in real-time, nor on a daily or weekly basis. Rather, it’s been a process that’s happened anywhere from monthly to quarterly. In other words, every month or two or three or four, Google reruns the web sites it knows about through Panda, to see if any sites have improved, if new ones should be caught and to deal with false positives.
The filter itself also often changes, when one of these updates happens. Google sometimes makes minor tweaks to it, designed (it hopes) to improve the ability to catch what Google considers poor content. Sometimes these changes are more dramatic. When the more dramatic changes are introduced, it’s common that many more sites get impacted.
As for Penguin, that’s a filter aimed at severe web spam, especially unnatural links. Like Panda, it runs on a periodic basis. Everything described above about Panda is the same for Penguin, other than the type of spam it targets.
The Panda & Penguin Waiting Game
Publishers hit by Panda and Penguin can change their sites all they want to improve, but it’s a wasted effort until the next Panda or Penguin update happens. Any improvements, if they are the right ones, won’t register with Google’s search results until the sites are cycled again through Panda or Penguin and new information is “pushed” into Google’s overall ranking algorithm.
For example, let’s say there’s a Panda Update that happens in the month of February, which hits a publisher’s site. The next day, the publisher scrambles to drop content they think is to blame and make other changes. Even if they’ve made all the right moves, they’ll remain penalized by Panda. They won’t have a chance to escape until Panda is run again. Say that happens in April. This means the site has a two-month wait until the changes it made will benefit it.
The same is true for someone hit by Penguin. Until Penguin is run again, they have to wait for their changes to register.
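To make the waiting game concrete, here is a minimal sketch in Python of the arithmetic behind it. It is only an illustration, not how Google actually schedules anything: the function name `first_push_after` and the specific dates are hypothetical, picked to mirror the February and April example above. It also assumes, as a simplification, that a fix counts only if it is made before the day of a push.

```python
from datetime import date

def first_push_after(fix_date, push_dates):
    """Return the earliest push date strictly after fix_date, or None.

    Assumption: a fix only takes effect at the next data push, never sooner.
    """
    later = [d for d in sorted(push_dates) if d > fix_date]
    return later[0] if later else None

# Hypothetical push dates, mirroring the February/April example.
pushes = [date(2015, 2, 10), date(2015, 4, 10)]

# The publisher cleans up the day after being hit.
fix = date(2015, 2, 11)

next_push = first_push_after(fix, pushes)
wait_days = (next_push - fix).days if next_push else None
print(next_push, wait_days)
```

If no later push is on the calendar, the function returns `None`, which is the situation publishers face when Google gives no indication of when the next update will run.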
No Notifications Of Being Hit By Panda Or Penguin
Complicating this is the fact that Google doesn’t provide any notifications to publishers if they’ve been hit by Panda or Penguin. Typically, the best way a publisher knows if they’ve been hit is if they see a sudden drop in traffic that comes right after Google confirms generally that a Panda or Penguin update has gone live. That’s a pretty good clue that Panda or Penguin has hit them.
Unfortunately, Google doesn’t always provide general confirmations of when such updates happen. Worse, sometimes Google may push multiple updates at the same time. Someone might think they were hit by Panda when instead it was Penguin or some other update.
This is one reason why we’ve repeatedly encouraged Google to provide notifications within Google Webmaster Tools if a site is hit by Panda or Penguin. It seems a no-brainer. Just let the publisher know they’ve been hit, so they know what to correct in the same way publishers get told if they were hit by a penalty caused by human review.
Google’s been hesitant to do this. When asked, it has raised the concern that notifications might help enable spammers. That’s a pretty weak argument. Anyone who is sophisticated enough to be trying to spam Google on an industrial basis pretty much knows if they were hit by Panda or Penguin. They’d learn nothing they didn’t already know from an official confirmation. But such confirmations would be a huge benefit to many unsophisticated publishers who do want to do the right thing by Google’s guidelines.
Another argument has been that notifications are unnecessary. If Panda and Penguin are really constantly updated without month-long delays, then anyone who’s been hit by one (or both) can make changes and then wait a week or so to see if they get an improvement. In a world of constant Panda or Penguin updates, a week is plenty of time for changes to be registered and have an impact on search results.
When Google Said Panda & Penguin Were Constantly Being Updated
The last confirmed Penguin update — Penguin 3.0 — happened in October 2014. It had been a year since the Penguin update before that. Much weirdness followed Penguin 3.0. First, Google confirmed it was fully rolled out within a day, which is typical. Six weeks later, in December, Google said that Penguin 3.0 was still rolling out. About two weeks after that, Google gave us a statement that Penguin would be constantly updated:
That last big update is still rolling out — though really there won’t be a particularly distinct end-point to the activity, since Penguin is shifting to more continuous updates. The idea is to keep optimizing as we go now.
In March 2015, during the Meet The Search Engines panel at our SMX search conference event, Google webmaster trends analyst Gary Illyes gave a similar statement about Panda, that it was constantly being updated. The last confirmed update — Panda 4.1 — had been in September 2014.
I moderated that panel, and I asked several follow-up questions in response to the statement. There was no confusion in my mind that Google was saying Panda was operating in real-time mode. I even summarized things on stage like this (I’m paraphrasing from what I remember):
In the past, Google would take all the pages it knew about, run them through the Panda filter periodically, then push an update. People might then get caught when the update was live, or freed up.
Now, the Panda filter is running at or near real-time. Google comes across a page, there’s no waiting around to run it with others through Panda. It’s effectively checked at the time it’s recrawled and everything updated pro/con when entered into the index.
Separately from that, there’s the make-up of Panda itself. That’s not constantly being tweaked and changed. But periodically, the Panda filter might get an overhaul, and when that happens, it might feel more like an old-school Panda update, because the overall filter has shifted.
Panda going to real-time updates is obviously big news. But we didn’t run a story declaring this, because we wanted to be absolutely sure this was the case, especially as what was being said didn’t seem to match the reality of what publishers were reporting. So we went back to Google to confirm this.
We got this back as an initial statement:
Most of our ranking algorithms, including Panda, have many moving parts. In the case of Panda, some parts are running real-time, so results may be affected at any time. However, the underlying data for Panda hasn’t been refreshed for some time. We are constantly working on improving our algorithms and we expect to refresh the data in the coming months.
All clear? It wasn’t to us, which is why we didn’t run it. Instead, we checked back again to see if we could get a better explanation of what exactly is real-time and what isn’t.
Google Says No Real-time Updates For Panda & Penguin
Eventually, we got a reply that said nothing. Literally, that Google had nothing more to say about that. We didn’t give up, however. Since then, we’ve been trying to get someone to go on the record to clarify the situation. That finally happened at the end of last week, not directly to us but in one of Google’s regular Webmaster Hangouts.
In that, Google webmaster trends analyst John Mueller was asked about the whole real-time situation, and the response came back that neither Panda nor Penguin is real-time, and that Google still does a manual push of them:
I think both of those algorithms [Panda and Penguin] currently are not updating the data regularly. So that is something for both of them, where we kind of have to push the updates as well.
I bolded the “I think” part because that still puts this statement in the realm of uncertainty. Heck, maybe next week Google will tell us all again that we’re back to real-time or “everflux,” as it’s sometimes called.
That’s the situation as we know it: neither Panda nor Penguin is pushing updates in real-time, so if you’re hit by one of them, you’ll have to wait weeks or months until a new push happens.
We’ll go back to Google again now that we’ve recapped this, to see if they can explain the flip-flopping and definitively reconfirm what Mueller has said.
I’ll also reiterate what I’ve said many times before: Google should simply tell publishers if they’ve been hit by Panda or Penguin. Both are major filters that have severe impacts on search visibility. Disclosing to publishers if they’ve been hit is aligned with Google’s previous actions to reveal manual penalties. Just do it.