Volume 3, Issue 14 (April 11, 2014)

In This Issue:

01) Link Acquisition Tip No. 79
02) SEO Question No. 141
03) Tech Tip No. 72
04) Seen in the Wild
05) Tech Tip No. 73
06) SEO Question No. 142
07) Proprietary Rights
08) Administrative Stuff



SEO Theory now provides an archive of back issues (published in PDF format) that you can purchase at a discount.


* LINK ACQUISITION TIP NO. 79 (Secret Article Giveaway) *

This is a variation on Link Acquisition Tip No. 1, published in Volume 1, Issue 8 (June 27, 2012).  In that article I explained that you can wipe a site clean and start over with fresh content, but distribute old evergreen articles to other Websites.  Just make sure there is a useful link somewhere in the content you hand out that points back to your new site.

If, like me, you occasionally find that other people are coming in and grabbing articles off your Website without your permission and republishing them wholesale on their own sites, you can sometimes leverage this theft to your advantage.

First, run a few quality assurance tests on the other site.  If the copier is just someone who really likes your article and means no harm, they will usually write an introductory sentence or two explaining why they took the article.  That’s a GOOD sign, but if they do this often that’s a BAD sign.

If you don’t find any other stolen content on the site make sure it doesn’t add “rel=’nofollow’” to the links or use “robots” directives to prevent robots from following the links.

Make sure they left your self-referential links in place.

Assuming this is just some enthusiastic copier who really loves your work, add a “rel=’canonical’” to YOUR version of the page and point the canonical reference to THEIR site.  Your once internal links will now be legitimate external links.
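
The quality assurance checks above (intact backlinks, no “rel=’nofollow’”, no robots nofollow directive) can be automated.  Here is a minimal sketch using only Python’s standard library; the class and function names are my own invention, and in practice you would fetch the page HTML yourself and feed it in:

```python
# A minimal QA pass over the copier's page, stdlib only.
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collects backlinks to your domain and robots/nofollow signals."""

    def __init__(self, your_domain):
        super().__init__()
        self.your_domain = your_domain
        self.followed_backlinks = []   # links back to you, no nofollow
        self.nofollow_backlinks = []   # links back to you, neutralized
        self.robots_nofollow = False   # page-wide robots nofollow directive

    def handle_starttag(self, tag, attrs):
        attrs = {k: (v or "") for k, v in attrs}
        if tag == "a" and self.your_domain in attrs.get("href", ""):
            if "nofollow" in attrs.get("rel", "").lower():
                self.nofollow_backlinks.append(attrs["href"])
            else:
                self.followed_backlinks.append(attrs["href"])
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "nofollow" in attrs.get("content", "").lower():
                self.robots_nofollow = True

def audit(page_html, your_domain):
    parser = LinkAudit(your_domain)
    parser.feed(page_html)
    return parser
```

If `followed_backlinks` comes back empty, or `robots_nofollow` is true, the copy is doing you no good and there is nothing worth canonicalizing toward.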

Some people who find their sites are being scraped add “rel=’canonical’” directives to their HTML code in the hope that the scrapers won’t notice or care.  That’s an acceptable practice for a site that is being constantly scraped but most so-called “scrapers” today are just using RSS feeds to aggregate content.  You won’t be able to slip a canonical statement into your RSS feed (and this is why I don’t publish full feeds).

The idea is NOT to reward people who take your content but rather to get as much value from the theft as possible.  You can only canonicalize to 1 Website at a time for any given page so if an article is scraped by several people you probably just want to contact them all and ask that they embed the “rel=’canonical’” tags on their copies of your article.

* SEO QUESTION NO. 141 (How Many Sites Can We Safely Use the Same Theme on?) *

“Michael, I want to build 5-8 sites for [keyword redacted].  I was thinking of using the [redacted] theme on all of them.  Is that safe?  What are the risks?  I don’t want to get a penalty.”

I think this is a good question but not for the reason it was asked.  Any time I see someone asking about “how safe” something is (for search optimization) I get the feeling they are trying to run as close to the limit of what is permitted as possible.  And yet experience teaches us that a search engine may tolerate a practice on one site while it penalizes another site for the same thing.  Why do they do that?  Probably due to a mix of human judgment, serendipity, and mitigating factors.

__1] Human judgment may vary from spam team member to spam team member.
__2] Serendipity may mean your site is manually reviewed and mine isn’t simply because of pure blind luck/random chance.
__3] Mitigating factors may include the fact you’re a multi-billion dollar brand and I’m just some affiliate trying to earn a few dollars.

When you’re asking if what you want to do is safe, or how safe it is, I assume you are trying to be at least somewhat aggressive in your marketing.  You’re looking for an advantage, even if it’s only some efficiency in your production process.  Saving time and money is okay, but being sloppy gets a lot of people into trouble.

That said, I don’t know of any reason why you cannot use the same theme on 1,000 Websites.  Some themes are used on 100,000+ Websites and people are not being penalized for using them.  My partner Randy Ray and I try to use a variety of themes and design styles but we can each recognize the other’s work very quickly as “classic Randy Ray” or “classic Michael Martinez” design.  We all have design preferences, biases, limitations, etc. that influence the way our Websites look.  I don’t think that is being risky.

On the other hand, the question came with a little more information.  That is, these proposed sites are being constructed around a very specific topic.  I have to ask how you plan to populate those Websites and manage their presentation.  I’ll give you the benefit of the doubt and assume you’re not just going to publish the same content on all the sites; but will your editorial calendars be identical?

Even if you hire a different dedicated freelance writer for each site I think you run a huge risk of FAILING TO DIFFERENTIATE if you give them all the same topic assignments.  If you don’t differentiate your Websites from each other then they are trying to poach each other’s visitors.  A small portfolio of Websites can work together by covering different (sub)-topics or they can compete with each other for the same eyeballs.

Sure, you can dream about controlling all ten listings on the front page of a search result but you’re still limiting the size of your potential audience.  And you’re asking for an algorithmic downgrade or manual review, in my opinion, if you just roll out cookie-cutter Websites.  That’s a churn-and-burn tactic.

Many affiliate marketers follow a very basic, extremely rigid design philosophy for their sites.  They use a brazenly ugly page layout and they put a huge table loaded with affiliate links near the top of their most important pages.  Nothing screams “made-for-affiliate-referrals” more than that table.  But it works.  Some affiliates are making millions of dollars from this design style so you’re not going to get them to abandon it.

The page layout is not nearly as important as what you put on the page to differentiate it from other Websites.  If you’re building several similar sites at the same time then I think it’s important that you differentiate them from each other.  That improves your chances of differentiating them from your competitors’ sites, too.

Some cost-cutting is fine.  Just don’t let efficiency in production make all the business decisions for your portfolio.

* TECH TIP NO. 72 (How to Win a Wikipedia Revert War) *

If you have ever made a change at Wikipedia only to see someone else come in and undo that change, you know the frustration of having to deal with idiots and an idiotic system.  Wikipedia has a rule against anyone making more than 3 reversions on a specific change.  So the first person to do a revert will always win.

When you’re trying to correct false information in Wikipedia the odds are stacked against you but if you know 2-3 other people (all in separate cities/regions) who have established Wikipedia accounts you can coordinate with them to out-revert someone who doesn’t want a change made to an article.

Each of you logs in to Wikipedia as normal and sets up a WATCH alert for the article in question.

Then 1 of you makes the desired change.

If a reversion occurs the 2nd person in the group reverts the article.  With each reversion you leave an explanatory comment about the undesired reversion in the TALK page for the article.

If 3 or 4 people are undoing the reversion by 1 or 2 other people your team will win by virtue of superior numbers.  If necessary you can also use Wikipedia’s system to complain about the people who are undoing the changes.

Just remember that:

__1] Your changes must be acceptable to Wikipedia rules in the first place
__2] Your justifications will be challenged and must be ironclad
__3] Each of the people on your team should be able to show that they have a legitimate interest in updating the article

This strategy works best if your team actually takes the article under its wing and makes substantial improvements prior to fixing whatever problem it is you’re trying to resolve.  The most common issue in my experience is that someone has a reputation management problem.

You are not permitted to hide facts on Wikipedia.  If someone robbed a bank and the Wikipedia community accepts the article about that person they are stuck with being the topic of a Wikipedia article.  You may, however, be able to change the wording of the pertinent sentences that describe the bank robbery by removing or contracting the person’s name.

Wikipedia takes a dim view of marketers trying to influence its content but has a high tolerance for petty and manipulative article editing by people who have personal grudges and biases.  Even with 3 allies you may, in extreme situations, find you are on the losing side of “good faith consensus”.  The rule of thumb for any system that relies on good faith assumptions is that it is being manipulated by people with secret agendas, and Wikipedia has been widely manipulated by exactly such people.


* SEEN IN THE WILD *

Barry Schwartz spotted an interesting discussion in the Google Webmaster Forums (see Barry’s summary here: http://cli.gs/q1rve7r) where JohnMu said that Google is largely ignoring so-called “Web 2.0 links” from sites like YouTube, Pinterest, etc.  Normally when John makes a comment like that I read it as limited to the context of the original poster’s Website, but this time I am inclined to agree with Barry: it looks like John means they really do ignore all those easy-to-get links.

In another Webmaster forum discussion someone complained that Google was displaying the HTTPS version of their site instead of the HTTP version in search results.  We dithered around for a couple of days and then I looked at the site in question a little more closely.  It became obvious to me that the site owner never intended to create an HTTPS version of their site.  They are a Bluehost client and apparently accidentally set up an HTTPS identity that points at the same folder as the main site.  After I pointed this out to the OP they had Bluehost fix the configuration.  What I find interesting is that Google preferred the HTTPS version over the HTTP version, even though navigation for the HTTPS version was a little wonky.

So you have all probably heard about the Heartbleed security vulnerability by now, but if you have been changing passwords you may be doing that too soon.  Mashable has published a list of major service providers who were and were not affected by the OpenSSL vulnerability (Cf. http://cli.gs/3o3xtn9).  Although it seems counterintuitive, if you’re using services that may be affected you should wait a few more days before changing passwords: a new password submitted to a still-vulnerable server is just as exposed as the old one.  I checked the security on my dedicated server and thankfully discovered that we are not vulnerable.  Meanwhile, someone keeps trying to hack into user accounts at Vbulletin.Org (the support community for Vbulletin users).  Members are receiving frequent notices of their accounts being locked out (I received three the day before I sent out this newsletter).  It’s not clear if the hackers think there is a connection between Heartbleed and user accounts on forums and blogs (there is NOT).  The vulnerability, if it exists, is in your server software or in any application that handles its own security protocols (very few do that).

Over at Marketing Land Ric Dragon published an interesting article (Cf. http://cli.gs/gxmhi98) in which he reported on the results of an experiment run by Will Critchlow.  Critchlow’s company, Distilled, has seen some impressive results by publishing high-value content on a blog without using categories.  What does that mean?  Critchlow (one of the smarter guys in this business) thinks it may be time to do away with blog categories.  Lisa Barone wrote a rebuttal (Cf. http://cli.gs/jjo86s4) on Overit.  Here’s the thing: blog categories really DO earn less traffic than individual content pieces, but if you have configured your site to hide the categories from search engines then the lack of data will blind you to the reality that blog categories CAN and DO earn links and traffic.  It comes down to organization, presentation, and the value you create for the visitor in your category pages.  Many people install SEO plugins on their blogs — or use “SEO-ready” themes — that by default NOINDEX category, tag, date archive, and author pages.  It is really stupid to do that by default; but if you’re making a business decision on the basis of the data you see in your analytics, make sure you’re getting COMPLETE data before making the decision.

The botnet attacks on “xmlrpc.php” have become so widespread that at least one hosting provider is now advising its clients to use this plugin (Cf. https://wordpress.org/plugins/disable-xml-rpc/) to disable the script.  “xmlrpc.php” is the script you need to use when you publish to your blog remotely (like from your desktop).  The JETPACK plugin also uses it (1 time) to authenticate your site with WordPress.com when you activate certain features.  “xmlrpc.php” also facilitates pingbacks between blogs.  Still, as recently as last month (March 2014) security experts noted that as many as 160,000 WordPress blogs have been used in DDoS attacks by exploiting the vulnerability of this script (Cf. http://cli.gs/vnijlus).  You should take precautions to protect your site from being used in DDoS attacks, but you also need to understand what this script is used for in case you are actually using it for legitimate reasons.
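
For the curious, a pingback is just an XML-RPC “pingback.ping” call POSTed to “xmlrpc.php”.  Python’s standard library can build that request body, which shows exactly what the botnets are flooding the script with.  The URLs below are made up, and nothing is actually sent:

```python
# Build (offline) the XML body a pingback client would POST to xmlrpc.php.
import xmlrpc.client

source = "http://attacker.example/post"                 # hypothetical URL
target = "http://victim.example/2014/04/some-article/"  # hypothetical URL

payload = xmlrpc.client.dumps((source, target), methodname="pingback.ping")
print(payload)  # the marshalled <methodCall> request body
```

A flood of these requests, each forcing your blog to fetch the “source” URL, is how a few lines of XML turn thousands of WordPress installs into a DDoS amplifier.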

Several wallpaper galleries lost a significant amount of Google traffic over the past 1-2 weeks.  I did not look very closely at the Websites reporting the traffic losses but it appears these were lesser-known sites that may have scraped their wallpapers from other sources.  One site owner complained about seeing traffic drop from 1.2 million uniques per day to 300,000 uniques per day.

* TECH TIP NO. 73 (Rebuilding a WordPress Media Library) *

This is a very common problem and the fact that a solution has been available for years without people knowing about it underscores just how hard it is to find good information on the Web.  I have been running queries year after year looking for a WordPress plugin that will rebuild a media library.  It turns out I was simply using the wrong queries and got lucky this time.

The problem: You have old articles on a blog and after moving the blog (incorrectly) your image links don’t show up any more.  The images are right there in your “UPLOADS” folder but WordPress cannot see them.

If you’re using a plugin like “WordPress Importer” it doesn’t actually grab any images and upload them for you but it does have an option that reads like it does (it just repopulates the Media Library with data about images that are SUPPOSED to be there).  If you don’t select that option when you IMPORT content then all your image links will be broken.  If you DO select the option then — just to be on the safe side — the Importer will alter the image links so that “image.jpg” references become “image1.jpg” references.  And if you move the blog twice without fixing the problem then “image1.jpg” references become “image11.jpg” references.  It’s very annoying but with this problem you can at least go through your posts and change the image names.
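
If you would rather fix that filename drift in bulk than edit posts by hand, a short script can do it.  This is a rough sketch of the idea, not part of any plugin: it only rewrites a reference when a file with the original (un-suffixed) name actually exists on disk, and the regex is a deliberate simplification:

```python
# Strip the trailing "1"(s) the Importer appended to image references,
# but only when the original file still exists under its original name.
import os
import re

IMAGE_REF = re.compile(r"(\w+?)(1+)(\.(?:jpe?g|png|gif))")

def fix_image_refs(post_html, uploads_dir):
    def repair(match):
        name, ext = match.group(1), match.group(3)
        if os.path.exists(os.path.join(uploads_dir, name + ext)):
            return name + ext   # e.g. "photo11.jpg" -> "photo.jpg"
        return match.group(0)   # original not on disk; leave it alone
    return IMAGE_REF.sub(repair, post_html)
```

Run it across post content pulled from the database, review the changes, and then write them back — never do this blind, because a legitimately named “photo1.jpg” can sit right next to a “photo.jpg”.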

But what do you do when you’ve got a library full of images you have used all over the place and the images don’t appear in the library?  You can’t just upload the images again because your URLs are different (they go into the current month’s image folder, if you allow WordPress to compartmentalize your images like that).

There is a plugin called ADD FROM SERVER (Cf. http://wordpress.org/plugins/add-from-server/) that will allow you to “walk” through the directory tree of your UPLOADS folder and flag images to be added to the media library.  So now when you restore an old backup you don’t have to worry about WordPress “forgetting” that it actually has the images being referenced.  Sure, you can edit the blog posts and fix the broken image file names but if you want to reuse those images in the future you really want them to be shown in the Media Library.  This plugin allows you to do that.

I did find, however, that people run into a problem with this plugin.  This has been going on for years, and this week it finally struck me, too.  No matter what you do, ADD FROM SERVER cannot add an image to the media library even though “it is RIGHT THERE”.  The plugin gives you an error message about not being able to create a directory.  The spooky thing is that the directory already exists.  I found several discussions on WordPress.org going back years where people could not get the plugin to work.

It turns out that if you develop a WordPress blog offline and upload it there may be something in SETTINGS==>MEDIA that shows a local hard drive path (or it could be an absolute hard drive path from a previous hosting account).  When you move the site the chances that your new hosting provider will use the same file naming conventions on their hard drive are very, very slim.  So naturally when ADD FROM SERVER tries to add an image to the Media Library it will try to create a directory it cannot find.  The MEDIA setting is telling your blog to set up the library in a non-existent folder path.

Simply deleting the old path fixes the problem.
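
For reference, the stale path lives in the wp_options table under the option name “upload_path” (the value behind SETTINGS==>MEDIA).  WordPress actually runs on MySQL; I am using Python’s built-in sqlite3 module here only so the one-row fix can be shown self-contained, and the table contents are invented for the demonstration:

```python
# Demonstrate clearing the stale "upload_path" option (sqlite3 stand-in
# for the real MySQL wp_options table).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE wp_options (option_name TEXT, option_value TEXT)")
db.execute("INSERT INTO wp_options VALUES ('upload_path', "
           "'/home/olduser/public_html/wp-content/uploads')")

# Blank the stale path so WordPress falls back to its default uploads folder
db.execute("UPDATE wp_options SET option_value = '' "
           "WHERE option_name = 'upload_path'")

row = db.execute("SELECT option_value FROM wp_options "
                 "WHERE option_name = 'upload_path'").fetchone()
```

With the value blanked (or the row deleted), WordPress reverts to its default “wp-content/uploads” location and ADD FROM SERVER stops trying to create a directory that cannot exist on the new host.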

I can’t begin to count how many old WordPress sites I have had to update and move — knowing nothing of their history — only to find that upon finishing the move all the images are either not there or the references are broken and the Media Library is empty.  I’m now to the point where I can pretty much fix these problems if I have to without too much effort.

* SEO QUESTION NO. 142 (The Case of the Quirky Freshness Factor) *

“Michael, we have an old article that used to do well but time has passed and it stopped ranking even for its title (which was a popular kw ph.).  Last month we saw a spike in traffic.  Some little blog had linked to it and suddenly we were in the game again.  What do you think happened?  This is a popular query.”

Looking a little more deeply into this I discovered that the query is one of those that probably receives “freshness” evaluation.  I think many people assume that “Query Deserves Freshness” means Google is looking for fresh content.  I’m not so sure of that.

The site in question had actually tried to reoptimize for the targeted query.  They embedded links in old articles on “high PR” sites hoping that the additional anchor text would help but there wasn’t much of a boost.  But suddenly along comes Little Blogger Polly and she links to the old article and POOF! there is a resurgence in traffic.  And Little Blogger Polly just doesn’t have that much Toolbar PR to speak of.

It’s possible that the next time Google updates its Toolbar PR data Polly’s blog will see a boost — but in theory a Website could see a sudden spike in internal PageRank that goes away (maybe Google penalizes all the links) and then the Toolbar PR — when it’s finally updated — doesn’t change.

The article in question is dropping off in the SERPs again.  I see this happen on my own sites.  I don’t really agonize over it because people are constantly publishing new content about established topics.  If you want to dominate the SERPs consistently your best strategy is to break new ground.

But what causes a resurgence of old content?  I think it may be related to what sustains old content: new links.  The newly crawled links are probably added to the crawl queues so an old article’s crawl priority may be temporarily adjusted when the search engine sees a new link pointing to old content.

The PR value of the Website providing the link may still matter — after all if it has too little internal PageRank the link won’t pass value.  But if Google sees a new link, especially from a trusted Website, pointing to old content the old content may receive a little boost in crawl priority.

Does crawl priority correlate to QDF in some way?  I don’t know.  In the original QDF patent (Cf. http://cli.gs/4khx5us) Google wrote: “According to another aspect, a method for scoring documents is provided. The method may include determining an age of linkage data associated with a linked document and ranking the linked document based on a decaying function of the age of the linkage data.”  In fact, much of the patent discusses how links pointing to a document can be used to evaluate freshness.

So think about it.  You surreptitiously place links in old articles hoping for a boost.  Eventually Google will come around and find those links and factor them into the link graph.  But the query you’re targeting is looking for “freshness”.  Yes, the links are fresh but the content is not — and if Google is not crawling those old articles very often then how can it know how long the links have been there?  Also, the sudden appearance of new links in old content may trigger some trust filtering (or at least evaluation).

If, on the other hand, a blog publishes new content often enough that Google crawls it frequently, then Google can detect a close correlation between the publication of the content and the link; hence, Google may be able to trust that link better, in the sense of understanding that it was probably an original link.

Where Freshness is a factor it is conceivable that Google wants to credit only links that it feels are legitimately fresh — not just recently discovered by its crawls but also recently published in tandem with the content that hosts them (as verified by whatever algorithms Google uses to confirm such close correlation).

Judging a Website by the Toolbar PR assigned to its root URL is a very shallow way of evaluating crawl priorities and potential.  The search engine may be looking directly at the leaf page where the link is embedded rather than at the root URL of the Website (after all, how many Homepage Backlink networks has Google taken down?).  If the linking page is relatively new and there is no reason to believe the link was added later then the link may be viewed as editorially given.

Hence, a fresh/new link POINTING TO an old article for a topic that still deserves freshness may be seen as a sign of renewed interest in and continued relevance of an old article.  This would match the pattern of behavior we have seen where Websites with an inordinate amount of links continue to receive an inordinate amount of new links — the so-called “rich keep getting richer principle”.

To compensate for staleness of content you could try rewriting old articles, leaving their URLs intact, and then acquiring new links to the rewritten content.  The new links should be published in new content.  This is a way of keeping a URL “evergreen”, if not the content itself.

But remember that some linking content may also be updated from time to time.  For example, an “Interesting Links” page on a random Website may be updated several times a year.  If the search engine sees new links added (and dead links removed) from that page over time it may be more inclined to trust those links than new links that suddenly appear in an old article.

This is all very speculative and hard to prove, but think about it this way: whatever is EASIEST to achieve is most likely to be seen as SPAMMIEST (or LEAST TRUSTWORTHY) by a search engine.  Just because a search engine doesn’t trust a link doesn’t mean the link will be seen as spam; but if you create a lot of untrustworthy links then you’re probably going about it the wrong way.