Do You Even Need to Do This?
Prerendering Snapshot Pages and Bot-Specific Serving
- Detect search engine bots, either by inspecting the requested URL (if you used #! "hashbang" URLs, crawlers request an `_escaped_fragment_` version of them) or by simply checking the user agent of the request.
- If the bot is detected, redirect the request to your rendering engine of choice, such as PhantomJS. This engine should wait until all of the AJAX content is loaded.
- Once loaded, take the source of the rendered page and return it to the bot (a sketch of this flow follows the list).
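Here is a minimal sketch of that flow, assuming a Node/Express front end and a directory of pre-rendered snapshot files; the bot list, paths, and file-naming scheme are illustrative, not taken from any particular library:

```ts
import express from "express";
import { readFile } from "fs/promises";
import { createHash } from "crypto";

const app = express();

// Illustrative pattern; in practice, cover every crawler you care about.
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider/i;

app.use(async (req, res, next) => {
  const userAgent = req.headers["user-agent"] || "";
  const escapedFragment = req.query._escaped_fragment_;

  // Detect bots by the _escaped_fragment_ parameter (the form crawlers use
  // for #! URLs) or by user agent.
  const isBot = escapedFragment !== undefined || BOT_PATTERN.test(userAgent);
  if (!isBot) return next();

  try {
    // Serve the pre-rendered snapshot cached under a hash of the request URL.
    const key = createHash("md5").update(req.originalUrl).digest("hex");
    const html = await readFile(`./snapshots/${key}.html`, "utf8");
    res.type("html").send(html);
  } catch {
    // No snapshot yet – fall through to the normal client-rendered page.
    next();
  }
});

app.listen(3000);
```

In a real setup you might generate the snapshot on a cache miss rather than simply falling through, but the shape of the check is the same.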
Ironically this is a form of cloaking – sending specific content to bots – often a no-no in SEO. This type of cloaking is considered "ethical," however, as the content is pretty much the same as what users would see. Hence the search engines are OK with it.
Google has a full AJAX crawling specification that covers the nuts and bolts. The basic idea is that you either add the #! identifier to your URLs or include the `<meta name="fragment" content="!">` tag in the page's `<head>`. This alerts Google that the page uses URL fragments and prompts it to index the page.
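For reference, under that specification the crawler does not request the #! URL directly; it requests an `_escaped_fragment_` version of it, and your server answers with the HTML snapshot. A rough sketch of the mapping (my own illustration – the spec defines the exact escaping rules, and `encodeURIComponent` here is only an approximation):

```ts
// Map a #! ("hashbang") URL to the _escaped_fragment_ form a crawler requests.
function toEscapedFragmentUrl(url: string): string {
  const [base, fragment = ""] = url.split("#!");
  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}_escaped_fragment_=${encodeURIComponent(fragment)}`;
}

// Pages that rely on the <meta name="fragment" content="!"> tag instead of #!
// are crawled with an empty _escaped_fragment_ parameter.
console.log(toEscapedFragmentUrl("https://example.com/products#!id=123"));
// -> https://example.com/products?_escaped_fragment_=id%3D123
```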
As mentioned in my earlier post, history.pushState is another option for creating indexable AJAX URLs. PushState has its own issues, including the fact that it is not supported by IE 9 – still a fairly widespread browser.
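One common way to cope with that gap is simple feature detection on the client, falling back to #! URLs where pushState is unavailable. A minimal sketch (the function name and fallback behavior are mine, not from any framework):

```ts
// Update the address bar to a clean, indexable URL after an AJAX navigation,
// falling back to a #! URL where pushState is unavailable (e.g. IE 9).
function navigateTo(path: string, title: string): void {
  if (window.history && typeof window.history.pushState === "function") {
    window.history.pushState({ path }, title, path);
  } else {
    // Hashbang fallback for older browsers.
    window.location.hash = "!" + path;
  }
  // ...then load the content for `path` via AJAX and render it.
}

navigateTo("/products/123", "Product 123");
```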
Potential Issues With Prerendering
There are a few things to look out for if you decide to go with prerendering:
- Bot detection. Make sure you are serving snapshots to all of the bots, not just Googlebot (e.g. Bingbot, et al.).
- Page load time. If you run the snapshot process at the time of the bot's request, the page may be slow to return. Page load speed is an increasingly important SEO ranking factor, so if your page appears slow to the bot, you may be negatively impacted. This is why it's desirable to cache the snapshot pages in advance – which has the added benefit of making your site appear faster to the engines than it does to your users.
- Batch processing. If you have a lot of pages, you may want to run the snapshot process at a specific time during off hours, or on its own server; the process can be resource-intensive (a rough batch sketch follows this list).
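One way to handle the batch side is a scheduled job (cron or similar) that walks your URL list during off hours and caches snapshots to disk. A rough Node sketch, assuming PhantomJS is installed and that a separate capture script – here called `snapshot.js`, a placeholder name – prints the rendered HTML for a given URL to stdout:

```ts
// batch-snapshots.ts – run during off hours (e.g. from cron).
import { execFile } from "child_process";
import { promisify } from "util";
import { mkdir, writeFile } from "fs/promises";
import { createHash } from "crypto";

const run = promisify(execFile);

// In practice these would come from your sitemap.
const urls = [
  "https://www.example.com/#!/products/123",
  "https://www.example.com/#!/products/456",
];

async function snapshotAll(): Promise<void> {
  await mkdir("./snapshots", { recursive: true });
  for (const url of urls) {
    // PhantomJS loads the page, waits for the AJAX content, and prints the DOM.
    const { stdout: html } = await run("phantomjs", ["snapshot.js", url]);

    // Cache each snapshot under a stable filename derived from its URL.
    const key = createHash("md5").update(url).digest("hex");
    await writeFile(`./snapshots/${key}.html`, html);
    console.log(`cached ${url}`);
  }
}

snapshotAll().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

However you key the cached files, it needs to match whatever lookup your bot-serving layer performs.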
Checking Snapshot Pages
Since you are using bot detection, it is harder to verify that your process is actually working. One way to check your work is to use the "Fetch as Google" feature in Google Webmaster Tools. Enter the URL of a snapshot page (it may be different from the live URL shown to users) in GWT and check to see if Google can pull it correctly. This requires a live page, so plan accordingly.
Currently, Fetch as Google supports #! but not pushState URLs. If your URLs are static-looking, you will have no problems.
Use the Google Webmaster Tools "Fetch as Google" utility to check your snapshot pages.
Building a full prerendering capability can be non-trivial. For a larger site it may involve setting up a caching server, a database for the content, a service for the caching calls, a scheduler, and an administrative utility. Fortunately, several companies have come forward with solutions that address part or all of the prerendering approach. A few that I am familiar with include:
- Prerender.io. Prerender.io runs a service that works with PhantomJS to host and serve snapshot pages. They take away the headache of running your own prerender server, and they are reasonably priced based on the number of pages hosted and the frequency with which the snapshots are taken (see the middleware sketch after this list).
- Brombone.com. Brombone runs a similar service in which snapshot pages can be generated based on the timestamp for the URL in the sitemap or scheduled on a custom basis.
- AjaxSnapshots. AjaxSnapshots offers a prerendering service with some nice configuration options. At the moment their website lists the pricing as free, so there's that.
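As an example of how little glue these services require, Prerender.io publishes Express middleware (the `prerender-node` package) that detects crawlers and proxies their requests to the prerender service. A minimal hookup might look like the sketch below; the token is a placeholder, and you should defer to their current documentation for the exact options:

```ts
import express from "express";
// prerender-node is Prerender.io's Express middleware; this sketch pulls it
// in with require() rather than assuming bundled type definitions.
const prerender = require("prerender-node");

const app = express();

// The token is a placeholder – use your own account value.
app.use(prerender.set("prerenderToken", "YOUR_TOKEN_HERE"));

app.use(express.static("public"));
app.listen(3000);
```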
Paid Search and Landing Pages
Furthermore, if your products or product content live in a single-page application, it can be a challenge to force that state via the paid search destination URL. Creating #! or static-looking URLs is pretty much a requirement here.