Wednesday, December 24, 2014

Advanced SEO for JavaScript Sites: Snapshot Pages

As JavaScript becomes ever more integrated with HTML on the Web, SEOs need to develop an understanding of how to make JavaScript sites search-friendly.
We covered some basic approaches to SEO for JavaScript in an earlier post. However, a complex subject deserves an in-depth treatment. So let’s look at the specifics behind the emerging #! / snapshots approach to JavaScript sites.

Do You Even Need to Do This?

Search engines are processing more and more JavaScript every day. That doesn’t mean that your AngularJS site will be indexed and ranked, however. The engines are now parsing some JavaScript, but complete indexing of all states of JavaScript Web apps seems a long way off. If your JavaScript is running in the browser and pulling content from the database, odds are the engines won’t see it.

Prerendering Snapshot Pages and Bot-Specific Serving

This is perhaps the most popular approach for client-side JavaScript SEO issues. The basic flow is as follows:
  1. Detect search engine bots, either by looking at the URL if you used #! ("hashbang") or by simply checking the user agent of the request.
  2. If the bot is detected, redirect the request to your rendering engine of choice, such as PhantomJS. This engine should wait until all of the AJAX content is loaded.
  3. Once loaded, take the source of the rendered page and output it back to the bot.
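The three steps above can be sketched roughly as follows. This is a minimal illustration, not a production setup: the token list is deliberately incomplete, and the function names (is_search_bot, wants_snapshot, handle_request) and the two callables passed in are hypothetical stand-ins for your own rendering engine and normal app response. The URL check relies on the _escaped_fragment_ query parameter that Google's AJAX crawling scheme uses when requesting #! pages.

```python
from typing import Callable
from urllib.parse import urlsplit

# Illustrative, incomplete list of crawler user-agent tokens (an assumption;
# maintain your own list for the bots you care about).
BOT_TOKENS = ("googlebot", "bingbot", "adsbot-google")


def is_search_bot(user_agent: str) -> bool:
    """Step 1 (user-agent variant): case-insensitive substring match."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def wants_snapshot(url: str, user_agent: str) -> bool:
    """Step 1: treat the request as a bot request if either signal is present."""
    # Crawlers following the AJAX crawling scheme request #! pages with an
    # _escaped_fragment_ query parameter instead of the fragment itself.
    has_escaped_fragment = "_escaped_fragment_=" in urlsplit(url).query
    return has_escaped_fragment or is_search_bot(user_agent)


def handle_request(
    url: str,
    user_agent: str,
    render_snapshot: Callable[[str], str],  # hypothetical: wraps your renderer
    serve_app: Callable[[str], str],        # hypothetical: normal JS app shell
) -> str:
    """Steps 2 and 3: send bots to the rendered snapshot, everyone else to the app."""
    if wants_snapshot(url, user_agent):
        return render_snapshot(url)
    return serve_app(url)
```

Passing the renderer in as a callable keeps the detection logic independent of whichever rendering engine you choose.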
Ironically this is a form of cloaking – sending specific content to bots – often a no-no in SEO. This type of cloaking is considered "ethical," however, as the content is pretty much the same as what users would see. Hence the search engines are OK with it.
Google has a full AJAX crawling specification that covers the nuts and bolts. The basic idea is that you either add the #! identifier to your URLs or include the HTML <meta name="fragment" content="!"> tag in the page header. This alerts Google that the page uses URL fragments and prompts it to index the page.
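Under that scheme, the crawler does not request the #! URL directly. It moves the fragment into an _escaped_fragment_ query parameter, so your server has to map the crawler's URL back to the URL users see before rendering the snapshot. Here is a rough sketch of that reverse mapping using only the standard library; the function name to_pretty_url is hypothetical, and edge cases in the specification's escaping rules are not handled exhaustively.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ESCAPED_FRAGMENT = "_escaped_fragment_"


def to_pretty_url(crawler_url: str) -> Optional[str]:
    """Map a crawler URL back to the user-facing URL.

    Returns None when the URL carries no _escaped_fragment_ parameter,
    i.e. when it is not a snapshot request at all.
    """
    parts = urlsplit(crawler_url)
    fragment = None
    remaining = []
    # parse_qsl percent-decodes values; keep_blank_values preserves the
    # empty parameter sent for pages that use the meta fragment tag.
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name == ESCAPED_FRAGMENT and fragment is None:
            fragment = value
        else:
            remaining.append((name, value))
    if fragment is None:
        return None
    base = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), "")
    )
    # An empty value means the page opted in via the meta tag, so the
    # user-facing URL has no #! part.
    return f"{base}#!{fragment}" if fragment else base
```

For example, a crawler request for http://example.com/?_escaped_fragment_=key=value maps back to http://example.com/#!key=value, which is the state your rendering engine should load.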
As mentioned in my earlier post, History.pushState is another option for creating indexable AJAX URLs. pushState has its own issues, including the fact that it is not supported by IE 9, which is still a fairly widespread browser.

Potential Issues With Prerendering

There are a few things to look out for if you decide to go with prerendering:
  • Bot detection. Make sure you are serving snapshots to all the bots, not just Googlebot (e.g. Bingbot and others).
  • Snapshot timing. Consider the fact that your JavaScript elements may take a while to process via PhantomJS. You may need to build in a delay to allow for the full content to load prior to saving the page to the cache. Otherwise your snapshot page may be incomplete or partially rendered.
  • Page load time. Similarly, if you run the snapshot page process at the time of the URL request from the bot, the page may load slowly. Page load speed is an increasingly important SEO ranking factor, so if your page appears to load slowly to the bot, you may be negatively impacted. This is why it’s desirable to cache the pages in advance. This has the added benefit of actually making your site appear faster to engines than it is to users.
  • Batch processing. If you have a lot of pages, you may want to run the snapshot process at a specific time during off hours, or on its own server. The process can be resource-intensive.
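To make the caching and batch points above concrete, here is a small in-memory sketch. It assumes a render callable that wraps whatever rendering engine you use and returns the fully loaded HTML; the class name SnapshotCache, its parameters, and the one-day default lifetime are all hypothetical choices for illustration. A real deployment would more likely store snapshots on disk or in a dedicated cache server.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class SnapshotCache:
    """Holds prerendered HTML so bots are served without rendering on request."""

    def __init__(
        self,
        render: Callable[[str], str],        # hypothetical: wraps your renderer
        max_age_seconds: float = 86400.0,    # assumed lifetime: one day
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._render = render
        self._max_age = max_age_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def refresh(self, urls: Iterable[str]) -> None:
        """Batch job: re-render every URL, e.g. from a scheduler during off hours."""
        for url in urls:
            self._store[url] = (self._render(url), self._clock())

    def get(self, url: str) -> Optional[str]:
        """Return the cached snapshot, or None if it is missing or too old."""
        entry = self._store.get(url)
        if entry is None:
            return None
        html, saved_at = entry
        if self._clock() - saved_at > self._max_age:
            return None
        return html
```

When get returns None, the caller can fall back to rendering on demand, accepting the slower response for that one request. Any delay needed for AJAX content to finish loading belongs inside the render callable, so that incomplete pages never reach the cache.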

Checking Snapshot Pages

Since you are using bot detection, it is harder to verify that your process is actually working. One way to check your work is to use the "Fetch as Google" feature in Google Webmaster Tools. Enter the URL of a snapshot page (it may be different from the live URL shown to users) in GWT and check to see if Google can pull it correctly. This requires a live page, so plan accordingly.
Currently, Fetch as Google supports #! but not pushState URLs. If your URLs are static-looking, you will have no problems.

Prerendering Services

Building a full prerendering capability can be non-trivial. For a larger site it may involve setting up a caching server, a database for the content, a service for the caching calls, a scheduler, and an administrative utility. Fortunately, several companies have come forward with solutions that address part or all of the prerendering approach. Those that I am familiar with include:
  • runs a service that works with PhantomJS to host and serve snapshot pages. They take away the headache of running your own prerender server. They are reasonably priced based on the number of pages hosted and the frequency in which the snapshots are taken.
  • Brombone runs a similar service in which snapshot pages can be generated based on the timestamp for the URL in the sitemap or scheduled on a custom basis.
  • AjaxSnapshots offers a prerendering service with some nice configuration options. At the moment their website lists the pricing as free, so there’s that.

Paid Search and Landing Pages

JavaScript sites pose challenges for paid search as well as for SEO. For example, AdWords determines quality score based on, among other things, the content the AdWords bot sees on the page. One approach is to serve the snapshot page to the Google AdsBot (Google publishes a full list of its crawlers and user agents).
Furthermore, if your products or product content are found in a single-page application, it can be a challenge to force this state via the paid search destination URL. Creating #! or static-looking URLs is pretty much a requirement here.
As paid search landing pages often need to be highly tailored in order to convert well, it may be better in the long run to create dedicated pages for your PPC campaigns, leaving your core JavaScript Web experience to users and SEO.


JavaScript-heavy sites are here to stay. While it can be non-trivial to build a JavaScript site that works well for SEO, it’s even more work to try to fix an existing site that was built without SEO in mind. Until the frameworks and tools are improved to make it easy to incorporate SEO requirements, SEOs will need to work closely with developers to ensure SEO is factored in. Getting SEO and JavaScript to live together in harmony may not be easy, but it’s worth it.
