Wednesday, January 28, 2009

How Google Makes The Net Suck

Some people like to compare developers to artists. When it comes to web development, some people say there's always a man behind the curtain. Whether you agree or not, there are definitely certain freedoms that web developers enjoy. As a web developer, what are the greatest limitations and obstacles in your way? Once it may have been browser quirks. Now maybe it's all those annoying users who still use IE6. However, I think the greatest obstacle to progress is Google.

Now Google would have you believe just the opposite. I do not think they are disingenuous. In a large organization, it's all too easy for different groups to have different motivations. But ask yourself this: how much money does Google make from Chrome? What does Google make money from? That's easy: advertising on search. And that is what is holding us all back.

If you have endured my purple prose to this point, I will finally cut to the chase. One of the most important aspects of any web page is its PageRank. If your web page is all about deep sea diving, where does it surface when somebody searches Google for deep sea diving? The black art of making your page get a higher PageRank has given birth to an entire cottage industry known as Search Engine Optimization (SEO).

As a developer, I have never given much thought to SEO. I always thought that SEO was about the content of the page, and web developers are not responsible for the content. We are responsible for retrieving/generating that content from all kinds of sources, as well as creating applications that are easy and intuitive for the user to interact with in a meaningful way. But, if we go back to the deep sea diving example, we're not responsible for providing information about deep sea diving. Heck, you are lucky if most developers even know how to swim, but I digress.

But I was wrong. SEO is not just about content. It is about structure. If you want a good PageRank, then quality content about deep sea diving will lead to other people linking to your page and that will increase your PageRank. But there are much more instantly gratifying things you can do. For example, your page should have a title and it better contain the term deep sea diving. No big deal, right? The title is really just part of the template outside of the main contents of the page. Its value has little effect on anything, besides PageRank that is. However, it gets worse.

To maximize your PageRank, you should have an H1 tag immediately after your page's body tag, and its contents should contain the term deep sea diving. Oh, maybe you put the phrase on the page, but you put it in a div that is styled quite nicely? Not good enough. It needs to be in an H1 tag. Maybe you used some JavaScript to create the H1 tag? That is no good at all. Why? Because The All-Mighty Googlebot does not understand how the page looks to a user. It only understands basic HTML constructs. That's right, it's time to party like it's 1999.

Oh, maybe your organization hired an artist who created a killer deep sea diving logo and you load it onto the page as an image? Not good enough. If you put deep sea diving as the alt text, that will win you some bonus points from the Googlebot, but it is still dwarfed by the rewards you could receive by busting out the H1. Nothing compares to the mighty H1 tag. And don't just put that H1 tag anywhere on the page. Heck, you might even get penalized for having more than one! Nope, only one, and it better appear (in the HTML source code) as close to the body tag as possible.
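To make the complaint concrete, here is a minimal sketch of what a crawler that only reads static markup would pick up from a page. This is not how the Googlebot actually works (nobody outside Google knows that in detail); it is just Python's standard `html.parser`, and the names `StaticSignals` and `static_signals`, along with the sample page, are made up for illustration.

```python
from html.parser import HTMLParser


class StaticSignals(HTMLParser):
    """Collects the title, H1 text, and image alt text found in static HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = []
        self.alts = []
        self._current = None  # tag whose text is being captured, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1.append("")
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data


def static_signals(html):
    """Return the signals a markup-only reader would see (hypothetical helper)."""
    parser = StaticSignals()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "h1": [text.strip() for text in parser.h1],
        "alts": parser.alts,
    }


# Sample page: nicely styled div, a logo with alt text, and an H1 built by script.
page = """
<html><head><title>Deep Sea Diving Guide</title></head>
<body>
<div class="fancy">Deep Sea Diving</div>
<img src="logo.png" alt="deep sea diving">
<script>
var h = document.createElement("h1");
h.textContent = "Deep Sea Diving";
document.body.prepend(h);
</script>
</body></html>
"""

print(static_signals(page))
# {'title': 'Deep Sea Diving Guide', 'h1': [], 'alts': ['deep sea diving']}
```

The styled div and the script-built H1 contribute nothing to the `h1` list, because neither exists as an H1 element in the source. Only the title and the alt text survive.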

Ok, so maybe you give in and put the catch phrase in an H1 tag. That wasn't too bad, right? Now back to your regularly scheduled hacking? Not so fast. Do you have some hierarchical information on the page? Sections, headings, menus, etc.? How are you going to do those? Again, you better not even think about using things like JavaScript to create them dynamically. Nope, they have got to be static on the page. Back to divs, spans (maybe a table or two), along with some oh-so-clever CSS? Forget about it. Let me introduce you to H1's other friends: H2, H3, H4, H5, H6. That's right, if you want that damn Googlebot to "understand" the hierarchy of concepts on your page, then you better put away your divs and spans.

Maybe you think that's going overboard, but it's not. Do you have a section on your deep sea diving page called "Gear"? Then if you want to show up on a search for deep sea diving gear, you better have the term Gear wrapped in an H2 or an H3 tag.
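Here is a rough sketch of the difference, again using only Python's standard `html.parser` and assuming a reader that builds its outline purely from H1 through H6 tags. The `OutlineParser` class, the `heading_outline` helper, and both sample snippets are hypothetical, not anything Google has published.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Builds a list of (level, text) pairs from heading tags only."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level == int(tag[1]):
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


def heading_outline(html):
    """Return the heading hierarchy visible in the static markup."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


semantic = "<body><h1>Deep Sea Diving</h1><h2>Gear</h2><h3>Regulators</h3></body>"
styled = '<body><div class="title">Deep Sea Diving</div><span class="section">Gear</span></body>'

print(heading_outline(semantic))  # [(1, 'Deep Sea Diving'), (2, 'Gear'), (3, 'Regulators')]
print(heading_outline(styled))    # []
```

Both snippets could be styled to look identical in a browser, but only the first one exposes "Gear" as a section under "Deep Sea Diving" to something that reads tags rather than pixels.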

What about RIA technologies? Again, if you dynamically create things with JavaScript, they will get picked up, but it is non-optimal. You have to do things the way that the Googlebot wants them done to get the best results. What about Flash or Silverlight or JavaFX? Flash will get you screwed on about the same level as JavaScript. Silverlight or Java might as well be black holes. Whatever is in there is never getting out.

There are tricks you can employ like progressive enhancement. There you do things the way that the Googlebot wants them, then dynamically obliterate that garbage and replace it with rich content that your users actually want. This can backfire. If the Googlebot figures out that you are tricking it, then it will banish you to purgatory.
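A minimal sketch of the honest version of that trick, assuming the page is rendered server-side: emit the plain title, H1, and H2 markup first, so it is complete without any script, and only then hook in a script that upgrades it in the browser. The `render_page` function and the `enhance.js` file name are hypothetical.

```python
from html import escape


def render_page(topic, sections, enhance_src="enhance.js"):
    """Return static HTML that stands on its own, plus a script hook at the end.

    `sections` is a list of (heading, text) pairs. `enhance_src` is a
    hypothetical script that would replace the plain markup with richer
    widgets in browsers that run JavaScript.
    """
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        f"<head><title>{escape(topic)}</title></head>",
        "<body>",
        f"<h1>{escape(topic)}</h1>",
    ]
    for heading, text in sections:
        lines.append(f"<h2>{escape(heading)}</h2>")
        lines.append(f"<p>{escape(text)}</p>")
    # The enhancement comes last and is optional: the content above is
    # already readable by anything that only parses the HTML source.
    lines.append(f'<script src="{escape(enhance_src)}" defer></script>')
    lines.append("</body>")
    lines.append("</html>")
    return "\n".join(lines)


print(render_page("Deep Sea Diving", [("Gear", "Masks & fins")]))
```

The important part is that the enhanced version shows the same content as the static version. Swapping in different content is where you start looking like you are tricking the bot.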

What if you just make a great web application that users will love and don't bother to worry about the Googlebot? That's fine of course, it just means that people will not find your application by searching for it. Are your business model and marketing efforts robust enough to not need SEO? Yeah, I didn't think so.

So now do you understand? The Googlebot ties your hands, or at the very least makes you jump through all kinds of hoops. There are all these great technologies you could use to make your site as interactive as any desktop application, but The Googlebot does not like this. You've got to play his game whether you like it or not.


Josh McDonald said...

Hi Michael,

I honestly think nearly every one of these things is a good thing. It forces people to write apps that are apps, and docs that are docs. Google doesn't work best with semantic markup because they're jerks, it works best with semantic markup because it's nothing but a huge collection of small stupid processes, which need all the help they can get to understand what your content actually means. More sensible uses of modern CSS with JavaScript to enhance the older browser user's experience rather than the other way around is the way to go.

People need to realise that their page is information, not an image. It doesn't need to look exactly the same on every browser, so long as it looks good and the content is there. The 'bot is pretty clever about detecting people lying to it, and I don't think it'll be punishing people for run-time client-side visual modifications via JavaScript and alternative sources using <link/> etc.

I look forward to the xhtml web being a source of information, with many different user experiences and ways to view that information layered on top of it.

Ferdinand said...

Sounds to me that the googlebot wants pages as I want them. I run by default without plugins or javascript and only enable it when I have to. Anything in a plugin should be considered a black hole and not a part of a webpage because it takes over the control from the browser. Javascript should be an optional enhancement and not a requirement for using the webpage.
By the way. Don't you think that large important webpages would complain to google if they were restricted during development of webpages?

Michael Galpin said...

@Ferdinand Glad you like living in 1998 :-) And the Googlebot is not as much of a problem for big websites. I happen to work at such a site. If you're a hugely popular site, then you don't need Google (as much) to push traffic your way. You can get away with being unfriendly to the Googlebot. This is not true for most sites though. It's all the small guys who get screwed the most by Google.