PDF Optimization: How Does Google Handle PDFs, and How Should You?
Are PDFs good for SEO? Does Google, or Googlebot, crawl PDFs and index them? Do PDF documents rank in Google search results as well as HTML pages? These are good questions which I’m sure you have wondered about, along with so many of us in the business of managing Web sites and planning for search. After all, the PDF format is a great way to store and deliver documents and maintain their formatting and design, and to discourage unauthorized edits. As such, PDFs have become pervasive and universal. You see them everywhere. Some sites, especially intranets, are packed with more PDFs for downloading than HTML pages.
So How Does Google Relate to PDFs?
Google has been indexing PDFs for quite a few years already and is becoming quite good at it. But it’s very important to understand the difference between text-based PDFs and graphical PDFs. When you generate your PDF, use the option to make it text searchable. Here’s the test: Try a search within your PDF (CTRL-F, or use the PDF’s Edit/Find menu). If you can find and select actual words and characters (with your mouse) in the document, so can Google. If you can’t, then it’s an image, not text, like a fax. In that case, Google might still be able to OCR the doc and figure out many of the words, but it’s much more difficult and far less accurate. Don’t make Google work harder than necessary, if you are aiming for optimal indexing and, hopefully, ranking of your content.
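If you have a whole folder of PDFs, running the CTRL-F test by hand gets tedious. Here is a minimal Python sketch that approximates it. To be clear, this is a rough heuristic of my own and says nothing about how Google actually processes PDFs. The function name `has_text_layer` is hypothetical. It assumes the PDF is unencrypted and that page content is either uncompressed or Flate-compressed, and it simply looks for text-drawing operators (`Tj` or `TJ` inside a `BT ... ET` block) in any stream that is not marked as an image. For anything serious, use a proper PDF library.

```python
import re
import zlib

# Start of a stream body; the lookbehind keeps "endstream" from matching.
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")
# A text object (BT ... ET) that contains a text-showing operator (Tj or TJ).
TEXT_SHOWN = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Rough heuristic: True if any non-image stream appears to draw text."""
    for match in STREAM_START.finditer(pdf_bytes):
        end = pdf_bytes.find(b"endstream", match.end())
        if end == -1:
            continue
        # The stream dictionary sits between the preceding "obj" keyword
        # and the "stream" keyword.
        header_start = pdf_bytes.rfind(b"obj", 0, match.start())
        header = pdf_bytes[max(header_start, 0):match.start()]
        if b"/Image" in header:
            continue  # pixel data, not page content
        body = pdf_bytes[match.end():end]
        try:
            # Most content streams are Flate (zlib) compressed.
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not zlib data; scan the raw bytes instead
        if TEXT_SHOWN.search(body):
            return True
    return False
```

To try it, read a file with `open("menu.pdf", "rb").read()` (the filename is just an example) and pass the bytes in. A result of `False` suggests the document is a scan or fax-style image and would need OCR before it becomes searchable.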
So we know Google can crawl and index your text-based PDFs rather well. What about ranking?
How Does Google Rank PDF vs HTML Documents?
Much like with any Web page or document, when Google evaluates a link or a page, its algorithm tries to determine if this will be the best document to serve to the searcher. Will this be the best result for the user? Unfortunately, there is no black-and-white rule, but try to answer the question yourself. Clearly some document types are better suited to PDFs than others. It’s very common to see restaurant menus as PDFs, for example. Some people like to download them and print them, and there are hundreds of restaurant and menu aggregation sites and directories that like to grab these or link to them as well. Still, some people just don’t like PDFs no matter what. So it’s your job to consider whether a PDF is likely to be well received by your users. Will this be the best way to present your content? If you are confident it is, go for it. See what happens. You can always change it later. And you can often offer both a PDF and an HTML version and make one the canonical page. More on that option another time!
A Few Additional PDF Optimization Tips
- To facilitate indexing of any images in your PDFs, you should create HTML pages for them. Google does not index images from the PDF itself.
- Links in PDF files are usually treated similarly to links in HTML: Google may follow them, and they can pass PageRank. So use good anchor text as always, and link to content that you want to expose and to which you want PageRank to flow.
- PDFs have meta titles within the files. Optimize those as you would any HTML page. And be mindful of the anchor text of any links that point to your PDFs. Use meaningful keywords there, rather than “click here” to download, for example!
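The anchor text tip above is easy to check on your own pages. The following is a small sketch using only Python’s standard library. It scans a page’s HTML for links to `.pdf` files and flags the ones whose anchor text is empty or generic. The names `PdfLinkAuditor`, `weak_pdf_links` and the `GENERIC_ANCHORS` phrase list are my own illustrative choices, not a list Google publishes, so adjust them to taste.

```python
from html.parser import HTMLParser

# Hypothetical list of phrases that say nothing about the target document.
GENERIC_ANCHORS = {"click here", "here", "download", "pdf", "read more"}


class PdfLinkAuditor(HTMLParser):
    """Collects (href, anchor_text) for every link that points at a .pdf URL."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # set while we are inside a link to a PDF
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Ignore fragments and query strings when checking the extension.
            path = href.split("#")[0].split("?")[0]
            if path.lower().endswith(".pdf"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the collected anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def weak_pdf_links(html: str):
    """Return the PDF links whose anchor text is empty or generic."""
    auditor = PdfLinkAuditor()
    auditor.feed(html)
    auditor.close()
    return [(href, text) for href, text in auditor.links
            if not text or text.lower() in GENERIC_ANCHORS]
```

Feed it the HTML of any page that links to your PDFs. Each `(href, text)` pair it returns is a link worth rewriting with descriptive wording, such as the name of the menu or report.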
For more info, watch this YouTube video, in which Matt Cutts of Google’s search team discusses best practices for PDF optimization.
And as always, let us know if we can help and how you do with your PDFs.