I often receive queries like:
I am converting the Word file of my book to a PDF (a bit later in epub). To be read on as many devices as possible, is PDF or PDF/A better? PDF/A-1a or PDF/A-1b?
This may seem like a rather nit-picky question, and the bottom-line answer is straightforward: stick to PDF/A to maximize portability, and the lower conformance level "b" is fine. But the details underlying this answer illustrate some interesting strategic points. The rest of this post delves into those details and reaches a possibly surprising conclusion: even though Adobe invented the PDF format and supplies the PDF software embedded in many devices, the bar for de facto PDF compatibility is now being set by Apple, not Adobe. A future post will look at how the varying levels of PDF format support exemplify control issues and other problems with open standards that evolve from proprietary origins.
On a side note, it's encouraging that the author posing this question was already savvy to the fact that he should ideally be offering epub, the open standard for reflow-centric digital publications. EPUB, built on Web standards including HTML and XML, was architected to enable optimized reading experiences on different-size screens.
But despite the rapid proliferation of epub support on devices, there are many reasons why it might be practical to publish a PDF edition of an eBook. The reason hinted at by "a bit later in epub" is expediency. PDF is a pre-paginated, "typeset" format that is almost universally produced in the process of creating print (or print-on-demand) books, and tools for creating and assembling PDFs are widespread. So as a self-publishing author you probably already have, or can instantly create, a PDF of your book. By contrast, epub is a newer standard whose tools are still emerging and are not yet quite push-button, especially if your book has complex page layouts. You might also want to distribute content through a channel that requires PDF: although self-publishing services ranging from Scribd to Lulu have recently added epub support, epub is not yet as universally supported as PDF.
OK, on to PDF format details. Adobe supplies the PDF software on the Amazon Kindle, Sony Reader, Barnes & Noble Nook, and many other devices via its Reader Mobile SDK (RMSDK). Adobe's description of the RMSDK is rather vague about the specifics of PDF support, even in its FAQ. The FAQ for Adobe Digital Editions, Adobe's desktop eBook reading application (which uses the RMSDK engine), is somewhat more detailed about PDF support, although it still hedges a bit:
Digital Editions supports a superset of ISO standard 19005-1 (PDF/A). PDF/A is designed to support more secure, long-term information archiving; it is based on a subset of PDF 1.4 (the version of PDF supported by Acrobat 5.0). Additional PDF capabilities in Digital Editions beyond PDF/A include basic encryption, DRM-based encryption, JBIG2 image compression, transparency, and compressed object streams. The intention is to support in Digital Editions those PDF features reasonably needed by eBooks and other commercially published content, balancing 100% coverage with a focus on small size and high performance...
Both FAQs note that various enterprise-oriented features, such as interactive forms, LiveCycle Policy Server-style security, JavaScript, and digital signatures, are not supported.
So although there is some wiggle room, especially around future capabilities, this provides clear guidance that PDF/A (a subset of PDF 1.4, aka "Save As Acrobat 5 Compatible") can be safely employed.
The next question is which other PDF implementations, beyond RMSDK, need to be considered, given the goal of publishing content that can be read on as many devices as possible. PDF has always been an openly published format, so there are many implementations of it. Undoubtedly the most widely used beyond Adobe's is Apple Preview, the image and document viewing application built into Mac OS X, whose underlying PDF rendering engine Apple also ships on the iPhone. While Adobe's enterprise-featured free Adobe Reader and its even heftier sibling Acrobat run on Macs, many users prefer the nimbler, streamlined Preview, which also handles many other formats. Because PDF support, including PDF creation, is built into Mac OS X, many Mac users never bother to install Adobe Reader or Acrobat... and iPhone users don't even have that option. Given that the iPhone is arguably the most popular device for eBook reading, and that content consumed on Mac desktops will very likely end up being opened in Preview, support by Preview is almost certainly the most important consideration. And this is before the advent of the Apple Tablet.
Unfortunately, Apple does not appear to document anywhere exactly what level of PDF Preview supports. The latest version of Preview is 5.0, which shipped with Snow Leopard (Mac OS X 10.6), although most users probably have an older version, and it's even less clear what the iPhone supports. So what's a harried self-publishing author to do, especially one who uses Windows and may not own an iPhone?
The answer is simple: look to Adobe's mobile PDF implementation, since it is forced to follow Apple's lead. There are millions of PDF files in the wild that have been created by, or viewed on, Mac OS X, so any PDF language feature supported by Preview is likely one that publishers and end users will expect Adobe's RMSDK to support. Thus, while Adobe's full software clearly defines the high end of PDF compatibility requirements, Apple has in effect set the bar for the low end, a bar that even Adobe needs to meet. As evidence, consider that, with the exception of DRM (which I will touch on shortly), all the features beyond PDF/A listed in Adobe's FAQ, such as basic encryption and transparency, are supported in Apple Preview. And after Apple Preview 5.0 shipped with support for JPEG 2000 images, Adobe followed suit several months later with its latest release, RMSDK 9.1. Coincidence? Unlikely.
So, if your file displays properly in Adobe Digital Editions, you should be in good shape. Ideally you should test this, but if you save as PDF/A, that should be good enough. The features beyond PDF/A that fall within this lower bar of compatibility are in most cases not critical, although forgoing them may leave your eBook somewhat larger than it would be if it took full advantage of what one might call the "Apple Preview de facto standard" for widely deployed PDF support.
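One rough way to confirm that the "save as PDF/A" step actually took is to look inside the file. The Python sketch below is a heuristic, not a validator: it reads the version number from the `%PDF-1.x` header and looks for the PDF/A identification (`pdfaid:part` and `pdfaid:conformance`) that PDF/A files carry in their XMP metadata. The function names are hypothetical, and the byte scan assumes the metadata is stored uncompressed, as PDF/A-1 files normally do. A file that merely claims PDF/A can still fail a real conformance check such as Acrobat's Preflight.

```python
import re
from typing import Optional

# The header looks like b"%PDF-1.4". Strictly it starts the file, but some
# readers tolerate leading junk, so this sketch checks the first 1024 bytes.
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")

# PDF/A identification lives in the XMP metadata, written either as
# attributes (pdfaid:part="1") or as elements (<pdfaid:part>1</pdfaid:part>).
PART_RE = re.compile(
    rb"""pdfaid:part\s*=\s*["'](\d)["']|<pdfaid:part>\s*(\d)\s*</pdfaid:part>"""
)
CONF_RE = re.compile(
    rb"""pdfaid:conformance\s*=\s*["']([A-Za-z])["']"""
    rb"""|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>"""
)


def _matched(m: "re.Match[bytes]") -> str:
    # Exactly one alternative matched; return whichever group is set.
    return next(g for g in m.groups() if g is not None).decode("ascii")


def pdf_version(data: bytes) -> Optional[str]:
    """Return the version from the file header, e.g. "1.4", or None."""
    m = HEADER_RE.search(data[:1024])
    return m.group(1).decode("ascii") if m else None


def pdfa_claim(data: bytes) -> Optional[str]:
    """Return the claimed PDF/A level, e.g. "PDF/A-1b", or None if absent.

    This only reports what the metadata says; it does not validate the file.
    """
    part = PART_RE.search(data)
    if part is None:
        return None
    conf = CONF_RE.search(data)
    level = _matched(conf).lower() if conf else ""
    return f"PDF/A-{_matched(part)}{level}"
```

To use it, read the whole file in binary mode (for example `open(path, "rb").read()`) and pass the bytes to both functions; a file exported as PDF/A-1b should report `PDF/A-1b`, while an ordinary PDF reports `None`.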
As for the difference between PDF/A-1a and PDF/A-1b: these are two conformance levels. PDF/A-1b guarantees only reliable visual reproduction, while PDF/A-1a additionally requires, among other things, "Tagged PDF" data structures that support accessibility. Some reading software offers read-aloud features for accessibility, and may also offer the option of "reflowing" a PDF on a smaller screen. However, most PDF creation software (even some from Adobe) does not include Tagged PDF data structures. This is partly because PDF creation often flows out of a printing process that has no access to the authoring application's high-level data model. And when these data structures are included, they are often incomplete or even inaccurate. As a result, PDF reading software that supports accessibility or reflow views tends to use various heuristics to reconstruct the "reading order" of text, heuristics that in some cases may not even use, and almost certainly would not require, Tagged PDF. And in any case, a PDF eBook is unlikely to deliver a fantastic auditory rendition, since PDF is a "typeset at the factory" paper-replica format.
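Whether a particular export includes Tagged PDF structures at all can be roughly checked from the raw file: a tagged file's document catalog normally contains `/MarkInfo << /Marked true >>` and a `/StructTreeRoot` reference. The Python sketch below (the function name is hypothetical) simply searches the bytes for both markers. It assumes the catalog is not inside a compressed object stream (a PDF 1.5 feature, so not a concern for PDF 1.4-based PDF/A-1 files) and that `/MarkInfo` is written inline rather than as an indirect reference. It also says nothing about whether the tags are complete or accurate.

```python
import re

# A tagged file's catalog marks itself with /MarkInfo << /Marked true >> ...
MARKED_RE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")
# ... and points to the root of its structure tree, e.g. /StructTreeRoot 5 0 R.
STRUCT_RE = re.compile(rb"/StructTreeRoot\s+\d+\s+\d+\s+R")


def looks_tagged(data: bytes) -> bool:
    """Heuristically report whether raw PDF bytes contain Tagged PDF markers.

    Assumes the catalog is not in a compressed object stream and that
    /MarkInfo is written inline; it cannot judge whether the tags are any good.
    """
    return MARKED_RE.search(data) is not None and STRUCT_RE.search(data) is not None
```

Requiring both markers is a deliberate choice: some tools set `/Marked true` without writing a usable structure tree, and a structure tree without the marker is equally suspect.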
Regarding DRM, the goal of making content readable on as many devices as possible rules out DRM, since Apple does not currently support Adobe's proprietary ACS4 DRM on Macs or iPhones. In any case, for almost any self-publishing author (and arguably for major publishers as well), the benefits of increased exposure to, and maximum convenience for, end users should outweigh any increase in piracy from forgoing copy-limiting technology. Moreover, most pirated editions originate from scanned print books, and no DRM technology can protect against that.
This post has touched on only a few of the considerations around PDF format-level compatibility. My next post will cover some additional issues, including the ISO 32000 standard, PDF Portfolios, and other bleeding-edge features, and draw some further conclusions.