From 2280c4f6d1993bd8940f10f7a1ef0f9f22f6c518 Mon Sep 17 00:00:00 2001 From: Jay Berkenbilt Date: Sat, 28 Jul 2012 19:25:42 -0400 Subject: Update documentation and version numbers 3.0.rc1 --- manual/qpdf-manual.xml | 532 +++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 494 insertions(+), 38 deletions(-) (limited to 'manual') diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml index 6fa48759..bd102cdf 100644 --- a/manual/qpdf-manual.xml +++ b/manual/qpdf-manual.xml @@ -5,8 +5,8 @@ - - + + ]> @@ -26,6 +26,8 @@ QPDF is a program that does structural, content-preserving transformations on PDF files. QPDF's website is located at http://qpdf.sourceforge.net/. + QPDF's source code is hosted on GitHub at https://github.com/qpdf/qpdf. QPDF has been released under the terms of - QPDF is not a PDF content creation library, a - PDF viewer, or a program capable of converting PDF into other - formats. In particular, QPDF knows nothing about the semantics of - PDF content streams. If you are looking for something that can do + With QPDF, it is possible to copy objects from one PDF file into + another and to manipulate the list of pages in a PDF file. This + makes it possible to merge and split PDF files. The QPDF library + also makes it possible for you to create PDF files from scratch. + In this mode, you are responsible for supplying all the contents of + the file, while the QPDF library takes care of all the syntactical + representation of the objects, creation of cross-reference tables + and, if you use them, object streams, encryption, linearization, + and other syntactic details. You are still responsible for + generating PDF content on your own. + + + QPDF has been designed with very few external dependencies, and it + is intentionally very lightweight. QPDF is + not a PDF content creation library, a PDF + viewer, or a program capable of converting PDF into other formats.
+ In particular, QPDF knows nothing about the semantics of PDF + content streams. If you are looking for something that can do that, you should look elsewhere. However, once you have a valid PDF file, QPDF can be used to transform that file in ways perhaps - your original PDF creation can't handle. For example, programs - generate simple PDF files but can't password-protect them, + your original PDF creation software can't handle. For example, many + programs generate simple PDF files but can't password-protect them, web-optimize them, or perform other transformations of that type. @@ -112,17 +128,34 @@ -u. + + + A C++ compiler that works well with STL and has the long + long type. Most modern C++ compilers should fit the + bill fine. QPDF is tested with gcc and Microsoft Visual C++. + + Part of qpdf's test suite does comparisons of the contents PDF - files by converting them images and comparing the images. You can - optionally disable this part of the test suite by running - configure with the - flag. If you leave - this enabled, the following additional requirements are required - by the test suite. Note that in no case are these items required - to use qpdf. + files by converting them to images and comparing the images. The + image comparison tests are disabled by default. Those tests are + not required for determining correctness of a qpdf build if you + have not modified the code since the test suite also contains + expected output files that are compared literally. The image + comparison tests provide an extra check to make sure that any + content transformations don't break the rendering of pages. + Transformations that affect the content streams themselves are off + by default and are only provided to help developers look into the + contents of PDF files. If you are making deep changes to the + library that cause changes in the contents of the files that qpdf + generates, then you should enable the image comparison tests.
+ Enable them by running configure with the + flag. If you enable + this, the following additional items are required by the + test suite. Note that in no case are these items required to use + qpdf. @@ -132,13 +165,12 @@ GhostScript version 8.60 or newer: - http://pages.cs.wisc.edu/~ghost/ + http://www.ghostscript.com - This option is primarily intended for use by packagers of qpdf so - that they can avoid having the qpdf packages depend on tiff and - ghostscript software. + If you do not enable this, then you do not need to have tiff and + ghostscript. If Adobe Reader is installed as acroread, some @@ -158,7 +190,7 @@ To build the PDF version of the documentation, you need Apache fop (http://xml.apache.org/fop/) - version 0.94 of higher. + version 0.94 or higher. @@ -182,9 +214,9 @@ make Building on Windows is a little bit more complicated. For details, please see README-windows.txt in the source distribution. You can also download a binary distribution - for Windows. There is a port of qpdf in the - contrib area generously contributed by Jian - Ma. This is also discussed in more detail in + for Windows. There is a port of qpdf to Visual C++ version 6 in + the contrib area generously contributed by + Jian Ma. This is also discussed in more detail in README-windows.txt. @@ -215,7 +247,12 @@ make identical to the input file but may have been structurally reorganized. Also, orphaned objects will be removed from the file. Many transformations are available as controlled by the - options below. + options below. In place of , the + parameter may be specified. This causes + qpdf to use a dummy input file that contains zero pages. The only + normal use case for using would be if you + were going to add pages from another source, as discussed in . does not have to be seekable, even @@ -248,7 +285,35 @@ make - Causes generation of a linearized (web-optimized) output file.
+ + + + + + + + Encrypt the file using the same encryption parameters, + including user and owner password, as the specified file. Use + to specify a password + if one is needed to open this file. Note that copying the + encryption parameters from a file also copies the first half + of /ID from the file since this is part of + the encryption parameters. + + + + + + + + If the file specified with + requires a password, specify the password using this option. + Note that only one of the user or owner password is required. + Both passwords will be preserved since QPDF does not + distinguish between the two passwords. It is possible to + preserve encryption parameters, including the owner password, + from a file even if you don't know the file's owner password. @@ -271,6 +336,16 @@ make + + + + + Select specific pages from one or more input files. See for details on how to do page + selection (splitting and merging). + + + @@ -289,6 +364,25 @@ make restrictions or other restrictions placed on files by their producers. + + In all cases where qpdf allows specification of a password, care + must be taken if the password contains characters that fall + outside of the 7-bit US-ASCII character range to ensure that the + exact correct byte sequence is provided. It is possible that a + future version of qpdf may handle this more gracefully. For + example, if a file was encrypted using a password that was + encoded in ISO-8859-1 and your terminal is configured to use + UTF-8, the password you supply may not work properly. There are + various approaches to handling this. For example, if you are + using Linux and have the iconv executable (typically part of + glibc) installed, you could pass to qpdf where + password is a password specified in + your terminal's locale. A detailed discussion of this is out of + scope for this manual, but just be aware of this issue if you have + trouble with a password that contains 8-bit characters.
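The byte-sequence issue described in the password paragraph can be illustrated with a short Python sketch. It uses only the standard library and does not call qpdf; the password text and the helper name `recode_utf8_to_latin1` are hypothetical, chosen only for illustration. It shows that the same text produces different bytes under ISO-8859-1 and UTF-8, which is why a password typed in a UTF-8 terminal may not match one that was set in ISO-8859-1.

```python
# Hypothetical password with two non-ASCII characters (an assumption for
# illustration only): "p", a-umlaut, "ssw", o-umlaut, "rd".
password = "p\u00e4ssw\u00f6rd"

# The same text encodes to different byte sequences in the two encodings.
latin1_bytes = password.encode("iso-8859-1")
utf8_bytes = password.encode("utf-8")


def recode_utf8_to_latin1(raw: bytes) -> bytes:
    """Convert bytes as typed in a UTF-8 terminal into ISO-8859-1 bytes."""
    return raw.decode("utf-8").encode("iso-8859-1")


print(latin1_bytes)
print(utf8_bytes)
print(recode_utf8_to_latin1(utf8_bytes) == latin1_bytes)
```

Recoding the terminal's bytes before handing them to the tool is the same idea as the iconv approach mentioned in the text, expressed here in Python rather than on the command line.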
+ Encryption Options @@ -474,6 +568,126 @@ make The default for each permission option is to be fully permissive. + + Page Selection Options + + Starting with qpdf 3.0, it is possible to split and merge PDF + files by selecting pages from one or more input files. Whatever + file is given as the primary input file is used as the starting + point, but its pages are replaced with pages as specified. + + + + Multiple input files may be specified. Each one is given as the + name of the input file, an optional password (if required to open + the file), and the range of pages. Note that + “” terminates parsing of page + selection flags. + + + For each file that pages should be taken from, specify the file, a + password needed to open the file (if any), and a page range. + If the primary input file requires a password, that password + must be specified outside the option and + does not need to be repeated inside the . + The same file can be repeated multiple times. If a file that is + repeated has a password, the password only has to be given the + first time. All non-page data (info, outlines, page numbers, + etc.) are taken from the primary input file. To discard these, + use as the primary input. One subtlety + about specifying passwords is that specifying a password as + doesn't prevent you + from having to repeat that password if that file is also one of the + input files. If in doubt, it's never an error to specify the + password multiple times. + + + It is not presently possible to specify the same page from the + same file directly more than once, but you can make this work by + specifying two different paths to the same file (such as by + putting ./ somewhere in the path). This can + also be used if you want to repeat a page from one of the input + files in the output file. This may be made more convenient in a + future version of qpdf if there is enough demand for this feature.
+ + + The page range is a set of numbers separated by commas, ranges of + numbers separated by dashes, or combinations of those. The character + “z” represents the last page. Pages can appear in any + order. Ranges can appear with a high number followed by a low + number, which causes the pages to appear in reverse. Repeating a + number will cause an error, but you can use the workaround + discussed above should you really want to include the same page + twice. + + + Example page ranges: + + + + 1,3,5-9,15-12: pages 1, 3, 5, 6, 7, 8, + 9, 15, 14, 13, and 12. + + + + + z-1: all pages in the document in reverse + + + + + + Note that qpdf doesn't presently do anything special about other + constructs in a PDF file that may know about pages, so semantics + of splitting and merging vary across features. For example, the + document's outlines (bookmarks) point to actual page objects, so + if you select some pages and not others, bookmarks that point to + pages that are in the output file will work, and remaining + bookmarks will not work. On the other hand, page labels (page + numbers specified in the file) are just sequential, so page labels + will be messed up in the output file. A future version of + qpdf may do a better job at handling these + issues. (Note that the qpdf library already contains all of the + APIs required in order to implement this in your own application + if you need it.) In the meantime, you can always use + as the primary input file to avoid + copying all of that from the first file.
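The page range rules stated in this hunk (comma-separated numbers, dash-separated ranges, "z" for the last page, reversed ranges, and an error on a repeated number) can be sketched in Python as follows. This is only one reading of the prose, not qpdf's actual parser; the function name `parse_page_range` is hypothetical, and rejecting page numbers outside the document is an assumption that the text does not state.

```python
from typing import List


def parse_page_range(spec: str, last_page: int) -> List[int]:
    """Expand a page range string into an ordered list of page numbers."""

    def to_number(token: str) -> int:
        token = token.strip()
        if token == "z":  # "z" stands for the last page
            return last_page
        if not token.isdecimal():
            raise ValueError("invalid page number: %r" % token)
        number = int(token)
        # Assumption: pages outside 1..last_page are rejected.
        if not 1 <= number <= last_page:
            raise ValueError("page %d is out of range" % number)
        return number

    pages: List[int] = []
    seen = set()
    for part in spec.split(","):
        ends = part.split("-")
        if len(ends) == 1:
            group = [to_number(ends[0])]
        elif len(ends) == 2:
            first, last = to_number(ends[0]), to_number(ends[1])
            # A high-to-low range yields the pages in reverse order.
            step = 1 if first <= last else -1
            group = list(range(first, last + step, step))
        else:
            raise ValueError("invalid range: %r" % part)
        for page in group:
            if page in seen:  # repeating a page number is an error
                raise ValueError("page %d specified more than once" % page)
            seen.add(page)
            pages.append(page)
    return pages


print(parse_page_range("1,3,5-9,15-12", 20))
print(parse_page_range("z-1", 4))
```

Tracking already-selected pages in a set keeps the duplicate check independent of whether the repeat comes from a single number or from overlapping ranges.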
For example, to take + pages 1 through 5 from infile.pdf while + preserving all metadata associated with that file, you could use + + qpdf + + If you wanted pages 1 through 5 from + infile.pdf but you wanted the rest of the + metadata to be dropped, you could instead run + + qpdf + + If you wanted to take pages 1–5 from + file1.pdf and pages 11–15 from + file2.pdf in reverse, you would run + + qpdf + + If, for some reason, you wanted to take the first page of an + encrypted file called encrypted.pdf with + password pass and repeat it twice in an output + file, and if you wanted to drop metadata (like page numbers and + outlines) but preserve encryption, you would use + + qpdf + + Note that we had to specify the password all three times because + giving a password as + doesn't count for page selection, and as far as qpdf is concerned, + encrypted.pdf and + ./encrypted.pdf are separate files. These + are all corner cases that most users should hopefully never have + to be bothered with. + + Advanced Transformation Options @@ -1053,6 +1267,14 @@ make your system understands how to read libtool .la files, this may not be necessary. + + The qpdf library is safe to use in a multithreaded program, but no + individual QPDF object instance (including + QPDF, QPDFObjectHandle, or + QPDFWriter) can be used in more than one thread at a + time. Multiple threads may simultaneously work with different + instances of these and all other QPDF objects. + Design and Library Notes @@ -1156,17 +1378,15 @@ make which objects are direct and which objects are indirect. - There is no public interface for creating instances of - QPDFObjectHandle. They can be created only inside the QPDF - library. This is generally done through a call to the private - method QPDF::readObject which uses - QPDFTokenizer to read an indirect object at - a given file position and return a - QPDFObjectHandle that encapsulates it.
- There are also internal methods to create fabricated indirect - objects from existing direct objects or to change an indirect - object into a direct object, though these steps are not performed - except to support rewriting. + Instances of QPDFObjectHandle can be + directly created and modified using static factory methods in the + QPDFObjectHandle class. There are factory + methods for each type of object as well as a convenience method + QPDFObjectHandle::parse that creates an + object from a string representation of the object. Existing + instances of QPDFObjectHandle can also be + modified in several ways. See comments in + QPDFObjectHandle.hh for details. When the QPDF class creates a new object, @@ -1377,6 +1597,86 @@ make files. + + Adding and Removing Pages + + While qpdf's API has supported adding and modifying objects for + some time, version 3.0 introduces specific methods for adding and + removing pages. These are largely convenience routines that + handle two tricky issues: pushing inheritable resources from the + /Pages tree down to individual pages and + manipulation of the /Pages tree itself. For + details, see addPage and surrounding methods + in QPDF.hh. + + + + Reserving Object Numbers + + Version 3.0 of qpdf introduced the concept of reserved objects. + These are seldom needed for ordinary operations, but there are + cases in which you may want to add a series of indirect objects + with references to each other to a QPDF + object. This causes a problem because you can't determine the + object ID that a new indirect object will have until you add it to + the QPDF object with + QPDF::makeIndirectObject. The only way to + add two mutually referential objects to a + QPDF object prior to version 3.0 would be + to add the new objects first and then make them refer to each + other after adding them. Now it is possible to create a + reserved object using + QPDFObjectHandle::newReserved. 
This is an + indirect object that stays “unresolved” even if it is + queried for its type. So now, if you want to create a set of + mutually referential objects, you can create reservations for each + one of them and use those reservations to construct the + references. When finished, you can call + QPDF::replaceReserved to replace the reserved + objects with the real ones. This functionality will never be + needed by most applications, but it is used internally by QPDF + when copying objects from other PDF files, as discussed in . For an example of how to use + reserved objects, search for newReserved in + test_driver.cc in qpdf's sources. + + + + Copying Objects From Other PDF Files + + Version 3.0 of qpdf introduced the ability to copy objects into a + QPDF object from a different + QPDF object, which we refer to as + foreign objects. This allows arbitrary + merging of PDF files. The qpdf command-line + tool provides limited support for basic page selection, including + merging in pages from other files, but the library's API makes it + possible to implement arbitrarily complex merging operations. The + main method for copying foreign objects is + QPDF::copyForeignObject. This takes an + indirect object from another QPDF and + copies it recursively into this object while preserving all object + structure, including circular references. This means you can add + a direct object that you create from scratch to a + QPDF object with + QPDF::makeIndirectObject, and you can add an + indirect object from another file with + QPDF::copyForeignObject. The fact that + QPDF::makeIndirectObject does not + automatically detect a foreign object and copy it is an explicit + design decision. Copying a foreign object seems like a + sufficiently significant thing to do that it should be done + explicitly. + + + The other way to copy foreign objects is by passing a page from + one QPDF to another by calling + QPDF::addPage. 
In contrast to + QPDF::makeIndirectObject, this method + automatically distinguishes between indirect objects in the + current file, foreign objects, and direct objects. + + Writing PDF Files @@ -1892,8 +2192,8 @@ print "\n"; The specification recommends limiting the number of objects in object stream for efficiency in reading and decoding. Acrobat 6 - uses no more than objects per object stream for linearized files - and no more 200 objects per stream for non-linearized files. + uses no more than 100 objects per object stream for linearized + files and no more than 200 objects per stream for non-linearized files. QPDFWriter, in object stream generation mode, never puts more than 100 objects in an object stream. @@ -2085,6 +2385,119 @@ print "\n"; For a detailed list of changes, please see the file ChangeLog in the source distribution. + + + 3.0.rc1: July 29, 2012 + + + + + Acknowledgment: I would like to express gratitude for the + contributions of Tobias Hoffmann toward the release of qpdf + version 3.0. He is responsible for most of the implementation + and design of the new API for manipulating pages, and + contributed code and ideas for many of the improvements made + in version 3.0. Without his work, this release would + certainly not have happened as soon as it did, if at all. + + + + + Non-compatible API change: The version of + QPDFObjectHandle::replaceStreamData that + uses a StreamDataProvider no longer + requires (or accepts) a length parameter. + See for an explanation. + While care is taken to avoid non-compatible API changes in + general, an exception was made this time because the new + interface offers an opportunity to significantly simplify + calling code. + + + + + Support has been added for large files. The test suite + verifies support for files larger than 4 gigabytes, and manual + testing has verified support for files larger than 10 + gigabytes.
Large file support is available for both 32-bit + and 64-bit platforms as long as the compiler and underlying + platforms support it. + + + + + Support for page selection (splitting and merging PDF files) + has been added to the qpdf command-line + tool. See . + + + + + Options have been added to the qpdf + command-line tool for copying encryption parameters from + another file. See . + + + + + New methods have been added to the QPDF + object for adding and removing pages. See . + + + + + New methods have been added to the QPDF + object for copying objects from other PDF files. See + + + + + A new method QPDFObjectHandle::parse has + been added for constructing + QPDFObjectHandle objects from a string + description. + + + + + Methods have been added to QPDFWriter + to allow writing to an already open stdio FILE* in + addition to writing to standard output or a named file. + Methods have been added to QPDF to be + able to process a file from an already open stdio + FILE*. This makes it possible to read and write + PDF from secure temporary files that have been unlinked prior + to being fully read or written. + + + + + The QPDF::emptyPDF method can be used to allow + creation of PDF files from scratch. The example + examples/pdf-create.cc illustrates how it + can be used. + + + + + Several methods that take + PointerHolder<Buffer> can now + also accept std::string arguments. + + + + + Many new convenience methods have been added to the library, + most in QPDFObjectHandle. See + ChangeLog for a full list. + + + + + + 2.3.1: December 28, 2011 @@ -2728,4 +3141,47 @@ print "\n"; + + Upgrading to 3.0 + + For the most part, the API for qpdf version 3.0 is backward + compatible with versions 2.1 and later. There are two exceptions: + + + + The method + QPDFObjectHandle::replaceStreamData that + uses a StreamDataProvider to provide the + stream data no longer takes a length + parameter.
While it would have been easy enough to keep the + parameter for backward compatibility, in this case, the + parameter was removed since this provides the user an + opportunity to simplify the calling code. This method was + introduced in version 2.2. At the time, the + length parameter was required in order to + ensure that calls to the stream data provider returned the same + length for a specific stream every time they were invoked. In + particular, the linearization code depends on this. Instead, + qpdf 3.0 and newer check for that constraint explicitly. The + first time the stream data provider is called for a specific + stream, the actual length is saved, and subsequent calls are + required to return the same number of bytes. This means the + calling code no longer has to compute the length in advance, + which can be a significant simplification. If your code fails + to compile because of the extra argument and you don't want to + make other changes to your code, just omit the argument. + + + + + Many methods take long long instead of other + integer types. Most if not all existing code should compile + fine with this change since such parameters had always + previously been smaller types. This change was required to + support files larger than two gigabytes in size. + + + + + -- cgit v1.2.3-54-g00ecf
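The length check described in the "Upgrading to 3.0" hunk (save the actual length the first time a provider is called for a stream, then require the same number of bytes on later calls) can be sketched in Python. This illustrates the idea only and is not qpdf's C++ implementation; the class name `LengthCheckedProvider` and its `get` method are hypothetical.

```python
from typing import Callable, Dict


class LengthCheckedProvider:
    """Wrap a data provider and enforce a stable length per stream."""

    def __init__(self, provide: Callable[[int], bytes]) -> None:
        self._provide = provide
        # Length observed on the first call, keyed by stream identifier.
        self._lengths: Dict[int, int] = {}

    def get(self, stream_id: int) -> bytes:
        data = self._provide(stream_id)
        # The first call records the length; later calls must match it.
        expected = self._lengths.setdefault(stream_id, len(data))
        if len(data) != expected:
            raise RuntimeError(
                "stream %d returned %d bytes, expected %d"
                % (stream_id, len(data), expected)
            )
        return data


# A provider that always returns the same bytes passes the check.
stable = LengthCheckedProvider(lambda stream_id: b"stream data")
print(len(stable.get(1)), len(stable.get(1)))
```

Because the wrapper records the length itself, calling code does not have to compute it in advance, which is the simplification the text attributes to the new interface.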