Diffstat (limited to 'manual/qpdf-manual.xml')
-rw-r--r-- | manual/qpdf-manual.xml | 2020 |
1 files changed, 1952 insertions, 68 deletions
diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml index 848c340b..d43d96a6 100644 --- a/manual/qpdf-manual.xml +++ b/manual/qpdf-manual.xml @@ -5,8 +5,8 @@ <!ENTITY mdash "—"> <!ENTITY ndash "–"> <!ENTITY nbsp " "> -<!ENTITY swversion "8.1.0"> -<!ENTITY lastreleased "June 22, 2018"> +<!ENTITY swversion "8.4.0"> +<!ENTITY lastreleased "February 1, 2019"> ]> <book> <bookinfo> @@ -16,7 +16,7 @@ <firstname>Jay</firstname><surname>Berkenbilt</surname> </author> <copyright> - <year>2005–2018</year> + <year>2005–2019</year> <holder>Jay Berkenbilt</holder> </copyright> </bookinfo> @@ -196,15 +196,6 @@ ghostscript. </para> <para> - If Adobe Reader is installed as <command>acroread</command>, some - additional test cases will be enabled. These test cases simply - verify that Adobe Reader can open the files that qpdf creates. - They require version 8.0 or newer to pass. However, in order to - avoid having qpdf depend on non-free (as in liberty) software, the - test suite will still pass without Adobe reader, and the test - suite still exercises the full functionality of the software. - </para> - <para> Pre-built documentation is distributed with qpdf, so you should generally not need to rebuild the documentation. In order to build the documentation from its docbook sources, you need the @@ -251,6 +242,56 @@ make top-level <filename>Makefile</filename>. </para> </sect1> + <sect1 id="ref.packaging"> + <title>Notes for Packagers</title> + <para> + If you are packaging qpdf for an operating system distribution, + here are some things you may want to keep in mind: + <itemizedlist> + <listitem> + <para> + Passing <option>--enable-show-failed-test-output</option> to + <command>./configure</command> will cause any failed test + output to be written to the console. This can be very useful + for seeing test failures generated by autobuilders where you + can't access qtest.log after the fact. 
+ </para> + </listitem> + <listitem> + <para> + If qpdf's build environment detects the presence of autoconf + and related tools, it will check to ensure that automatically + generated files are up-to-date with recorded checksums and fail + if it detects a discrepancy. This feature is intended to + prevent you from accidentally forgetting to regenerate + automatic files after modifying their sources. If your + packaging environment automatically refreshes automatic files, + it can cause this check to fail. Suppress qpdf's checks by + passing <option>--disable-check-autofiles</option> to + <command>./configure</command>. This is safe since qpdf's + <command>autogen.sh</command> just runs autotools in the normal + way. + </para> + </listitem> + <listitem> + <para> + QPDF's <command>make install</command> does not install + completion files by default, but as a packager, it's good if + you install them wherever your distribution expects such files + to go. You can find completion files to install in the + <filename>completions</filename> directory. + </para> + </listitem> + <listitem> + <para> + Packagers are encouraged to install the source files from the + <filename>examples</filename> directory along with qpdf + development packages. + </para> + </listitem> + </itemizedlist> + </para> + </sect1> </chapter> <chapter id="ref.using"> <title>Running QPDF</title> @@ -300,6 +341,19 @@ make inspection commands do not. These are specifically noted. </para> </sect1> + <sect1 id="ref.shell-completion"> + <title>Shell Completion</title> + <para> + Starting in qpdf version 8.3.0, qpdf provides its own completion + support for zsh and bash. You can enable bash completion with + <command>eval $(qpdf --completion-bash)</command> and zsh + completion with <command>eval $(qpdf --completion-zsh)</command>. + If <command>qpdf</command> is not in your path, you should invoke + it above with an absolute path.
If you invoke it with a relative + path, it will warn you, and the completion won't work if you're in + a different directory. + </para> + </sect1> <sect1 id="ref.basic-options"> <title>Basic Options</title> <para> @@ -307,6 +361,48 @@ make commonly needed transformations. <variablelist> <varlistentry> + <term><option>--help</option></term> + <listitem> + <para> + Display command-line invocation help. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--version</option></term> + <listitem> + <para> + Display the current version of qpdf. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--copyright</option></term> + <listitem> + <para> + Show detailed copyright information. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--completion-bash</option></term> + <listitem> + <para> + Output a completion command you can eval to enable shell + completion from bash. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--completion-zsh</option></term> + <listitem> + <para> + Output a completion command you can eval to enable shell + completion from zsh. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--password=password</option></term> <listitem> <para> @@ -328,6 +424,24 @@ make </listitem> </varlistentry> <varlistentry> + <term><option>--progress</option></term> + <listitem> + <para> + Indicate progress while writing files. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--no-warn</option></term> + <listitem> + <para> + Suppress writing of warnings to stderr. If warnings were + detected and suppressed, <command>qpdf</command> will still + exit with exit code 3. 
+ </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--linearize</option></term> <listitem> <para> @@ -412,6 +526,83 @@ make </listitem> </varlistentry> <varlistentry> + <term><option>--suppress-password-recovery</option></term> + <listitem> + <para> + Ordinarily, qpdf attempts to automatically compensate for + passwords specified in the wrong character encoding. This + option suppresses that behavior. Under normal conditions, + there are no reasons to use this option. See <xref + linkend="ref.unicode-passwords"/> for a discussion. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--password-mode=<replaceable>mode</replaceable></option></term> + <listitem> + <para> + This option can be used to fine-tune how qpdf interprets + Unicode (non-ASCII) password strings passed on the command + line. With the exception of the <option>hex-bytes</option> + mode, these only apply to passwords provided when encrypting + files. The <option>hex-bytes</option> mode also applies to + passwords specified for reading files. For additional + discussion of the supported password modes and when you might + want to use them, see <xref linkend="ref.unicode-passwords"/>. + The following modes are supported: + <itemizedlist> + <listitem> + <para> + <option>auto</option>: Automatically determine whether the + specified password is a properly encoded Unicode (UTF-8) + string, and transcode it as required by the PDF spec based + on the type of encryption being applied. On Windows starting + with version 8.4.0, and on almost all other modern + platforms, incoming passwords will be properly encoded in + UTF-8, so this is almost always what you want. + </para> + </listitem> + <listitem> + <para> + <option>unicode</option>: Tells qpdf that the incoming + password is UTF-8, overriding whatever its automatic + detection determines.
The only difference between this mode + and <option>auto</option> is that qpdf will fail with an + error message if the password is not valid UTF-8 instead of + falling back to <option>bytes</option> mode with a warning. + </para> + </listitem> + <listitem> + <para> + <option>bytes</option>: Interpret the password as a literal + byte string. For non-Windows platforms, this is what + versions of qpdf prior to 8.4.0 did. For Windows platforms, + there is no way to specify strings of binary data on the + command line directly, but you can use the + <option>@filename</option> option to do it, in which case + this option forces qpdf to respect the string of bytes as + provided. This option will allow you to encrypt PDF files + with passwords that will not be usable by other readers. + </para> + </listitem> + <listitem> + <para> + <option>hex-bytes</option>: Interpret the password as a + hex-encoded string. This provides a way to pass binary data + as a password on all platforms including Windows. As with + <option>bytes</option>, this option may allow creation of + files that can't be opened by other readers. This mode + affects qpdf's interpretation of passwords specified for + decrypting files as well as for encrypting them. It makes + it possible to specify strings that are encoded in some + manner other than the system's default encoding. + </para> + </listitem> + </itemizedlist> + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--rotate=[+|-]angle[:page-range]</option></term> <listitem> <para> @@ -436,6 +627,35 @@ make </listitem> </varlistentry> <varlistentry> + <term><option>--keep-files-open=<replaceable>[yn]</replaceable></option></term> + <listitem> + <para> + This option controls whether qpdf keeps individual files open + while merging. Prior to version 8.1.0, qpdf always kept all + files open, but this meant that the number of files that could + be merged was limited by the operating system's open file + limit. 
Version 8.1.0 opened files as they were referenced and + closed them after each read, but this caused a major + performance impact. Version 8.2.0 optimized the performance + but did so in a way that, for local file systems, there was a + small but unavoidable performance hit, but for networked file + systems, the performance impact could be very high. Starting + with version 8.2.1, the default behavior is that files are + kept open if no more than 200 files are specified, but that + the behavior can be explicitly overridden with the + <option>--keep-files-open</option> flag. If you are merging + more than 200 files but less than the operating system's max + open files limit, you may want to use + <option>--keep-files-open=y</option>, especially if working + over a networked file system. If you are using a local file + system where the overhead is low and you might sometimes merge + more than the OS limit's number of files from a script and are + not worried about a few seconds additional processing time, + you may want to specify <option>--keep-files-open=n</option>. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--pages options --</option></term> <listitem> <para> @@ -446,6 +666,16 @@ make </listitem> </varlistentry> <varlistentry> + <term><option>--collate</option></term> + <listitem> + <para> + When specified, collate rather than concatenate pages from + files specified with <option>--pages</option>. See <xref + linkend="ref.page-selection"/> for additional details. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--split-pages=[n]</option></term> <listitem> <para> @@ -518,6 +748,26 @@ make </para> </listitem> </varlistentry> + <varlistentry> + <term><option>--overlay options --</option></term> + <listitem> + <para> + Overlay pages from another file onto the output pages. See + <xref linkend="ref.overlay-underlay"/> for details on + overlay/underlay. 
+ </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--underlay options --</option></term> + <listitem> + <para> + Underlay pages from another file onto the output pages. See + <xref linkend="ref.overlay-underlay"/> for details on + overlay/underlay. + </para> + </listitem> + </varlistentry> </variablelist> </para> <para> @@ -537,22 +787,17 @@ make producers. </para> <para> - In all cases where qpdf allows specification of a password, care - must be taken if the password contains characters that fall - outside of the 7-bit US-ASCII character range to ensure that the - exact correct byte sequence is provided. It is possible that a - future version of qpdf may handle this more gracefully. For - example, if a password was encrypted using a password that was - encoded in ISO-8859-1 and your terminal is configured to use - UTF-8, the password you supply may not work properly. There are - various approaches to handling this. For example, if you are - using Linux and have the iconv executable installed, you could - pass <option>--password=`echo <replaceable>password</replaceable> - | iconv -t iso-8859-1`</option> to qpdf where - <replaceable>password</replaceable> is a password specified in - your terminal's locale. A detailed discussion of this is out of - scope for this manual, but just be aware of this issue if you have - trouble with a password that contains 8-bit characters. + Prior to 8.4.0, in the case of passwords that contain characters + that fall outside of 7-bit US-ASCII, qpdf left the burden of + supplying properly encoded encryption and decryption passwords to + the user. Starting in qpdf 8.4.0, qpdf does this automatically in + most cases. For an in-depth discussion, please see <xref + linkend="ref.unicode-passwords"/>. Previous versions of this + manual described workarounds using the <command>iconv</command> + command. Such workarounds are no longer required or recommended + with qpdf 8.4.0.
However, for backward compatibility, qpdf + attempts to detect those workarounds and do the right thing in + most cases. </para> </sect1> <sect1 id="ref.encryption-options"> @@ -624,7 +869,12 @@ make <listitem> <para> Determines whether or not to allow accessibility to visually - impaired. + impaired. The qpdf library disregards this field when AES is + used or when 256-bit encryption is used. You should really + never disable accessibility, but qpdf lets you do it in case + you need to configure a file this way for testing purposes. + The PDF spec says that conforming readers should disregard + this permission and always allow accessibility. </para> </listitem> </varlistentry> @@ -637,6 +887,45 @@ make </listitem> </varlistentry> <varlistentry> + <term><option>--assemble=[yn]</option></term> + <listitem> + <para> + Determines whether document assembly (rotation and reordering + of pages) is allowed. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--annotate=[yn]</option></term> + <listitem> + <para> + Determines whether modifying annotations is allowed. This + includes adding comments and filling in form fields. Also + allows editing of form fields if + <option>--modify-other=y</option> is given. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--form=[yn]</option></term> + <listitem> + <para> + Determines whether filling form fields is allowed. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--modify-other=[yn]</option></term> + <listitem> + <para> + Allow all document editing except those controlled separately + by the <option>--assemble</option>, + <option>--annotate</option>, and <option>--form</option> + options. 
+ </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--print=<replaceable>print-opt</replaceable></option></term> <listitem> <para> @@ -667,10 +956,10 @@ make <term><option>--modify=<replaceable>modify-opt</replaceable></option></term> <listitem> <para> - Controls modify access. + Controls modify access. This way of controlling modify access + has less granularity than new options added in qpdf 8.4. <option><replaceable>modify-opt</replaceable></option> may be - one of the following, each of which implies all the options - that follow it: + one of the following: <itemizedlist> <listitem> <para> @@ -679,12 +968,14 @@ make </listitem> <listitem> <para> - <option>annotate</option>: allow comment authoring and form operations + <option>annotate</option>: allow comment authoring, form + operations, and document assembly </para> </listitem> <listitem> <para> <option>form</option>: allow form field fill-in and signing + and document assembly </para> </listitem> <listitem> @@ -698,6 +989,12 @@ make </para> </listitem> </itemizedlist> + Using the <option>--modify</option> option does not allow you + to create certain combinations of permissions such as allowing + form filling but not allowing document assembly. Starting with + qpdf 8.4, you can either just use the other options to control + fields individually, or you can use something like + <option>--modify=form --assemble=n</option> to fine tune. </para> </listitem> </varlistentry> @@ -794,6 +1091,11 @@ make selection flags. </para> <para> + Starting with qpdf 8.4, the special input file name + “<filename>.</filename>” can be used as a shortcut for the + primary input filename. + </para> + <para> For each file that pages should be taken from, specify the file, a password needed to open the file (if any), and a page range. The password needs to be given only once per file. If any of the @@ -816,15 +1118,6 @@ make <command>qpdf --empty out.pdf --pages *.pdf --</command>.
</para> <para> - It is not presently possible to specify the same page from the - same file directly more than once, but you can make this work by - specifying two different paths to the same file (such as by - putting <filename>./</filename> somewhere in the path). This can - also be used if you want to repeat a page from one of the input - files in the output file. This may be made more convenient in a - future version of qpdf if there is enough demand for this feature. - </para> - <para> The page range is a set of numbers separated by commas, ranges of numbers separated dashes, or combinations of those. The character “z” represents the last page. A number preceded by an @@ -864,25 +1157,56 @@ make </itemizedlist> </para> <para> - Note that qpdf doesn't presently do anything special about other - constructs in a PDF file that may know about pages, so semantics - of splitting and merging vary across features. For example, the - document's outlines (bookmarks) point to actual page objects, so - if you select some pages and not others, bookmarks that point to - pages that are in the output file will work, and remaining - bookmarks will not work. On the other hand, page labels (page - numbers specified in the file) are just sequential, so page labels - will be messed up in the output file. A future version of - <command>qpdf</command> may do a better job at handling these - issues. (Note that the qpdf library already contains all of the - APIs required in order to implement this in your own application - if you need it.) In the mean time, you can always use - <option>--empty</option> as the primary input file to avoid - copying all of that from the first file. For example, to take - pages 1 through 5 from a <filename>infile.pdf</filename> while - preserving all metadata associated with that file, you could use + Starting in qpdf version 8.3, you can specify the + <option>--collate</option> option. Note that this option is + specified outside of <option>--pages ... 
--</option>. + When <option>--collate</option> is specified, it changes the + meaning of <option>--pages</option> so that the specified files, + as modified by page ranges, are collated rather than concatenated. + For example, if you add the files <filename>odd.pdf</filename> and + <filename>even.pdf</filename> containing odd and even pages of a + document respectively, you could run <command>qpdf --collate + odd.pdf --pages odd.pdf even.pdf -- all.pdf</command> to collate + the pages. This would pick page 1 from odd, page 1 from even, page + 2 from odd, page 2 from even, etc. until all pages have been + included. Any number of files and page ranges can be specified. If + any file has fewer pages, that file is just skipped when its pages + have all been included. For example, if you ran <command>qpdf + --collate --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf r1 -- + out.pdf</command>, you would get the following pages in this + order: + <itemizedlist> + <listitem><para>a.pdf page 1</para></listitem> + <listitem><para>b.pdf page 6</para></listitem> + <listitem><para>c.pdf last page</para></listitem> + <listitem><para>a.pdf page 2</para></listitem> + <listitem><para>b.pdf page 5</para></listitem> + <listitem><para>a.pdf page 3</para></listitem> + <listitem><para>b.pdf page 4</para></listitem> + <listitem><para>a.pdf page 4</para></listitem> + <listitem><para>a.pdf page 5</para></listitem> + </itemizedlist> + </para> + <para> + Starting in qpdf version 8.3, when you split and merge files, any + page labels (page numbers) are preserved in the final file. It is + expected that more document features will be preserved by + splitting and merging. In the mean time, semantics of splitting + and merging vary across features. For example, the document's + outlines (bookmarks) point to actual page objects, so if you + select some pages and not others, bookmarks that point to pages + that are in the output file will work, and remaining bookmarks + will not work. 
A future version of <command>qpdf</command> may do + a better job at handling these issues. (Note that the qpdf library + already contains all of the APIs required in order to implement + this in your own application if you need it.) In the mean time, + you can always use <option>--empty</option> as the primary input + file to avoid copying all of that from the first file. For + example, to take pages 1 through 5 from a + <filename>infile.pdf</filename> while preserving all metadata + associated with that file, you could use - <programlisting><command>qpdf</command> <option>infile.pdf --pages infile.pdf 1-5 -- outfile.pdf</option> + <programlisting><command>qpdf</command> <option>infile.pdf --pages . 1-5 -- outfile.pdf</option> </programlisting> If you wanted pages 1 through 5 from <filename>infile.pdf</filename> but you wanted the rest of the @@ -894,13 +1218,13 @@ make <filename>file1.pdf</filename> and pages 11–15 from <filename>file2.pdf</filename> in reverse, you would run - <programlisting><command>qpdf</command> <option>file1.pdf --pages file1.pdf 1-5 file2.pdf 15-11 -- outfile.pdf</option> + <programlisting><command>qpdf</command> <option>file1.pdf --pages . 1-5 file2.pdf 15-11 -- outfile.pdf</option> </programlisting> If, for some reason, you wanted to take the first page of an encrypted file called <filename>encrypted.pdf</filename> with password <literal>pass</literal> and repeat it twice in an output - file, and if you wanted to drop metadata (like page numbers and - outlines) but preserve encryption, you would use + file, and if you wanted to drop document-level metadata but + preserve encryption, you would use <programlisting><command>qpdf</command> <option>--empty --copy-encryption=encrypted.pdf --encryption-file-password=pass --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- @@ -914,6 +1238,100 @@ outfile.pdf</option> are all corner cases that most users should hopefully never have to be bothered with.
+ </para> + <para> + Prior to version 8.4, it was not possible to specify the same page + from the same file directly more than once, and the workaround of + specifying the same file in more than one way was required. + Version 8.4 removes this limitation. + </para> + </sect1> + <sect1 id="ref.overlay-underlay"> + <title>Overlay and Underlay Options</title> + <para> + Starting with qpdf 8.4, it is possible to overlay or underlay + pages from other files onto the output generated by qpdf. Specify + overlay or underlay as follows: + + <programlisting>{ <option>--overlay</option> | <option>--underlay</option> } <replaceable>file</replaceable> [ <option>options</option> ] <option>--</option> +</programlisting> + Overlay and underlay options are processed late, so they can be + combined with other operations like merging and will apply to the final + output. The <option>--overlay</option> and + <option>--underlay</option> options work the same way, except + underlay pages are drawn underneath the page to which they are + applied, possibly obscured by the original page, and overlay pages + are drawn on top of the page to which they are applied, possibly + obscuring the page. You can combine overlay and underlay. + </para> + <para> + The default behavior of overlay and underlay is that pages are + taken from the overlay/underlay file in sequence and applied to + corresponding pages in the output until there are no more output + pages. If the overlay or underlay file runs out of pages, + remaining output pages are left alone. This behavior can be + modified by options, which are provided between the + <option>--overlay</option> or <option>--underlay</option> flag and + the <option>--</option> option. The following options are + supported: + <itemizedlist> + <listitem> + <para> + <option>--password=password</option>: supply a password if the + overlay/underlay file is encrypted.
+ </para> + </listitem> + <listitem> + <para> + <option>--to=page-range</option>: a range of pages in the same + format described in <xref linkend="ref.page-selection"/> + indicates which pages in the output should have the + overlay/underlay applied. If not specified, overlay/underlay + are applied to all pages. + </para> + </listitem> + <listitem> + <para> + <option>--from=[page-range]</option>: a range of pages that + specifies which pages in the overlay/underlay file will be used + for overlay or underlay. If not specified, all pages will be + used. This can be explicitly specified to be empty if + <option>--repeat</option> is used. + </para> + </listitem> + <listitem> + <para> + <option>--repeat=page-range</option>: an optional range of + pages that specifies which pages in the overlay/underlay file + will be repeated after the “from” pages are used + up. If you want to repeat a range of pages starting at the + beginning, you can explicitly use <option>--from=</option>. + </para> + </listitem> + </itemizedlist> + </para> + <para> + Here are some examples. + <itemizedlist> + <listitem> + <para> + <command>--overlay o.pdf --to=1-5 --from=1-3 + --repeat=4 --</command>: overlay the first three pages from file + <filename>o.pdf</filename> onto the first three pages of the + output, then overlay page 4 from <filename>o.pdf</filename> + onto pages 4 and 5 of the output. Leave remaining output pages + untouched. + </para> + </listitem> + <listitem> + <para> + <command>--underlay footer.pdf --from= --repeat=1,2 --</command>: + Underlay page 1 of <filename>footer.pdf</filename> on all odd + output pages, and underlay page 2 of + <filename>footer.pdf</filename> on all even output pages. + </para> + </listitem> + </itemizedlist> + </para> </sect1> <sect1 id="ref.advanced-parsing"> <title>Advanced Parsing Options</title> @@ -1145,8 +1563,8 @@ outfile.pdf</option> case, please report it as a bug.
</para> <para> - See also <option>--preserve-unreferenced-resources</option>, - which does something completely different. + See also <option>--preserve-unreferenced</option>, which does + something completely different. </para> </listitem> </varlistentry> @@ -1200,6 +1618,210 @@ outfile.pdf</option> </listitem> </varlistentry> <varlistentry> + <term><option>--flatten-annotations=<replaceable>option</replaceable></option></term> + <listitem> + <para> + This option collapses annotations into the pages' contents + with special handling for form fields. Ordinarily, an + annotation is rendered separately and on top of the page. + Combining annotations into the page's contents effectively + freezes the placement of the annotations, making them look + right after various page transformations. The library + functionality backing this option was added for the benefit of + programs that want to create <emphasis>n-up</emphasis> page + layouts and other similar things that don't work well with + annotations. The <replaceable>option</replaceable> parameter + may be any of the following: + <itemizedlist> + <listitem> + <para> + <option>all</option>: include all annotations that are not + marked invisible or hidden + </para> + </listitem> + <listitem> + <para> + <option>print</option>: only include annotations that + indicate that they should appear when the page is printed + </para> + </listitem> + <listitem> + <para> + <option>screen</option>: omit annotations that indicate + they should not appear on the screen + </para> + </listitem> + </itemizedlist> + </para> + <para> + Note that form fields are special because the annotations that + are used to render filled-in form fields may become out of + date from the fields' values if the form is filled in by a + program that doesn't know how to update the appearances. If + qpdf detects this case, its default behavior is not to flatten + those annotations because doing so would cause the value of + the form field to be lost. 
This gives you a chance to go back + and resave the form with a program that knows how to generate + appearances. QPDF itself can generate appearances with some + limitations. See the <option>--generate-appearances</option> + option below. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--generate-appearances</option></term> + <listitem> + <para> + If a file contains interactive form fields and indicates that + the appearances are out of date with the values of the form, + this flag will regenerate appearances, subject to a few + limitations. Note that there is not usually a reason to do + this, but it can be necessary before using the + <option>--flatten-annotations</option> option. Most of these + are not a problem with well-behaved PDF files. The limitations + are as follows: + <itemizedlist> + <listitem> + <para> + Radio button and checkbox appearances use the pre-set + values in the PDF file. QPDF just makes sure that the + correct appearance is displayed based on the value of the + field. This is fine for PDF files that create their forms + properly. Some PDF writers save appearances for fields when + they change, which could cause some controls to have + inconsistent appearances. + </para> + </listitem> + </itemizedlist> + <itemizedlist> + <listitem> + <para> + For text fields and list boxes, any characters that fall + outside of US-ASCII or, if detected, “Windows + ANSI” or “Mac Roman” encoding, will be + replaced by the <literal>?</literal> character. + </para> + </listitem> + </itemizedlist> + <itemizedlist> + <listitem> + <para> + Quadding is ignored. Quadding is used to specify whether + the contents of a field should be left, center, or right + aligned with the field. + </para> + </listitem> + </itemizedlist> + <itemizedlist> + <listitem> + <para> + Rich text, multi-line, and other more elaborate formatting + directives are ignored. 
+ </para> + </listitem> + </itemizedlist> + <itemizedlist> + <listitem> + <para> + There is no support for multi-select fields or signature + fields. + </para> + </listitem> + </itemizedlist> + If qpdf doesn't do a good enough job with your form, use an + external application to save your filled-in form before + processing it with qpdf. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--optimize-images</option></term> + <listitem> + <para> + This flag causes qpdf to recompress all images that are not + compressed with DCT (JPEG) using DCT compression as long as + doing so decreases the size in bytes of the image data and the + image does not fall below minimum specified dimensions. Useful + information is provided when used in combination with + <option>--verbose</option>. See also the + <option>--oi-min-width</option>, + <option>--oi-min-height</option>, and + <option>--oi-min-area</option> options. By default, starting + in qpdf 8.4, inline images are converted to regular images + and optimized as well. Use + <option>--keep-inline-images</option> to prevent inline images + from being included. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--oi-min-width=<replaceable>width</replaceable></option></term> + <listitem> + <para> + Avoid optimizing images whose width is below the specified + amount. If omitted, the default is 128 pixels. Use 0 for no + minimum. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--oi-min-height=<replaceable>height</replaceable></option></term> + <listitem> + <para> + Avoid optimizing images whose height is below the specified + amount. If omitted, the default is 128 pixels. Use 0 for no + minimum. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--oi-min-area=<replaceable>area-in-pixels</replaceable></option></term> + <listitem> + <para> + Avoid optimizing images whose pixel count + (width × height) is below the specified amount. 
If + omitted, the default is 16,384 pixels. Use 0 for no minimum. + </para> + </listitem> + </varlistentry> + + + + <varlistentry> + <term><option>--externalize-inline-images</option></term> + <listitem> + <para> + Convert inline images to regular images. By default, images + whose data is at least 1,024 bytes are converted when this + option is selected. Use <option>--ii-min-bytes</option> to + change the size threshold. This option is implicitly selected + when <option>--optimize-images</option> is selected. Use + <option>--keep-inline-images</option> to exclude inline images + from image optimization. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--ii-min-bytes=<replaceable>bytes</replaceable></option></term> + <listitem> + <para> + Avoid converting inline images whose size is below the + specified minimum size to regular images. If omitted, the + default is 1,024 bytes. Use 0 for no minimum. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--keep-inline-images</option></term> + <listitem> + <para> + Prevent inline images from being included in image + optimization. This option has no effect when + <option>--optimize-images</option> is not specified. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--qdf</option></term> <listitem> <para> @@ -1468,7 +2090,7 @@ outfile.pdf</option> </listitem> </varlistentry> <varlistentry> - <term><option>--show-object=obj[,gen]</option></term> + <term><option>--show-object=trailer|obj[,gen]</option></term> <listitem> <para> Show the contents of the given object. This is especially @@ -1534,6 +2156,44 @@ outfile.pdf</option> </listitem> </varlistentry> <varlistentry> + <term><option>--json</option></term> + <listitem> + <para> + Generate a JSON representation of the file.
This is described + in depth in <xref linkend="ref.json"/>. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--json-help</option></term> + <listitem> + <para> + Describe the format of the JSON output. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--json-key=key</option></term> + <listitem> + <para> + This option is repeatable. If specified, only top-level keys + specified will be included in the JSON output. If not + specified, all keys will be shown. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>--json-object=trailer|obj[,gen]</option></term> + <listitem> + <para> + This option is repeatable. If specified, only specified + objects will be shown in the + “<literal>objects</literal>” key of the JSON + output. If absent, all objects will be shown. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--check</option></term> <listitem> <para> @@ -1576,6 +2236,121 @@ outfile.pdf</option> content stream, in which case it will produce unusable results. </para> </sect1> + <sect1 id="ref.unicode-passwords"> + <title>Unicode Passwords</title> + <para> + At the library API level, all methods that perform encryption and + decryption interpret passwords as strings of bytes. It is up to + the caller to ensure that they are appropriately encoded. Starting + with qpdf version 8.4.0, qpdf will attempt to make this easier for + you when you interact with qpdf via its command line interface. The + PDF specification requires passwords used to encrypt files with + 40-bit or 128-bit encryption to be encoded with PDF Doc encoding. + This encoding is a single-byte encoding that supports ISO-Latin-1 + and a handful of other commonly used characters. It has a large + overlap with Windows ANSI but is not exactly the same. There is + generally not a way to provide PDF Doc encoded strings on the + command line.
As such, qpdf versions prior to 8.4.0 would often + create PDF files that couldn't be opened with other software when + given a password with non-ASCII characters to encrypt a file with + 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf + recognizes the encoding of the parameter and transcodes it as + needed. The rest of this section provides the details about + exactly how qpdf behaves. Most users will not need to know this + information, but it might be useful if you have been working + around qpdf's old behavior or if you are using qpdf to generate + encrypted files for testing other PDF software. + </para> + <para> + A note about Windows: when qpdf builds, it attempts to determine + what it has to do to use <function>wmain</function> instead of + <function>main</function> on Windows. The + <function>wmain</function> function is an alternative entry point + that receives all arguments as UTF-16-encoded strings. When qpdf + starts up this way, it converts all the strings to UTF-8 encoding + and then invokes the regular main. This means that, as far as qpdf + is concerned, it receives its command-line arguments with UTF-8 + encoding, just as it would in any modern Linux or UNIX + environment. + </para> + <para> + If a file is being encrypted with 40-bit or 128-bit encryption and + the supplied password is not a valid UTF-8 string, qpdf will fall + back to the behavior of interpreting the password as a string of + bytes. If you have old scripts that encrypt files by passing the + output of <command>iconv</command> to qpdf, you no longer need to + do that, but if you do, qpdf should still work. The only exception + would be for the extremely unlikely case of a password that is + encoded with a single-byte encoding but also happens to be valid + UTF-8. Such a password would contain strings of even numbers of + characters that alternate between accented letters and symbols. 
In + the extremely unlikely event that you are intentionally using such + passwords and qpdf is thwarting you by interpreting them as UTF-8, + you can use <option>--password-mode=bytes</option> to suppress + qpdf's automatic behavior. + </para> + <para> + The <option>--password-mode</option> option, as described earlier + in this chapter, can be used to change qpdf's interpretation of + supplied passwords. There are very few reasons to use this option. + One would be the unlikely case described in the previous paragraph + in which the supplied password happens to be valid UTF-8 but isn't + supposed to be UTF-8. Your best bet would be just to provide the + password as a valid UTF-8 string, but you could also use + <option>--password-mode=bytes</option>. Another reason to use + <option>--password-mode=bytes</option> would be to intentionally + generate PDF files encrypted with passwords that are not properly + encoded. The qpdf test suite does this to generate invalid files + for the purpose of testing its password recovery capability. If + you were trying to create intentionally incorrect files for + similar purposes, the <option>bytes</option> password mode can + enable you to do this. + </para> + <para> + When qpdf attempts to decrypt a file with a password that contains + non-ASCII characters, it will generate a list of alternative + passwords by attempting to interpret the password as each of a + handful of different coding systems and then transcode them to the + required format. This helps to compensate for the supplied + password being given in the wrong coding system, such as would + happen if you used the <command>iconv</command> workaround that + was previously needed. It also generates passwords by doing the + reverse operation: translating from the correct to an incorrect encoding + of the password.
This would enable qpdf to decrypt files using + passwords that were improperly encoded by whatever software + encrypted the files, including older versions of qpdf invoked + without properly encoded passwords. The combination of these two + recovery methods should make qpdf transparently open most + encrypted files with the password supplied correctly but in the + wrong coding system. There are no real downsides to this behavior, + but if you don't want qpdf to do this, you can use the + <option>--suppress-password-recovery</option> option. One reason + to do that is to ensure that you know the exact password that was + used to encrypt the file. + </para> + <para> + With these changes, qpdf now generates compliant passwords in most + cases. There are still some exceptions. In particular, the PDF + specification directs compliant writers to normalize Unicode + passwords and to perform certain transformations on passwords with + bidirectional text. Implementing this functionality requires using + a real Unicode library like ICU. If a client application that uses + qpdf wants to do this, the qpdf library will accept the resulting + passwords, but qpdf will not perform these transformations itself. + It is possible that this will be addressed in a future version of + qpdf. The <classname>QPDFWriter</classname> methods that enable + encryption on the output file accept passwords as strings of + bytes. + </para> + <para> + Please note that the <option>--password-is-hex-key</option> option + is unrelated to all this. This flag bypasses the normal process of + going from password to encryption string entirely, allowing the + raw encryption key to be specified directly. This is useful for + forensic purposes or for brute-force recovery of files with + unknown passwords. 
+ </para> + </sect1> </chapter> <chapter id="ref.qdf"> <title>QDF Mode</title> @@ -1730,6 +2505,8 @@ outfile.pdf</option> </chapter> <chapter id="ref.using-library"> <title>Using the QPDF Library</title> + <sect1 id="ref.using.from-cxx"> + <title>Using QPDF from C++</title> <para> The source tree for the qpdf package has an <filename>examples</filename> directory that contains a few @@ -1761,6 +2538,291 @@ outfile.pdf</option> time. Multiple threads may simultaneously work with different instances of these and all other QPDF objects. </para> + </sect1> + <sect1 id="ref.using.other-languages"> + <title>Using QPDF from other languages</title> + <para> + The qpdf library is implemented in C++, which makes it hard to use + directly in other languages. There are a few things that can help. + </para> + <variablelist> + <varlistentry> + <term>“C”</term> + <listitem> + <para> + The qpdf library includes a “C” language interface + that provides a subset of the overall capabilities. The header + file <filename>qpdf/qpdf-c.h</filename> includes information + about its use. As long as you use a C++ linker, you can link C + programs with qpdf and use the C API. For languages that can + directly load methods from a shared library, the C API can also + be useful. People have reported success using the C API from + other languages on Windows by directly calling functions in the + DLL. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Python</term> + <listitem> + <para> + A Python module called <ulink + url="https://pypi.org/project/pikepdf/">pikepdf</ulink> + provides a clean and highly functional set of Python bindings + to the qpdf library. Using pikepdf, you can work with PDF files + in a natural way and combine qpdf's capabilities with other + functionality provided by Python's rich standard library and + available modules. 
+ </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Other Languages</term> + <listitem> + <para> + Starting with version 8.3.0, the <command>qpdf</command> + command-line tool can produce a JSON representation of the PDF + file's non-content data. This can facilitate interacting + programmatically with PDF files through qpdf's command line + interface. For more information, please see <xref + linkend="ref.json"/>. + </para> + </listitem> + </varlistentry> + </variablelist> + </sect1> + </chapter> + <chapter id="ref.json"> + <title>QPDF JSON</title> + <sect1 id="ref.json-overview"> + <title>Overview</title> + <para> + Beginning with qpdf version 8.3.0, the <command>qpdf</command> + command-line program can produce a JSON representation of the + non-content data in a PDF file. It includes a dump in JSON format + of all objects in the PDF file excluding the content of streams. + This JSON representation makes it very easy to look in detail at + the structure of a given PDF file, and it also provides a great way + to work with PDF files programmatically from the command-line in + languages that can't call or link with the qpdf library directly. + Note that stream data can be extracted from PDF files using other + qpdf command-line options. + </para> + </sect1> + <sect1 id="ref.json-guarantees"> + <title>JSON Guarantees</title> + <para> + The qpdf JSON representation includes a JSON serialization of the + raw objects in the PDF file as well as some computed information in + a more easily extracted format. QPDF provides some guarantees about + its JSON format. These guarantees are designed to simplify the + experience of a developer working with the JSON format. + <variablelist> + <varlistentry> + <term>Compatibility</term> + <listitem> + <para> + The top-level JSON object output is a dictionary. The JSON + output contains various nested dictionaries and arrays. 
With + the exception of dictionaries that are populated by the fields + of objects from the file, all instances of a dictionary are + guaranteed to have exactly the same keys. Future versions of + qpdf are free to add additional keys but not to remove keys or + change the type of object that a key points to. The qpdf + program validates this guarantee, and in the unlikely event + that a bug in qpdf should cause it to generate data that + doesn't conform to this rule, it will ask you to file a bug + report. + </para> + <para> + The top-level JSON structure contains a + “<literal>version</literal>” key whose value is + a simple integer. The value of the <literal>version</literal> key + will be incremented if a non-compatible change is made. A + non-compatible change would be any change that involves removal + of a key, a change to the format of data pointed to by a key, + or a semantic change that requires a different interpretation + of a previously existing key. A strong effort will be made to + avoid breaking compatibility. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Documentation</term> + <listitem> + <para> + The <command>qpdf</command> command can be invoked with the + <option>--json-help</option> option. This will output a JSON + structure that has the same structure as the JSON output that + qpdf generates, except that each field in the help output is a + description of the corresponding field in the JSON output. The + specific guarantees are as follows: + <itemizedlist> + <listitem> + <para> + A dictionary in the help output means that the corresponding + location in the actual JSON output is also a dictionary with + exactly the same keys; that is, no keys present in help are + absent in the real output, and no keys will be present in + the real output that are not in help.
+ </para> + </listitem> + <listitem> + <para> + A string in the help output is a description of the item + that appears in the corresponding location of the actual + output. The corresponding output can have any format. + </para> + </listitem> + <listitem> + <para> + An array in the help output always contains a single + element. It indicates that the corresponding location in the + actual output is also an array, and that each element of the + array has whatever format is implied by the single element + of the help output's array. + </para> + </listitem> + </itemizedlist> + For example, the help output includes a + “<literal>pagelabels</literal>” key whose value is + an array of one element. That element is a dictionary with keys + “<literal>index</literal>” and + “<literal>label</literal>”. In addition to + describing the meaning of those keys, this tells you that the + actual JSON output will contain a <literal>pagelabels</literal> + array, each of whose elements is a dictionary that contains an + <literal>index</literal> key, a <literal>label</literal> key, + and no other keys. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Directness and Simplicity</term> + <listitem> + <para> + The JSON output contains the value of every object in the file, + but it also contains some processed data. This is analogous to + how qpdf's library interface works. The processed data is + similar to the helper functions in that it allows you to look + at certain aspects of the PDF file without having to understand + all the nuances of the PDF specification, while the raw objects + allow you to mine the PDF for anything that the higher-level + interfaces are lacking.
+ </para> + </listitem> + </varlistentry> + </variablelist> + </para> + </sect1> + <sect1 id="json.limitations"> + <title>Limitations of JSON Representation</title> + <para> + There are a few limitations to be aware of with the JSON structure: + <itemizedlist> + <listitem> + <para> + Strings, names, and indirect object references in the original + PDF file are all converted to strings in the JSON + representation. In the case of a “normal” PDF file, + you can tell the difference because a name starts with a slash + (<literal>/</literal>), and an indirect object reference looks + like <literal>n n R</literal>, but if there were to be a string + that looked like a name or indirect object reference, there + would be no way to tell this from the JSON output. Note that + there are certain cases where you know for sure what something + is, such as knowing that dictionary keys in objects are always + names and that certain things in the higher-level computed data + are known to contain indirect object references. + </para> + </listitem> + <listitem> + <para> + The JSON format doesn't support binary data very well. Mostly + the details are not important, but they are presented here for + information. When qpdf outputs a string in the JSON + representation, it converts the string to UTF-8, assuming usual + PDF string semantics. Specifically, if the original string is + UTF-16, it is converted to UTF-8. Otherwise, it is assumed to + have PDF doc encoding, and is converted to UTF-8 with that + assumption. This causes strange things to happen to binary + strings. For example, if you had the binary string + <literal><038051></literal>, this would be output to the + JSON as <literal>\u0003•Q</literal> because + <literal>03</literal> is not a printable character and + <literal>80</literal> is the bullet character in PDF doc + encoding and is mapped to the Unicode value + <literal>2022</literal>. Since <literal>51</literal> is + <literal>Q</literal>, it is output as is. 
If you wanted to + convert back from here to a binary string, you would have to + recognize Unicode values whose code points are higher than + <literal>0xFF</literal> and map those back to their + corresponding PDF doc encoding characters. There is no way to + tell the difference between a Unicode string that was originally + encoded as UTF-16 or one that was converted from PDF doc + encoding. In other words, it's best if you don't try to use the + JSON format to extract binary strings from the PDF file, but if + you really had to, it could be done. Note that qpdf's + <option>--show-object</option> option does not have this + limitation and will reveal the string as encoded in the original + file. + </para> + </listitem> + </itemizedlist> + </para> + </sect1> + <sect1 id="json.considerations"> + <title>JSON: Special Considerations</title> + <para> + For the most part, the built-in JSON help tells you everything you + need to know about the JSON format, but there are a few + non-obvious things to be aware of: + <itemizedlist> + <listitem> + <para> + While qpdf guarantees that keys present in the help will be + present in the output, those fields may be null or empty if the + information is not known or absent in the file. Also, if you + specify <option>--json-key</option>, the keys that are not + listed will be excluded entirely except for those that + <option>--json-help</option> says are always present. + </para> + </listitem> + <listitem> + <para> + In a few places, there are keys with names containing + <literal>pageposfrom1</literal>. The values of these keys are + null or an integer. If an integer, they point to a page index + within the file numbering from 1. Note that JSON indexes from + 0, and you would also use 0-based indexing using the API. + However, 1-based indexing is easier in this case because the + command-line syntax for specifying page ranges is 1-based.
If + you were going to write a program that looked through the JSON + for information about specific pages and then use the + command-line to extract those pages, 1-based indexing is + easier. Besides, it's more convenient to subtract 1 in a + program written in a real programming language than it is to add 1 in + shell code. + </para> + </listitem> + <listitem> + <para> + The image information included in the <literal>page</literal> + section of the JSON output includes the key + “<literal>filterable</literal>”. Note that the + value of this field may depend on the + <option>--decode-level</option> that you invoke qpdf with. The + JSON output includes a top-level key + “<literal>parameters</literal>” that indicates the + decode level used for computing whether a stream was + filterable. For example, JPEG images will be shown as not + filterable by default, but they will be shown as filterable if + you run <command>qpdf --json --decode-level=all</command>. + </para> + </listitem> + </itemizedlist> + </para> + </sect1> </chapter> <chapter id="ref.design"> <title>Design and Library Notes</title> @@ -3240,6 +4302,830 @@ print "\n"; </para> <variablelist> <varlistentry> + <term>8.4.0: February 1, 2019</term> + <listitem> + <itemizedlist> + <listitem> + <para> + Command-line Enhancements + </para> + <itemizedlist> + <listitem> + <para> + <emphasis>Non-compatible CLI change:</emphasis> The qpdf + command-line tool interprets passwords given at the + command-line differently from previous releases when the + passwords contain non-ASCII characters. In some cases, the + behavior differs from previous releases. For a discussion of + the current behavior, please see <xref + linkend="ref.unicode-passwords"/>. The incompatibilities are + as follows: + <itemizedlist> + <listitem> + <para> + On Windows, qpdf now receives all command-line options as + Unicode strings if it can figure out the appropriate + compile/link options. This is enabled at least for MSVC + and mingw builds.
That means that if non-ASCII strings + are passed to the qpdf CLI in Windows, qpdf will now + correctly receive them. In the past, they would have + either been encoded as Windows code page 1252 (also known + as “Windows ANSI”) or as something + unintelligible. In almost all cases, qpdf is able to + properly interpret Unicode arguments now, whereas in the + past, it would almost never interpret them properly. The + result is that non-ASCII passwords given to the qpdf CLI + on Windows now have a much greater chance of creating PDF + files that can be opened by a variety of readers. In the + past, usually files encrypted from the Windows CLI using + non-ASCII passwords would not be readable by most + viewers. Note that the current version of qpdf is able to + decrypt files that it previously created using the + previously supplied password. + </para> + </listitem> + <listitem> + <para> + The PDF specification requires passwords to be encoded as + UTF-8 for 256-bit encryption and with PDF Doc encoding + for 40-bit or 128-bit encryption. Older versions of qpdf + left it up to the user to provide passwords with the + correct encoding. The qpdf CLI now detects when a + password is given with UTF-8 encoding and automatically + transcodes it to what the PDF spec requires. While this + is almost always the correct behavior, it is possible to + override the behavior if there is some reason to do so. + This is discussed in more depth in <xref + linkend="ref.unicode-passwords"/>. + </para> + </listitem> + </itemizedlist> + </para> + </listitem> + <listitem> + <para> + New options <option>--externalize-inline-images</option>, + <option>--ii-min-bytes</option>, and + <option>--keep-inline-images</option> control qpdf's + handling of inline images and possible conversion of them to + regular images. By default, + <option>--optimize-images</option> now also applies to + inline images. These options are discussed in <xref + linkend="ref.advanced-transformation"/>.
+ </para> + </listitem> + <listitem> + <para> + Add options <option>--overlay</option> and + <option>--underlay</option> for overlaying or underlaying + pages of other files onto output pages. See <xref + linkend="ref.overlay-underlay"/> for details. + </para> + </listitem> + <listitem> + <para> + When opening an encrypted file with a password, if the + specified password doesn't work and the password contains + any non-ASCII characters, qpdf will try a number of + alternative passwords to try to compensate for possible + character encoding errors. This behavior can be suppressed + with the <option>--suppress-password-recovery</option> + option. See <xref linkend="ref.unicode-passwords"/> for a + full discussion. + </para> + </listitem> + <listitem> + <para> + Add the <option>--password-mode</option> option to fine-tune + how qpdf interprets password arguments, especially when they + contain non-ASCII characters. See <xref + linkend="ref.unicode-passwords"/> for more information. + </para> + </listitem> + <listitem> + <para> + In the <option>--pages</option> option, it is now possible + to copy the same page more than once from the same file + without using the previous workaround of specifying two + different paths to the same file. + </para> + </listitem> + <listitem> + <para> + In the <option>--pages</option> option, allow use of + “.” as a shortcut for the primary input file. + That way, you can do <command>qpdf in.pdf --pages . 1-2 -- + out.pdf</command> instead of having to repeat + <filename>in.pdf</filename> in the command. + </para> + </listitem> + <listitem> + <para> + When encrypting with 128-bit and 256-bit encryption, new + encryption options <option>--assemble</option>, + <option>--annotate</option>, <option>--form</option>, and + <option>--modify-other</option> allow finer-grained + control when configuring permissions. Before, the + <option>--modify</option> option only configured certain + predefined groups of permissions.
+ </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Bug Fixes and Enhancements + </para> + <itemizedlist> + <listitem> + <para> + <emphasis>Potential data-loss bug:</emphasis> Versions of + qpdf between 8.1.0 and 8.3.0 had a bug that could cause page + splitting and merging operations to drop some font or image + resources if the PDF file's internal structure shared these + resource lists across pages and if some but not all of the + pages in the output did not reference all the fonts and + images. Using the + <option>--preserve-unreferenced-resources</option> option + would work around the incorrect behavior. This bug was the + result of a typo in the code and a deficiency in the test + suite. The case that triggered the error was known, just not + handled properly. This case is now exercised in qpdf's test + suite and properly handled. + </para> + </listitem> + <listitem> + <para> + When optimizing images, detect and refuse to optimize + images that can't be converted to JPEG because of bit depth + or color space. + </para> + </listitem> + <listitem> + <para> + Linearization and page manipulation APIs now detect and + recover from files that have duplicate Page objects in the + pages tree. + </para> + </listitem> + <listitem> + <para> + Using older option <option>--stream-data=compress</option> + with object streams, object streams and xref streams were + not compressed. + </para> + </listitem> + <listitem> + <para> + When the tokenizer returns inline image tokens, delimiters + following <literal>ID</literal> and <literal>EI</literal> + operators are no longer excluded. This makes it possible to + reliably extract the actual image data. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Library Enhancements + </para> + <itemizedlist> + <listitem> + <para> + Add method + <function>QPDFPageObjectHelper::externalizeInlineImages</function> + to convert inline images to regular images. 
+ </para> + </listitem> + <listitem> + <para> + Add method + <function>QUtil::possible_repaired_encodings()</function> to + generate a list of strings that represent other ways the + given string could have been encoded. This is the method the + QPDF CLI uses to generate the strings it tries when + recovering incorrectly encoded Unicode passwords. + </para> + </listitem> + <listitem> + <para> + Add new versions of + <function>QPDFWriter::setR{3,4,5,6}EncryptionParameters</function> + that allow more granular setting of permissions bits. See + <filename>QPDFWriter.hh</filename> for details. + </para> + </listitem> + <listitem> + <para> + Add new versions of the transcoders from UTF-8 to + single-byte coding systems in <classname>QUtil</classname> + that report success or failure rather than just substituting + a specified unknown character. + </para> + </listitem> + <listitem> + <para> + Add method <function>QUtil::analyze_encoding()</function> to + determine whether a string has high-bit characters and + appears to be UTF-16 or valid UTF-8 encoding. + </para> + </listitem> + <listitem> + <para> + Add new method + <function>QPDFPageObjectHelper::shallowCopyPage()</function> + to copy a new page that is a “shallow copy” of a + page. The resulting object is an indirect object ready to be + passed to + <function>QPDFPageDocumentHelper::addPage()</function> for + either the original <classname>QPDF</classname> object or a + different one. This is what the <command>qpdf</command> + command-line tool uses to copy the same page multiple times + from the same file during splitting and merging operations. + </para> + </listitem> + <listitem> + <para> + Add method <function>QPDF::getUniqueId()</function>, which + returns a unique identifier for the given QPDF object. The + identifier will be unique across the life of the + application. The returned value can be safely used as a map + key.
+ </para> + </listitem> + <listitem> + <para> + Add method <function>QPDF::setImmediateCopyFrom</function>. + This further enhances qpdf's ability to allow a + <classname>QPDF</classname> object from which objects are + being copied to go out of scope before the destination + object is written. If you call this method on a + <classname>QPDF</classname> instance, objects copied + <emphasis>from</emphasis> this instance will be copied + immediately instead of lazily. This option uses more memory + but allows the source object to go out of scope before the + destination object is written in all cases. See comments in + <filename>QPDF.hh</filename> for details. + </para> + </listitem> + <listitem> + <para> + Add method + <function>QPDFPageObjectHelper::getAttribute</function> for + retrieving an attribute from the page dictionary taking + inheritance into consideration, and optionally making a copy + if your intention is to modify the attribute. + </para> + </listitem> + <listitem> + <para> + Fix long-standing limitation of + <function>QPDFPageObjectHelper::getPageImages</function> so + that it now properly reports images from inherited resources + dictionaries, eliminating the need to call + <function>QPDFPageDocumentHelper::pushInheritedAttributesToPage</function> + in this case. + </para> + </listitem> + <listitem> + <para> + Add method + <function>QPDFObjectHandle::getUniqueResourceName</function> + for finding an unused name in a resource dictionary. + </para> + </listitem> + <listitem> + <para> + Add method + <function>QPDFPageObjectHelper::getFormXObjectForPage</function> + for generating a form XObject equivalent to a page. The + resulting object can be used in the same file or copied to + another file with <function>copyForeignObject</function>. + This can be useful for implementing underlay, overlay, n-up, + thumbnails, or any other functionality requiring replication + of pages in other contexts.
+ </para> + </listitem> + <listitem> + <para> + Add method + <function>QPDFPageObjectHelper::placeFormXObject</function> + for generating content stream text that places a given form + XObject on a page, centered and fit within a specified + rectangle. This method takes care of computing the proper + transformation matrix and may optionally compensate for + rotation or scaling of the destination page. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Build Improvements + </para> + <itemizedlist> + <listitem> + <para> + Add new configure option + <option>--enable-avoid-windows-handle</option>, which causes + the preprocessor symbol + <literal>AVOID_WINDOWS_HANDLE</literal> to be defined. When + defined, qpdf will avoid referencing the Windows + <classname>HANDLE</classname> type, which is disallowed with + certain versions of the Windows SDK. + </para> + </listitem> + <listitem> + <para> + For Windows builds, attempt to determine what options, if + any, have to be passed to the compiler and linker to enable + use of <function>wmain</function>. This causes the + preprocessor symbol <literal>WINDOWS_WMAIN</literal> to be + defined. If you do your own builds with other compilers, you + can define this symbol to cause <function>wmain</function> + to be used. This is needed to allow the Windows + <command>qpdf</command> command to receive Unicode + command-line options. + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </listitem> + </varlistentry> + <varlistentry> + <term>8.3.0: January 7, 2019</term> + <listitem> + <itemizedlist> + <listitem> + <para> + Command-line Enhancements + </para> + <itemizedlist> + <listitem> + <para> + Shell completion: you can now use eval <command>$(qpdf + --completion-bash)</command> and eval <command>$(qpdf + --completion-zsh)</command> to enable shell completion for + bash and zsh. 
+ </para> + </listitem> + <listitem> + <para> + Page numbers (also known as page labels) are now preserved + when merging and splitting files with the + <option>--pages</option> and <option>--split-pages</option> + options. + </para> + </listitem> + <listitem> + <para> + Bookmarks are partially preserved when splitting pages with + the <option>--split-pages</option> option. Specifically, the + outlines dictionary and some supporting metadata are copied + into the split files. The result is that all bookmarks from + the original file appear, those that point to pages that are + preserved work, and those that point to pages that are not + preserved don't do anything. This is an interim step toward + proper support for bookmarks in splitting and merging + operations. + </para> + </listitem> + <listitem> + <para> + Page collation: add new option <option>--collate</option>. + When specified, the semantics of <option>--pages</option> + change from concatenation to collation. See <xref + linkend="ref.page-selection"/> for examples and discussion. + </para> + </listitem> + <listitem> + <para> + Generation of information in JSON format, primarily to + facilitate use of qpdf from languages other than C++. Add + new options <option>--json</option>, + <option>--json-key</option>, and + <option>--json-object</option> to generate a JSON + representation of the PDF file. Run <command>qpdf + --json-help</command> to get a description of the JSON + format. For more information, see <xref linkend="ref.json"/>. + </para> + </listitem> + <listitem> + <para> + The <option>--generate-appearances</option> flag will cause + qpdf to generate appearances for form fields if the PDF file + indicates that form field appearances are out of date. This + can happen when PDF forms are filled in by a program that + doesn't know how to regenerate the appearances of the + filled-in fields. 
+ </para> + </listitem> + <listitem> + <para> + The <option>--flatten-annotations</option> flag can be used + to <emphasis>flatten</emphasis> annotations, including form + fields. Ordinarily, annotations are drawn separately from + the page. Flattening annotations is the process of combining + their appearances into the page's contents. You might want + to do this if you are going to rotate or combine pages using + a tool that doesn't understand annotations. You may + also want to use <option>--generate-appearances</option> + when using this flag since annotations for outdated form + fields are not flattened as that would cause loss of + information. + </para> + </listitem> + <listitem> + <para> + The <option>--optimize-images</option> flag tells qpdf to + recompress every image using DCT (JPEG) compression as + long as the image is not already compressed with lossy + compression and recompressing the image reduces its size. + The additional options <option>--oi-min-width</option>, + <option>--oi-min-height</option>, and + <option>--oi-min-area</option> prevent recompression of + images whose width, height, or pixel area + (width × height) is below a specified + threshold. + </para> + </listitem> + <listitem> + <para> + The <option>--show-object</option> option can now be given + as <option>--show-object=trailer</option> to show the + trailer dictionary. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Bug Fixes and Enhancements + </para> + <itemizedlist> + <listitem> + <para> + QPDF now automatically detects and recovers from dangling + references. Previously, if a PDF file contained an indirect + reference to a non-existent object (which is valid), a new + object added to the file could take the object ID of the + dangling reference, thereby causing the dangling reference + to point to the new object. + This case is now prevented.
+ </para> + </listitem> + <listitem> + <para> + Fixes to form field setting code: strings are always written + in UTF-16 format, and checkboxes and radio buttons are + handled properly with respect to synchronization of values + and appearance states. + </para> + </listitem> + <listitem> + <para> + The <function>QPDF::checkLinearization()</function> method no + longer causes the program to crash when it detects problems + with linearization data. Instead, it issues a normal warning + or error. + </para> + </listitem> + <listitem> + <para> + Ordinarily qpdf treats an argument of the form + <option>@file</option> to mean that command-line options + should be read from <filename>file</filename>. Now, if + <filename>file</filename> does not exist but + <filename>@file</filename> does, qpdf will treat + <filename>@file</filename> as a regular option. This makes + it possible to work more easily with PDF files whose names + happen to start with the <literal>@</literal> character. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Library Enhancements + </para> + <itemizedlist> + <listitem> + <para> + Remove the restriction in most cases that the source QPDF + object used in a + <function>QPDF::copyForeignObject</function> call has to + stick around until the destination QPDF is written. The + exceptional case is when the source stream gets its data + using a <classname>QPDFObjectHandle::StreamDataProvider</classname>. For a more + in-depth discussion, see comments around + <function>copyForeignObject</function> in + <filename>QPDF.hh</filename>. + </para> + </listitem> + <listitem> + <para> + Add new method + <function>QPDFWriter::getFinalVersion()</function>, which + returns the PDF version that will ultimately be written to + the final file. See comments in + <filename>QPDFWriter.hh</filename> for some restrictions on + its use.
+ </para> + </listitem> + <listitem> + <para> + Add several methods for transcoding strings to some of the + character sets used in PDF files: + <function>QUtil::utf8_to_ascii</function>, + <function>QUtil::utf8_to_win_ansi</function>, + <function>QUtil::utf8_to_mac_roman</function>, and + <function>QUtil::utf8_to_utf16</function>. For the + single-byte encodings that support only a limited character + set, these methods replace unsupported characters with a + specified substitute. + </para> + </listitem> + <listitem> + <para> + Add new methods to + <classname>QPDFAnnotationObjectHelper</classname> and + <classname>QPDFFormFieldObjectHelper</classname> for + querying flags and interpretation of different field types. + Define constants in <filename>qpdf/Constants.h</filename> to + help with interpretation of flag values. + </para> + </listitem> + <listitem> + <para> + Add new methods + <function>QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded</function> + and + <function>QPDFFormFieldObjectHelper::generateAppearance</function> + for generating appearance streams. See discussion in + <filename>QPDFFormFieldObjectHelper.hh</filename> for + limitations. + </para> + </listitem> + <listitem> + <para> + Add two new helper functions for dealing with resource + dictionaries: + <function>QPDFObjectHandle::getResourceNames()</function> + returns a list of all second-level keys, which correspond to + the names of resources, and + <function>QPDFObjectHandle::mergeResources()</function> + merges two resources dictionaries as long as they have + non-conflicting keys. These methods are useful for certain + types of objects that resolve resources from multiple places, + such as form fields. + </para> + </listitem> + <listitem> + <para> + Add methods + <function>QPDFPageDocumentHelper::flattenAnnotations()</function> + and + <function>QPDFAnnotationObjectHelper::getPageContentForAppearance()</function> + for handling low-level details of annotation flattening.
+ </para> + </listitem> + <listitem> + <para> + Add new helper classes: + <classname>QPDFOutlineDocumentHelper</classname>, + <classname>QPDFOutlineObjectHelper</classname>, + <classname>QPDFPageLabelDocumentHelper</classname>, + <classname>QPDFNameTreeObjectHelper</classname>, and + <classname>QPDFNumberTreeObjectHelper</classname>. + </para> + </listitem> + <listitem> + <para> + Add method <function>QPDFObjectHandle::getJSON()</function> + that returns a JSON representation of the object. Call + <function>serialize()</function> on the result to convert it + to a string. + </para> + </listitem> + <listitem> + <para> + Add a simple JSON serializer. This is not a complete or + general-purpose JSON library. It allows assembly and + serialization of JSON structures with some restrictions, + which are described in the header file. This is the + serializer used by qpdf's new JSON representation. + </para> + </listitem> + <listitem> + <para> + Add new <classname>QPDFObjectHandle::Matrix</classname> + class along with a few convenience methods for dealing with + six-element numerical arrays as matrices. + </para> + </listitem> + <listitem> + <para> + Add new method + <function>QPDFObjectHandle::wrapInArray</function>, which returns + the object itself if it is an array, or an array containing + the object otherwise. This is a common construct in PDF. + This method prevents you from having to explicitly test + whether something is a single element or an array. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Build Improvements + </para> + <itemizedlist> + <listitem> + <para> + It is no longer necessary to run + <command>autogen.sh</command> to build from a pristine + checkout. Automatically generated files are now committed so + that it is possible to build on platforms without autoconf + directly from a clean checkout of the repository. 
The + <command>configure</command> script detects if the files are + out of date when it also determines that the tools are + present to regenerate them. + </para> + </listitem> + <listitem> + <para> + Pull requests and the master branch are now built + automatically in <ulink + url="https://dev.azure.com/qpdf/qpdf/_build">Azure + Pipelines</ulink>, which is free for open source projects. + The build includes Linux, macOS, Windows 32-bit and 64-bit + with mingw and MSVC, and an AppImage build. Official qpdf + releases are now built with Azure Pipelines. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Notes for Packagers + </para> + <itemizedlist> + <listitem> + <para> + A new section has been added to the documentation with notes + for packagers. Please see <xref linkend="ref.packaging"/>. + </para> + </listitem> + <listitem> + <para> + The qpdf build detects out-of-date automatically generated files. + If your packaging system automatically refreshes libtool or + autoconf files, it could cause this check to fail. To avoid + this problem, pass + <option>--disable-check-autofiles</option> to + <command>configure</command>. + </para> + </listitem> + <listitem> + <para> + If you would like to have qpdf completion enabled + automatically, you can install completion files in the + distribution's default location. You can find sample + completion files to install in the + <filename>completions</filename> directory. + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </listitem> + </varlistentry> + <varlistentry> + <term>8.2.1: August 18, 2018</term> + <listitem> + <itemizedlist> + <listitem> + <para> + Command-line Enhancements + </para> + <itemizedlist> + <listitem> + <para> + Add + <option>--keep-files-open=<replaceable>[yn]</replaceable></option> + to override default determination of whether to keep files + open when merging.
Please see the discussion of + <option>--keep-files-open</option> in <xref + linkend="ref.basic-options"/> for additional details. + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </listitem> + </varlistentry> + <varlistentry> + <term>8.2.0: August 16, 2018</term> + <listitem> + <itemizedlist> + <listitem> + <para> + Command-line Enhancements + </para> + <itemizedlist> + <listitem> + <para> + Add <option>--no-warn</option> option to suppress issuing + warning messages. If there are any conditions that would + have caused warnings to be issued, the exit status is still + 3. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Bug Fixes and Optimizations + </para> + <itemizedlist> + <listitem> + <para> + Performance fix: optimize page merging operation to avoid + unnecessary open/close calls on files being merged. This + solves a dramatic slow-down that was observed when merging + certain types of files. + </para> + </listitem> + <listitem> + <para> + Optimize how memory was used for the TIFF predictor, + drastically improving performance and memory usage for files + containing high-resolution images compressed with Flate + using the TIFF predictor. + </para> + </listitem> + <listitem> + <para> + Bug fix: end of line characters were not properly handled + inside strings in some cases. + </para> + </listitem> + <listitem> + <para> + Bug fix: using <option>--progress</option> on very small + files could cause an infinite loop. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + API enhancements + </para> + <itemizedlist> + <listitem> + <para> + Add new class <classname>QPDFSystemError</classname>, derived + from <classname>std::runtime_error</classname>, which is now + thrown by <function>QUtil::throw_system_error</function>. + This enables the triggering <classname>errno</classname> + value to be retrieved. 
+ </para> + </listitem> + <listitem> + <para> + Add <function>ClosedFileInputSource::stayOpen</function> + method, enabling a + <classname>ClosedFileInputSource</classname> to stay open + during manually indicated periods of high activity, thus + reducing the overhead of frequent open/close operations. + </para> + </listitem> + </itemizedlist> + </listitem> + <listitem> + <para> + Build Changes + </para> + <itemizedlist> + <listitem> + <para> + For the mingw builds, change the name of the DLL import + library from <filename>libqpdf.a</filename> to + <filename>libqpdf.dll.a</filename> to more accurately + reflect that it is an import library rather than a static + library. This potentially clears the way for supporting a + static library in the future, though presently, the qpdf + Windows build only builds the DLL and executables. + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </listitem> + </varlistentry> + <varlistentry> <term>8.1.0: June 23, 2018</term> <listitem> <itemizedlist> @@ -3872,8 +5758,6 @@ print "\n"; </itemizedlist> </listitem> </varlistentry> - </variablelist> - <variablelist> <varlistentry> <term>6.0.0: November 10, 2015</term> <listitem>