Diffstat (limited to 'manual/cli.rst')
-rw-r--r-- | manual/cli.rst | 1675 |
1 files changed, 1675 insertions, 0 deletions
diff --git a/manual/cli.rst b/manual/cli.rst new file mode 100644 index 00000000..e8a07f5f --- /dev/null +++ b/manual/cli.rst @@ -0,0 +1,1675 @@ +.. _ref.using: + +Running QPDF +============ + +This chapter describes how to run the qpdf program from the command +line. + +.. _ref.invocation: + +Basic Invocation +---------------- + +When running qpdf, the basic invocation is as follows: + +:: + + qpdf [ options ] { infilename | --empty } outfilename + +This converts PDF file :samp:`infilename` to PDF file +:samp:`outfilename`. The output file is functionally +identical to the input file but may have been structurally reorganized. +Also, orphaned objects will be removed from the file. Many +transformations are available as controlled by the options below. In +place of :samp:`infilename`, the parameter +:samp:`--empty` may be specified. This causes qpdf to +use a dummy input file that contains zero pages. The only normal use +case for using :samp:`--empty` would be if you were +going to add pages from another source, as discussed in :ref:`ref.page-selection`. + +If :samp:`@filename` appears as a word anywhere in the +command-line, it will be read line by line, and each line will be +treated as a command-line argument. Leading and trailing whitespace is +intentionally not removed from lines, which makes it possible to handle +arguments that start or end with spaces. The :samp:`@-` +option allows arguments to be read from standard input. This allows qpdf +to be invoked with an arbitrary number of arbitrarily long arguments. It +is also very useful for avoiding having to pass passwords on the command +line. Note that the :samp:`@filename` can't appear in +the middle of an argument, so constructs such as +:samp:`--arg=@option` will not work. You would have to +include the argument and its options together in the arguments file. + +:samp:`outfilename` does not have to be seekable, even +when generating linearized files. 
Specifying ":samp:`-`" +as :samp:`outfilename` means to write to standard +output. If you want to overwrite the input file with the output, use the +option :samp:`--replace-input` and omit the output file +name. You can't specify the same file as both the input and the output. +If you do this, qpdf will tell you about the +:samp:`--replace-input` option. + +Most options require an output file, but some testing or inspection +commands do not. These are specifically noted. + +.. _ref.exit-status: + +Exit Status +~~~~~~~~~~~ + +The exit status of :command:`qpdf` may be interpreted as +follows: + +- ``0``: no errors or warnings were found. The file may still have + problems qpdf can't detect. If + :samp:`--warning-exit-0` was specified, exit status 0 + is used even if there are warnings. + +- ``2``: errors were found. qpdf was not able to fully process the + file. + +- ``3``: qpdf encountered problems that it was able to recover from. In + some cases, the resulting file may still be damaged. Note that qpdf + still exits with status ``3`` if it finds warnings even when + :samp:`--no-warn` is specified. With + :samp:`--warning-exit-0`, warnings without errors + exit with status 0 instead of 3. + +Note that :command:`qpdf` never exits with status ``1``. +If you get an exit status of ``1``, it was something else, like the +shell not being able to find or execute :command:`qpdf`. + +.. _ref.shell-completion: + +Shell Completion +---------------- + +Starting in qpdf version 8.3.0, qpdf provides its own completion support +for zsh and bash. You can enable bash completion with :command:`eval +$(qpdf --completion-bash)` and zsh completion with +:command:`eval $(qpdf --completion-zsh)`. If +:command:`qpdf` is not in your path, you should invoke it +above with an absolute path. If you invoke it with a relative path, it +will warn you, and the completion won't work if you're in a different +directory. + +qpdf will use ``argv[0]`` to figure out where its executable is. 
This +may produce unwanted results in some cases, especially if you are trying +to use completion with a copy of qpdf that is built from source. You can +specify a full path to the qpdf you want to use for completion in the +``QPDF_EXECUTABLE`` environment variable. + +.. _ref.basic-options: + +Basic Options +------------- + +The following options are the most common ones and perform commonly +needed transformations. + +:samp:`--help` + Display command-line invocation help. + +:samp:`--version` + Display the current version of qpdf. + +:samp:`--copyright` + Show detailed copyright information. + +:samp:`--show-crypto` + Show a list of available crypto providers, each on a line by itself. + The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto + providers. + +:samp:`--completion-bash` + Output a completion command you can eval to enable shell completion + from bash. + +:samp:`--completion-zsh` + Output a completion command you can eval to enable shell completion + from zsh. + +:samp:`--password={password}` + Specifies a password for accessing encrypted files. To read the + password from a file or standard input, you can use + :samp:`--password-file`, added in qpdf 10.2. Note + that you can also use :samp:`@filename` or + :samp:`@-` as described above to put the password in + a file or pass it via standard input, but you would do so by + specifying the entire + :samp:`--password={password}` + option in the file. Syntax such as + :samp:`--password=@filename` won't work since + :samp:`@filename` is not recognized in the middle of + an argument. + +:samp:`--password-file={filename}` + Reads the first line from the specified file and uses it as the + password for accessing encrypted files. + :samp:`{filename}` + may be ``-`` to read the password from standard input. Note that, in + this case, the password is echoed and there is no prompt, so use with + caution. 
+ +:samp:`--is-encrypted` + Silently exit with status 0 if the file is encrypted or status 2 if + the file is not encrypted. This is useful for shell scripts. Other + options are ignored if this is given. This option is mutually + exclusive with :samp:`--requires-password`. Both this + option and :samp:`--requires-password` exit with + status 2 for non-encrypted files. + +:samp:`--requires-password` + Silently exit with status 0 if a password (other than as supplied) is + required. Exit with status 2 if the file is not encrypted. Exit with + status 3 if the file is encrypted but requires no password or the + correct password has been supplied. This is useful for shell scripts. + Note that any supplied password is used when opening the file. When + used with a :samp:`--password` option, this option + can be used to check the correctness of the password. In that case, + an exit status of 3 means the file works with the supplied password. + This option is mutually exclusive with + :samp:`--is-encrypted`. Both this option and + :samp:`--is-encrypted` exit with status 2 for + non-encrypted files. + +:samp:`--verbose` + Increase verbosity of output. For now, this just prints some + indication of any file that it creates. + +:samp:`--progress` + Indicate progress while writing files. + +:samp:`--no-warn` + Suppress writing of warnings to stderr. If warnings were detected and + suppressed, :command:`qpdf` will still exit with exit + code 3. See also :samp:`--warning-exit-0`. + +:samp:`--warning-exit-0` + If warnings are found but no errors, exit with exit code 0 instead of 3. + When combined with :samp:`--no-warn`, the effect is + for :command:`qpdf` to completely ignore warnings. + +:samp:`--linearize` + Causes generation of a linearized (web-optimized) output file. + +:samp:`--replace-input` + If specified, the output file name should be omitted. This option + tells qpdf to replace the input file with the output. 
It does this by + writing to + :file:`{infilename}.~qpdf-temp#` + and, when done, overwriting the input file with the temporary file. + If there were any warnings, the original input is saved as + :file:`{infilename}.~qpdf-orig`. + +:samp:`--copy-encryption=file` + Encrypt the file using the same encryption parameters, including user + and owner password, as the specified file. Use + :samp:`--encryption-file-password` to specify a + password if one is needed to open this file. Note that copying the + encryption parameters from a file also copies the first half of + ``/ID`` from the file since this is part of the encryption + parameters. + +:samp:`--encryption-file-password=password` + If the file specified with :samp:`--copy-encryption` + requires a password, specify the password using this option. Note + that only one of the user or owner password is required. Both + passwords will be preserved since QPDF does not distinguish between + the two passwords. It is possible to preserve encryption parameters, + including the owner password, from a file even if you don't know the + file's owner password. + +:samp:`--allow-weak-crypto` + Starting with version 10.4, qpdf issues warnings when requested to + create files using RC4 encryption. This option suppresses those + warnings. In future versions of qpdf, qpdf will refuse to create + files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details. + +:samp:`--encrypt options --` + Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify + encryption parameters. + +:samp:`--decrypt` + Removes any encryption on the file. A password must be supplied if + the file is password protected. + +:samp:`--password-is-hex-key` + Overrides the usual computation/retrieval of the PDF file's + encryption key from user/owner password with an explicit + specification of the encryption key. 
When this option is specified, + the argument to the :samp:`--password` option is + interpreted as a hexadecimal-encoded key value. This only applies to + the password used to open the main input file. It does not apply to + other files opened by :samp:`--pages` or other + options or to files being written. + + Most users will never have a need for this option, and no standard + viewers support this mode of operation, but it can be useful for + forensic or investigatory purposes. For example, if a PDF file is + encrypted with an unknown password, a brute-force attack using the + key directly is sometimes more efficient than one using the password. + Also, if a file is heavily damaged, it may be possible to derive the + encryption key and recover parts of the file using it directly. To + expose the encryption key used by an encrypted file that you can open + normally, use the :samp:`--show-encryption-key` + option. + +:samp:`--suppress-password-recovery` + Ordinarily, qpdf attempts to automatically compensate for passwords + specified in the wrong character encoding. This option suppresses + that behavior. Under normal conditions, there are no reasons to use + this option. See :ref:`ref.unicode-passwords` for a + discussion. + +:samp:`--password-mode={mode}` + This option can be used to fine-tune how qpdf interprets Unicode + (non-ASCII) password strings passed on the command line. With the + exception of the :samp:`hex-bytes` mode, these only + apply to passwords provided when encrypting files. The + :samp:`hex-bytes` mode also applies to passwords + specified for reading files. For additional discussion of the + supported password modes and when you might want to use them, see + :ref:`ref.unicode-passwords`. The following modes + are supported: + + - :samp:`auto`: Automatically determine whether the + specified password is a properly encoded Unicode (UTF-8) string, + and transcode it as required by the PDF spec based on the type of + encryption being applied. 
On Windows starting with version 8.4.0, + and on almost all other modern platforms, incoming passwords will + be properly encoded in UTF-8, so this is almost always what you + want. + + - :samp:`unicode`: Tells qpdf that the incoming + password is UTF-8, overriding whatever its automatic detection + determines. The only difference between this mode and + :samp:`auto` is that qpdf will fail with an error + message if the password is not valid UTF-8 instead of falling back + to :samp:`bytes` mode with a warning. + + - :samp:`bytes`: Interpret the password as a literal + byte string. For non-Windows platforms, this is what versions of + qpdf prior to 8.4.0 did. For Windows platforms, there is no way to + specify strings of binary data on the command line directly, but + you can use the :samp:`@filename` option to do it, + in which case this option forces qpdf to respect the string of + bytes as provided. This option will allow you to encrypt PDF files + with passwords that will not be usable by other readers. + + - :samp:`hex-bytes`: Interpret the password as a + hex-encoded string. This provides a way to pass binary data as a + password on all platforms including Windows. As with + :samp:`bytes`, this option may allow creation of + files that can't be opened by other readers. This mode affects + qpdf's interpretation of passwords specified for decrypting files + as well as for encrypting them. It makes it possible to specify + strings that are encoded in some manner other than the system's + default encoding. + +:samp:`--rotate=[+|-]angle[:page-range]` + Apply rotation to specified pages. The + :samp:`page-range` portion of the option value has + the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the + rotation is applied to all pages. The :samp:`angle` + portion of the parameter may be either 0, 90, 180, or 270. 
If + preceded by :samp:`+` or :samp:`-`, + the angle is added to or subtracted from the specified pages' + original rotations. This is almost always what you want. Otherwise + the pages' rotations are set to the exact value, which may cause the + appearances of the pages to be inconsistent, especially for scans. + For example, the command :command:`qpdf in.pdf out.pdf + --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages + 2, 4, and 6 90 degrees clockwise from their original rotation and + force the rotation of pages 7 through 8 to 180 degrees regardless of + their original rotation, and the command :command:`qpdf in.pdf + out.pdf --rotate=+180` would rotate all pages by 180 + degrees. + +:samp:`--keep-files-open={[yn]}` + This option controls whether qpdf keeps individual files open while + merging. Prior to version 8.1.0, qpdf always kept all files open, but + this meant that the number of files that could be merged was limited + by the operating system's open file limit. Version 8.1.0 opened files + as they were referenced and closed them after each read, but this + caused a major performance impact. Version 8.2.0 optimized the + performance but did so in a way that, for local file systems, there + was a small but unavoidable performance hit, but for networked file + systems, the performance impact could be very high. Starting with + version 8.2.1, the default behavior is that files are kept open if no + more than 200 files are specified, but this default behavior can be + explicitly overridden with the + :samp:`--keep-files-open` flag. If you are merging + more than 200 files but less than the operating system's max open + files limit, you may want to use + :samp:`--keep-files-open=y`, especially if working + over a networked file system. 
If you are using a local file system + where the overhead is low and you might sometimes merge more than the + OS limit's number of files from a script and are not worried about a + few seconds additional processing time, you may want to specify + :samp:`--keep-files-open=n`. The threshold for + switching may be changed from the default 200 with the + :samp:`--keep-files-open-threshold` option. + +:samp:`--keep-files-open-threshold={count}` + If specified, overrides the default value of 200 used as the + threshold for qpdf deciding whether or not to keep files open. See + :samp:`--keep-files-open` for details. + +:samp:`--pages options --` + Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do + page selection (splitting and merging). + +:samp:`--collate={n}` + When specified, collate rather than concatenate pages from files + specified with :samp:`--pages`. With a numeric + argument, collate in groups of :samp:`{n}`. + The default is 1. See :ref:`ref.page-selection` for additional details. + +:samp:`--flatten-rotation` + For each page that is rotated using the ``/Rotate`` key in the page's + dictionary, remove the ``/Rotate`` key and implement the identical + rotation semantics by modifying the page's contents. This option can + be useful to prepare files for buggy PDF applications that don't + properly handle rotated pages. + +:samp:`--split-pages=[n]` + Write each group of :samp:`n` pages to a separate + output file. If :samp:`n` is not specified, create + single pages. Output file names are generated as follows: + + - If the string ``%d`` appears in the output file name, it is + replaced with a range of zero-padded page numbers starting from 1. + + - Otherwise, if the output file name ends in + :file:`.pdf` (case insensitive), a zero-padded + page range, preceded by a dash, is inserted before the file + extension. + + - Otherwise, the file name is appended with a zero-padded page range + preceded by a dash. 
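The three naming rules above can be sketched in a few lines of Python. This is only an illustrative model, not qpdf's implementation: the function name ``split_output_names`` is hypothetical, and both the padding width (the number of digits in the total page count) and the dash-separated range format are assumptions inferred from the examples given for this option.

```python
def split_output_names(pattern, total_pages, group_size=1):
    """Model the --split-pages naming rules for a document of total_pages pages.

    Assumption: page numbers are zero-padded to the number of digits in
    total_pages, as the 12-page examples in this section suggest.
    """
    width = len(str(total_pages))
    names = []
    for first in range(1, total_pages + 1, group_size):
        last = min(first + group_size - 1, total_pages)
        # A single-page group is one number; otherwise two numbers and a dash.
        if first == last:
            page_range = f"{first:0{width}d}"
        else:
            page_range = f"{first:0{width}d}-{last:0{width}d}"
        if "%d" in pattern:
            # Rule 1: %d is replaced by the page range.
            name = pattern.replace("%d", page_range)
        elif pattern.lower().endswith(".pdf"):
            # Rule 2: insert "-range" before a case-insensitive .pdf extension.
            name = f"{pattern[:-4]}-{page_range}{pattern[-4:]}"
        else:
            # Rule 3: append "-range" to the file name.
            name = f"{pattern}-{page_range}"
        names.append(name)
    return names


if __name__ == "__main__":
    for name in split_output_names("outfile.pdf", 12, group_size=2):
        print(name)
```

Keeping the three rules in a single ``if``/``elif``/``else`` chain mirrors the "otherwise" wording of the list, so exactly one rule applies to any output name.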
+ + Page ranges are a single number in the case of single-page groups or + two numbers separated by a dash otherwise. For example, if + :file:`infile.pdf` has 12 pages: + + - :command:`qpdf --split-pages infile.pdf %d-out` + would generate files :file:`01-out` through + :file:`12-out` + + - :command:`qpdf --split-pages=2 infile.pdf + outfile.pdf` would generate files + :file:`outfile-01-02.pdf` through + :file:`outfile-11-12.pdf` + + - :command:`qpdf --split-pages infile.pdf + something.else` would generate files + :file:`something.else-01` through + :file:`something.else-12` + + Note that outlines, threads, and other global features of the + original PDF file are not preserved. For each page of output, this + option creates an empty PDF and copies a single page from the output + into it. If you require the global data, you will have to run + :command:`qpdf` with the + :samp:`--pages` option once for each file. Using + :samp:`--split-pages` is much faster if you don't + require the global data. + +:samp:`--overlay options --` + Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on + overlay/underlay. + +:samp:`--underlay options --` + Underlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on + overlay/underlay. + +Password-protected files may be opened by specifying a password. By +default, qpdf will preserve any encryption data associated with a file. +If :samp:`--decrypt` is specified, qpdf will attempt to +remove any encryption information. If :samp:`--encrypt` +is specified, qpdf will replace the document's encryption parameters +with whatever is specified. + +Note that qpdf does not obey encryption restrictions already imposed on +the file. Doing so would be meaningless since qpdf can be used to remove +encryption from the file entirely. 
This functionality is not intended to +be used for bypassing copyright restrictions or other restrictions +placed on files by their producers. + +Prior to 8.4.0, in the case of passwords that contain characters that +fall outside of 7-bit US-ASCII, qpdf left the burden of supplying +properly encoded encryption and decryption passwords to the user. +Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For +an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual +described workarounds using the :command:`iconv` command. +Such workarounds are no longer required or recommended with qpdf 8.4.0. +However, for backward compatibility, qpdf attempts to detect those +workarounds and do the right thing in most cases. + +.. _ref.encryption-options: + +Encryption Options +------------------ + +To change the encryption parameters of a file, use the --encrypt flag. +The syntax is + +:: + + --encrypt user-password owner-password key-length [ restrictions ] -- + +Note that ":samp:`--`" terminates parsing of encryption +flags and must be present even if no restrictions are present. + +Either or both of the user password and the owner password may be empty +strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation +of PDF files with a non-empty user password, an empty owner password, +and a 256-bit key since such files can be opened with no password. If +you want to create such files, specify the encryption option +:samp:`--allow-insecure`, as described below. + +The value for +:samp:`{key-length}` may +be 40, 128, or 256. The restriction flags are dependent upon key length. +When no additional restrictions are given, the default is to be fully +permissive. + +If :samp:`{key-length}` +is 40, the following restriction options are available: + +:samp:`--print=[yn]` + Determines whether or not to allow printing. + +:samp:`--modify=[yn]` + Determines whether or not to allow document modification. 
+ +:samp:`--extract=[yn]` + Determines whether or not to allow text/image extraction. + +:samp:`--annotate=[yn]` + Determines whether or not to allow comments and form fill-in and + signing. + +If :samp:`{key-length}` +is 128, the following restriction options are available: + +:samp:`--accessibility=[yn]` + Determines whether or not to allow accessibility to the visually + impaired. The qpdf library disregards this field when AES is used or + when 256-bit encryption is used. You should really never disable + accessibility, but qpdf lets you do it in case you need to configure + a file this way for testing purposes. The PDF spec says that + conforming readers should disregard this permission and always allow + accessibility. + +:samp:`--extract=[yn]` + Determines whether or not to allow text/graphic extraction. + +:samp:`--assemble=[yn]` + Determines whether document assembly (rotation and reordering of + pages) is allowed. + +:samp:`--annotate=[yn]` + Determines whether modifying annotations is allowed. This includes + adding comments and filling in form fields. Also allows editing of + form fields if :samp:`--modify-other=y` is given. + +:samp:`--form=[yn]` + Determines whether filling form fields is allowed. + +:samp:`--modify-other=[yn]` + Allow all document editing except those controlled separately by the + :samp:`--assemble`, + :samp:`--annotate`, and + :samp:`--form` options. + +:samp:`--print={print-opt}` + Controls printing access. + :samp:`{print-opt}` + may be one of the following: + + - :samp:`full`: allow full printing + + - :samp:`low`: allow low-resolution printing only + + - :samp:`none`: disallow printing + +:samp:`--modify={modify-opt}` + Controls modify access. This way of controlling modify access has + less granularity than new options added in qpdf 8.4. 
+ :samp:`{modify-opt}` + may be one of the following: + + - :samp:`all`: allow full document modification + + - :samp:`annotate`: allow comment authoring, form + operations, and document assembly + + - :samp:`form`: allow form field fill-in and signing + and document assembly + + - :samp:`assembly`: allow document assembly only + + - :samp:`none`: allow no modifications + + Using the :samp:`--modify` option does not allow you + to create certain combinations of permissions such as allowing form + filling but not allowing document assembly. Starting with qpdf 8.4, + you can either just use the other options to control fields + individually, or you can use something like :samp:`--modify=form + --assemble=n` to fine-tune. + +:samp:`--cleartext-metadata` + If specified, any metadata stream in the document will be left + unencrypted even if the rest of the document is encrypted. This also + forces the PDF version to be at least 1.5. + +:samp:`--use-aes=[yn]` + If :samp:`--use-aes=y` is specified, AES encryption + will be used instead of RC4 encryption. This forces the PDF version + to be at least 1.6. + +:samp:`--allow-insecure` + From qpdf 10.2, qpdf defaults to not allowing creation of PDF files + where the user password is non-empty, the owner password is empty, + and a 256-bit key is in use. Files created in this way are insecure + since they can be opened without a password. Users would ordinarily + never want to create such files. If you are using qpdf to + intentionally create strange files for testing (a definite valid use + of qpdf!), this option allows you to create such insecure files. + +:samp:`--force-V4` + Use of this option forces the ``/V`` and ``/R`` parameters in the + document's encryption dictionary to be set to the value ``4``. As + qpdf will automatically do this when required, there is no reason to + ever use this option. It exists primarily for use in testing qpdf + itself. This option also forces the PDF version to be at least 1.5. 
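The relationship between the coarse ``--modify`` levels and the finer-grained 128-bit options described above can be modelled as sets of permissions. The sketch below is one reading of the option descriptions, not qpdf's code: the mapping table and the helper name ``effective_permissions`` are assumptions made for illustration, and only the four modification-related options (``--assemble``, ``--annotate``, ``--form``, ``--modify-other``) are modelled.

```python
# Assumed mapping from each --modify level to the fine-grained options it
# implies, based on the descriptions of the levels in this section.
MODIFY_LEVELS = {
    "all": {"assemble", "annotate", "form", "modify-other"},
    "annotate": {"assemble", "annotate", "form"},
    "form": {"assemble", "form"},
    "assembly": {"assemble"},
    "none": set(),
}

FINE_GRAINED = {"assemble", "annotate", "form", "modify-other"}


def effective_permissions(modify="all", **overrides):
    """Return the set of allowed modification types.

    overrides uses keyword names such as assemble="n" or modify_other="y",
    standing in for --assemble=n or --modify-other=y on the command line.
    """
    allowed = set(MODIFY_LEVELS[modify])
    for name, value in overrides.items():
        option = name.replace("_", "-")
        if option not in FINE_GRAINED:
            raise ValueError(f"unknown option: --{option}")
        if value == "y":
            allowed.add(option)
        elif value == "n":
            allowed.discard(option)
        else:
            raise ValueError(f"--{option} expects y or n, got {value!r}")
    return allowed


if __name__ == "__main__":
    # Models --modify=form --assemble=n: form filling without assembly.
    print(sorted(effective_permissions("form", assemble="n")))
```

Under this model, no single ``--modify`` level yields form filling without document assembly, which is why the text suggests combining ``--modify=form`` with ``--assemble=n``.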
+ +If :samp:`{key-length}` +is 256, the minimum PDF version is 1.7 with extension level 8, and the +AES-based encryption format used is the PDF 2.0 encryption method +supported by Acrobat X. The same options are available as with 128 bits +with the following exceptions: + +:samp:`--use-aes` + This option is not available with 256-bit keys. AES is always used + with 256-bit encryption keys. + +:samp:`--force-V4` + This option is not available with 256-bit keys. + +:samp:`--force-R5` + If specified, qpdf sets the minimum version to 1.7 at extension level + 3 and writes the deprecated encryption format used by Acrobat version + IX. This option should not be used in practice to generate PDF files + that will be in general use, but it can be useful to generate files + if you are trying to test proper support in another application for + PDF files encrypted in this way. + +The default for each permission option is to be fully permissive. + +.. _ref.page-selection: + +Page Selection Options +---------------------- + +Starting with qpdf 3.0, it is possible to split and merge PDF files by +selecting pages from one or more input files. Whatever file is given as +the primary input file is used as the starting point, but its pages are +replaced with pages as specified. + +:: + + --pages input-file [ --password=password ] [ page-range ] [ ... ] -- + +Multiple input files may be specified. Each one is given as the name of +the input file, an optional password (if required to open the file), and +the range of pages. Note that ":samp:`--`" terminates +parsing of page selection flags. + +Starting with qpdf 8.4, the special input file name +":file:`.`" can be used as a shortcut for the +primary input filename. + +For each file that pages should be taken from, specify the file, a +password needed to open the file (if any), and a page range. The +password needs to be given only once per file. 
If any of the input files +are the same as the primary input file or the file used to copy +encryption parameters (if specified), you do not need to repeat the +password here. The same file can be repeated multiple times. If a file +that is repeated has a password, the password only has to be given the +first time. All non-page data (info, outlines, page numbers, etc.) are +taken from the primary input file. To discard these, use +:samp:`--empty` as the primary input. + +Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf +sees a value in the place where it expects a page range and that value +is not a valid range but is a valid file name, qpdf will implicitly use +the range ``1-z``, meaning that it will include all pages in the file. +This makes it possible to easily combine all pages in a set of files +with a command like :command:`qpdf --empty out.pdf --pages \*.pdf +--`. + +The page range is a set of numbers separated by commas, ranges of +numbers separated by dashes, or combinations of those. The character "z" +represents the last page. A number preceded by an "r" indicates to count +from the end, so ``r3-r1`` would be the last three pages of the +document. Pages can appear in any order. Ranges can appear with a high +number followed by a low number, which causes the pages to appear in +reverse. Numbers may be repeated in a page range. A page range may be +optionally appended with ``:even`` or ``:odd`` to indicate only the even +or odd pages in the given range. Note that even and odd refer to the +positions within the specified range, not whether the original number +is even or odd. + +Example page ranges: + +- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in + that order. 
+ +- ``z-1``: all pages in the document in reverse + +- ``r3-r1``: the last three pages of the document + +- ``r1-r3``: the last three pages of the document in reverse order + +- ``1-20:even``: even pages from 2 to 20 + +- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd + positions from among the original range, which represents pages 5, 7, + 8, 9, and 12. + +Starting in qpdf version 8.3, you can specify the +:samp:`--collate` option. Note that this option is +specified outside of :samp:`--pages ... --`. When +:samp:`--collate` is specified, it changes the meaning +of :samp:`--pages` so that the specified files, as +modified by page ranges, are collated rather than concatenated. For +example, if you add the files :file:`odd.pdf` and +:file:`even.pdf` containing odd and even pages of a +document respectively, you could run :command:`qpdf --collate odd.pdf +--pages odd.pdf even.pdf -- all.pdf` to collate the pages. +This would pick page 1 from odd, page 1 from even, page 2 from odd, page +2 from even, etc. until all pages have been included. Any number of +files and page ranges can be specified. If any file has fewer pages, +that file is just skipped when its pages have all been included. For +example, if you ran :command:`qpdf --collate --empty --pages a.pdf +1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the +following pages in this order: + +- a.pdf page 1 + +- b.pdf page 6 + +- c.pdf last page + +- a.pdf page 2 + +- b.pdf page 5 + +- a.pdf page 3 + +- b.pdf page 4 + +- a.pdf page 4 + +- a.pdf page 5 + +Starting in qpdf version 10.2, you may specify a numeric argument to +:samp:`--collate`. With +:samp:`--collate={n}`, +pull groups of :samp:`{n}` pages from each file, +again, stopping when there are no more pages. 
For example, if you ran +:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf +r1 -- out.pdf`, you would get the following pages in this +order: + +- a.pdf page 1 + +- a.pdf page 2 + +- b.pdf page 6 + +- b.pdf page 5 + +- c.pdf last page + +- a.pdf page 3 + +- a.pdf page 4 + +- b.pdf page 4 + +- a.pdf page 5 + +Starting in qpdf version 8.3, when you split and merge files, any page +labels (page numbers) are preserved in the final file. It is expected +that more document features will be preserved by splitting and merging. +In the meantime, semantics of splitting and merging vary across +features. For example, the document's outlines (bookmarks) point to +actual page objects, so if you select some pages and not others, +bookmarks that point to pages that are in the output file will work, and +remaining bookmarks will not work. A future version of +:command:`qpdf` may do a better job at handling these +issues. (Note that the qpdf library already contains all of the APIs +required in order to implement this in your own application if you need +it.) In the meantime, you can always use +:samp:`--empty` as the primary input file to avoid +copying all of that from the first file. For example, to take pages 1 +through 5 from :file:`infile.pdf` while preserving +all metadata associated with that file, you could use + +:: + + qpdf infile.pdf --pages . 1-5 -- outfile.pdf + +If you wanted pages 1 through 5 from +:file:`infile.pdf` but you wanted the rest of the +metadata to be dropped, you could instead run + +:: + + qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf + +If you wanted to take pages 1 through 5 from +:file:`file1.pdf` and pages 11 through 15 from +:file:`file2.pdf` in reverse, taking document-level +metadata from :file:`file2.pdf`, you would run + +:: + + qpdf file2.pdf --pages file1.pdf 1-5 . 
15-11 -- outfile.pdf + +If, for some reason, you wanted to take the first page of an encrypted +file called :file:`encrypted.pdf` with password +``pass`` and repeat it twice in an output file, and if you wanted to +drop document-level metadata but preserve encryption, you would use + +:: + + qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass + --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- + outfile.pdf + +Note that we had to specify the password all three times because giving +a password as :samp:`--encryption-file-password` doesn't +count for page selection, and as far as qpdf is concerned, +:file:`encrypted.pdf` and +:file:`./encrypted.pdf` are separate files. These +are all corner cases that most users should hopefully never have to be +bothered with. + +Prior to version 8.4, it was not possible to specify the same page from +the same file directly more than once, and the workaround of specifying +the same file in more than one way was required. Version 8.4 removes +this limitation, but there is still a valid use case for the workaround. When you specify +the same page from the same file more than once, qpdf will share objects +between the pages. If you are going to do further manipulation on the +file and need the two instances of the same original page to be deep +copies, then you can specify the file in two different ways. For example +:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf` +would create a file with two copies of the first page of the input, and +the two copies would not share any objects in common. This includes fonts, +images, and anything else the page references. + +.. _ref.overlay-underlay: + +Overlay and Underlay Options +---------------------------- + +Starting with qpdf 8.4, it is possible to overlay or underlay pages from +other files onto the output generated by qpdf. 
Specify overlay or +underlay as follows: + +:: + + { --overlay | --underlay } file [ options ] -- + +Overlay and underlay options are processed late, so they can be combined +with other operations such as merging and will apply to the final output. The +:samp:`--overlay` and :samp:`--underlay` +options work the same way, except underlay pages are drawn underneath +the page to which they are applied, possibly obscured by the original +page, and overlay pages are drawn on top of the page to which they are +applied, possibly obscuring the page. You can combine overlay and +underlay. + +The default behavior of overlay and underlay is that pages are taken +from the overlay/underlay file in sequence and applied to corresponding +pages in the output until there are no more output pages. If the overlay +or underlay file runs out of pages, remaining output pages are left +alone. This behavior can be modified by options, which are provided +between the :samp:`--overlay` or +:samp:`--underlay` flag and the +:samp:`--` option. The following options are supported: + +- :samp:`--password=password`: supply a password if the + overlay/underlay file is encrypted. + +- :samp:`--to=page-range`: a range of pages, in the same + form as described in :ref:`ref.page-selection`, that + indicates which pages in the output should have the overlay/underlay + applied. If not specified, overlay/underlay are applied to all pages. + +- :samp:`--from=[page-range]`: a range of pages that + specifies which pages in the overlay/underlay file will be used for + overlay or underlay. If not specified, all pages will be used. This + can be explicitly specified to be empty if + :samp:`--repeat` is used. + +- :samp:`--repeat=page-range`: an optional range of + pages that specifies which pages in the overlay/underlay file will be + repeated after the "from" pages are used up. If you want to repeat a + range of pages starting at the beginning, you can explicitly use + :samp:`--from=`. + +Here are some examples. 
+ +- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4 + --`: overlay the first three pages from file + :file:`o.pdf` onto the first three pages of the + output, then overlay page 4 from :file:`o.pdf` + onto pages 4 and 5 of the output. Leave remaining output pages + untouched. + +- :command:`--underlay footer.pdf --from= --repeat=1,2 + --`: underlay page 1 of + :file:`footer.pdf` on all odd output pages, and + underlay page 2 of :file:`footer.pdf` on all even + output pages. + +.. _ref.attachments: + +Embedded Files/Attachments Options +---------------------------------- + +Starting with qpdf 10.2, you can work with file attachments in PDF files +from the command line. The following options are available: + +:samp:`--list-attachments` + Show the "key" and stream number for embedded files. With + :samp:`--verbose`, additional information, including + preferred file name, description, dates, and more, is also displayed. + The key is usually but not always equal to the file name, and is + needed by some of the other options. + +:samp:`--show-attachment={key}` + Write the contents of the specified attachment to standard output as + binary data. The key should match one of the keys shown by + :samp:`--list-attachments`. If specified multiple + times, only the last attachment will be shown. + +:samp:`--add-attachment {file} {options} --` + Add or replace an attachment with the contents of + :samp:`{file}`. This may be specified more + than once. The following additional options may appear before the + ``--`` that ends this option: + + :samp:`--key={key}` + The key to use to register the attachment in the embedded files + table. Defaults to the last path element of + :samp:`{file}`. + + :samp:`--filename={name}` + The file name to be used for the attachment. This is what is + usually displayed to the user and is the name most graphical PDF + viewers will use when saving a file. It defaults to the last path + element of :samp:`{file}`. 
+ + :samp:`--creationdate={date}` + The attachment's creation date in PDF format; defaults to the + current time. The date format is explained below. + + :samp:`--moddate={date}` + The attachment's modification date in PDF format; defaults to the + current time. The date format is explained below. + + :samp:`--mimetype={type/subtype}` + The mime type for the attachment, e.g. ``text/plain`` or + ``application/pdf``. Note that the mimetype appears in a field + called ``/Subtype`` in the PDF but actually includes the full type + and subtype of the mime type. + + :samp:`--description={"text"}` + Descriptive text for the attachment, displayed by some PDF + viewers. + + :samp:`--replace` + Indicates that any existing attachment with the same key should be + replaced by the new attachment. Otherwise, + :command:`qpdf` gives an error if an attachment + with that key is already present. + +:samp:`--remove-attachment={key}` + Remove the specified attachment. This doesn't only remove the + attachment from the embedded files table but also clears out the file + specification. That means that any potential internal links to the + attachment will be broken. This option may be specified multiple + times. Run with :samp:`--verbose` to see status of + the removal. + +:samp:`--copy-attachments-from {file} {options} --` + Copy attachments from another file. This may be specified more than + once. The following additional options may appear before the ``--`` + that ends this option: + + :samp:`--password={password}` + If required, the password needed to open + :samp:`{file}` + + :samp:`--prefix={prefix}` + Only required if the file from which attachments are being copied + has attachments with keys that conflict with attachments already + in the file. In this case, the specified prefix will be prepended + to each key. This affects only the key in the embedded files + table, not the file name. The PDF specification doesn't preclude + multiple attachments having the same file name. 
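The attachment options lend themselves to scripting. The following is a minimal Python sketch, not part of qpdf itself, that formats a timezone-aware timestamp in the PDF date form ``D:yyyymmddhhmmss`` followed by ``Z`` or a ``-hh'mm'``/``+hh'mm'`` offset, and assembles an :samp:`--add-attachment` argument group. The helper names ``pdf_date`` and ``add_attachment_args`` are hypothetical; only the option names come from the list above.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment):
    """Format a timezone-aware datetime as a PDF date string."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # UTC is written as a literal Z
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


def add_attachment_args(path, *, key=None, filename=None, mimetype=None,
                        creationdate=None, moddate=None, description=None,
                        replace=False):
    """Build the argument group for one --add-attachment ... -- option.

    Hypothetical helper: it only assembles strings and does not run qpdf.
    """
    args = ["--add-attachment", path]
    if key is not None:
        args.append(f"--key={key}")
    if filename is not None:
        args.append(f"--filename={filename}")
    if mimetype is not None:
        args.append(f"--mimetype={mimetype}")
    if creationdate is not None:
        args.append(f"--creationdate={pdf_date(creationdate)}")
    if moddate is not None:
        args.append(f"--moddate={pdf_date(moddate)}")
    if description is not None:
        args.append(f"--description={description}")
    if replace:
        args.append("--replace")
    args.append("--")  # terminates the option group
    return args
```

The resulting list is meant to be spliced into a full qpdf command line together with the input and output file names, for example when calling ``subprocess.run`` with a list of arguments so that no shell quoting is needed.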
+ +When a date is required, the date should conform to the PDF date format +specification, which is +``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where +:samp:`{<z>}` is either ``Z`` for UTC or a +timezone offset in the form :samp:`{-hh'mm'}` or +:samp:`{+hh'mm'}`. Examples: +``D:20210207161528-05'00'``, ``D:20210207211528Z``. + +.. _ref.advanced-parsing: + +Advanced Parsing Options +------------------------ + +These options control aspects of how qpdf reads PDF files. Mostly these +are of use to people who are working with damaged files. There is little +reason to use these options unless you are trying to solve specific +problems. The following options are available: + +:samp:`--suppress-recovery` + Prevents qpdf from attempting to recover damaged files. + +:samp:`--ignore-xref-streams` + Tells qpdf to ignore any cross-reference streams. + +Ordinarily, qpdf will attempt to recover from certain types of errors in +PDF files. These include errors in the cross-reference table, certain +types of object numbering errors, and certain types of stream length +errors. Sometimes, qpdf may think it has recovered but may not have +actually recovered, so care should be taken when using this option as +some data loss is possible. The +:samp:`--suppress-recovery` option will prevent qpdf +from attempting recovery. In this case, it will fail on the first error +that it encounters. + +Ordinarily, qpdf reads cross-reference streams when they are present in +a PDF file. If :samp:`--ignore-xref-streams` is +specified, qpdf will ignore any cross-reference streams for hybrid PDF +files. The purpose of hybrid files is to make some content available to +viewers that are not aware of cross-reference streams. It is almost +never desirable to ignore them. The only time when you might want to use +this feature is if you are testing creation of hybrid PDF files and wish +to see how a PDF consumer that doesn't understand object and +cross-reference streams would interpret such a file. + +.. 
_ref.advanced-transformation: + +Advanced Transformation Options +------------------------------- + +These transformation options control fine points of how qpdf creates the +output file. Mostly these are of use only to people who are very +familiar with the PDF file format or who are PDF developers. The +following options are available: + +:samp:`--compress-streams={[yn]}` + By default, or with :samp:`--compress-streams=y`, + qpdf will compress any stream with no other filters applied to it + with the ``/FlateDecode`` filter when it writes it. To suppress this + behavior and preserve uncompressed streams as uncompressed, use + :samp:`--compress-streams=n`. + +:samp:`--decode-level={option}` + Controls which streams qpdf tries to decode. The default is + :samp:`generalized`. The following options are + available: + + - :samp:`none`: do not attempt to decode any streams + + - :samp:`generalized`: decode streams filtered with + supported generalized filters: ``/LZWDecode``, ``/FlateDecode``, + ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized + filters as those to be used for general-purpose compression or + encoding, as opposed to filters specifically designed for image + data. Note that, by default, streams already compressed with + ``/FlateDecode`` are not uncompressed and recompressed unless you + also specify :samp:`--recompress-flate`. + + - :samp:`specialized`: in addition to generalized, + decode streams with supported non-lossy specialized filters; + currently this is just ``/RunLengthDecode`` + + - :samp:`all`: in addition to generalized and + specialized, decode streams with supported lossy filters; + currently this is just ``/DCTDecode`` (JPEG) + +:samp:`--stream-data={option}` + Controls transformation of stream data. This option predates the + :samp:`--compress-streams` and + :samp:`--decode-level` options. Those options can be + used to achieve the same effect with more control. 
The value of + :samp:`{option}` may + be one of the following: + + - :samp:`compress`: recompress stream data when + possible (default); equivalent to + :samp:`--compress-streams=y` + :samp:`--decode-level=generalized`. Does not + recompress streams already compressed with ``/FlateDecode`` unless + :samp:`--recompress-flate` is also specified. + + - :samp:`preserve`: leave all stream data as is; + equivalent to :samp:`--compress-streams=n` + :samp:`--decode-level=none` + + - :samp:`uncompress`: uncompress stream data + compressed with generalized filters when possible; equivalent to + :samp:`--compress-streams=n` + :samp:`--decode-level=generalized` + +:samp:`--recompress-flate` + By default, streams already compressed with ``/FlateDecode`` are left + alone rather than being uncompressed and recompressed. This option + causes qpdf to uncompress and recompress the streams. There is a + significant performance cost to using this option, but you probably + want to use it if you specify + :samp:`--compression-level`. + +:samp:`--compression-level={level}` + When writing new streams that are compressed with ``/FlateDecode``, + use the specified compression level. The value of + :samp:`level` should be a number from 1 to 9 and is + passed directly to zlib, which implements deflate compression. Note + that qpdf doesn't uncompress and recompress streams by default. To + have this option apply to already compressed streams, you should also + specify :samp:`--recompress-flate`. If your goal is + to shrink the size of PDF files, you should also use + :samp:`--object-streams=generate`. + +:samp:`--normalize-content=[yn]` + Enables or disables normalization of content streams. Content + normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode. + +:samp:`--object-streams={mode}` + Controls handling of object streams. 
The value of + :samp:`{mode}` may be + one of the following: + + - :samp:`preserve`: preserve original object streams + (default) + + - :samp:`disable`: don't write any object streams + + - :samp:`generate`: use object streams wherever + possible + +:samp:`--preserve-unreferenced` + Tells qpdf to preserve objects that are not referenced when writing + the file. Ordinarily any object that is not referenced in a traversal + of the document from the trailer dictionary will be discarded. This + may be useful in working with some damaged files or inspecting files + with known unreferenced objects. + + This flag is ignored for linearized files and has the effect of + causing objects in the new file to be written in order by object ID + from the original file. This does not mean that object numbers will + be the same since qpdf may create stream lengths as direct or + indirect differently from the original file, and the original file + may have gaps in its numbering. + + See also :samp:`--preserve-unreferenced-resources`, + which does something completely different. + +:samp:`--remove-unreferenced-resources={option}` + The :samp:`{option}` may be ``auto``, + ``yes``, or ``no``. The default is ``auto``. + + Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt + to remove images and fonts that are not used by a page even if they + are referenced in the page's resources dictionary. When shared + resources are in use, this behavior can greatly reduce the file sizes + of split pages, but the analysis is very slow. In versions from 8.1 + through 9.1.1, qpdf did this analysis by default. Starting in qpdf + 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file + to determine whether the file is likely to have unreferenced objects + on pages, a pattern that frequently occurs when resource dictionaries + are shared across multiple pages and rarely occurs otherwise. 
If it + discovers this pattern, then it will attempt to remove unreferenced + resources. Usually this means you get the slower splitting speed only + when it's actually going to create smaller files. You can suppress + removal of unreferenced resources altogether by specifying ``no`` or + force it to do the full algorithm by specifying ``yes``. + + Other than cases in which you don't care about file size and care a + lot about runtime, there are few reasons to use this option, + especially now that ``auto`` mode is supported. One reason to use + this is if you suspect that qpdf is removing resources it shouldn't + be removing. If you encounter that case, please report it as a bug at + https://github.com/qpdf/qpdf/issues/. + +:samp:`--preserve-unreferenced-resources` + This is a synonym for + :samp:`--remove-unreferenced-resources=no`. + + See also :samp:`--preserve-unreferenced`, which does + something completely different. + +:samp:`--newline-before-endstream` + Tells qpdf to insert a newline before the ``endstream`` keyword, not + counted in the length, after any stream content even if the last + character of the stream was a newline. This may result in two + newlines in some cases. This is a requirement of PDF/A. While qpdf + doesn't specifically know how to generate PDF/A-compliant PDFs, this + at least prevents it from breaking compliance of already compliant + files. + +:samp:`--linearize-pass1={file}` + Write the first pass of linearization to the named file. The + resulting file is not a valid PDF file. This option is useful only + for debugging ``QPDFWriter``'s linearization code. When qpdf + linearizes files, it writes the file in two passes, using the first + pass to calculate sizes and offsets that are required for hint tables + and the linearization dictionary. Ordinarily, the first pass is + discarded. This option enables it to be captured. 
+ +:samp:`--coalesce-contents` + When a page's contents are split across multiple streams, this option + causes qpdf to combine them into a single stream. Use of this option + is never necessary for ordinary usage, but it can help when working + with some files in some cases. For example, this can also be combined + with QDF mode or content normalization to make it easier to look at + all of a page's contents at once. + +:samp:`--flatten-annotations={option}` + This option collapses annotations into the pages' contents with + special handling for form fields. Ordinarily, an annotation is + rendered separately and on top of the page. Combining annotations + into the page's contents effectively freezes the placement of the + annotations, making them look right after various page + transformations. The library functionality backing this option was + added for the benefit of programs that want to create *n-up* page + layouts and other similar things that don't work well with + annotations. The :samp:`{option}` parameter + may be any of the following: + + - :samp:`all`: include all annotations that are not + marked invisible or hidden + + - :samp:`print`: only include annotations that + indicate that they should appear when the page is printed + + - :samp:`screen`: omit annotations that indicate + they should not appear on the screen + + Note that form fields are special because the annotations that are + used to render filled-in form fields may become out of date from the + fields' values if the form is filled in by a program that doesn't + know how to update the appearances. If qpdf detects this case, its + default behavior is not to flatten those annotations because doing so + would cause the value of the form field to be lost. This gives you a + chance to go back and resave the form with a program that knows how + to generate appearances. QPDF itself can generate appearances with + some limitations. See the + :samp:`--generate-appearances` option below. 
+ +:samp:`--generate-appearances` + If a file contains interactive form fields and indicates that the + appearances are out of date with the values of the form, this flag + will regenerate appearances, subject to a few limitations. Note that + there is not usually a reason to do this, but it can be necessary + before using the :samp:`--flatten-annotations` + option. Most of the limitations are not a problem with well-behaved PDF files. + They are as follows: + + - Radio button and checkbox appearances use the pre-set values in + the PDF file. QPDF just makes sure that the correct appearance is + displayed based on the value of the field. This is fine for PDF + files that create their forms properly. Some PDF writers save + appearances for fields when they change, which could cause some + controls to have inconsistent appearances. + + - For text fields and list boxes, any characters that fall outside + of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman" + encoding, will be replaced by the ``?`` character. + + - Quadding is ignored. Quadding is used to specify whether the + contents of a field should be left, center, or right aligned with + the field. + + - Rich text, multi-line, and other more elaborate formatting + directives are ignored. + + - There is no support for multi-select fields or signature fields. + + If qpdf doesn't do a good enough job with your form, use an external + application to save your filled-in form before processing it with + qpdf. + +:samp:`--optimize-images` + This flag causes qpdf to recompress all images that are not + compressed with DCT (JPEG) using DCT compression as long as doing so + decreases the size in bytes of the image data and the image does not + fall below minimum specified dimensions. Useful information is + provided when used in combination with + :samp:`--verbose`. See also the + :samp:`--oi-min-width`, + :samp:`--oi-min-height`, and + :samp:`--oi-min-area` options. 
By default, starting + in qpdf 8.4, inline images are converted to regular images and + optimized as well. Use :samp:`--keep-inline-images` + to prevent inline images from being included. + +:samp:`--oi-min-width={width}` + Avoid optimizing images whose width is below the specified amount. If + omitted, the default is 128 pixels. Use 0 for no minimum. + +:samp:`--oi-min-height={height}` + Avoid optimizing images whose height is below the specified amount. + If omitted, the default is 128 pixels. Use 0 for no minimum. + +:samp:`--oi-min-area={area-in-pixels}` + Avoid optimizing images whose pixel count (width × height) is below + the specified amount. If omitted, the default is 16,384 pixels. Use 0 + for no minimum. + +:samp:`--externalize-inline-images` + Convert inline images to regular images. By default, images whose + data is at least 1,024 bytes are converted when this option is + selected. Use :samp:`--ii-min-bytes` to change the + size threshold. This option is implicitly selected when + :samp:`--optimize-images` is selected. Use + :samp:`--keep-inline-images` to exclude inline images + from image optimization. + +:samp:`--ii-min-bytes={bytes}` + Avoid converting inline images whose size is below the specified + minimum size to regular images. If omitted, the default is 1,024 + bytes. Use 0 for no minimum. + +:samp:`--keep-inline-images` + Prevent inline images from being included in image optimization. This + option has no effect when :samp:`--optimize-images` + is not specified. + +:samp:`--remove-page-labels` + Remove page labels from the output file. + +:samp:`--qdf` + Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize` + disables QDF mode. + +:samp:`--min-version={version}` + Forces the PDF version of the output file to be at least + :samp:`{version}`. In other words, if the + input file has a lower version than the specified version, the + specified version will be used. 
If the input file has a higher + version, the input file's original version will be used. It is seldom + necessary to use this option since qpdf will automatically increase + the version as needed when adding features that require newer PDF + readers. + + The version number may be expressed in the form + :samp:`{major.minor.extension-level}`, in + which case the version is interpreted as + :samp:`{major.minor}` at extension level + :samp:`{extension-level}`. For example, + version ``1.7.8`` represents version 1.7 at extension level 8. Note + that minimal syntax checking is done on the command line. + +:samp:`--force-version={version}` + This option forces the PDF version to be the exact version specified + *even when the file may have content that is not supported in that + version*. The version number is interpreted in the same way as with + :samp:`--min-version` so that extension levels can be + set. In some cases, forcing the output file's PDF version to be lower + than that of the input file will cause qpdf to disable certain + features of the document. Specifically, 256-bit keys are disabled if + the version is less than 1.7 with extension level 8 (except R5 is + disabled if less than 1.7 with extension level 3), AES encryption is + disabled if the version is less than 1.6, cleartext metadata and + object streams are disabled if less than 1.5, 128-bit encryption keys + are disabled if less than 1.4, and all encryption is disabled if less + than 1.3. Even with these precautions, qpdf won't be able to do + things like eliminate use of newer image compression schemes, + transparency groups, or other features that may have been added in + more recent versions of PDF. + + As a general rule, with the exception of big structural things like + the use of object streams or AES encryption, PDF viewers are supposed + to ignore features in files that they don't support from newer + versions. 
This means that forcing the version to a lower version may + make it possible to open your PDF file with an older viewer, though + bear in mind that some of the original document's functionality may + be lost. + +By default, when a stream is encoded using non-lossy filters that qpdf +understands and is not already compressed using a good compression +scheme, qpdf will uncompress and recompress streams. Assuming proper +filter implementations, this is safe and generally results in smaller files. +This behavior may also be explicitly requested with +:samp:`--stream-data=compress`. + +When :samp:`--normalize-content=y` is specified, qpdf +will attempt to normalize whitespace and newlines in page content +streams. This is generally safe but could, in some cases, cause damage +to the content streams. This option is intended for people who wish to +study PDF content streams or to debug PDF content. You should not use +this for "production" PDF files. + +When normalizing content, if qpdf runs into any lexical errors, it will +print a warning indicating that content may be damaged. The only +situation in which qpdf is known to cause damage during content +normalization is when a page's contents are split across multiple +streams and streams are split in the middle of a lexical token such as a +string, name, or inline image. Note that files that do this are invalid +since the PDF specification states that content streams are not to be +split in the middle of a token. If you want to inspect the original +content streams in an uncompressed format, you can always run with +:samp:`--qdf --normalize-content=n` for a QDF file +without content normalization, or alternatively +:samp:`--stream-data=uncompress` for a regular non-QDF +mode file with uncompressed streams. These will both uncompress all the +streams but will not attempt to normalize content. 
Please note that if +you are using content normalization or QDF mode for the purpose of +manually inspecting files, you don't have to care about this. + +Object streams, also known as compressed objects, were introduced into +the PDF specification at version 1.5, corresponding to Acrobat 6. Some +older PDF viewers may not support files with object streams. qpdf can be +used to transform files with object streams to files without object +streams or vice versa. As mentioned above, there are three object stream +modes: :samp:`preserve`, +:samp:`disable`, and :samp:`generate`. + +In :samp:`preserve` mode, the relationship between objects +and the streams that contain them is preserved from the original file. +In :samp:`disable` mode, all objects are written as +regular, uncompressed objects. The resulting file should be readable by +older PDF viewers. (Of course, the content of the files may include +features not supported by older viewers, but at least the structure will +be supported.) In :samp:`generate` mode, qpdf will +create its own object streams. This will usually result in more compact +PDF files, though they may not be readable by older viewers. In this +mode, qpdf will also make sure the PDF version number in the header is +at least 1.5. + +The :samp:`--qdf` flag turns on QDF mode, which changes +some of the defaults described above. Specifically, in QDF mode, by +default, stream data is uncompressed, content streams are normalized, +and encryption is removed. These defaults can still be overridden by +specifying the appropriate options as described above. Additionally, in +QDF mode, stream lengths are stored as indirect objects, objects are +laid out in a less efficient but more readable fashion, and the +documents are interspersed with comments that make it easier for the +user to find things and also make it possible for +:command:`fix-qdf` to work properly. 
QDF mode is intended +for people, mostly developers, who wish to inspect or modify PDF files +in a text editor. For details, please see :ref:`ref.qdf`. + +.. _ref.testing-options: + +Testing, Inspection, and Debugging Options +------------------------------------------ + +These options can be useful for digging into PDF files or for use in +automated test suites for software that uses the qpdf library. When any +of the options in this section are specified, no output file should be +given. The following options are available: + +:samp:`--deterministic-id` + Causes generation of a deterministic value for /ID. This prevents use + of timestamp and output file name information in the /ID generation. + Instead, at some slight additional runtime cost, the /ID field is + generated to include a digest of the significant parts of the content + of the output PDF file. This means that a given qpdf operation should + generate the same /ID each time it is run, which can be useful when + caching results or for generation of some test data. Use of this flag + is not compatible with creation of encrypted files. + +:samp:`--static-id` + Causes generation of a fixed value for /ID. This is intended for + testing only. Never use it for production files. If you are trying to + get the same /ID each time for a given file and you are not + generating encrypted files, consider using the + :samp:`--deterministic-id` option. + +:samp:`--static-aes-iv` + Causes use of a static initialization vector for AES-CBC. This is + intended for testing only so that output files can be reproducible. + Never use it for production files. This option in particular is not + secure since it significantly weakens the encryption. + +:samp:`--no-original-object-ids` + Suppresses inclusion of original object ID comments in QDF files. + This can be useful when generating QDF files for test purposes, + particularly when comparing them to determine whether two PDF files + have identical content. 
+ +:samp:`--show-encryption` + Shows document encryption parameters. Also shows the document's user + password if the owner password is given. + +:samp:`--show-encryption-key` + When encryption information is being displayed, as when + :samp:`--check` or + :samp:`--show-encryption` is given, display the + computed or retrieved encryption key as a hexadecimal string. This + value is not ordinarily useful to users, but it can be used as the + argument to :samp:`--password` if the + :samp:`--password-is-hex-key` option is specified. Note + that, when PDF files are encrypted, passwords and other metadata are + used only to compute an encryption key, and the encryption key is + what is actually used for encryption. This enables retrieval of that + key. + +:samp:`--check-linearization` + Checks file integrity and linearization status. + +:samp:`--show-linearization` + Checks and displays all data in the linearization hint tables. + +:samp:`--show-xref` + Shows the contents of the cross-reference table in a human-readable + form. This is especially useful for files with cross-reference + streams which are stored in a binary format. + +:samp:`--show-object=trailer|obj[,gen]` + Show the contents of the given object. This is especially useful for + inspecting objects that are inside of object streams (also known as + "compressed objects"). + +:samp:`--raw-stream-data` + When used along with the :samp:`--show-object` + option, if the object is a stream, shows the raw stream data instead + of the object's contents. + +:samp:`--filtered-stream-data` + When used along with the :samp:`--show-object` + option, if the object is a stream, shows the filtered stream data + instead of the object's contents. If the stream is filtered using filters + that qpdf does not support, an error will be issued. + +:samp:`--show-npages` + Prints the number of pages in the input file on a line by itself. 
+ Since the number of pages appears by itself on a line, this option + can be useful for scripting if you need to know the number of pages + in a file. + +:samp:`--show-pages` + Shows the object and generation number for each page dictionary + object and for each content stream associated with the page. Having + this information makes it more convenient to inspect objects from a + particular page. + +:samp:`--with-images` + When used along with :samp:`--show-pages`, also shows + the object and generation numbers for the image objects on each page. + (At present, information about images in shared resource dictionaries + is not output by this command. This is discussed in a comment in the + source code.) + +:samp:`--json` + Generate a JSON representation of the file. This is described in + depth in :ref:`ref.json`. + +:samp:`--json-help` + Describe the format of the JSON output. + +:samp:`--json-key=key` + This option is repeatable. If specified, only the specified top-level + keys will be included in the JSON output. If not specified, all + keys will be shown. + +:samp:`--json-object=trailer|obj[,gen]` + This option is repeatable. If specified, only specified objects will + be shown in the "``objects``" key of the JSON output. If absent, all + objects will be shown. + +:samp:`--check` + Checks file structure as well as encryption, linearization, and + encoding of stream data. A file for which + :samp:`--check` reports no errors may still have + errors in stream data content but should otherwise be structurally + sound. If :samp:`--check` finds any errors, qpdf will exit + with a status of 2. There are some recoverable conditions that + :samp:`--check` detects. These are issued as warnings + instead of errors. If qpdf finds no errors but finds warnings, it + will exit with a status of 3 (as of version 2.0.4). When + :samp:`--check` is combined with other options, + checks are always performed before any other options are processed. 
+ For erroneous files, :samp:`--check` will cause qpdf + to attempt to recover, after which other options are effectively + operating on the recovered file. Combining + :samp:`--check` with other options in this way can be + useful for manually recovering severely damaged files. Note that + :samp:`--check` produces no output to standard output + when everything is valid, so if you are using this to + programmatically validate files in bulk, it is safe to run without + output redirected to :file:`/dev/null` and just + check for a 0 exit code. + +The :samp:`--raw-stream-data` and +:samp:`--filtered-stream-data` options are ignored +unless :samp:`--show-object` is given. Either of these +options will cause the stream data to be written to standard output. In +order to avoid commingling of stream data with other output, it is +recommended that these options not be combined with other test/inspection +options. + +If :samp:`--filtered-stream-data` is given and +:samp:`--normalize-content=y` is also given, qpdf will +attempt to normalize the stream data as if it is a page content stream. +This attempt will be made even if it is not a page content stream, in +which case it will produce unusable results. + +.. _ref.unicode-passwords: + +Unicode Passwords +----------------- + +At the library API level, all methods that perform encryption and +decryption interpret passwords as strings of bytes. It is up to the +caller to ensure that they are appropriately encoded. Starting with qpdf +version 8.4.0, qpdf will attempt to make this easier for you when you +interact with qpdf via its command line interface. The PDF specification +requires passwords used to encrypt files with 40-bit or 128-bit +encryption to be encoded with PDF Doc encoding. This encoding is a +single-byte encoding that supports ISO-Latin-1 and a handful of other +commonly used characters. It has a large overlap with Windows ANSI but +is not exactly the same. 
There is generally not a way to provide PDF Doc +encoded strings on the command line. As such, qpdf versions prior to +8.4.0 would often create PDF files that couldn't be opened with other +software when given a password with non-ASCII characters to encrypt a +file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf +recognizes the encoding of the parameter and transcodes it as needed. +The rest of this section provides the details about exactly how qpdf +behaves. Most users will not need to know this information, but it might +be useful if you have been working around qpdf's old behavior or if you +are using qpdf to generate encrypted files for testing other PDF +software. + +A note about Windows: when qpdf builds, it attempts to determine what it +has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain`` +function is an alternative entry point that receives all arguments as +UTF-16-encoded strings. When qpdf starts up this way, it converts all +the strings to UTF-8 encoding and then invokes the regular main. This +means that, as far as qpdf is concerned, it receives its command-line +arguments with UTF-8 encoding, just as it would in any modern Linux or +UNIX environment. + +If a file is being encrypted with 40-bit or 128-bit encryption and the +supplied password is not a valid UTF-8 string, qpdf will fall back to +the behavior of interpreting the password as a string of bytes. If you +have old scripts that encrypt files by passing the output of +:command:`iconv` to qpdf, you no longer need to do that, +but if you do, qpdf should still work. The only exception would be for +the extremely unlikely case of a password that is encoded with a +single-byte encoding but also happens to be valid UTF-8. Such a password +would contain strings of even numbers of characters that alternate +between accented letters and symbols. 
In the extremely unlikely event +that you are intentionally using such passwords and qpdf is thwarting +you by interpreting them as UTF-8, you can use +:samp:`--password-mode=bytes` to suppress qpdf's +automatic behavior. + +The :samp:`--password-mode` option, as described earlier +in this chapter, can be used to change qpdf's interpretation of supplied +passwords. There are very few reasons to use this option. One would be +the unlikely case described in the previous paragraph in which the +supplied password happens to be valid UTF-8 but isn't supposed to be +UTF-8. Your best bet would be just to provide the password as a valid +UTF-8 string, but you could also use +:samp:`--password-mode=bytes`. Another reason to use +:samp:`--password-mode=bytes` would be to intentionally +generate PDF files encrypted with passwords that are not properly +encoded. The qpdf test suite does this to generate invalid files for the +purpose of testing its password recovery capability. If you were trying +to create intentionally incorrect files for similar purposes, the +:samp:`bytes` password mode can enable you to do this. + +When qpdf attempts to decrypt a file with a password that contains +non-ASCII characters, it will generate a list of alternative passwords +by attempting to interpret the password as each of a handful of +different coding systems and then transcode them to the required format. +This helps to compensate for the supplied password being given in the +wrong coding system, such as would happen if you used the +:command:`iconv` workaround that was previously needed. +It also generates passwords by doing the reverse operation: translating +from the correct to an incorrect encoding of the password. This would enable +qpdf to decrypt files using passwords that were improperly encoded by +whatever software encrypted the files, including older versions of qpdf +invoked without properly encoded passwords. 
The combination of these two +recovery methods should make qpdf transparently open most encrypted +files with the password supplied correctly but in the wrong coding +system. There are no real downsides to this behavior, but if you don't +want qpdf to do this, you can use the +:samp:`--suppress-password-recovery` option. One reason +to do that is to ensure that you know the exact password that was used +to encrypt the file. + +With these changes, qpdf now generates compliant passwords in most +cases. There are still some exceptions. In particular, the PDF +specification directs compliant writers to normalize Unicode passwords +and to perform certain transformations on passwords with bidirectional +text. Implementing this functionality requires using a real Unicode +library like ICU. If a client application that uses qpdf wants to do +this, the qpdf library will accept the resulting passwords, but qpdf +will not perform these transformations itself. It is possible that this +will be addressed in a future version of qpdf. The ``QPDFWriter`` +methods that enable encryption on the output file accept passwords as +strings of bytes. + +Please note that the :samp:`--password-is-hex-key` +option is unrelated to all this. This flag bypasses the normal process +of going from password to encryption key entirely, allowing the raw +encryption key to be specified directly. This is useful for forensic +purposes or for brute-force recovery of files with unknown passwords.
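
The UTF-8 fallback rule described earlier in this section can be illustrated with a small sketch. This is not qpdf's implementation: the function name ``classify_password`` and its return values are hypothetical, chosen only to show how a byte string is either recognized as valid UTF-8 or treated as raw bytes, and why a single-byte-encoded password that alternates accented letters and symbols can be mistaken for UTF-8.

```python
def classify_password(raw: bytes) -> str:
    """Hypothetical helper, not part of qpdf.

    Returns "unicode" if the bytes form valid UTF-8, otherwise "bytes",
    mirroring the fallback rule described in the text.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes"
    return "unicode"


# "cafe" with an acute accent on the e, encoded two different ways.
latin1_password = "caf\u00e9".encode("latin-1")  # b'caf\xe9'
utf8_password = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

# An accented letter followed by a symbol (A-tilde, copyright sign) in
# ISO-Latin-1. The same two bytes are also the UTF-8 encoding of e-acute,
# which is the ambiguous case the text calls extremely unlikely.
ambiguous_password = "\u00c3\u00a9".encode("latin-1")  # b'\xc3\xa9'

if __name__ == "__main__":
    for name, raw in [
        ("latin-1", latin1_password),
        ("utf-8", utf8_password),
        ("ambiguous", ambiguous_password),
    ]:
        print(name, raw, classify_password(raw))
```

In this sketch the ISO-Latin-1 bytes are rejected as UTF-8 and would be handled as a plain byte string, while the ambiguous pair is accepted as UTF-8. That last case is the one for which the text suggests :samp:`--password-mode=bytes`.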