author: Jay Berkenbilt <ejb@ql.org> 2021-12-18 15:01:52 +0100
committer: Jay Berkenbilt <ejb@ql.org> 2021-12-18 17:05:51 +0100
commit: 10fb619d3e0618528b7ac6c20cad6262020cf947
tree: c893fedff351e809edead840376e8648f1cc28ff
parent: f3d1138b8ab64c6a26e1dd5f77a644b19016a30d
Split documentation into multiple pages, change theme
-rw-r--r--  TODO                        2
-rw-r--r--  manual/acknowledgement.rst  14
-rw-r--r--  manual/cli.rst              1675
-rw-r--r--  manual/conf.py              5
-rw-r--r--  manual/design.rst           747
-rw-r--r--  manual/index.rst            6271
-rw-r--r--  manual/installation.rst     342
-rw-r--r--  manual/json.rst             177
-rw-r--r--  manual/library.rst          91
-rw-r--r--  manual/license.rst          12
-rw-r--r--  manual/linearization.rst    197
-rw-r--r--  manual/object-streams.rst   186
-rw-r--r--  manual/overview.rst         33
-rw-r--r--  manual/qdf.rst              96
-rw-r--r--  manual/release-notes.rst    2643
-rw-r--r--  manual/weak-crypto.rst      33
16 files changed, 6263 insertions, 6261 deletions
diff --git a/TODO b/TODO
index b41b435c..18746cf6 100644
--- a/TODO
+++ b/TODO
@@ -30,8 +30,6 @@ Before release:
I can do about, and it doesn't seem worth fixing. Maybe mention it
somewhere?
* README-maintainer: Fix installation of documentation to website
-* Get navigation working properly
-* Figure out where to put :ref:`search` so we get doc search
Soon:
diff --git a/manual/acknowledgement.rst b/manual/acknowledgement.rst
new file mode 100644
index 00000000..0fe038e0
--- /dev/null
+++ b/manual/acknowledgement.rst
@@ -0,0 +1,14 @@
+.. _acknowledgments:
+
+Acknowledgment
+==============
+
+QPDF was originally created in 2001 and modified periodically between
+2001 and 2005 during my employment at `Apex CoVantage
+<http://www.apexcovantage.com>`__. Upon my departure from Apex, the
+company graciously allowed me to take ownership of the software and
+continue maintaining it as an open source project, a decision for which I
+am very grateful. I have made considerable enhancements to it since
+that time. I feel fortunate to have worked for people who would make
+such a decision. This work would not have been possible without their
+support.
diff --git a/manual/cli.rst b/manual/cli.rst
new file mode 100644
index 00000000..e8a07f5f
--- /dev/null
+++ b/manual/cli.rst
@@ -0,0 +1,1675 @@
+.. _ref.using:
+
+Running QPDF
+============
+
+This chapter describes how to run the qpdf program from the command
+line.
+
+.. _ref.invocation:
+
+Basic Invocation
+----------------
+
+When running qpdf, the basic invocation is as follows:
+
+::
+
+ qpdf [ options ] { infilename | --empty } outfilename
+
+This converts PDF file :samp:`infilename` to PDF file
+:samp:`outfilename`. The output file is functionally
+identical to the input file but may have been structurally reorganized.
+Also, orphaned objects will be removed from the file. Many
+transformations are available as controlled by the options below. In
+place of :samp:`infilename`, the parameter
+:samp:`--empty` may be specified. This causes qpdf to
+use a dummy input file that contains zero pages. The only normal use
+case for using :samp:`--empty` would be if you were
+going to add pages from another source, as discussed in :ref:`ref.page-selection`.
+
+If :samp:`@filename` appears as a word anywhere in the
+command-line, it will be read line by line, and each line will be
+treated as a command-line argument. Leading and trailing whitespace is
+intentionally not removed from lines, which makes it possible to handle
+arguments that start or end with spaces. The :samp:`@-`
+option allows arguments to be read from standard input. This allows qpdf
+to be invoked with an arbitrary number of arbitrarily long arguments. It
+is also very useful for avoiding having to pass passwords on the command
+line. Note that the :samp:`@filename` can't appear in
+the middle of an argument, so constructs such as
+:samp:`--arg=@option` will not work. You would have to
+include the argument and its options together in the arguments file.
+
+:samp:`outfilename` does not have to be seekable, even
+when generating linearized files. Specifying ":samp:`-`"
+as :samp:`outfilename` means to write to standard
+output. If you want to overwrite the input file with the output, use the
+option :samp:`--replace-input` and omit the output file
+name. You can't specify the same file as both the input and the output.
+If you do this, qpdf will tell you about the
+:samp:`--replace-input` option.
+
+Most options require an output file, but some testing or inspection
+commands do not. These are specifically noted.
+
+.. _ref.exit-status:
+
+Exit Status
+~~~~~~~~~~~
+
+The exit status of :command:`qpdf` may be interpreted as
+follows:
+
+- ``0``: no errors or warnings were found. The file may still have
+ problems qpdf can't detect. If
+ :samp:`--warning-exit-0` was specified, exit status 0
+ is used even if there are warnings.
+
+- ``2``: errors were found. qpdf was not able to fully process the
+ file.
+
+- ``3``: qpdf encountered problems that it was able to recover from. In
+ some cases, the resulting file may still be damaged. Note that qpdf
+ still exits with status ``3`` if it finds warnings even when
+ :samp:`--no-warn` is specified. With
+ :samp:`--warning-exit-0`, warnings without errors
+ exit with status 0 instead of 3.
+
+Note that :command:`qpdf` never exits with status ``1``.
+If you get an exit status of ``1``, it was something else, like the
+shell not being able to find or execute :command:`qpdf`.
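
As an illustration only, the exit statuses listed above could be interpreted in a wrapper script roughly as follows. This is a minimal sketch: the function name ``describe_exit_status`` and the wording of the messages are hypothetical and are not part of qpdf.

```python
def describe_exit_status(status: int) -> str:
    """Map a qpdf exit status to the meaning given in the list above."""
    meanings = {
        0: "no errors or warnings (or warnings with --warning-exit-0)",
        2: "errors: qpdf was not able to fully process the file",
        3: "warnings: qpdf recovered, but the output may still be damaged",
    }
    # qpdf itself never exits with status 1, so 1 (or any other value)
    # came from somewhere else, such as the shell failing to run qpdf.
    return meanings.get(status, "not a qpdf status (check the shell or PATH)")
```

A script would typically pass the return code of the qpdf process to this function and treat status 3 as a prompt to inspect the output file.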
+
+.. _ref.shell-completion:
+
+Shell Completion
+----------------
+
+Starting in qpdf version 8.3.0, qpdf provides its own completion support
+for zsh and bash. You can enable bash completion with :command:`eval
+$(qpdf --completion-bash)` and zsh completion with
+:command:`eval $(qpdf --completion-zsh)`. If
+:command:`qpdf` is not in your path, you should invoke it
+above with an absolute path. If you invoke it with a relative path, it
+will warn you, and the completion won't work if you're in a different
+directory.
+
+qpdf will use ``argv[0]`` to figure out where its executable is. This
+may produce unwanted results in some cases, especially if you are trying
+to use completion with a copy of qpdf that is built from source. You can
+specify a full path to the qpdf you want to use for completion in the
+``QPDF_EXECUTABLE`` environment variable.
+
+.. _ref.basic-options:
+
+Basic Options
+-------------
+
+The following options are the most common ones and perform commonly
+needed transformations.
+
+:samp:`--help`
+ Display command-line invocation help.
+
+:samp:`--version`
+ Display the current version of qpdf.
+
+:samp:`--copyright`
+ Show detailed copyright information.
+
+:samp:`--show-crypto`
+ Show a list of available crypto providers, each on a line by itself.
+ The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto
+ providers.
+
+:samp:`--completion-bash`
+ Output a completion command you can eval to enable shell completion
+ from bash.
+
+:samp:`--completion-zsh`
+ Output a completion command you can eval to enable shell completion
+ from zsh.
+
+:samp:`--password={password}`
+ Specifies a password for accessing encrypted files. To read the
+ password from a file or standard input, you can use
+ :samp:`--password-file`, added in qpdf 10.2. Note
+ that you can also use :samp:`@filename` or
+ :samp:`@-` as described above to put the password in
+ a file or pass it via standard input, but you would do so by
+ specifying the entire
+ :samp:`--password={password}`
+ option in the file. Syntax such as
+ :samp:`--password=@filename` won't work since
+ :samp:`@filename` is not recognized in the middle of
+ an argument.
+
+:samp:`--password-file={filename}`
+ Reads the first line from the specified file and uses it as the
+ password for accessing encrypted files.
+ :samp:`{filename}`
+ may be ``-`` to read the password from standard input. Note that, in
+ this case, the password is echoed and there is no prompt, so use with
+ caution.
+
+:samp:`--is-encrypted`
+ Silently exit with status 0 if the file is encrypted or status 2 if
+ the file is not encrypted. This is useful for shell scripts. Other
+ options are ignored if this is given. This option is mutually
+ exclusive with :samp:`--requires-password`. Both this
+ option and :samp:`--requires-password` exit with
+ status 2 for non-encrypted files.
+
+:samp:`--requires-password`
+ Silently exit with status 0 if a password (other than as supplied) is
+ required. Exit with status 2 if the file is not encrypted. Exit with
+ status 3 if the file is encrypted but requires no password or the
+ correct password has been supplied. This is useful for shell scripts.
+ Note that any supplied password is used when opening the file. When
+ used with a :samp:`--password` option, this option
+ can be used to check the correctness of the password. In that case,
+ an exit status of 3 means the file works with the supplied password.
+ This option is mutually exclusive with
+ :samp:`--is-encrypted`. Both this option and
+ :samp:`--is-encrypted` exit with status 2 for
+ non-encrypted files.
+
+:samp:`--verbose`
+ Increase verbosity of output. For now, this just prints some
+ indication of any file that it creates.
+
+:samp:`--progress`
+ Indicate progress while writing files.
+
+:samp:`--no-warn`
+ Suppress writing of warnings to stderr. If warnings were detected and
+ suppressed, :command:`qpdf` will still exit with exit
+ code 3. See also :samp:`--warning-exit-0`.
+
+:samp:`--warning-exit-0`
+ If warnings are found but no errors, exit with exit code 0 instead of 3.
+ When combined with :samp:`--no-warn`, the effect is
+ for :command:`qpdf` to completely ignore warnings.
+
+:samp:`--linearize`
+ Causes generation of a linearized (web-optimized) output file.
+
+:samp:`--replace-input`
+ If specified, the output file name should be omitted. This option
+ tells qpdf to replace the input file with the output. It does this by
+ writing to
+ :file:`{infilename}.~qpdf-temp#`
+ and, when done, overwriting the input file with the temporary file.
+ If there were any warnings, the original input is saved as
+ :file:`{infilename}.~qpdf-orig`.
+
+:samp:`--copy-encryption=file`
+ Encrypt the file using the same encryption parameters, including user
+ and owner password, as the specified file. Use
+ :samp:`--encryption-file-password` to specify a
+ password if one is needed to open this file. Note that copying the
+ encryption parameters from a file also copies the first half of
+ ``/ID`` from the file since this is part of the encryption
+ parameters.
+
+:samp:`--encryption-file-password=password`
+ If the file specified with :samp:`--copy-encryption`
+ requires a password, specify the password using this option. Note
+ that only one of the user or owner password is required. Both
+ passwords will be preserved since QPDF does not distinguish between
+ the two passwords. It is possible to preserve encryption parameters,
+ including the owner password, from a file even if you don't know the
+ file's owner password.
+
+:samp:`--allow-weak-crypto`
+ Starting with version 10.4, qpdf issues warnings when requested to
+ create files using RC4 encryption. This option suppresses those
+ warnings. In future versions of qpdf, qpdf will refuse to create
+ files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details.
+
+:samp:`--encrypt options --`
+ Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify
+ encryption parameters.
+
+:samp:`--decrypt`
+ Removes any encryption on the file. A password must be supplied if
+ the file is password protected.
+
+:samp:`--password-is-hex-key`
+ Overrides the usual computation/retrieval of the PDF file's
+ encryption key from user/owner password with an explicit
+ specification of the encryption key. When this option is specified,
+ the argument to the :samp:`--password` option is
+ interpreted as a hexadecimal-encoded key value. This only applies to
+ the password used to open the main input file. It does not apply to
+ other files opened by :samp:`--pages` or other
+ options or to files being written.
+
+ Most users will never have a need for this option, and no standard
+ viewers support this mode of operation, but it can be useful for
+ forensic or investigatory purposes. For example, if a PDF file is
+ encrypted with an unknown password, a brute-force attack using the
+ key directly is sometimes more efficient than one using the password.
+ Also, if a file is heavily damaged, it may be possible to derive the
+ encryption key and recover parts of the file using it directly. To
+ expose the encryption key used by an encrypted file that you can open
+ normally, use the :samp:`--show-encryption-key`
+ option.
+
+:samp:`--suppress-password-recovery`
+ Ordinarily, qpdf attempts to automatically compensate for passwords
+ specified in the wrong character encoding. This option suppresses
+ that behavior. Under normal conditions, there are no reasons to use
+ this option. See :ref:`ref.unicode-passwords` for a
+ discussion.
+
+:samp:`--password-mode={mode}`
+ This option can be used to fine-tune how qpdf interprets Unicode
+ (non-ASCII) password strings passed on the command line. With the
+ exception of the :samp:`hex-bytes` mode, these only
+ apply to passwords provided when encrypting files. The
+ :samp:`hex-bytes` mode also applies to passwords
+ specified for reading files. For additional discussion of the
+ supported password modes and when you might want to use them, see
+ :ref:`ref.unicode-passwords`. The following modes
+ are supported:
+
+ - :samp:`auto`: Automatically determine whether the
+ specified password is a properly encoded Unicode (UTF-8) string,
+ and transcode it as required by the PDF spec based on the type of
+ encryption being applied. On Windows starting with version 8.4.0,
+ and on almost all other modern platforms, incoming passwords will
+ be properly encoded in UTF-8, so this is almost always what you
+ want.
+
+ - :samp:`unicode`: Tells qpdf that the incoming
+ password is UTF-8, overriding whatever its automatic detection
+ determines. The only difference between this mode and
+ :samp:`auto` is that qpdf will fail with an error
+ message if the password is not valid UTF-8 instead of falling back
+ to :samp:`bytes` mode with a warning.
+
+ - :samp:`bytes`: Interpret the password as a literal
+ byte string. For non-Windows platforms, this is what versions of
+ qpdf prior to 8.4.0 did. For Windows platforms, there is no way to
+ specify strings of binary data on the command line directly, but
+ you can use the :samp:`@filename` option to do it,
+ in which case this option forces qpdf to respect the string of
+ bytes as provided. This option will allow you to encrypt PDF files
+ with passwords that will not be usable by other readers.
+
+ - :samp:`hex-bytes`: Interpret the password as a
+ hex-encoded string. This provides a way to pass binary data as a
+ password on all platforms including Windows. As with
+ :samp:`bytes`, this option may allow creation of
+ files that can't be opened by other readers. This mode affects
+ qpdf's interpretation of passwords specified for decrypting files
+ as well as for encrypting them. It makes it possible to specify
+ strings that are encoded in some manner other than the system's
+ default encoding.
+
+:samp:`--rotate=[+|-]angle[:page-range]`
+ Apply rotation to specified pages. The
+ :samp:`page-range` portion of the option value has
+ the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the
+ rotation is applied to all pages. The :samp:`angle`
+ portion of the parameter may be either 0, 90, 180, or 270. If
+ preceded by :samp:`+` or :samp:`-`,
+ the angle is added to or subtracted from the specified pages'
+ original rotations. This is almost always what you want. Otherwise
+ the pages' rotations are set to the exact value, which may cause the
+ appearances of the pages to be inconsistent, especially for scans.
+ For example, the command :command:`qpdf in.pdf out.pdf
+ --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages
+ 2, 4, and 6 90 degrees clockwise from their original rotation and
+ force the rotation of pages 7 through 8 to 180 degrees regardless of
+ their original rotation, and the command :command:`qpdf in.pdf
+ out.pdf --rotate=+180` would rotate all pages by 180
+ degrees.
+
+:samp:`--keep-files-open={[yn]}`
+ This option controls whether qpdf keeps individual files open while
+ merging. Prior to version 8.1.0, qpdf always kept all files open, but
+ this meant that the number of files that could be merged was limited
+ by the operating system's open file limit. Version 8.1.0 opened files
+ as they were referenced and closed them after each read, but this
+ caused a major performance impact. Version 8.2.0 optimized the
+ performance but did so in a way that, for local file systems, there
+ was a small but unavoidable performance hit, but for networked file
+ systems, the performance impact could be very high. Starting with
+ version 8.2.1, the default behavior is that files are kept open if no
+ more than 200 files are specified, but this default behavior can be
+ explicitly overridden with the
+ :samp:`--keep-files-open` flag. If you are merging
+ more than 200 files but fewer than the operating system's max open
+ files limit, you may want to use
+ :samp:`--keep-files-open=y`, especially if working
+ over a networked file system. If you are using a local file system
+ where the overhead is low and you might sometimes merge more than the
+ OS limit's number of files from a script and are not worried about a
+ few seconds additional processing time, you may want to specify
+ :samp:`--keep-files-open=n`. The threshold for
+ switching may be changed from the default 200 with the
+ :samp:`--keep-files-open-threshold` option.
+
+:samp:`--keep-files-open-threshold={count}`
+ If specified, overrides the default value of 200 used as the
+ threshold for qpdf deciding whether or not to keep files open. See
+ :samp:`--keep-files-open` for details.
+
+:samp:`--pages options --`
+ Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do
+ page selection (splitting and merging).
+
+:samp:`--collate={n}`
+ When specified, collate rather than concatenate pages from files
+ specified with :samp:`--pages`. With a numeric
+ argument, collate in groups of :samp:`{n}`.
+ The default is 1. See :ref:`ref.page-selection` for additional details.
+
+:samp:`--flatten-rotation`
+ For each page that is rotated using the ``/Rotate`` key in the page's
+ dictionary, remove the ``/Rotate`` key and implement the identical
+ rotation semantics by modifying the page's contents. This option can
+ be useful to prepare files for buggy PDF applications that don't
+ properly handle rotated pages.
+
+:samp:`--split-pages=[n]`
+ Write each group of :samp:`n` pages to a separate
+ output file. If :samp:`n` is not specified, create
+ single pages. Output file names are generated as follows:
+
+ - If the string ``%d`` appears in the output file name, it is
+ replaced with a range of zero-padded page numbers starting from 1.
+
+ - Otherwise, if the output file name ends in
+ :file:`.pdf` (case insensitive), a zero-padded
+ page range, preceded by a dash, is inserted before the file
+ extension.
+
+ - Otherwise, the file name is appended with a zero-padded page range
+ preceded by a dash.
+
+ Page ranges are a single number in the case of single-page groups or
+ two numbers separated by a dash otherwise. For example, if
+ :file:`infile.pdf` has 12 pages
+
+ - :command:`qpdf --split-pages infile.pdf %d-out`
+ would generate files :file:`01-out` through
+ :file:`12-out`
+
+ - :command:`qpdf --split-pages=2 infile.pdf
+ outfile.pdf` would generate files
+ :file:`outfile-01-02.pdf` through
+ :file:`outfile-11-12.pdf`
+
+ - :command:`qpdf --split-pages infile.pdf
+ something.else` would generate files
+ :file:`something.else-01` through
+ :file:`something.else-12`
+
+ Note that outlines, threads, and other global features of the
+ original PDF file are not preserved. For each page of output, this
+ option creates an empty PDF and copies a single page from the output
+ into it. If you require the global data, you will have to run
+ :command:`qpdf` with the
+ :samp:`--pages` option once for each file. Using
+ :samp:`--split-pages` is much faster if you don't
+ require the global data.
+
+:samp:`--overlay options --`
+ Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on
+ overlay/underlay.
+
+:samp:`--underlay options --`
+ Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on
+ overlay/underlay.
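
The output file naming rules described above for :samp:`--split-pages` can be sketched in Python. This is not qpdf's implementation. The function name ``split_output_names`` is hypothetical, and two details are assumptions rather than documented behavior: the padding width is taken to be the number of digits in the total page count, and a short final group containing one page is written as a single number.

```python
def split_output_names(outfile: str, num_pages: int, n: int = 1) -> list[str]:
    """Return the file names --split-pages=n would plausibly generate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: pad to the number of digits in the total page count.
    width = len(str(num_pages))
    names = []
    for first in range(1, num_pages + 1, n):
        last = min(first + n - 1, num_pages)
        page_range = f"{first:0{width}d}"
        if last != first:
            # Assumption: a one-page group is written as a single number.
            page_range += f"-{last:0{width}d}"
        if "%d" in outfile:
            names.append(outfile.replace("%d", page_range))
        elif outfile.lower().endswith(".pdf"):
            # Insert the range, preceded by a dash, before the extension.
            names.append(f"{outfile[:-4]}-{page_range}{outfile[-4:]}")
        else:
            names.append(f"{outfile}-{page_range}")
    return names
```

Calling ``split_output_names("outfile.pdf", 12, 2)`` reproduces the second example above, from ``outfile-01-02.pdf`` through ``outfile-11-12.pdf``.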
+
+Password-protected files may be opened by specifying a password. By
+default, qpdf will preserve any encryption data associated with a file.
+If :samp:`--decrypt` is specified, qpdf will attempt to
+remove any encryption information. If :samp:`--encrypt`
+is specified, qpdf will replace the document's encryption parameters
+with whatever is specified.
+
+Note that qpdf does not obey encryption restrictions already imposed on
+the file. Doing so would be meaningless since qpdf can be used to remove
+encryption from the file entirely. This functionality is not intended to
+be used for bypassing copyright restrictions or other restrictions
+placed on files by their producers.
+
+Prior to 8.4.0, in the case of passwords that contain characters that
+fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
+properly encoded encryption and decryption passwords to the user.
+Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
+an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual
+described workarounds using the :command:`iconv` command.
+Such workarounds are no longer required or recommended with qpdf 8.4.0.
+However, for backward compatibility, qpdf attempts to detect those
+workarounds and do the right thing in most cases.
+
+.. _ref.encryption-options:
+
+Encryption Options
+------------------
+
+To change the encryption parameters of a file, use the :samp:`--encrypt` flag.
+The syntax is
+
+::
+
+ --encrypt user-password owner-password key-length [ restrictions ] --
+
+Note that ":samp:`--`" terminates parsing of encryption
+flags and must be present even if no restrictions are present.
+
+Either or both of the user password and the owner password may be empty
+strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
+of PDF files with a non-empty user password, an empty owner password,
+and a 256-bit key since such files can be opened with no password. If
+you want to create such files, specify the encryption option
+:samp:`--allow-insecure`, as described below.
+
+The value for
+:samp:`{key-length}` may
+be 40, 128, or 256. The restriction flags are dependent upon key length.
+When no additional restrictions are given, the default is to be fully
+permissive.
+
+If :samp:`{key-length}`
+is 40, the following restriction options are available:
+
+:samp:`--print=[yn]`
+ Determines whether or not to allow printing.
+
+:samp:`--modify=[yn]`
+ Determines whether or not to allow document modification.
+
+:samp:`--extract=[yn]`
+ Determines whether or not to allow text/image extraction.
+
+:samp:`--annotate=[yn]`
+ Determines whether or not to allow comments and form fill-in and
+ signing.
+
+If :samp:`{key-length}`
+is 128, the following restriction options are available:
+
+:samp:`--accessibility=[yn]`
+ Determines whether or not to allow accessibility to visually
+ impaired. The qpdf library disregards this field when AES is used or
+ when 256-bit encryption is used. You should really never disable
+ accessibility, but qpdf lets you do it in case you need to configure
+ a file this way for testing purposes. The PDF spec says that
+ conforming readers should disregard this permission and always allow
+ accessibility.
+
+:samp:`--extract=[yn]`
+ Determines whether or not to allow text/graphic extraction.
+
+:samp:`--assemble=[yn]`
+ Determines whether document assembly (rotation and reordering of
+ pages) is allowed.
+
+:samp:`--annotate=[yn]`
+ Determines whether modifying annotations is allowed. This includes
+ adding comments and filling in form fields. Also allows editing of
+ form fields if :samp:`--modify-other=y` is given.
+
+:samp:`--form=[yn]`
+ Determines whether filling form fields is allowed.
+
+:samp:`--modify-other=[yn]`
+ Allow all document editing except those controlled separately by the
+ :samp:`--assemble`,
+ :samp:`--annotate`, and
+ :samp:`--form` options.
+
+:samp:`--print={print-opt}`
+ Controls printing access.
+ :samp:`{print-opt}`
+ may be one of the following:
+
+ - :samp:`full`: allow full printing
+
+ - :samp:`low`: allow low-resolution printing only
+
+ - :samp:`none`: disallow printing
+
+:samp:`--modify={modify-opt}`
+ Controls modify access. This way of controlling modify access has
+ less granularity than new options added in qpdf 8.4.
+ :samp:`{modify-opt}`
+ may be one of the following:
+
+ - :samp:`all`: allow full document modification
+
+ - :samp:`annotate`: allow comment authoring, form
+ operations, and document assembly
+
+ - :samp:`form`: allow form field fill-in and signing
+ and document assembly
+
+ - :samp:`assembly`: allow document assembly only
+
+ - :samp:`none`: allow no modifications
+
+ Using the :samp:`--modify` option does not allow you
+ to create certain combinations of permissions such as allowing form
+ filling but not allowing document assembly. Starting with qpdf 8.4,
+ you can either just use the other options to control fields
+ individually, or you can use something like :samp:`--modify=form
+ --assemble=n` to fine tune.
+
+:samp:`--cleartext-metadata`
+ If specified, any metadata stream in the document will be left
+ unencrypted even if the rest of the document is encrypted. This also
+ forces the PDF version to be at least 1.5.
+
+:samp:`--use-aes=[yn]`
+ If :samp:`--use-aes=y` is specified, AES encryption
+ will be used instead of RC4 encryption. This forces the PDF version
+ to be at least 1.6.
+
+:samp:`--allow-insecure`
+ From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
+ where the user password is non-empty, the owner password is empty,
+ and a 256-bit key is in use. Files created in this way are insecure
+ since they can be opened without a password. Users would ordinarily
+ never want to create such files. If you are using qpdf to
+ intentionally create strange files for testing (a definite valid use
+ of qpdf!), this option allows you to create such insecure files.
+
+:samp:`--force-V4`
+ Use of this option forces the ``/V`` and ``/R`` parameters in the
+ document's encryption dictionary to be set to the value ``4``. As
+ qpdf will automatically do this when required, there is no reason to
+ ever use this option. It exists primarily for use in testing qpdf
+ itself. This option also forces the PDF version to be at least 1.5.
+
+If :samp:`{key-length}`
+is 256, the minimum PDF version is 1.7 with extension level 8, and the
+AES-based encryption format used is the PDF 2.0 encryption method
+supported by Acrobat X. The same options are available as with 128 bits
+with the following exceptions:
+
+:samp:`--use-aes`
+ This option is not available with 256-bit keys. AES is always used
+ with 256-bit encryption keys.
+
+:samp:`--force-V4`
+ This option is not available with 256-bit keys.
+
+:samp:`--force-R5`
+ If specified, qpdf sets the minimum version to 1.7 at extension level
+ 3 and writes the deprecated encryption format used by Acrobat version
+ IX. This option should not be used in practice to generate PDF files
+ that will be in general use, but it can be useful to generate files
+ if you are trying to test proper support in another application for
+ PDF files encrypted in this way.
+
+The default for each permission option is to be fully permissive.
+
+.. _ref.page-selection:
+
+Page Selection Options
+----------------------
+
+Starting with qpdf 3.0, it is possible to split and merge PDF files by
+selecting pages from one or more input files. Whatever file is given as
+the primary input file is used as the starting point, but its pages are
+replaced with pages as specified.
+
+::
+
+ --pages input-file [ --password=password ] [ page-range ] [ ... ] --
+
+Multiple input files may be specified. Each one is given as the name of
+the input file, an optional password (if required to open the file), and
+the range of pages. Note that ":samp:`--`" terminates
+parsing of page selection flags.
+
+Starting with qpdf 8.4, the special input file name
+":file:`.`" can be used as a shortcut for the
+primary input filename.
+
+For each file that pages should be taken from, specify the file, a
+password needed to open the file (if any), and a page range. The
+password needs to be given only once per file. If any of the input files
+are the same as the primary input file or the file used to copy
+encryption parameters (if specified), you do not need to repeat the
+password here. The same file can be repeated multiple times. If a file
+that is repeated has a password, the password only has to be given the
+first time. All non-page data (info, outlines, page numbers, etc.) are
+taken from the primary input file. To discard these, use
+:samp:`--empty` as the primary input.
+
+Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
+sees a value in the place where it expects a page range and that value
+is not a valid range but is a valid file name, qpdf will implicitly use
+the range ``1-z``, meaning that it will include all pages in the file.
+This makes it possible to easily combine all pages in a set of files
+with a command like :command:`qpdf --empty out.pdf --pages \*.pdf
+--`.
+
+The page range is a set of numbers separated by commas, ranges of
+numbers separated by dashes, or combinations of those. The character "z"
+represents the last page. A number preceded by an "r" indicates to count
+from the end, so ``r3-r1`` would be the last three pages of the
+document. Pages can appear in any order. Ranges can appear with a high
+number followed by a low number, which causes the pages to appear in
+reverse. Numbers may be repeated in a page range. A page range may be
+optionally appended with ``:even`` or ``:odd`` to indicate only the even
+or odd pages in the given range. Note that even and odd refer to the
+positions within the specified range, not whether the original number
+is even or odd.
+
+Example page ranges:
+
+- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
+ that order.
+
+- ``z-1``: all pages in the document in reverse
+
+- ``r3-r1``: the last three pages of the document
+
+- ``r1-r3``: the last three pages of the document in reverse order
+
+- ``1-20:even``: even pages from 2 to 20
+
+- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
+ positions from among the original range, which represents pages 5, 7,
+ 8, 9, and 12.
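
To make the page range rules above concrete, here is a minimal Python sketch that expands a range string into a list of page numbers. It is an illustration of the described syntax, not qpdf's parser. The function name ``parse_page_range`` and its error handling are hypothetical.

```python
def parse_page_range(spec: str, num_pages: int) -> list[int]:
    """Expand a page range such as '1,3,5-9,15-12' into page numbers."""

    def page_number(token: str) -> int:
        if token == "z":              # "z" is the last page
            return num_pages
        if token.startswith("r"):     # "rN" counts from the end
            return num_pages + 1 - int(token[1:])
        return int(token)

    selector = None
    if spec.endswith(":even") or spec.endswith(":odd"):
        spec, selector = spec.rsplit(":", 1)

    pages = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        start = page_number(first)
        end = page_number(last) if dash else start
        # A high number followed by a low number yields reverse order.
        step = 1 if end >= start else -1
        pages.extend(range(start, end + step, step))

    for page in pages:
        if not 1 <= page <= num_pages:
            raise ValueError(f"page {page} is out of range")

    # :odd and :even select by position within the range, not by page number.
    if selector == "odd":
        pages = pages[0::2]
    elif selector == "even":
        pages = pages[1::2]
    return pages
```

For a 20-page document, ``parse_page_range("5,7-9,12:odd", 20)`` returns pages 5, 8, and 12, matching the last example above.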
+
+Starting in qpdf version 8.3, you can specify the
+:samp:`--collate` option. Note that this option is
+specified outside of :samp:`--pages ... --`. When
+:samp:`--collate` is specified, it changes the meaning
+of :samp:`--pages` so that the specified files, as
+modified by page ranges, are collated rather than concatenated. For
+example, if you have the files :file:`odd.pdf` and
+:file:`even.pdf` containing odd and even pages of a
+document respectively, you could run :command:`qpdf --collate odd.pdf
+--pages odd.pdf even.pdf -- all.pdf` to collate the pages.
+This would pick page 1 from odd, page 1 from even, page 2 from odd, page
+2 from even, etc. until all pages have been included. Any number of
+files and page ranges can be specified. If any file has fewer pages,
+that file is just skipped when its pages have all been included. For
+example, if you ran :command:`qpdf --collate --empty --pages a.pdf
+1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the
+following pages in this order:
+
+- a.pdf page 1
+
+- b.pdf page 6
+
+- c.pdf last page
+
+- a.pdf page 2
+
+- b.pdf page 5
+
+- a.pdf page 3
+
+- b.pdf page 4
+
+- a.pdf page 4
+
+- a.pdf page 5
+
+Starting in qpdf version 10.2, you may specify a numeric argument to
+:samp:`--collate`. With
+:samp:`--collate={n}`,
+qpdf pulls groups of :samp:`{n}` pages from each file,
+again stopping when there are no more pages. For example, if you ran
+:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
+r1 -- out.pdf`, you would get the following pages in this
+order:
+
+- a.pdf page 1
+
+- a.pdf page 2
+
+- b.pdf page 6
+
+- b.pdf page 5
+
+- c.pdf last page
+
+- a.pdf page 3
+
+- a.pdf page 4
+
+- b.pdf page 4
+
+- a.pdf page 5
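The collation order described above can be modeled with a short Python sketch. It assumes each input has already been reduced to an ordered list of pages; the `collate` function and the `(file, page)` tuples are illustrative names, not part of qpdf.

```python
def collate(page_lists, group_size=1):
    """Interleave page lists, taking group_size pages from each list in turn."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    result = []
    positions = [0] * len(page_lists)
    # Keep cycling through the inputs until every list is used up.
    while any(pos < len(pages) for pos, pages in zip(positions, page_lists)):
        for i, pages in enumerate(page_lists):
            start = positions[i]
            # An exhausted list contributes nothing and is simply skipped.
            result.extend(pages[start:start + group_size])
            positions[i] = min(start + group_size, len(pages))
    return result


# The inputs from the examples above: a.pdf 1-5, b.pdf 6-4, c.pdf r1.
a = [("a.pdf", p) for p in (1, 2, 3, 4, 5)]
b = [("b.pdf", p) for p in (6, 5, 4)]
c = [("c.pdf", "last")]

for entry in collate([a, b, c], group_size=2):
    print(entry)
```

With `group_size=1` the same inputs give the ordering listed for plain `--collate`.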
+
+Starting in qpdf version 8.3, when you split and merge files, any page
+labels (page numbers) are preserved in the final file. It is expected
+that more document features will be preserved by splitting and merging.
+In the meantime, semantics of splitting and merging vary across
+features. For example, the document's outlines (bookmarks) point to
+actual page objects, so if you select some pages and not others,
+bookmarks that point to pages that are in the output file will work, and
+remaining bookmarks will not work. A future version of
+:command:`qpdf` may do a better job at handling these
+issues. (Note that the qpdf library already contains all of the APIs
+required in order to implement this in your own application if you need
+it.) In the meantime, you can always use
+:samp:`--empty` as the primary input file to avoid
+copying all of that from the first file. For example, to take pages 1
+through 5 from :file:`infile.pdf` while preserving
+all metadata associated with that file, you could use
+
+::
+
+ qpdf infile.pdf --pages . 1-5 -- outfile.pdf
+
+If you wanted pages 1 through 5 from
+:file:`infile.pdf` but you wanted the rest of the
+metadata to be dropped, you could instead run
+
+::
+
+ qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
+
+If you wanted to take pages 1 through 5 from
+:file:`file1.pdf` and pages 11 through 15 from
+:file:`file2.pdf` in reverse, taking document-level
+metadata from :file:`file2.pdf`, you would run
+
+::
+
+ qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf
+
+If, for some reason, you wanted to take the first page of an encrypted
+file called :file:`encrypted.pdf` with password
+``pass`` and repeat it twice in an output file, and if you wanted to
+drop document-level metadata but preserve encryption, you would use
+
+::
+
+ qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass \
+ --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- \
+ outfile.pdf
+
+Note that we had to specify the password all three times because giving
+a password as :samp:`--encryption-file-password` doesn't
+count for page selection, and as far as qpdf is concerned,
+:file:`encrypted.pdf` and
+:file:`./encrypted.pdf` are separate files. These
+are all corner cases that most users should hopefully never have to be
+bothered with.
+
+Prior to version 8.4, it was not possible to specify the same page from
+the same file directly more than once, and the workaround of specifying
+the same file in more than one way was required. Version 8.4 removes
+this limitation, but there is still a valid use case. When you specify
+the same page from the same file more than once, qpdf will share objects
+between the pages. If you are going to do further manipulation on the
+file and need the two instances of the same original page to be deep
+copies, then you can specify the file in two different ways. For example,
+:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`
+would create a file with two copies of the first page of the input, and
+the two copies would not share any objects in common. This includes fonts,
+images, and anything else the page references.
+
+.. _ref.overlay-underlay:
+
+Overlay and Underlay Options
+----------------------------
+
+Starting with qpdf 8.4, it is possible to overlay or underlay pages from
+other files onto the output generated by qpdf. Specify overlay or
+underlay as follows:
+
+::
+
+ { --overlay | --underlay } file [ options ] --
+
+Overlay and underlay options are processed late, so they can be combined
+with other operations like merging and will apply to the final output. The
+:samp:`--overlay` and :samp:`--underlay`
+options work the same way, except underlay pages are drawn underneath
+the page to which they are applied, possibly obscured by the original
+page, and overlay pages are drawn on top of the page to which they are
+applied, possibly obscuring the page. You can combine overlay and
+underlay.
+
+The default behavior of overlay and underlay is that pages are taken
+from the overlay/underlay file in sequence and applied to corresponding
+pages in the output until there are no more output pages. If the overlay
+or underlay file runs out of pages, remaining output pages are left
+alone. This behavior can be modified by options, which are provided
+between the :samp:`--overlay` or
+:samp:`--underlay` flag and the
+:samp:`--` option. The following options are supported:
+
+- :samp:`--password=password`: supply a password if the
+ overlay/underlay file is encrypted.
+
+- :samp:`--to=page-range`: a range of pages in the same
+ form as described in :ref:`ref.page-selection`
+ indicates which pages in the output should have the overlay/underlay
+ applied. If not specified, overlay/underlay are applied to all pages.
+
+- :samp:`--from=[page-range]`: a range of pages that
+ specifies which pages in the overlay/underlay file will be used for
+ overlay or underlay. If not specified, all pages will be used. This
+ can be explicitly specified to be empty if
+ :samp:`--repeat` is used.
+
+- :samp:`--repeat=page-range`: an optional range of
+ pages that specifies which pages in the overlay/underlay file will be
+ repeated after the "from" pages are used up. If you want to repeat a
+ range of pages starting at the beginning, you can explicitly use
+ :samp:`--from=`.
+
+Here are some examples.
+
+- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
+ --`: overlay the first three pages from file
+ :file:`o.pdf` onto the first three pages of the
+ output, then overlay page 4 from :file:`o.pdf`
+ onto pages 4 and 5 of the output. Leave remaining output pages
+ untouched.
+
+- :command:`--underlay footer.pdf --from= --repeat=1,2
+ --`: Underlay page 1 of
+ :file:`footer.pdf` on all odd output pages, and
+ underlay page 2 of :file:`footer.pdf` on all even
+ output pages.
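The interaction of `--to`, `--from`, and `--repeat` can be pictured as a mapping from output pages to overlay pages. The Python sketch below is a simplified model inferred from the two examples above, not qpdf's implementation. In particular it assumes that the repeat pages are cycled in order, as the footer example suggests. The name `overlay_plan` is hypothetical.

```python
def overlay_plan(to_pages, from_pages, repeat_pages=()):
    """Map each targeted output page to the overlay page applied to it."""
    plan = {}
    for index, out_page in enumerate(to_pages):
        if index < len(from_pages):
            # The "from" pages are consumed in sequence first.
            plan[out_page] = from_pages[index]
        elif repeat_pages:
            # Then the "repeat" pages are cycled (assumption: in order).
            cycle_index = (index - len(from_pages)) % len(repeat_pages)
            plan[out_page] = repeat_pages[cycle_index]
        # Otherwise the overlay pages have run out and the page is left alone.
    return plan


# --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
print(overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4]))

# --underlay footer.pdf --from= --repeat=1,2 --  (six-page output assumed)
print(overlay_plan([1, 2, 3, 4, 5, 6], [], [1, 2]))
```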
+
+.. _ref.attachments:
+
+Embedded Files/Attachments Options
+----------------------------------
+
+Starting with qpdf 10.2, you can work with file attachments in PDF files
+from the command line. The following options are available:
+
+:samp:`--list-attachments`
+ Show the "key" and stream number for embedded files. With
+ :samp:`--verbose`, additional information, including
+ preferred file name, description, dates, and more, is also displayed.
+ The key is usually but not always equal to the file name, and is
+ needed by some of the other options.
+
+:samp:`--show-attachment={key}`
+ Write the contents of the specified attachment to standard output as
+ binary data. The key should match one of the keys shown by
+ :samp:`--list-attachments`. If specified multiple
+ times, only the last attachment will be shown.
+
+:samp:`--add-attachment {file} {options} --`
+ Add or replace an attachment with the contents of
+ :samp:`{file}`. This may be specified more
+ than once. The following additional options may appear before the
+ ``--`` that ends this option:
+
+ :samp:`--key={key}`
+ The key to use to register the attachment in the embedded files
+ table. Defaults to the last path element of
+ :samp:`{file}`.
+
+ :samp:`--filename={name}`
+ The file name to be used for the attachment. This is what is
+ usually displayed to the user and is the name most graphical PDF
+ viewers will use when saving a file. It defaults to the last path
+ element of :samp:`{file}`.
+
+ :samp:`--creationdate={date}`
+ The attachment's creation date in PDF format; defaults to the
+ current time. The date format is explained below.
+
+ :samp:`--moddate={date}`
+ The attachment's modification date in PDF format; defaults to the
+ current time. The date format is explained below.
+
+ :samp:`--mimetype={type/subtype}`
+ The mime type for the attachment, e.g. ``text/plain`` or
+ ``application/pdf``. Note that the mimetype appears in a field
+ called ``/Subtype`` in the PDF but actually includes the full type
+ and subtype of the mime type.
+
+ :samp:`--description={"text"}`
+ Descriptive text for the attachment, displayed by some PDF
+ viewers.
+
+ :samp:`--replace`
+ Indicates that any existing attachment with the same key should be
+ replaced by the new attachment. Otherwise,
+ :command:`qpdf` gives an error if an attachment
+ with that key is already present.
+
+:samp:`--remove-attachment={key}`
+ Remove the specified attachment. This not only removes the
+ attachment from the embedded files table but also clears out the file
+ specification. That means that any potential internal links to the
+ attachment will be broken. This option may be specified multiple
+ times. Run with :samp:`--verbose` to see status of
+ the removal.
+
+:samp:`--copy-attachments-from {file} {options} --`
+ Copy attachments from another file. This may be specified more than
+ once. The following additional options may appear before the ``--``
+ that ends this option:
+
+ :samp:`--password={password}`
+ If required, the password needed to open
+ :samp:`{file}`.
+
+ :samp:`--prefix={prefix}`
+ Only required if the file from which attachments are being copied
+ has attachments with keys that conflict with attachments already
+ in the file. In this case, the specified prefix will be prepended
+ to each key. This affects only the key in the embedded files
+ table, not the file name. The PDF specification doesn't preclude
+ multiple attachments having the same file name.
+
+When a date is required, the date should conform to the PDF date format
+specification, which is
+``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where
+:samp:`{<z>}` is either ``Z`` for UTC or a
+timezone offset in the form :samp:`{-hh'mm'}` or
+:samp:`{+hh'mm'}`. Examples:
+``D:20210207161528-05'00'``, ``D:20210207211528Z``.
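If the dates come from a script, a string in this format can be produced with Python's standard `datetime` module. The helper below is a minimal sketch, and the name `pdf_date` is an assumption for illustration. It requires a timezone-aware value so that the offset is never guessed.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment):
    """Format a timezone-aware datetime as a PDF date string."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    minutes = int(offset.total_seconds()) // 60
    if minutes == 0:
        return stamp + "Z"  # UTC
    sign = "+" if minutes > 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{mins:02d}'"


eastern = timezone(timedelta(hours=-5))
print(pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern)))
print(pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)))
```

The two calls reproduce the example strings given above.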
+
+.. _ref.advanced-parsing:
+
+Advanced Parsing Options
+------------------------
+
+These options control aspects of how qpdf reads PDF files. Mostly these
+are of use to people who are working with damaged files. There is little
+reason to use these options unless you are trying to solve specific
+problems. The following options are available:
+
+:samp:`--suppress-recovery`
+ Prevents qpdf from attempting to recover damaged files.
+
+:samp:`--ignore-xref-streams`
+ Tells qpdf to ignore any cross-reference streams.
+
+Ordinarily, qpdf will attempt to recover from certain types of errors in
+PDF files. These include errors in the cross-reference table, certain
+types of object numbering errors, and certain types of stream length
+errors. Sometimes, qpdf may think it has recovered but may not have
+actually recovered, so care should be taken when relying on recovery, as
+some data loss is possible. The
+:samp:`--suppress-recovery` option will prevent qpdf
+from attempting recovery. In this case, it will fail on the first error
+that it encounters.
+
+Ordinarily, qpdf reads cross-reference streams when they are present in
+a PDF file. If :samp:`--ignore-xref-streams` is
+specified, qpdf will ignore any cross-reference streams for hybrid PDF
+files. The purpose of hybrid files is to make some content available to
+viewers that are not aware of cross-reference streams. It is almost
+never desirable to ignore them. The only time when you might want to use
+this feature is if you are testing creation of hybrid PDF files and wish
+to see how a PDF consumer that doesn't understand object and
+cross-reference streams would interpret such a file.
+
+.. _ref.advanced-transformation:
+
+Advanced Transformation Options
+-------------------------------
+
+These transformation options control fine points of how qpdf creates the
+output file. Mostly these are of use only to people who are very
+familiar with the PDF file format or who are PDF developers. The
+following options are available:
+
+:samp:`--compress-streams={[yn]}`
+ By default, or with :samp:`--compress-streams=y`,
+ qpdf will compress any stream with no other filters applied to it
+ with the ``/FlateDecode`` filter when it writes it. To suppress this
+ behavior and preserve uncompressed streams as uncompressed, use
+ :samp:`--compress-streams=n`.
+
+:samp:`--decode-level={option}`
+ Controls which streams qpdf tries to decode. The default is
+ :samp:`generalized`. The following options are
+ available:
+
+ - :samp:`none`: do not attempt to decode any streams
+
+ - :samp:`generalized`: decode streams filtered with
+ supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
+ ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
+ filters as those to be used for general-purpose compression or
+ encoding, as opposed to filters specifically designed for image
+ data. Note that, by default, streams already compressed with
+ ``/FlateDecode`` are not uncompressed and recompressed unless you
+ also specify :samp:`--recompress-flate`.
+
+ - :samp:`specialized`: in addition to generalized,
+ decode streams with supported non-lossy specialized filters;
+ currently this is just ``/RunLengthDecode``
+
+ - :samp:`all`: in addition to generalized and
+ specialized, decode streams with supported lossy filters;
+ currently this is just ``/DCTDecode`` (JPEG)
+
+:samp:`--stream-data={option}`
+ Controls transformation of stream data. This option predates the
+ :samp:`--compress-streams` and
+ :samp:`--decode-level` options. Those options can be
+ used to achieve the same effect with more control. The value of
+ :samp:`{option}` may
+ be one of the following:
+
+ - :samp:`compress`: recompress stream data when
+ possible (default); equivalent to
+ :samp:`--compress-streams=y`
+ :samp:`--decode-level=generalized`. Does not
+ recompress streams already compressed with ``/FlateDecode`` unless
+ :samp:`--recompress-flate` is also specified.
+
+ - :samp:`preserve`: leave all stream data as is;
+ equivalent to :samp:`--compress-streams=n`
+ :samp:`--decode-level=none`
+
+ - :samp:`uncompress`: uncompress stream data
+ compressed with generalized filters when possible; equivalent to
+ :samp:`--compress-streams=n`
+ :samp:`--decode-level=generalized`
+
+:samp:`--recompress-flate`
+ By default, streams already compressed with ``/FlateDecode`` are left
+ alone rather than being uncompressed and recompressed. This option
+ causes qpdf to uncompress and recompress the streams. There is a
+ significant performance cost to using this option, but you probably
+ want to use it if you specify
+ :samp:`--compression-level`.
+
+:samp:`--compression-level={level}`
+ When writing new streams that are compressed with ``/FlateDecode``,
+ use the specified compression level. The value of
+ :samp:`{level}` should be a number from 1 to 9 and is
+ passed directly to zlib, which implements deflate compression. Note
+ that qpdf doesn't uncompress and recompress streams by default. To
+ have this option apply to already compressed streams, you should also
+ specify :samp:`--recompress-flate`. If your goal is
+ to shrink the size of PDF files, you should also use
+ :samp:`--object-streams=generate`.
+
+:samp:`--normalize-content=[yn]`
+ Enables or disables normalization of content streams. Content
+ normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.
+
+:samp:`--object-streams={mode}`
+ Controls handling of object streams. The value of
+ :samp:`{mode}` may be
+ one of the following:
+
+ - :samp:`preserve`: preserve original object streams
+ (default)
+
+ - :samp:`disable`: don't write any object streams
+
+ - :samp:`generate`: use object streams wherever
+ possible
+
+:samp:`--preserve-unreferenced`
+ Tells qpdf to preserve objects that are not referenced when writing
+ the file. Ordinarily any object that is not referenced in a traversal
+ of the document from the trailer dictionary will be discarded. This
+ may be useful in working with some damaged files or inspecting files
+ with known unreferenced objects.
+
+ This flag is ignored for linearized files. Otherwise, it has the effect of
+ causing objects in the new file to be written in order by object ID
+ from the original file. This does not mean that object numbers will
+ be the same since qpdf may create stream lengths as direct or
+ indirect differently from the original file, and the original file
+ may have gaps in its numbering.
+
+ See also :samp:`--preserve-unreferenced-resources`,
+ which does something completely different.
+
+:samp:`--remove-unreferenced-resources={option}`
+ The :samp:`{option}` may be ``auto``,
+ ``yes``, or ``no``. The default is ``auto``.
+
+ Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
+ to remove images and fonts that are not used by a page even if they
+ are referenced in the page's resources dictionary. When shared
+ resources are in use, this behavior can greatly reduce the file sizes
+ of split pages, but the analysis is very slow. In versions from 8.1
+ through 9.1.1, qpdf did this analysis by default. Starting in qpdf
+ 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
+ to determine whether the file is likely to have unreferenced objects
+ on pages, a pattern that frequently occurs when resource dictionaries
+ are shared across multiple pages and rarely occurs otherwise. If it
+ discovers this pattern, then it will attempt to remove unreferenced
+ resources. Usually this means you get the slower splitting speed only
+ when it's actually going to create smaller files. You can suppress
+ removal of unreferenced resources altogether by specifying ``no`` or
+ force it to do the full algorithm by specifying ``yes``.
+
+ Other than cases in which you don't care about file size and care a
+ lot about runtime, there are few reasons to use this option,
+ especially now that ``auto`` mode is supported. One reason to use
+ this is if you suspect that qpdf is removing resources it shouldn't
+ be removing. If you encounter that case, please report it as a bug at
+ https://github.com/qpdf/qpdf/issues/.
+
+:samp:`--preserve-unreferenced-resources`
+ This is a synonym for
+ :samp:`--remove-unreferenced-resources=no`.
+
+ See also :samp:`--preserve-unreferenced`, which does
+ something completely different.
+
+:samp:`--newline-before-endstream`
+ Tells qpdf to insert a newline before the ``endstream`` keyword, not
+ counted in the length, after any stream content even if the last
+ character of the stream was a newline. This may result in two
+ newlines in some cases. This is a requirement of PDF/A. While qpdf
+ doesn't specifically know how to generate PDF/A-compliant PDFs, this
+ at least prevents it from removing compliance on already compliant
+ files.
+
+:samp:`--linearize-pass1={file}`
+ Write the first pass of linearization to the named file. The
+ resulting file is not a valid PDF file. This option is useful only
+ for debugging ``QPDFWriter``'s linearization code. When qpdf
+ linearizes files, it writes the file in two passes, using the first
+ pass to calculate sizes and offsets that are required for hint tables
+ and the linearization dictionary. Ordinarily, the first pass is
+ discarded. This option enables it to be captured.
+
+:samp:`--coalesce-contents`
+ When a page's contents are split across multiple streams, this option
+ causes qpdf to combine them into a single stream. Use of this option
+ is never necessary for ordinary usage, but it can help when working
+ with some files in some cases. For example, this can also be combined
+ with QDF mode or content normalization to make it easier to look at
+ all of a page's contents at once.
+
+:samp:`--flatten-annotations={option}`
+ This option collapses annotations into the pages' contents with
+ special handling for form fields. Ordinarily, an annotation is
+ rendered separately and on top of the page. Combining annotations
+ into the page's contents effectively freezes the placement of the
+ annotations, making them look right after various page
+ transformations. The library functionality backing this option was
+ added for the benefit of programs that want to create *n-up* page
+ layouts and other similar things that don't work well with
+ annotations. The :samp:`{option}` parameter
+ may be any of the following:
+
+ - :samp:`all`: include all annotations that are not
+ marked invisible or hidden
+
+ - :samp:`print`: only include annotations that
+ indicate that they should appear when the page is printed
+
+ - :samp:`screen`: omit annotations that indicate
+ they should not appear on the screen
+
+ Note that form fields are special because the annotations that are
+ used to render filled-in form fields may become out of date from the
+ fields' values if the form is filled in by a program that doesn't
+ know how to update the appearances. If qpdf detects this case, its
+ default behavior is not to flatten those annotations because doing so
+ would cause the value of the form field to be lost. This gives you a
+ chance to go back and resave the form with a program that knows how
+ to generate appearances. QPDF itself can generate appearances with
+ some limitations. See the
+ :samp:`--generate-appearances` option below.
+
+:samp:`--generate-appearances`
+ If a file contains interactive form fields and indicates that the
+ appearances are out of date with the values of the form, this flag
+ will regenerate appearances, subject to a few limitations. Note that
+ there is not usually a reason to do this, but it can be necessary
+ before using the :samp:`--flatten-annotations`
+ option. Most of the limitations are not a problem with well-behaved PDF
+ files. The limitations are as follows:
+
+ - Radio button and checkbox appearances use the pre-set values in
+ the PDF file. QPDF just makes sure that the correct appearance is
+ displayed based on the value of the field. This is fine for PDF
+ files that create their forms properly. Some PDF writers save
+ appearances for fields when they change, which could cause some
+ controls to have inconsistent appearances.
+
+ - For text fields and list boxes, any characters that fall outside
+ of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
+ encoding, will be replaced by the ``?`` character.
+
+ - Quadding is ignored. Quadding is used to specify whether the
+ contents of a field should be left, center, or right aligned with
+ the field.
+
+ - Rich text, multi-line, and other more elaborate formatting
+ directives are ignored.
+
+ - There is no support for multi-select fields or signature fields.
+
+ If qpdf doesn't do a good enough job with your form, use an external
+ application to save your filled-in form before processing it with
+ qpdf.
+
+:samp:`--optimize-images`
+ This flag causes qpdf to recompress all images that are not
+ compressed with DCT (JPEG) using DCT compression as long as doing so
+ decreases the size in bytes of the image data and the image does not
+ fall below minimum specified dimensions. Useful information is
+ provided when used in combination with
+ :samp:`--verbose`. See also the
+ :samp:`--oi-min-width`,
+ :samp:`--oi-min-height`, and
+ :samp:`--oi-min-area` options. By default, starting
+ in qpdf 8.4, inline images are converted to regular images and
+ optimized as well. Use :samp:`--keep-inline-images`
+ to prevent inline images from being included.
+
+:samp:`--oi-min-width={width}`
+ Avoid optimizing images whose width is below the specified amount. If
+ omitted, the default is 128 pixels. Use 0 for no minimum.
+
+:samp:`--oi-min-height={height}`
+ Avoid optimizing images whose height is below the specified amount.
+ If omitted, the default is 128 pixels. Use 0 for no minimum.
+
+:samp:`--oi-min-area={area-in-pixels}`
+ Avoid optimizing images whose pixel count (width × height) is below
+ the specified amount. If omitted, the default is 16,384 pixels. Use 0
+ for no minimum.
+
+:samp:`--externalize-inline-images`
+ Convert inline images to regular images. By default, images whose
+ data is at least 1,024 bytes are converted when this option is
+ selected. Use :samp:`--ii-min-bytes` to change the
+ size threshold. This option is implicitly selected when
+ :samp:`--optimize-images` is selected. Use
+ :samp:`--keep-inline-images` to exclude inline images
+ from image optimization.
+
+:samp:`--ii-min-bytes={bytes}`
+ Avoid converting inline images whose size is below the specified
+ minimum size to regular images. If omitted, the default is 1,024
+ bytes. Use 0 for no minimum.
+
+:samp:`--keep-inline-images`
+ Prevent inline images from being included in image optimization. This
+ option has no effect when :samp:`--optimize-images`
+ is not specified.
+
+:samp:`--remove-page-labels`
+ Remove page labels from the output file.
+
+:samp:`--qdf`
+ Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`
+ disables QDF mode.
+
+:samp:`--min-version={version}`
+ Forces the PDF version of the output file to be at least
+ :samp:`{version}`. In other words, if the
+ input file has a lower version than the specified version, the
+ specified version will be used. If the input file has a higher
+ version, the input file's original version will be used. It is seldom
+ necessary to use this option since qpdf will automatically increase
+ the version as needed when adding features that require newer PDF
+ readers.
+
+ The version number may be expressed in the form
+ :samp:`{major.minor.extension-level}`, in
+ which case the version is interpreted as
+ :samp:`{major.minor}` at extension level
+ :samp:`{extension-level}`. For example,
+ version ``1.7.8`` represents version 1.7 at extension level 8. Note
+ that minimal syntax checking is done on the command line.
+
+:samp:`--force-version={version}`
+ This option forces the PDF version to be the exact version specified
+ *even when the file may have content that is not supported in that
+ version*. The version number is interpreted in the same way as with
+ :samp:`--min-version` so that extension levels can be
+ set. In some cases, forcing the output file's PDF version to be lower
+ than that of the input file will cause qpdf to disable certain
+ features of the document. Specifically, 256-bit keys are disabled if
+ the version is less than 1.7 with extension level 8 (except R5 is
+ disabled if less than 1.7 with extension level 3), AES encryption is
+ disabled if the version is less than 1.6, cleartext metadata and
+ object streams are disabled if less than 1.5, 128-bit encryption keys
+ are disabled if less than 1.4, and all encryption is disabled if less
+ than 1.3. Even with these precautions, qpdf won't be able to do
+ things like eliminate use of newer image compression schemes,
+ transparency groups, or other features that may have been added in
+ more recent versions of PDF.
+
+ As a general rule, with the exception of big structural things like
+ the use of object streams or AES encryption, PDF viewers are supposed
+ to ignore features in files that they don't support from newer
+ versions. This means that forcing the version to a lower version may
+ make it possible to open your PDF file with an older version, though
+ bear in mind that some of the original document's functionality may
+ be lost.
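The version rules for `--min-version` and `--force-version` can be summarized in a small Python sketch. The thresholds are copied from the description above. The function names and the feature labels are assumptions made for this illustration, and the code is a model of the documented behavior rather than qpdf's implementation.

```python
def parse_version(text):
    """Split "major.minor" or "major.minor.extension-level" into three ints."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # no extension level given
    if len(parts) != 3:
        raise ValueError("expected major.minor[.extension-level]: " + text)
    return tuple(parts)


def min_version_result(input_version, requested):
    """--min-version: the output version is the higher of the two."""
    return max(parse_version(input_version), parse_version(requested))


# (minimum version, feature) pairs taken from the --force-version description.
FEATURE_MINIMUMS = [
    ((1, 7, 8), "256-bit keys (other than R5)"),
    ((1, 7, 3), "256-bit keys (R5)"),
    ((1, 6, 0), "AES encryption"),
    ((1, 5, 0), "cleartext metadata"),
    ((1, 5, 0), "object streams"),
    ((1, 4, 0), "128-bit encryption keys"),
    ((1, 3, 0), "all encryption"),
]


def disabled_features(forced_version):
    """--force-version: features dropped when forcing the given version."""
    version = parse_version(forced_version)
    return [name for minimum, name in FEATURE_MINIMUMS if version < minimum]


print(min_version_result("1.4", "1.7.8"))
print(disabled_features("1.4"))
```

Tuples compare element by element, so `(1, 7, 5) < (1, 7, 8)` holds and extension levels are ordered as the text describes.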
+
+By default, when a stream is encoded using non-lossy filters that qpdf
+understands and is not already compressed using a good compression
+scheme, qpdf will uncompress and recompress streams. Assuming proper
+filter implementations, this is safe and generally results in smaller files.
+This behavior may also be explicitly requested with
+:samp:`--stream-data=compress`.
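The equivalences listed under `--stream-data` earlier in this section can be tabulated directly. The Python sketch below rewrites an argument list using that table. It is a convenience illustration only; the names `STREAM_DATA_EQUIVALENTS` and `expand_stream_data` are assumptions for this example.

```python
# Documented equivalents of each --stream-data value.
STREAM_DATA_EQUIVALENTS = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def expand_stream_data(args):
    """Replace each --stream-data=VALUE argument with its equivalent options."""
    expanded = []
    for arg in args:
        if arg.startswith("--stream-data="):
            value = arg.split("=", 1)[1]
            expanded.extend(STREAM_DATA_EQUIVALENTS[value])
        else:
            expanded.append(arg)
    return expanded


print(expand_stream_data(["qpdf", "--stream-data=uncompress", "in.pdf", "out.pdf"]))
```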
+
+When :samp:`--normalize-content=y` is specified, qpdf
+will attempt to normalize whitespace and newlines in page content
+streams. This is generally safe but could, in some cases, cause damage
+to the content streams. This option is intended for people who wish to
+study PDF content streams or to debug PDF content. You should not use
+this for "production" PDF files.
+
+When normalizing content, if qpdf runs into any lexical errors, it will
+print a warning indicating that content may be damaged. The only
+situation in which qpdf is known to cause damage during content
+normalization is when a page's contents are split across multiple
+streams and streams are split in the middle of a lexical token such as a
+string, name, or inline image. Note that files that do this are invalid
+since the PDF specification states that content streams are not to be
+split in the middle of a token. If you want to inspect the original
+content streams in an uncompressed format, you can always run with
+:samp:`--qdf --normalize-content=n` for a QDF file
+without content normalization, or alternatively
+:samp:`--stream-data=uncompress` for a regular non-QDF
+mode file with uncompressed streams. These will both uncompress all the
+streams but will not attempt to normalize content. Please note that if
+you are using content normalization or QDF mode for the purpose of
+manually inspecting files, you don't have to care about this.
+
+Object streams, also known as compressed objects, were introduced into
+the PDF specification at version 1.5, corresponding to Acrobat 6. Some
+older PDF viewers may not support files with object streams. qpdf can be
+used to transform files with object streams to files without object
+streams or vice versa. As mentioned above, there are three object stream
+modes: :samp:`preserve`,
+:samp:`disable`, and :samp:`generate`.
+
+In :samp:`preserve` mode, the relationship between objects
+and the streams that contain them is preserved from the original file.
+In :samp:`disable` mode, all objects are written as
+regular, uncompressed objects. The resulting file should be readable by
+older PDF viewers. (Of course, the content of the files may include
+features not supported by older viewers, but at least the structure will
+be supported.) In :samp:`generate` mode, qpdf will
+create its own object streams. This will usually result in more compact
+PDF files, though they may not be readable by older viewers. In this
+mode, qpdf will also make sure the PDF version number in the header is
+at least 1.5.
+
+The :samp:`--qdf` flag turns on QDF mode, which changes
+some of the defaults described above. Specifically, in QDF mode, by
+default, stream data is uncompressed, content streams are normalized,
+and encryption is removed. These defaults can still be overridden by
+specifying the appropriate options as described above. Additionally, in
+QDF mode, stream lengths are stored as indirect objects, objects are
+laid out in a less efficient but more readable fashion, and the
+documents are interspersed with comments that make it easier for the
+user to find things and also make it possible for
+:command:`fix-qdf` to work properly. QDF mode is intended
+for people, mostly developers, who wish to inspect or modify PDF files
+in a text editor. For details, please see :ref:`ref.qdf`.
+
+.. _ref.testing-options:
+
+Testing, Inspection, and Debugging Options
+------------------------------------------
+
+These options can be useful for digging into PDF files or for use in
+automated test suites for software that uses the qpdf library. When any
+of the options in this section are specified, no output file should be
+given. The following options are available:
+
+:samp:`--deterministic-id`
+ Causes generation of a deterministic value for /ID. This prevents use
+ of timestamp and output file name information in the /ID generation.
+ Instead, at some slight additional runtime cost, the /ID field is
+ generated to include a digest of the significant parts of the content
+ of the output PDF file. This means that a given qpdf operation should
+ generate the same /ID each time it is run, which can be useful when
+ caching results or for generation of some test data. Use of this flag
+ is not compatible with creation of encrypted files.
+
+:samp:`--static-id`
+ Causes generation of a fixed value for /ID. This is intended for
+ testing only. Never use it for production files. If you are trying to
+ get the same /ID each time for a given file and you are not
+ generating encrypted files, consider using the
+ :samp:`--deterministic-id` option.
+
+:samp:`--static-aes-iv`
+ Causes use of a static initialization vector for AES-CBC. This is
+ intended for testing only so that output files can be reproducible.
+ Never use it for production files. This option in particular is not
+ secure since it significantly weakens the encryption.
+
+:samp:`--no-original-object-ids`
+ Suppresses inclusion of original object ID comments in QDF files.
+ This can be useful when generating QDF files for test purposes,
+ particularly when comparing them to determine whether two PDF files
+ have identical content.
+
+:samp:`--show-encryption`
+ Shows document encryption parameters. Also shows the document's user
+ password if the owner password is given.
+
+:samp:`--show-encryption-key`
+ When encryption information is being displayed, as when
+ :samp:`--check` or
+ :samp:`--show-encryption` is given, display the
+ computed or retrieved encryption key as a hexadecimal string. This
+ value is not ordinarily useful to users, but it can be used as the
+ argument to :samp:`--password` if the
+ :samp:`--password-is-hex-key` option is specified. Note
+ that, when PDF files are encrypted, passwords and other metadata are
+ used only to compute an encryption key, and the encryption key is
+ what is actually used for encryption. This enables retrieval of that
+ key.
+
+:samp:`--check-linearization`
+ Checks file integrity and linearization status.
+
+:samp:`--show-linearization`
+ Checks and displays all data in the linearization hint tables.
+
+:samp:`--show-xref`
+ Shows the contents of the cross-reference table in a human-readable
+ form. This is especially useful for files with cross-reference
+ streams which are stored in a binary format.
+
+:samp:`--show-object=trailer|obj[,gen]`
+ Show the contents of the given object. This is especially useful for
+ inspecting objects that are inside of object streams (also known as
+ "compressed objects").
+
+:samp:`--raw-stream-data`
+ When used along with the :samp:`--show-object`
+ option, if the object is a stream, shows the raw stream data instead
+ of the object's contents.
+
+:samp:`--filtered-stream-data`
+ When used along with the :samp:`--show-object`
+ option, if the object is a stream, shows the filtered stream data
+ instead of the object's contents. If the stream is filtered using filters
+ that qpdf does not support, an error will be issued.
+
+:samp:`--show-npages`
+ Prints the number of pages in the input file on a line by itself.
+ Since the number of pages appears by itself on a line, this option
+ can be useful for scripting if you need to know the number of pages
+ in a file.
+
+:samp:`--show-pages`
+ Shows the object and generation number for each page dictionary
+ object and for each content stream associated with the page. Having
+ this information makes it more convenient to inspect objects from a
+ particular page.
+
+:samp:`--with-images`
+ When used along with :samp:`--show-pages`, also shows
+ the object and generation numbers for the image objects on each page.
+ (At present, information about images in shared resource dictionaries
+ is not output by this command. This is discussed in a comment in the
+ source code.)
+
+:samp:`--json`
+ Generate a JSON representation of the file. This is described in
+ depth in :ref:`ref.json`.
+
+:samp:`--json-help`
+ Describe the format of the JSON output.
+
+:samp:`--json-key=key`
+ This option is repeatable. If specified, only top-level keys
+ specified will be included in the JSON output. If not specified, all
+ keys will be shown.
+
+:samp:`--json-object=trailer|obj[,gen]`
+ This option is repeatable. If specified, only specified objects will
+ be shown in the "``objects``" key of the JSON output. If absent, all
+ objects will be shown.
+
+:samp:`--check`
+ Checks file structure as well as encryption, linearization, and
+ encoding of stream data. A file for which
+ :samp:`--check` reports no errors may still have
+ errors in stream data content but should otherwise be structurally
+ sound. If :samp:`--check` finds any errors, qpdf will exit
+ with a status of 2. There are some recoverable conditions that
+ :samp:`--check` detects. These are issued as warnings
+ instead of errors. If qpdf finds no errors but finds warnings, it
+ will exit with a status of 3 (as of version 2.0.4). When
+ :samp:`--check` is combined with other options,
+ checks are always performed before any other options are processed.
+ For erroneous files, :samp:`--check` will cause qpdf
+ to attempt to recover, after which other options are effectively
+ operating on the recovered file. Combining
+ :samp:`--check` with other options in this way can be
+ useful for manually recovering severely damaged files. Note that
+ :samp:`--check` produces no output to standard output
+ when everything is valid, so if you are using this to
+ programmatically validate files in bulk, it is safe to run without
+ output redirected to :file:`/dev/null` and just
+ check for a 0 exit code.
+
+The :samp:`--raw-stream-data` and
+:samp:`--filtered-stream-data` options are ignored
+unless :samp:`--show-object` is given. Either of these
+options will cause the stream data to be written to standard output. In
+order to avoid commingling of stream data with other output, it is
+recommended that these options not be combined with other test/inspection
+options.
+
+If :samp:`--filtered-stream-data` is given and
+:samp:`--normalize-content=y` is also given, qpdf will
+attempt to normalize the stream data as if it is a page content stream.
+This attempt will be made even if it is not a page content stream, in
+which case it will produce unusable results.
+
+.. _ref.unicode-passwords:
+
+Unicode Passwords
+-----------------
+
+At the library API level, all methods that perform encryption and
+decryption interpret passwords as strings of bytes. It is up to the
+caller to ensure that they are appropriately encoded. Starting with qpdf
+version 8.4.0, qpdf will attempt to make this easier for you when you
+interact with qpdf via its command line interface. The PDF specification
+requires passwords used to encrypt files with 40-bit or 128-bit
+encryption to be encoded with PDF Doc encoding. This encoding is a
+single-byte encoding that supports ISO-Latin-1 and a handful of other
+commonly used characters. It has a large overlap with Windows ANSI but
+is not exactly the same. There is generally not a way to provide PDF Doc
+encoded strings on the command line. As such, qpdf versions prior to
+8.4.0 would often create PDF files that couldn't be opened with other
+software when given a password with non-ASCII characters to encrypt a
+file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
+recognizes the encoding of the parameter and transcodes it as needed.
+The rest of this section provides the details about exactly how qpdf
+behaves. Most users will not need to know this information, but it might
+be useful if you have been working around qpdf's old behavior or if you
+are using qpdf to generate encrypted files for testing other PDF
+software.
+
+A note about Windows: when qpdf builds, it attempts to determine what it
+has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
+function is an alternative entry point that receives all arguments as
+UTF-16-encoded strings. When qpdf starts up this way, it converts all
+the strings to UTF-8 encoding and then invokes the regular main. This
+means that, as far as qpdf is concerned, it receives its command-line
+arguments with UTF-8 encoding, just as it would in any modern Linux or
+UNIX environment.
+
+If a file is being encrypted with 40-bit or 128-bit encryption and the
+supplied password is not a valid UTF-8 string, qpdf will fall back to
+the behavior of interpreting the password as a string of bytes. If you
+have old scripts that encrypt files by passing the output of
+:command:`iconv` to qpdf, you no longer need to do that,
+but if you do, qpdf should still work. The only exception would be for
+the extremely unlikely case of a password that is encoded with a
+single-byte encoding but also happens to be valid UTF-8. Such a password
+would contain strings of even numbers of characters that alternate
+between accented letters and symbols. In the extremely unlikely event
+that you are intentionally using such passwords and qpdf is thwarting
+you by interpreting them as UTF-8, you can use
+:samp:`--password-mode=bytes` to suppress qpdf's
+automatic behavior.
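+
+The fallback described above depends on deciding whether the supplied
+bytes form valid UTF-8. The following is a minimal sketch of such a
+check in standard C++, not qpdf's actual code. The function name
+``looks_like_utf8`` is hypothetical, and the check is deliberately
+simplified: it verifies only lead-byte and continuation-byte structure
+and does not reject overlong forms or surrogate code points.
+

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf: returns true if `s` has the
// byte structure of UTF-8. Simplified: overlong encodings and
// surrogates are not rejected.
bool looks_like_utf8(std::string const& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0; // number of continuation bytes expected
        if (c < 0x80) {
            extra = 0; // plain ASCII
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + extra >= s.size()) {
            return false; // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80) {
                return false; // expected a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```
+
+With this sketch, the two-byte sequence ``C3 A9`` is accepted as UTF-8,
+while the single byte ``E9`` (the same letter in a single-byte
+encoding) is rejected and would be treated as raw bytes. The sequence
+``C3 A9`` is also an accented letter followed by a symbol in a
+single-byte encoding, which is the ambiguous case the paragraph above
+describes.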
+
+The :samp:`--password-mode` option, as described earlier
+in this chapter, can be used to change qpdf's interpretation of supplied
+passwords. There are very few reasons to use this option. One would be
+the unlikely case described in the previous paragraph in which the
+supplied password happens to be valid UTF-8 but isn't supposed to be
+UTF-8. Your best bet would be just to provide the password as a valid
+UTF-8 string, but you could also use
+:samp:`--password-mode=bytes`. Another reason to use
+:samp:`--password-mode=bytes` would be to intentionally
+generate PDF files encrypted with passwords that are not properly
+encoded. The qpdf test suite does this to generate invalid files for the
+purpose of testing its password recovery capability. If you were trying
+to create intentionally incorrect files for a similar purpose, the
+:samp:`bytes` password mode can enable you to do this.
+
+When qpdf attempts to decrypt a file with a password that contains
+non-ASCII characters, it will generate a list of alternative passwords
+by attempting to interpret the password as each of a handful of
+different coding systems and then transcode them to the required format.
+This helps to compensate for the supplied password being given in the
+wrong coding system, such as would happen if you used the
+:command:`iconv` workaround that was previously needed.
+It also generates passwords by doing the reverse operation: translating
+from correct to incorrect encoding of the password. This would enable
+qpdf to decrypt files using passwords that were improperly encoded by
+whatever software encrypted the files, including older versions of qpdf
+invoked without properly encoded passwords. The combination of these two
+recovery methods should make qpdf transparently open most encrypted
+files with the password supplied correctly but in the wrong coding
+system. There are no real downsides to this behavior, but if you don't
+want qpdf to do this, you can use the
+:samp:`--suppress-password-recovery` option. One reason
+to do that is to ensure that you know the exact password that was used
+to encrypt the file.
+
+With these changes, qpdf now generates compliant passwords in most
+cases. There are still some exceptions. In particular, the PDF
+specification directs compliant writers to normalize Unicode passwords
+and to perform certain transformations on passwords with bidirectional
+text. Implementing this functionality requires using a real Unicode
+library like ICU. If a client application that uses qpdf wants to do
+this, the qpdf library will accept the resulting passwords, but qpdf
+will not perform these transformations itself. It is possible that this
+will be addressed in a future version of qpdf. The ``QPDFWriter``
+methods that enable encryption on the output file accept passwords as
+strings of bytes.
+
+Please note that the :samp:`--password-is-hex-key`
+option is unrelated to all this. This flag bypasses the normal process
+of going from password to encryption key entirely, allowing the raw
+encryption key to be specified directly. This is useful for forensic
+purposes or for brute-force recovery of files with unknown passwords.
diff --git a/manual/conf.py b/manual/conf.py
index fdfffe7f..be8357d6 100644
--- a/manual/conf.py
+++ b/manual/conf.py
@@ -11,4 +11,7 @@ project = 'QPDF'
copyright = '2005-2021, Jay Berkenbilt'
author = 'Jay Berkenbilt'
release = '10.4.0'
-html_theme = 'alabaster'
+html_theme = 'agogo'
+html_theme_options = {
+ "body_max_width": None,
+}
diff --git a/manual/design.rst b/manual/design.rst
new file mode 100644
index 00000000..73122943
--- /dev/null
+++ b/manual/design.rst
@@ -0,0 +1,747 @@
+.. _ref.design:
+
+Design and Library Notes
+========================
+
+.. _ref.design.intro:
+
+Introduction
+------------
+
+This section was written prior to the implementation of the qpdf package
+and was subsequently modified to reflect the implementation. In some
+cases, for purposes of explanation, it may differ slightly from the
+actual implementation. As always, the source code and test suite are
+authoritative. Even if there are some errors, this document should serve
+as a road map to understanding how this code works.
+
+In general, one should adhere strictly to a specification when writing
+but be liberal in reading. This way, the product of our software will be
+accepted by the widest range of other programs, and we will accept the
+widest range of input files. This library attempts to conform to that
+philosophy whenever possible but also aims to provide strict checking
+for people who want to validate PDF files. If you don't want to see
+warnings and are trying to write something that is tolerant, you can
+call ``setSuppressWarnings(true)``. If you want to fail on the first
+error, you can call ``setAttemptRecovery(false)``. The default behavior
+is to generate warnings for recoverable problems. Note that recovery
+will not always produce the desired results even if it is able to get
+through the file. Unlike most other PDF software, which produces generic
+warnings such as "This file is damaged," qpdf generally issues a
+detailed error message that would be most useful to a PDF developer.
+This is by design as there seems to be a shortage of PDF validation
+tools out there. This was, in fact, one of the major motivations behind
+the initial creation of qpdf.
+
+.. _ref.design-goals:
+
+Design Goals
+------------
+
+The QPDF package includes support for reading and rewriting PDF files.
+It aims to hide from the user details involving object locations,
+modified (appended) PDF files, the directness/indirectness of objects,
+and stream filters including encryption. It does not aim to hide
+knowledge of the object hierarchy or content stream contents. Put
+another way, a user of the qpdf library is expected to have knowledge
+about how PDF files work, but is not expected to have to keep track of
+bookkeeping details such as file positions.
+
+A user of the library never has to care whether an object is direct or
+indirect, though it is possible to determine whether an object is direct
+or not if this information is needed. All access to objects deals with
+this transparently. All memory management details are also handled by
+the library.
+
+The ``PointerHolder`` object is used internally by the library to deal
+with memory management. This is basically a smart pointer object very
+similar in spirit to C++11's ``std::shared_ptr`` object, but predating
+it by several years. This library also makes use of a technique for
+giving fine-grained access to methods in one class to other classes by
+using public nested classes with friends and only private members that in
+turn call private methods of the containing class. See
+``QPDFObjectHandle::Factory`` as an example.
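+
+The access-control technique mentioned above can be sketched in
+standard C++ as follows. This is an illustration only: ``Document``,
+``Renamer``, and ``Registry`` are hypothetical names invented for this
+example and are not taken from the qpdf sources.
+

```cpp
#include <cassert>
#include <string>

class Registry; // the only class that will be granted access

class Document
{
  public:
    // Public nested class with only private members. Because Registry
    // is its sole friend, only Registry can reach Document::setName.
    class Renamer
    {
        friend class Registry;

      private:
        static void rename(Document& d, std::string const& name)
        {
            // A nested class may call private methods of the
            // containing class.
            d.setName(name);
        }
    };

    std::string const& name() const { return name_; }

  private:
    void setName(std::string const& n) { name_ = n; }
    std::string name_{"untitled"};
};

class Registry
{
  public:
    static void assign(Document& d, std::string const& name)
    {
        Document::Renamer::rename(d, name);
    }
};
```
+
+In this sketch, ``Registry::assign`` can rename a ``Document``, but any
+other class that tries to call ``Document::Renamer::rename`` or
+``Document::setName`` fails to compile. The containing class exposes
+one private method to one other class without making that class a
+friend of everything it contains.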
+
+The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
+file. The library provides methods for both accessing and mutating PDF
+files.
+
+The primary class for interacting with PDF objects is
+``QPDFObjectHandle``. Instances of this class can be passed around by
+value, copied, stored in containers, etc. with very low overhead.
+Instances of ``QPDFObjectHandle`` created by reading from a file will
+always contain a reference back to the ``QPDF`` object from which they
+were created. A ``QPDFObjectHandle`` may be direct or indirect. If
+indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to
+is a null pointer. In this case, the first attempt to access the
+underlying ``QPDFObject`` will result in the ``QPDFObject`` being
+resolved via a call to the referenced ``QPDF`` instance. This makes it
+essentially impossible to make coding errors in which certain things
+will work for some PDF files and not for others based on which objects
+are direct and which objects are indirect.
+
+Instances of ``QPDFObjectHandle`` can be directly created and modified
+using static factory methods in the ``QPDFObjectHandle`` class. There
+are factory methods for each type of object as well as a convenience
+method ``QPDFObjectHandle::parse`` that creates an object from a string
+representation of the object. Existing instances of ``QPDFObjectHandle``
+can also be modified in several ways. See comments in
+:file:`QPDFObjectHandle.hh` for details.
+
+An instance of ``QPDF`` is constructed by using the class's default
+constructor. If desired, the ``QPDF`` object may be configured with
+various methods that change its default behavior. Then the
+``QPDF::processFile()`` method is passed the name of a PDF file, which
+permanently associates the file with that QPDF object. A password may
+also be given for access to password-protected files. QPDF does not
+enforce encryption parameters and will treat user and owner passwords
+equivalently. Either password may be used to access an encrypted file.
+``QPDF`` will allow recovery of a user password given an owner password.
+The input PDF file must be seekable. (Output files written by
+``QPDFWriter`` need not be seekable, even when creating linearized
+files.) During construction, ``QPDF`` validates the PDF file's header,
+and then reads the cross reference tables and trailer dictionaries. The
+``QPDF`` class keeps only the first trailer dictionary though it does
+read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
+may request the root object and the trailer dictionary specifically. The
+cross reference table is kept private. Objects may then be requested by
+number or by walking the object tree.
+
+When a PDF file has a cross-reference stream instead of a
+cross-reference table and trailer, requesting the document's trailer
+dictionary returns the stream dictionary from the cross-reference stream
+instead.
+
+There are some convenience routines for very common operations such as
+walking the page tree and returning a vector of all page objects. For
+full details, please see the header files
+:file:`QPDF.hh` and
+:file:`QPDFObjectHandle.hh`. There are also some
+additional helper classes that provide higher level API functions for
+certain document constructions. These are discussed in :ref:`ref.helper-classes`.
+
+.. _ref.helper-classes:
+
+Helper Classes
+--------------
+
+QPDF version 8.1 introduced the concept of helper classes. Helper
+classes are intended to contain higher level APIs that allow developers
+to work with certain document constructs at an abstraction level above
+that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
+not hiding document structure from the developer. As with qpdf in
+general, the goal is to take away some of the more tedious bookkeeping
+aspects of working with PDF files, not to remove the need for the
+developer to understand how the PDF construction in question works. The
+driving factor behind the creation of helper classes was to allow the
+evolution of higher level interfaces in qpdf without polluting the
+interfaces of the main top-level classes ``QPDF`` and
+``QPDFObjectHandle``.
+
+There are two kinds of helper classes: *document* helpers and *object*
+helpers. Document helpers are constructed with a reference to a ``QPDF``
+object and provide methods for working with structures that are at the
+document level. Object helpers are constructed with an instance of a
+``QPDFObjectHandle`` and provide methods for working with specific types
+of objects.
+
+Examples of document helpers include ``QPDFPageDocumentHelper``, which
+contains methods for operating on the document's page trees, such as
+enumerating all pages of a document and adding and removing pages; and
+``QPDFAcroFormDocumentHelper``, which contains document-level methods
+related to interactive forms, such as enumerating form fields and
+creating mappings between form fields and annotations.
+
+Examples of object helpers include ``QPDFPageObjectHelper`` for
+performing operations on pages such as page rotation and some operations
+on content streams, ``QPDFFormFieldObjectHelper`` for performing
+operations related to interactive form fields, and
+``QPDFAnnotationObjectHelper`` for working with annotations.
+
+It is always possible to retrieve the underlying ``QPDF`` reference from
+a document helper and the underlying ``QPDFObjectHandle`` reference from
+an object helper. Helpers are designed to be helpers, not wrappers. The
+intention is that, in general, it is safe to freely intermix operations
+that use helpers with operations that use the underlying objects.
+Document and object helpers do not attempt to provide a complete
+interface for working with the things they are helping with, nor do they
+attempt to encapsulate underlying structures. They just provide a few
+methods to help with error-prone, repetitive, or complex tasks. In some
+cases, a helper object may cache some information that is expensive to
+gather. In such cases, the helper classes are implemented so that their
+own methods keep the cache consistent, and the header file will provide
+a method to invalidate the cache and a description of what kinds of
+operations would make the cache invalid. If in doubt, you can always
+discard a helper class and create a new one with the same underlying
+objects, which will ensure that you have discarded any stale
+information.
+
+By convention, document helpers are called
+``QPDFSomethingDocumentHelper`` and are derived from
+``QPDFDocumentHelper``, and object helpers are called
+``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
+For details on specific helpers, please see their header files. You can
+find them by looking at
+:file:`include/qpdf/QPDF*DocumentHelper.hh` and
+:file:`include/qpdf/QPDF*ObjectHelper.hh`.
+
+In order to avoid creation of circular dependencies, the following
+general guidelines are followed with helper classes:
+
+- Core class interfaces do not know about helper classes. For example,
+ no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
+ classes in their interfaces.
+
+- Interfaces of object helpers will usually not use document helpers in
+ their interfaces. This is because it is much more useful for document
+ helpers to have methods that return object helpers. Most operations
+ in PDF files start at the document level and go from there to the
+ object level rather than the other way around. It can sometimes be
+ useful to map back from object-level structures to document-level
+ structures. If there is a desire to do this, it will generally be
+ provided by a method in the document helper class.
+
+- Most of the time, object helpers don't know about other object
+ helpers. However, in some cases, one type of object may be a
+ container for another type of object, in which case it may make sense
+ for the outer object to know about the inner object. For example,
+ there are methods in the ``QPDFPageObjectHelper`` that know
+ ``QPDFAnnotationObjectHelper`` because references to annotations are
+ contained in page dictionaries.
+
+- Any helper or core library class may use helpers in their
+ implementations.
+
+Prior to qpdf version 8.1, higher level interfaces were added as
+"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
+compatibility, older convenience functions for operating with pages will
+remain in those classes even as alternatives are provided in helper
+classes. Going forward, new higher level interfaces will be provided
+using helper classes.
+
+.. _ref.implementation-notes:
+
+Implementation Notes
+--------------------
+
+This section contains a few notes about QPDF's internal implementation,
+particularly around what it does when it first processes a file. This
+section is a bit of a simplification of what it actually does, but it
+could serve as a starting point to someone trying to understand the
+implementation. There is nothing in this section that you need to know
+to use the qpdf library.
+
+``QPDFObject`` is the basic PDF Object class. It is an abstract base
+class from which are derived classes for each type of PDF object.
+Clients do not interact with Objects directly but instead interact with
+``QPDFObjectHandle``.
+
+When the ``QPDF`` class creates a new object, it dynamically allocates
+the appropriate type of ``QPDFObject`` and immediately hands the pointer
+to an instance of ``QPDFObjectHandle``. The parser reads a token from
+the current file position. If the token is not either a dictionary or
+array opener, an object is immediately constructed from the single token
+and the parser returns. Otherwise, the parser iterates in a special mode
+in which it accumulates objects until it finds a balancing closer.
+During this process, the "``R``" keyword is recognized and an indirect
+``QPDFObjectHandle`` may be constructed.
+
+The ``QPDF::resolve()`` method, which is used to resolve an indirect
+object, may be invoked from the ``QPDFObjectHandle`` class. It first
+checks a cache to see whether this object has already been read. If not,
+it reads the object from the PDF file and caches it. It then returns the
+resulting ``QPDFObjectHandle``. The calling object handle then replaces
+its ``PointerHolder<QPDFObject>`` with the one from the newly returned
+``QPDFObjectHandle``. In this way, only a single copy of any direct
+object need exist and clients can access objects transparently without
+knowing or caring whether they are direct or indirect objects.
+Additionally, no object is ever read from the file more than once. That
+means that only the portions of the PDF file that are actually needed
+are ever read from the input file, thus allowing the qpdf package to
+take advantage of this important design goal of PDF files.
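+
+The resolve-and-cache behavior described above can be sketched as
+follows. This is a simplified model under stated assumptions, not
+qpdf's implementation: ``MiniDoc``, ``Handle``, and ``Obj`` are
+hypothetical names, ``std::shared_ptr`` stands in for
+``PointerHolder``, and "reading from the file" is simulated by
+building a string and counting the read.
+

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// All names below are hypothetical stand-ins for illustration.
struct Obj
{
    std::string value;
};

using ObjGen = std::pair<int, int>; // (object number, generation)

class MiniDoc
{
  public:
    int reads = 0; // how many times an object was "read from the file"

    std::shared_ptr<Obj> resolve(ObjGen og)
    {
        auto it = cache_.find(og);
        if (it != cache_.end()) {
            return it->second; // already read: reuse the cached object
        }
        ++reads; // simulate reading the object at its file offset
        auto obj = std::make_shared<Obj>(
            Obj{"object " + std::to_string(og.first)});
        cache_[og] = obj;
        return obj;
    }

  private:
    std::map<ObjGen, std::shared_ptr<Obj>> cache_;
};

class Handle
{
  public:
    Handle(MiniDoc& doc, ObjGen og) : doc_(&doc), og_(og) {}

    std::string const& value()
    {
        if (!obj_) {
            // First access: ask the owning document to resolve, then
            // keep the shared pointer so later accesses are direct.
            obj_ = doc_->resolve(og_);
        }
        return obj_->value;
    }

  private:
    MiniDoc* doc_;
    ObjGen og_;
    std::shared_ptr<Obj> obj_; // null until first access
};
```
+
+Two handles that refer to the same object number end up sharing one
+``Obj``, and ``MiniDoc::reads`` stays at 1 no matter how many times
+either handle is accessed. Nothing is read at all until the first
+access.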
+
+If the requested object is inside of an object stream, the object stream
+itself is first read into memory. Then the tokenizer reads objects from
+the memory stream based on the offset information stored in the stream.
+Those individual objects are cached, after which the temporary buffer
+holding the object stream contents is discarded. In this way, the first
+time an object in an object stream is requested, all objects in the
+stream are cached.
+
+The following example should clarify how ``QPDF`` processes a simple
+file.
+
+- Client constructs ``QPDF`` ``pdf`` and calls
+ ``pdf.processFile("a.pdf");``.
+
+- The ``QPDF`` class checks the beginning of
+ :file:`a.pdf` for a PDF header. It then reads the
+ cross reference table mentioned at the end of the file, ensuring that
+ it is looking before the last ``%%EOF``. After getting to the ``trailer``
+ keyword, it invokes the parser.
+
+- The parser sees "``<<``", so it calls itself recursively in
+ dictionary creation mode.
+
+- In dictionary creation mode, the parser keeps accumulating objects
+ until it encounters "``>>``". Each object that is read is pushed onto
+ a stack. If "``R``" is read, the last two objects on the stack are
+ inspected. If they are integers, they are popped off the stack and
+ their values are used to construct an indirect object handle which is
+ then pushed onto the stack. When "``>>``" is finally read, the stack
+ is converted into a ``QPDF_Dictionary`` which is placed in a
+ ``QPDFObjectHandle`` and returned.
+
+- The resulting dictionary is saved as the trailer dictionary.
+
+- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that
+ point and repeats except that the new trailer dictionary is not
+ saved. If ``/Prev`` is not present, the initial parsing process is
+ complete.
+
+ If there is an encryption dictionary, the document's encryption
+ parameters are initialized.
+
+- The client requests the root object. The ``QPDF`` class gets the value of
+ the root key from the trailer dictionary and returns it. It is an unresolved
+ indirect ``QPDFObjectHandle``.
+
+- The client requests the ``/Pages`` key from the root
+ ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
+ indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
+ object cache for an object with the root dictionary's object ID and
+ generation number. Upon not seeing it, it checks the cross reference
+ table, gets the offset, and reads the object present at that offset.
+ It stores the result in the object cache and returns the cached
+ result. The calling ``QPDFObjectHandle`` replaces its object pointer
+ with the one from the resolved ``QPDFObjectHandle``, verifies that it
+ is a valid dictionary object, and returns the (unresolved indirect)
+ ``QPDFObjectHandle`` to the top of the Pages hierarchy.
+
+ As the client continues to request objects, the same process is
+ followed for each new requested object.
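+
+The stack handling in dictionary creation mode, as walked through in
+the list above, can be sketched roughly as follows. This is a toy
+model rather than qpdf's parser: input is assumed to be already split
+into string tokens, ``Item`` and ``accumulate`` are hypothetical names,
+and the final conversion of the stack into a dictionary is omitted.
+

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical item type for this sketch.
struct Item
{
    enum Kind { integer, reference, other };
    Kind kind;
    int a; // integer value, or object number for a reference
    int b; // generation number for a reference, otherwise 0
    std::string text;
};

static bool is_integer(std::string const& tok)
{
    if (tok.empty()) {
        return false;
    }
    for (char ch : tok) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) {
            return false;
        }
    }
    return true;
}

// Accumulate items until ">>" is seen. When "R" follows two integers,
// pop them and push a single indirect reference in their place.
std::vector<Item> accumulate(std::vector<std::string> const& tokens)
{
    std::vector<Item> stack;
    for (auto const& tok : tokens) {
        if (tok == ">>") {
            break;
        }
        if (tok == "R" && stack.size() >= 2 &&
            stack[stack.size() - 1].kind == Item::integer &&
            stack[stack.size() - 2].kind == Item::integer) {
            int gen = stack.back().a;
            stack.pop_back();
            int obj = stack.back().a;
            stack.pop_back();
            stack.push_back(Item{Item::reference, obj, gen, ""});
        } else if (is_integer(tok)) {
            stack.push_back(Item{Item::integer, std::stoi(tok), 0, tok});
        } else {
            stack.push_back(Item{Item::other, 0, 0, tok});
        }
    }
    return stack;
}
```
+
+For the token sequence ``/Root 1 0 R /Size 7 >>``, this sketch leaves
+four items on the stack: the name ``/Root``, a reference to object 1
+generation 0, the name ``/Size``, and the integer 7. Pairing these up
+as keys and values would then give the dictionary.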
+
+.. _ref.casting:
+
+Casting Policy
+--------------
+
+This section describes the casting policy followed by qpdf's
+implementation. This is of no concern to qpdf's end users and largely of no
+concern to people writing code that uses qpdf, but it could be of
+interest to people who are porting qpdf to a new platform or who are
+making modifications to the code.
+
+The C++ code in qpdf is free of old-style casts except where unavoidable
+(e.g. where the old-style cast is in a macro provided by a third-party
+header file). When there is a need for a cast, it is handled, in order
+of preference, by rewriting the code to avoid the need for a cast,
+calling ``const_cast``, calling ``static_cast``, calling
+``reinterpret_cast``, or calling some combination of the above. As a
+last resort, a compiler-specific ``#pragma`` may be used to suppress a
+warning that we don't want to fix. Examples may include suppressing
+warnings about the use of old-style casts in code that is shared between
+C and C++ code.
+
+The ``QIntC`` namespace, provided by
+:file:`include/qpdf/QIntC.hh`, implements safe
+functions for converting between integer types. These functions do range
+checking and throw a ``std::range_error``, which is a subclass of
+``std::runtime_error``, if conversion from one integer type to another
+results in loss of information. There are many cases in which we have to
+move between different integer types because of incompatible integer
+types used in interoperable interfaces. Some are unavoidable, such as
+moving between sizes and offsets, and others are there because of old
+code that is too entrenched to be fixable without breaking source
+compatibility and causing pain for users. QPDF is compiled with extra
+warnings to detect conversions with potential data loss, and all such
+cases should be fixed by either using a function from ``QIntC`` or a
+``static_cast``.
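+
+To illustrate the idea of a range-checked integer conversion, here is
+a minimal sketch in standard C++. It does not reproduce the ``QIntC``
+interface: ``checked_to_int`` and ``checked_to_size`` are hypothetical
+names chosen for this example, and only two fixed type pairs are
+covered.
+

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helpers, not the QIntC API.

// Convert to int, throwing if the value would not survive the trip.
int checked_to_int(long long v)
{
    if (v < std::numeric_limits<int>::min() ||
        v > std::numeric_limits<int>::max()) {
        throw std::range_error("value does not fit in int");
    }
    return static_cast<int>(v);
}

// Convert an offset-like signed value to a size.
std::size_t checked_to_size(long long v)
{
    if (v < 0) {
        throw std::range_error("negative value cannot be a size");
    }
    // Every non-negative long long fits in unsigned long long, so this
    // comparison is made between two unsigned values.
    if (static_cast<unsigned long long>(v) >
        std::numeric_limits<std::size_t>::max()) {
        throw std::range_error("value does not fit in size_t");
    }
    return static_cast<std::size_t>(v);
}
```
+
+A plain ``static_cast`` in either place would compile without
+complaint and silently produce a different number for out-of-range
+input. The checked versions turn that case into an exception that the
+caller can see.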
+
+When the intention is just to switch the type because of exchanging data
+between incompatible interfaces, use ``QIntC``. This is the usual case.
+However, there are some cases in which we are explicitly intending to
+use the exact same bit pattern with a different type. This is most
+common when switching between signed and unsigned characters. A lot of
+qpdf's code uses unsigned characters internally, but ``std::string`` and
+``char`` are signed. Using ``QIntC::to_char`` would be wrong for
+converting from unsigned to signed characters because a negative
+``char`` value and the corresponding ``unsigned char`` value greater
+than 127 *mean the same thing*. There are also
+cases in which we use ``static_cast`` when working with bit fields where
+we are not representing a numerical value but rather a bunch of bits
+packed together in some integer type. Also note that ``size_t`` and
+``long`` both typically differ between 32-bit and 64-bit environments,
+so sometimes an explicit cast may not be needed to avoid warnings on one
+platform but may be needed on another. A conversion with ``QIntC``
+should always be used when the types are different even if the
+underlying size is the same. qpdf's CI build runs on 32-bit and 64-bit
+platforms, and the test suite is very thorough, so it is hard to make
+any of the potential errors here without their being caught in build or test.
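+
+As an illustration of the "same bit pattern" case, the sketch below
+moves between ``char`` and ``unsigned char`` with ``static_cast``. The
+function names ``to_bytes`` and ``from_bytes`` are assumptions made for
+this example and do not appear in qpdf.
+

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only.
// Each char is reinterpreted as the unsigned char with the same bits,
// so a byte such as 0xE9 survives even where char is signed.
std::vector<unsigned char> to_bytes(std::string const& s)
{
    std::vector<unsigned char> out;
    out.reserve(s.size());
    for (char c : s) {
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}

// The reverse direction: a range check would be wrong here because
// values above 127 are expected and meaningful.
std::string from_bytes(std::vector<unsigned char> const& bytes)
{
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char>(b));
    }
    return out;
}
```

+
+A range-checked conversion would reject the bytes above 127 in the
+second function, which is why ``static_cast`` is the better fit for
+this particular case.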
+
+Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
+pipeline interface has a ``write`` call that uses ``unsigned char*``
+without a ``const`` qualifier. The main reason for this is
+to support pipelines that make calls to third-party libraries, such as
+zlib, that don't include ``const`` in their interfaces. Unfortunately,
+there are many places in the code where it is desirable to have
+``const char*`` with pipelines. None of the pipeline implementations
+in qpdf
+currently modify the data passed to write, and doing so would be counter
+to the intent of ``Pipeline``, but there is nothing in the code to
+prevent this from being done. There are places in the code where
+``const_cast`` is used to remove the const-ness of pointers going into
+``Pipeline``\ s. This could theoretically be unsafe, but there is
+adequate testing to assert that it is safe and will remain safe in
+qpdf's code.
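+
+The pattern described above can be sketched as follows. ``CountingSink``
+and ``write_string`` are hypothetical stand-ins invented for this
+example; they only mimic the shape of a ``write`` call that takes a
+non-const ``unsigned char*``.
+

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a Pipeline-like sink; not qpdf's class.
class CountingSink
{
  public:
    // Non-const pointer, mirroring the interface described above.
    // This implementation only reads from the buffer.
    void write(unsigned char* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i) {
            collected_.push_back(static_cast<char>(data[i]));
        }
    }
    std::string const& collected() const { return collected_; }

  private:
    std::string collected_;
};

// Passes read-only string data to the sink. reinterpret_cast changes
// the pointee type and const_cast removes const; this is safe only
// because write() never modifies what it is given.
void write_string(CountingSink& sink, std::string const& s)
{
    auto p = reinterpret_cast<unsigned char const*>(s.data());
    sink.write(const_cast<unsigned char*>(p), s.size());
}
```

+
+Keeping the casts inside one small helper, as assumed here, limits the
+number of places where const-ness is removed.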
+
+.. _ref.encryption:
+
+Encryption
+----------
+
+Encryption is supported transparently by qpdf. When opening a PDF file,
+if an encryption dictionary exists, the ``QPDF`` object processes this
+dictionary using the password (if any) provided. The primary decryption
+key is computed and cached. No further access is made to the encryption
+dictionary after that time. When an object is read from a file, the
+object ID and generation of the object in which it is contained are
+always known. Using this information along with the stored encryption
+key, all stream and string objects are transparently decrypted. Raw
+encrypted objects are never stored in memory. This way, nothing in the
+library ever has to know or care whether it is reading an encrypted
+file.
+
+An interface is also provided for writing encrypted streams and strings
+given an encryption key. This is used by ``QPDFWriter`` when it rewrites
+encrypted files.
+
+When copying encrypted files, unless otherwise directed, qpdf will
+preserve any encryption in force in the original file. qpdf can do this
+with either the user or the owner password. There is no difference in
+capability based on which password is used. When 40-bit or 128-bit
+encryption keys are used, the user password can be recovered with the
+owner password. With 256-bit keys, the user and owner passwords are used
+independently to encrypt the actual encryption key, so while either can
+be used, the owner password can no longer be used to recover the user
+password.
+
+Starting with version 4.0.0, qpdf can read files that are not encrypted
+but that contain encrypted attachments, but it cannot write such files.
+qpdf also requires the password to be specified in order to open the
+file, not just to extract attachments, since once the file is open, all
+decryption is handled transparently. When copying files like this while
+preserving encryption, qpdf will apply the file's encryption to
+everything in the file, not just to the attachments. When decrypting the
+file, qpdf will decrypt the attachments. In general, when copying PDF
+files with multiple encryption formats, qpdf will choose the newest
+format. The only exception to this is that clear-text metadata will be
+preserved as clear-text if it is that way in the original file.
+
+One point of confusion some people have about encrypted PDF files is
+that encryption is not the same as password protection. Password
+protected files are always encrypted, but it is also possible to create
+encrypted files that do not have passwords. Internally, such files use
+the empty string as a password, and most readers try the empty string
+first to see if it works and prompt for a password only if the empty
+string doesn't work. Normally such files have an empty user password and
+a non-empty owner password. In that way, if the file is opened by an
+ordinary reader without specification of password, the restrictions
+specified in the encryption dictionary can be enforced. Most users
+wouldn't even realize such a file was encrypted. Since qpdf always
+ignores the restrictions (except for the purpose of reporting what they
+are), qpdf doesn't care which password you use. QPDF will allow you to
+create PDF files with non-empty user passwords and empty owner
+passwords. Some readers will require a password when you open these
+files, and others will open the files without a password and not enforce
+restrictions. Having a non-empty user password and an empty owner
+password doesn't really make sense because it would mean that opening
+the file with the user password would be more restrictive than not
+supplying a password at all. QPDF also allows you to create PDF files
+with the same password as both the user and owner password. Some readers
+will not ever allow such files to be accessed without restrictions
+because they never try the password as the owner password if it works as
+the user password. Nonetheless, one of the powerful aspects of qpdf is
+that it allows you to finely specify the way encrypted files are
+created, even if the results are not useful to some readers. One use
+case for this would be for testing a PDF reader to ensure that it
+handles odd configurations of input files.
+
+.. _ref.random-numbers:
+
+Random Number Generation
+------------------------
+
+QPDF generates random numbers to support generation of encrypted data.
+Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
+random numbers. Older versions used the OS-provided source of secure
+random numbers or, if allowed at build time, insecure random numbers
+from stdlib. Starting with version 5.1.0, you can disable use of
+OS-provided secure random numbers at build time. This is especially
+useful on Windows if you want to avoid a dependency on Microsoft's
+cryptography API. You can also supply your own random data provider. For
+details on how to do this, please refer to the top-level README.md file
+in the source distribution and to comments in
+:file:`QUtil.hh`.
+
+.. _ref.adding-and-remove-pages:
+
+Adding and Removing Pages
+-------------------------
+
+While qpdf's API has supported adding and modifying objects for some
+time, version 3.0 introduced specific methods for adding and removing
+pages. These are largely convenience routines that handle two tricky
+issues: pushing inheritable resources from the ``/Pages`` tree down to
+individual pages and manipulation of the ``/Pages`` tree itself. For
+details, see ``addPage`` and surrounding methods in
+:file:`QPDF.hh`.
+
+.. _ref.reserved-objects:
+
+Reserving Object Numbers
+------------------------
+
+Version 3.0 of qpdf introduced the concept of reserved objects. These
+are seldom needed for ordinary operations, but there are cases in which
+you may want to add a series of indirect objects with references to each
+other to a ``QPDF`` object. This causes a problem because you can't
+determine the object ID that a new indirect object will have until you
+add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
+only way to add two mutually referential objects to a ``QPDF`` object
+prior to version 3.0 would be to add the new objects first and then make
+them refer to each other after adding them. Now it is possible to create
+a *reserved object* using
+``QPDFObjectHandle::newReserved``. This is an indirect object that stays
+"unresolved" even if it is queried for its type. So now, if you want to
+create a set of mutually referential objects, you can create
+reservations for each one of them and use those reservations to
+construct the references. When finished, you can call
+``QPDF::replaceReserved`` to replace the reserved objects with the real
+ones. This functionality will never be needed by most applications, but
+it is used internally by QPDF when copying objects from other PDF files,
+as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved
+objects, search for ``newReserved`` in
+:file:`test_driver.cc` in qpdf's sources.
+
+.. _ref.foreign-objects:
+
+Copying Objects From Other PDF Files
+------------------------------------
+
+Version 3.0 of qpdf introduced the ability to copy objects into a
+``QPDF`` object from a different ``QPDF`` object, which we refer to as
+*foreign objects*. This allows arbitrary
+merging of PDF files. The "from" ``QPDF`` object must remain valid after
+the copy as discussed in the note below. The
+:command:`qpdf` command-line tool provides limited
+support for basic page selection, including merging in pages from other
+files, but the library's API makes it possible to implement arbitrarily
+complex merging operations. The main method for copying foreign objects
+is ``QPDF::copyForeignObject``. This takes an indirect object from
+another ``QPDF`` and copies it recursively into this object while
+preserving all object structure, including circular references. This
+means you can add a direct object that you create from scratch to a
+``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
+indirect object from another file with ``QPDF::copyForeignObject``. The
+fact that ``QPDF::makeIndirectObject`` does not automatically detect a
+foreign object and copy it is an explicit design decision. Copying a
+foreign object seems like a sufficiently significant thing to do that it
+should be done explicitly.
+
+The other way to copy foreign objects is by passing a page from one
+``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
+``QPDF::makeIndirectObject``, this method automatically distinguishes
+between indirect objects in the current file, foreign objects, and
+direct objects.
+
+Please note: when you copy objects from one ``QPDF`` to another, the
+source ``QPDF`` object must remain valid until you have finished with
+the destination object. This is because the original object is still
+used to retrieve any referenced stream data from the copied object.
+
+.. _ref.rewriting:
+
+Writing PDF Files
+-----------------
+
+The qpdf library supports file writing of ``QPDF`` objects to PDF files
+through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
+writing modes: one for non-linearized files, and one for linearized
+files. See :ref:`ref.linearization` for a description of how
+linearization is implemented. This section describes how we write
+non-linearized files, including the creation of QDF files (see :ref:`ref.qdf`).
+
+This outline was written prior to implementation and is not exactly
+accurate, but it provides a correct "notional" idea of how writing
+works. Look at the code in ``QPDFWriter`` for exact details.
+
+- Initialize state:
+
+ - next object number = 1
+
+ - object queue = empty
+
+ - renumber table: old object id/generation to new id/0 = empty
+
+ - xref table: new id -> offset = empty
+
+- Create a QPDF object from a file.
+
+- Write header for new PDF file.
+
+- Request the trailer dictionary.
+
+- For each value that is an indirect object, grab the next object
+ number (via an operation that returns and increments the number). Map
+ object to new number in renumber table. Push object onto queue.
+
+- While there are more objects on the queue:
+
+ - Pop queue.
+
+ - Look up object's new number *n* in the renumbering table.
+
+ - Store current offset into xref table.
+
+ - Write :samp:`{n} 0 obj`.
+
+ - If object is null, whether direct or indirect, write out null,
+ thus eliminating unresolvable indirect object references.
+
+ - If the object is a stream, write stream contents, piped
+ through any filters as required, to a memory buffer. Use this
+ buffer to determine the stream length.
+
+ - If object is not a stream, array, or dictionary, write out its
+ contents.
+
+ - If object is an array or dictionary (including stream), traverse
+ its elements (for array) or values (for dictionaries), handling
+ recursive dictionaries and arrays, looking for indirect objects.
+ When an indirect object is found, if it is not resolvable, ignore.
+ (This case is handled when writing it out.) Otherwise, look it up
+ in the renumbering table. If not found, grab the next available
+ object number, assign to the referenced object in the renumbering
+ table, and push the referenced object onto the queue. As a special
+ case, when writing out a stream dictionary, replace length,
+ filters, and decode parameters as required.
+
+ Write out dictionary or array, replacing any unresolvable indirect
+ object references with null (the PDF spec says a reference to a
+ non-existent object is legal and resolves to null) and any
+ resolvable ones with references to the renumbered objects.
+
+ - If the object is a stream, write ``stream\n``, the stream contents
+ (from the memory buffer), and ``\nendstream\n``.
+
+ - When done, write ``endobj``.
+
+Once we have finished the queue, all referenced objects will have been
+written out and all deleted objects or unreferenced objects will have
+been skipped. The new cross-reference table will contain an offset for
+every new object number from 1 up to the number of objects written. This
+can be used to write out a new xref table. Finally we can write out the
+trailer dictionary with appropriately computed /ID (see spec, 8.3, File
+Identifiers), the cross reference table offset, and ``%%EOF``.
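+
+The queue-and-renumber part of the outline above can be sketched with a
+toy model in which each object is reduced to the list of old object IDs
+it references. The types ``ObjectGraph`` and ``WritePlan`` and the
+function ``plan_write`` are assumptions made for this example; the real
+logic is in ``QPDFWriter`` and differs in detail.
+

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Toy model: each object is just the list of old object ids it references.
using ObjectGraph = std::map<int, std::vector<int>>;

struct WritePlan
{
    std::map<int, int> renumber; // old id -> new id
    std::vector<int> order;      // old ids in the order they are written
};

// Hypothetical sketch of the traversal: start from the indirect objects
// referenced by the trailer, then follow references breadth-first.
WritePlan plan_write(ObjectGraph const& objects, std::vector<int> const& trailer_refs)
{
    WritePlan plan;
    int next_id = 1;
    std::deque<int> queue;

    auto enqueue = [&](int old_id) {
        if (objects.count(old_id) == 0) {
            return; // unresolvable reference: would be written as null
        }
        if (plan.renumber.count(old_id) != 0) {
            return; // already has a new number
        }
        plan.renumber[old_id] = next_id++;
        queue.push_back(old_id);
    };

    for (int id : trailer_refs) {
        enqueue(id);
    }
    while (!queue.empty()) {
        int old_id = queue.front();
        queue.pop_front();
        plan.order.push_back(old_id);
        for (int ref : objects.at(old_id)) {
            enqueue(ref);
        }
    }
    return plan;
}
```

+
+In this model, an object that nothing reaches from the trailer never
+receives a new number, which corresponds to unreferenced objects being
+skipped in the output.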
+
+.. _ref.filtered-streams:
+
+Filtered Streams
+----------------
+
+Support for streams is implemented through the ``Pipeline`` interface
+which was designed for this package.
+
+When reading streams, create a series of ``Pipeline`` objects. The
+``Pipeline`` abstract base requires implementation of ``write()`` and
+``finish()`` and provides an implementation of ``getNext()``. Each
+pipeline object, upon receiving data, does whatever it is going to do
+and then writes the data (possibly modified) to its successor.
+Alternatively, a pipeline may be an end-of-the-line pipeline that does
+something like store its output to a file or a memory buffer ignoring a
+successor. For additional details, look at
+:file:`Pipeline.hh`.
+
+``QPDF`` can read raw or filtered streams. When reading a filtered
+stream, the ``QPDF`` class creates a ``Pipeline`` object for each
+appropriate filter and chains them together. The last filter
+should write to whatever type of output is required. The ``QPDF`` class
+has an interface to write raw or filtered stream contents to a given
+pipeline.
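+
+The chaining idea can be illustrated with a small stand-alone sketch.
+The classes ``Stage``, ``UpperCase``, and ``Collector`` are hypothetical
+and exist only in this example; they follow the shape described above
+(``write()``, ``finish()``, ``getNext()``) but are not the classes
+declared in :file:`Pipeline.hh`.
+

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classes modeled on the description above; these are not
// qpdf's Pipeline classes.
class Stage
{
  public:
    explicit Stage(Stage* next = nullptr) : next_(next) {}
    virtual ~Stage() = default;
    virtual void write(unsigned char* data, std::size_t len) = 0;
    virtual void finish() = 0;

  protected:
    Stage* getNext() { return next_; }

  private:
    Stage* next_;
};

// Intermediate stage: upper-cases a copy of the data and passes it on.
class UpperCase : public Stage
{
  public:
    explicit UpperCase(Stage* next) : Stage(next) {}
    void write(unsigned char* data, std::size_t len) override
    {
        std::string buf;
        for (std::size_t i = 0; i < len; ++i) {
            buf.push_back(static_cast<char>(std::toupper(data[i])));
        }
        getNext()->write(reinterpret_cast<unsigned char*>(&buf[0]), buf.size());
    }
    void finish() override { getNext()->finish(); }
};

// End-of-the-line stage: stores everything in memory, has no successor.
class Collector : public Stage
{
  public:
    void write(unsigned char* data, std::size_t len) override
    {
        collected_.append(reinterpret_cast<char*>(data), len);
    }
    void finish() override { finished_ = true; }
    std::string const& collected() const { return collected_; }
    bool finished() const { return finished_; }

  private:
    std::string collected_;
    bool finished_ = false;
};
```

+
+A chain is built from the end backwards: construct the ``Collector``
+first, pass its address to ``UpperCase``, then call ``write()`` and
+``finish()`` on the first stage only.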
+
+.. _ref.object-accessors:
+
+Object Accessor Methods
+-----------------------
+
+..
+ This section is referenced in QPDFObjectHandle.hh
+
+For general information about how to access instances of
+``QPDFObjectHandle``, please see the comments in
+:file:`QPDFObjectHandle.hh`. Search for "Accessor
+methods". This section provides a more in-depth discussion of the
+behavior and the rationale for the behavior.
+
+*Why were type errors made into warnings?* When type checks were
+introduced into qpdf in the early days, it was expected that type errors
+would only occur as a result of programmer error. However, in practice,
+type errors would occur with malformed PDF files because of assumptions
+made in code, including code within the qpdf library and code written by
+library users. The most common case would be chaining calls to
+``getKey()`` to access keys deep within a dictionary. In many cases,
+qpdf would be able to recover from these situations, but the old
+behavior often resulted in crashes rather than graceful recovery. For
+this reason, the errors were changed to warnings.
+
+*Why even warn about type errors when the user can't usually do anything
+about them?* Type warnings are extremely valuable during development.
+Since it's impossible to catch at compile time things like typos in
+dictionary key names or logic errors around what the structure of a PDF
+file might be, the presence of type warnings can save lots of developer
+time. They have also proven useful in exposing issues in qpdf itself
+that would have otherwise gone undetected.
+
+*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
+``QPDFObjectHandle`` could be more strongly typed so that you'd have to
+check that something was of a particular type before calling
+type-specific accessor methods. However, implementing this at this stage
+of the library's history would be quite difficult, and it would make
+the common pattern of drilling into an object no longer work. While it
+would be possible to have a parallel interface, it would create a lot of
+extra code. If qpdf were written in a language like Rust, an interface
+like this would make a lot of sense, but, for a variety of reasons, the
+qpdf API is consistent with other APIs of its time, relying on exception
+handling to catch errors. The underlying PDF objects are inherently not
+type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
+ultimately cause a lot more code to have to be written and would likely
+make software that uses qpdf more brittle, and even so, checks would
+have to occur at runtime.
+
+*Why do type errors sometimes raise exceptions?* The way warnings work
+in qpdf requires a ``QPDF`` object to be associated with an object
+handle for a warning to be issued. It would be nice if this could be
+fixed, but it would require major changes to the API. Rather than
+throwing away these conditions, we convert them to exceptions. It's not
+that bad though. Since any object handle that was read from a file has
+an associated ``QPDF`` object, it would only be type errors on objects
+that were created explicitly that would cause exceptions, and in that
+case, type errors are much more likely to be the result of a coding
+error than invalid input.
+
+*Why does the behavior of a type exception differ between the C and C++
+API?* There is no way to throw and catch exceptions in C short of
+something like ``setjmp`` and ``longjmp``, and that approach is not
+portable across language barriers. Since the C API is often used from
+other languages, it's important to keep things as simple as possible.
+Starting in qpdf 10.5, exceptions that used to crash code using the C
+API will be written to stderr by default, and it is possible to register
+an error handler. There's no reason that the error handler can't
+simulate exception handling in some way, such as by using ``setjmp`` and
+``longjmp`` or by setting some variable that can be checked after
+library calls are made. In retrospect, it might have been better if the
+C API object handle methods returned error codes like the other methods
+and set return values in passed-in pointers, but this would complicate
+both the implementation and the use of the library for a case that is
+actually quite rare and largely avoidable.
diff --git a/manual/index.rst b/manual/index.rst
index 3adef192..0ffdd9b2 100644
--- a/manual/index.rst
+++ b/manual/index.rst
@@ -9,6261 +9,16 @@ QPDF version |release|
:maxdepth: 2
:caption: Contents:
-.. _ref.overview:
-
-What is QPDF?
-=============
-
-QPDF is a program and C++ library for structural, content-preserving
-transformations on PDF files. QPDF's website is located at
-https://qpdf.sourceforge.io/. QPDF's source code is hosted on github
-at https://github.com/qpdf/qpdf.
-
-QPDF provides many useful capabilities to developers of PDF-producing
-software or for people who just want to look at the innards of a PDF
-file to learn more about how they work. With QPDF, it is possible to
-copy objects from one PDF file into another and to manipulate the list
-of pages in a PDF file. This makes it possible to merge and split PDF
-files. The QPDF library also makes it possible for you to create PDF
-files from scratch. In this mode, you are responsible for supplying
-all the contents of the file, while the QPDF library takes care of all
-the syntactical representation of the objects, creation of cross
-references tables and, if you use them, object streams, encryption,
-linearization, and other syntactic details. You are still responsible
-for generating PDF content on your own.
-
-QPDF has been designed with very few external dependencies, and it is
-intentionally very lightweight. QPDF is *not* a PDF content creation
-library, a PDF viewer, or a program capable of converting PDF into other
-formats. In particular, QPDF knows nothing about the semantics of PDF
-content streams. If you are looking for something that can do that, you
-should look elsewhere. However, once you have a valid PDF file, QPDF can
-be used to transform that file in ways that perhaps your original PDF
-creation tool can't handle. For example, many programs generate simple PDF
-files but can't password-protect them, web-optimize them, or perform
-other transformations of that type.
-
-.. _ref.license:
-
-License
-=======
-
-QPDF is licensed under `the Apache License, Version 2.0
-<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-implied. See the License for the specific language governing
-permissions and limitations under the License.
-
-.. _ref.installing:
-
-Building and Installing QPDF
-============================
-
-This chapter describes how to build and install qpdf. Please see also
-the :file:`README.md` and
-:file:`INSTALL` files in the source distribution.
-
-.. _ref.prerequisites:
-
-System Requirements
--------------------
-
-The qpdf package has few external dependencies. In order to build qpdf,
-the following packages are required:
-
-- A C++ compiler that supports C++-14.
-
-- zlib: http://www.zlib.net/
-
-- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/
-
-- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
- able to use the gnutls crypto provider, and/or openssl:
- https://openssl.org/ to be able to use the openssl crypto provider.
-
-- gnu make 3.81 or newer: http://www.gnu.org/software/make
-
-- perl version 5.8 or newer: http://www.perl.org/; required for running
- the test suite. Starting with qpdf version 9.1.1, perl is no longer
- required at runtime.
-
-- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
- is required to run the test suite. Note that this is the version of
- diff present on virtually all GNU/Linux systems. This is required
- because the test suite uses :command:`diff -u`.
-
-Part of qpdf's test suite does comparisons of the contents of PDF files by
-converting them to images and comparing the images. The image comparison
-tests are disabled by default. Those tests are not required for
-determining correctness of a qpdf build if you have not modified the
-code since the test suite also contains expected output files that are
-compared literally. The image comparison tests provide an extra check to
-make sure that any content transformations don't break the rendering of
-pages. Transformations that affect the content streams themselves are
-off by default and are only provided to help developers look into the
-contents of PDF files. If you are making deep changes to the library
-that cause changes in the contents of the files that qpdf generates,
-then you should enable the image comparison tests. Enable them by
-running :command:`configure` with the
-:samp:`--enable-test-compare-images` flag. If you enable
-this, the following additional requirements are required by the test
-suite. Note that in no case are these items required to use qpdf.
-
-- libtiff: http://www.remotesensing.org/libtiff/
-
-- GhostScript version 8.60 or newer: http://www.ghostscript.com
-
-If you do not enable this, then you do not need to have tiff and
-ghostscript.
-
-Pre-built documentation is distributed with qpdf, so you should
-generally not need to rebuild the documentation. In order to build the
-documentation from source, you need to install `Sphinx
-<https://sphinx-doc.org>`__. To build the PDF version of the
-documentation, you need `pdflatex`, `latexmk`, and a fairly complete
-LaTeX installation. Detailed requirements can be found in the Sphinx
-documentation.
-
-.. _ref.building:
-
-Build Instructions
-------------------
-
-Building qpdf on UNIX is generally just a matter of running
-
-::
-
- ./configure
- make
-
-You can also run :command:`make check` to run the test
-suite and :command:`make install` to install. Please run
-:command:`./configure --help` for options on what can be
-configured. You can also set the value of ``DESTDIR`` during
-installation to install to a temporary location, as is common with many
-open source packages. Please see also the
-:file:`README.md` and
-:file:`INSTALL` files in the source distribution.
-
-Building on Windows is a little bit more complicated. For details,
-please see :file:`README-windows.md` in the source
-distribution. You can also download a binary distribution for Windows.
-There is a port of qpdf to Visual C++ version 6 in the
-:file:`contrib` area generously contributed by Jian
-Ma. This is also discussed in more detail in
-:file:`README-windows.md`.
-
-While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
-place in the public API, and it's just in a helper function. It is
-possible to build qpdf on a system that doesn't have ``wchar_t``, and
-it's also possible to compile a program that uses qpdf on a system
-without ``wchar_t`` as long as you don't call that one method. This is a
-very unusual situation. For a detailed discussion, please see the
-top-level README.md file in qpdf's source distribution.
-
-There are some other things you can do with the build. Although qpdf
-uses :command:`autoconf`, it does not use
-:command:`automake` but instead uses a
-hand-crafted non-recursive Makefile that requires gnu make. If you're
-really interested, please read the comments in the top-level
-:file:`Makefile`.
-
-.. _ref.crypto:
-
-Crypto Providers
-----------------
-
-Starting with qpdf 9.1.0, the qpdf library can be built with multiple
-implementations of providers of cryptographic functions, which we refer
-to as "crypto providers." At the time of writing, a crypto
-implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
-and RC4 and AES256 with and without CBC encryption. In the future, if
-digital signature is added to qpdf, there may be additional requirements
-beyond this.
-
-Starting with qpdf version 9.1.0, the available implementations are
-``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
-Additional implementations may be added if needed. It is also possible
-for a developer to provide their own implementation without modifying
-the qpdf library.
-
-.. _ref.crypto.build:
-
-Build Support For Crypto Providers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When building with qpdf's build system, crypto providers can be enabled
-at build time using various :command:`./configure`
-options. The default behavior is for
-:command:`./configure` to discover which crypto providers
-can be supported based on available external libraries, to build all
-available crypto providers, and to use an external provider as the
-default over the native one. This behavior can be changed with the
-following flags to :command:`./configure`:
-
-- :samp:`--enable-crypto-{x}`
- (where :samp:`{x}` is a supported crypto
- provider): enable the :samp:`{x}` crypto
- provider, requiring any external dependencies it needs
-
-- :samp:`--disable-crypto-{x}`:
- disable the :samp:`{x}` provider, and do not
- link against its dependencies even if they are available
-
-- :samp:`--with-default-crypto={x}`:
- make :samp:`{x}` the default provider even if
- a higher priority one is available
-
-- :samp:`--disable-implicit-crypto`: only build crypto
- providers that are explicitly requested with an
- :samp:`--enable-crypto-{x}`
- option
-
-For example, if you want to guarantee that the gnutls crypto provider is
-used and that the native provider is not built, you could run
-:command:`./configure --enable-crypto-gnutls
---disable-implicit-crypto`.
-
-If you build qpdf using your own build system, in order for qpdf to work
-at all, you need to enable at least one crypto provider. The file
-:file:`libqpdf/qpdf/qpdf-config.h.in` provides
-macros ``DEFAULT_CRYPTO``, whose value must be a string naming the
-default crypto provider, and various symbols starting with
-``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
-you must compile the source files that implement a crypto provider. To
-get a list of those files, look at
-:file:`libqpdf/build.mk`. If you want to omit a
-particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
-undefined, you can completely ignore the source files that belong to a
-particular crypto provider. Additionally, crypto providers may have
-their own external dependencies that can be omitted if the crypto
-provider is not used. For example, if you are building qpdf yourself and
-are using an environment that does not support gnutls or openssl, you
-can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
-is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
-you must include the source files used in the native implementation,
-some of which were added or renamed from earlier versions, to your
-build, and you can ignore
-:file:`QPDFCrypto_gnutls.cc`. Always consult
-:file:`libqpdf/build.mk` to get the list of source
-files you need to build.
-
-.. _ref.crypto.runtime:
-
-Runtime Crypto Provider Selection
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can use the :samp:`--show-crypto` option to
-:command:`qpdf` to get a list of available crypto
-providers. The default provider is always listed first, and the rest are
-listed in lexical order. Each crypto provider is listed on a line by
-itself with no other text, enabling the output of this command to be
-used easily in scripts.
-
-You can override which crypto provider is used by setting the
-``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
-ever do this, but you might want to do it if you were explicitly trying
-to compare behavior of two different crypto providers while testing
-performance or reproducing a bug. It could also be useful for people who
-are implementing their own crypto providers.
-
-.. _ref.crypto.develop:
-
-Crypto Provider Information for Developers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-If you are writing code that uses libqpdf and you want to force a
-certain crypto provider to be used, you can call the method
-``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
-a built-in or developer-supplied provider. To add your own crypto
-provider, you have to create a class derived from ``QPDFCryptoImpl`` and
-register it with ``QPDFCryptoProvider``. For additional information, see
-comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.
-
-.. _ref.crypto.design:
-
-Crypto Provider Design Notes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This section describes a few bits of rationale for why the crypto
-provider interface was set up the way it was. You don't need to know any
-of this information, but it's provided for the record and in case it's
-interesting.
-
-As a general rule, I want to avoid as much as possible including large
-blocks of code that are conditionally compiled such that, in most
-builds, some code is never built. This is dangerous because it makes it
-very easy for invalid code to creep in unnoticed. As such, I want it to
-be possible to build qpdf with all available crypto providers, and this
-is the way I build qpdf for local development. At the same time, if a
-particular packager feels that it is a security liability for qpdf to
-use crypto functionality from other than a library that gets
-considerable scrutiny for this specific purpose (such as gnutls,
-openssl, or nettle), then I want to give that packager the ability to
-completely disable qpdf's native implementation. Or if someone wants to
-avoid adding a dependency on one of the external crypto providers, I
-don't want the availability of the provider to impose additional
-external dependencies within that environment. Both of these are
-situations that I know to be true for some users of qpdf.
-
-I want registration and selection of crypto providers to be thread-safe,
-and I want it to work deterministically for a developer to provide their
-own crypto provider and be able to set it up as the default. This was
-the primary motivation behind requiring C++-11 as doing so enabled me to
-exploit the guaranteed thread safety of local block static
-initialization. The ``QPDFCryptoProvider`` class uses a singleton
-pattern with thread-safe initialization to create the singleton instance
-of ``QPDFCryptoProvider`` and exposes only static methods in its public
-interface. In this way, if a developer wants to call any
-``QPDFCryptoProvider`` methods, the library guarantees the
-``QPDFCryptoProvider`` is fully initialized and all built-in crypto
-providers are registered. Making ``QPDFCryptoProvider`` actually know
-about all the built-in providers may seem a bit sad at first, but this
-choice makes it extremely clear exactly what the initialization behavior
-is. There's no question about provider implementations automatically
-registering themselves in a nondeterministic order. It also means that
-implementations do not need to know anything about the provider
-interface, which makes them easier to test in isolation. Another
-advantage of this approach is that a developer who wants to develop
-their own crypto provider can do so in complete isolation from the qpdf
-library and, with just two calls, can make qpdf use their provider in
-their application. If they decided to contribute their code, plugging it
-into the qpdf library would require a very small change to qpdf's source
-code.
-
-The decision to make the crypto provider selectable at runtime was one I
-struggled with a little, but I decided to do it for various reasons.
-Allowing an end user to switch crypto providers easily could be very
-useful for reproducing a potential bug. If a user reports a bug that
-some cryptographic thing is broken, I can easily ask that person to try
-with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
-same could apply in the event of a performance problem. This also makes
-it easier for qpdf's own test suite to exercise code with different
-providers without having to make every program that links with qpdf
-aware of the possibility of multiple providers. In qpdf's continuous
-integration environment, the entire test suite is run for each supported
-crypto provider. This is made simple by being able to select the
-provider using an environment variable.
-
-Finally, making crypto providers selectable in this way establishes a
-pattern that I may follow again in the future for stream filter
-providers. One could imagine a future enhancement where someone could
-provide their own implementations for basic filters like
-``/FlateDecode`` or for other filters that qpdf doesn't support.
-Implementing the registration functions and internal storage of
-registered providers was also easier using C++-11's functional
-interfaces, which was another reason to require C++-11 at this time.
-
-.. _ref.packaging:
-
-Notes for Packagers
--------------------
-
-If you are packaging qpdf for an operating system distribution, here are
-some things you may want to keep in mind:
-
-- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
- dependency on perl. This is because fix-qdf was rewritten in C++.
- However, qpdf still has a build-time dependency on perl.
-
-- Make sure you are getting the intended behavior with regard to crypto
- providers. Read :ref:`ref.crypto.build` for details.
-
-- Passing :samp:`--enable-show-failed-test-output` to
- :command:`./configure` will cause any failed test
- output to be written to the console. This can be very useful for
- seeing test failures generated by autobuilders where you can't access
- qtest.log after the fact.
-
-- If qpdf's build environment detects the presence of autoconf and
- related tools, it will check to ensure that automatically generated
- files are up-to-date with recorded checksums and fail if it detects a
- discrepancy. This feature is intended to prevent you from
- accidentally forgetting to regenerate automatic files after modifying
- their sources. If your packaging environment automatically refreshes
- automatic files, it can cause this check to fail. Suppress qpdf's
- checks by passing :samp:`--disable-check-autofiles`
- to :command:`./configure`. This is safe since qpdf's
- :command:`autogen.sh` just runs autotools in the
- normal way.
-
-- QPDF's :command:`make install` does not install
- completion files by default, but as a packager, it's good if you
- install them wherever your distribution expects such files to go. You
- can find completion files to install in the
- :file:`completions` directory.
-
-- Packagers are encouraged to install the source files from the
- :file:`examples` directory along with qpdf
- development packages.
-
-.. _ref.using:
-
-Running QPDF
-============
-
-This chapter describes how to run the qpdf program from the command
-line.
-
-.. _ref.invocation:
-
-Basic Invocation
-----------------
-
-When running qpdf, the basic invocation is as follows:
-
-::
-
- qpdf [ options ] { infilename | --empty } outfilename
-
-This converts PDF file :samp:`infilename` to PDF file
-:samp:`outfilename`. The output file is functionally
-identical to the input file but may have been structurally reorganized.
-Also, orphaned objects will be removed from the file. Many
-transformations are available as controlled by the options below. In
-place of :samp:`infilename`, the parameter
-:samp:`--empty` may be specified. This causes qpdf to
-use a dummy input file that contains zero pages. The only normal use
-case for using :samp:`--empty` would be if you were
-going to add pages from another source, as discussed in :ref:`ref.page-selection`.
-
-If :samp:`@filename` appears as a word anywhere in the
-command-line, it will be read line by line, and each line will be
-treated as a command-line argument. Leading and trailing whitespace is
-intentionally not removed from lines, which makes it possible to handle
-arguments that start or end with spaces. The :samp:`@-`
-option allows arguments to be read from standard input. This allows qpdf
-to be invoked with an arbitrary number of arbitrarily long arguments. It
-is also very useful for avoiding having to pass passwords on the command
-line. Note that the :samp:`@filename` can't appear in
-the middle of an argument, so constructs such as
-:samp:`--arg=@option` will not work. You would have to
-include the argument and its options together in the arguments file.
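As a rough illustration of the one-argument-per-line format, the Python sketch below builds the text of an arguments file. The function name ``args_file_text`` is hypothetical. Whitespace inside each argument is kept exactly as given because, as noted above, qpdf does not trim lines.

```python
def args_file_text(args):
    """Return the contents of an arguments file: one argument per line.

    Leading and trailing spaces are preserved deliberately. An argument
    containing a line break cannot be represented in this format.
    """
    for arg in args:
        if "\n" in arg or "\r" in arg:
            raise ValueError("an argument cannot contain a line break")
    return "".join(arg + "\n" for arg in args)
```

Writing the result to a file such as ``args.txt`` and running ``qpdf @args.txt in.pdf out.pdf`` keeps a password out of the visible command line. The whole ``--password=...`` argument goes on a single line of the file.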
-
-:samp:`outfilename` does not have to be seekable, even
-when generating linearized files. Specifying ":samp:`-`"
-as :samp:`outfilename` means to write to standard
-output. If you want to overwrite the input file with the output, use the
-option :samp:`--replace-input` and omit the output file
-name. You can't specify the same file as both the input and the output.
-If you do this, qpdf will tell you about the
-:samp:`--replace-input` option.
-
-Most options require an output file, but some testing or inspection
-commands do not. These are specifically noted.
-
-.. _ref.exit-status:
-
-Exit Status
-~~~~~~~~~~~
-
-The exit status of :command:`qpdf` may be interpreted as
-follows:
-
-- ``0``: no errors or warnings were found. The file may still have
- problems qpdf can't detect. If
- :samp:`--warning-exit-0` was specified, exit status 0
- is used even if there are warnings.
-
-- ``2``: errors were found. qpdf was not able to fully process the
- file.
-
-- ``3``: qpdf encountered problems that it was able to recover from. In
- some cases, the resulting file may still be damaged. Note that qpdf
- still exits with status ``3`` if it finds warnings even when
- :samp:`--no-warn` is specified. With
- :samp:`--warning-exit-0`, warnings without errors
- exit with status 0 instead of 3.
-
-Note that :command:`qpdf` never exits with status ``1``.
-If you get an exit status of ``1``, it was something else, like the
-shell not being able to find or execute :command:`qpdf`.
-
-.. _ref.shell-completion:
-
-Shell Completion
-----------------
-
-Starting in qpdf version 8.3.0, qpdf provides its own completion support
-for zsh and bash. You can enable bash completion with :command:`eval
-$(qpdf --completion-bash)` and zsh completion with
-:command:`eval $(qpdf --completion-zsh)`. If
-:command:`qpdf` is not in your path, you should invoke it
-above with an absolute path. If you invoke it with a relative path, it
-will warn you, and the completion won't work if you're in a different
-directory.
-
-qpdf will use ``argv[0]`` to figure out where its executable is. This
-may produce unwanted results in some cases, especially if you are trying
-to use completion with a copy of qpdf that is built from source. You can
-specify a full path to the qpdf you want to use for completion in the
-``QPDF_EXECUTABLE`` environment variable.
-
-.. _ref.basic-options:
-
-Basic Options
--------------
-
-The following options are the most common ones and perform commonly
-needed transformations.
-
-:samp:`--help`
- Display command-line invocation help.
-
-:samp:`--version`
- Display the current version of qpdf.
-
-:samp:`--copyright`
- Show detailed copyright information.
-
-:samp:`--show-crypto`
- Show a list of available crypto providers, each on a line by itself.
- The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto
- providers.
-
-:samp:`--completion-bash`
- Output a completion command you can eval to enable shell completion
- from bash.
-
-:samp:`--completion-zsh`
- Output a completion command you can eval to enable shell completion
- from zsh.
-
-:samp:`--password={password}`
- Specifies a password for accessing encrypted files. To read the
- password from a file or standard input, you can use
- :samp:`--password-file`, added in qpdf 10.2. Note
- that you can also use :samp:`@filename` or
- :samp:`@-` as described above to put the password in
- a file or pass it via standard input, but you would do so by
- specifying the entire
- :samp:`--password={password}`
- option in the file. Syntax such as
- :samp:`--password=@filename` won't work since
- :samp:`@filename` is not recognized in the middle of
- an argument.
-
-:samp:`--password-file={filename}`
- Reads the first line from the specified file and uses it as the
- password for accessing encrypted files.
- :samp:`{filename}`
- may be ``-`` to read the password from standard input. Note that, in
- this case, the password is echoed and there is no prompt, so use with
- caution.
-
-:samp:`--is-encrypted`
- Silently exit with status 0 if the file is encrypted or status 2 if
- the file is not encrypted. This is useful for shell scripts. Other
- options are ignored if this is given. This option is mutually
- exclusive with :samp:`--requires-password`. Both this
- option and :samp:`--requires-password` exit with
- status 2 for non-encrypted files.
-
-:samp:`--requires-password`
- Silently exit with status 0 if a password (other than as supplied) is
- required. Exit with status 2 if the file is not encrypted. Exit with
- status 3 if the file is encrypted but requires no password or the
- correct password has been supplied. This is useful for shell scripts.
- Note that any supplied password is used when opening the file. When
- used with a :samp:`--password` option, this option
- can be used to check the correctness of the password. In that case,
- an exit status of 3 means the file works with the supplied password.
- This option is mutually exclusive with
- :samp:`--is-encrypted`. Both this option and
- :samp:`--is-encrypted` exit with status 2 for
- non-encrypted files.
-
-:samp:`--verbose`
- Increase verbosity of output. For now, this just prints some
- indication of any file that it creates.
-
-:samp:`--progress`
- Indicate progress while writing files.
-
-:samp:`--no-warn`
- Suppress writing of warnings to stderr. If warnings were detected and
- suppressed, :command:`qpdf` will still exit with exit
- code 3. See also :samp:`--warning-exit-0`.
-
-:samp:`--warning-exit-0`
- If warnings are found but no errors, exit with exit code 0 instead of 3.
- When combined with :samp:`--no-warn`, the effect is
- for :command:`qpdf` to completely ignore warnings.
-
-:samp:`--linearize`
- Causes generation of a linearized (web-optimized) output file.
-
-:samp:`--replace-input`
- If specified, the output file name should be omitted. This option
- tells qpdf to replace the input file with the output. It does this by
- writing to
- :file:`{infilename}.~qpdf-temp#`
- and, when done, overwriting the input file with the temporary file.
- If there were any warnings, the original input is saved as
- :file:`{infilename}.~qpdf-orig`.
-
-:samp:`--copy-encryption=file`
- Encrypt the file using the same encryption parameters, including user
- and owner password, as the specified file. Use
- :samp:`--encryption-file-password` to specify a
- password if one is needed to open this file. Note that copying the
- encryption parameters from a file also copies the first half of
- ``/ID`` from the file since this is part of the encryption
- parameters.
-
-:samp:`--encryption-file-password=password`
- If the file specified with :samp:`--copy-encryption`
- requires a password, specify the password using this option. Note
- that only one of the user or owner password is required. Both
- passwords will be preserved since QPDF does not distinguish between
- the two passwords. It is possible to preserve encryption parameters,
- including the owner password, from a file even if you don't know the
- file's owner password.
-
-:samp:`--allow-weak-crypto`
- Starting with version 10.4, qpdf issues warnings when requested to
- create files using RC4 encryption. This option suppresses those
- warnings. In future versions of qpdf, qpdf will refuse to create
- files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details.
-
-:samp:`--encrypt options --`
- Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify
- encryption parameters.
-
-:samp:`--decrypt`
- Removes any encryption on the file. A password must be supplied if
- the file is password protected.
-
-:samp:`--password-is-hex-key`
- Overrides the usual computation/retrieval of the PDF file's
- encryption key from user/owner password with an explicit
- specification of the encryption key. When this option is specified,
- the argument to the :samp:`--password` option is
- interpreted as a hexadecimal-encoded key value. This only applies to
- the password used to open the main input file. It does not apply to
- other files opened by :samp:`--pages` or other
- options or to files being written.
-
- Most users will never have a need for this option, and no standard
- viewers support this mode of operation, but it can be useful for
- forensic or investigatory purposes. For example, if a PDF file is
- encrypted with an unknown password, a brute-force attack using the
- key directly is sometimes more efficient than one using the password.
- Also, if a file is heavily damaged, it may be possible to derive the
- encryption key and recover parts of the file using it directly. To
- expose the encryption key used by an encrypted file that you can open
- normally, use the :samp:`--show-encryption-key`
- option.
-
-:samp:`--suppress-password-recovery`
- Ordinarily, qpdf attempts to automatically compensate for passwords
- specified in the wrong character encoding. This option suppresses
- that behavior. Under normal conditions, there are no reasons to use
- this option. See :ref:`ref.unicode-passwords` for a
- discussion.
-
-:samp:`--password-mode={mode}`
- This option can be used to fine-tune how qpdf interprets Unicode
- (non-ASCII) password strings passed on the command line. With the
- exception of the :samp:`hex-bytes` mode, these only
- apply to passwords provided when encrypting files. The
- :samp:`hex-bytes` mode also applies to passwords
- specified for reading files. For additional discussion of the
- supported password modes and when you might want to use them, see
- :ref:`ref.unicode-passwords`. The following modes
- are supported:
-
- - :samp:`auto`: Automatically determine whether the
- specified password is a properly encoded Unicode (UTF-8) string,
- and transcode it as required by the PDF spec based on the type of
- encryption being applied. On Windows starting with version 8.4.0,
- and on almost all other modern platforms, incoming passwords will
- be properly encoded in UTF-8, so this is almost always what you
- want.
-
- - :samp:`unicode`: Tells qpdf that the incoming
- password is UTF-8, overriding whatever its automatic detection
- determines. The only difference between this mode and
- :samp:`auto` is that qpdf will fail with an error
- message if the password is not valid UTF-8 instead of falling back
- to :samp:`bytes` mode with a warning.
-
- - :samp:`bytes`: Interpret the password as a literal
- byte string. For non-Windows platforms, this is what versions of
- qpdf prior to 8.4.0 did. For Windows platforms, there is no way to
- specify strings of binary data on the command line directly, but
- you can use the :samp:`@filename` option to do it,
- in which case this option forces qpdf to respect the string of
- bytes as provided. This option will allow you to encrypt PDF files
- with passwords that will not be usable by other readers.
-
- - :samp:`hex-bytes`: Interpret the password as a
- hex-encoded string. This provides a way to pass binary data as a
- password on all platforms including Windows. As with
- :samp:`bytes`, this option may allow creation of
- files that can't be opened by other readers. This mode affects
- qpdf's interpretation of passwords specified for decrypting files
- as well as for encrypting them. It makes it possible to specify
- strings that are encoded in some manner other than the system's
- default encoding.
-
-:samp:`--rotate=[+|-]angle[:page-range]`
- Apply rotation to specified pages. The
- :samp:`page-range` portion of the option value has
- the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the
- rotation is applied to all pages. The :samp:`angle`
- portion of the parameter may be either 0, 90, 180, or 270. If
- preceded by :samp:`+` or :samp:`-`,
- the angle is added to or subtracted from the specified pages'
- original rotations. This is almost always what you want. Otherwise
- the pages' rotations are set to the exact value, which may cause the
- appearances of the pages to be inconsistent, especially for scans.
- For example, the command :command:`qpdf in.pdf out.pdf
- --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages
- 2, 4, and 6 90 degrees clockwise from their original rotation and
- force the rotation of pages 7 through 8 to 180 degrees regardless of
- their original rotation, and the command :command:`qpdf in.pdf
- out.pdf --rotate=+180` would rotate all pages by 180
- degrees.
-
-:samp:`--keep-files-open={[yn]}`
- This option controls whether qpdf keeps individual files open while
- merging. Prior to version 8.1.0, qpdf always kept all files open, but
- this meant that the number of files that could be merged was limited
- by the operating system's open file limit. Version 8.1.0 opened files
- as they were referenced and closed them after each read, but this
- caused a major performance impact. Version 8.2.0 optimized the
- performance but did so in a way that, for local file systems, there
- was a small but unavoidable performance hit, but for networked file
- systems, the performance impact could be very high. Starting with
- version 8.2.1, the default behavior is that files are kept open if no
- more than 200 files are specified, but this default behavior can be
- explicitly overridden with the
- :samp:`--keep-files-open` flag. If you are merging
- more than 200 files but fewer than the operating system's max open
- files limit, you may want to use
- :samp:`--keep-files-open=y`, especially if working
- over a networked file system. If you are using a local file system
- where the overhead is low and you might sometimes merge more files
- than the OS limit allows from a script and are not worried about a
- few seconds additional processing time, you may want to specify
- :samp:`--keep-files-open=n`. The threshold for
- switching may be changed from the default 200 with the
- :samp:`--keep-files-open-threshold` option.
-
-:samp:`--keep-files-open-threshold={count}`
- If specified, overrides the default value of 200 used as the
- threshold for qpdf deciding whether or not to keep files open. See
- :samp:`--keep-files-open` for details.
-
-:samp:`--pages options --`
- Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do
- page selection (splitting and merging).
-
-:samp:`--collate={n}`
- When specified, collate rather than concatenate pages from files
- specified with :samp:`--pages`. With a numeric
- argument, collate in groups of :samp:`{n}`.
- The default is 1. See :ref:`ref.page-selection` for additional details.
-
-:samp:`--flatten-rotation`
- For each page that is rotated using the ``/Rotate`` key in the page's
- dictionary, remove the ``/Rotate`` key and implement the identical
- rotation semantics by modifying the page's contents. This option can
- be useful to prepare files for buggy PDF applications that don't
- properly handle rotated pages.
-
-:samp:`--split-pages=[n]`
- Write each group of :samp:`n` pages to a separate
- output file. If :samp:`n` is not specified, create
- single pages. Output file names are generated as follows:
-
- - If the string ``%d`` appears in the output file name, it is
- replaced with a range of zero-padded page numbers starting from 1.
-
- - Otherwise, if the output file name ends in
- :file:`.pdf` (case insensitive), a zero-padded
- page range, preceded by a dash, is inserted before the file
- extension.
-
- - Otherwise, the file name is appended with a zero-padded page range
- preceded by a dash.
-
- Page ranges are a single number in the case of single-page groups or
- two numbers separated by a dash otherwise. For example, if
- :file:`infile.pdf` has 12 pages
-
- - :command:`qpdf --split-pages infile.pdf %d-out`
- would generate files :file:`01-out` through
- :file:`12-out`
-
- - :command:`qpdf --split-pages=2 infile.pdf
- outfile.pdf` would generate files
- :file:`outfile-01-02.pdf` through
- :file:`outfile-11-12.pdf`
-
- - :command:`qpdf --split-pages infile.pdf
- something.else` would generate files
- :file:`something.else-01` through
- :file:`something.else-12`
-
- Note that outlines, threads, and other global features of the
- original PDF file are not preserved. For each page of output, this
- option creates an empty PDF and copies a single page from the output
- into it. If you require the global data, you will have to run
- :command:`qpdf` with the
- :samp:`--pages` option once for each file. Using
- :samp:`--split-pages` is much faster if you don't
- require the global data.
-
-:samp:`--overlay options --`
- Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on
- overlay/underlay.
-
-:samp:`--underlay options --`
- Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on
- overlay/underlay.
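The output-name rules given for ``--split-pages`` earlier in this list can be sketched in Python as follows. This is an approximation written from the description and examples in this manual, not qpdf's own code. The function name ``split_page_names`` is hypothetical. Two details are assumptions: the padding width is taken from the number of digits in the page count (consistent with the 12-page examples), and only the first ``%d`` is replaced.

```python
def split_page_names(outfile, total_pages, n=1):
    """Predict the file names produced for groups of n pages."""
    width = len(str(total_pages))  # assumed zero-padding width
    names = []
    for first in range(1, total_pages + 1, n):
        last = min(first + n - 1, total_pages)
        page_range = f"{first:0{width}d}"
        if n > 1:
            # Assumption: a two-number range is used whenever n > 1.
            page_range += f"-{last:0{width}d}"
        if "%d" in outfile:
            name = outfile.replace("%d", page_range, 1)
        elif outfile.lower().endswith(".pdf"):
            name = f"{outfile[:-4]}-{page_range}{outfile[-4:]}"
        else:
            name = f"{outfile}-{page_range}"
        names.append(name)
    return names
```

For a 12-page input, ``split_page_names("outfile.pdf", 12, 2)`` yields six names running from ``outfile-01-02.pdf`` to ``outfile-11-12.pdf``, matching the example in the option description.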
-
-Password-protected files may be opened by specifying a password. By
-default, qpdf will preserve any encryption data associated with a file.
-If :samp:`--decrypt` is specified, qpdf will attempt to
-remove any encryption information. If :samp:`--encrypt`
-is specified, qpdf will replace the document's encryption parameters
-with whatever is specified.
-
-Note that qpdf does not obey encryption restrictions already imposed on
-the file. Doing so would be meaningless since qpdf can be used to remove
-encryption from the file entirely. This functionality is not intended to
-be used for bypassing copyright restrictions or other restrictions
-placed on files by their producers.
-
-Prior to 8.4.0, in the case of passwords that contain characters that
-fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
-properly encoded encryption and decryption passwords to the user.
-Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
-an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual
-described workarounds using the :command:`iconv` command.
-Such workarounds are no longer required or recommended with qpdf 8.4.0.
-However, for backward compatibility, qpdf attempts to detect those
-workarounds and do the right thing in most cases.
-
-.. _ref.encryption-options:
-
-Encryption Options
-------------------
-
-To change the encryption parameters of a file, use the
-:samp:`--encrypt` flag. The syntax is
-
-::
-
- --encrypt user-password owner-password key-length [ restrictions ] --
-
-Note that ":samp:`--`" terminates parsing of encryption
-flags and must be present even if no restrictions are present.
-
-Either or both of the user password and the owner password may be empty
-strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
-of PDF files with a non-empty user password, an empty owner password,
-and a 256-bit key since such files can be opened with no password. If
-you want to create such files, specify the encryption option
-:samp:`--allow-insecure`, as described below.
-
-The value for
-:samp:`{key-length}` may
-be 40, 128, or 256. The restriction flags are dependent upon key length.
-When no additional restrictions are given, the default is to be fully
-permissive.
-
-If :samp:`{key-length}`
-is 40, the following restriction options are available:
-
-:samp:`--print=[yn]`
- Determines whether or not to allow printing.
-
-:samp:`--modify=[yn]`
- Determines whether or not to allow document modification.
-
-:samp:`--extract=[yn]`
- Determines whether or not to allow text/image extraction.
-
-:samp:`--annotate=[yn]`
- Determines whether or not to allow comments and form fill-in and
- signing.
-
-If :samp:`{key-length}`
-is 128, the following restriction options are available:
-
-:samp:`--accessibility=[yn]`
- Determines whether or not to allow accessibility to visually
- impaired. The qpdf library disregards this field when AES is used or
- when 256-bit encryption is used. You should really never disable
- accessibility, but qpdf lets you do it in case you need to configure
- a file this way for testing purposes. The PDF spec says that
- conforming readers should disregard this permission and always allow
- accessibility.
-
-:samp:`--extract=[yn]`
- Determines whether or not to allow text/graphic extraction.
-
-:samp:`--assemble=[yn]`
- Determines whether document assembly (rotation and reordering of
- pages) is allowed.
-
-:samp:`--annotate=[yn]`
- Determines whether modifying annotations is allowed. This includes
- adding comments and filling in form fields. Also allows editing of
- form fields if :samp:`--modify-other=y` is given.
-
-:samp:`--form=[yn]`
- Determines whether filling form fields is allowed.
-
-:samp:`--modify-other=[yn]`
- Allow all document editing except those controlled separately by the
- :samp:`--assemble`,
- :samp:`--annotate`, and
- :samp:`--form` options.
-
-:samp:`--print={print-opt}`
- Controls printing access.
- :samp:`{print-opt}`
- may be one of the following:
-
- - :samp:`full`: allow full printing
-
- - :samp:`low`: allow low-resolution printing only
-
- - :samp:`none`: disallow printing
-
-:samp:`--modify={modify-opt}`
- Controls modify access. This way of controlling modify access has
- less granularity than new options added in qpdf 8.4.
- :samp:`{modify-opt}`
- may be one of the following:
-
- - :samp:`all`: allow full document modification
-
- - :samp:`annotate`: allow comment authoring, form
- operations, and document assembly
-
- - :samp:`form`: allow form field fill-in and signing
- and document assembly
-
- - :samp:`assembly`: allow document assembly only
-
- - :samp:`none`: allow no modifications
-
- Using the :samp:`--modify` option does not allow you
- to create certain combinations of permissions such as allowing form
- filling but not allowing document assembly. Starting with qpdf 8.4,
- you can either just use the other options to control fields
- individually, or you can use something like :samp:`--modify=form
- --assemble=n` to fine tune.
-
-:samp:`--cleartext-metadata`
- If specified, any metadata stream in the document will be left
- unencrypted even if the rest of the document is encrypted. This also
- forces the PDF version to be at least 1.5.
-
-:samp:`--use-aes=[yn]`
- If :samp:`--use-aes=y` is specified, AES encryption
- will be used instead of RC4 encryption. This forces the PDF version
- to be at least 1.6.
-
-:samp:`--allow-insecure`
- From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
- where the user password is non-empty, the owner password is empty,
- and a 256-bit key is in use. Files created in this way are insecure
- since they can be opened without a password. Users would ordinarily
- never want to create such files. If you are using qpdf to
- intentionally create strange files for testing (a definite valid use
- of qpdf!), this option allows you to create such insecure files.
-
-:samp:`--force-V4`
- Use of this option forces the ``/V`` and ``/R`` parameters in the
- document's encryption dictionary to be set to the value ``4``. As
- qpdf will automatically do this when required, there is no reason to
- ever use this option. It exists primarily for use in testing qpdf
- itself. This option also forces the PDF version to be at least 1.5.
-
-If :samp:`{key-length}`
-is 256, the minimum PDF version is 1.7 with extension level 8, and the
-AES-based encryption format used is the PDF 2.0 encryption method
-supported by Acrobat X. The same options are available as with 128 bits
-with the following exceptions:
-
-:samp:`--use-aes`
- This option is not available with 256-bit keys. AES is always used
- with 256-bit encryption keys.
-
-:samp:`--force-V4`
- This option is not available with 256-bit keys.
-
-:samp:`--force-R5`
- If specified, qpdf sets the minimum version to 1.7 at extension level
- 3 and writes the deprecated encryption format used by Acrobat version
- IX. This option should not be used in practice to generate PDF files
- that will be in general use, but it can be useful to generate files
- if you are trying to test proper support in another application for
- PDF files encrypted in this way.
-
-The default for each permission option is to be fully permissive.
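For scripts that assemble a qpdf command line, the ``--encrypt ... --`` syntax from this section can be built as in the Python sketch below. The function name ``encrypt_args`` and its parameters are hypothetical. The check mirrors the rule stated above: a non-empty user password with an empty owner password and a 256-bit key is refused unless explicitly allowed.

```python
def encrypt_args(user_password, owner_password, key_length,
                 restrictions=(), allow_insecure=False):
    """Build the argument list for qpdf's --encrypt ... -- syntax."""
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    insecure = (key_length == 256
                and user_password != ""
                and owner_password == "")
    if insecure and not allow_insecure:
        raise ValueError("file could be opened with no password; "
                         "pass allow_insecure=True to proceed")
    args = ["--encrypt", user_password, owner_password, str(key_length)]
    args.extend(restrictions)
    if insecure:
        args.append("--allow-insecure")
    args.append("--")  # required even when no restrictions are given
    return args
```

Passing the result as list elements to ``subprocess.run`` keeps empty passwords intact as empty arguments, which is harder to get right when building a shell string by hand.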
-
-.. _ref.page-selection:
-
-Page Selection Options
-----------------------
-
-Starting with qpdf 3.0, it is possible to split and merge PDF files by
-selecting pages from one or more input files. Whatever file is given as
-the primary input file is used as the starting point, but its pages are
-replaced with pages as specified.
-
-::
-
- --pages input-file [ --password=password ] [ page-range ] [ ... ] --
-
-Multiple input files may be specified. Each one is given as the name of
-the input file, an optional password (if required to open the file), and
-the range of pages. Note that ":samp:`--`" terminates
-parsing of page selection flags.
-
-Starting with qpdf 8.4, the special input file name
-":file:`.`" can be used as a shortcut for the
-primary input filename.
-
-For each file that pages should be taken from, specify the file, a
-password needed to open the file (if any), and a page range. The
-password needs to be given only once per file. If any of the input files
-are the same as the primary input file or the file used to copy
-encryption parameters (if specified), you do not need to repeat the
-password here. The same file can be repeated multiple times. If a file
-that is repeated has a password, the password only has to be given the
-first time. All non-page data (info, outlines, page numbers, etc.) are
-taken from the primary input file. To discard these, use
-:samp:`--empty` as the primary input.
-
-Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
-sees a value in the place where it expects a page range and that value
-is not a valid range but is a valid file name, qpdf will implicitly use
-the range ``1-z``, meaning that it will include all pages in the file.
-This makes it possible to easily combine all pages in a set of files
-with a command like :command:`qpdf --empty out.pdf --pages \*.pdf
---`.
-
-The page range is a set of numbers separated by commas, ranges of
-numbers separated by dashes, or combinations of those. The character "z"
-represents the last page. A number preceded by an "r" indicates to count
-from the end, so ``r3-r1`` would be the last three pages of the
-document. Pages can appear in any order. Ranges can appear with a high
-number followed by a low number, which causes the pages to appear in
-reverse. Numbers may be repeated in a page range. A page range may be
-optionally appended with ``:even`` or ``:odd`` to indicate only the even
-or odd pages in the given range. Note that even and odd refer to the
-positions within the specified range, not whether the original number
-is even or odd.
-
-Example page ranges:
-
-- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
- that order.
-
-- ``z-1``: all pages in the document in reverse
-
-- ``r3-r1``: the last three pages of the document
-
-- ``r1-r3``: the last three pages of the document in reverse order
-
-- ``1-20:even``: even pages from 2 to 20
-
-- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
- positions from among the original range, which represents pages 5, 7,
- 8, 9, and 12.
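-
-The rules above can be sketched in a few lines of Python. This is only
-an illustration of the syntax as described here, not qpdf's actual
-parser; the function name ``parse_page_range`` and its ``num_pages``
-parameter are hypothetical.
-
```python
def parse_page_range(spec, num_pages):
    """Expand a page-range string into a list of 1-based page numbers.

    Illustrative only: follows the rules described in the text, not
    qpdf's real implementation.
    """
    # Split off an optional :even / :odd suffix.
    selector = None
    if ":" in spec:
        spec, selector = spec.split(":", 1)
        if selector not in ("even", "odd"):
            raise ValueError(f"unknown selector: {selector}")

    def page(token):
        # "z" is the last page; "rN" counts from the end (r1 is the last page).
        if token == "z":
            n = num_pages
        elif token.startswith("r"):
            n = num_pages + 1 - int(token[1:])
        else:
            n = int(token)
        if not 1 <= n <= num_pages:
            raise ValueError(f"page {token} is out of range")
        return n

    pages = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (page(t) for t in part.split("-", 1))
            # A high-to-low range yields pages in reverse order.
            step = 1 if first <= last else -1
            pages.extend(range(first, last + step, step))
        else:
            pages.append(page(part))

    # :odd and :even refer to positions within the expanded range,
    # not to the page numbers themselves.
    if selector == "odd":
        pages = pages[0::2]
    elif selector == "even":
        pages = pages[1::2]
    return pages
```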
-
-Starting in qpdf version 8.3, you can specify the
-:samp:`--collate` option. Note that this option is
-specified outside of :samp:`--pages ... --`. When
-:samp:`--collate` is specified, it changes the meaning
-of :samp:`--pages` so that the specified files, as
-modified by page ranges, are collated rather than concatenated. For
-example, if you add the files :file:`odd.pdf` and
-:file:`even.pdf` containing odd and even pages of a
-document respectively, you could run :command:`qpdf --collate odd.pdf
---pages odd.pdf even.pdf -- all.pdf` to collate the pages.
-This would pick page 1 from odd, page 1 from even, page 2 from odd, page
-2 from even, etc. until all pages have been included. Any number of
-files and page ranges can be specified. If any file has fewer pages,
-that file is just skipped when its pages have all been included. For
-example, if you ran :command:`qpdf --collate --empty --pages a.pdf
-1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the
-following pages in this order:
-
-- a.pdf page 1
-
-- b.pdf page 6
-
-- c.pdf last page
-
-- a.pdf page 2
-
-- b.pdf page 5
-
-- a.pdf page 3
-
-- b.pdf page 4
-
-- a.pdf page 4
-
-- a.pdf page 5
-
-Starting in qpdf version 10.2, you may specify a numeric argument to
-:samp:`--collate`. With
-:samp:`--collate={n}`,
-pull groups of :samp:`{n}` pages from each file,
-again, stopping when there are no more pages. For example, if you ran
-:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
-r1 -- out.pdf`, you would get the following pages in this
-order:
-
-- a.pdf page 1
-
-- a.pdf page 2
-
-- b.pdf page 6
-
-- b.pdf page 5
-
-- c.pdf last page
-
-- a.pdf page 3
-
-- a.pdf page 4
-
-- b.pdf page 4
-
-- a.pdf page 5
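-
-The collation order described above can be modeled with a short Python
-sketch. It assumes each file's page range has already been expanded
-into a list; the ``collate`` function is a hypothetical helper used
-for illustration and is not part of qpdf.
-
```python
def collate(page_lists, n=1):
    """Interleave page lists, taking up to n pages from each in turn.

    Illustrative model of the behavior described for --collate; a list
    that runs out of pages is simply skipped on later rounds.
    """
    positions = [0] * len(page_lists)
    result = []
    while any(pos < len(pages) for pos, pages in zip(positions, page_lists)):
        for i, pages in enumerate(page_lists):
            chunk = pages[positions[i]:positions[i] + n]
            result.extend(chunk)
            positions[i] += len(chunk)
    return result


# The example from the text: a.pdf 1-5, b.pdf 6-4, c.pdf r1
a = [("a.pdf", p) for p in (1, 2, 3, 4, 5)]
b = [("b.pdf", p) for p in (6, 5, 4)]
c = [("c.pdf", "last")]
order_by_one = collate([a, b, c])
order_by_two = collate([a, b, c], n=2)
```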
-
-Starting in qpdf version 8.3, when you split and merge files, any page
-labels (page numbers) are preserved in the final file. It is expected
-that more document features will be preserved by splitting and merging.
-In the meantime, semantics of splitting and merging vary across
-features. For example, the document's outlines (bookmarks) point to
-actual page objects, so if you select some pages and not others,
-bookmarks that point to pages that are in the output file will work, and
-remaining bookmarks will not work. A future version of
-:command:`qpdf` may do a better job at handling these
-issues. (Note that the qpdf library already contains all of the APIs
-required in order to implement this in your own application if you need
-it.) In the meantime, you can always use
-:samp:`--empty` as the primary input file to avoid
-copying all of that from the first file. For example, to take pages 1
-through 5 from :file:`infile.pdf` while preserving
-all metadata associated with that file, you could use
-
-::
-
- qpdf infile.pdf --pages . 1-5 -- outfile.pdf
-
-If you wanted pages 1 through 5 from
-:file:`infile.pdf` but you wanted the rest of the
-metadata to be dropped, you could instead run
-
-::
-
- qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
-
-If you wanted to take pages 1 through 5 from
-:file:`file1.pdf` and pages 11 through 15 from
-:file:`file2.pdf` in reverse, taking document-level
-metadata from :file:`file2.pdf`, you would run
-
-::
-
- qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf
-
-If, for some reason, you wanted to take the first page of an encrypted
-file called :file:`encrypted.pdf` with password
-``pass`` and repeat it twice in an output file, and if you wanted to
-drop document-level metadata but preserve encryption, you would use
-
-::
-
- qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass \
- --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- \
- outfile.pdf
-
-Note that we had to specify the password all three times because giving
-a password as :samp:`--encryption-file-password` doesn't
-count for page selection, and as far as qpdf is concerned,
-:file:`encrypted.pdf` and
-:file:`./encrypted.pdf` are separate files. These
-are all corner cases that most users should hopefully never have to be
-bothered with.
-
-Prior to version 8.4, it was not possible to specify the same page from
-the same file directly more than once, and the workaround of specifying
-the same file in more than one way was required. Version 8.4 removes
-this limitation, but there is still a valid use case. When you specify
-the same page from the same file more than once, qpdf will share objects
-between the pages. If you are going to do further manipulation on the
-file and need the two instances of the same original page to be deep
-copies, then you can specify the file in two different ways. For example
-:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`
-would create a file with two copies of the first page of the input, and
-the two copies would not share any objects. This includes fonts,
-images, and anything else the page references.
-
-.. _ref.overlay-underlay:
-
-Overlay and Underlay Options
-----------------------------
-
-Starting with qpdf 8.4, it is possible to overlay or underlay pages from
-other files onto the output generated by qpdf. Specify overlay or
-underlay as follows:
-
-::
-
- { --overlay | --underlay } file [ options ] --
-
-Overlay and underlay options are processed late, so they can be combined
-with other operations like merging and will apply to the final output. The
-:samp:`--overlay` and :samp:`--underlay`
-options work the same way, except underlay pages are drawn underneath
-the page to which they are applied, possibly obscured by the original
-page, and overlay pages are drawn on top of the page to which they are
-applied, possibly obscuring the page. You can combine overlay and
-underlay.
-
-The default behavior of overlay and underlay is that pages are taken
-from the overlay/underlay file in sequence and applied to corresponding
-pages in the output until there are no more output pages. If the overlay
-or underlay file runs out of pages, remaining output pages are left
-alone. This behavior can be modified by options, which are provided
-between the :samp:`--overlay` or
-:samp:`--underlay` flag and the
-:samp:`--` option. The following options are supported:
-
-- :samp:`--password=password`: supply a password if the
- overlay/underlay file is encrypted.
-
-- :samp:`--to=page-range`: a range of pages in the same
- form as described in :ref:`ref.page-selection`
- indicates which pages in the output should have the overlay/underlay
- applied. If not specified, overlay/underlay are applied to all pages.
-
-- :samp:`--from=[page-range]`: a range of pages that
- specifies which pages in the overlay/underlay file will be used for
- overlay or underlay. If not specified, all pages will be used. This
- can be explicitly specified to be empty if
- :samp:`--repeat` is used.
-
-- :samp:`--repeat=page-range`: an optional range of
- pages that specifies which pages in the overlay/underlay file will be
- repeated after the "from" pages are used up. If you want to repeat a
- range of pages starting at the beginning, you can explicitly use
- :samp:`--from=`.
-
-Here are some examples.
-
-- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
- --`: overlay the first three pages from file
- :file:`o.pdf` onto the first three pages of the
- output, then overlay page 4 from :file:`o.pdf`
- onto pages 4 and 5 of the output. Leave remaining output pages
- untouched.
-
-- :command:`--underlay footer.pdf --from= --repeat=1,2
- --`: Underlay page 1 of
- :file:`footer.pdf` on all odd output pages, and
- underlay page 2 of :file:`footer.pdf` on all even
- output pages.
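-
-The interaction of :samp:`--to`, :samp:`--from`, and
-:samp:`--repeat` can be modeled with the following Python sketch. It
-assumes the page ranges have already been expanded into lists, and the
-``overlay_mapping`` function is hypothetical; it only mirrors the
-behavior described above and is not how qpdf implements it.
-
```python
from itertools import chain, cycle


def overlay_mapping(to_pages, from_pages, repeat_pages=()):
    """Map each output page to the overlay/underlay page applied to it.

    The "from" pages are used once, in order; after that the "repeat"
    pages are cycled. Output pages with no overlay page left are
    omitted from the result, meaning they are left alone.
    """
    source = chain(from_pages, cycle(repeat_pages))
    return dict(zip(to_pages, source))


# --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
first = overlay_mapping([1, 2, 3, 4, 5], [1, 2, 3], [4])

# --underlay footer.pdf --from= --repeat=1,2 --   (six-page output assumed)
second = overlay_mapping(range(1, 7), [], [1, 2])
```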
-
-.. _ref.attachments:
-
-Embedded Files/Attachments Options
-----------------------------------
-
-Starting with qpdf 10.2, you can work with file attachments in PDF files
-from the command line. The following options are available:
-
-:samp:`--list-attachments`
- Show the "key" and stream number for embedded files. With
- :samp:`--verbose`, additional information, including
- preferred file name, description, dates, and more, is also displayed.
- The key is usually but not always equal to the file name, and is
- needed by some of the other options.
-
-:samp:`--show-attachment={key}`
- Write the contents of the specified attachment to standard output as
- binary data. The key should match one of the keys shown by
- :samp:`--list-attachments`. If specified multiple
- times, only the last attachment will be shown.
-
-:samp:`--add-attachment {file} {options} --`
- Add or replace an attachment with the contents of
- :samp:`{file}`. This may be specified more
- than once. The following additional options may appear before the
- ``--`` that ends this option:
-
- :samp:`--key={key}`
- The key to use to register the attachment in the embedded files
- table. Defaults to the last path element of
- :samp:`{file}`.
-
- :samp:`--filename={name}`
- The file name to be used for the attachment. This is what is
- usually displayed to the user and is the name most graphical PDF
- viewers will use when saving a file. It defaults to the last path
- element of :samp:`{file}`.
-
- :samp:`--creationdate={date}`
- The attachment's creation date in PDF format; defaults to the
- current time. The date format is explained below.
-
- :samp:`--moddate={date}`
- The attachment's modification date in PDF format; defaults to the
- current time. The date format is explained below.
-
- :samp:`--mimetype={type/subtype}`
- The mime type for the attachment, e.g. ``text/plain`` or
- ``application/pdf``. Note that the mimetype appears in a field
- called ``/Subtype`` in the PDF but actually includes the full type
- and subtype of the mime type.
-
- :samp:`--description={"text"}`
- Descriptive text for the attachment, displayed by some PDF
- viewers.
-
- :samp:`--replace`
- Indicates that any existing attachment with the same key should be
- replaced by the new attachment. Otherwise,
- :command:`qpdf` gives an error if an attachment
- with that key is already present.
-
-:samp:`--remove-attachment={key}`
- Remove the specified attachment. This doesn't only remove the
- attachment from the embedded files table but also clears out the file
- specification. That means that any potential internal links to the
- attachment will be broken. This option may be specified multiple
- times. Run with :samp:`--verbose` to see status of
- the removal.
-
-:samp:`--copy-attachments-from {file} {options} --`
- Copy attachments from another file. This may be specified more than
- once. The following additional options may appear before the ``--``
- that ends this option:
-
- :samp:`--password={password}`
- If required, the password needed to open
- :samp:`{file}`
-
- :samp:`--prefix={prefix}`
- Only required if the file from which attachments are being copied
- has attachments with keys that conflict with attachments already
- in the file. In this case, the specified prefix will be prepended
- to each key. This affects only the key in the embedded files
- table, not the file name. The PDF specification doesn't preclude
- multiple attachments having the same file name.
-
-When a date is required, the date should conform to the PDF date format
-specification, which is
-``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where
-:samp:`{<z>}` is either ``Z`` for UTC or a
-timezone offset in the form :samp:`{-hh'mm'}` or
-:samp:`{+hh'mm'}`. Examples:
-``D:20210207161528-05'00'``, ``D:20210207211528Z``.
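-
-If you are generating these dates from a script, a minimal Python
-sketch such as the following produces strings in the format shown
-above. The ``pdf_date`` helper is hypothetical and is not part of
-qpdf; it assumes a timezone-aware ``datetime``.
-
```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt):
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"
    sign = "+" if offset > timedelta(0) else "-"
    # Whole minutes of offset, split into hours and minutes.
    minutes = abs(offset) // timedelta(minutes=1)
    return f"{stamp}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"


eastern = timezone(timedelta(hours=-5))
local_example = pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern))
utc_example = pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc))
```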
-
-.. _ref.advanced-parsing:
-
-Advanced Parsing Options
-------------------------
-
-These options control aspects of how qpdf reads PDF files. Mostly these
-are of use to people who are working with damaged files. There is little
-reason to use these options unless you are trying to solve specific
-problems. The following options are available:
-
-:samp:`--suppress-recovery`
- Prevents qpdf from attempting to recover damaged files.
-
-:samp:`--ignore-xref-streams`
- Tells qpdf to ignore any cross-reference streams.
-
-Ordinarily, qpdf will attempt to recover from certain types of errors in
-PDF files. These include errors in the cross-reference table, certain
-types of object numbering errors, and certain types of stream length
-errors. Sometimes, qpdf may think it has recovered but may not have
-actually recovered, so care should be taken when using this option as
-some data loss is possible. The
-:samp:`--suppress-recovery` option will prevent qpdf
-from attempting recovery. In this case, it will fail on the first error
-that it encounters.
-
-Ordinarily, qpdf reads cross-reference streams when they are present in
-a PDF file. If :samp:`--ignore-xref-streams` is
-specified, qpdf will ignore any cross-reference streams for hybrid PDF
-files. The purpose of hybrid files is to make some content available to
-viewers that are not aware of cross-reference streams. It is almost
-never desirable to ignore them. The only time when you might want to use
-this feature is if you are testing creation of hybrid PDF files and wish
-to see how a PDF consumer that doesn't understand object and
-cross-reference streams would interpret such a file.
-
-.. _ref.advanced-transformation:
-
-Advanced Transformation Options
--------------------------------
-
-These transformation options control fine points of how qpdf creates the
-output file. Mostly these are of use only to people who are very
-familiar with the PDF file format or who are PDF developers. The
-following options are available:
-
-:samp:`--compress-streams={[yn]}`
- By default, or with :samp:`--compress-streams=y`,
- qpdf will compress any stream with no other filters applied to it
- with the ``/FlateDecode`` filter when it writes it. To suppress this
- behavior and preserve uncompressed streams as uncompressed, use
- :samp:`--compress-streams=n`.
-
-:samp:`--decode-level={option}`
- Controls which streams qpdf tries to decode. The default is
- :samp:`generalized`. The following options are
- available:
-
- - :samp:`none`: do not attempt to decode any streams
-
- - :samp:`generalized`: decode streams filtered with
- supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
- ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
- filters as those to be used for general-purpose compression or
- encoding, as opposed to filters specifically designed for image
- data. Note that, by default, streams already compressed with
- ``/FlateDecode`` are not uncompressed and recompressed unless you
- also specify :samp:`--recompress-flate`.
-
- - :samp:`specialized`: in addition to generalized,
- decode streams with supported non-lossy specialized filters;
- currently this is just ``/RunLengthDecode``
-
- - :samp:`all`: in addition to generalized and
- specialized, decode streams with supported lossy filters;
- currently this is just ``/DCTDecode`` (JPEG)
-
-:samp:`--stream-data={option}`
- Controls transformation of stream data. This option predates the
- :samp:`--compress-streams` and
- :samp:`--decode-level` options. Those options can be
- used to achieve the same effect with more control. The value of
- :samp:`{option}` may
- be one of the following:
-
- - :samp:`compress`: recompress stream data when
- possible (default); equivalent to
- :samp:`--compress-streams=y`
- :samp:`--decode-level=generalized`. Does not
- recompress streams already compressed with ``/FlateDecode`` unless
- :samp:`--recompress-flate` is also specified.
-
- - :samp:`preserve`: leave all stream data as is;
- equivalent to :samp:`--compress-streams=n`
- :samp:`--decode-level=none`
-
- - :samp:`uncompress`: uncompress stream data
- compressed with generalized filters when possible; equivalent to
- :samp:`--compress-streams=n`
- :samp:`--decode-level=generalized`
-
-:samp:`--recompress-flate`
- By default, streams already compressed with ``/FlateDecode`` are left
- alone rather than being uncompressed and recompressed. This option
- causes qpdf to uncompress and recompress the streams. There is a
- significant performance cost to using this option, but you probably
- want to use it if you specify
- :samp:`--compression-level`.
-
-:samp:`--compression-level={level}`
- When writing new streams that are compressed with ``/FlateDecode``,
- use the specified compression level. The value of
- :samp:`level` should be a number from 1 to 9 and is
- passed directly to zlib, which implements deflate compression. Note
- that qpdf doesn't uncompress and recompress streams by default. To
- have this option apply to already compressed streams, you should also
- specify :samp:`--recompress-flate`. If your goal is
- to shrink the size of PDF files, you should also use
- :samp:`--object-streams=generate`.
-
-:samp:`--normalize-content=[yn]`
- Enables or disables normalization of content streams. Content
- normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.
-
-:samp:`--object-streams={mode}`
- Controls handling of object streams. The value of
- :samp:`{mode}` may be
- one of the following:
-
- - :samp:`preserve`: preserve original object streams
- (default)
-
- - :samp:`disable`: don't write any object streams
-
- - :samp:`generate`: use object streams wherever
- possible
-
-:samp:`--preserve-unreferenced`
- Tells qpdf to preserve objects that are not referenced when writing
- the file. Ordinarily any object that is not referenced in a traversal
- of the document from the trailer dictionary will be discarded. This
- may be useful in working with some damaged files or inspecting files
- with known unreferenced objects.
-
- This flag is ignored for linearized files. For other files, it also
- causes objects in the new file to be written in order by object ID
- from the original file. This does not mean that object numbers will
- be the same since qpdf may create stream lengths as direct or
- indirect differently from the original file, and the original file
- may have gaps in its numbering.
-
- See also :samp:`--preserve-unreferenced-resources`,
- which does something completely different.
-
-:samp:`--remove-unreferenced-resources={option}`
- The :samp:`{option}` may be ``auto``,
- ``yes``, or ``no``. The default is ``auto``.
-
- Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
- to remove images and fonts that are not used by a page even if they
- are referenced in the page's resources dictionary. When shared
- resources are in use, this behavior can greatly reduce the file sizes
- of split pages, but the analysis is very slow. In versions from 8.1
- through 9.1.1, qpdf did this analysis by default. Starting in qpdf
- 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
- to determine whether the file is likely to have unreferenced objects
- on pages, a pattern that frequently occurs when resource dictionaries
- are shared across multiple pages and rarely occurs otherwise. If it
- discovers this pattern, then it will attempt to remove unreferenced
- resources. Usually this means you get the slower splitting speed only
- when it's actually going to create smaller files. You can suppress
- removal of unreferenced resources altogether by specifying ``no`` or
- force it to do the full algorithm by specifying ``yes``.
-
- Other than cases in which you don't care about file size and care a
- lot about runtime, there are few reasons to use this option,
- especially now that ``auto`` mode is supported. One reason to use
- this is if you suspect that qpdf is removing resources it shouldn't
- be removing. If you encounter that case, please report it as a bug at
- https://github.com/qpdf/qpdf/issues/.
-
-:samp:`--preserve-unreferenced-resources`
- This is a synonym for
- :samp:`--remove-unreferenced-resources=no`.
-
- See also :samp:`--preserve-unreferenced`, which does
- something completely different.
-
-:samp:`--newline-before-endstream`
- Tells qpdf to insert a newline before the ``endstream`` keyword, not
- counted in the length, after any stream content even if the last
- character of the stream was a newline. This may result in two
- newlines in some cases. This is a requirement of PDF/A. While qpdf
- doesn't specifically know how to generate PDF/A-compliant PDFs, this
- at least prevents it from removing compliance on already compliant
- files.
-
-:samp:`--linearize-pass1={file}`
- Write the first pass of linearization to the named file. The
- resulting file is not a valid PDF file. This option is useful only
- for debugging ``QPDFWriter``'s linearization code. When qpdf
- linearizes files, it writes the file in two passes, using the first
- pass to calculate sizes and offsets that are required for hint tables
- and the linearization dictionary. Ordinarily, the first pass is
- discarded. This option enables it to be captured.
-
-:samp:`--coalesce-contents`
- When a page's contents are split across multiple streams, this option
- causes qpdf to combine them into a single stream. Use of this option
- is never necessary for ordinary usage, but it can help when working
- with some files in some cases. For example, this can also be combined
- with QDF mode or content normalization to make it easier to look at
- all of a page's contents at once.
-
-:samp:`--flatten-annotations={option}`
- This option collapses annotations into the pages' contents with
- special handling for form fields. Ordinarily, an annotation is
- rendered separately and on top of the page. Combining annotations
- into the page's contents effectively freezes the placement of the
- annotations, making them look right after various page
- transformations. The library functionality backing this option was
- added for the benefit of programs that want to create *n-up* page
- layouts and other similar things that don't work well with
- annotations. The :samp:`{option}` parameter
- may be any of the following:
-
- - :samp:`all`: include all annotations that are not
- marked invisible or hidden
-
- - :samp:`print`: only include annotations that
- indicate that they should appear when the page is printed
-
- - :samp:`screen`: omit annotations that indicate
- they should not appear on the screen
-
- Note that form fields are special because the annotations that are
- used to render filled-in form fields may become out of date from the
- fields' values if the form is filled in by a program that doesn't
- know how to update the appearances. If qpdf detects this case, its
- default behavior is not to flatten those annotations because doing so
- would cause the value of the form field to be lost. This gives you a
- chance to go back and resave the form with a program that knows how
- to generate appearances. QPDF itself can generate appearances with
- some limitations. See the
- :samp:`--generate-appearances` option below.
-
-:samp:`--generate-appearances`
- If a file contains interactive form fields and indicates that the
- appearances are out of date with the values of the form, this flag
- will regenerate appearances, subject to a few limitations. Note that
- there is not usually a reason to do this, but it can be necessary
- before using the :samp:`--flatten-annotations`
- option. Most of these limitations are not a problem with well-behaved
- PDF files. The limitations are as follows:
-
- - Radio button and checkbox appearances use the pre-set values in
- the PDF file. QPDF just makes sure that the correct appearance is
- displayed based on the value of the field. This is fine for PDF
- files that create their forms properly. Some PDF writers save
- appearances for fields when they change, which could cause some
- controls to have inconsistent appearances.
-
- - For text fields and list boxes, any characters that fall outside
- of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
- encoding, will be replaced by the ``?`` character.
-
- - Quadding is ignored. Quadding is used to specify whether the
- contents of a field should be left, center, or right aligned with
- the field.
-
- - Rich text, multi-line, and other more elaborate formatting
- directives are ignored.
-
- - There is no support for multi-select fields or signature fields.
-
- If qpdf doesn't do a good enough job with your form, use an external
- application to save your filled-in form before processing it with
- qpdf.
-
-:samp:`--optimize-images`
- This flag causes qpdf to recompress all images that are not
- compressed with DCT (JPEG) using DCT compression as long as doing so
- decreases the size in bytes of the image data and the image does not
- fall below minimum specified dimensions. Useful information is
- provided when used in combination with
- :samp:`--verbose`. See also the
- :samp:`--oi-min-width`,
- :samp:`--oi-min-height`, and
- :samp:`--oi-min-area` options. By default, starting
- in qpdf 8.4, inline images are converted to regular images and
- optimized as well. Use :samp:`--keep-inline-images`
- to prevent inline images from being included.
-
-:samp:`--oi-min-width={width}`
- Avoid optimizing images whose width is below the specified amount. If
- omitted, the default is 128 pixels. Use 0 for no minimum.
-
-:samp:`--oi-min-height={height}`
- Avoid optimizing images whose height is below the specified amount.
- If omitted, the default is 128 pixels. Use 0 for no minimum.
-
-:samp:`--oi-min-area={area-in-pixels}`
- Avoid optimizing images whose pixel count (width × height) is below
- the specified amount. If omitted, the default is 16,384 pixels. Use 0
- for no minimum.
-
-:samp:`--externalize-inline-images`
- Convert inline images to regular images. By default, images whose
- data is at least 1,024 bytes are converted when this option is
- selected. Use :samp:`--ii-min-bytes` to change the
- size threshold. This option is implicitly selected when
- :samp:`--optimize-images` is selected. Use
- :samp:`--keep-inline-images` to exclude inline images
- from image optimization.
-
-:samp:`--ii-min-bytes={bytes}`
- Avoid converting inline images whose size is below the specified
- minimum size to regular images. If omitted, the default is 1,024
- bytes. Use 0 for no minimum.
-
-:samp:`--keep-inline-images`
- Prevent inline images from being included in image optimization. This
- option has no effect when :samp:`--optimize-images`
- is not specified.
-
-:samp:`--remove-page-labels`
- Remove page labels from the output file.
-
-:samp:`--qdf`
- Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`
- disables QDF mode.
-
-:samp:`--min-version={version}`
- Forces the PDF version of the output file to be at least
- :samp:`{version}`. In other words, if the
- input file has a lower version than the specified version, the
- specified version will be used. If the input file has a higher
- version, the input file's original version will be used. It is seldom
- necessary to use this option since qpdf will automatically increase
- the version as needed when adding features that require newer PDF
- readers.
-
- The version number may be expressed in the form
- :samp:`{major.minor.extension-level}`, in
- which case the version is interpreted as
- :samp:`{major.minor}` at extension level
- :samp:`{extension-level}`. For example,
- version ``1.7.8`` represents version 1.7 at extension level 8. Note
- that minimal syntax checking is done on the command line.
-
-:samp:`--force-version={version}`
- This option forces the PDF version to be the exact version specified
- *even when the file may have content that is not supported in that
- version*. The version number is interpreted in the same way as with
- :samp:`--min-version` so that extension levels can be
- set. In some cases, forcing the output file's PDF version to be lower
- than that of the input file will cause qpdf to disable certain
- features of the document. Specifically, 256-bit keys are disabled if
- the version is less than 1.7 with extension level 8 (except R5 is
- disabled if less than 1.7 with extension level 3), AES encryption is
- disabled if the version is less than 1.6, cleartext metadata and
- object streams are disabled if less than 1.5, 128-bit encryption keys
- are disabled if less than 1.4, and all encryption is disabled if less
- than 1.3. Even with these precautions, qpdf won't be able to do
- things like eliminate use of newer image compression schemes,
- transparency groups, or other features that may have been added in
- more recent versions of PDF.
-
- As a general rule, with the exception of big structural things like
- the use of object streams or AES encryption, PDF viewers are supposed
- to ignore features in files that they don't support from newer
- versions. This means that forcing the version to a lower version may
- make it possible to open your PDF file with an older version, though
- bear in mind that some of the original document's functionality may
- be lost.
-
-By default, when a stream is encoded using non-lossy filters that qpdf
-understands and is not already compressed using a good compression
-scheme, qpdf will uncompress and recompress streams. Assuming proper
-filter implementations, this is safe and generally results in smaller files.
-This behavior may also be explicitly requested with
-:samp:`--stream-data=compress`.
-
-When :samp:`--normalize-content=y` is specified, qpdf
-will attempt to normalize whitespace and newlines in page content
-streams. This is generally safe but could, in some cases, cause damage
-to the content streams. This option is intended for people who wish to
-study PDF content streams or to debug PDF content. You should not use
-this for "production" PDF files.
-
-When normalizing content, if qpdf runs into any lexical errors, it will
-print a warning indicating that content may be damaged. The only
-situation in which qpdf is known to cause damage during content
-normalization is when a page's contents are split across multiple
-streams and streams are split in the middle of a lexical token such as a
-string, name, or inline image. Note that files that do this are invalid
-since the PDF specification states that content streams are not to be
-split in the middle of a token. If you want to inspect the original
-content streams in an uncompressed format, you can always run with
-:samp:`--qdf --normalize-content=n` for a QDF file
-without content normalization, or alternatively
-:samp:`--stream-data=uncompress` for a regular non-QDF
-mode file with uncompressed streams. These will both uncompress all the
-streams but will not attempt to normalize content. Please note that if
-you are using content normalization or QDF mode for the purpose of
-manually inspecting files, you don't have to care about this.
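-
-The two alternatives mentioned above could be invoked as follows. The
-file names are placeholders chosen for illustration::
-
-   # QDF file without content normalization
-   qpdf --qdf --normalize-content=n input.pdf inspect-qdf.pdf
-
-   # regular (non-QDF) file with uncompressed streams
-   qpdf --stream-data=uncompress input.pdf inspect-plain.pdf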
-
-Object streams, also known as compressed objects, were introduced into
-the PDF specification at version 1.5, corresponding to Acrobat 6. Some
-older PDF viewers may not support files with object streams. qpdf can be
-used to transform files with object streams to files without object
-streams or vice versa. As mentioned above, there are three object stream
-modes: :samp:`preserve`,
-:samp:`disable`, and :samp:`generate`.
-
-In :samp:`preserve` mode, the relationship between objects
-and the streams that contain them is preserved from the original file.
-In :samp:`disable` mode, all objects are written as
-regular, uncompressed objects. The resulting file should be readable by
-older PDF viewers. (Of course, the content of the files may include
-features not supported by older viewers, but at least the structure will
-be supported.) In :samp:`generate` mode, qpdf will
-create its own object streams. This will usually result in more compact
-PDF files, though they may not be readable by older viewers. In this
-mode, qpdf will also make sure the PDF version number in the header is
-at least 1.5.
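-
-As a sketch, assuming the mode is selected with the
-:samp:`--object-streams={mode}` option described earlier in this
-chapter and using placeholder file names::
-
-   # rewrite without object streams for older viewers
-   qpdf --object-streams=disable input.pdf compatible.pdf
-
-   # let qpdf create its own object streams for a more compact file
-   qpdf --object-streams=generate input.pdf compact.pdf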
-
-The :samp:`--qdf` flag turns on QDF mode, which changes
-some of the defaults described above. Specifically, in QDF mode, by
-default, stream data is uncompressed, content streams are normalized,
-and encryption is removed. These defaults can still be overridden by
-specifying the appropriate options as described above. Additionally, in
-QDF mode, stream lengths are stored as indirect objects, objects are
-laid out in a less efficient but more readable fashion, and the
-documents are interspersed with comments that make it easier for the
-user to find things and also make it possible for
-:command:`fix-qdf` to work properly. QDF mode is intended
-for people, mostly developers, who wish to inspect or modify PDF files
-in a text editor. For details, please see :ref:`ref.qdf`.
-
-.. _ref.testing-options:
-
-Testing, Inspection, and Debugging Options
-------------------------------------------
-
-These options can be useful for digging into PDF files or for use in
-automated test suites for software that uses the qpdf library. When any
-of the options in this section are specified, no output file should be
-given. The following options are available:
-
-:samp:`--deterministic-id`
- Causes generation of a deterministic value for /ID. This prevents use
- of timestamp and output file name information in the /ID generation.
- Instead, at some slight additional runtime cost, the /ID field is
- generated to include a digest of the significant parts of the content
- of the output PDF file. This means that a given qpdf operation should
- generate the same /ID each time it is run, which can be useful when
- caching results or for generation of some test data. Use of this flag
- is not compatible with creation of encrypted files.
-
-:samp:`--static-id`
- Causes generation of a fixed value for /ID. This is intended for
- testing only. Never use it for production files. If you are trying to
- get the same /ID each time for a given file and you are not
- generating encrypted files, consider using the
- :samp:`--deterministic-id` option.
-
-:samp:`--static-aes-iv`
- Causes use of a static initialization vector for AES-CBC. This is
- intended for testing only so that output files can be reproducible.
- Never use it for production files. This option in particular is not
- secure since it significantly weakens the encryption.
-
-:samp:`--no-original-object-ids`
- Suppresses inclusion of original object ID comments in QDF files.
- This can be useful when generating QDF files for test purposes,
- particularly when comparing them to determine whether two PDF files
- have identical content.
-
-:samp:`--show-encryption`
- Shows document encryption parameters. Also shows the document's user
- password if the owner password is given.
-
-:samp:`--show-encryption-key`
- When encryption information is being displayed, as when
- :samp:`--check` or
- :samp:`--show-encryption` is given, display the
- computed or retrieved encryption key as a hexadecimal string. This
- value is not ordinarily useful to users, but it can be used as the
- argument to :samp:`--password` if the
- :samp:`--password-is-hex-key` is specified. Note
- that, when PDF files are encrypted, passwords and other metadata are
- used only to compute an encryption key, and the encryption key is
- what is actually used for encryption. This enables retrieval of that
- key.
-
-:samp:`--check-linearization`
- Checks file integrity and linearization status.
-
-:samp:`--show-linearization`
- Checks and displays all data in the linearization hint tables.
-
-:samp:`--show-xref`
- Shows the contents of the cross-reference table in a human-readable
- form. This is especially useful for files with cross-reference
- streams which are stored in a binary format.
-
-:samp:`--show-object=trailer|obj[,gen]`
- Show the contents of the given object. This is especially useful for
- inspecting objects that are inside of object streams (also known as
- "compressed objects").
-
-:samp:`--raw-stream-data`
- When used along with the :samp:`--show-object`
- option, if the object is a stream, shows the raw stream data instead
- of the object's contents.
-
-:samp:`--filtered-stream-data`
- When used along with the :samp:`--show-object`
- option, if the object is a stream, shows the filtered stream data
- instead of the object's contents. If the stream is filtered using filters
- that qpdf does not support, an error will be issued.
-
-:samp:`--show-npages`
- Prints the number of pages in the input file on a line by itself.
- Since the number of pages appears by itself on a line, this option
- can be useful for scripting if you need to know the number of pages
- in a file.
-
-:samp:`--show-pages`
- Shows the object and generation number for each page dictionary
- object and for each content stream associated with the page. Having
- this information makes it more convenient to inspect objects from a
- particular page.
-
-:samp:`--with-images`
- When used along with :samp:`--show-pages`, also shows
- the object and generation numbers for the image objects on each page.
- (At present, information about images in shared resource dictionaries
- is not output by this command. This is discussed in a comment in the
- source code.)
-
-:samp:`--json`
- Generate a JSON representation of the file. This is described in
- depth in :ref:`ref.json`.
-
-:samp:`--json-help`
- Describe the format of the JSON output.
-
-:samp:`--json-key=key`
- This option is repeatable. If specified, only top-level keys
- specified will be included in the JSON output. If not specified, all
- keys will be shown.
-
-:samp:`--json-object=trailer|obj[,gen]`
- This option is repeatable. If specified, only specified objects will
- be shown in the "``objects``" key of the JSON output. If absent, all
- objects will be shown.
-
-:samp:`--check`
- Checks file structure as well as encryption, linearization, and
- encoding of stream data. A file for which
- :samp:`--check` reports no errors may still have
- errors in stream data content but should otherwise be structurally
- sound. If :samp:`--check` finds any errors, qpdf will exit
- with a status of 2. There are some recoverable conditions that
- :samp:`--check` detects. These are issued as warnings
- instead of errors. If qpdf finds no errors but finds warnings, it
- will exit with a status of 3 (as of version 2.0.4). When
- :samp:`--check` is combined with other options,
- checks are always performed before any other options are processed.
- For erroneous files, :samp:`--check` will cause qpdf
- to attempt to recover, after which other options are effectively
- operating on the recovered file. Combining
- :samp:`--check` with other options in this way can be
- useful for manually recovering severely damaged files. Note that
- :samp:`--check` produces no output to standard output
- when everything is valid, so if you are using this to
- programmatically validate files in bulk, it is safe to run without
- output redirected to :file:`/dev/null` and just
- check for a 0 exit code.
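-
- A minimal shell sketch of bulk validation based on the exit codes
- described above (the file name is a placeholder)::
-
-    qpdf --check input.pdf
-    case $? in
-        0) echo "no errors or warnings" ;;
-        2) echo "errors found" ;;
-        3) echo "warnings only" ;;
-        *) echo "unexpected exit status" ;;
-    esac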
-
-The :samp:`--raw-stream-data` and
-:samp:`--filtered-stream-data` options are ignored
-unless :samp:`--show-object` is given. Either of these
-options will cause the stream data to be written to standard output. In
-order to avoid commingling of stream data with other output, it is
-recommended that these options not be combined with other test/inspection
-options.
-
-If :samp:`--filtered-stream-data` is given and
-:samp:`--normalize-content=y` is also given, qpdf will
-attempt to normalize the stream data as if it is a page content stream.
-This attempt will be made even if it is not a page content stream, in
-which case it will produce unusable results.
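-
-For example, to save the decoded data of a single stream to a file,
-something like the following could be used. The object number ``7``
-and the file names are hypothetical; :samp:`--show-pages` can help
-locate the object numbers of interest::
-
-   qpdf --show-object=7 --filtered-stream-data input.pdf > stream.out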
-
-.. _ref.unicode-passwords:
-
-Unicode Passwords
------------------
-
-At the library API level, all methods that perform encryption and
-decryption interpret passwords as strings of bytes. It is up to the
-caller to ensure that they are appropriately encoded. Starting with qpdf
-version 8.4.0, qpdf will attempt to make this easier for you when
-you interact with qpdf via its command line interface. The PDF specification
-requires passwords used to encrypt files with 40-bit or 128-bit
-encryption to be encoded with PDF Doc encoding. This encoding is a
-single-byte encoding that supports ISO-Latin-1 and a handful of other
-commonly used characters. It has a large overlap with Windows ANSI but
-is not exactly the same. There is generally not a way to provide PDF Doc
-encoded strings on the command line. As such, qpdf versions prior to
-8.4.0 would often create PDF files that couldn't be opened with other
-software when given a password with non-ASCII characters to encrypt a
-file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
-recognizes the encoding of the parameter and transcodes it as needed.
-The rest of this section provides the details about exactly how qpdf
-behaves. Most users will not need to know this information, but it might
-be useful if you have been working around qpdf's old behavior or if you
-are using qpdf to generate encrypted files for testing other PDF
-software.
-
-A note about Windows: when qpdf builds, it attempts to determine what it
-has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
-function is an alternative entry point that receives all arguments as
-UTF-16-encoded strings. When qpdf starts up this way, it converts all
-the strings to UTF-8 encoding and then invokes the regular main. This
-means that, as far as qpdf is concerned, it receives its command-line
-arguments with UTF-8 encoding, just as it would in any modern Linux or
-UNIX environment.
-
-If a file is being encrypted with 40-bit or 128-bit encryption and the
-supplied password is not a valid UTF-8 string, qpdf will fall back to
-the behavior of interpreting the password as a string of bytes. If you
-have old scripts that encrypt files by passing the output of
-:command:`iconv` to qpdf, you no longer need to do that,
-but if you do, qpdf should still work. The only exception would be for
-the extremely unlikely case of a password that is encoded with a
-single-byte encoding but also happens to be valid UTF-8. Such a password
-would contain strings of even numbers of characters that alternate
-between accented letters and symbols. In the extremely unlikely event
-that you are intentionally using such passwords and qpdf is thwarting
-you by interpreting them as UTF-8, you can use
-:samp:`--password-mode=bytes` to suppress qpdf's
-automatic behavior.
-
-The :samp:`--password-mode` option, as described earlier
-in this chapter, can be used to change qpdf's interpretation of supplied
-passwords. There are very few reasons to use this option. One would be
-the unlikely case described in the previous paragraph in which the
-supplied password happens to be valid UTF-8 but isn't supposed to be
-UTF-8. Your best bet would be just to provide the password as a valid
-UTF-8 string, but you could also use
-:samp:`--password-mode=bytes`. Another reason to use
-:samp:`--password-mode=bytes` would be to intentionally
-generate PDF files encrypted with passwords that are not properly
-encoded. The qpdf test suite does this to generate invalid files for the
-purpose of testing its password recovery capability. If you were trying
-to create intentionally incorrect files for a similar purpose, the
-:samp:`bytes` password mode can enable you to do this.
-
-When qpdf attempts to decrypt a file with a password that contains
-non-ASCII characters, it will generate a list of alternative passwords
-by attempting to interpret the password as each of a handful of
-different coding systems and then transcode them to the required format.
-This helps to compensate for the supplied password being given in the
-wrong coding system, such as would happen if you used the
-:command:`iconv` workaround that was previously needed.
-It also generates passwords by doing the reverse operation: translating
-from correct to incorrect encoding of the password. This would enable
-qpdf to decrypt files using passwords that were improperly encoded by
-whatever software encrypted the files, including older versions of qpdf
-invoked without properly encoded passwords. The combination of these two
-recovery methods should make qpdf transparently open most encrypted
-files with the password supplied correctly but in the wrong coding
-system. There are no real downsides to this behavior, but if you don't
-want qpdf to do this, you can use the
-:samp:`--suppress-password-recovery` option. One reason
-to do that is to ensure that you know the exact password that was used
-to encrypt the file.
-
-With these changes, qpdf now generates compliant passwords in most
-cases. There are still some exceptions. In particular, the PDF
-specification directs compliant writers to normalize Unicode passwords
-and to perform certain transformations on passwords with bidirectional
-text. Implementing this functionality requires using a real Unicode
-library like ICU. If a client application that uses qpdf wants to do
-this, the qpdf library will accept the resulting passwords, but qpdf
-will not perform these transformations itself. It is possible that this
-will be addressed in a future version of qpdf. The ``QPDFWriter``
-methods that enable encryption on the output file accept passwords as
-strings of bytes.
-
-Please note that the :samp:`--password-is-hex-key`
-option is unrelated to all this. This flag bypasses the normal process
-of going from password to encryption key entirely, allowing the raw
-encryption key to be specified directly. This is useful for forensic
-purposes or for brute-force recovery of files with unknown passwords.
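-
-A hedged sketch of how the key might be obtained and reused, with
-``secret`` standing in for a known password and ``HEXKEY`` for the
-hexadecimal string printed by the first command (file names are
-placeholders)::
-
-   qpdf --show-encryption --show-encryption-key --password=secret input.pdf
-   qpdf --password=HEXKEY --password-is-hex-key --decrypt input.pdf output.pdf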
-
-.. _ref.qdf:
-
-QDF Mode
-========
-
-In QDF mode, qpdf creates PDF files in what we call *QDF
-form*. A PDF file in QDF form, sometimes called a QDF
-file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
-line (after the PDF header and binary characters) and has certain other
-characteristics. The purpose of QDF form is to make it possible to edit
-PDF files, with some restrictions, in an ordinary text editor. This can
-be very useful for experimenting with different PDF constructs or for
-making one-off edits to PDF files (though there are other reasons why
-this may not always work). Note that QDF mode does not support
-linearized files. If you enable linearization, QDF mode is automatically
-disabled.
-
-It is ordinarily very difficult to edit PDF files in a text editor for
-two reasons: most meaningful data in PDF files is compressed, and PDF
-files are full of offset and length information that makes it hard to
-add or remove data. A QDF file is organized in a manner such that, if
-edits are kept within certain constraints, the
-:command:`fix-qdf` program, distributed with qpdf, is
-able to restore edited files to a correct state. The
-:command:`fix-qdf` program takes no command-line
-arguments. It reads a possibly edited QDF file from standard input and
-writes a repaired file to standard output.
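-
-A typical editing round trip might look like this, assuming
-placeholder file names and any text editor in place of ``$EDITOR``::
-
-   qpdf --qdf input.pdf editable.pdf
-   $EDITOR editable.pdf
-   fix-qdf < editable.pdf > fixed.pdf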
-
-The following attributes characterize a QDF file:
-
-- All objects appear in numerical order in the PDF file, including when
- objects appear in object streams.
-
-- Objects are printed in an easy-to-read format, and all line endings
- are normalized to UNIX line endings.
-
-- Unless specifically overridden, streams appear uncompressed (when
- qpdf supports the filters and they are compressed with a non-lossy
- compression scheme), and most content streams are normalized (line
- endings are converted to just UNIX-style linefeeds).
-
-- All stream lengths are represented as indirect objects, and the
- stream length object is always the next object after the stream. If
- the stream data does not end with a newline, an extra newline is
- inserted, and a special comment appears after the stream indicating
- that this has been done.
-
-- If the PDF file contains object streams, if object stream *n*
- contains *k* objects, those objects are numbered from *n+1* through
- *n+k*, and the object number/offset pairs appear on a separate line
- for each object. Additionally, each object in the object stream is
- preceded by a comment indicating its object number and index. This
- makes it very easy to find objects in object streams.
-
-- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
- and ``endobj`` tokens appear on lines by themselves. A blank line
- follows every ``endobj`` token.
-
-- If there is a cross-reference stream, it is unfiltered.
-
-- Page dictionaries and page content streams are marked with special
- comments that make them easy to find.
-
-- Comments precede each object indicating the object number of the
- corresponding object in the original file.
-
-When editing a QDF file, any edits can be made as long as the above
-constraints are maintained. This means that you can freely edit a page's
-content without worrying about messing up the QDF file. It is also
-possible to add new objects so long as those objects are added after the
-last object in the file or subsequent objects are renumbered. If a QDF
-file has object streams in it, you can always add the new objects before
-the xref stream and then change the number of the xref stream, since
-nothing generally ever references it by number.
-
-It is not generally practical to remove objects from QDF files without
-messing up object numbering, but if you remove all references to an
-object, you can run qpdf on the file (after running
-:command:`fix-qdf`), and qpdf will omit the now-orphaned
-object.
-
-When :command:`fix-qdf` is run, it goes through the file
-and recomputes the following parts of the file:
-
-- the ``/N``, ``/W``, and ``/First`` keys of all object stream
- dictionaries
-
-- the pairs of numbers representing object numbers and offsets of
- objects in object streams
-
-- all stream lengths
-
-- the cross-reference table or cross-reference stream
-
-- the offset to the cross-reference table or cross-reference stream
- following the ``startxref`` token
-
-.. _ref.using-library:
-
-Using the QPDF Library
-======================
-
-.. _ref.using.from-cxx:
-
-Using QPDF from C++
--------------------
-
-The source tree for the qpdf package has an
-:file:`examples` directory that contains a few
-example programs. The :file:`qpdf/qpdf.cc` source
-file also serves as a useful example since it exercises almost all of
-the qpdf library's public interface. The best source of documentation on
-the library itself is reading comments in
-:file:`include/qpdf/QPDF.hh`,
-:file:`include/qpdf/QPDFWriter.hh`, and
-:file:`include/qpdf/QPDFObjectHandle.hh`.
-
-All header files are installed in the
-:file:`include/qpdf` directory. It is recommended that
-you use ``#include <qpdf/QPDF.hh>`` rather than adding
-:file:`include/qpdf` to your include path.
-
-When linking against the qpdf static library, you may also need to
-specify ``-lz -ljpeg`` on your link command. If your system understands
-how to read libtool :file:`.la` files, this may not
-be necessary.
-
-The qpdf library is safe to use in a multithreaded program, but no
-individual ``QPDF`` object instance (including ``QPDF``,
-``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
-thread at a time. Multiple threads may simultaneously work with
-different instances of these and all other QPDF objects.
-
-.. _ref.using.other-languages:
-
-Using QPDF from other languages
--------------------------------
-
-The qpdf library is implemented in C++, which makes it hard to use
-directly in other languages. There are a few things that can help.
-
-"C"
- The qpdf library includes a "C" language interface that provides a
- subset of the overall capabilities. The header file
- :file:`qpdf/qpdf-c.h` includes information about
- its use. As long as you use a C++ linker, you can link C programs
- with qpdf and use the C API. For languages that can directly load
- methods from a shared library, the C API can also be useful. People
- have reported success using the C API from other languages on Windows
- by directly calling functions in the DLL.
-
-Python
- A Python module called
- `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
- highly functional set of Python bindings to the qpdf library. Using
- pikepdf, you can work with PDF files in a natural way and combine
- qpdf's capabilities with other functionality provided by Python's
- rich standard library and available modules.
-
-Other Languages
- Starting with version 8.3.0, the :command:`qpdf`
- command-line tool can produce a JSON representation of the PDF file's
- non-content data. This can facilitate interacting programmatically
- with PDF files through qpdf's command line interface. For more
- information, please see :ref:`ref.json`.
-
-.. _ref.unicode-files:
-
-A Note About Unicode File Names
--------------------------------
-
-When strings are passed to qpdf library routines either as ``char*`` or
-as ``std::string``, they are treated as byte arrays except where
-otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
-otherwise noted in comments in header files. In modern UNIX/Linux
-environments, this generally does the right thing. In Windows, it's a
-bit more complicated. Starting in qpdf 8.4.0, passwords that contain
-Unicode characters are handled much better, and starting in qpdf 8.4.1,
-the library attempts to properly handle Unicode characters in filenames.
-In particular, in Windows, if a UTF-8 encoded string is used as a
-filename in either ``QPDF`` or ``QPDFWriter``, it is internally
-converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
-such, qpdf will generally operate properly on files with non-ASCII
-characters in their names as long as the filenames are UTF-8 encoded for
-passing into the qpdf library API, but there are still some rough edges,
-such as the encoding of the filenames in error messages or CLI output
-messages. Patches or bug reports are welcome for any continuing issues
-with Unicode file names in Windows.
-
-.. _ref.weak-crypto:
-
-Weak Cryptography
-=================
-
-Starting with version 10.4, qpdf is taking steps to reduce the likelihood
-of a user *accidentally* creating PDF files with insecure cryptography
-but will continue to allow creation of such files indefinitely with
-explicit acknowledgment.
-
-The PDF file format makes use of RC4, which is known to be a weak
-cryptography algorithm, and MD5, which is a weak hashing algorithm. In
-version 10.4, qpdf generates warnings for some (but not all) cases of
-writing files with weak cryptography when invoked from the command-line.
-These warnings can be suppressed using the
-:samp:`--allow-weak-crypto` option.
-
-It is planned for qpdf version 11 to be stricter, making it an error to
-write files with insecure cryptography from the command-line tool in
-most cases without specifying the
-:samp:`--allow-weak-crypto` flag and also to require
-explicit steps when using the C++ library to enable use of insecure
-cryptography.
-
-Note that qpdf must always retain support for weak cryptographic
-algorithms since this is required for reading older PDF files that use
-them. Additionally, qpdf will always retain the ability to create files
-using weak cryptographic algorithms since, as a development tool, qpdf
-explicitly supports creating older or deprecated types of PDF files
-since these are sometimes needed to test or work with older versions of
-software. Even if other cryptography libraries drop support for RC4 or
-MD5, qpdf can always fall back to its internal implementations of those
-algorithms, so they are not going to disappear from qpdf.
-
-.. _ref.json:
-
-QPDF JSON
-=========
-
-.. _ref.json-overview:
-
-Overview
---------
-
-Beginning with qpdf version 8.3.0, the :command:`qpdf`
-command-line program can produce a JSON representation of the
-non-content data in a PDF file. It includes a dump in JSON format of all
-objects in the PDF file excluding the content of streams. This JSON
-representation makes it very easy to look in detail at the structure of
-a given PDF file, and it also provides a great way to work with PDF
-files programmatically from the command-line in languages that can't
-call or link with the qpdf library directly. Note that stream data can
-be extracted from PDF files using other qpdf command-line options.
-
-.. _ref.json-guarantees:
-
-JSON Guarantees
----------------
-
-The qpdf JSON representation includes a JSON serialization of the raw
-objects in the PDF file as well as some computed information in a more
-easily extracted format. QPDF provides some guarantees about its JSON
-format. These guarantees are designed to simplify the experience of a
-developer working with the JSON format.
-
-Compatibility
- The top-level JSON object output is a dictionary. The JSON output
- contains various nested dictionaries and arrays. With the exception
- of dictionaries that are populated by the fields of objects from the
- file, all instances of a dictionary are guaranteed to have exactly
- the same keys. Future versions of qpdf are free to add additional
- keys but not to remove keys or change the type of object that a key
- points to. The qpdf program validates this guarantee, and in the
- unlikely event that a bug in qpdf should cause it to generate data
- that doesn't conform to this rule, it will ask you to file a bug
- report.
-
- The top-level JSON structure contains a "``version``" key whose value
- is a simple integer. The value of the ``version`` key will be
- incremented if a non-compatible change is made. A non-compatible
- change would be any change that involves removal of a key, a change
- to the format of data pointed to by a key, or a semantic change that
- requires a different interpretation of a previously existing key. A
- strong effort will be made to avoid breaking compatibility.
-
-Documentation
- The :command:`qpdf` command can be invoked with the
- :samp:`--json-help` option. This will output a JSON
- structure that has the same structure as the JSON output that qpdf
- generates, except that each field in the help output is a description
- of the corresponding field in the JSON output. The specific
- guarantees are as follows:
-
- - A dictionary in the help output means that the corresponding
- location in the actual JSON output is also a dictionary with
- exactly the same keys; that is, no keys present in help are absent
- in the real output, and no keys will be present in the real output
- that are not in help. As a special case, if the dictionary has a
- single key whose name starts with ``<`` and ends with ``>``, it
- means that the JSON output is a dictionary that can have any keys,
- each of which conforms to the value of the special key. This is
- used for cases in which the keys of the dictionary are things like
- object IDs.
-
- - A string in the help output is a description of the item that
- appears in the corresponding location of the actual output. The
- corresponding output can have any format.
-
- - An array in the help output always contains a single element. It
- indicates that the corresponding location in the actual output is
- also an array, and that each element of the array has whatever
- format is implied by the single element of the help output's
- array.
-
- For example, the help output includes a "``pagelabels``"
- key whose value is an array of one element. That element is a
- dictionary with keys "``index``" and "``label``". In addition to
- describing the meaning of those keys, this tells you that the actual
- JSON output will contain a ``pagelabels`` array, each of whose
- elements is a dictionary that contains an ``index`` key, a ``label``
- key, and no other keys.
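-
- To compare the help structure with real output for that key, one
- could run the following (the input file name is a placeholder)::
-
-    qpdf --json-help
-    qpdf --json --json-key=pagelabels input.pdf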
-
-Directness and Simplicity
- The JSON output contains the value of every object in the file, but
- it also contains some processed data. This is analogous to how qpdf's
- library interface works. The processed data is similar to the helper
- functions in that it allows you to look at certain aspects of the PDF
- file without having to understand all the nuances of the PDF
- specification, while the raw objects allow you to mine the PDF for
- anything that the higher-level interfaces are lacking.
-
-.. _json.limitations:
-
-Limitations of JSON Representation
-----------------------------------
-
-There are a few limitations to be aware of with the JSON structure:
-
-- Strings, names, and indirect object references in the original PDF
- file are all converted to strings in the JSON representation. In the
- case of a "normal" PDF file, you can tell the difference because a
- name starts with a slash (``/``), and an indirect object reference
- looks like ``n n R``, but if there were to be a string that looked
- like a name or indirect object reference, there would be no way to
- tell this from the JSON output. Note that there are certain cases
- where you know for sure what something is, such as knowing that
- dictionary keys in objects are always names and that certain things
- in the higher-level computed data are known to contain indirect
- object references.
-
-- The JSON format doesn't support binary data very well. Mostly the
- details are not important, but they are presented here for
- information. When qpdf outputs a string in the JSON representation,
- it converts the string to UTF-8, assuming usual PDF string semantics.
- Specifically, if the original string is UTF-16, it is converted to
- UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
- converted to UTF-8 with that assumption. This causes strange things
- to happen to binary strings. For example, if you had the binary
- string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
- because ``03`` is not a printable character and ``80`` is the bullet
- character in PDF doc encoding and is mapped to the Unicode value
- ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
- convert back from here to a binary string, you would have to recognize
- Unicode values whose code points are higher than ``0xFF`` and map
- those back to their corresponding PDF doc encoding characters. There
- is no way to tell the difference between a Unicode string that was
- originally encoded as UTF-16 and one that was converted from PDF doc
- encoding. In other words, it's best if you don't try to use the JSON
- format to extract binary strings from the PDF file, but if you really
- had to, it could be done. Note that qpdf's
- :samp:`--show-object` option does not have this
- limitation and will reveal the string as encoded in the original
- file.
-
-.. _json.considerations:
-
-JSON: Special Considerations
-----------------------------
-
-For the most part, the built-in JSON help tells you everything you need
-to know about the JSON format, but there are a few non-obvious things to
-be aware of:
-
-- While qpdf guarantees that keys present in the help will be present
- in the output, those fields may be null or empty if the information
- is not known or absent in the file. Also, if you specify
- :samp:`--json-keys`, the keys that are not listed
- will be excluded entirely except for those that
- :samp:`--json-help` says are always present.
-
-- In a few places, there are keys with names containing
- ``pageposfrom1``. The values of these keys are null or an integer. If
- an integer, they point to a page index within the file numbering from
- 1. Note that JSON indexes from 0, and you would also use 0-based
- indexing using the API. However, 1-based indexing is easier in this
- case because the command-line syntax for specifying page ranges is
- 1-based. If you were going to write a program that looked through the
- JSON for information about specific pages and then use the
- command-line to extract those pages, 1-based indexing is easier.
- Besides, it's more convenient to subtract 1 in a program written in a
- real programming language than it is to add 1 in shell code.
-
-- The image information included in the ``page`` section of the JSON
- output includes the key "``filterable``". Note that the value of this
- field may depend on the :samp:`--decode-level` that
- you invoke qpdf with. The JSON output includes a top-level key
- "``parameters``" that indicates the decode level used for computing
- whether a stream was filterable. For example, JPEG images will be
- shown as not filterable by default, but they will be shown as
- filterable if you run :command:`qpdf --json
- --decode-level=all`.
-
-.. _ref.design:
-
-Design and Library Notes
-========================
-
-.. _ref.design.intro:
-
-Introduction
-------------
-
-This section was written prior to the implementation of the qpdf package
-and was subsequently modified to reflect the implementation. In some
-cases, for purposes of explanation, it may differ slightly from the
-actual implementation. As always, the source code and test suite are
-authoritative. Even if there are some errors, this document should serve
-as a road map to understanding how this code works.
-
-In general, one should adhere strictly to a specification when writing
-but be liberal in reading. This way, the product of our software will be
-accepted by the widest range of other programs, and we will accept the
-widest range of input files. This library attempts to conform to that
-philosophy whenever possible but also aims to provide strict checking
-for people who want to validate PDF files. If you don't want to see
-warnings and are trying to write something that is tolerant, you can
-call ``setSuppressWarnings(true)``. If you want to fail on the first
-error, you can call ``setAttemptRecovery(false)``. The default behavior
-is to generate warnings for recoverable problems. Note that recovery
-will not always produce the desired results even if it is able to get
-through the file. Unlike most other PDF programs, which produce generic
-warnings such as "This file is damaged," qpdf generally issues a
-detailed error message that would be most useful to a PDF developer.
-This is by design as there seems to be a shortage of PDF validation
-tools out there. This was, in fact, one of the major motivations behind
-the initial creation of qpdf.
-
-.. _ref.design-goals:
-
-Design Goals
-------------
-
-The QPDF package includes support for reading and rewriting PDF files.
-It aims to hide from the user details involving object locations,
-modified (appended) PDF files, the directness/indirectness of objects,
-and stream filters including encryption. It does not aim to hide
-knowledge of the object hierarchy or content stream contents. Put
-another way, a user of the qpdf library is expected to have knowledge
-about how PDF files work, but is not expected to have to keep track of
-bookkeeping details such as file positions.
-
-A user of the library never has to care whether an object is direct or
-indirect, though it is possible to determine whether an object is direct
-or not if this information is needed. All access to objects deals with
-this transparently. All memory management details are also handled by
-the library.
-
-The ``PointerHolder`` object is used internally by the library to deal
-with memory management. This is basically a smart pointer object very
-similar in spirit to C++11's ``std::shared_ptr`` object, but predating
-it by several years. This library also makes use of a technique for
-giving fine-grained access to methods in one class to other classes by
-using public subclasses with friends and only private members that in
-turn call private methods of the containing class. See
-``QPDFObjectHandle::Factory`` as an example.
-
-The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
-file. The library provides methods for both accessing and mutating PDF
-files.
-
-The primary class for interacting with PDF objects is
-``QPDFObjectHandle``. Instances of this class can be passed around by
-value, copied, stored in containers, etc. with very low overhead.
-Instances of ``QPDFObjectHandle`` created by reading from a file will
-always contain a reference back to the ``QPDF`` object from which they
-were created. A ``QPDFObjectHandle`` may be direct or indirect. If
-indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to
-is a null pointer. In this case, the first attempt to access the
-underlying ``QPDFObject`` will result in the ``QPDFObject`` being
-resolved via a call to the referenced ``QPDF`` instance. This makes it
-essentially impossible to make coding errors in which certain things
-will work for some PDF files and not for others based on which objects
-are direct and which objects are indirect.
-
-Instances of ``QPDFObjectHandle`` can be directly created and modified
-using static factory methods in the ``QPDFObjectHandle`` class. There
-are factory methods for each type of object as well as a convenience
-method ``QPDFObjectHandle::parse`` that creates an object from a string
-representation of the object. Existing instances of ``QPDFObjectHandle``
-can also be modified in several ways. See comments in
-:file:`QPDFObjectHandle.hh` for details.
-
-An instance of ``QPDF`` is constructed by using the class's default
-constructor. If desired, the ``QPDF`` object may be configured with
-various methods that change its default behavior. Then the
-``QPDF::processFile()`` method is passed the name of a PDF file, which
-permanently associates the file with that QPDF object. A password may
-also be given for access to password-protected files. QPDF does not
-enforce encryption parameters and will treat user and owner passwords
-equivalently. Either password may be used to access an encrypted file.
-``QPDF`` will allow recovery of a user password given an owner password.
-The input PDF file must be seekable. (Output files written by
-``QPDFWriter`` need not be seekable, even when creating linearized
-files.) During construction, ``QPDF`` validates the PDF file's header,
-and then reads the cross reference tables and trailer dictionaries. The
-``QPDF`` class keeps only the first trailer dictionary though it does
-read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
-may request the root object and the trailer dictionary specifically. The
-cross reference table is kept private. Objects may then be requested by
-number or by walking the object tree.
-
-When a PDF file has a cross-reference stream instead of a
-cross-reference table and trailer, requesting the document's trailer
-dictionary returns the stream dictionary from the cross-reference stream
-instead.
-
-There are some convenience routines for very common operations such as
-walking the page tree and returning a vector of all page objects. For
-full details, please see the header files
-:file:`QPDF.hh` and
-:file:`QPDFObjectHandle.hh`. There are also some
-additional helper classes that provide higher level API functions for
-certain document constructions. These are discussed in :ref:`ref.helper-classes`.
-
-.. _ref.helper-classes:
-
-Helper Classes
---------------
-
-QPDF version 8.1 introduced the concept of helper classes. Helper
-classes are intended to contain higher level APIs that allow developers
-to work with certain document constructs at an abstraction level above
-that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
-not hiding document structure from the developer. As with qpdf in
-general, the goal is to take away some of the more tedious bookkeeping
-aspects of working with PDF files, not to remove the need for the
-developer to understand how the PDF construction in question works. The
-driving factor behind the creation of helper classes was to allow the
-evolution of higher level interfaces in qpdf without polluting the
-interfaces of the main top-level classes ``QPDF`` and
-``QPDFObjectHandle``.
-
-There are two kinds of helper classes: *document* helpers and *object*
-helpers. Document helpers are constructed with a reference to a ``QPDF``
-object and provide methods for working with structures that are at the
-document level. Object helpers are constructed with an instance of a
-``QPDFObjectHandle`` and provide methods for working with specific types
-of objects.
-
-Examples of document helpers include ``QPDFPageDocumentHelper``, which
-contains methods for operating on the document's page trees, such as
-enumerating all pages of a document and adding and removing pages; and
-``QPDFAcroFormDocumentHelper``, which contains document-level methods
-related to interactive forms, such as enumerating form fields and
-creating mappings between form fields and annotations.
-
-Examples of object helpers include ``QPDFPageObjectHelper`` for
-performing operations on pages such as page rotation and some operations
-on content streams, ``QPDFFormFieldObjectHelper`` for performing
-operations related to interactive form fields, and
-``QPDFAnnotationObjectHelper`` for working with annotations.
-
-It is always possible to retrieve the underlying ``QPDF`` reference from
-a document helper and the underlying ``QPDFObjectHandle`` reference from
-an object helper. Helpers are designed to be helpers, not wrappers. The
-intention is that, in general, it is safe to freely intermix operations
-that use helpers with operations that use the underlying objects.
-Document and object helpers do not attempt to provide a complete
-interface for working with the things they are helping with, nor do they
-attempt to encapsulate underlying structures. They just provide a few
-methods to help with error-prone, repetitive, or complex tasks. In some
-cases, a helper object may cache some information that is expensive to
-gather. In such cases, the helper classes are implemented so that their
-own methods keep the cache consistent, and the header file will provide
-a method to invalidate the cache and a description of what kinds of
-operations would make the cache invalid. If in doubt, you can always
-discard a helper class and create a new one with the same underlying
-objects, which will ensure that you have discarded any stale
-information.
-
-By convention, document helpers are called
-``QPDFSomethingDocumentHelper`` and are derived from
-``QPDFDocumentHelper``, and object helpers are called
-``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
-For details on specific helpers, please see their header files. You can
-find them by looking at
-:file:`include/qpdf/QPDF*DocumentHelper.hh` and
-:file:`include/qpdf/QPDF*ObjectHelper.hh`.
-
-In order to avoid creation of circular dependencies, the following
-general guidelines are followed with helper classes:
-
-- Core class interfaces do not know about helper classes. For example,
- no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
- classes in their interfaces.
-
-- Interfaces of object helpers will usually not use document helpers in
- their interfaces. This is because it is much more useful for document
- helpers to have methods that return object helpers. Most operations
- in PDF files start at the document level and go from there to the
- object level rather than the other way around. It can sometimes be
- useful to map back from object-level structures to document-level
- structures. If there is a desire to do this, it will generally be
- provided by a method in the document helper class.
-
-- Most of the time, object helpers don't know about other object
- helpers. However, in some cases, one type of object may be a
- container for another type of object, in which case it may make sense
- for the outer object to know about the inner object. For example,
- there are methods in the ``QPDFPageObjectHelper`` that know about
- ``QPDFAnnotationObjectHelper`` because references to annotations are
- contained in page dictionaries.
-
-- Any helper or core library class may use helpers in their
- implementations.
-
-Prior to qpdf version 8.1, higher level interfaces were added as
-"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
-compatibility, older convenience functions for operating with pages will
-remain in those classes even as alternatives are provided in helper
-classes. Going forward, new higher level interfaces will be provided
-using helper classes.
-
-.. _ref.implementation-notes:
-
-Implementation Notes
---------------------
-
-This section contains a few notes about QPDF's internal implementation,
-particularly around what it does when it first processes a file. This
-section is a bit of a simplification of what it actually does, but it
-could serve as a starting point to someone trying to understand the
-implementation. There is nothing in this section that you need to know
-to use the qpdf library.
-
-``QPDFObject`` is the basic PDF Object class. It is an abstract base
-class from which are derived classes for each type of PDF object.
-Clients do not interact with Objects directly but instead interact with
-``QPDFObjectHandle``.
-
-When the ``QPDF`` class creates a new object, it dynamically allocates
-the appropriate type of ``QPDFObject`` and immediately hands the pointer
-to an instance of ``QPDFObjectHandle``. The parser reads a token from
-the current file position. If the token is neither a dictionary nor an
-array opener, an object is immediately constructed from the single token
-and the parser returns. Otherwise, the parser iterates in a special mode
-in which it accumulates objects until it finds a balancing closer.
-During this process, the "``R``" keyword is recognized and an indirect
-``QPDFObjectHandle`` may be constructed.
-
-The ``QPDF::resolve()`` method, which is used to resolve an indirect
-object, may be invoked from the ``QPDFObjectHandle`` class. It first
-checks a cache to see whether this object has already been read. If not,
-it reads the object from the PDF file and caches it. It then returns the
-resulting ``QPDFObjectHandle``. The calling object handle then replaces
-its ``PointerHolder<QPDFObject>`` with the one from the newly returned
-``QPDFObjectHandle``. In this way, only a single copy of any direct
-object need exist and clients can access objects transparently without
-knowing or caring whether they are direct or indirect objects.
-Additionally, no object is ever read from the file more than once. That
-means that only the portions of the PDF file that are actually needed
-are ever read from the input file, thus allowing the qpdf package to
-take advantage of this important design goal of PDF files.
-
-If the requested object is inside of an object stream, the object stream
-itself is first read into memory. Then the tokenizer reads objects from
-the memory stream based on the offset information stored in the stream.
-Those individual objects are cached, after which the temporary buffer
-holding the object stream contents is discarded. In this way, the first
-time an object in an object stream is requested, all objects in the
-stream are cached.
-
-The following example should clarify how ``QPDF`` processes a simple
-file.
-
-- Client constructs ``QPDF`` ``pdf`` and calls
- ``pdf.processFile("a.pdf");``.
-
-- The ``QPDF`` class checks the beginning of
- :file:`a.pdf` for a PDF header. It then reads the
- cross reference table mentioned at the end of the file, ensuring that
- it is looking before the last ``%%EOF``. After getting to the ``trailer``
- keyword, it invokes the parser.
-
-- The parser sees "``<<``", so it calls itself recursively in
- dictionary creation mode.
-
-- In dictionary creation mode, the parser keeps accumulating objects
- until it encounters "``>>``". Each object that is read is pushed onto
- a stack. If "``R``" is read, the last two objects on the stack are
- inspected. If they are integers, they are popped off the stack and
- their values are used to construct an indirect object handle which is
- then pushed onto the stack. When "``>>``" is finally read, the stack
- is converted into a ``QPDF_Dictionary`` which is placed in a
- ``QPDFObjectHandle`` and returned.
-
-- The resulting dictionary is saved as the trailer dictionary.
-
-- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that
- point and repeats except that the new trailer dictionary is not
- saved. If ``/Prev`` is not present, the initial parsing process is
- complete.
-
- If there is an encryption dictionary, the document's encryption
- parameters are initialized.
-
-- The client requests the root object. The ``QPDF`` class gets the value
- of the root key from the trailer dictionary and returns it. It is an
- unresolved indirect ``QPDFObjectHandle``.
-
-- The client requests the ``/Pages`` key from the root
- ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
- indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
- object cache for an object with the root dictionary's object ID and
- generation number. Upon not seeing it, it checks the cross reference
- table, gets the offset, and reads the object present at that offset.
- It stores the result in the object cache and returns the cached
- result. The calling ``QPDFObjectHandle`` replaces its object pointer
- with the one from the resolved ``QPDFObjectHandle``, verifies that it
- is a valid dictionary object, and returns the (unresolved indirect)
- ``QPDFObjectHandle`` to the top of the Pages hierarchy.
-
- As the client continues to request objects, the same process is
- followed for each new requested object.
-
-.. _ref.casting:
-
-Casting Policy
---------------
-
-This section describes the casting policy followed by qpdf's
-implementation. This is of no concern to qpdf's end users and largely of no
-concern to people writing code that uses qpdf, but it could be of
-interest to people who are porting qpdf to a new platform or who are
-making modifications to the code.
-
-The C++ code in qpdf is free of old-style casts except where unavoidable
-(e.g. where the old-style cast is in a macro provided by a third-party
-header file). When there is a need for a cast, it is handled, in order
-of preference, by rewriting the code to avoid the need for a cast,
-calling ``const_cast``, calling ``static_cast``, calling
-``reinterpret_cast``, or calling some combination of the above. As a
-last resort, a compiler-specific ``#pragma`` may be used to suppress a
-warning that we don't want to fix. Examples may include suppressing
-warnings about the use of old-style casts in code that is shared between
-C and C++ code.
-
-The ``QIntC`` namespace, provided by
-:file:`include/qpdf/QIntC.hh`, implements safe
-functions for converting between integer types. These functions do range
-checking and throw a ``std::range_error``, which is a subclass of
-``std::runtime_error``, if conversion from one integer type to another
-results in loss of information. There are many cases in which we have to
-move between different integer types because of incompatible integer
-types used in interoperable interfaces. Some are unavoidable, such as
-moving between sizes and offsets, and others are there because of old
-code that is too entrenched to be fixable without breaking source
-compatibility and causing pain for users. QPDF is compiled with extra
-warnings to detect conversions with potential data loss, and all such
-cases should be fixed by either using a function from ``QIntC`` or a
-``static_cast``.
-
-When the intention is just to switch the type because of exchanging data
-between incompatible interfaces, use ``QIntC``. This is the usual case.
-However, there are some cases in which we are explicitly intending to
-use the exact same bit pattern with a different type. This is most
-common when switching between signed and unsigned characters. A lot of
-qpdf's code uses unsigned characters internally, but ``std::string`` is
-based on ``char``, which is signed on many platforms. Using
-``QIntC::to_char`` would be wrong for
-converting from unsigned to signed characters because a negative
-``char`` value and the corresponding ``unsigned char`` value greater
-than 127 *mean the same thing*. There are also
-cases in which we use ``static_cast`` when working with bit fields where
-we are not representing a numerical value but rather a bunch of bits
-packed together in some integer type. Also note that ``size_t`` and
-``long`` both typically differ between 32-bit and 64-bit environments,
-so sometimes an explicit cast may not be needed to avoid warnings on one
-platform but may be needed on another. A conversion with ``QIntC``
-should always be used when the types are different even if the
-underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit
-platforms, and the test suite is very thorough, so it is hard to make
-any of the potential errors here without being caught in build or test.
-
-Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
-pipeline interface has a ``write`` call that uses ``unsigned char*``
-without a ``const`` qualifier. The main reason for this is
-to support pipelines that make calls to third-party libraries, such as
-zlib, that don't include ``const`` in their interfaces. Unfortunately,
-there are many places in the code where it is desirable to have
-``const char*`` with pipelines. None of the pipeline implementations
-in qpdf
-currently modify the data passed to write, and doing so would be counter
-to the intent of ``Pipeline``, but there is nothing in the code to
-prevent this from being done. There are places in the code where
-``const_cast`` is used to remove the const-ness of pointers going into
-``Pipeline``\ s. This could theoretically be unsafe, but there is
-adequate testing to assert that it is safe and will remain safe in
-qpdf's code.
-
-.. _ref.encryption:
-
-Encryption
-----------
-
-Encryption is supported transparently by qpdf. When opening a PDF file,
-if an encryption dictionary exists, the ``QPDF`` object processes this
-dictionary using the password (if any) provided. The primary decryption
-key is computed and cached. No further access is made to the encryption
-dictionary after that time. When an object is read from a file, the
-object ID and generation of the object in which it is contained are
-always known. Using this information along with the stored encryption
-key, all stream and string objects are transparently decrypted. Raw
-encrypted objects are never stored in memory. This way, nothing in the
-library ever has to know or care whether it is reading an encrypted
-file.
-
-An interface is also provided for writing encrypted streams and strings
-given an encryption key. This is used by ``QPDFWriter`` when it rewrites
-encrypted files.
-
-When copying encrypted files, unless otherwise directed, qpdf will
-preserve any encryption in force in the original file. qpdf can do this
-with either the user or the owner password. There is no difference in
-capability based on which password is used. When 40-bit or 128-bit
-encryption keys are used, the user password can be recovered with the
-owner password. With 256-bit keys, the user and owner passwords are used
-independently to encrypt the actual encryption key, so while either can
-be used, the owner password can no longer be used to recover the user
-password.
-
-Starting with version 4.0.0, qpdf can read files that are not encrypted
-but that contain encrypted attachments, but it cannot write such files.
-qpdf also requires the password to be specified in order to open the
-file, not just to extract attachments, since once the file is open, all
-decryption is handled transparently. When copying files like this while
-preserving encryption, qpdf will apply the file's encryption to
-everything in the file, not just to the attachments. When decrypting the
-file, qpdf will decrypt the attachments. In general, when copying PDF
-files with multiple encryption formats, qpdf will choose the newest
-format. The only exception to this is that clear-text metadata will be
-preserved as clear-text if it is that way in the original file.
-
-One point of confusion some people have about encrypted PDF files is
-that encryption is not the same as password protection. Password
-protected files are always encrypted, but it is also possible to create
-encrypted files that do not have passwords. Internally, such files use
-the empty string as a password, and most readers try the empty string
-first to see if it works and prompt for a password only if the empty
-string doesn't work. Normally such files have an empty user password and
-a non-empty owner password. In that way, if the file is opened by an
-ordinary reader without specification of password, the restrictions
-specified in the encryption dictionary can be enforced. Most users
-wouldn't even realize such a file was encrypted. Since qpdf always
-ignores the restrictions (except for the purpose of reporting what they
-are), qpdf doesn't care which password you use. QPDF will allow you to
-create PDF files with non-empty user passwords and empty owner
-passwords. Some readers will require a password when you open these
-files, and others will open the files without a password and not enforce
-restrictions. Having a non-empty user password and an empty owner
-password doesn't really make sense because it would mean that opening
-the file with the user password would be more restrictive than not
-supplying a password at all. QPDF also allows you to create PDF files
-with the same password as both the user and owner password. Some readers
-will not ever allow such files to be accessed without restrictions
-because they never try the password as the owner password if it works as
-the user password. Nonetheless, one of the powerful aspects of qpdf is
-that it allows you to finely specify the way encrypted files are
-created, even if the results are not useful to some readers. One use
-case for this would be for testing a PDF reader to ensure that it
-handles odd configurations of input files.
-
-.. _ref.random-numbers:
-
-Random Number Generation
-------------------------
-
-QPDF generates random numbers to support generation of encrypted data.
-Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
-random numbers. Older versions used the OS-provided source of secure
-random numbers or, if allowed at build time, insecure random numbers
-from stdlib. Starting with version 5.1.0, you can disable use of
-OS-provided secure random numbers at build time. This is especially
-useful on Windows if you want to avoid a dependency on Microsoft's
-cryptography API. You can also supply your own random data provider. For
-details on how to do this, please refer to the top-level README.md file
-in the source distribution and to comments in
-:file:`QUtil.hh`.
-
-.. _ref.adding-and-remove-pages:
-
-Adding and Removing Pages
--------------------------
-
-While qpdf's API has supported adding and modifying objects for some
-time, version 3.0 introduced specific methods for adding and removing
-pages. These are largely convenience routines that handle two tricky
-issues: pushing inheritable resources from the ``/Pages`` tree down to
-individual pages and manipulation of the ``/Pages`` tree itself. For
-details, see ``addPage`` and surrounding methods in
-:file:`QPDF.hh`.
-
-.. _ref.reserved-objects:
-
-Reserving Object Numbers
-------------------------
-
-Version 3.0 of qpdf introduced the concept of reserved objects. These
-are seldom needed for ordinary operations, but there are cases in which
-you may want to add a series of indirect objects with references to each
-other to a ``QPDF`` object. This causes a problem because you can't
-determine the object ID that a new indirect object will have until you
-add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
-only way to add two mutually referential objects to a ``QPDF`` object
-prior to version 3.0 would be to add the new objects first and then make
-them refer to each other after adding them. Now it is possible to create
-a *reserved object* using
-``QPDFObjectHandle::newReserved``. This is an indirect object that stays
-"unresolved" even if it is queried for its type. So now, if you want to
-create a set of mutually referential objects, you can create
-reservations for each one of them and use those reservations to
-construct the references. When finished, you can call
-``QPDF::replaceReserved`` to replace the reserved objects with the real
-ones. This functionality will never be needed by most applications, but
-it is used internally by QPDF when copying objects from other PDF files,
-as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved
-objects, search for ``newReserved`` in
-:file:`test_driver.cc` in qpdf's sources.
-
-.. _ref.foreign-objects:
-
-Copying Objects From Other PDF Files
-------------------------------------
-
-Version 3.0 of qpdf introduced the ability to copy objects into a
-``QPDF`` object from a different ``QPDF`` object, which we refer to as
-*foreign objects*. This allows arbitrary
-merging of PDF files. The "from" ``QPDF`` object must remain valid after
-the copy as discussed in the note below. The
-:command:`qpdf` command-line tool provides limited
-support for basic page selection, including merging in pages from other
-files, but the library's API makes it possible to implement arbitrarily
-complex merging operations. The main method for copying foreign objects
-is ``QPDF::copyForeignObject``. This takes an indirect object from
-another ``QPDF`` and copies it recursively into this object while
-preserving all object structure, including circular references. This
-means you can add a direct object that you create from scratch to a
-``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
-indirect object from another file with ``QPDF::copyForeignObject``. The
-fact that ``QPDF::makeIndirectObject`` does not automatically detect a
-foreign object and copy it is an explicit design decision. Copying a
-foreign object seems like a sufficiently significant thing to do that it
-should be done explicitly.
-
-The other way to copy foreign objects is by passing a page from one
-``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
-``QPDF::makeIndirectObject``, this method automatically distinguishes
-between indirect objects in the current file, foreign objects, and
-direct objects.
-
-Please note: when you copy objects from one ``QPDF`` to another, the
-source ``QPDF`` object must remain valid until you have finished with
-the destination object. This is because the original object is still
-used to retrieve any referenced stream data from the copied object.
-
-.. _ref.rewriting:
-
-Writing PDF Files
------------------
-
-The qpdf library supports file writing of ``QPDF`` objects to PDF files
-through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
-writing modes: one for non-linearized files, and one for linearized
-files. See :ref:`ref.linearization` for a description of how
-linearization is implemented. This section describes how we write
-non-linearized files including the creation of QDF files (see :ref:`ref.qdf`).
-
-This outline was written prior to implementation and is not exactly
-accurate, but it provides a correct "notional" idea of how writing
-works. Look at the code in ``QPDFWriter`` for exact details.
-
-- Initialize state:
-
- - next object number = 1
-
- - object queue = empty
-
- - renumber table: old object id/generation to new id/0 = empty
-
- - xref table: new id -> offset = empty
-
-- Create a QPDF object from a file.
-
-- Write header for new PDF file.
-
-- Request the trailer dictionary.
-
-- For each value that is an indirect object, grab the next object
- number (via an operation that returns and increments the number). Map
- object to new number in renumber table. Push object onto queue.
-
-- While there are more objects on the queue:
-
- - Pop queue.
-
- - Look up object's new number *n* in the renumbering table.
-
- - Store current offset into xref table.
-
- - Write :samp:`{n} 0 obj`.
-
- - If object is null, whether direct or indirect, write out null,
- thus eliminating unresolvable indirect object references.
-
- - If the object is a stream, write stream contents, piped
- through any filters as required, to a memory buffer. Use this
- buffer to determine the stream length.
-
- - If object is not a stream, array, or dictionary, write out its
- contents.
-
- - If object is an array or dictionary (including stream), traverse
- its elements (for array) or values (for dictionaries), handling
- recursive dictionaries and arrays, looking for indirect objects.
- When an indirect object is found, if it is not resolvable, ignore.
- (This case is handled when writing it out.) Otherwise, look it up
- in the renumbering table. If not found, grab the next available
- object number, assign to the referenced object in the renumbering
- table, and push the referenced object onto the queue. As a special
- case, when writing out a stream dictionary, replace length,
- filters, and decode parameters as required.
-
- Write out dictionary or array, replacing any unresolvable indirect
- object references with null (the PDF spec says a reference to a
- non-existent object is legal and resolves to null) and any
- resolvable ones with references to the renumbered objects.
-
- - If the object is a stream, write ``stream\n``, the stream contents
- (from the memory buffer), and ``\nendstream\n``.
-
- - When done, write ``endobj``.
-
-Once we have finished the queue, all referenced objects will have been
-written out and all deleted objects or unreferenced objects will have
-been skipped. The new cross-reference table will contain an offset for
-every new object number from 1 up to the number of objects written. This
-can be used to write out a new xref table. Finally we can write out the
-trailer dictionary with appropriately computed /ID (see spec, 8.3, File
-Identifiers), the cross reference table offset, and ``%%EOF``.
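-
-The queue-driven renumbering in the outline above can be sketched as a
-toy model. This is only an illustration under stated assumptions, not
-the code in ``QPDFWriter``: the ``Ref`` class, the ``renumber``
-function, and the use of plain Python dictionaries and lists to stand in
-for PDF objects are all hypothetical, and the queue is assumed to be
-first-in, first-out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference (old object number)."""
    old_id: int


def renumber(objects, trailer):
    """Return (renumber table, write order) for objects reachable from trailer.

    `objects` maps old object numbers to values. A number that is missing
    from `objects` models an unresolvable reference: it is ignored here and
    would be written out as null.
    """
    next_id = 1
    table = {}        # old id -> new id
    queue = deque()   # assumption: first-in, first-out
    order = []        # old ids in the order they would be written

    def enqueue(value):
        nonlocal next_id
        # Walk nested dictionaries and arrays looking for indirect objects.
        if isinstance(value, Ref):
            if value.old_id in objects and value.old_id not in table:
                table[value.old_id] = next_id
                next_id += 1
                queue.append(value.old_id)
        elif isinstance(value, dict):
            for item in value.values():
                enqueue(item)
        elif isinstance(value, list):
            for item in value:
                enqueue(item)

    enqueue(trailer)
    while queue:
        old_id = queue.popleft()
        order.append(old_id)
        enqueue(objects[old_id])
    return table, order


objects = {
    10: {"Type": "Catalog", "Pages": Ref(20)},
    20: {"Type": "Pages", "Kids": [Ref(30)], "Count": 1},
    30: {"Type": "Page", "Parent": Ref(20), "Missing": Ref(99)},
    40: {"Unreferenced": True},
}
table, order = renumber(objects, {"Root": Ref(10)})
print(table)
print(order)
```

-In this model, object 40 is never reached from the trailer, so it gets
-no new number and would be skipped, and the circular reference between
-objects 20 and 30 is handled because an object is queued only the first
-time it is seen.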
-
-.. _ref.filtered-streams:
-
-Filtered Streams
-----------------
-
-Support for streams is implemented through the ``Pipeline`` interface
-which was designed for this package.
-
-When reading streams, create a series of ``Pipeline`` objects. The
-``Pipeline`` abstract base requires implementation of ``write()`` and
-``finish()`` and provides an implementation of ``getNext()``. Each
-pipeline object, upon receiving data, does whatever it is going to do
-and then writes the data (possibly modified) to its successor.
-Alternatively, a pipeline may be an end-of-the-line pipeline that does
-something like store its output to a file or a memory buffer ignoring a
-successor. For additional details, look at
-:file:`Pipeline.hh`.
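-
-The chaining idea can be sketched in a few lines of Python. This is a
-simplified model rather than qpdf's C++ interface: the class names
-``Inflate`` and ``Buffer`` and the method ``get_data`` are hypothetical,
-and zlib decompression stands in for a stream filter.

```python
import zlib


class Pipeline:
    """Base class: subclasses implement write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    def write(self, data):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class Inflate(Pipeline):
    """Decompresses zlib data and passes the result to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._decoder = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._decoder.decompress(data))

    def finish(self):
        self.get_next().write(self._decoder.flush())
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline that collects its input in memory."""

    def __init__(self):
        super().__init__(None)
        self.chunks = []
        self.finished = False

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        self.finished = True

    def get_data(self):
        return b"".join(self.chunks)


sink = Buffer()
head = Inflate(sink)
compressed = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
# Feed the data in two pieces to show that pipelines work incrementally.
head.write(compressed[:5])
head.write(compressed[5:])
head.finish()
print(sink.get_data())
```

-The caller only ever talks to the head of the chain, so swapping the
-sink for one that writes to a file requires no change to the filters.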
-
-``QPDF`` can read raw or filtered streams. When reading a filtered
-stream, the ``QPDF`` class creates one ``Pipeline`` object for each
-appropriate filter and chains them together. The last filter
-should write to whatever type of output is required. The ``QPDF`` class
-has an interface to write raw or filtered stream contents to a given
-pipeline.
-
-.. _ref.object-accessors:
-
-Object Accessor Methods
------------------------
-
-..
- This section is referenced in QPDFObjectHandle.hh
-
-For general information about how to access instances of
-``QPDFObjectHandle``, please see the comments in
-:file:`QPDFObjectHandle.hh`. Search for "Accessor
-methods". This section provides a more in-depth discussion of the
-behavior and the rationale for the behavior.
-
-*Why were type errors made into warnings?* When type checks were
-introduced into qpdf in the early days, it was expected that type errors
-would only occur as a result of programmer error. However, in practice,
-type errors would occur with malformed PDF files because of assumptions
-made in code, including code within the qpdf library and code written by
-library users. The most common case would be chaining calls to
-``getKey()`` to access keys deep within a dictionary. In many cases,
-qpdf would be able to recover from these situations, but the old
-behavior often resulted in crashes rather than graceful recovery. For
-this reason, the errors were changed to warnings.
-
-*Why even warn about type errors when the user can't usually do anything
-about them?* Type warnings are extremely valuable during development.
-Since it's impossible to catch at compile time things like typos in
-dictionary key names or logic errors around what the structure of a PDF
-file might be, the presence of type warnings can save lots of developer
-time. They have also proven useful in exposing issues in qpdf itself
-that would have otherwise gone undetected.
-
-*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
-``QPDFObjectHandle`` could be more strongly typed so that you'd have to
-check that something was of a particular type before calling
-type-specific accessor methods. However, implementing this at this stage
-of the library's history would be quite difficult, and it would make
-the common pattern of drilling into an object no longer work. While it
-would be possible to have a parallel interface, it would create a lot of
-extra code. If qpdf were written in a language like Rust, an interface
-like this would make a lot of sense, but, for a variety of reasons, the
-qpdf API is consistent with other APIs of its time, relying on exception
-handling to catch errors. The underlying PDF objects are inherently not
-type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
-ultimately cause a lot more code to have to be written and would likely
-make software that uses qpdf more brittle, and even so, checks would
-have to occur at runtime.
-
-*Why do type errors sometimes raise exceptions?* The way warnings work
-in qpdf requires a ``QPDF`` object to be associated with an object
-handle for a warning to be issued. It would be nice if this could be
-fixed, but it would require major changes to the API. Rather than
-throwing away these conditions, we convert them to exceptions. It's not
-that bad though. Since any object handle that was read from a file has
-an associated ``QPDF`` object, it would only be type errors on objects
-that were created explicitly that would cause exceptions, and in that
-case, type errors are much more likely to be the result of a coding
-error than invalid input.
-
-*Why does the behavior of a type exception differ between the C and C++
-API?* There is no way to throw and catch exceptions in C short of
-something like ``setjmp`` and ``longjmp``, and that approach is not
-portable across language barriers. Since the C API is often used from
-other languages, it's important to keep things as simple as possible.
-Starting in qpdf 10.5, exceptions that used to crash code using the C
-API will be written to stderr by default, and it is possible to register
-an error handler. There's no reason that the error handler can't
-simulate exception handling in some way, such as by using ``setjmp`` and
-``longjmp`` or by setting some variable that can be checked after
-library calls are made. In retrospect, it might have been better if the
-C API object handle methods returned error codes like the other methods
-and set return values in passed-in pointers, but this would complicate
-both the implementation and the use of the library for a case that is
-actually quite rare and largely avoidable.
-
-.. _ref.linearization:
-
-Linearization
-=============
-
-This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
-creation and processing of linearized PDFs.
-
-.. _ref.linearization-strategy:
-
-Basic Strategy for Linearization
---------------------------------
-
-To avoid the incestuous problem of having the qpdf library validate its
-own linearized files, we have a special linearized file checking mode
-which can be invoked via :command:`qpdf
---check-linearization` (or :command:`qpdf
---check`). This mode reads the linearization parameter
-dictionary and the hint streams and validates that object ordering,
-parameters, and hint stream contents are correct. The validation code
-was first tested against linearized files created by external tools
-(Acrobat and pdlin) and then used to validate files created by
-``QPDFWriter`` itself.
-
-.. _ref.linearized.preparation:
-
-Preparing For Linearization
----------------------------
-
-Before creating a linearized PDF file from any other PDF file, the PDF
-file must be altered such that all page attributes are propagated down
-to the page level (and not inherited from parents in the ``/Pages``
-tree). We also have to know which objects refer to which other objects,
-being concerned with page boundaries and a few other cases. We refer to
-this part of preparing the PDF file as
-*optimization*, discussed in
-:ref:`ref.optimization`. Note that, in this context, the
-term *optimization* is a qpdf term, and the
-term *linearization* is a term from the PDF
-specification. Do not be confused by the fact that many applications
-refer to linearization as optimization or web optimization.
-
-When creating linearized PDF files from optimized PDF files, there are
-really only a few issues that need to be dealt with:
-
-- Creation of hints tables
-
-- Placing objects in the correct order
-
-- Filling in offsets and byte sizes
-
-.. _ref.optimization:
-
-Optimization
-------------
-
-In order to perform various operations such as linearization and
-splitting files into pages, it is necessary to know which objects are
-referenced by which pages, page thumbnails, and root and trailer
-dictionary keys. It is also necessary to ensure that all page-level
-attributes appear directly at the page level and are not inherited from
-parents in the pages tree.
-
-We refer to the process of enforcing these constraints as
-*optimization*. As mentioned above, note
-that some applications refer to linearization as optimization. Although
-this optimization was initially motivated by the need to create
-linearized files, we are using these terms separately.
-
-PDF file optimization is implemented in the
-:file:`QPDF_optimization.cc` source file. That file
-is richly commented and serves as the primary reference for the
-optimization process.
-
-After optimization has been completed, the private member variables
-``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
-been populated. Any object that has more than one value in the
-``object_to_obj_users`` table is shared. Any object that has exactly one
-value in the ``object_to_obj_users`` table is private. To find all the
-private objects in a page or a trailer or root dictionary key, one
-merely has to make this determination for each element in the
-``obj_user_to_objects`` table for the given page or key.
-
-Note that pages and thumbnails have different object user types, so the
-above test on a page will not include objects referenced only by the
-page's thumbnail dictionary.
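-
-The shared/private test described above can be sketched as follows.
-This is a toy model, not the contents of
-:file:`QPDF_optimization.cc`: the two table names are taken from the
-text, but the tuple representation of object users and the helper
-functions ``build_tables`` and ``private_objects`` are assumptions made
-for illustration.

```python
from collections import defaultdict


def build_tables(references):
    """Build both lookup tables from (object user, object number) pairs."""
    obj_user_to_objects = defaultdict(set)
    object_to_obj_users = defaultdict(set)
    for user, obj in references:
        obj_user_to_objects[user].add(obj)
        object_to_obj_users[obj].add(user)
    return obj_user_to_objects, object_to_obj_users


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Objects used by `user` and by no other object user."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }


references = [
    (("page", 1), 5),    # content stream of page 1
    (("page", 1), 7),    # font used by both pages
    (("page", 2), 6),
    (("page", 2), 7),
    (("thumb", 1), 8),   # thumbnail image of page 1
]
users, objects = build_tables(references)
print(sorted(private_objects(("page", 1), users, objects)))
print(sorted(obj for obj, who in objects.items() if len(who) > 1))
```

-Because the thumbnail of page 1 is a separate object user in this model,
-object 8 is private to the thumbnail and does not show up when the test
-is applied to the page itself.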
-
-.. _ref.linearization.writing:
-
-Writing Linearized Files
-------------------------
-
-We will create files with only primary hint streams. We will never write
-overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
-and they are never necessary.) The hint streams contain offset
-information to objects that point to where they would be if the hint
-stream were not present. This means that we have to calculate all object
-positions before we can generate and write the hint table. This means
-that we have to generate the file in two passes. To make this reliable,
-``QPDFWriter`` in linearization mode invokes exactly the same code twice
-to write the file to a pipeline.
-
-In the first pass, the target pipeline is a count pipeline chained to a
-discard pipeline. The count pipeline simply passes its data through to
-the next pipeline in the chain but can return the number of bytes passed
-through it at any intermediate point. The discard pipeline is an
-end-of-the-line pipeline that just throws its data away. The hint stream is not
-written and dummy values with adequate padding are stored in the first
-cross reference table, linearization parameter dictionary, and /Prev key
-of the first trailer dictionary. All the offset, length, object
-renumbering information, and anything else we need for the second pass
-is stored.
-
-At the end of the first pass, this information is passed to the ``QPDF``
-class which constructs a compressed hint stream in a memory buffer and
-returns it. ``QPDFWriter`` uses this information to write a complete
-hint stream object into a memory buffer. At this point, the length of
-the hint stream is known.
-
-In the second pass, the end of the pipeline chain is a regular file
-instead of a discard pipeline, and we have known values for all the
-offsets and lengths that we didn't have in the first pass. We have to
-adjust offsets that appear after the start of the hint stream by the
-length of the hint stream, which is known. Anything that is of variable
-length is padded, with the padding code surrounding any writing code
-that differs in the two passes. This ensures that changes to the way
-things are represented never result in offsets that were gathered
-during the first pass becoming incorrect for the second pass.
-
-Using this strategy, we can write linearized files to a non-seekable
-output stream with only a single pass to disk or wherever the output is
-going.
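-
-A much-reduced sketch of the two-pass idea follows. It assumes a
-made-up file format with a single padded offset field and does not
-model hint streams or the offset adjustment described above; the names
-``Count``, ``Discard``, and ``write_file`` are hypothetical and are not
-qpdf's classes.

```python
import io


class Count:
    """Passes data through to `sink` and tracks how many bytes it has seen."""

    def __init__(self, sink):
        self.sink = sink
        self.count = 0

    def write(self, data):
        self.count += len(data)
        self.sink.write(data)


class Discard:
    """End-of-the-line sink that throws its data away."""

    def write(self, data):
        pass


def write_file(out, offsets):
    """Write the toy file; called identically for both passes.

    `offsets` holds values learned in the first pass (empty on that pass).
    Returns the offsets observed during this pass.
    """
    seen = {}
    out.write(b"%TOY-1.0\n")
    # Variable-length value: always padded to 10 digits so that both passes
    # emit the same number of bytes here.
    out.write(b"body-offset %010d\n" % offsets.get("body", 0))
    seen["body"] = out.count
    out.write(b"body\n")
    return seen


# First pass: count bytes, discard output, learn the offsets.
first = Count(Discard())
offsets = write_file(first, {})

# Second pass: same code, real output, known offsets.
target = io.BytesIO()
second = Count(target)
check = write_file(second, offsets)
print(target.getvalue())
```

-Since the padded field has the same width in both passes, the offsets
-gathered in the first pass are still valid in the second, and the real
-output is written front to back without seeking.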
-
-.. _ref.linearization-data:
-
-Calculating Linearization Data
-------------------------------
-
-Once a file is optimized, we have information about which objects access
-which other objects. We can then process these tables to decide which
-part (as described in "Linearized PDF Document Structure" in the PDF
-specification) each object is contained within. This tells us the exact
-order in which objects are written. The ``QPDFWriter`` class asks for
-this information and enqueues objects for writing in the proper order.
-It also turns on a check that causes an exception to be thrown if an
-object is encountered that has not already been queued. (This could
-happen only if there were a bug in the traversal code used to calculate
-the linearization data.)
-
-.. _ref.linearization-issues:
-
-Known Issues with Linearization
--------------------------------
-
-There are a handful of known issues with this linearization code. These
-issues do not appear to impact the behavior of linearized files which
-still work as intended: it is possible for a web browser to begin to
-display them before they are fully downloaded. In fact, it seems that
-various other programs that create linearized files have many of these
-same issues. These items make reference to terminology used in the
-linearization appendix of the PDF specification.
-
-- Thread Dictionary information keys appear in part 4 with the rest of
- Threads instead of in part 9. Objects in part 9 are not grouped
- together functionally.
-
-- We are not calculating numerators for shared object positions within
- content streams or interleaving them within content streams.
-
-- We generate only page offset, shared object, and outline hint tables.
- It would be relatively easy to add some additional tables. We gather
- most of the information needed to create thumbnail hint tables. There
- are comments in the code about this.
-
-.. _ref.linearization-debugging:
-
-Debugging Note
---------------
-
-The :command:`qpdf --show-linearization` command can show
-the complete contents of linearization hint streams. To look at the raw
-data, you can extract the filtered contents of the linearization hint
-tables using :command:`qpdf --show-object=n
---filtered-stream-data`. Then, to convert this into a bit
-stream (since linearization tables are bit streams written without
-regard to byte boundaries), you can pipe the resulting data through the
-following perl code:
-
-.. code-block:: perl
-
- use bytes;
- binmode STDIN;
- undef $/;
- my $a = <STDIN>;
- my @ch = split(//, $a);
- map { printf("%08b", ord($_)) } @ch;
- print "\n";
-
-.. _ref.object-and-xref-streams:
-
-Object and Cross-Reference Streams
-==================================
-
-This chapter provides information about the implementation of object
-stream and cross-reference stream support in qpdf.
-
-.. _ref.object-streams:
-
-Object Streams
---------------
-
-Object streams can contain any regular object except the following:
-
-- stream objects
-
-- objects with generation > 0
-
-- the encryption dictionary
-
-- objects containing the /Length of another stream
-
-In addition, Adobe Reader (at least as of version 8.0.0) appears to not
-be able to handle having the document catalog appear in an object stream
-if the file is encrypted, though this is not specifically disallowed by
-the specification.
-
-There are additional restrictions for linearized files. See
-:ref:`ref.object-streams-linearization` for details.
-
-The PDF specification refers to objects in object streams as "compressed
-objects" regardless of whether the object stream is compressed.
-
-The generation number of every object in an object stream must be zero.
-It is possible to delete and replace an object in an object stream with
-a regular object.
-
-The object stream dictionary has the following keys:
-
-- ``/N``: number of objects
-
-- ``/First``: byte offset of first object
-
-- ``/Extends``: indirect reference to stream that this extends
-
-Stream collections are formed with ``/Extends``. They must form a
-directed acyclic graph. These can be used for semantic information and
-are not meaningful to the PDF document's syntactic structure. Although
-qpdf preserves stream collections, it never generates them and doesn't
-make use of this information in any way.
-
-The specification recommends limiting the number of objects in an object
-stream for efficiency in reading and decoding. Acrobat 6 uses no more
-than 100 objects per object stream for linearized files and no more than 200
-objects per stream for non-linearized files. ``QPDFWriter``, in object
-stream generation mode, never puts more than 100 objects in an object
-stream.
-
-Object stream contents consist of *N* pairs of integers, each of which
-is the object number and the byte offset of the object relative to the
-first object in the stream, followed by the objects themselves,
-concatenated.
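-
-The layout just described can be illustrated with a small sketch that
-builds and then re-parses uncompressed object stream data. The function
-names are hypothetical, the serialized objects are supplied as ready-made
-byte strings, and separating objects with a newline is an assumption of
-this sketch rather than something the text above requires.

```python
def build_object_stream(objects):
    """Return (dictionary values, data) for a list of (number, bytes) objects."""
    body = b""
    pairs = []
    for number, serialized in objects:
        # Offsets are relative to the first object, that is, to /First.
        pairs.append(b"%d %d" % (number, len(body)))
        body += serialized + b"\n"
    header = b" ".join(pairs) + b"\n"
    return {"/N": len(objects), "/First": len(header)}, header + body


def parse_object_stream(info, data):
    """Recover {object number: bytes} from uncompressed object stream data."""
    fields = data[: info["/First"]].split()
    numbers = [int(f) for f in fields[0::2]]
    offsets = [int(f) for f in fields[1::2]]
    body = data[info["/First"]:]
    # Each object ends where the next one starts; the last ends with the data.
    ends = offsets[1:] + [len(body)]
    return {
        n: body[start:end].strip()
        for n, start, end in zip(numbers, offsets, ends)
    }


info, data = build_object_stream([
    (11, b"<< /Type /Font >>"),
    (12, b"[1 2 3]"),
])
print(info)
print(data)
print(parse_object_stream(info, data))
```

-Note that the objects appear without ``obj`` and ``endobj`` wrappers or
-generation numbers, which is consistent with the rule that every object
-in an object stream has generation zero.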
-
-.. _ref.xref-streams:
-
-Cross-Reference Streams
------------------------
-
-For non-hybrid files, the value following ``startxref`` is the byte
-offset to the xref stream rather than the word ``xref``.
-
-For hybrid files (files containing both xref tables and cross-reference
-streams), the xref table's trailer dictionary contains the key
-``/XRefStm`` whose value is the byte offset to a cross-reference stream
-that supplements the xref table. A PDF 1.5-compliant application should
-read the xref table first. Then it should replace any object that it has
-already seen with any defined in the xref stream. Then it should follow
-any ``/Prev`` pointer in the original xref table's trailer dictionary.
-The specification is not clear about what should be done, if anything,
-with a ``/Prev`` pointer in the xref stream referenced by an xref table.
-The ``QPDF`` class ignores it, which is probably reasonable since, if
-this case were to appear for any sensible PDF file, the previous xref
-table would probably have a corresponding ``/XRefStm`` pointer of its
-own. For example, if a hybrid file were appended, the appended section
-would have its own xref table and ``/XRefStm``. The appended xref table
-would point to the previous xref table which would point to the
-``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
-it.
-
-Since xref streams must be read very early, they may not be encrypted,
-and they may not contain indirect objects for keys required to read them,
-which are these:
-
-- ``/Type``: value ``/XRef``
-
-- ``/Size``: value *n+1*: where *n* is highest object number (same as
- ``/Size`` in the trailer dictionary)
-
-- ``/Index`` (optional): value
- :samp:`[{n count} ...]` used to determine
- which objects' information is stored in this stream. The default is
- ``[0 /Size]``.
-
-- ``/Prev``: value :samp:`{offset}`: byte
- offset of previous xref stream (same as ``/Prev`` in the trailer
- dictionary)
-
-- ``/W [...]``: sizes of each field in the xref table
-
-The other fields in the xref stream, which may be indirect if desired,
-are the union of those from the xref table's trailer dictionary.
-
-.. _ref.xref-stream-data:
-
-Cross-Reference Stream Data
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The stream data is binary and encoded in big-endian byte order. Entries
-are concatenated, and each entry has a length equal to the total of the
-entries in ``/W`` above. Each entry consists of one or more fields, the
-first of which is the type of the field. The number of bytes for each
-field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
-is omitted and has the default value. The default value for the field
-type is "``1``". All other default values are "``0``".
-
-PDF 1.5 has three field types:
-
-- 0: for free objects. Format: ``0 obj next-generation``, same as the
- free table in a traditional cross-reference table
-
-- 1: regular non-compressed object. Format: ``1 offset generation``
-
-- 2: for objects in object streams. Format: ``2 object-stream-number
- index``, the number of the object stream containing the object and the
- index within the object stream of the object.
-
-It seems standard to have the first entry in the table be ``0 0 0``
-instead of ``0 0 ffff`` if there are no deleted objects.
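-
-As an illustration of the encoding described above, the following
-sketch decodes already-decompressed cross-reference stream data. It is
-not the parser used by the ``QPDF`` class; the function name and the
-representation of ``/W`` and ``/Index`` as flat Python lists are
-assumptions, and the sample bytes are invented.

```python
def decode_xref_stream(data, w, index):
    """Decode xref stream data into {object number: (type, field2, field3)}.

    `w` is the /W array and `index` the /Index array as flat Python lists.
    """
    defaults = (1, 0, 0)   # an omitted type field means type 1
    counts = index[1::2]
    if len(data) != sum(w) * sum(counts):
        raise ValueError("data length does not match /W and /Index")
    entries = {}
    pos = 0
    for first, count in zip(index[0::2], counts):
        for obj in range(first, first + count):
            fields = []
            for width, default in zip(w, defaults):
                if width == 0:
                    fields.append(default)
                else:
                    # Fields are unsigned big-endian integers.
                    raw = data[pos:pos + width]
                    fields.append(int.from_bytes(raw, "big"))
                    pos += width
            entries[obj] = tuple(fields)
    return entries


sample = bytes.fromhex(
    "00000000"   # object 0: free, next free object 0, generation 0
    "01001100"   # object 1: at offset 17, generation 0
    "02000300"   # object 2: in object stream 3, index 0
    "01015000"   # object 3: at offset 336, generation 0
)
entries = decode_xref_stream(sample, [1, 2, 1], [0, 4])
for number, entry in entries.items():
    print(number, entry)
```

-With a ``/W`` of ``[0 2 0]``, the same function would read only a
-two-byte offset per entry and fill in type 1 and generation 0 from the
-defaults.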
-
-.. _ref.object-streams-linearization:
-
-Implications for Linearized Files
----------------------------------
-
-For linearized files, the linearization dictionary, document catalog,
-and page objects may not be contained in object streams.
-
-Objects stored within object streams are given the highest range of
-object numbers within the main and first-page cross-reference sections.
-
-It is okay to use cross-reference streams in place of regular xref
-tables. There are no special considerations.
-
-Hint data refers to object streams themselves, not the objects in the
-streams. Shared object references should also be made to the object
-streams. There are no references in any hint tables to the object numbers
-of compressed objects (objects within object streams).
-
-When numbering objects, all shared objects within both the first and
-second halves of the linearized files must be numbered consecutively
-after all normal uncompressed objects in that half.
-
-.. _ref.object-stream-implementation:
-
-Implementation Notes
---------------------
-
-There are three modes for writing object streams:
-:samp:`disable`, :samp:`preserve`, and
-:samp:`generate`. In disable mode, we do not generate
-any object streams, and we also generate an xref table rather than xref
-streams. This can be used to generate PDF files that are viewable with
-older readers. In preserve mode, we write object streams such that
-written object streams contain the same objects and ``/Extends``
-relationships as in the original file. This is equal to disable if the
-file has no object streams. In generate, we create object streams
-ourselves by grouping objects that are allowed in object streams
-together in sets of no more than 100 objects. We also ensure that the
-PDF version is at least 1.5 in generate mode, but we preserve the
-version header in the other modes. The default is
-:samp:`preserve`.
-
-We do not support creation of hybrid files. When we write files, even in
-preserve mode, we will lose any xref tables and merge any appended
-sections.
-
-.. _ref.release-notes:
-
-Release Notes
-=============
-
-For a detailed list of changes, please see the file
-:file:`ChangeLog` in the source distribution.
-
-10.5.0: XXX Month dd, YYYY
- - Library Enhancements
-
- - Since qpdf version 8, using object accessor methods on an
- instance of ``QPDFObjectHandle`` may create warnings if the
- object is not of the expected type. These warnings now have an
- error code of ``qpdf_e_object`` instead of
- ``qpdf_e_damaged_pdf``. Also, comments have been added to
- :file:`QPDFObjectHandle.hh` to explain in more detail what the
- behavior is. See :ref:`ref.object-accessors` for a more in-depth
- discussion.
-
- - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer
- allocated with ``malloc()`` for better cross-language
- interoperability.
-
- - C API Enhancements
-
- - Overhaul error handling for the object handle functions in the C API.
- Some rare error conditions that would previously have caused a
- crash are now trapped and reported, and the functions that
- generate them return fallback values. See comments in the
- ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for
- details. In particular, exceptions thrown by the underlying C++
- code when calling object accessors are caught and converted into
- errors. The errors can be checked by calling ``qpdf_has_error``.
- Use ``qpdf_silence_errors`` to prevent the error from being
- written to stderr.
-
- - Add ``qpdf_get_last_string_length`` to the C API to get the
- length of the last string that was returned. This is needed to
- handle strings that contain embedded null characters.
-
- - Add ``qpdf_oh_is_initialized`` and
- ``qpdf_oh_new_uninitialized`` to the C API to make it possible
- to work with uninitialized objects.
-
- - Add ``qpdf_oh_new_object`` to the C API. This allows you to
- clone an object handle.
-
- - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
- and ``qpdf_replace_object``, exposing the corresponding methods
- in ``QPDF`` and ``QPDFObjectHandle``.
-
- - Add several functions for working with pages. See ``PAGE
- FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
-
- - Add several functions for working with streams. See ``STREAM
- FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
-
- - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``.
-
- - Documentation change
-
- - The documentation sources have been switched from docbook to
- reStructuredText processed with `Sphinx
- <https://sphinx-doc.org>`__. This is mostly transparent (other
- than format change) with the exception that all section links
- have changed. What used to be `#ref.something` is now
- `#something`. A top-to-bottom review of the documentation is
- planned for an upcoming release.
-
-10.4.0: November 16, 2021
- - Handling of Weak Cryptography Algorithms
-
- - From the qpdf CLI, the
- :samp:`--allow-weak-crypto` is now required to
- suppress a warning when explicitly creating PDF files using RC4
- encryption. While qpdf will always retain the ability to read
- and write such files, doing so will require explicit
- acknowledgment moving forward. For qpdf 10.4, this change only
- affects the command-line tool. Starting in qpdf 11, there will
- be small API changes to require explicit acknowledgment in
- those cases as well. For additional information, see :ref:`ref.weak-crypto`.
-
- - Bug Fixes
-
- - Fix potential bounds error when handling shell completion that
- could occur when given bogus input.
-
- - Properly handle overlay/underlay on completely empty pages
- (with no resource dictionary).
-
- - Fix crash that could occur under certain conditions when using
- :samp:`--pages` with files that had form
- fields.
-
- - Library Enhancements
-
- - Make ``QPDF::findPage`` functions public.
-
- - Add methods to ``Pl_Flate`` to be able to receive warnings on
- certain recoverable conditions.
-
- - Add an extra check to the library to detect when foreign
- objects are inserted directly (instead of using
- ``QPDF::copyForeignObject``) at the time of insertion rather
- than when the file is written. Catching the error sooner makes
- it much easier to locate the incorrect code.
-
- - CLI Enhancements
-
- - Improve diagnostics around parsing
- :samp:`--pages` command-line options
-
- - Packaging Changes
-
- - The Windows binary distribution is now built with crypto
- provided by OpenSSL 3.0.
-
-10.3.2: May 8, 2021
- - Bug Fixes
-
- - When generating a file while preserving object streams,
- unreferenced objects are correctly removed unless
- :samp:`--preserve-unreferenced` is specified.
-
- - Library Enhancements
-
- - When adding a page that already exists, make a shallow copy
- instead of throwing an exception. This makes the library
- behavior consistent with the CLI behavior. See
- :file:`ChangeLog` for additional notes.
-
-10.3.1: March 11, 2021
- - Bug Fixes
-
- - Form field copying failed on files where /DR was a direct
- object in the document-level form dictionary.
-
-10.3.0: March 4, 2021
- - Bug Fixes
-
- - The code for handling form fields when copying pages from
- 10.2.0 was not quite right and didn't work in a number of
- situations, such as when the same page was copied multiple
- times or when there were conflicting resource or field names
- across multiple copies. The 10.3.0 code has been much more
- thoroughly tested with more complex cases and with a multitude
- of readers and should be much closer to correct. The 10.2.0
- code worked well enough for page splitting or for copying pages
- with form fields into documents that didn't already have them
- but was still not quite correct in handling of field-level
- resources.
-
- - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
- called, existing ``QPDFObjectHandle`` instances no longer point
- to the old objects. The next time they are accessed, they
- automatically notice the change to the underlying object and
- update themselves. This resolves a very longstanding source of
- confusion, albeit in a very rarely used method call.
-
- - Fix form field handling code to look for default appearances,
- quadding, and default resources in the right places. The code
- was not looking for things in the document-level interactive
- form dictionary that it was supposed to be finding there. This
- required adding a few new methods to
- ``QPDFFormFieldObjectHelper``.
-
- - Library Enhancements
-
- - Reworked the code that handles copying annotations and form
- fields during page operations. There were additional methods
- added to the public API from 10.2.0 and one deprecation of a
- method added in 10.2.0. The majority of the API changes are in
- methods most people would never call and that will hopefully be
- superseded by higher-level interfaces for handling page copies.
- Please see the :file:`ChangeLog` file for
- details.
-
- - The method ``QPDF::numWarnings`` was added so that you can tell
- whether any warnings happened during a specific block of code.
-
-10.2.0: February 23, 2021
- - CLI Behavior Changes
-
- - Operations that work on combining pages are much better about
- protecting form fields. In particular,
- :samp:`--split-pages` and
- :samp:`--pages` now preserve interactive form
- functionality by copying the relevant form field information
- from the original files. Additionally, if you use
- :samp:`--pages` to select only some pages from
- the original input file, unused form fields are removed, which
- prevents lots of unused annotations from being retained.
-
- - By default, :command:`qpdf` no longer allows
- creation of encrypted PDF files whose user password is
- non-empty and owner password is empty when a 256-bit key is in
- use. The :samp:`--allow-insecure` option,
- specified inside the :samp:`--encrypt` options,
- allows creation of such files. Behavior changes in the CLI are
- avoided when possible, but an exception was made here because
- this is security-related. qpdf must always allow creation of
- weird files for testing purposes, but it should not default to
- letting users unknowingly create insecure files.
-
- - Library Behavior Changes
-
- - Note: the changes in this section cause differences in output
- in some cases. These differences change the syntax of the PDF
- but do not change the semantics (meaning). I make a strong
- effort to avoid gratuitous changes in qpdf's output so that
- qpdf changes don't break people's tests. In this case, the
- changes significantly improve the readability of the generated
- PDF and don't affect any output that's generated by simple
- transformation. If you are annoyed by having to update test
- files, please rest assured that changes like this have been and
- will continue to be rare events.
-
- - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
- ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
- the characters in the string. This reduces needless encoding in
- UTF-16 of strings that can be encoded in ASCII. This change may
- cause qpdf to generate different output than before when form
- field values are set using ``QPDFFormFieldObjectHelper`` but
- does not change the meaning of the output.
-
- - The code that places form XObjects and also the code that
- flattens rotations trim trailing zeroes from real numbers that
- they calculate. This causes slight (but semantically
- equivalent) differences in generated appearance streams and
- form XObject invocations in overlay/underlay code or in user
- code that calls the methods that place form XObjects on a page.
-
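The trailing-zero trimming described in the 10.2.0 library behavior changes can be pictured with the sketch below. It is only an illustration: ``trim_trailing_zeroes`` is a hypothetical helper, not a function from the qpdf API, and it assumes the input is a plain fixed-point number with at least one digit before any decimal point.

```cpp
#include <string>

// Hypothetical helper (not part of qpdf): strip redundant trailing zeroes,
// and a dangling decimal point, from a fixed-point number string.
std::string trim_trailing_zeroes(std::string s)
{
    // A string without a decimal point is an integer; its zeroes are significant.
    if (s.find('.') == std::string::npos) {
        return s;
    }
    // Drop zeroes at the end of the fractional part.
    while (!s.empty() && s.back() == '0') {
        s.pop_back();
    }
    // "2." carries no more information than "2".
    if (!s.empty() && s.back() == '.') {
        s.pop_back();
    }
    return s;
}
```

Because ``1.50000`` and ``1.5`` denote the same real number, output that differs only in this way is syntactically different but semantically equivalent, which is the point the notes make.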
- - CLI Enhancements
-
- - Add new command line options for listing, saving, adding,
- removing, and copying file attachments. See :ref:`ref.attachments` for details.
-
- - Page splitting and merging operations, as well as
- :samp:`--flatten-rotation`, are better behaved
- with respect to annotations and interactive form fields. In
- most cases, interactive form field functionality and proper
- formatting and functionality of annotations is preserved by
- these operations. There are still some cases that aren't
- perfect, such as when functionality of annotations depends on
- document-level data that qpdf doesn't yet understand or when
- there are problems with referential integrity among form fields
- and annotations (e.g., when a single form field object or its
- associated annotations are shared across multiple pages, a case
- that is out of spec but that works in most viewers anyway).
-
- - The option
- :samp:`--password-file={filename}`
- can now be used to read the decryption password from a file.
- You can use ``-`` as the file name to read the password from
- standard input. This is an easier/more obvious way to read
- passwords from files or standard input than using
- :samp:`@file` for this purpose.
-
- - Add some information about attachments to the json output, and
- add ``attachments`` as an additional json key. The
- information included here is limited to the preferred name and
- content stream and a reference to the file spec object. This is
- enough detail for clients to avoid the hassle of navigating a
- name tree and provides what is needed for basic enumeration and
- extraction of attachments. More detailed information can be
- obtained by following the reference to the file spec object.
-
- - Add numeric option to :samp:`--collate`. If
- :samp:`--collate={n}`
- is given, take pages in groups of
- :samp:`{n}` from the given files.
-
- - It is now valid to provide :samp:`--rotate=0`
- to clear rotation from a page.
-
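The grouping performed by ``--collate=n`` in the 10.2.0 CLI enhancements can be sketched as follows. This is an assumption-laden illustration rather than qpdf's code: the function name ``collate`` and the use of strings as page labels are invented here, and the sketch assumes that inputs which run out of pages are simply skipped in later rounds.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not qpdf's implementation): visit the inputs in
// rounds, taking up to n pages from each input per round, until every
// input is exhausted.
std::vector<std::string>
collate(const std::vector<std::vector<std::string>>& inputs, std::size_t n)
{
    std::vector<std::string> out;
    if (n == 0) {
        return out; // a group size of zero would never make progress
    }
    std::vector<std::size_t> next(inputs.size(), 0); // next unread page per input
    bool took_any = true;
    while (took_any) {
        took_any = false;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            for (std::size_t k = 0; k < n && next[i] < inputs[i].size(); ++k) {
                out.push_back(inputs[i][next[i]++]);
                took_any = true;
            }
        }
    }
    return out;
}
```

With a group size of 1 this reduces to ordinary one-page-at-a-time interleaving.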
- - Library Enhancements
-
- - This release includes numerous additions to the API. Not all
- changes are listed here. Please see the
- :file:`ChangeLog` file in the source
- distribution for a comprehensive list. Highlights appear below.
-
- - Add ``QPDFObjectHandle::ditems()`` and
- ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
- including range-for iteration, over dictionary and array
- QPDFObjectHandles. See comments in
- :file:`include/qpdf/QPDFObjectHandle.hh`
- and
- :file:`examples/pdf-name-number-tree.cc`
- for details.
-
- - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
- stream within the same ``QPDF`` instance.
-
- - Add new helper classes for supporting file attachments, also
- known as embedded files. New classes are
- ``QPDFEmbeddedFileDocumentHelper``,
- ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
- See their respective headers for details and
- :file:`examples/pdf-attach-file.cc` for an
- example.
-
- - Add a version of ``QPDFObjectHandle::parse`` that takes a
- ``QPDF`` pointer as context so that it can parse strings
- containing indirect object references. This is illustrated in
- :file:`examples/pdf-attach-file.cc`.
-
- - Re-implement ``QPDFNameTreeObjectHelper`` and
- ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
- iterator-based API, give them the capability to repair broken
- trees, and create methods for modifying the trees. With this
- change, qpdf has a robust read/write implementation of name and
- number trees.
-
- - Add new versions of ``QPDFObjectHandle::replaceStreamData``
- that take ``std::function`` objects for cases when you need
- something between a static string and a full-fledged
- StreamDataProvider. Using this with ``QUtil::file_provider`` is
- a very easy way to create a stream from the contents of a file.
-
- - The ``QPDFMatrix`` class, formerly a private, internal class,
- has been added to the public API. See
- :file:`include/qpdf/QPDFMatrix.hh` for
- details. This class is for working with transformation
- matrices. Some methods in ``QPDFPageObjectHelper`` make use of
- this to make information about transformation matrices
- available. For an example, see
- :file:`examples/pdf-overlay-page.cc`.
-
- - Several new methods were added to
- ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
- information about, and enumerating form fields.
-
- - Add method
- ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
- applies a transformation to each annotation on a page.
-
- - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
- annotations and, if applicable, associated form fields, from
- one page to another, possibly transforming the rectangles.
-
- - Build Changes
-
- - A C++-14 compiler is now required to build qpdf. There is no
- intention to require anything newer than that for a while.
- C++-14 includes modest enhancements to C++-11 and appears to be
- supported about as widely as C++-11.
-
- - Bug Fixes
-
- - The :samp:`--flatten-rotation` option applies
- transformations to any annotations that may be on the page.
-
- - If a form XObject lacks a resources dictionary, consider any
- names in that form XObject to be referenced from the containing
- page. This is compliant with older PDF versions. Also detect if
- any form XObjects have any unresolved names and, if so, don't
- remove unreferenced resources from them or from the page that
- contains them. Unfortunately this has the side effect of
- preventing removal of unreferenced resources in some cases
- where names appear that don't refer to resources, such as with
- tagged PDF. This is a bit of a corner case that is not likely
- to cause a significant problem in practice, and the only side
- effect would be lack of removal of shared resources. A future
- version of qpdf may be more sophisticated in its detection of
- names that refer to resources.
-
- - Properly handle strings if they appear in inline image
- dictionaries while externalizing inline images.
-
-10.1.0: January 5, 2021
- - CLI Enhancements
-
- - Add :samp:`--flatten-rotation` command-line
- option, which causes all pages that are rotated using
- parameters in the page's dictionary to instead be identically
- rotated in the page's contents. The change is not user-visible
- for compliant PDF readers but can be used to work around broken
- PDF applications that don't properly handle page rotation.
-
- - Library Enhancements
-
- - Support for user-provided (pluggable, modular) stream filters.
- It is now possible to derive a class from ``QPDFStreamFilter``
- and register it with ``QPDF`` so that regular library methods,
- including those used by ``QPDFWriter``, can decode streams with
- filters not directly supported by the library. The example
- :file:`examples/pdf-custom-filter.cc`
- illustrates how to use this capability.
-
- - Add methods to ``QPDFPageObjectHelper`` to iterate through
- XObjects on a page or form XObjects, possibly recursing into
- nested form XObjects: ``forEachXObject``, ``forEachImage``,
- ``forEachFormXObject``.
-
- - Enhance several methods in ``QPDFPageObjectHelper`` to work
- with form XObjects as well as pages, as noted in comments. See
- :file:`ChangeLog` for a full list.
-
- - Rename some functions in ``QPDFPageObjectHelper``, while
- keeping old names for compatibility:
-
- - ``getPageImages`` to ``getImages``
-
- - ``filterPageContents`` to ``filterContents``
-
- - ``pipePageContents`` to ``pipeContents``
-
- - ``parsePageContents`` to ``parseContents``
-
- - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
- a map of form XObjects directly on a page or form XObject
-
- - Add new helper methods to ``QPDFObjectHandle``:
- ``isFormXObject``, ``isImage``
-
- - Add the optional ``allow_streams`` parameter to
- ``QPDFObjectHandle::makeDirect``. When
- ``QPDFObjectHandle::makeDirect`` is called in this way, it
- preserves references to streams rather than throwing an
- exception.
-
- - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
- on a stream prevents ``QPDFWriter`` from attempting to
- uncompress, recompress, or otherwise filter a stream even if it
- could. Developers can use this to protect streams that are
- already optimized or that should be shielded from ``QPDFWriter``'s
- default behavior for any other reason.
-
- - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
- useful to have for debugging.
-
- - Add method ``QPDFPageObjectHelper::flattenRotation``, which
- replaces a page's ``/Rotate`` keyword by rotating the page
- within the content stream and altering the page's bounding
- boxes so the rendering is the same. This can be used to work
- around buggy PDF readers that can't properly handle page
- rotation.
-
- - C API Enhancements
-
- - Add several new functions to the C API for working with
- objects. These are wrappers around many of the methods in
- ``QPDFObjectHandle``. Their inclusion adds considerable new
- capability to the C API.
-
- - Add ``qpdf_register_progress_reporter`` to the C API,
- corresponding to ``QPDFWriter::registerProgressReporter``.
-
- - Performance Enhancements
-
- - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
- for writing, resulting in about an 8% improvement in write
- performance while allowing indirect objects to appear in
- ``/DecodeParms``.
-
- - When extracting pages, the :command:`qpdf` CLI
- only removes unreferenced resources from the pages that are
- being kept, resulting in a significant performance improvement
- when extracting small numbers of pages from large, complex
- documents.
-
- - Bug Fixes
-
- - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
- externalizing images referenced from form XObjects that
- appeared on the page.
-
- - ``QPDFObjectHandle::filterPageContents`` was broken for pages
- with multiple content streams.
-
- - Tweak zsh completion code to behave a little better with
- respect to path completion.
-
-10.0.4: November 21, 2020
- - Bug Fixes
-
- - Fix a handful of integer overflows. This includes cases found
- by fuzzing as well as having qpdf not do range checking on
- unused values in the xref stream.
-
-10.0.3: October 31, 2020
- - Bug Fixes
-
- - The fix to the bug involving copying streams with indirect
- filters was incorrect and introduced a new, more serious bug.
- The original bug has been fixed correctly, as has the bug
- introduced in 10.0.2.
-
-10.0.2: October 27, 2020
- - Bug Fixes
-
- - When concatenating content streams, as with
- :samp:`--coalesce-contents`, there were cases
- in which qpdf would merge two lexical tokens together, creating
- invalid results. A newline is now inserted between merged
- content streams if one is not already present.
-
- - Fix an internal error that could occur when copying foreign
- streams whose stream data had been replaced using a stream data
- provider if those streams had indirect filters or decode
- parameters. This is a rare corner case.
-
- - Ensure that the caller's locale settings do not change the
- results of numeric conversions performed internally by the qpdf
- library. Note that the problem here could only be caused when
- the qpdf library was used programmatically. Using the qpdf CLI
- already ignored the user's locale for numeric conversion.
-
- - Fix several instances in which warnings were not suppressed in
- spite of :samp:`--no-warn` and/or errors or
- warnings were written to standard output rather than standard
- error.
-
- - Fixed a memory leak that could occur under specific
- circumstances when
- :samp:`--object-streams=generate` was used.
-
- - Fix various integer overflows and similar conditions found by
- the OSS-Fuzz project.
-
- - Enhancements
-
- - New option :samp:`--warning-exit-0` causes qpdf
- to exit with a status of ``0`` rather than ``3`` if there are
- warnings but no errors. Combine with
- :samp:`--no-warn` to completely ignore
- warnings.
-
- - Performance improvements have been made to
- ``QPDF::processMemoryFile``.
-
- - The OpenSSL crypto provider produces more detailed error
- messages.
-
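The effect of ``--warning-exit-0`` on the process exit status, as described in the 10.0.2 enhancements, can be summarized with the small sketch below. The function ``exit_status`` is hypothetical and exists only for illustration; the values ``0`` and ``3`` come from the text above, and the value ``2`` for errors is assumed from qpdf's usual exit-code convention.

```cpp
// Hypothetical sketch (not qpdf's code) of how the exit status is chosen.
//   2: errors occurred (assumed from qpdf's usual convention)
//   3: warnings but no errors
//   0: clean run, or warnings only when --warning-exit-0 is given
int exit_status(bool had_errors, bool had_warnings, bool warning_exit_0)
{
    if (had_errors) {
        return 2; // errors always win; the option only affects warnings
    }
    if (had_warnings && !warning_exit_0) {
        return 3;
    }
    return 0;
}
```

Note that the option changes only the status code; suppressing the warning messages themselves is what ``--no-warn`` is for.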
- - Build Changes
-
- - The option :samp:`--disable-rpath` is now
- supported by qpdf's :command:`./configure`
- script. Some distributions' packaging standards recommended the
- use of this option.
-
- - Selection of a printf format string for ``long long`` has
- been moved from ``ifdefs`` to an autoconf
- test. If you are using your own build system, you will need to
- provide a value for ``LL_FMT`` in
- :file:`libqpdf/qpdf/qpdf-config.h`, which
- would typically be ``"%lld"`` or, for some Windows compilers,
- ``"%I64d"``.
-
- - Several improvements were made to build-time configuration of
- the OpenSSL crypto provider.
-
- - A nearly stand-alone Linux binary zip file is now included with
- the qpdf release. This is built on an older (but supported)
- Ubuntu LTS release, but would work on most reasonably recent
- Linux distributions. It contains only the executables and
- required shared libraries that would not be present on a
- minimal system. It can be used for including qpdf in a minimal
- environment, such as a docker container. The zip file is also
- known to work as a layer in AWS Lambda.
-
- - QPDF's automated build has been migrated from Azure Pipelines
- to GitHub Actions.
-
- - Windows-specific Changes
-
- - The Windows executables distributed with qpdf releases now use
- the OpenSSL crypto provider by default. The native crypto
- provider is also compiled in and can be selected at runtime
- with the ``QPDF_CRYPTO_PROVIDER`` environment variable.
-
- - Improvements have been made to how a cryptographic provider is
- obtained in the native Windows crypto implementation. However,
- this is mostly masked by OpenSSL being used by default.
-
-10.0.1: April 9, 2020
- - Bug Fixes
-
- - 10.0.0 introduced a bug in which calling
- ``QPDFObjectHandle::getStreamData`` on a stream that can't be
- filtered was returning the raw data instead of throwing an
- exception. This is now fixed.
-
- - Fix a bug that was preventing qpdf from linking with some
- versions of clang on some platforms.
-
- - Enhancements
-
- - Improve the :file:`pdf-invert-images`
- example to avoid having to load all the images into RAM at the
- same time.
-
-10.0.0: April 6, 2020
- - Performance Enhancements
-
- - The qpdf library and executable should run much faster in this
- version than in the last several releases. Several internal
- library optimizations have been made, and there has been
- improved behavior on page splitting as well. This version of
- qpdf should outperform any of the 8.x or 9.x versions.
-
- - Incompatible API (source-level) Changes (minor)
-
- - The ``QUtil::srandom`` method was removed. It didn't do
- anything unless insecure random numbers were compiled in, and
- they have been off by default for a long time. If you were
- calling it, just remove the call since it wasn't doing anything
- anyway.
-
- - Build/Packaging Changes
-
- - Add an ``openssl`` crypto provider, which is implemented with
- OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
- for this contribution. If you maintain qpdf for a distribution,
- pay special attention to make sure that you are including
- support for the crypto providers you want. Package maintainers
- will have to weigh the advantages of allowing users to pick a
- crypto provider at runtime against the disadvantages of adding
- more dependencies to qpdf.
-
- - Allow qpdf to be built on stripped-down systems whose C/C++
- libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
- qpdf's README.md for details. This should be very rare, but it
- is known to be helpful in some embedded environments.
-
- - CLI Enhancements
-
- - Add ``objectinfo`` key to the JSON output. This will be a place
- to put computed metadata or other information about PDF objects
- that are not immediately evident in other ways or that seem
- useful for some other reason. In this version, information is
- provided about each object indicating whether it is a stream
- and, if so, what its length and filters are. Without this, it
- was not possible to tell conclusively from the JSON output
- alone whether or not an object was a stream. Run
- :command:`qpdf --json-help` for details.
-
- - Add new option
- :samp:`--remove-unreferenced-resources` which
- takes ``auto``, ``yes``, or ``no`` as arguments. The new
- ``auto`` mode, which is the default, performs a fast heuristic
- over a PDF file when splitting pages to determine whether the
- expensive process of finding and removing unreferenced
- resources is likely to be of benefit. For most files, this new
- default will result in a significant performance improvement
- for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed
- discussion.
-
- - The :samp:`--preserve-unreferenced-resources` option
- is now just a synonym for
- :samp:`--remove-unreferenced-resources=no`.
-
- - If the ``QPDF_EXECUTABLE`` environment variable is set when
- invoking :command:`qpdf --bash-completion` or
- :command:`qpdf --zsh-completion`, the completion
- command that it outputs will refer to qpdf using the value of
- that variable rather than what :command:`qpdf`
- determines its executable path to be. This can be useful when
- wrapping :command:`qpdf` with a script, working
- with a version in the source tree, using an AppImage, or other
- situations where there is some indirection.
-
- - Library Enhancements
-
- - Random number generation is now delegated to the crypto
- provider. The old behavior is still used by the native crypto
- provider. It is still possible to provide your own random
- number generator.
-
- - Add a new version of
- ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
- that accepts the ``suppress_warnings`` and ``will_retry``
- options and allows a success code to be returned. This makes it
- possible to implement a ``StreamDataProvider`` that calls
- ``pipeStreamData`` on another stream and to pass the response
- back to the caller, which enables better error handling on
- those proxied streams.
-
- - Update ``QPDFObjectHandle::pipeStreamData`` to return an
- overall success code that goes beyond whether or not filtered
- data was written successfully. This allows better error
- handling of cases that were not filtering errors. You have to
- call this explicitly. Methods in previously existing APIs have
- the same semantics as before.
-
- - The ``QPDFPageObjectHelper::placeFormXObject`` method now
- allows separate control over whether it should be willing to
- shrink or expand objects to fit them better into the
- destination rectangle. The previous behavior was that shrinking
- was allowed but expansion was not. The previous behavior is
- still the default.
-
- - When calling the C API, any non-zero value passed to a boolean
- parameter is treated as ``TRUE``. Previously only the value
- ``1`` was accepted. This makes the C API behave more like most
- C interfaces and is known to improve compatibility with some
- Windows environments that dynamically load the DLL and call
- functions from it.
-
- - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
- top-level dictionary keys or array items. This is unsafe
- because it creates a situation in which changing a lower-level
- item in one object may also change it in another object, but
- for cases in which you *know* you are only inserting or
- replacing top-level items, it is much faster than
- ``QPDFObjectHandle::shallowCopy``.
-
- - Add ``QPDFObjectHandle::filterAsContents``, which filters a
- stream's data as a content stream. This is useful for parsing
- the contents for form XObjects in the same way as parsing page
- content streams.
-
- - Bug Fixes
-
- - When detecting and removing unreferenced resources during page
- splitting, traverse into form XObjects and handle their
- resources dictionaries as well.
-
- - The same error recovery is applied to streams in other than the
- primary input file when merging or splitting pages.
-
-9.1.1: January 26, 2020
- - Build/Packaging Changes
-
- - The fix-qdf program was converted from perl to C++. As such,
- qpdf no longer has a runtime dependency on perl.
-
- - Library Enhancements
-
- - Added new helper routine ``QUtil::call_main_from_wmain`` which
- converts ``wchar_t`` arguments to UTF-8 encoded strings. This
- is useful for qpdf because library methods expect file names to
- be UTF-8 encoded, even on Windows.
-
- - Added new ``QUtil::read_lines_from_file`` methods that take
- ``FILE*`` arguments and that allow preservation of end-of-line
- characters. This also fixes a bug where
- ``QUtil::read_lines_from_file`` wouldn't work properly with
- Unicode filenames.
-
- - CLI Enhancements
-
- - Added options :samp:`--is-encrypted` and
- :samp:`--requires-password` for testing whether
- a file is encrypted or requires a password other than the
- supplied (or empty) password. These communicate via exit
- status, making them useful for shell scripts. They also work on
- encrypted files with unknown passwords.
-
- - Added ``encrypt`` key to JSON options. With the exception of
- the reconstructed user password for older encryption formats,
- this provides the same information as
- :samp:`--show-encryption` but in a consistent,
- parseable format. See output of :command:`qpdf
- --json-help` for details.
-
- - Bug Fixes
-
- - In QDF mode, be sure not to write more than one XRef stream to
- a file, even when
- :samp:`--preserve-unreferenced` is used.
- :command:`fix-qdf` assumes that there is only
- one XRef stream, and that it appears at the end of the file.
-
- - When externalizing inline images, properly handle images whose
- color space is a reference to an object in the page's resource
- dictionary.
-
- - Windows-specific fix for acquiring crypt context with a new
- keyset.
-
-9.1.0: November 17, 2019
- - Build Changes
-
- - A C++-11 compiler is now required to build qpdf.
-
- - A new crypto provider that uses gnutls for crypto functions is
- now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto
- providers and :ref:`ref.crypto.build` for specific information about
- the build.
-
- - Library Enhancements
-
- - Incorporate contribution from Masamichi Hosoda to properly
- handle signature dictionaries by not including them in object
- streams, formatting the ``/Contents`` key as a hexadecimal
- string, and excluding the ``/Contents`` key from encryption and
- decryption.
-
- - Incorporate contribution from Masamichi Hosoda to provide new
- API calls for getting file-level information about input and
- output files, enabling certain operations on the files at the
- file level rather than the object level. New methods include
- ``QPDF::getXRefTable()``,
- ``QPDFObjectHandle::getParsedOffset()``,
- ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
- ``QPDFWriter::getWrittenXRefTable()``.
-
- - Support build-time and runtime selectable crypto providers.
- This includes the addition of new classes
- ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
- recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
- variable. Crypto providers are described in depth in :ref:`ref.crypto`.
-
- - CLI Enhancements
-
- - Addition of the :samp:`--show-crypto` option in
- support of selectable crypto providers, as described in :ref:`ref.crypto`.
-
- - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
- for specification of the even or odd pages from among the pages
- specified in the range.
-
- - Fix shell wildcard expansion behavior (``*`` and ``?``) of the
- :command:`qpdf.exe` as built by MSVC.
-
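The ``:even`` and ``:odd`` range modifiers from the 9.1.0 CLI enhancements can be illustrated with the sketch below. It assumes that parity is judged by a page's position within the expanded range rather than by the page number itself, which is how "from among the pages specified in the range" is read here; ``select_parity`` and ``Parity`` are hypothetical names, not part of qpdf.

```cpp
#include <cstddef>
#include <vector>

enum class Parity { odd, even };

// Hypothetical sketch (not qpdf's implementation): keep the pages that sit
// in odd or even positions of an already expanded page range. Positions
// are counted from 1, so the first page in the range is "odd".
std::vector<int> select_parity(const std::vector<int>& pages, Parity parity)
{
    std::vector<int> out;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        const bool odd_position = (i % 2 == 0); // index 0 is position 1
        if (odd_position == (parity == Parity::odd)) {
            out.push_back(pages[i]);
        }
    }
    return out;
}
```

Under this assumption, applying ``:odd`` to a range that expands to pages 5, 7, 8, 9, 12 keeps pages 5, 8, and 12, even though 8 and 12 are even page numbers.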
-9.0.2: October 12, 2019
- - Bug Fix
-
- - Fix the name of the temporary file used by
- :samp:`--replace-input` so that it doesn't
- require path splitting and works with paths that include
- directories.
-
-9.0.1: September 20, 2019
- - Bug Fixes/Enhancements
-
- - Fix some build and test issues on big-endian systems and
- compilers with characters that are unsigned by default. The
- problems were in build and test only. There were no actual bugs
- in the qpdf library itself relating to endianness or unsigned
- characters.
-
- - When a dictionary has a duplicated key, report this with a
- warning. The behavior of the library in this case is unchanged,
- but the error condition is no longer silently ignored.
-
- - When a form field's display rectangle is erroneously specified
- with inverted coordinates, detect and correct this situation.
- This prevents some form fields from being flipped when flattening
- annotations on files with this condition.
-
-9.0.0: August 31, 2019
- - Incompatible API (source-level) Changes (minor)
-
- - The method ``QUtil::strcasecmp`` has been renamed to
- ``QUtil::str_compare_nocase``. This incompatible change is
- necessary to enable qpdf to build on platforms that define
- ``strcasecmp`` as a macro.
-
- - The ``QPDF::copyForeignObject`` method had an overloaded
- version that took a boolean parameter that was not used. If you
- were using this version, just omit the extra parameter.
-
- - There was a version of ``QPDFTokenizer::expectInlineImage`` that
- took no arguments. This version has been removed since it
- caused the tokenizer to return incorrect inline images. A new
- version was added some time ago that produces correct output.
- This is a very low level method that doesn't make sense to call
- outside of qpdf's lexical engine. There are higher level
- methods for tokenizing content streams.
-
- - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
- ``QPDFOutlineObjectHelper::getKids`` to return a
- ``std::vector`` instead of a ``std::list`` of
- ``QPDFOutlineObjectHelper`` objects.
-
- - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
- function would allow creation of name tokens whose value would
- change when unparsed, which is never the correct behavior.
-
- - CLI Enhancements
-
- - The :samp:`--replace-input` option may be given
- in place of an output file name. This causes qpdf to overwrite
- the input file with the output. See the description of
- :samp:`--replace-input` in :ref:`ref.basic-options` for more details.
-
- - The :samp:`--recompress-flate` option instructs
- :command:`qpdf` to recompress streams that are
- already compressed with ``/FlateDecode``. Useful with
- :samp:`--compression-level`.
-
- - The
- :samp:`--compression-level={level}`
- option sets the zlib compression level used for any streams compressed
- by ``/FlateDecode``. Most effective when combined with
- :samp:`--recompress-flate`.
-
- - Library Enhancements
-
- - A new namespace ``QIntC``, provided by
- :file:`qpdf/QIntC.hh`, provides safe
- conversion methods between different integer types. These
- conversion methods do range checking to ensure that the cast
- can be performed with no loss of information. Every use of
- ``static_cast`` in the library was inspected to see if it could
- use one of these safe converters instead. See :ref:`ref.casting` for additional details.
-
- - Method ``QPDF::anyWarnings`` tells whether there have been any
- warnings without clearing the list of warnings.
-
- - Method ``QPDF::closeInputSource`` closes or otherwise releases
- the input source. This enables the input file to be deleted or
- renamed.
-
- - New methods have been added to ``QUtil`` for converting back
- and forth between strings and unsigned integers:
- ``uint_to_string``, ``uint_to_string_base``,
- ``string_to_uint``, and ``string_to_ull``.
-
- - New methods have been added to ``QPDFObjectHandle`` that return
- the value of ``Integer`` objects as ``int`` or ``unsigned int``
- with range checking and sensible fallback values, and a new
- method was added to return an unsigned value. This makes it
- easier to write code that is safe from unintentional data loss.
- Functions: ``getUIntValue``, ``getIntValueAsInt``,
- ``getUIntValueAsUInt``.
-
- - When parsing content streams with
- ``QPDFObjectHandle::ParserCallbacks``, in place of the method
- ``handleObject(QPDFObjectHandle)``, the developer may override
- ``handleObject(QPDFObjectHandle, size_t offset, size_t
- length)``. If this method is defined, it will
- be invoked with the object along with its offset and length
- within the overall contents being parsed. Intervening spaces
- and comments are not included in offset and length.
- Additionally, a new method ``contentSize(size_t)`` may be
- implemented. If present, it will be called prior to the first
- call to ``handleObject`` with the total size in bytes of the
- combined contents.
-
- - New methods ``QPDF::userPasswordMatched`` and
- ``QPDF::ownerPasswordMatched`` have been added to enable a
- caller to determine whether the supplied password was the user
- password, the owner password, or both. This information is also
- displayed by :command:`qpdf --show-encryption`
- and :command:`qpdf --check`.
-
- - Static method ``Pl_Flate::setCompressionLevel`` can be called
- to set the zlib compression level globally used by all
- instances of ``Pl_Flate`` in deflate mode.
-
- - The method ``QPDFWriter::setRecompressFlate`` can be called to
- tell ``QPDFWriter`` to uncompress and recompress streams
- already compressed with ``/FlateDecode``.
-
- - The underlying implementation of QPDF arrays has been enhanced
- to be much more memory efficient when dealing with arrays with
- lots of nulls. This enables qpdf to use drastically less memory
- for certain types of files.
-
- - When traversing the pages tree, if nodes are encountered with
- invalid types, the types are fixed, and a warning is issued.
-
- - A new helper method ``QUtil::read_file_into_memory`` was added.
-
- - All conditions previously reported by
- ``QPDF::checkLinearization()`` as errors are now presented as
- warnings.
-
- - Name tokens containing the ``#`` character not preceded by two
- hexadecimal digits, which is invalid in PDF 1.2 and above, are
- properly handled by the library: a warning is generated, and
- the name token is properly preserved, even if invalid, in the
- output. See :file:`ChangeLog` for a more
- complete description of this change.
-
- - Bug Fixes
-
- - A small handful of memory issues, assertion failures, and
- unhandled exceptions that could occur on badly mangled input
- files have been fixed. Most of these problems were found by
- Google's OSS-Fuzz project.
-
- - When :command:`qpdf --check` or
- :command:`qpdf --check-linearization` encounters
- a file with linearization warnings but not errors, it now
- properly exits with exit code 3 instead of 2.
-
- - The :samp:`--completion-bash` and
- :samp:`--completion-zsh` options now work
- properly when qpdf is invoked as an AppImage.
-
- - Calling ``QPDFWriter::set*EncryptionParameters`` on a
- ``QPDFWriter`` object whose output filename has not yet been
- set no longer produces a segmentation fault.
-
- - When reading encrypted files, follow the spec more closely
- regarding encryption key length. This allows qpdf to open
- encrypted files in most cases when they have invalid or missing
- ``/Length`` keys in the encryption dictionary.
-
- - Build Changes
-
- - On platforms that support it, qpdf now builds with
- :samp:`-fvisibility=hidden`. If you build qpdf
- with your own build system, this is now safe to use. This
- prevents methods that are not part of the public API from being
- exported by the shared library, and makes qpdf's ELF shared
- libraries (used on Linux, MacOS, and most other UNIX flavors)
- behave more like the Windows DLL. Since the DLL already behaves
- in much this way, it is unlikely that there are any methods
- that were accidentally not exported. However, with ELF shared
- libraries, typeinfo for some classes has to be explicitly
- exported. If there are problems in dynamically linked code
- catching exceptions or subclassing, this could be the reason.
- If you see this, please report a bug at
- https://github.com/qpdf/qpdf/issues/.
-
- - QPDF is now compiled with integer conversion and sign
- conversion warnings enabled. Numerous changes were made to the
- library to make this safe.
-
- - QPDF's :command:`make install` target explicitly
- specifies the mode to use when installing files instead of
- relying on the user's umask. It was previously doing this for some
- files but not others.
-
- - If :command:`pkg-config` is available, use it to
- locate :file:`libjpeg` and
- :file:`zlib` dependencies, falling back on
- old behavior if unsuccessful.
-
- - Other Notes
-
- - QPDF has been fully integrated into `Google's OSS-Fuzz
- project <https://github.com/google/oss-fuzz>`__. This project
- exercises code with randomly mutated inputs and is great for
- discovering hidden crashes and security issues.
- Several bugs found by oss-fuzz have already been fixed in qpdf.
-
-8.4.2: May 18, 2019
- This release has just one change: correction of a buffer overrun in
- the Windows code used to open files. Windows users should take this
- update. There are no code changes that affect non-Windows releases.
-
-8.4.1: April 27, 2019
- - Enhancements
-
- - When :command:`qpdf --version` is run, it will
- detect if the qpdf CLI was built with a different version of
- qpdf than the library, which may indicate a problem with the
- installation.
-
- - New option :samp:`--remove-page-labels` will
- remove page labels before generating output. This used to
- happen if you ran :command:`qpdf --empty --pages ..
- --`, but the behavior changed in qpdf 8.3.0. This
- option enables people who were relying on the old behavior to
- get it again.
-
- - New option
- :samp:`--keep-files-open-threshold={count}`
- can be used to override the number of files that qpdf will use to
- trigger the behavior of not keeping all files open when merging
- files. This may be necessary if your system allows fewer than
- the default value of 200 files to be open at the same time.
-
- - Bug Fixes
-
- - Handle Unicode characters in filenames on Windows. The changes
- to support Unicode on the CLI in Windows broke Unicode
- filenames for Windows.
-
- - Slightly tighten logic that determines whether an object is a
- page. This should resolve problems in some rare files where
- some non-page objects were passing qpdf's test for whether
- something was a page, thus causing them to be erroneously lost
- during page splitting operations.
-
- - Revert change that included preservation of outlines
- (bookmarks) in :samp:`--split-pages`. The way
- it was implemented in 8.3.0 and 8.4.0 caused a very significant
- degradation of performance for splitting certain files. A
- future release of qpdf may re-introduce the behavior in a more
- performant and also more correct fashion.
-
- - In JSON mode, add missing leading 0 to decimal values between
- -1 and 1 even if not present in the input. The JSON
- specification requires the leading 0. The PDF specification
- does not.
-
-8.4.0: February 1, 2019
- - Command-line Enhancements
-
- - *Non-compatible CLI change:* The qpdf command-line tool
- interprets passwords given at the command-line differently from
- previous releases when the passwords contain non-ASCII
- characters. In some cases, the behavior differs from previous
- releases. For a discussion of the current behavior, please see
- :ref:`ref.unicode-passwords`. The
- incompatibilities are as follows:
-
- - On Windows, qpdf now receives all command-line options as
- Unicode strings if it can figure out the appropriate
- compile/link options. This is enabled at least for MSVC and
- mingw builds. That means that if non-ASCII strings are
- passed to the qpdf CLI in Windows, qpdf will now correctly
- receive them. In the past, they would have either been
- encoded as Windows code page 1252 (also known as "Windows
- ANSI") or as something unintelligible. In almost all cases,
- qpdf is able to properly interpret Unicode arguments now,
- whereas in the past, it would almost never interpret them
- properly. The result is that non-ASCII passwords given to
- the qpdf CLI on Windows now have a much greater chance of
- creating PDF files that can be opened by a variety of
- readers. In the past, usually files encrypted from the
- Windows CLI using non-ASCII passwords would not be readable
- by most viewers. Note that the current version of qpdf is
- able to decrypt files that it previously created using the
- previously supplied password.
-
- - The PDF specification requires passwords to be encoded as
- UTF-8 for 256-bit encryption and with PDF Doc encoding for
- 40-bit or 128-bit encryption. Older versions of qpdf left it
- up to the user to provide passwords with the correct
- encoding. The qpdf CLI now detects when a password is given
- with UTF-8 encoding and automatically transcodes it to what
- the PDF spec requires. While this is almost always the
- correct behavior, it is possible to override the behavior if
- there is some reason to do so. This is discussed in more
- depth in :ref:`ref.unicode-passwords`.
-
- - New options
- :samp:`--externalize-inline-images`,
- :samp:`--ii-min-bytes`, and
- :samp:`--keep-inline-images` control qpdf's
- handling of inline images and possible conversion of them to
- regular images. By default,
- :samp:`--optimize-images` now also applies to
- inline images. These options are discussed in :ref:`ref.advanced-transformation`.
-
- - Add options :samp:`--overlay` and
- :samp:`--underlay` for overlaying or
- underlaying pages of other files onto output pages. See
- :ref:`ref.overlay-underlay` for
- details.
-
- - When opening an encrypted file with a password, if the
- specified password doesn't work and the password contains any
- non-ASCII characters, qpdf will try a number of alternative
- passwords to try to compensate for possible character encoding
- errors. This behavior can be suppressed with the
- :samp:`--suppress-password-recovery` option.
- See :ref:`ref.unicode-passwords` for a full
- discussion.
-
- - Add the :samp:`--password-mode` option to
- fine-tune how qpdf interprets password arguments, especially
- when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.
-
- - In the :samp:`--pages` option, it is now
- possible to copy the same page more than once from the same
- file without using the previous workaround of specifying two
- different paths to the same file.
-
- - In the :samp:`--pages` option, allow use of "."
- as a shortcut for the primary input file. That way, you can do
- :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`
- instead of having to repeat :file:`in.pdf`
- in the command.
-
- - When encrypting with 128-bit and 256-bit encryption, new
- encryption options :samp:`--assemble`,
- :samp:`--annotate`,
- :samp:`--form`, and
- :samp:`--modify-other` allow finer-grained
- control over permissions. Before, the
- :samp:`--modify` option only configured certain
- predefined groups of permissions.
-
- - Bug Fixes and Enhancements
-
- - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
- 8.3.0 had a bug that could cause page splitting and merging
- operations to drop some font or image resources if the PDF
- file's internal structure shared these resource lists across
- pages and if some but not all of the pages in the output did
- not reference all the fonts and images. Using the
- :samp:`--preserve-unreferenced-resources`
- option would work around the incorrect behavior. This bug was
- the result of a typo in the code and a deficiency in the test
- suite. The case that triggered the error was known, just not
- handled properly. This case is now exercised in qpdf's test
- suite and properly handled.
-
- - When optimizing images, detect and refuse to optimize images
- that can't be converted to JPEG because of bit depth or color
- space.
-
- - Linearization and page manipulation APIs now detect and recover
- from files that have duplicate Page objects in the pages tree.
-
- - When using the older option
- :samp:`--stream-data=compress` with object streams, the object
- streams and xref streams were not compressed. This is now fixed.
-
- - When the tokenizer returns inline image tokens, delimiters
- following ``ID`` and ``EI`` operators are no longer excluded.
- This makes it possible to reliably extract the actual image
- data.
-
- - Library Enhancements
-
- - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
- convert inline images to regular images.
-
- - Add method ``QUtil::possible_repaired_encodings()`` to generate
- a list of strings that represent other ways the given string
- could have been encoded. This is the method the QPDF CLI uses
- to generate the strings it tries when recovering incorrectly
- encoded Unicode passwords.
-
- - Add new versions of
- ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
- more granular setting of permissions bits. See
- :file:`QPDFWriter.hh` for details.
-
- - Add new versions of the transcoders from UTF-8 to single-byte
- coding systems in ``QUtil`` that report success or failure
- rather than just substituting a specified unknown character.
-
- - Add method ``QUtil::analyze_encoding()`` to determine whether a
- string has high-bit characters and whether it appears to be UTF-16 or
- valid UTF-8 encoding.
-
- - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
- create a new page that is a "shallow copy" of a page. The
- resulting object is an indirect object ready to be passed to
- ``QPDFPageDocumentHelper::addPage()`` for either the original
- ``QPDF`` object or a different one. This is what the
- :command:`qpdf` command-line tool uses to copy
- the same page multiple times from the same file during
- splitting and merging operations.
-
- - Add method ``QPDF::getUniqueId()``, which returns a unique
- identifier for the given QPDF object. The identifier will be
- unique across the life of the application. The returned value
- can be safely used as a map key.
-
- - Add method ``QPDF::setImmediateCopyFrom``. This further
- enhances qpdf's ability to allow a ``QPDF`` object from which
- objects are being copied to go out of scope before the
- destination object is written. If you call this method on a
- ``QPDF`` instance, objects copied *from* this instance will be
- copied immediately instead of lazily. This option uses more
- memory but allows the source object to go out of scope before
- the destination object is written in all cases. See comments in
- :file:`QPDF.hh` for details.
-
- - Add method ``QPDFPageObjectHelper::getAttribute`` for
- retrieving an attribute from the page dictionary taking
- inheritance into consideration, and optionally making a copy if
- your intention is to modify the attribute.
-
- - Fix long-standing limitation of
- ``QPDFPageObjectHelper::getPageImages`` so that it now properly
- reports images from inherited resources dictionaries,
- eliminating the need to call
- ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
- this case.
-
- - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
- finding an unused name in a resource dictionary.
-
- - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
- generating a form XObject equivalent to a page. The resulting
- object can be used in the same file or copied to another file
- with ``copyForeignObject``. This can be useful for implementing
- underlay, overlay, n-up, thumbnails, or any other functionality
- requiring replication of pages in other contexts.
-
- - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
- generating content stream text that places a given form XObject
- on a page, centered and fit within a specified rectangle. This
- method takes care of computing the proper transformation matrix
- and may optionally compensate for rotation or scaling of the
- destination page.
-
- - Build Improvements
-
- - Add new configure option
- :samp:`--enable-avoid-windows-handle`, which
- causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
- defined. When defined, qpdf will avoid referencing the Windows
- ``HANDLE`` type, which is disallowed with certain versions of
- the Windows SDK.
-
- - For Windows builds, attempt to determine what options, if any,
- have to be passed to the compiler and linker to enable use of
- ``wmain``. This causes the preprocessor symbol
- ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
- other compilers, you can define this symbol to cause ``wmain``
- to be used. This is needed to allow the Windows
- :command:`qpdf` command to receive Unicode
- command-line options.
-
-8.3.0: January 7, 2019
- - Command-line Enhancements
-
- - Shell completion: you can now use eval :command:`$(qpdf
- --completion-bash)` and eval :command:`$(qpdf
- --completion-zsh)` to enable shell completion for
- bash and zsh.
-
- - Page numbers (also known as page labels) are now preserved when
- merging and splitting files with the
- :samp:`--pages` and
- :samp:`--split-pages` options.
-
- - Bookmarks are partially preserved when splitting pages with the
- :samp:`--split-pages` option. Specifically, the
- outlines dictionary and some supporting metadata are copied
- into the split files. The result is that all bookmarks from the
- original file appear, those that point to pages that are
- preserved work, and those that point to pages that are not
- preserved don't do anything. This is an interim step toward
- proper support for bookmarks in splitting and merging
- operations.
-
- - Page collation: add new option
- :samp:`--collate`. When specified, the
- semantics of :samp:`--pages` change from
- concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.
-
- - Generation of information in JSON format, primarily to
- facilitate use of qpdf from languages other than C++. Add new
- options :samp:`--json`,
- :samp:`--json-key`, and
- :samp:`--json-object` to generate a JSON
- representation of the PDF file. Run :command:`qpdf
- --json-help` to get a description of the JSON
- format. For more information, see :ref:`ref.json`.
-
- - The :samp:`--generate-appearances` flag will
- cause qpdf to generate appearances for form fields if the PDF
- file indicates that form field appearances are out of date.
- This can happen when PDF forms are filled in by a program that
- doesn't know how to regenerate the appearances of the filled-in
- fields.
-
- - The :samp:`--flatten-annotations` flag can be
- used to *flatten* annotations, including form fields.
- Ordinarily, annotations are drawn separately from the page.
- Flattening annotations is the process of combining their
- appearances into the page's contents. You might want to do this
- if you are going to rotate or combine pages using a tool that
- doesn't understand annotations. You may also want to use
- :samp:`--generate-appearances` when using this
- flag since annotations for outdated form fields are not
- flattened as that would cause loss of information.
-
- - The :samp:`--optimize-images` flag tells qpdf
- to recompress every image using DCT (JPEG) compression as
- long as the image is not already compressed with lossy
- compression and recompressing the image reduces its size. The
- additional options :samp:`--oi-min-width`,
- :samp:`--oi-min-height`, and
- :samp:`--oi-min-area` prevent recompression of
- images whose width, height, or pixel area (width × height) are
- below a specified threshold.
-
- - The :samp:`--show-object` option can now be
- given as :samp:`--show-object=trailer` to show
- the trailer dictionary.
-
- - Bug Fixes and Enhancements
-
- - QPDF now automatically detects and recovers from dangling
- references. If a PDF file contained an indirect reference to a
- non-existent object, which is valid, when adding a new object
- to the file, it was possible for the new object to take the
- object ID of the dangling reference, thereby causing the
- dangling reference to point to the new object. This case is now
- prevented.
-
- - Fixes to form field setting code: strings are always written in
- UTF-16 format, and checkboxes and radio buttons are handled
- properly with respect to synchronization of values and
- appearance states.
-
- - The ``QPDF::checkLinearization()`` method no longer causes the program
- to crash when it detects problems with linearization data.
- Instead, it issues a normal warning or error.
-
- - Ordinarily qpdf treats an argument of the form
- :samp:`@file` to mean that command-line options
- should be read from :file:`file`. Now, if
- :file:`file` does not exist but
- :file:`@file` does, qpdf will treat
- :file:`@file` as a regular option. This
- makes it possible to work more easily with PDF files whose
- names happen to start with the ``@`` character.
-
- - Library Enhancements
-
- - Remove the restriction in most cases that the source QPDF
- object used in a ``QPDF::copyForeignObject`` call has to stick
- around until the destination QPDF is written. The exceptional
- case is when the source stream gets its data using a
- ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth
- discussion, see comments around ``copyForeignObject`` in
- :file:`QPDF.hh`.
-
- - Add new method ``QPDFWriter::getFinalVersion()``, which returns
- the PDF version that will ultimately be written to the final
- file. See comments in :file:`QPDFWriter.hh`
- for some restrictions on its use.
-
- - Add several methods for transcoding strings to some of the
- character sets used in PDF files: ``QUtil::utf8_to_ascii``,
- ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
- ``QUtil::utf8_to_utf16``. For the single-byte encodings that
- support only limited character sets, these methods replace
- unsupported characters with a specified substitute.
-
- - Add new methods to ``QPDFAnnotationObjectHelper`` and
- ``QPDFFormFieldObjectHelper`` for querying flags and
- interpretation of different field types. Define constants in
- :file:`qpdf/Constants.h` to help with
- interpretation of flag values.
-
- - Add new methods
- ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
- ``QPDFFormFieldObjectHelper::generateAppearance`` for
- generating appearance streams. See discussion in
- :file:`QPDFFormFieldObjectHelper.hh` for
- limitations.
-
- - Add two new helper functions for dealing with resource
- dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
- a list of all second-level keys, which correspond to the names
- of resources, and ``QPDFObjectHandle::mergeResources()`` merges
- two resources dictionaries as long as they have non-conflicting
- keys. These methods are useful for certain types of objects
- that resolve resources from multiple places, such as form
- fields.
-
- - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
- and
- ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
- for handling low-level details of annotation flattening.
-
- - Add new helper classes: ``QPDFOutlineDocumentHelper``,
- ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
- ``QPDFNameTreeObjectHelper``, and
- ``QPDFNumberTreeObjectHelper``.
-
- - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
- representation of the object. Call ``serialize()`` on the
- result to convert it to a string.
-
- - Add a simple JSON serializer. This is not a complete or
- general-purpose JSON library. It allows assembly and
- serialization of JSON structures with some restrictions, which
- are described in the header file. This is the serializer used
- by qpdf's new JSON representation.
-
- - Add new ``QPDFObjectHandle::Matrix`` class along with a few
- convenience methods for dealing with six-element numerical
- arrays as matrices.
-
- - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
- the object itself if it is an array, or an array containing the
- object otherwise. This is a common construct in PDF. This
- method prevents you from having to explicitly test whether
- something is a single element or an array.
-
- - Build Improvements
-
- - It is no longer necessary to run
- :command:`autogen.sh` to build from a pristine
- checkout. Automatically generated files are now committed so
- that it is possible to build on platforms without autoconf
- directly from a clean checkout of the repository. The
- :command:`configure` script detects if the files
- are out of date when it also determines that the tools are
- present to regenerate them.
-
- - Pull requests and the master branch are now built automatically
- in `Azure
- Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
- free for open source projects. The build includes Linux, mac,
- Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
- build. Official qpdf releases are now built with Azure
- Pipelines.
-
- - Notes for Packagers
-
- - A new section has been added to the documentation with notes
- for packagers. Please see :ref:`ref.packaging`.
-
- - The qpdf build detects out-of-date automatically generated files. If
- your packaging system automatically refreshes libtool or
- autoconf files, it could cause this check to fail. To avoid
- this problem, pass
- :samp:`--disable-check-autofiles` to
- :command:`configure`.
-
- - If you would like to have qpdf completion enabled
- automatically, you can install completion files in the
- distribution's default location. You can find sample completion
- files to install in the :file:`completions`
- directory.
-
-8.2.1: August 18, 2018
- - Command-line Enhancements
-
- - Add
- :samp:`--keep-files-open={[yn]}`
- to override default determination of whether to keep files open
- when merging. Please see the discussion of
- :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.
-
-8.2.0: August 16, 2018
- - Command-line Enhancements
-
- - Add :samp:`--no-warn` option to suppress
- issuing warning messages. If there are any conditions that
- would have caused warnings to be issued, the exit status is
- still 3.
-
- - Bug Fixes and Optimizations
-
- - Performance fix: optimize page merging operation to avoid
- unnecessary open/close calls on files being merged. This solves
- a dramatic slow-down that was observed when merging certain
- types of files.
-
- - Optimize how memory was used for the TIFF predictor,
- drastically improving performance and memory usage for files
- containing high-resolution images compressed with Flate using
- the TIFF predictor.
-
- - Bug fix: end of line characters were not properly handled
- inside strings in some cases.
-
- - Bug fix: using :samp:`--progress` on very small
- files could cause an infinite loop.
-
- - API enhancements
-
- - Add new class ``QPDFSystemError``, derived from
- ``std::runtime_error``, which is now thrown by
- ``QUtil::throw_system_error``. This enables the triggering
- ``errno`` value to be retrieved.
-
- - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
- ``ClosedFileInputSource`` to stay open during manually
- indicated periods of high activity, thus reducing the overhead
- of frequent open/close operations.
-
- - Build Changes
-
- - For the mingw builds, change the name of the DLL import library
- from :file:`libqpdf.a` to
- :file:`libqpdf.dll.a` to more accurately
- reflect that it is an import library rather than a static
- library. This potentially clears the way for supporting a
- static library in the future, though presently, the qpdf
- Windows build only builds the DLL and executables.
-
-8.1.0: June 23, 2018
- - Usability Improvements
-
- - When splitting files, qpdf detects fonts and images that the
- document metadata claims are referenced from a page but are not
- actually referenced and omits them from the output file. This
- change can cause a significant reduction in the size of split
- PDF files for files created by some software packages. In some
- cases, it can also make page splitting slower. Prior versions
- of qpdf would believe the document metadata and sometimes
- include all the images from all the other pages even though the
- pages were no longer present. In the unlikely event that the
- old behavior should be desired, or if you have a case where
- page splitting is very slow, the old behavior (and speed) can
- be enabled by specifying
- :samp:`--preserve-unreferenced-resources`. For
- additional details, please see :ref:`ref.advanced-transformation`.
-
- - When merging multiple PDF files, qpdf no longer leaves all the
- files open. This makes it possible to merge numbers of files
- that may exceed the operating system's limit for the maximum
- number of open files.
-
- - The :samp:`--rotate` option's syntax has been
- extended to make the page range optional. If you specify
- :samp:`--rotate={angle}`
- without specifying a page range, the rotation will be applied
- to all pages. This can be especially useful for adjusting a PDF
- created from a multi-page document that was scanned upside
- down.
-
- - When merging multiple files, the
- :samp:`--verbose` option now prints information
- about each file as it operates on that file.
-
- - When the :samp:`--progress` option is
- specified, qpdf will print a running indicator of its best
- guess at how far through the writing process it is. Note that,
- as with all progress meters, it's an approximation. This option
- is implemented in a way that makes it useful for software that
- uses the qpdf library; see API Enhancements below.
-
- - Bug Fixes
-
- - Properly decrypt files that use revision 3 of the standard
- security handler but use 40 bit keys (even though revision 3
- supports 128-bit keys).
-
- - Limit depth of nested data structures to prevent crashes from
- certain types of malformed (malicious) PDFs.
-
- - In "newline before endstream" mode, insert the required extra
- newline before the ``endstream`` at the end of object streams.
- This one case was previously omitted.
-
- - API Enhancements
-
- - The first round of higher level "helper" interfaces has been
- introduced. These are designed to provide a more convenient way
- of interacting with certain document features than using
- ``QPDFObjectHandle`` directly. For details on helpers, see
- :ref:`ref.helper-classes`. Specific additional
- interfaces are described below.
-
- - Add two new document helper classes: ``QPDFPageDocumentHelper``
- for working with pages, and ``QPDFAcroFormDocumentHelper`` for
- working with interactive forms. No old methods have been
- removed, but ``QPDFPageDocumentHelper`` is now the preferred
- way to perform operations on pages rather than calling the old
- methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
- in the header files direct you to the new interfaces. Please
- see the header files and :file:`ChangeLog`
- for additional details.
-
- - Add three new object helper classes: ``QPDFPageObjectHelper`` for
- pages, ``QPDFFormFieldObjectHelper`` for interactive form
- fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
- three classes are fairly sparse at the moment, but they have
- some useful, basic functionality.
-
- - A new example program
- :file:`examples/pdf-set-form-values.cc` has
- been added that illustrates use of the new document and object
- helpers.
-
- - The method ``QPDFWriter::registerProgressReporter`` has been
- added. This method allows you to register a function that is
- called by ``QPDFWriter`` to report its estimate of the percentage
- of the output it has written so far. Client programs can
- use this to implement reasonably accurate progress meters. The
- :command:`qpdf` command line tool uses this to
- implement its :samp:`--progress` option.
-
- - New methods ``QPDFObjectHandle::newUnicodeString`` and
- ``QPDFObject::unparseBinary`` have been added to allow for more
- convenient creation of strings that are explicitly encoded
- using big-endian UTF-16. This is useful for creating strings
- that appear outside of content streams, such as labels, form
- fields, outlines, document metadata, etc.
-
- - A new class ``QPDFObjectHandle::Rectangle`` has been added to
- ease working with PDF rectangles, which are just arrays of four
- numeric values.
-
-8.0.2: March 6, 2018
- - When a loop is detected while following cross reference streams or
- tables, treat this as damage instead of silently ignoring the
- previous table. This prevents loss of otherwise recoverable data
- in some damaged files.
-
- - Properly handle pages with no contents.
-
-8.0.1: March 4, 2018
- - Disregard data check errors when uncompressing ``/FlateDecode``
- streams. This is consistent with most other PDF readers and allows
- qpdf to recover data from another class of malformed PDF files.
-
- - On the command line when specifying page ranges, support preceding
- a page number by "r" to indicate that it should be counted from
- the end. For example, the range ``r3-r1`` would indicate the last
- three pages of a document.
-
-8.0.0: February 25, 2018
- - Packaging and Distribution Changes
-
- - QPDF is now distributed as an
- `AppImage <https://appimage.org/>`__ in addition to all the
- other ways it is distributed. The AppImage can be found in the
- download area with the other packages. Thanks to Kurt Pfeifle
- and Simon Peter for their contributions.
-
- - Bug Fixes
-
- - ``QPDFObjectHandle::getUTF8Val`` now properly treats
- non-Unicode strings as encoded with PDF Doc Encoding.
-
- - Improvements to handling of objects in PDF files that are not
- of the expected type. In most cases, qpdf will be able to warn
- for such cases rather than fail with an exception. Previous
- versions of qpdf would sometimes fail with errors such as
- "operation for dictionary object attempted on object of wrong
- type". This situation should be mostly or entirely eliminated
- now.
-
- - Enhancements to the :command:`qpdf` Command-line
- Tool. All new options listed here are documented in more detail in
- :ref:`ref.using`.
-
- - The option
- :samp:`--linearize-pass1={file}`
- has been added for debugging qpdf's linearization code.
-
- - The option :samp:`--coalesce-contents` can be
- used to combine content streams of a page whose contents are an
- array of streams into a single stream.
-
- - API Enhancements. All new API calls are documented in their
- respective classes' header files. There are no non-compatible
- changes to the API.
-
- - Add function ``qpdf_check_pdf`` to the C API. This function
- does basic checking that is a subset of what :command:`qpdf
- --check` performs.
-
- - Major enhancements to the lexical layer of qpdf. For a complete
- list of enhancements, please refer to the
- :file:`ChangeLog` file. Most of the changes
- result in improvements to qpdf's ability to handle erroneous
- files. It is also possible for programs to handle whitespace,
- comments, and inline images as tokens.
-
- - New API for working with PDF content streams at a lexical
- level. The new class ``QPDFObjectHandle::TokenFilter`` allows
- the developer to provide token handlers. Token filters can be
- used with several different methods in ``QPDFObjectHandle`` as
- well as with a lower-level interface. See comments in
- :file:`QPDFObjectHandle.hh` as well as the
- new examples
- :file:`examples/pdf-filter-tokens.cc` and
- :file:`examples/pdf-count-strings.cc` for
- details.
-
-7.1.1: February 4, 2018
- - Bug fix: files whose /ID fields were other than 16 bytes long can
- now be properly linearized.
-
- - A few compile and link issues have been corrected for some
- platforms.
-
-7.1.0: January 14, 2018
- - PDF files contain streams that may be compressed with various
- compression algorithms which, in some cases, may be enhanced by
- various predictor functions. Previously only the PNG up predictor
- was supported. In this version, all the PNG predictors as well as
- the TIFF predictor are supported. This increases the range of
- files that qpdf is able to handle.
-
- - QPDF now allows a raw encryption key to be specified in place of a
- password when opening encrypted files, and will optionally display
- the encryption key used by a file. This is a non-standard
- operation, but it can be useful in certain situations. Please see
- the discussion of :samp:`--password-is-hex-key` in
- :ref:`ref.basic-options` or the comments around
- ``QPDF::setPasswordIsHexKey`` in
- :file:`QPDF.hh` for additional details.
-
- - Bug fix: numbers ending with a trailing decimal point are now
- properly recognized as numbers.
-
- - Bug fix: when building qpdf from source on some platforms
- (especially MacOS), the build could get confused by older versions
- of qpdf installed on the system. This has been corrected.
-
-7.0.0: September 15, 2017
- - Packaging and Distribution Changes
-
- - QPDF's primary license is now `version 2.0 of the Apache
- License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
- than version 2.0 of the Artistic License. You may still, at
- your option, consider qpdf to be licensed with version 2.0 of
- the Artistic License.
-
- - QPDF no longer has a dependency on the PCRE (Perl-Compatible
- Regular Expression) library. QPDF now has an added dependency
- on the JPEG library.
-
- - Bug Fixes
-
- - This release contains many bug fixes for various infinite
- loops, memory leaks, and other memory errors that could be
- encountered with specially crafted or otherwise erroneous PDF
- files.
-
- - New Features
-
- - QPDF now supports reading and writing streams encoded with JPEG
- or RunLength encoding. Library API enhancements and
- command-line options have been added to control this behavior.
- See command-line options
- :samp:`--compress-streams` and
- :samp:`--decode-level` and methods
- ``QPDFWriter::setCompressStreams`` and
- ``QPDFWriter::setDecodeLevel``.
-
- - QPDF is much better at recovering from broken files. In most
- cases, qpdf will skip invalid objects and will preserve broken
- stream data by not attempting to filter broken streams. QPDF is
- now able to recover or at least not crash on dozens of broken
- test files I have received over the past few years.
-
- - Page rotation is now supported and accessible from both the
- library and the command line.
-
- - ``QPDFWriter`` supports writing files in a way that preserves
- PCLm compliance in support of driverless printing. This is very
- specialized and is only useful to applications that already
- know how to create PCLm files.
-
- - Enhancements to the :command:`qpdf` Command-line
- Tool. All new options listed here are documented in more detail in
- :ref:`ref.using`.
-
- - Command-line arguments can now be read from files or standard
- input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.
-
- - :samp:`--rotate`: request page rotation
-
- - :samp:`--newline-before-endstream`: ensure that
- a newline appears before every ``endstream`` keyword in the
- file; used to prevent qpdf from breaking PDF/A compliance on
- already compliant files.
-
- - :samp:`--preserve-unreferenced`: preserve
- unreferenced objects in the input PDF
-
- - :samp:`--split-pages`: break output into chunks
- with fixed numbers of pages
-
- - :samp:`--verbose`: print the name of each
- output file that is created
-
- - :samp:`--compress-streams` and
- :samp:`--decode-level` replace
- :samp:`--stream-data` to provide finer-grained control
- over compression and decompression of stream data.
- The :samp:`--stream-data` option will remain
- available.
-
- - When running :command:`qpdf --check` with other
- options, checks are always run first. This enables qpdf to
- perform its full recovery logic before outputting other
- information. This can be especially useful when manually
- recovering broken files, looking at qpdf's regenerated cross
- reference table, or other similar operations.
-
- - Process :samp:`--pages` earlier so that other
- options like :samp:`--show-pages` or
- :samp:`--split-pages` can operate on the file
- after page splitting/merging has occurred.
-
- - API Changes. All new API calls are documented in their respective
- classes' header files.
-
- - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
- object
-
- - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
- appear before ``endstream``
-
- - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
- unreferenced objects that appear in the input PDF. The default
- behavior is to discard them.
-
- - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
- available for developers who wish to produce or consume
- RunLength or DCT stream data directly. The
- :file:`examples/pdf-create.cc` example
- illustrates their use.
-
- - ``QPDFWriter::setCompressStreams`` and
- ``QPDFWriter::setDecodeLevel`` methods control handling of
- different types of stream compression.
-
- - Add new C API functions ``qpdf_set_compress_streams``,
- ``qpdf_set_decode_level``,
- ``qpdf_set_preserve_unreferenced_objects``, and
- ``qpdf_set_newline_before_endstream`` corresponding to the new
- ``QPDFWriter`` methods.
-
-6.0.0: November 10, 2015
- - Implement :samp:`--deterministic-id` command-line
- option and ``QPDFWriter::setDeterministicID`` as well as C API
- function ``qpdf_set_deterministic_ID`` for generating a
- deterministic ID for non-encrypted files. When this option is
- selected, the ID of the file depends on the contents of the output
- file, and not on transient items such as the timestamp or output
- file name.
-
- - Make qpdf more tolerant of files whose xref table entries are not
- the correct length.
-
-5.1.3: May 24, 2015
- - Bug fix: fix-qdf was not properly handling files that contained
- object streams with more than 255 objects in them.
-
- - Bug fix: qpdf was not properly initializing Microsoft's secure
- crypto provider on fresh Windows installations that had not had
- any keys created yet.
-
- - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
- the Google Security Team. Please see the ChangeLog for details.
-
- - Properly handle pages that have no contents at all. There were
- many cases in which qpdf handled this fine, but a few methods
- blindly obtained page contents without handling the possibility that
- there were no contents.
-
- - Make qpdf more robust for a few more kinds of problems that may
- occur in invalid PDF files.
-
-5.1.2: June 7, 2014
- - Bug fix: linearizing files could create a corrupted output file
- under extremely unlikely file size circumstances. See ChangeLog
- for details. The odds of getting hit by this are very low, though
- one person did.
-
- - Bug fix: qpdf would fail to write files that had streams with
- decode parameters referencing other streams.
-
- - New example program: :command:`pdf-split-pages`:
- efficiently split PDF files into individual pages. The example
- program does this more efficiently than using :command:`qpdf
- --pages` to do it.
-
- - Packaging fix: Visual C++ binaries did not support Windows XP.
- This has been rectified by updating the compilers used to generate
- the release binaries.
-
-5.1.1: January 14, 2014
- - Performance fix: copying foreign objects could be very slow with
- certain types of files. This was most likely to be visible during
- page splitting and was due to traversing the same objects multiple
- times in some cases.
-
-5.1.0: December 17, 2013
- - Added runtime option (``QUtil::setRandomDataProvider``) to supply
- your own random data provider. You can use this if you want to
- avoid using the OS-provided secure random number generation
- facility or stdlib's less secure version. See comments in
- include/qpdf/QUtil.hh for details.
-
- - Fixed image comparison tests to not create 12-bit-per-pixel images
- since some versions of tiffcmp have bugs in comparing them in some
- cases. This increases the disk space required by the image
- comparison tests, which are off by default anyway.
-
- - Introduce a number of small fixes for compilation on the latest
- clang in MacOS and the latest Visual C++ in Windows.
-
- - Be able to handle broken files that end the xref table header with
- a space instead of a newline.
-
-5.0.1: October 18, 2013
- - Thanks to a detailed review by Florian Weimer and the Red Hat
- Product Security Team, this release includes a number of
- non-user-visible security hardening changes. Please see the
- ChangeLog file in the source distribution for the complete list.
-
- - When available, operating system-specific secure random number
- generation is used for generating initialization vectors and other
- random values used during encryption or file creation. For the
- Windows build, this results in an added dependency on Microsoft's
- cryptography API. To disable the OS-specific cryptography and use
- the old version, pass the
- :samp:`--enable-insecure-random` option to
- :command:`./configure`.
-
- - The :command:`qpdf` command-line tool now issues a
- warning when :samp:`--accessibility=n` is specified
- for newer encryption versions stating that the option is ignored.
- qpdf, per the spec, has always ignored this flag, but it
- previously did so silently. This warning is issued only by the
- command-line tool, not by the library. The library's handling of
- this flag is unchanged.
-
-5.0.0: July 10, 2013
- - Bug fix: previous versions of qpdf would lose objects with
- generation != 0 when generating object streams. Fixing this
- required changes to the public API.
-
- - Removed methods from public API that were only supposed to be
- called by QPDFWriter and couldn't realistically be called anywhere
- else. See ChangeLog for details.
-
- - New ``QPDFObjGen`` class added to represent an object
- ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
- preferred over ``QPDFObjectHandle::getObjectID()`` and
- ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
- for people to accidentally write code that ignores the generation
- number. See :file:`QPDF.hh` and
- :file:`QPDFObjectHandle.hh` for additional
- notes.
-
- - Add :samp:`--show-npages` command-line option to
- the :command:`qpdf` command to show the number of
- pages in a file.
-
- - Allow omission of the page range within
- :samp:`--pages` for the
- :command:`qpdf` command. When omitted, the page
- range is implicitly taken to be all the pages in the file.
-
- - Various enhancements were made to support different types of
- broken files or broken readers. Details can be found in
- :file:`ChangeLog`.
-
-4.1.0: April 14, 2013
- - Note to people including qpdf in distributions: the
- :file:`.la` files generated by libtool are now
- installed by qpdf's :command:`make install` target.
- Before, they were not installed. This means that if your
- distribution does not want to include
- :file:`.la` files, you must remove them as
- part of your packaging process.
-
- - Major enhancement: API enhancements have been made to support
- parsing of content streams. This enhancement includes the
- following changes:
-
- - ``QPDFObjectHandle::parseContentStream`` method parses objects
- in a content stream and calls handlers in a callback class. The
- example
- :file:`examples/pdf-parse-content.cc`
- illustrates how this may be used.
-
- - ``QPDFObjectHandle`` can now represent operators and inline
- images, object types that may only appear in content streams.
-
- - Method ``QPDFObjectHandle::getTypeCode()`` returns an
- enumerated type value representing the underlying object type.
- Method ``QPDFObjectHandle::getTypeName()`` returns a text
- string describing the name of the type of a
- ``QPDFObjectHandle`` object. These methods can be used for more
- efficient parsing and debugging/diagnostic messages.
-
- - :command:`qpdf --check` now parses all pages'
- content streams in addition to doing other checks. While there are
- still many types of errors that cannot be detected, syntactic
- errors in content streams will now be reported.
-
- - Minor compilation enhancements have been made to facilitate
- support for a broader range of compilers and compiler
- versions.
-
- - Warning flags have been moved into a separate variable in
- :file:`autoconf.mk`.
-
- - The configure flag :samp:`--enable-werror` works
- for Microsoft compilers.
-
- - All MSVC CRT security warnings have been resolved.
-
- - All C-style casts in C++ code have been replaced by C++ casts,
- and many casts that had been included to suppress higher
- warning levels for some compilers have been removed, primarily
- for clarity. Places where integer type coercion occurs have
- been scrutinized. A new casting policy has been documented in
- the manual. This is of concern mainly to people porting qpdf to
- new platforms or compilers. It is not visible to programmers
- writing code that uses the library.
-
- - Some internal limits have been removed in code that converts
- numbers to strings. This is largely invisible to users, but it
- does trigger a bug in some older versions of mingw-w64's C++
- library. See :file:`README-windows.md` in
- the source distribution if you think this may affect you. The
- copy of the DLL distributed with qpdf's binary distribution is
- not affected by this problem.
-
- - The RPM spec file previously included with qpdf has been removed.
- This is because virtually all Linux distributions include qpdf now
- that it is a dependency of CUPS filters.
-
- - A few bug fixes are included:
-
- - Overridden compressed objects are properly handled. Before,
- there were certain constructs that could cause qpdf to see old
- versions of some objects. The most usual manifestation of this
- was loss of filled in form values for certain files.
-
- - Installation no longer uses GNU/Linux-specific versions of some
- commands, so :command:`make install` works on
- Solaris with native tools.
-
- - The 64-bit mingw Windows binary package no longer includes a
- 32-bit DLL.
-
-4.0.1: January 17, 2013
- - Fix detection of binary attachments in test suite to avoid false
- test failures on some platforms.
-
- - Add clarifying comment in :file:`QPDF.hh` to
- methods that return the user password explaining that it is no
- longer possible with newer encryption formats to recover the user
- password knowing the owner password. In earlier encryption
- formats, the user password was encrypted in the file using the
- owner password. In newer encryption formats, a separate encryption
- key is used on the file, and that key is independently encrypted
- using both the user password and the owner password.
-
-4.0.0: December 31, 2012
- - Major enhancement: support has been added for newer encryption
- schemes supported by version X of Adobe Acrobat. This includes use
- of 127-character passwords, 256-bit encryption keys, and the
- encryption scheme specified in ISO 32000-2, the PDF 2.0
- specification. This scheme can be chosen from the command line by
- specifying use of 256-bit keys. qpdf also supports the deprecated
- encryption method used by Acrobat IX. This encryption style has
- known security weaknesses and should not be used in practice.
- However, such files exist "in the wild," so support for this
- scheme is still useful. New methods
- ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
- and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
- scheme) have been added to enable these new encryption schemes.
- Corresponding functions have been added to the C API as well.
-
- - Full support for Adobe extension levels in PDF version
- information. Starting with PDF version 1.7, corresponding to ISO
- 32000, Adobe adds new functionality by increasing the extension
- level rather than increasing the version. This support includes
- addition of the ``QPDF::getExtensionLevel`` method for retrieving
- the document's extension level, addition of versions of
- ``QPDFWriter::setMinimumPDFVersion`` and
- ``QPDFWriter::forcePDFVersion`` that accept an extension level,
- and extended syntax for specifying forced and minimum versions on
- the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions
- have been added to the C API as well.
-
- - Minor fixes to prevent qpdf from referencing objects in the file
- that are not referenced in the file's overall structure. Most
- files don't have any such objects, but some files contain
- unreferenced objects with errors, so these fixes prevent qpdf from
- needlessly rejecting or complaining about such objects.
-
- - Add new generalized methods for reading and writing files from/to
- programmer-defined sources. The method
- ``QPDF::processInputSource`` allows the programmer to use any
- input source for the input file, and
- ``QPDFWriter::setOutputPipeline`` allows the programmer to write
- the output file through any pipeline. These methods would make it
- possible to perform any number of specialized operations, such as
- accessing external storage systems, creating bindings for qpdf in
- other programming languages that have their own I/O systems, etc.
-
- - Add new method ``QPDF::getEncryptionKey`` for retrieving the
- underlying encryption key used in the file.
-
- - This release includes a small handful of non-compatible API
- changes. While effort is made to avoid such changes, all the
- non-compatible API changes in this version were to parts of the
- API that would likely never be used outside the library itself. In
- all cases, the altered methods or structures were either parts of
- the ``QPDF`` class that were public only so that ``QPDFWriter``
- could call them, or parts of validation code that was
- over-zealous in reporting problems in parts of the file that would
- not ordinarily be referenced. In no case did any of the removed
- methods do anything worse than falsely report error conditions in
- files that were broken in ways that didn't matter. The following
- public parts of the ``QPDF`` class were changed in a
- non-compatible way:
-
- - Updated nested ``QPDF::EncryptionData`` class to add fields
- needed by the newer encryption formats, member variables
- changed to private so that future changes will not require
- breaking backward compatibility.
-
- - Added additional parameters to ``compute_data_key``, which is
- used by ``QPDFWriter`` to compute the encryption key used to
- encrypt a specific object.
-
- - Removed the method ``flattenScalarReferences``. This method was
- previously used prior to writing a new PDF file, but it has the
- undesired side effect of causing qpdf to read objects in the
- file that were not referenced. Some otherwise valid files have
- unreferenced objects with errors in them, so this could cause
- qpdf to reject files that would be accepted by virtually all
- other PDF readers. In fact, qpdf relied on only a very small
- part of what flattenScalarReferences did, so only this part has
- been preserved, and it is now done directly inside
- ``QPDFWriter``.
-
- - Removed the method ``decodeStreams``. This method was used by
- the :samp:`--check` option of the
- :command:`qpdf` command-line tool to force all
- streams in the file to be decoded, but it also suffered from
- the problem of opening otherwise unreferenced streams and thus
- could report false positives. The
- :samp:`--check` option now causes qpdf to go
- through all the motions of writing a new file based on the
- original one, so it will always reference and check exactly
- those parts of a file that any ordinary viewer would check.
-
- - Removed the method ``trimTrailerForWrite``. This method was
- used by ``QPDFWriter`` to modify the original QPDF object by
- removing fields from the trailer dictionary that wouldn't apply
- to the newly written file. This functionality, though generally
- harmless, was a poor implementation and has been replaced by
- having QPDFWriter filter these out when copying the trailer
- rather than modifying the original QPDF object. (Note that qpdf
- never modifies the original file itself.)
-
- - Allow the PDF header to appear anywhere in the first 1024 bytes of
- the file. This is consistent with what other readers do.
-
- - Fix the :command:`pkg-config` files to list zlib
- and pcre in ``Requires.private`` to better support static linking
- using :command:`pkg-config`.
-
-3.0.2: September 6, 2012
- - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
- used with ``QPDFWriter::setStaticID``, which made it pretty much
- useless. This has been fixed.
-
- - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
- text near the header of the PDF file. The intended use case is to
- insert comments that may be consumed by a downstream application,
- though other use cases may exist.
-
-3.0.1: August 11, 2012
- - Version 3.0.0 included addition of files for
- :command:`pkg-config`, but this was not mentioned
- in the release notes. The release notes for 3.0.0 were updated to
- mention this.
-
- - Bug fix: if an object stream ended with a scalar object not
- followed by space, qpdf would incorrectly report that it
- encountered a premature EOF. This bug has been in qpdf since
- version 2.0.
-
-3.0.0: August 2, 2012
- - Acknowledgment: I would like to express gratitude for the
- contributions of Tobias Hoffmann toward the release of qpdf
- version 3.0. He is responsible for most of the implementation and
- design of the new API for manipulating pages, and contributed code
- and ideas for many of the improvements made in version 3.0.
- Without his work, this release would certainly not have happened
- as soon as it did, if at all.
-
- - *Non-compatible API changes:*
-
- - The method ``QPDFObjectHandle::replaceStreamData`` that uses a
- ``StreamDataProvider`` to provide the stream data no longer
- takes a ``length`` parameter. The parameter was removed since
- this provides the user an opportunity to simplify the calling
- code. This method was introduced in version 2.2. At the time,
- the ``length`` parameter was required in order to ensure that
- calls to the stream data provider returned the same length for a
- specific stream every time they were invoked. In particular, the
- linearization code depends on this. Instead, qpdf 3.0 and newer
- check for that constraint explicitly. The first time the stream
- data provider is called for a specific stream, the actual length
- is saved, and subsequent calls are required to return the same
- number of bytes. This means the calling code no longer has to
- compute the length in advance, which can be a significant
- simplification. If your code fails to compile because of the
- extra argument and you don't want to make other changes to your
- code, just omit the argument.
-
- - Many methods take ``long long`` instead of other integer types.
- Most if not all existing code should compile fine with this
- change since such parameters had always previously been smaller
- types. This change was required to support files larger than two
- gigabytes in size.
-
- - Support has been added for large files. The test suite verifies
- support for files larger than 4 gigabytes, and manual testing has
- verified support for files larger than 10 gigabytes. Large file
- support is available for both 32-bit and 64-bit platforms as long
- as the compiler and underlying platforms support it.
-
- - Support for page selection (splitting and merging PDF files) has
- been added to the :command:`qpdf` command-line
- tool. See :ref:`ref.page-selection`.
-
- - Options have been added to the :command:`qpdf`
- command-line tool for copying encryption parameters from another
- file. See :ref:`ref.basic-options`.
-
- - New methods have been added to the ``QPDF`` object for adding and
- removing pages. See :ref:`ref.adding-and-remove-pages`.
-
- - New methods have been added to the ``QPDF`` object for copying
- objects from other PDF files. See :ref:`ref.foreign-objects`.
-
- - A new method ``QPDFObjectHandle::parse`` has been added for
- constructing ``QPDFObjectHandle`` objects from a string
- description.
-
- - Methods have been added to ``QPDFWriter`` to allow writing to an
- already open stdio ``FILE*`` in addition to writing to standard
- output or a named file. Methods have been added to ``QPDF`` to be
- able to process a file from an already open stdio ``FILE*``. This
- makes it possible to read and write PDF from secure temporary
- files that have been unlinked prior to being fully read or
- written.
-
- - The ``QPDF::emptyPDF`` method can be used to create PDF files
- from scratch. The example
- :file:`examples/pdf-create.cc` illustrates how
- it can be used.
-
- - Several methods that take ``PointerHolder<Buffer>`` can now also
- accept ``std::string`` arguments.
-
- - Many new convenience methods have been added to the library, most
- in ``QPDFObjectHandle``. See :file:`ChangeLog`
- for a full list.
-
- - When building on a platform that supports ELF shared libraries
- (such as Linux), symbol versions are enabled by default. They can
- be disabled by passing
- :samp:`--disable-ld-version-script` to
- :command:`./configure`.
-
- - The file :file:`libqpdf.pc` is now installed
- to support :command:`pkg-config`.
-
- - Image comparison tests are off by default now since they are not
- needed to verify a correct build or port of qpdf. They are needed
- only when changing the actual PDF output generated by qpdf. You
- should enable them if you are making deep changes to qpdf itself.
- See :file:`README.md` for details.
-
- - Large file tests are off by default but can be turned on with
- :command:`./configure` or by setting an environment
- variable before running the test suite. See
- :file:`README.md` for details.
-
- - When qpdf's test suite fails, failures are not printed to the
- terminal anymore by default. Instead, find them in
- :file:`build/qtest.log`. For packagers who are
- building with an autobuilder, you can add the
- :samp:`--enable-show-failed-test-output` option to
- :command:`./configure` to restore the old behavior.
-
-2.3.1: December 28, 2011
- - Fix thread-safety problem resulting from non-thread-safe use of
- the PCRE library.
-
- - Made a few minor documentation fixes.
-
- - Add a workaround to the test suite for a bug that appears in some
- versions of ghostscript.
-
- - Fix minor build issue for Visual C++ 2010.
-
-2.3.0: August 11, 2011
- - Bug fix: when preserving existing encryption on encrypted files
- with cleartext metadata, older qpdf versions would generate
- password-protected files with no valid password. This operation
- now works. This bug only affected files created by copying
- existing encryption parameters; explicit encryption with
- specification of cleartext metadata worked before and continues to
- work.
-
- - Enhance ``QPDFWriter`` with a new constructor that allows you to
- delay the specification of the output file. When using this
- constructor, you may now call ``QPDFWriter::setOutputFilename`` to
- specify the output file, or you may use
- ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
- the resulting PDF file to a memory buffer. You may then use
- ``QPDFWriter::getBuffer`` to retrieve the memory buffer.
-
- - Add new API call ``QPDF::replaceObject`` for replacing objects by
- object ID.
-
- - Add new API call ``QPDF::swapObjects`` for swapping two objects by
- object ID.
-
- - Add ``QPDFObjectHandle::getDictAsMap`` and
- ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
- dictionary objects as maps and array objects as vectors.
-
- - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
- the C API for manipulating string fields of the document's
- ``/Info`` dictionary.
-
- - Add functions ``qpdf_init_write_memory``,
- ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
- for writing PDF files to a memory buffer instead of a file.
-
-2.2.4: June 25, 2011
- - Fix installation and compilation issues; no functionality changes.
-
-2.2.3: April 30, 2011
- - Handle some damaged streams with incorrect characters following
- the stream keyword.
-
- - Improve handling of inline images when normalizing content
- streams.
-
- - Enhance error recovery to properly handle files that use object 0
- as a regular object, which is specifically disallowed by the spec.
-
-2.2.2: October 4, 2010
- - Add new function ``qpdf_read_memory`` to the C API to call
- ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.
-
-2.2.1: October 1, 2010
- - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
- and ``std::cerr`` with other streams for generation of diagnostic
- messages and error messages. This can be useful for GUIs or other
- applications that want to capture any output generated by the
- library to present to the user in some other way. Note that QPDF
- does not write to ``std::cout`` (or the specified output stream)
- except where explicitly mentioned in
- :file:`QPDF.hh`, and that the only use of the
- error stream is for warnings. Note also that output of warnings is
- suppressed when ``setSuppressWarnings(true)`` is called.
-
- - Add new method ``QPDF::processMemoryFile`` for operating on PDF
- files that are loaded into memory rather than in a file on disk.
-
- - Give a warning but otherwise ignore empty PDF objects by treating
- them as null. Empty objects are not permitted by the PDF
- specification but have been known to appear in some actual PDF
- files.
-
- - Handle inline image filter abbreviations when they appear as stream
- filter abbreviations. The PDF specification does not allow use of
- stream filter abbreviations in this way, but Adobe Reader and some
- other PDF readers accept them since they sometimes appear
- incorrectly in actual PDF files.
-
- - Implement miscellaneous enhancements to ``PointerHolder`` and
- ``Buffer`` to support other changes.
-
-2.2.0: August 14, 2010
- - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
- ``replaceStreamData``) for creating new streams and replacing
- stream data. This makes it possible to perform a wide range of
- operations that were not previously possible.
-
- - Add new helper method in ``QPDFObjectHandle``
- (``addPageContents``) for appending or prepending new content
- streams to a page. This method makes it possible to manipulate
- content streams without having to be concerned whether a page's
- contents are a single stream or an array of streams.
-
- - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
- which replaces a dictionary key with a given value unless the
- value is null, in which case it removes the key instead.
-
- - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
- which returns the raw (unfiltered) stream data into a buffer. This
- complements the ``getStreamData`` method, which returns the
- filtered (uncompressed) stream data and can only be used when the
- stream's data is filterable.
-
- - Provide two new examples:
- :command:`pdf-double-page-size` and
- :command:`pdf-invert-images` that illustrate the
- newly added interfaces.
-
- - Fix a memory leak that would cause loss of a few bytes for every
- object involved in a cycle of object references. Thanks to Jian Ma
- for calling my attention to the leak.
-
-2.1.5: April 25, 2010
- - Remove restriction of file identifier strings to 16 bytes. This
- unnecessary restriction was preventing qpdf from being able to
- encrypt or decrypt files with identifier strings that were not
- exactly 16 bytes long. The specification imposes no such
- restriction.
-
-2.1.4: April 18, 2010
- - Apply the same padding calculation fix from version 2.1.2 to the
- main cross reference stream as well.
-
- - Since :command:`qpdf --check` only performs limited
- checks, clarify the output to make it clear that there still may
- be errors that qpdf can't check. This should make it less
- surprising to people when another PDF reader is unable to read a
- file that qpdf thinks is okay.
-
-2.1.3: March 27, 2010
- - Fix bug that could cause a failure when rewriting PDF files that
- contain object streams with unreferenced objects that in turn
- reference indirect scalars.
-
- - Don't complain about (invalid) AES streams that aren't a multiple
- of 16 bytes. Instead, pad them before decrypting.
-
-2.1.2: January 24, 2010
- - Fix bug in padding around first half cross reference stream in
- linearized files. The bug could cause an assertion failure when
- linearizing certain unlucky files.
-
-2.1.1: December 14, 2009
- - No changes in functionality; insert missing include in an internal
- library header file to support gcc 4.4, and update test suite to
- ignore broken Adobe Reader installations.
-
-2.1: October 30, 2009
- - This is the first version of qpdf to include Windows support. On
- Windows, it is possible to build a DLL. Additionally, a partial
- C-language API has been introduced, which makes it possible to
- call qpdf functions from non-C++ environments. I am very grateful
- to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
- numerous pre-release versions of this DLL and providing many
- excellent suggestions on improving the interface.
-
- For programming to the C interface, please see the header file
- :file:`qpdf/qpdf-c.h` and the example
- :file:`examples/pdf-linearize.c`.
-
- - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
- downloaded from qpdf's download site. Žarko's Delphi wrapper is
- released with the same licensing terms as qpdf itself and comes
- with this disclaimer: "Delphi wrapper unit
- :file:`qpdf.pas` created by Žarko Gajić
- (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever
- purpose you want. No support is provided. Sample code is
- provided."
-
- - Support has been added for AES encryption and crypt filters.
- Although qpdf does not presently support files that use PKI-based
- encryption, with the addition of AES and crypt filters, qpdf is
- now able to open most encrypted files created with newer
- versions of Acrobat or other PDF creation software. Note that I
- have not been able to get very many files encrypted in this way,
- so it's possible there could still be some cases that qpdf can't
- handle. Please report them if you find them.
-
- - Many error messages have been improved to include more information
- in hopes of making qpdf a more useful tool for PDF experts to use
- in manually recovering damaged PDF files.
-
- - Attempt to avoid compressing metadata streams if possible. This is
- consistent with other PDF creation applications.
-
- - Provide new command-line options for AES encrypt, cleartext
- metadata, and setting the minimum and forced PDF versions of
- output files.
-
- - Add additional methods to the ``QPDF`` object for querying the
- document's permissions. Although qpdf does not enforce these
- permissions, it does make them available so that applications that
- use qpdf can enforce permissions.
-
- - The :samp:`--check` option to
- :command:`qpdf` has been extended to include some
- additional information.
-
- - *Non-compatible API changes:*
-
- - QPDF's exception handling mechanism now uses
- ``std::logic_error`` for internal errors and
- ``std::runtime_error`` for runtime errors in favor of the now
- removed ``QEXC`` classes used in previous versions. The ``QEXC``
- exception classes predated the addition of the
- :file:`<stdexcept>` header file to the C++ standard library.
- Most of the exceptions thrown by the qpdf library itself are
- still of type ``QPDFExc`` which is now derived from
- ``std::runtime_error``. Programs that catch an instance of
- ``std::exception`` and display it by calling the ``what()``
- method will not need to be changed.
-
- - The ``QPDFExc`` class now internally represents various fields
- of the error condition and provides interfaces for querying
- them. Among the fields is a numeric error code that can help
- applications act differently on (a small number of) different
- error conditions. See :file:`QPDFExc.hh` for details.
-
- - Warnings can be retrieved from qpdf as instances of ``QPDFExc``
- instead of strings.
-
- - The nested ``QPDF::EncryptionData`` class's constructor takes an
- additional argument. This class is primarily intended to be used
- by ``QPDFWriter``. There's not really anything useful an
- end-user application could do with it. It probably shouldn't
- really be part of the public interface to begin with. Likewise,
- some of the methods for computing internal encryption dictionary
- parameters have changed to support ``/R=4`` encryption.
-
- - The method ``QPDF::getUserPassword`` has been removed since it
- didn't do what people would think it did. There are now two new
- methods: ``QPDF::getPaddedUserPassword`` and
- ``QPDF::getTrimmedUserPassword``. The first one does what the
- old ``QPDF::getUserPassword`` method used to do, which is to
- return the password with possible binary padding as specified by
- the PDF specification. The second one returns a human-readable
- password string.
-
- - The enumerated types that used to be nested in ``QPDFWriter``
- have moved to top-level enumerated types and are now defined in
- the file :file:`qpdf/Constants.h`. This enables them to be
- shared by both the C and C++ interfaces.
-
-2.0.6: May 3, 2009
- - Do not attempt to uncompress streams that have decode parameters
- we don't recognize. Earlier versions of qpdf would have rejected
- files with such streams.
-
-2.0.5: March 10, 2009
- - Improve error handling in the LZW decoder, and fix a small error
- introduced in the previous version with regard to handling full
- tables. The LZW decoder has been more strongly verified in this
- release.
-
-2.0.4: February 21, 2009
- - Include proper support for LZW streams encoded without the "early
- code change" flag. Special thanks to Atom Smasher who reported the
- problem and provided an input file compressed in this way, which I
- did not previously have.
-
- - Implement some improvements to file recovery logic.
-
-2.0.3: February 15, 2009
- - Compile cleanly with gcc 4.4.
-
- - Handle strings encoded as UTF-16BE properly.
-
-2.0.2: June 30, 2008
- - Update test suite to work properly with a
- non-:command:`bash`
- :file:`/bin/sh` and with Perl 5.10. No changes
- were made to the actual qpdf source code itself for this release.
-
-2.0.1: May 6, 2008
- - No changes in functionality or interface. This release includes
- fixes to the source code so that qpdf compiles properly and passes
- its test suite on a broader range of platforms. See
- :file:`ChangeLog` in the source distribution
- for details.
-
-2.0: April 29, 2008
- - First public release.
-
-.. _acknowledgments:
-
-Acknowledgment
-==============
-
-QPDF was originally created in 2001 and modified periodically between
-2001 and 2005 during my employment at `Apex CoVantage
-<http://www.apexcovantage.com>`__. Upon my departure from Apex, the
-company graciously allowed me to take ownership of the software and
-continue maintaining it as an open source project, a decision for which I
-am very grateful. I have made considerable enhancements to it since
-that time. I feel fortunate to have worked for people who would make
-such a decision. This work would not have been possible without their
-support.
+ overview
+ license
+ installation
+ cli
+ qdf
+ library
+ weak-crypto
+ json
+ design
+ linearization
+ object-streams
+ release-notes
+ acknowledgement
diff --git a/manual/installation.rst b/manual/installation.rst
new file mode 100644
index 00000000..8862034d
--- /dev/null
+++ b/manual/installation.rst
@@ -0,0 +1,342 @@
+.. _ref.installing:
+
+Building and Installing QPDF
+============================
+
+This chapter describes how to build and install qpdf. Please see also
+the :file:`README.md` and
+:file:`INSTALL` files in the source distribution.
+
+.. _ref.prerequisites:
+
+System Requirements
+-------------------
+
+The qpdf package has few external dependencies. In order to build qpdf,
+the following packages are required:
+
+- A C++ compiler that supports C++-14.
+
+- zlib: http://www.zlib.net/
+
+- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/
+
+- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
+ able to use the gnutls crypto provider, and/or openssl:
+ https://openssl.org/ to be able to use the openssl crypto provider.
+
+- gnu make 3.81 or newer: http://www.gnu.org/software/make
+
+- perl version 5.8 or newer: http://www.perl.org/; required for running
+ the test suite. Starting with qpdf version 9.1.1, perl is no longer
+ required at runtime.
+
+- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
+ is required to run the test suite. Note that this is the version of
+ diff present on virtually all GNU/Linux systems. This is required
+ because the test suite uses :command:`diff -u`.
+
+Part of qpdf's test suite compares the contents of PDF files by
+converting them to images and comparing the images. The image comparison
+tests are disabled by default. Those tests are not required for
+determining correctness of a qpdf build if you have not modified the
+code since the test suite also contains expected output files that are
+compared literally. The image comparison tests provide an extra check to
+make sure that any content transformations don't break the rendering of
+pages. Transformations that affect the content streams themselves are
+off by default and are only provided to help developers look into the
+contents of PDF files. If you are making deep changes to the library
+that cause changes in the contents of the files that qpdf generates,
+then you should enable the image comparison tests. Enable them by
+running :command:`configure` with the
+:samp:`--enable-test-compare-images` flag. If you enable
+this, the test suite has the following additional requirements.
+Note that in no case are these items required to use qpdf.
+
+- libtiff: http://www.remotesensing.org/libtiff/
+
+- GhostScript version 8.60 or newer: http://www.ghostscript.com
+
+If you do not enable this, then you do not need to have tiff and
+ghostscript.
+
+Pre-built documentation is distributed with qpdf, so you should
+generally not need to rebuild the documentation. In order to build the
+documentation from source, you need to install `Sphinx
+<https://sphinx-doc.org>`__. To build the PDF version of the
+documentation, you need `pdflatex`, `latexmk`, and a fairly complete
+LaTeX installation. Detailed requirements can be found in the Sphinx
+documentation.
+
+.. _ref.building:
+
+Build Instructions
+------------------
+
+Building qpdf on UNIX is generally just a matter of running
+
+::
+
+ ./configure
+ make
+
+You can also run :command:`make check` to run the test
+suite and :command:`make install` to install. Please run
+:command:`./configure --help` for options on what can be
+configured. You can also set the value of ``DESTDIR`` during
+installation to install to a temporary location, as is common with many
+open source packages. Please see also the
+:file:`README.md` and
+:file:`INSTALL` files in the source distribution.
+
+Building on Windows is a little bit more complicated. For details,
+please see :file:`README-windows.md` in the source
+distribution. You can also download a binary distribution for Windows.
+There is a port of qpdf to Visual C++ version 6 in the
+:file:`contrib` area generously contributed by Jian
+Ma. This is also discussed in more detail in
+:file:`README-windows.md`.
+
+While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
+place in the public API, and it's just in a helper function. It is
+possible to build qpdf on a system that doesn't have ``wchar_t``, and
+it's also possible to compile a program that uses qpdf on a system
+without ``wchar_t`` as long as you don't call that one method. This is a
+very unusual situation. For a detailed discussion, please see the
+top-level README.md file in qpdf's source distribution.
+
+There are some other things you can do with the build. Although qpdf
+uses :command:`autoconf`, it does not use
+:command:`automake` but instead uses a
+hand-crafted non-recursive Makefile that requires gnu make. If you're
+really interested, please read the comments in the top-level
+:file:`Makefile`.
+
+.. _ref.crypto:
+
+Crypto Providers
+----------------
+
+Starting with qpdf 9.1.0, the qpdf library can be built with multiple
+implementations of providers of cryptographic functions, which we refer
+to as "crypto providers." At the time of writing, a crypto
+implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
+and RC4 and AES256 with and without CBC encryption. In the future, if
+digital signature is added to qpdf, there may be additional requirements
+beyond this.
+
+Starting with qpdf version 9.1.0, the available implementations are
+``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
+Additional implementations may be added if needed. It is also possible
+for a developer to provide their own implementation without modifying
+the qpdf library.
+
+.. _ref.crypto.build:
+
+Build Support For Crypto Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When building with qpdf's build system, crypto providers can be enabled
+at build time using various :command:`./configure`
+options. The default behavior is for
+:command:`./configure` to discover which crypto providers
+can be supported based on available external libraries, to build all
+available crypto providers, and to use an external provider as the
+default over the native one. This behavior can be changed with the
+following flags to :command:`./configure`:
+
+- :samp:`--enable-crypto-{x}`
+ (where :samp:`{x}` is a supported crypto
+ provider): enable the :samp:`{x}` crypto
+ provider, requiring any external dependencies it needs
+
+- :samp:`--disable-crypto-{x}`:
+ disable the :samp:`{x}` provider, and do not
+ link against its dependencies even if they are available
+
+- :samp:`--with-default-crypto={x}`:
+ make :samp:`{x}` the default provider even if
+ a higher priority one is available
+
+- :samp:`--disable-implicit-crypto`: only build crypto
+ providers that are explicitly requested with an
+ :samp:`--enable-crypto-{x}`
+ option
+
+For example, if you want to guarantee that the gnutls crypto provider is
+used and that the native provider is not built, you could run
+:command:`./configure --enable-crypto-gnutls
+--disable-implicit-crypto`.
+
+If you build qpdf using your own build system, in order for qpdf to work
+at all, you need to enable at least one crypto provider. The file
+:file:`libqpdf/qpdf/qpdf-config.h.in` provides
+macros ``DEFAULT_CRYPTO``, whose value must be a string naming the
+default crypto provider, and various symbols starting with
+``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
+you must compile the source files that implement a crypto provider. To
+get a list of those files, look at
+:file:`libqpdf/build.mk`. If you want to omit a
+particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
+undefined, you can completely ignore the source files that belong to a
+particular crypto provider. Additionally, crypto providers may have
+their own external dependencies that can be omitted if the crypto
+provider is not used. For example, if you are building qpdf yourself and
+are using an environment that does not support gnutls or openssl, you
+can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
+is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
+you must include the source files used in the native implementation,
+some of which were added or renamed from earlier versions, to your
+build, and you can ignore
+:file:`QPDFCrypto_gnutls.cc`. Always consult
+:file:`libqpdf/build.mk` to get the list of source
+files you need to build.
+
+.. _ref.crypto.runtime:
+
+Runtime Crypto Provider Selection
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can use the :samp:`--show-crypto` option to
+:command:`qpdf` to get a list of available crypto
+providers. The default provider is always listed first, and the rest are
+listed in lexical order. Each crypto provider is listed on a line by
+itself with no other text, enabling the output of this command to be
+used easily in scripts.
+
+You can override which crypto provider is used by setting the
+``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
+ever do this, but you might want to do it if you were explicitly trying
+to compare behavior of two different crypto providers while testing
+performance or reproducing a bug. It could also be useful for people who
+are implementing their own crypto providers.
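The two mechanisms above (the line-oriented ``--show-crypto`` listing and the ``QPDF_CRYPTO_PROVIDER`` variable) lend themselves to scripting. The following is a minimal Python sketch, not part of qpdf itself: the helper names ``parse_show_crypto`` and ``run_with_provider`` are hypothetical, and it assumes a ``qpdf`` executable is on ``PATH`` when ``run_with_provider`` is called.

```python
import os
import subprocess


def parse_show_crypto(output):
    """Split `qpdf --show-crypto` output into (default, others).

    Relies only on what the text states: one provider per line,
    default provider first.
    """
    providers = [line.strip() for line in output.splitlines() if line.strip()]
    if not providers:
        raise ValueError("no crypto providers listed")
    return providers[0], providers[1:]


def run_with_provider(provider, args):
    """Run qpdf once with QPDF_CRYPTO_PROVIDER set for that invocation only.

    Hypothetical helper: assumes a `qpdf` executable is on PATH.
    """
    env = dict(os.environ, QPDF_CRYPTO_PROVIDER=provider)
    return subprocess.run(
        ["qpdf", *args], env=env, capture_output=True, text=True
    )
```

A script could call ``parse_show_crypto`` on the captured output of ``qpdf --show-crypto`` and then loop over the providers with ``run_with_provider`` to compare behavior, much as the text describes for reproducing a bug.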
+
+.. _ref.crypto.develop:
+
+Crypto Provider Information for Developers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you are writing code that uses libqpdf and you want to force a
+certain crypto provider to be used, you can call the method
+``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
+a built-in or developer-supplied provider. To add your own crypto
+provider, you have to create a class derived from ``QPDFCryptoImpl`` and
+register it with ``QPDFCryptoProvider``. For additional information, see
+comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.
+
+.. _ref.crypto.design:
+
+Crypto Provider Design Notes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section describes a few bits of rationale for why the crypto
+provider interface was set up the way it was. You don't need to know any
+of this information, but it's provided for the record and in case it's
+interesting.
+
+As a general rule, I want to avoid as much as possible including large
+blocks of code that are conditionally compiled such that, in most
+builds, some code is never built. This is dangerous because it makes it
+very easy for invalid code to creep in unnoticed. As such, I want it to
+be possible to build qpdf with all available crypto providers, and this
+is the way I build qpdf for local development. At the same time, if a
+particular packager feels that it is a security liability for qpdf to
+use crypto functionality from other than a library that gets
+considerable scrutiny for this specific purpose (such as gnutls,
+openssl, or nettle), then I want to give that packager the ability to
+completely disable qpdf's native implementation. Or if someone wants to
+avoid adding a dependency on one of the external crypto providers, I
+don't want the availability of the provider to impose additional
+external dependencies within that environment. Both of these are
+situations that I know to be true for some users of qpdf.
+
+I want registration and selection of crypto providers to be thread-safe,
+and I want it to work deterministically for a developer to provide their
+own crypto provider and be able to set it up as the default. This was
+the primary motivation behind requiring C++-11 as doing so enabled me to
+exploit the guaranteed thread safety of local block static
+initialization. The ``QPDFCryptoProvider`` class uses a singleton
+pattern with thread-safe initialization to create the singleton instance
+of ``QPDFCryptoProvider`` and exposes only static methods in its public
+interface. In this way, if a developer wants to call any
+``QPDFCryptoProvider`` methods, the library guarantees the
+``QPDFCryptoProvider`` is fully initialized and all built-in crypto
+providers are registered. Making ``QPDFCryptoProvider`` actually know
+about all the built-in providers may seem a bit sad at first, but this
+choice makes it extremely clear exactly what the initialization behavior
+is. There's no question about provider implementations automatically
+registering themselves in a nondeterministic order. It also means that
+implementations do not need to know anything about the provider
+interface, which makes them easier to test in isolation. Another
+advantage of this approach is that a developer who wants to develop
+their own crypto provider can do so in complete isolation from the qpdf
+library and, with just two calls, can make qpdf use their provider in
+their application. If they decided to contribute their code, plugging it
+into the qpdf library would require a very small change to qpdf's source
+code.
+
+The decision to make the crypto provider selectable at runtime was one I
+struggled with a little, but I decided to do it for various reasons.
+Allowing an end user to switch crypto providers easily could be very
+useful for reproducing a potential bug. If a user reports a bug that
+some cryptographic thing is broken, I can easily ask that person to try
+with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
+same could apply in the event of a performance problem. This also makes
+it easier for qpdf's own test suite to exercise code with different
+providers without having to make every program that links with qpdf
+aware of the possibility of multiple providers. In qpdf's continuous
+integration environment, the entire test suite is run for each supported
+crypto provider. This is made simple by being able to select the
+provider using an environment variable.
+
+Finally, making crypto providers selectable in this way establishes a
+pattern that I may follow again in the future for stream filter
+providers. One could imagine a future enhancement where someone could
+provide their own implementations for basic filters like
+``/FlateDecode`` or for other filters that qpdf doesn't support.
+Implementing the registration functions and internal storage of
+registered providers was also easier using C++-11's functional
+interfaces, which was another reason to require C++-11 at this time.
+
+.. _ref.packaging:
+
+Notes for Packagers
+-------------------
+
+If you are packaging qpdf for an operating system distribution, here are
+some things you may want to keep in mind:
+
+- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
+ dependency on perl. This is because fix-qdf was rewritten in C++.
+ However, qpdf still has a build-time dependency on perl.
+
+- Make sure you are getting the intended behavior with regard to crypto
+ providers. Read :ref:`ref.crypto.build` for details.
+
+- Passing :samp:`--enable-show-failed-test-output` to
+ :command:`./configure` will cause any failed test
+ output to be written to the console. This can be very useful for
+ seeing test failures generated by autobuilders where you can't access
+ qtest.log after the fact.
+
+- If qpdf's build environment detects the presence of autoconf and
+ related tools, it will check to ensure that automatically generated
+ files are up-to-date with recorded checksums and fail if it detects a
+ discrepancy. This feature is intended to prevent you from
+ accidentally forgetting to regenerate automatic files after modifying
+ their sources. If your packaging environment automatically refreshes
+ automatic files, it can cause this check to fail. Suppress qpdf's
+ checks by passing :samp:`--disable-check-autofiles`
+ to :command:`./configure`. This is safe since qpdf's
+ :command:`autogen.sh` just runs autotools in the
+ normal way.
+
+- QPDF's :command:`make install` does not install
+ completion files by default, but as a packager, it's good if you
+ install them wherever your distribution expects such files to go. You
+ can find completion files to install in the
+ :file:`completions` directory.
+
+- Packagers are encouraged to install the source files from the
+ :file:`examples` directory along with qpdf
+ development packages.
diff --git a/manual/json.rst b/manual/json.rst
new file mode 100644
index 00000000..660486ef
--- /dev/null
+++ b/manual/json.rst
@@ -0,0 +1,177 @@
+.. _ref.json:
+
+QPDF JSON
+=========
+
+.. _ref.json-overview:
+
+Overview
+--------
+
+Beginning with qpdf version 8.3.0, the :command:`qpdf`
+command-line program can produce a JSON representation of the
+non-content data in a PDF file. It includes a dump in JSON format of all
+objects in the PDF file excluding the content of streams. This JSON
+representation makes it very easy to look in detail at the structure of
+a given PDF file, and it also provides a great way to work with PDF
+files programmatically from the command-line in languages that can't
+call or link with the qpdf library directly. Note that stream data can
+be extracted from PDF files using other qpdf command-line options.
+
+.. _ref.json-guarantees:
+
+JSON Guarantees
+---------------
+
+The qpdf JSON representation includes a JSON serialization of the raw
+objects in the PDF file as well as some computed information in a more
+easily extracted format. QPDF provides some guarantees about its JSON
+format. These guarantees are designed to simplify the experience of a
+developer working with the JSON format.
+
+Compatibility
+ The top-level JSON object output is a dictionary. The JSON output
+ contains various nested dictionaries and arrays. With the exception
+ of dictionaries that are populated by the fields of objects from the
+ file, all instances of a dictionary are guaranteed to have exactly
+ the same keys. Future versions of qpdf are free to add additional
+ keys but not to remove keys or change the type of object that a key
+ points to. The qpdf program validates this guarantee, and in the
+ unlikely event that a bug in qpdf should cause it to generate data
+ that doesn't conform to this rule, it will ask you to file a bug
+ report.
+
+ The top-level JSON structure contains a "``version``" key whose value
+ is a simple integer. The value of the ``version`` key will be
+ incremented if a non-compatible change is made. A non-compatible
+ change would be any change that involves removal of a key, a change
+ to the format of data pointed to by a key, or a semantic change that
+ requires a different interpretation of a previously existing key. A
+ strong effort will be made to avoid breaking compatibility.
+
+Documentation
+ The :command:`qpdf` command can be invoked with the
+ :samp:`--json-help` option. This will output a JSON
+ structure that has the same structure as the JSON output that qpdf
+ generates, except that each field in the help output is a description
+ of the corresponding field in the JSON output. The specific
+ guarantees are as follows:
+
+ - A dictionary in the help output means that the corresponding
+ location in the actual JSON output is also a dictionary with
+ exactly the same keys; that is, no keys present in help are absent
+ in the real output, and no keys will be present in the real output
+ that are not in help. As a special case, if the dictionary has a
+ single key whose name starts with ``<`` and ends with ``>``, it
+ means that the JSON output is a dictionary that can have any keys,
+ each of which conforms to the value of the special key. This is
+ used for cases in which the keys of the dictionary are things like
+ object IDs.
+
+ - A string in the help output is a description of the item that
+ appears in the corresponding location of the actual output. The
+ corresponding output can have any format.
+
+ - An array in the help output always contains a single element. It
+ indicates that the corresponding location in the actual output is
+ also an array, and that each element of the array has whatever
+ format is implied by the single element of the help output's
+ array.
+
+ For example, the help output includes a "``pagelabels``"
+ key whose value is an array of one element. That element is a
+ dictionary with keys "``index``" and "``label``". In addition to
+ describing the meaning of those keys, this tells you that the actual
+ JSON output will contain a ``pagelabels`` array, each of whose
+ elements is a dictionary that contains an ``index`` key, a ``label``
+ key, and no other keys.
+
+Directness and Simplicity
+ The JSON output contains the value of every object in the file, but
+ it also contains some processed data. This is analogous to how qpdf's
+ library interface works. The processed data is similar to the helper
+ functions in that it allows you to look at certain aspects of the PDF
+ file without having to understand all the nuances of the PDF
+ specification, while the raw objects allow you to mine the PDF for
+ anything that the higher-level interfaces are lacking.
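The help-output guarantees listed under "Documentation" can be read as a small recursive rule. The sketch below is one possible Python rendering of that rule, written for illustration only; the function name ``conforms`` is hypothetical, and it checks structure alone (it does not account for fields that are null or empty).

```python
def conforms(help_node, actual):
    """Return True if `actual` has the shape described by `help_node`."""
    if isinstance(help_node, str):
        # A string in the help is only a description; any value is allowed.
        return True
    if isinstance(help_node, list):
        # A help array holds one element describing every actual element.
        return isinstance(actual, list) and all(
            conforms(help_node[0], item) for item in actual
        )
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        # Special case: a single "<...>" key means arbitrary keys are
        # allowed, each value conforming to the special key's value.
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        # Otherwise the key sets must match exactly.
        return set(actual) == set(help_node) and all(
            conforms(help_node[k], actual[k]) for k in help_node
        )
    return False
```

Applied to the ``pagelabels`` example, a help value of ``[{"index": "...", "label": "..."}]`` accepts any array whose elements are dictionaries with exactly the keys ``index`` and ``label``.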
+
+.. _json.limitations:
+
+Limitations of JSON Representation
+----------------------------------
+
+There are a few limitations to be aware of with the JSON structure:
+
+- Strings, names, and indirect object references in the original PDF
+ file are all converted to strings in the JSON representation. In the
+ case of a "normal" PDF file, you can tell the difference because a
+ name starts with a slash (``/``), and an indirect object reference
+ looks like ``n n R``, but if there were to be a string that looked
+ like a name or indirect object reference, there would be no way to
+ tell this from the JSON output. Note that there are certain cases
+ where you know for sure what something is, such as knowing that
+ dictionary keys in objects are always names and that certain things
+ in the higher-level computed data are known to contain indirect
+ object references.
+
+- The JSON format doesn't support binary data very well. Mostly the
+ details are not important, but they are presented here for
+ information. When qpdf outputs a string in the JSON representation,
+ it converts the string to UTF-8, assuming usual PDF string semantics.
+ Specifically, if the original string is UTF-16, it is converted to
+ UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
+ converted to UTF-8 with that assumption. This causes strange things
+ to happen to binary strings. For example, if you had the binary
+ string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
+ because ``03`` is not a printable character and ``80`` is the bullet
+ character in PDF doc encoding and is mapped to the Unicode value
+ ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
+ convert back from here to a binary string, you would have to recognize
+ Unicode values whose code points are higher than ``0xFF`` and map
+ those back to their corresponding PDF doc encoding characters. There
+ is no way to tell the difference between a Unicode string that was
+ originally encoded as UTF-16 and one that was converted from PDF doc
+ encoding. In other words, it's best if you don't try to use the JSON
+ format to extract binary strings from the PDF file, but if you really
+ had to, it could be done. Note that qpdf's
+ :samp:`--show-object` option does not have this
+ limitation and will reveal the string as encoded in the original
+ file.
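+
+The string conversion described above can be sketched in Python. This
+is only an illustration and not qpdf's implementation: the
+``PDF_DOC_OVERRIDES`` table and the ``pdf_doc_to_unicode`` and
+``unicode_to_pdf_doc`` helpers are hypothetical names, and the table
+holds only the one mapping mentioned above (``80`` to the bullet
+character). Every other byte is assumed, for the sake of the example,
+to map to the code point with the same value, which is not true of the
+complete PDF doc encoding.
+
+.. code-block:: python
+
+   import json
+
+   # Partial table for illustration only: byte 0x80 is the bullet.
+   # The full PDF doc encoding has more entries that differ from Latin-1.
+   PDF_DOC_OVERRIDES = {0x80: "\u2022"}
+   UNICODE_OVERRIDES = {v: k for k, v in PDF_DOC_OVERRIDES.items()}
+
+   def pdf_doc_to_unicode(data: bytes) -> str:
+       """Decode bytes using the simplified table above."""
+       return "".join(PDF_DOC_OVERRIDES.get(b, chr(b)) for b in data)
+
+   def unicode_to_pdf_doc(text: str) -> bytes:
+       """Map code points above 0xFF back to their single bytes."""
+       out = bytearray()
+       for ch in text:
+           if ch in UNICODE_OVERRIDES:
+               out.append(UNICODE_OVERRIDES[ch])
+           elif ord(ch) <= 0xFF:
+               out.append(ord(ch))
+           else:
+               raise ValueError("no single-byte mapping for %r" % ch)
+       return bytes(out)
+
+   text = pdf_doc_to_unicode(bytes.fromhex("038051"))
+   # json.dumps escapes the control character but keeps the bullet.
+   encoded = json.dumps(text, ensure_ascii=False)
+   round_trip = unicode_to_pdf_doc(text)
+
+The round trip works here only because the input is known to have come
+from PDF doc encoding. As noted above, the JSON alone does not say
+whether a string was originally UTF-16.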
+
+.. _json.considerations:
+
+JSON: Special Considerations
+----------------------------
+
+For the most part, the built-in JSON help tells you everything you need
+to know about the JSON format, but there are a few non-obvious things to
+be aware of:
+
+- While qpdf guarantees that keys present in the help will be present
+ in the output, those fields may be null or empty if the information
+ is not known or absent in the file. Also, if you specify
+ :samp:`--json-keys`, the keys that are not listed
+ will be excluded entirely except for those that
+ :samp:`--json-help` says are always present.
+
+- In a few places, there are keys with names containing
+ ``pageposfrom1``. The values of these keys are null or an integer. If
+ an integer, they point to a page index within the file numbering from
+ 1. Note that JSON indexes from 0, and you would also use 0-based
+ indexing using the API. However, 1-based indexing is easier in this
+ case because the command-line syntax for specifying page ranges is
+ 1-based. If you were going to write a program that looked through the
+ JSON for information about specific pages and then use the
+ command line to extract those pages, 1-based indexing is easier.
+ Besides, it's more convenient to subtract 1 in a program written in a
+ real programming language than it is to add 1 in shell code.
+
+- The image information included in the ``page`` section of the JSON
+ output includes the key "``filterable``". Note that the value of this
+ field may depend on the :samp:`--decode-level` that
+ you invoke qpdf with. The JSON output includes a top-level key
+ "``parameters``" that indicates the decode level used for computing
+ whether a stream was filterable. For example, JPEG images will be
+ shown as not filterable by default, but they will be shown as
+ filterable if you run :command:`qpdf --json
+ --decode-level=all`.
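+
+As a rough illustration of working with ``pageposfrom1`` values, the
+following Python sketch walks a JSON structure, collects every integer
+stored under a key whose name contains ``pageposfrom1``, and builds a
+1-based page range string. The ``collect_pages`` helper is a
+hypothetical name, and the ``sample`` fragment is made up for the
+example rather than taken from real qpdf output.
+
+.. code-block:: python
+
+   import json
+
+   def collect_pages(node, found=None):
+       """Collect integer values of keys containing 'pageposfrom1'."""
+       if found is None:
+           found = set()
+       if isinstance(node, dict):
+           for key, value in node.items():
+               if "pageposfrom1" in key and isinstance(value, int):
+                   found.add(value)
+               else:
+                   # Null values and nested containers are handled here.
+                   collect_pages(value, found)
+       elif isinstance(node, list):
+           for item in node:
+               collect_pages(item, found)
+       return found
+
+   # Made-up fragment for illustration only.
+   sample = json.loads(
+       '{"outlines": [{"destpageposfrom1": 3, "kids": ['
+       '{"destpageposfrom1": null}, {"destpageposfrom1": 5}]}]}'
+   )
+   pages = sorted(collect_pages(sample))
+   page_range = ",".join(str(p) for p in pages)   # 1-based, for the CLI
+   zero_based = [p - 1 for p in pages]            # 0-based, for an API
+
+The resulting ``page_range`` string can be used without adjustment as a
+page range on the command line, for example in :command:`qpdf in.pdf
+--pages in.pdf 3,5 -- out.pdf`.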
diff --git a/manual/library.rst b/manual/library.rst
new file mode 100644
index 00000000..faaffa21
--- /dev/null
+++ b/manual/library.rst
@@ -0,0 +1,91 @@
+.. _ref.using-library:
+
+Using the QPDF Library
+======================
+
+.. _ref.using.from-cxx:
+
+Using QPDF from C++
+-------------------
+
+The source tree for the qpdf package has an
+:file:`examples` directory that contains a few
+example programs. The :file:`qpdf/qpdf.cc` source
+file also serves as a useful example since it exercises almost all of
+the qpdf library's public interface. The best source of documentation on
+the library itself is reading comments in
+:file:`include/qpdf/QPDF.hh`,
+:file:`include/qpdf/QPDFWriter.hh`, and
+:file:`include/qpdf/QPDFObjectHandle.hh`.
+
+All header files are installed in the
+:file:`include/qpdf` directory. It is recommended that
+you use ``#include <qpdf/QPDF.hh>`` rather than adding
+:file:`include/qpdf` to your include path.
+
+When linking against the qpdf static library, you may also need to
+specify ``-lz -ljpeg`` on your link command. If your system understands
+how to read libtool :file:`.la` files, this may not
+be necessary.
+
+The qpdf library is safe to use in a multithreaded program, but no
+individual ``QPDF`` object instance (including ``QPDF``,
+``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
+thread at a time. Multiple threads may simultaneously work with
+different instances of these and all other QPDF objects.
+
+.. _ref.using.other-languages:
+
+Using QPDF from other languages
+-------------------------------
+
+The qpdf library is implemented in C++, which makes it hard to use
+directly in other languages. There are a few things that can help.
+
+"C"
+ The qpdf library includes a "C" language interface that provides a
+ subset of the overall capabilities. The header file
+ :file:`qpdf/qpdf-c.h` includes information about
+ its use. As long as you use a C++ linker, you can link C programs
+ with qpdf and use the C API. For languages that can directly load
+ methods from a shared library, the C API can also be useful. People
+ have reported success using the C API from other languages on Windows
+ by directly calling functions in the DLL.
+
+Python
+ A Python module called
+ `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
+ highly functional set of Python bindings to the qpdf library. Using
+ pikepdf, you can work with PDF files in a natural way and combine
+ qpdf's capabilities with other functionality provided by Python's
+ rich standard library and available modules.
+
+Other Languages
+ Starting with version 8.3.0, the :command:`qpdf`
+ command-line tool can produce a JSON representation of the PDF file's
+ non-content data. This can facilitate interacting programmatically
+ with PDF files through qpdf's command line interface. For more
+ information, please see :ref:`ref.json`.
+
+.. _ref.unicode-files:
+
+A Note About Unicode File Names
+-------------------------------
+
+When strings are passed to qpdf library routines either as ``char*`` or
+as ``std::string``, they are treated as byte arrays except where
+otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
+otherwise noted in comments in header files. In modern UNIX/Linux
+environments, this generally does the right thing. In Windows, it's a
+bit more complicated. Starting in qpdf 8.4.0, passwords that contain
+Unicode characters are handled much better, and starting in qpdf 8.4.1,
+the library attempts to properly handle Unicode characters in filenames.
+In particular, in Windows, if a UTF-8 encoded string is used as a
+filename in either ``QPDF`` or ``QPDFWriter``, it is internally
+converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
+such, qpdf will generally operate properly on files with non-ASCII
+characters in their names as long as the filenames are UTF-8 encoded for
+passing into the qpdf library API, but there are still some rough edges,
+such as the encoding of the filenames in error messages or CLI output
+messages. Patches or bug reports are welcome for any continuing issues
+with Unicode file names in Windows.
diff --git a/manual/license.rst b/manual/license.rst
new file mode 100644
index 00000000..691aef13
--- /dev/null
+++ b/manual/license.rst
@@ -0,0 +1,12 @@
+.. _ref.license:
+
+License
+=======
+
+QPDF is licensed under `the Apache License, Version 2.0
+<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+implied. See the License for the specific language governing
+permissions and limitations under the License.
diff --git a/manual/linearization.rst b/manual/linearization.rst
new file mode 100644
index 00000000..abac843a
--- /dev/null
+++ b/manual/linearization.rst
@@ -0,0 +1,197 @@
+.. _ref.linearization:
+
+Linearization
+=============
+
+This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
+creation and processing of linearized PDFs.
+
+.. _ref.linearization-strategy:
+
+Basic Strategy for Linearization
+--------------------------------
+
+To avoid the incestuous problem of having the qpdf library validate its
+own linearized files, we have a special linearized file checking mode
+which can be invoked via :command:`qpdf
+--check-linearization` (or :command:`qpdf
+--check`). This mode reads the linearization parameter
+dictionary and the hint streams and validates that object ordering,
+parameters, and hint stream contents are correct. The validation code
+was first tested against linearized files created by external tools
+(Acrobat and pdlin) and then used to validate files created by
+``QPDFWriter`` itself.
+
+.. _ref.linearized.preparation:
+
+Preparing For Linearization
+---------------------------
+
+Before creating a linearized PDF file from any other PDF file, the PDF
+file must be altered such that all page attributes are propagated down
+to the page level (and not inherited from parents in the ``/Pages``
+tree). We also have to know which objects refer to which other objects,
+being concerned with page boundaries and a few other cases. We refer to
+this part of preparing the PDF file as
+*optimization*, discussed in
+:ref:`ref.optimization`. Note that, in this context, the
+term *optimization* is a qpdf term, and the
+term *linearization* is a term from the PDF
+specification. Do not be confused by the fact that many applications
+refer to linearization as optimization or web optimization.
+
+When creating linearized PDF files from optimized PDF files, there are
+really only a few issues that need to be dealt with:
+
+- Creation of hints tables
+
+- Placing objects in the correct order
+
+- Filling in offsets and byte sizes
+
+.. _ref.optimization:
+
+Optimization
+------------
+
+In order to perform various operations such as linearization and
+splitting files into pages, it is necessary to know which objects are
+referenced by which pages, page thumbnails, and root and trailer
+dictionary keys. It is also necessary to ensure that all page-level
+attributes appear directly at the page level and are not inherited from
+parents in the pages tree.
+
+We refer to the process of enforcing these constraints as
+*optimization*. As mentioned above, note
+that some applications refer to linearization as optimization. Although
+this optimization was initially motivated by the need to create
+linearized files, we are using these terms separately.
+
+PDF file optimization is implemented in the
+:file:`QPDF_optimization.cc` source file. That file
+is richly commented and serves as the primary reference for the
+optimization process.
+
+After optimization has been completed, the private member variables
+``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
+been populated. Any object that has more than one value in the
+``object_to_obj_users`` table is shared. Any object that has exactly one
+value in the ``object_to_obj_users`` table is private. To find all the
+private objects in a page or a trailer or root dictionary key, one
+merely has to make this determination for each element in the
+``obj_user_to_objects`` table for the given page or key.
+
+Note that pages and thumbnails have different object user types, so the
+above test on a page will not include objects referenced by the page's
+thumbnail dictionary and nothing else.
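+
+The shared and private tests can be sketched as follows. This is a toy
+model in Python, not the library's C++ code: the two table names come
+from the text above, but the dictionary layout, the user labels, the
+object numbers, and the ``private_objects`` and ``shared_objects``
+helpers are all assumptions made for the example.
+
+.. code-block:: python
+
+   from collections import defaultdict
+
+   # Hypothetical data: the objects reachable from each object user.
+   obj_user_to_objects = {
+       "page 1": {4, 5, 9},
+       "page 2": {6, 9},
+       "thumb 1": {7},
+       "root /Outlines": {8},
+   }
+
+   # Invert the table to find the users of each object.
+   object_to_obj_users = defaultdict(set)
+   for user, objects in obj_user_to_objects.items():
+       for obj in objects:
+           object_to_obj_users[obj].add(user)
+
+   def private_objects(user):
+       """Objects whose only user is the given page or key."""
+       return {
+           obj
+           for obj in obj_user_to_objects[user]
+           if len(object_to_obj_users[obj]) == 1
+       }
+
+   def shared_objects():
+       """Objects that have more than one user."""
+       return {
+           obj
+           for obj, users in object_to_obj_users.items()
+           if len(users) > 1
+       }
+
+In this model, object 7 is private to the thumbnail user and does not
+show up in ``private_objects("page 1")``, which matches the note above
+about pages and thumbnails being different object user types.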
+
+.. _ref.linearization.writing:
+
+Writing Linearized Files
+------------------------
+
+We will create files with only primary hint streams. We will never write
+overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
+and they are never necessary.) The hint streams contain offset
+information to objects that point to where they would be if the hint
+stream were not present. This means that we have to calculate all object
+positions before we can generate and write the hint table, so we have
+to generate the file in two passes. To make this reliable,
+``QPDFWriter`` in linearization mode invokes exactly the same code twice
+to write the file to a pipeline.
+
+In the first pass, the target pipeline is a count pipeline chained to a
+discard pipeline. The count pipeline simply passes its data through to
+the next pipeline in the chain but can return the number of bytes passed
+through it at any intermediate point. The discard pipeline is an end of
+line pipeline that just throws its data away. The hint stream is not
+written and dummy values with adequate padding are stored in the first
+cross reference table, linearization parameter dictionary, and /Prev key
+of the first trailer dictionary. All the offset, length, object
+renumbering information, and anything else we need for the second pass
+is stored.
+
+At the end of the first pass, this information is passed to the ``QPDF``
+class which constructs a compressed hint stream in a memory buffer and
+returns it. ``QPDFWriter`` uses this information to write a complete
+hint stream object into a memory buffer. At this point, the length of
+the hint stream is known.
+
+In the second pass, the end of the pipeline chain is a regular file
+instead of a discard pipeline, and we have known values for all the
+offsets and lengths that we didn't have in the first pass. We have to
+adjust offsets that appear after the start of the hint stream by the
+length of the hint stream, which is known. Anything that is of variable
+length is padded, with the padding code surrounding any writing code
+that differs in the two passes. This ensures that changes to the way
+things are represented never result in offsets that were gathered
+during the first pass becoming incorrect for the second pass.
+
+Using this strategy, we can write linearized files to a non-seekable
+output stream with only a single pass to disk or wherever the output is
+going.
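+
+The two-pass approach can be illustrated with a toy Python model. It
+is a sketch under simplifying assumptions and does not reflect the
+actual ``QPDFWriter`` code: the ``Count`` and ``Discard`` classes, the
+``write_file`` function, and the invented file layout (a header, one
+padded offset field, an optional hint, and a body) exist only for this
+example.
+
+.. code-block:: python
+
+   import io
+
+   class Discard:
+       """End-of-line pipeline that throws its data away."""
+       def write(self, data):
+           pass
+
+   class Count:
+       """Pass data through to the next pipeline, counting bytes."""
+       def __init__(self, next_pipeline):
+           self.next_pipeline = next_pipeline
+           self.count = 0
+
+       def write(self, data):
+           self.count += len(data)
+           self.next_pipeline.write(data)
+
+   def write_file(target, hint=b"", first_pass=None):
+       """Write the toy file; the same code runs in both passes."""
+       out = Count(target)
+       offsets = {}
+       out.write(b"HEADER\n")
+       # The real value is unknown in the first pass, so write a dummy.
+       # Zero padding keeps the field the same length in both passes.
+       body = first_pass["body"] + len(hint) if first_pass else 0
+       out.write(b"body-offset: %010d\n" % body)
+       out.write(hint)  # empty in the first pass
+       offsets["body"] = out.count
+       out.write(b"BODY\n")
+       return offsets
+
+   first = write_file(Discard())             # pass 1: count only
+   hint = b"HINT %d\n" % first["body"]       # built from pass 1 data
+   result = io.BytesIO()
+   second = write_file(result, hint, first)  # pass 2: real output
+
+Because the offset field is padded, the only difference between the
+passes is the length of the hint, so the offset recorded in the first
+pass plus the hint length is the body's true position in the output.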
+
+.. _ref.linearization-data:
+
+Calculating Linearization Data
+------------------------------
+
+Once a file is optimized, we have information about which objects access
+which other objects. We can then process these tables to decide which
+part (as described in "Linearized PDF Document Structure" in the PDF
+specification) each object is contained within. This tells us the exact
+order in which objects are written. The ``QPDFWriter`` class asks for
+this information and enqueues objects for writing in the proper order.
+It also turns on a check that causes an exception to be thrown if an
+object is encountered that has not already been queued. (This could
+happen only if there were a bug in the traversal code used to calculate
+the linearization data.)
+
+.. _ref.linearization-issues:
+
+Known Issues with Linearization
+-------------------------------
+
+There are a handful of known issues with this linearization code. These
+issues do not appear to impact the behavior of linearized files which
+still work as intended: it is possible for a web browser to begin to
+display them before they are fully downloaded. In fact, it seems that
+various other programs that create linearized files have many of these
+same issues. These items make reference to terminology used in the
+linearization appendix of the PDF specification.
+
+- Thread Dictionary information keys appear in part 4 with the rest of
+ Threads instead of in part 9. Objects in part 9 are not grouped
+ together functionally.
+
+- We are not calculating numerators for shared object positions within
+ content streams or interleaving them within content streams.
+
+- We generate only page offset, shared object, and outline hint tables.
+ It would be relatively easy to add some additional tables. We gather
+ most of the information needed to create thumbnail hint tables. There
+ are comments in the code about this.
+
+.. _ref.linearization-debugging:
+
+Debugging Note
+--------------
+
+The :command:`qpdf --show-linearization` command can show
+the complete contents of linearization hint streams. To look at the raw
+data, you can extract the filtered contents of the linearization hint
+tables using :command:`qpdf --show-object=n
+--filtered-stream-data`. Then, to convert this into a bit
+stream (since linearization tables are bit streams written without
+regard to byte boundaries), you can pipe the resulting data through the
+following perl code:
+
+.. code-block:: perl
+
+ use bytes;
+ binmode STDIN;
+ undef $/;
+ my $a = <STDIN>;
+ my @ch = split(//, $a);
+ map { printf("%08b", ord($_)) } @ch;
+ print "\n";
diff --git a/manual/object-streams.rst b/manual/object-streams.rst
new file mode 100644
index 00000000..6c2b3fc8
--- /dev/null
+++ b/manual/object-streams.rst
@@ -0,0 +1,186 @@
+.. _ref.object-and-xref-streams:
+
+Object and Cross-Reference Streams
+==================================
+
+This chapter provides information about the implementation of object
+stream and cross-reference stream support in qpdf.
+
+.. _ref.object-streams:
+
+Object Streams
+--------------
+
+Object streams can contain any regular object except the following:
+
+- stream objects
+
+- objects with generation > 0
+
+- the encryption dictionary
+
+- objects containing the /Length of another stream
+
+In addition, Adobe Reader (at least as of version 8.0.0) appears to not
+be able to handle having the document catalog appear in an object stream
+if the file is encrypted, though this is not specifically disallowed by
+the specification.
+
+There are additional restrictions for linearized files. See
+:ref:`ref.object-streams-linearization` for details.
+
+The PDF specification refers to objects in object streams as "compressed
+objects" regardless of whether the object stream is compressed.
+
+The generation number of every object in an object stream must be zero.
+It is possible to delete and replace an object in an object stream with
+a regular object.
+
+The object stream dictionary has the following keys:
+
+- ``/N``: number of objects
+
+- ``/First``: byte offset of first object
+
+- ``/Extends``: indirect reference to stream that this extends
+
+Stream collections are formed with ``/Extends``. They must form a
+directed acyclic graph. These can be used for semantic information and
+are not meaningful to the PDF document's syntactic structure. Although
+qpdf preserves stream collections, it never generates them and doesn't
+make use of this information in any way.
+
+The specification recommends limiting the number of objects in an object
+stream for efficiency in reading and decoding. Acrobat 6 uses no more
+than 100 objects per object stream for linearized files and no more than 200
+objects per stream for non-linearized files. ``QPDFWriter``, in object
+stream generation mode, never puts more than 100 objects in an object
+stream.
+
+Object stream contents consist of *N* pairs of integers, each of which
+is the object number and the byte offset of the object relative to the
+first object in the stream, followed by the objects themselves,
+concatenated.
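+
+As an illustration of this layout, the following Python sketch splits
+already-decoded object stream data into its objects. It is not taken
+from qpdf. The ``split_object_stream`` helper and the sample data are
+hypothetical, and the sketch assumes that the pairs are listed in
+increasing offset order so that each object ends where the next one
+begins.
+
+.. code-block:: python
+
+   def split_object_stream(data: bytes, n: int, first: int):
+       """Return {object number: raw object bytes}.
+
+       ``n`` and ``first`` are the values of /N and /First.
+       """
+       tokens = data[:first].split()
+       numbers = [int(tok) for tok in tokens[: 2 * n]]
+       pairs = list(zip(numbers[0::2], numbers[1::2]))
+       objects = {}
+       for i, (objnum, offset) in enumerate(pairs):
+           # Offsets are relative to the first object, not the stream.
+           start = first + offset
+           if i + 1 < len(pairs):
+               end = first + pairs[i + 1][1]
+           else:
+               end = len(data)
+           objects[objnum] = data[start:end].strip()
+       return objects
+
+   # Made-up stream holding objects 11 and 12.
+   obj_a, obj_b = b"<< /A 1 >>\n", b"(hello)\n"
+   header = b"11 0 12 %d\n" % len(obj_a)
+   data = header + obj_a + obj_b
+   objects = split_object_stream(data, n=2, first=len(header))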
+
+.. _ref.xref-streams:
+
+Cross-Reference Streams
+-----------------------
+
+For non-hybrid files, the value following ``startxref`` is the byte
+offset to the xref stream rather than the word ``xref``.
+
+For hybrid files (files containing both xref tables and cross-reference
+streams), the xref table's trailer dictionary contains the key
+``/XRefStm`` whose value is the byte offset to a cross-reference stream
+that supplements the xref table. A PDF 1.5-compliant application should
+read the xref table first. Then it should replace any object that it has
+already seen with any defined in the xref stream. Then it should follow
+any ``/Prev`` pointer in the original xref table's trailer dictionary.
+The specification is not clear about what should be done, if anything,
+with a ``/Prev`` pointer in the xref stream referenced by an xref table.
+The ``QPDF`` class ignores it, which is probably reasonable since, if
+this case were to appear for any sensible PDF file, the previous xref
+table would probably have a corresponding ``/XRefStm`` pointer of its
+own. For example, if a hybrid file were appended, the appended section
+would have its own xref table and ``/XRefStm``. The appended xref table
+would point to the previous xref table which would point to the
+``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
+it.
+
+Since xref streams must be read very early, they may not be encrypted,
+and they may not contain indirect objects for keys required to read them,
+which are these:
+
+- ``/Type``: value ``/XRef``
+
+- ``/Size``: value *n+1*: where *n* is highest object number (same as
+ ``/Size`` in the trailer dictionary)
+
+- ``/Index`` (optional): value
+ :samp:`[{n count} ...]` used to determine
+ which objects' information is stored in this stream. The default is
+ ``[0 /Size]``.
+
+- ``/Prev``: value :samp:`{offset}`: byte
+ offset of previous xref stream (same as ``/Prev`` in the trailer
+ dictionary)
+
+- ``/W [...]``: sizes of each field in the xref table
+
+The other fields in the xref stream, which may be indirect if desired,
+are the union of those from the xref table's trailer dictionary.
+
+.. _ref.xref-stream-data:
+
+Cross-Reference Stream Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The stream data is binary and encoded in big-endian byte order. Entries
+are concatenated, and each entry has a length equal to the total of the
+entries in ``/W`` above. Each entry consists of one or more fields, the
+first of which is the type of the field. The number of bytes for each
+field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
+is omitted and has the default value. The default value for the field
+type is "``1``". All other default values are "``0``".
+
+PDF 1.5 has three field types:
+
+- 0: for free objects. Format: ``0 obj next-generation``, same as the
+ free table in a traditional cross-reference table
+
+- 1: regular non-compressed object. Format: ``1 offset generation``
+
+- 2: for objects in object streams. Format: ``2 object-stream-number
+ index``, the number of the object stream containing the object and the
+ index within the object stream of the object.
+
+It seems standard to have the first entry in the table be ``0 0 0``
+instead of ``0 0 ffff`` if there are no deleted objects.
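+
+The entry layout described above can be decoded with a few lines of
+Python. This is a sketch and not qpdf's parser: the
+``decode_xref_entries`` function and the sample bytes are hypothetical,
+and the stream data is assumed to have been decompressed already.
+
+.. code-block:: python
+
+   def decode_xref_entries(data: bytes, w):
+       """Split xref stream data into (type, field 2, field 3) tuples.
+
+       ``w`` is the list of three field widths from /W.
+       """
+       entry_len = sum(w)
+       if entry_len == 0:
+           raise ValueError("/W must contain a non-zero width")
+       defaults = (1, 0, 0)  # an omitted type field defaults to 1
+       entries = []
+       for pos in range(0, len(data) - entry_len + 1, entry_len):
+           fields = []
+           offset = pos
+           for width, default in zip(w, defaults):
+               if width == 0:
+                   fields.append(default)
+               else:
+                   raw = data[offset:offset + width]
+                   fields.append(int.from_bytes(raw, "big"))
+               offset += width
+           entries.append(tuple(fields))
+       return entries
+
+   # Three made-up entries with /W [1 2 1]: a free entry, an object
+   # at byte offset 17, and index 1 in object stream 5.
+   sample = bytes.fromhex("00000000" "01001100" "02000501")
+   entries = decode_xref_entries(sample, [1, 2, 1])
+
+With ``/W [0 2 1]``, the type field is omitted and every entry is
+treated as type 1, so the three bytes ``00 11 00`` decode to the same
+tuple as the second entry above.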
+
+.. _ref.object-streams-linearization:
+
+Implications for Linearized Files
+---------------------------------
+
+For linearized files, the linearization dictionary, document catalog,
+and page objects may not be contained in object streams.
+
+Objects stored within object streams are given the highest range of
+object numbers within the main and first-page cross-reference sections.
+
+It is okay to use cross-reference streams in place of regular xref
+tables. There are no special considerations.
+
+Hint data refers to object streams themselves, not the objects in the
+streams. Shared object references should also be made to the object
+streams. There are no references in any hint tables to the object numbers
+of compressed objects (objects within object streams).
+
+When numbering objects, all shared objects within both the first and
+second halves of the linearized files must be numbered consecutively
+after all normal uncompressed objects in that half.
+
+.. _ref.object-stream-implementation:
+
+Implementation Notes
+--------------------
+
+There are three modes for writing object streams:
+:samp:`disable`, :samp:`preserve`, and
+:samp:`generate`. In disable mode, we do not generate
+any object streams, and we also generate an xref table rather than xref
+streams. This can be used to generate PDF files that are viewable with
+older readers. In preserve mode, we write object streams such that
+written object streams contain the same objects and ``/Extends``
+relationships as in the original file. This is equivalent to disable
+mode if the file has no object streams. In generate mode, we create object streams
+ourselves by grouping objects that are allowed in object streams
+together in sets of no more than 100 objects. We also ensure that the
+PDF version is at least 1.5 in generate mode, but we preserve the
+version header in the other modes. The default is
+:samp:`preserve`.
+
+We do not support creation of hybrid files. When we write files, even in
+preserve mode, we will lose any xref tables and merge any appended
+sections.
diff --git a/manual/overview.rst b/manual/overview.rst
new file mode 100644
index 00000000..82c7057b
--- /dev/null
+++ b/manual/overview.rst
@@ -0,0 +1,33 @@
+.. _ref.overview:
+
+What is QPDF?
+=============
+
+QPDF is a program and C++ library for structural, content-preserving
+transformations on PDF files. QPDF's website is located at
+https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub
+at https://github.com/qpdf/qpdf.
+
+QPDF provides many useful capabilities to developers of PDF-producing
+software or for people who just want to look at the innards of a PDF
+file to learn more about how they work. With QPDF, it is possible to
+copy objects from one PDF file into another and to manipulate the list
+of pages in a PDF file. This makes it possible to merge and split PDF
+files. The QPDF library also makes it possible for you to create PDF
+files from scratch. In this mode, you are responsible for supplying
+all the contents of the file, while the QPDF library takes care of all
+the syntactical representation of the objects, creation of
+cross-reference tables and, if you use them, object streams, encryption,
+linearization, and other syntactic details. You are still responsible
+for generating PDF content on your own.
+
+QPDF has been designed with very few external dependencies, and it is
+intentionally very lightweight. QPDF is *not* a PDF content creation
+library, a PDF viewer, or a program capable of converting PDF into other
+formats. In particular, QPDF knows nothing about the semantics of PDF
+content streams. If you are looking for something that can do that, you
+should look elsewhere. However, once you have a valid PDF file, QPDF can
+be used to transform that file in ways that perhaps your original PDF
+creation tool can't handle. For example, many programs generate simple PDF
+files but can't password-protect them, web-optimize them, or perform
+other transformations of that type.
diff --git a/manual/qdf.rst b/manual/qdf.rst
new file mode 100644
index 00000000..b7ee7813
--- /dev/null
+++ b/manual/qdf.rst
@@ -0,0 +1,96 @@
+.. _ref.qdf:
+
+QDF Mode
+========
+
+In QDF mode, qpdf creates PDF files in what we call *QDF
+form*. A PDF file in QDF form, sometimes called a QDF
+file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
+line (after the PDF header and binary characters) and has certain other
+characteristics. The purpose of QDF form is to make it possible to edit
+PDF files, with some restrictions, in an ordinary text editor. This can
+be very useful for experimenting with different PDF constructs or for
+making one-off edits to PDF files (though there are other reasons why
+this may not always work). Note that QDF mode does not support
+linearized files. If you enable linearization, QDF mode is automatically
+disabled.
+
+It is ordinarily very difficult to edit PDF files in a text editor for
+two reasons: most meaningful data in PDF files is compressed, and PDF
+files are full of offset and length information that makes it hard to
+add or remove data. A QDF file is organized in a manner such that, if
+edits are kept within certain constraints, the
+:command:`fix-qdf` program, distributed with qpdf, is
+able to restore edited files to a correct state. The
+:command:`fix-qdf` program takes no command-line
+arguments. It reads a possibly edited QDF file from standard input and
+writes a repaired file to standard output.
+
+The following attributes characterize a QDF file:
+
+- All objects appear in numerical order in the PDF file, including when
+ objects appear in object streams.
+
+- Objects are printed in an easy-to-read format, and all line endings
+ are normalized to UNIX line endings.
+
+- Unless specifically overridden, streams appear uncompressed (when
+ qpdf supports the filters and they are compressed with a non-lossy
+ compression scheme), and most content streams are normalized (line
+ endings are converted to just UNIX-style linefeeds).
+
+- All stream lengths are represented as indirect objects, and the
+ stream length object is always the next object after the stream. If
+ the stream data does not end with a newline, an extra newline is
+ inserted, and a special comment appears after the stream indicating
+ that this has been done.
+
+- If the PDF file contains object streams, if object stream *n*
+ contains *k* objects, those objects are numbered from *n+1* through
+ *n+k*, and the object number/offset pairs appear on a separate line
+ for each object. Additionally, each object in the object stream is
+ preceded by a comment indicating its object number and index. This
+ makes it very easy to find objects in object streams.
+
+- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
+ and ``endobj`` tokens appear on lines by themselves. A blank line
+ follows every ``endobj`` token.
+
+- If there is a cross-reference stream, it is unfiltered.
+
+- Page dictionaries and page content streams are marked with special
+ comments that make them easy to find.
+
+- Comments precede each object indicating the object number of the
+ corresponding object in the original file.
+
+When editing a QDF file, any edits can be made as long as the above
+constraints are maintained. This means that you can freely edit a page's
+content without worrying about messing up the QDF file. It is also
+possible to add new objects so long as those objects are added after the
+last object in the file or subsequent objects are renumbered. If a QDF
+file has object streams in it, you can always add the new objects before
+the xref stream and then change the number of the xref stream, since
+nothing generally ever references it by number.
+
+It is not generally practical to remove objects from QDF files without
+messing up object numbering, but if you remove all references to an
+object, you can run qpdf on the file (after running
+:command:`fix-qdf`), and qpdf will omit the now-orphaned
+object.
+
+When :command:`fix-qdf` is run, it goes through the file
+and recomputes the following parts of the file:
+
+- the ``/N``, ``/W``, and ``/First`` keys of all object stream
+ dictionaries
+
+- the pairs of numbers representing object numbers and offsets of
+ objects in object streams
+
+- all stream lengths
+
+- the cross-reference table or cross-reference stream
+
+- the offset to the cross-reference table or cross-reference stream
+ following the ``startxref`` token
diff --git a/manual/release-notes.rst b/manual/release-notes.rst
new file mode 100644
index 00000000..5a8fd307
--- /dev/null
+++ b/manual/release-notes.rst
@@ -0,0 +1,2643 @@
+.. _ref.release-notes:
+
+Release Notes
+=============
+
+For a detailed list of changes, please see the file
+:file:`ChangeLog` in the source distribution.
+
+10.5.0: XXX Month dd, YYYY
+ - Library Enhancements
+
+ - Since qpdf version 8, using object accessor methods on an
+ instance of ``QPDFObjectHandle`` may create warnings if the
+ object is not of the expected type. These warnings now have an
+ error code of ``qpdf_e_object`` instead of
+ ``qpdf_e_damaged_pdf``. Also, comments have been added to
+ :file:`QPDFObjectHandle.hh` to explain in more detail what the
+ behavior is. See :ref:`ref.object-accessors` for a more in-depth
+ discussion.
+
+ - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer
+ allocated with ``malloc()`` for better cross-language
+ interoperability.
+
+ - C API Enhancements
+
+ - Overhaul error handling for the object handle functions in the C API.
+ Some rare error conditions that would previously have caused a
+ crash are now trapped and reported, and the functions that
+ generate them return fallback values. See comments in the
+ ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for
+ details. In particular, exceptions thrown by the underlying C++
+ code when calling object accessors are caught and converted into
+ errors. The errors can be checked by calling ``qpdf_has_error``.
+ Use ``qpdf_silence_errors`` to prevent the error from being
+ written to stderr.
+
+ - Add ``qpdf_get_last_string_length`` to the C API to get the
+ length of the last string that was returned. This is needed to
+ handle strings that contain embedded null characters.
+
+ - Add ``qpdf_oh_is_initialized`` and
+ ``qpdf_oh_new_uninitialized`` to the C API to make it possible
+ to work with uninitialized objects.
+
+ - Add ``qpdf_oh_new_object`` to the C API. This allows you to
+ clone an object handle.
+
+ - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
+ and ``qpdf_replace_object``, exposing the corresponding methods
+ in ``QPDF`` and ``QPDFObjectHandle``.
+
+ - Add several functions for working with pages. See ``PAGE
+ FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
+
+ - Add several functions for working with streams. See ``STREAM
+ FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
+
+ - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``.
+
+ - Documentation change
+
+ - The documentation sources have been switched from docbook to
+ reStructuredText processed with `Sphinx
+ <https://sphinx-doc.org>`__. This is mostly transparent (other
+ than format change) with the exception that all section links
+ have changed. What used to be ``#ref.something`` is now
+ ``#something``. A top-to-bottom review of the documentation is
+ planned for an upcoming release.
+
+10.4.0: November 16, 2021
+ - Handling of Weak Cryptography Algorithms
+
+ - From the qpdf CLI, the
+ :samp:`--allow-weak-crypto` option is now required to
+ suppress a warning when explicitly creating PDF files using RC4
+ encryption. While qpdf will always retain the ability to read
+ and write such files, doing so will require explicit
+ acknowledgment moving forward. For qpdf 10.4, this change only
+ affects the command-line tool. Starting in qpdf 11, there will
+ be small API changes to require explicit acknowledgment in
+ those cases as well. For additional information, see :ref:`ref.weak-crypto`.
+
+ - Bug Fixes
+
+ - Fix potential bounds error when handling shell completion that
+ could occur when given bogus input.
+
+ - Properly handle overlay/underlay on completely empty pages
+ (with no resource dictionary).
+
+ - Fix crash that could occur under certain conditions when using
+ :samp:`--pages` with files that had form
+ fields.
+
+ - Library Enhancements
+
+ - Make ``QPDF::findPage`` functions public.
+
+ - Add methods to ``Pl_Flate`` to be able to receive warnings on
+ certain recoverable conditions.
+
+ - Add an extra check to the library to detect when foreign
+ objects are inserted directly (instead of using
+ ``QPDF::copyForeignObject``) at the time of insertion rather
+ than when the file is written. Catching the error sooner makes
+ it much easier to locate the incorrect code.
+
+ - CLI Enhancements
+
+ - Improve diagnostics around parsing
+ :samp:`--pages` command-line options
+
+ - Packaging Changes
+
+ - The Windows binary distribution is now built with crypto
+ provided by OpenSSL 3.0.
+
+10.3.2: May 8, 2021
+ - Bug Fixes
+
+ - When generating a file while preserving object streams,
+ unreferenced objects are correctly removed unless
+ :samp:`--preserve-unreferenced` is specified.
+
+ - Library Enhancements
+
+ - When adding a page that already exists, make a shallow copy
+ instead of throwing an exception. This makes the library
+ behavior consistent with the CLI behavior. See
+ :file:`ChangeLog` for additional notes.
+
+10.3.1: March 11, 2021
+ - Bug Fixes
+
+ - Form field copying failed on files where /DR was a direct
+ object in the document-level form dictionary.
+
+10.3.0: March 4, 2021
+ - Bug Fixes
+
+ - The code for handling form fields when copying pages from
+ 10.2.0 was not quite right and didn't work in a number of
+ situations, such as when the same page was copied multiple
+ times or when there were conflicting resource or field names
+ across multiple copies. The 10.3.0 code has been much more
+ thoroughly tested with more complex cases and with a multitude
+ of readers and should be much closer to correct. The 10.2.0
+ code worked well enough for page splitting or for copying pages
+ with form fields into documents that didn't already have them
+ but was still not quite correct in handling of field-level
+ resources.
+
+ - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
+ called, existing ``QPDFObjectHandle`` instances no longer point
+ to the old objects. The next time they are accessed, they
+ automatically notice the change to the underlying object and
+ update themselves. This resolves a very longstanding source of
+ confusion, albeit in a very rarely used method call.
+
+ - Fix form field handling code to look for default appearances,
+ quadding, and default resources in the right places. The code
+ was not looking for things in the document-level interactive
+ form dictionary that it was supposed to be finding there. This
+ required adding a few new methods to
+ ``QPDFFormFieldObjectHelper``.
+
+ - Library Enhancements
+
+ - Reworked the code that handles copying annotations and form
+ fields during page operations. There were additional methods
+ added to the public API from 10.2.0 and one deprecation of a
+ method added in 10.2.0. The majority of the API changes are in
+ methods most people would never call and that will hopefully be
+ superseded by higher-level interfaces for handling page copies.
+ Please see the :file:`ChangeLog` file for
+ details.
+
+ - The method ``QPDF::numWarnings`` was added so that you can tell
+ whether any warnings happened during a specific block of code.
+
+10.2.0: February 23, 2021
+ - CLI Behavior Changes
+
+ - Operations that work on combining pages are much better about
+ protecting form fields. In particular,
+ :samp:`--split-pages` and
+ :samp:`--pages` now preserve interactive form
+ functionality by copying the relevant form field information
+ from the original files. Additionally, if you use
+ :samp:`--pages` to select only some pages from
+ the original input file, unused form fields are removed, which
+ prevents lots of unused annotations from being retained.
+
+ - By default, :command:`qpdf` no longer allows
+ creation of encrypted PDF files whose user password is
+ non-empty and owner password is empty when a 256-bit key is in
+ use. The :samp:`--allow-insecure` option,
+ specified inside the :samp:`--encrypt` options,
+ allows creation of such files. Behavior changes in the CLI are
+ avoided when possible, but an exception was made here because
+ this is security-related. qpdf must always allow creation of
+ weird files for testing purposes, but it should not default to
+ letting users unknowingly create insecure files.
+
+ - Library Behavior Changes
+
+ - Note: the changes in this section cause differences in output
+ in some cases. These differences change the syntax of the PDF
+ but do not change the semantics (meaning). I make a strong
+ effort to avoid gratuitous changes in qpdf's output so that
+ qpdf changes don't break people's tests. In this case, the
+ changes significantly improve the readability of the generated
+ PDF and don't affect any output that's generated by simple
+ transformation. If you are annoyed by having to update test
+ files, please rest assured that changes like this have been and
+ will continue to be rare events.
+
+ - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
+ ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
+ the characters in the string. This reduces needless encoding in
+ UTF-16 of strings that can be encoded in ASCII. This change may
+ cause qpdf to generate different output than before when form
+ field values are set using ``QPDFFormFieldObjectHelper`` but
+ does not change the meaning of the output.
+
+ - The code that places form XObjects and also the code that
+ flattens rotations trim trailing zeroes from real numbers that
+ they calculate. This causes slight (but semantically
+ equivalent) differences in generated appearance streams and
+ form XObject invocations in overlay/underlay code or in user
+ code that calls the methods that place form XObjects on a page.
+
+ - CLI Enhancements
+
+ - Add new command line options for listing, saving, adding,
+ removing, and copying file attachments. See :ref:`ref.attachments` for details.
+
+ - Page splitting and merging operations, as well as
+ :samp:`--flatten-rotation`, are better behaved
+ with respect to annotations and interactive form fields. In
+ most cases, interactive form field functionality and proper
+ formatting and functionality of annotations is preserved by
+ these operations. There are still some cases that aren't
+ perfect, such as when functionality of annotations depends on
+ document-level data that qpdf doesn't yet understand or when
+ there are problems with referential integrity among form fields
+ and annotations (e.g., when a single form field object or its
+ associated annotations are shared across multiple pages, a case
+ that is out of spec but that works in most viewers anyway).
+
+ - The option
+ :samp:`--password-file={filename}`
+ can now be used to read the decryption password from a file.
+ You can use ``-`` as the file name to read the password from
+ standard input. This is an easier/more obvious way to read
+ passwords from files or standard input than using
+ :samp:`@file` for this purpose.
+
+ - Add some information about attachments to the JSON output, and
+ add ``attachments`` as an additional JSON key. The
+ information included here is limited to the preferred name and
+ content stream and a reference to the file spec object. This is
+ enough detail for clients to avoid the hassle of navigating a
+ name tree and provides what is needed for basic enumeration and
+ extraction of attachments. More detailed information can be
+ obtained by following the reference to the file spec object.
+
+ - Add numeric option to :samp:`--collate`. If
+ :samp:`--collate={n}`
+ is given, take pages in groups of
+ :samp:`{n}` from the given files.
+
+ - It is now valid to provide :samp:`--rotate=0`
+ to clear rotation from a page.
+
+ - Library Enhancements
+
+ - This release includes numerous additions to the API. Not all
+ changes are listed here. Please see the
+ :file:`ChangeLog` file in the source
+ distribution for a comprehensive list. Highlights appear below.
+
+ - Add ``QPDFObjectHandle::ditems()`` and
+ ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
+ including range-for iteration, over dictionary and array
+ QPDFObjectHandles. See comments in
+ :file:`include/qpdf/QPDFObjectHandle.hh`
+ and
+ :file:`examples/pdf-name-number-tree.cc`
+ for details.
+
+ - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
+ stream within the same ``QPDF`` instance.
+
+ - Add new helper classes for supporting file attachments, also
+ known as embedded files. New classes are
+ ``QPDFEmbeddedFileDocumentHelper``,
+ ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
+ See their respective headers for details and
+ :file:`examples/pdf-attach-file.cc` for an
+ example.
+
+ - Add a version of ``QPDFObjectHandle::parse`` that takes a
+ ``QPDF`` pointer as context so that it can parse strings
+ containing indirect object references. This is illustrated in
+ :file:`examples/pdf-attach-file.cc`.
+
+ - Re-implement ``QPDFNameTreeObjectHelper`` and
+ ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
+ iterator-based API, give them the capability to repair broken
+ trees, and create methods for modifying the trees. With this
+ change, qpdf has a robust read/write implementation of name and
+ number trees.
+
+ - Add new versions of ``QPDFObjectHandle::replaceStreamData``
+ that take ``std::function`` objects for cases when you need
+ something between a static string and a full-fledged
+ StreamDataProvider. Using this with ``QUtil::file_provider`` is
+ a very easy way to create a stream from the contents of a file.
+
+ - The ``QPDFMatrix`` class, formerly a private, internal class,
+ has been added to the public API. See
+ :file:`include/qpdf/QPDFMatrix.hh` for
+ details. This class is for working with transformation
+ matrices. Some methods in ``QPDFPageObjectHelper`` make use of
+ this to make information about transformation matrices
+ available. For an example, see
+ :file:`examples/pdf-overlay-page.cc`.
+
+ - Several new methods were added to
+ ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
+ information about, and enumerating form fields.
+
+ - Add method
+ ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
+ applies a transformation to each annotation on a page.
+
+ - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
+ annotations and, if applicable, associated form fields, from
+ one page to another, possibly transforming the rectangles.
+
+ - Build Changes
+
+ - A C++-14 compiler is now required to build qpdf. There is no
+ intention to require anything newer than that for a while.
+ C++-14 includes modest enhancements to C++-11 and appears to be
+ supported about as widely as C++-11.
+
+ - Bug Fixes
+
+ - The :samp:`--flatten-rotation` option applies
+ transformations to any annotations that may be on the page.
+
+ - If a form XObject lacks a resources dictionary, consider any
+ names in that form XObject to be referenced from the containing
+ page. This is compliant with older PDF versions. Also detect if
+ any form XObjects have any unresolved names and, if so, don't
+ remove unreferenced resources from them or from the page that
+ contains them. Unfortunately this has the side effect of
+ preventing removal of unreferenced resources in some cases
+ where names appear that don't refer to resources, such as with
+ tagged PDF. This is a bit of a corner case that is not likely
+ to cause a significant problem in practice, but the only side
+ effect would be lack of removal of shared resources. A future
+ version of qpdf may be more sophisticated in its detection of
+ names that refer to resources.
+
+ - Properly handle strings if they appear in inline image
+ dictionaries while externalizing inline images.
+
+10.1.0: January 5, 2021
+ - CLI Enhancements
+
+ - Add :samp:`--flatten-rotation` command-line
+ option, which causes all pages that are rotated using
+ parameters in the page's dictionary to instead be identically
+ rotated in the page's contents. The change is not user-visible
+ for compliant PDF readers but can be used to work around broken
+ PDF applications that don't properly handle page rotation.
+
+ - Library Enhancements
+
+ - Support for user-provided (pluggable, modular) stream filters.
+ It is now possible to derive a class from ``QPDFStreamFilter``
+ and register it with ``QPDF`` so that regular library methods,
+ including those used by ``QPDFWriter``, can decode streams with
+ filters not directly supported by the library. The example
+ :file:`examples/pdf-custom-filter.cc`
+ illustrates how to use this capability.
+
+ - Add methods to ``QPDFPageObjectHelper`` to iterate through
+ XObjects on a page or form XObjects, possibly recursing into
+ nested form XObjects: ``forEachXObject``, ``forEachImage``,
+ ``forEachFormXObject``.
+
+ - Enhance several methods in ``QPDFPageObjectHelper`` to work
+ with form XObjects as well as pages, as noted in comments. See
+ :file:`ChangeLog` for a full list.
+
+ - Rename some functions in ``QPDFPageObjectHelper``, while
+ keeping old names for compatibility:
+
+ - ``getPageImages`` to ``getImages``
+
+ - ``filterPageContents`` to ``filterContents``
+
+ - ``pipePageContents`` to ``pipeContents``
+
+ - ``parsePageContents`` to ``parseContents``
+
+ - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
+ a map of form XObjects directly on a page or form XObject
+
+ - Add new helper methods to ``QPDFObjectHandle``:
+ ``isFormXObject``, ``isImage``
+
+ - Add the optional ``allow_streams`` parameter to
+ ``QPDFObjectHandle::makeDirect``. When
+ ``QPDFObjectHandle::makeDirect`` is called in this way, it
+ preserves references to streams rather than throwing an
+ exception.
+
+ - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
+ on a stream prevents ``QPDFWriter`` from attempting to
+ uncompress, recompress, or otherwise filter a stream even if it
+ could. Developers can use this to protect streams that are
+ optimized or that should be protected from ``QPDFWriter``'s default
+ behavior for any other reason.
+
+ - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
+ useful to have for debugging.
+
+ - Add method ``QPDFPageObjectHelper::flattenRotation``, which
+ replaces a page's ``/Rotate`` keyword by rotating the page
+ within the content stream and altering the page's bounding
+ boxes so the rendering is the same. This can be used to work
+ around buggy PDF readers that can't properly handle page
+ rotation.
+
+ - C API Enhancements
+
+ - Add several new functions to the C API for working with
+ objects. These are wrappers around many of the methods in
+ ``QPDFObjectHandle``. Their inclusion adds considerable new
+ capability to the C API.
+
+ - Add ``qpdf_register_progress_reporter`` to the C API,
+ corresponding to ``QPDFWriter::registerProgressReporter``.
+
+ - Performance Enhancements
+
+ - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
+ for writing, resulting in about an 8% improvement in write
+ performance while allowing indirect objects to appear in
+ ``/DecodeParms``.
+
+ - When extracting pages, the :command:`qpdf` CLI
+ only removes unreferenced resources from the pages that are
+ being kept, resulting in a significant performance improvement
+ when extracting small numbers of pages from large, complex
+ documents.
+
+ - Bug Fixes
+
+ - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
+ externalizing images referenced from form XObjects that
+ appeared on the page.
+
+ - ``QPDFObjectHandle::filterPageContents`` was broken for pages
+ with multiple content streams.
+
+ - Tweak zsh completion code to behave a little better with
+ respect to path completion.
+
+10.0.4: November 21, 2020
+ - Bug Fixes
+
+ - Fix a handful of integer overflows. This includes cases found
+ by fuzzing as well as having qpdf not do range checking on
+ unused values in the xref stream.
+
+10.0.3: October 31, 2020
+ - Bug Fixes
+
+ - The fix to the bug involving copying streams with indirect
+ filters was incorrect and introduced a new, more serious bug.
+ The original bug has been fixed correctly, as has the bug
+ introduced in 10.0.2.
+
+10.0.2: October 27, 2020
+ - Bug Fixes
+
+ - When concatenating content streams, as with
+ :samp:`--coalesce-contents`, there were cases
+ in which qpdf would merge two lexical tokens together, creating
+ invalid results. A newline is now inserted between merged
+ content streams if one is not already present.
+
+ - Fix an internal error that could occur when copying foreign
+ streams whose stream data had been replaced using a stream data
+ provider if those streams had indirect filters or decode
+ parameters. This is a rare corner case.
+
+ - Ensure that the caller's locale settings do not change the
+ results of numeric conversions performed internally by the qpdf
+ library. Note that the problem here could only be caused when
+ the qpdf library was used programmatically. Using the qpdf CLI
+ already ignored the user's locale for numeric conversion.
+
+ - Fix several instances in which warnings were not suppressed in
+ spite of :samp:`--no-warn` and/or errors or
+ warnings were written to standard output rather than standard
+ error.
+
+ - Fixed a memory leak that could occur under specific
+ circumstances when
+ :samp:`--object-streams=generate` was used.
+
+ - Fix various integer overflows and similar conditions found by
+ the OSS-Fuzz project.
+
+ - Enhancements
+
+ - New option :samp:`--warning-exit-0` causes qpdf
+ to exit with a status of ``0`` rather than ``3`` if there are
+ warnings but no errors. Combine with
+ :samp:`--no-warn` to completely ignore
+ warnings.
+
+ - Performance improvements have been made to
+ ``QPDF::processMemoryFile``.
+
+ - The OpenSSL crypto provider produces more detailed error
+ messages.
+
+ - Build Changes
+
+ - The option :samp:`--disable-rpath` is now
+ supported by qpdf's :command:`./configure`
+ script. Some distributions' packaging standards recommend the
+ use of this option.
+
+ - Selection of a printf format string for ``long long`` has
+ been moved from ``ifdefs`` to an autoconf
+ test. If you are using your own build system, you will need to
+ provide a value for ``LL_FMT`` in
+ :file:`libqpdf/qpdf/qpdf-config.h`, which
+ would typically be ``"%lld"`` or, for some Windows compilers,
+ ``"%I64d"``.
+
+ - Several improvements were made to build-time configuration of
+ the OpenSSL crypto provider.
+
+ - A nearly stand-alone Linux binary zip file is now included with
+ the qpdf release. This is built on an older (but supported)
+ Ubuntu LTS release, but would work on most reasonably recent
+ Linux distributions. It contains only the executables and
+ required shared libraries that would not be present on a
+ minimal system. It can be used for including qpdf in a minimal
+ environment, such as a docker container. The zip file is also
+ known to work as a layer in AWS Lambda.
+
+ - QPDF's automated build has been migrated from Azure Pipelines
+ to GitHub Actions.
+
+ - Windows-specific Changes
+
+ - The Windows executables distributed with qpdf releases now use
+ the OpenSSL crypto provider by default. The native crypto
+ provider is also compiled in and can be selected at runtime
+ with the ``QPDF_CRYPTO_PROVIDER`` environment variable.
+
+ - Improvements have been made to how a cryptographic provider is
+ obtained in the native Windows crypto implementation. However,
+ this is mostly shadowed by OpenSSL being used by default.
+
+10.0.1: April 9, 2020
+ - Bug Fixes
+
+ - 10.0.0 introduced a bug in which calling
+ ``QPDFObjectHandle::getStreamData`` on a stream that can't be
+ filtered was returning the raw data instead of throwing an
+ exception. This is now fixed.
+
+ - Fix a bug that was preventing qpdf from linking with some
+ versions of clang on some platforms.
+
+ - Enhancements
+
+ - Improve the :file:`pdf-invert-images`
+ example to avoid having to load all the images into RAM at the
+ same time.
+
+10.0.0: April 6, 2020
+ - Performance Enhancements
+
+ - The qpdf library and executable should run much faster in this
+ version than in the last several releases. Several internal
+ library optimizations have been made, and there has been
+ improved behavior on page splitting as well. This version of
+ qpdf should outperform any of the 8.x or 9.x versions.
+
+ - Incompatible API (source-level) Changes (minor)
+
+ - The ``QUtil::srandom`` method was removed. It didn't do
+ anything unless insecure random numbers were compiled in, and
+ they have been off by default for a long time. If you were
+ calling it, just remove the call since it wasn't doing anything
+ anyway.
+
+ - Build/Packaging Changes
+
+ - Add an ``openssl`` crypto provider, which is implemented with
+ OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
+ for this contribution. If you maintain qpdf for a distribution,
+ pay special attention to make sure that you are including
+ support for the crypto providers you want. Package maintainers
+ will have to weigh the advantages of allowing users to pick a
+ crypto provider at runtime against the disadvantages of adding
+ more dependencies to qpdf.
+
+ - Allow qpdf to be built on stripped down systems whose C/C++
+ libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
+ qpdf's README.md for details. This should be very rare, but it
+ is known to be helpful in some embedded environments.
+
+ - CLI Enhancements
+
+ - Add ``objectinfo`` key to the JSON output. This will be a place
+ to put computed metadata or other information about PDF objects
+ that are not immediately evident in other ways or that seem
+ useful for some other reason. In this version, information is
+ provided about each object indicating whether it is a stream
+ and, if so, what its length and filters are. Without this, it
+ was not possible to tell conclusively from the JSON output
+ alone whether or not an object was a stream. Run
+ :command:`qpdf --json-help` for details.
+
+ - Add new option
+ :samp:`--remove-unreferenced-resources` which
+ takes ``auto``, ``yes``, or ``no`` as arguments. The new
+ ``auto`` mode, which is the default, performs a fast heuristic
+ over a PDF file when splitting pages to determine whether the
+ expensive process of finding and removing unreferenced
+ resources is likely to be of benefit. For most files, this new
+ default will result in a significant performance improvement
+ for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed
+ discussion.
+
+ - The :samp:`--preserve-unreferenced-resources`
+ is now just a synonym for
+ :samp:`--remove-unreferenced-resources=no`.
+
+ - If the ``QPDF_EXECUTABLE`` environment variable is set when
+ invoking :command:`qpdf --bash-completion` or
+ :command:`qpdf --zsh-completion`, the completion
+ command that it outputs will refer to qpdf using the value of
+ that variable rather than what :command:`qpdf`
+ determines its executable path to be. This can be useful when
+ wrapping :command:`qpdf` with a script, working
+ with a version in the source tree, using an AppImage, or other
+ situations where there is some indirection.
+
+ - Library Enhancements
+
+ - Random number generation is now delegated to the crypto
+ provider. The old behavior is still used by the native crypto
+ provider. It is still possible to provide your own random
+ number generator.
+
+ - Add a new version of
+ ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
+ that accepts the ``suppress_warnings`` and ``will_retry``
+ options and allows a success code to be returned. This makes it
+ possible to implement a ``StreamDataProvider`` that calls
+ ``pipeStreamData`` on another stream and to pass the response
+ back to the caller, which enables better error handling on
+ those proxied streams.
+
+ - Update ``QPDFObjectHandle::pipeStreamData`` to return an
+ overall success code that goes beyond whether or not filtered
+ data was written successfully. This allows better error
+ handling of cases that were not filtering errors. You have to
+ call this explicitly. Methods in previously existing APIs have
+ the same semantics as before.
+
+ - The ``QPDFPageObjectHelper::placeFormXObject`` method now
+ allows separate control over whether it should be willing to
+ shrink or expand objects to fit them better into the
+ destination rectangle. The previous behavior was that shrinking
+ was allowed but expansion was not. The previous behavior is
+ still the default.
+
+ - When calling the C API, any non-zero value passed to a boolean
+ parameter is treated as ``TRUE``. Previously only the value
+ ``1`` was accepted. This makes the C API behave more like most
+ C interfaces and is known to improve compatibility with some
+ Windows environments that dynamically load the DLL and call
+ functions from it.
+
+ - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
+ top-level dictionary keys or array items. This is unsafe
+ because it creates a situation in which changing a lower-level
+ item in one object may also change it in another object, but
+ for cases in which you *know* you are only inserting or
+ replacing top-level items, it is much faster than
+ ``QPDFObjectHandle::shallowCopy``.
+
+ - Add ``QPDFObjectHandle::filterAsContents``, which filters a
+ stream's data as a content stream. This is useful for parsing
+ the contents for form XObjects in the same way as parsing page
+ content streams.
+
+ - Bug Fixes
+
+ - When detecting and removing unreferenced resources during page
+ splitting, traverse into form XObjects and handle their
+ resources dictionaries as well.
+
+ - The same error recovery is applied to streams in files other than the
+ primary input file when merging or splitting pages.
+
+9.1.1: January 26, 2020
+ - Build/Packaging Changes
+
+ - The fix-qdf program was converted from perl to C++. As such,
+ qpdf no longer has a runtime dependency on perl.
+
+ - Library Enhancements
+
+ - Added new helper routine ``QUtil::call_main_from_wmain`` which
+ converts ``wchar_t`` arguments to UTF-8 encoded strings. This
+ is useful for qpdf because library methods expect file names to
+ be UTF-8 encoded, even on Windows.
+
+ - Added new ``QUtil::read_lines_from_file`` methods that take
+ ``FILE*`` arguments and that allow preservation of end-of-line
+ characters. This also fixes a bug where
+ ``QUtil::read_lines_from_file`` wouldn't work properly with
+ Unicode filenames.
+
+ - CLI Enhancements
+
+ - Added options :samp:`--is-encrypted` and
+ :samp:`--requires-password` for testing whether
+ a file is encrypted or requires a password other than the
+ supplied (or empty) password. These communicate via exit
+ status, making them useful for shell scripts. They also work on
+ encrypted files with unknown passwords.
+
+ - Added ``encrypt`` key to JSON output. With the exception of
+ the reconstructed user password for older encryption formats,
+ this provides the same information as
+ :samp:`--show-encryption` but in a consistent,
+ parseable format. See output of :command:`qpdf
+ --json-help` for details.
+
+ - Bug Fixes
+
+ - In QDF mode, be sure not to write more than one XRef stream to
+ a file, even when
+ :samp:`--preserve-unreferenced` is used.
+ :command:`fix-qdf` assumes that there is only
+ one XRef stream, and that it appears at the end of the file.
+
+ - When externalizing inline images, properly handle images whose
+ color space is a reference to an object in the page's resource
+ dictionary.
+
+ - Windows-specific fix for acquiring crypt context with a new
+ keyset.
+
+9.1.0: November 17, 2019
+ - Build Changes
+
+ - A C++-11 compiler is now required to build qpdf.
+
+ - A new crypto provider that uses gnutls for crypto functions is
+ now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto
+ providers and :ref:`ref.crypto.build` for specific information about
+ the build.
+
+ - Library Enhancements
+
+ - Incorporate contribution from Masamichi Hosoda to properly
+ handle signature dictionaries by not including them in object
+ streams, formatting the ``Contents`` key as a hexadecimal
+ string, and excluding the ``/Contents`` key from encryption and
+ decryption.
+
+ - Incorporate contribution from Masamichi Hosoda to provide new
+ API calls for getting file-level information about input and
+ output files, enabling certain operations on the files at the
+ file level rather than the object level. New methods include
+ ``QPDF::getXRefTable()``,
+ ``QPDFObjectHandle::getParsedOffset()``,
+ ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
+ ``QPDFWriter::getWrittenXRefTable()``.
+
+ - Support build-time and runtime selectable crypto providers.
+ This includes the addition of new classes
+ ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
+ recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
+ variable. Crypto providers are described in depth in :ref:`ref.crypto`.
+
+ - CLI Enhancements
+
+ - Addition of the :samp:`--show-crypto` option in
+ support of selectable crypto providers, as described in :ref:`ref.crypto`.
+
+ - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
+ for specification of the even or odd pages from among the pages
+ specified in the range.
+
+ - Fix shell wildcard expansion behavior (``*`` and ``?``) of
+ :command:`qpdf.exe` as built by MSVC.
+
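+ The ``:even`` and ``:odd`` modifiers described above can be pictured
+ with the following sketch. It assumes the modifiers select by position
+ within the expanded range rather than by page number. The function
+ ``select_parity`` is a hypothetical helper written for illustration and
+ is not part of qpdf.
+
```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not a qpdf API: keep the pages that sit at even or
// odd positions (counting from 1) within an already expanded page range.
std::vector<int> select_parity(std::vector<int> const& pages, bool want_even)
{
    std::vector<int> result;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        // Position is 1-based, so index 0 is position 1 (odd).
        bool const even_position = ((i + 1) % 2 == 0);
        if (even_position == want_even) {
            result.push_back(pages[i]);
        }
    }
    return result;
}
```
+
+ Under this reading, a range that expands to pages 5, 7, 8, 9, 12 keeps
+ 5, 8, and 12 for the odd positions and 7 and 9 for the even positions.
+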
+9.0.2: October 12, 2019
+ - Bug Fix
+
+ - Fix the name of the temporary file used by
+ :samp:`--replace-input` so that it doesn't
+ require path splitting and works with paths that include
+ directories.
+
+9.0.1: September 20, 2019
+ - Bug Fixes/Enhancements
+
+ - Fix some build and test issues on big-endian systems and
+ compilers with characters that are unsigned by default. The
+ problems were in build and test only. There were no actual bugs
+ in the qpdf library itself relating to endianness or unsigned
+ characters.
+
+ - When a dictionary has a duplicated key, report this with a
+ warning. The behavior of the library in this case is unchanged,
+ but the error condition is no longer silently ignored.
+
+ - When a form field's display rectangle is erroneously specified
+ with inverted coordinates, detect and correct this situation.
+ This prevents some form fields from being flipped when flattening
+ annotations on files with this condition.
+
+9.0.0: August 31, 2019
+ - Incompatible API (source-level) Changes (minor)
+
+ - The method ``QUtil::strcasecmp`` has been renamed to
+ ``QUtil::str_compare_nocase``. This incompatible change is
+ necessary to enable qpdf to build on platforms that define
+ ``strcasecmp`` as a macro.
+
+ - The ``QPDF::copyForeignObject`` method had an overloaded
+ version that took a boolean parameter that was not used. If you
+ were using this version, just omit the extra parameter.
+
+ - There was a version of ``QPDFTokenizer::expectInlineImage`` that
+ took no arguments. This version has been removed since it
+ caused the tokenizer to return incorrect inline images. A new
+ version was added some time ago that produces correct output.
+ This is a very low level method that doesn't make sense to call
+ outside of qpdf's lexical engine. There are higher level
+ methods for tokenizing content streams.
+
+ - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
+ ``QPDFOutlineObjectHelper::getKids`` to return a
+ ``std::vector`` instead of a ``std::list`` of
+ ``QPDFOutlineObjectHelper`` objects.
+
+ - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
+ function would allow creation of name tokens whose value would
+ change when unparsed, which is never the correct behavior.
+
+ - CLI Enhancements
+
+ - The :samp:`--replace-input` option may be given
+ in place of an output file name. This causes qpdf to overwrite
+ the input file with the output. See the description of
+ :samp:`--replace-input` in :ref:`ref.basic-options` for more details.
+
+ - The :samp:`--recompress-flate` option instructs
+ :command:`qpdf` to recompress streams that are
+ already compressed with ``/FlateDecode``. Useful with
+ :samp:`--compression-level`.
+
+ - The
+ :samp:`--compression-level={level}`
+ option sets the zlib compression level used for any streams compressed
+ by ``/FlateDecode``. Most effective when combined with
+ :samp:`--recompress-flate`.
+
+ - Library Enhancements
+
+ - A new namespace ``QIntC``, provided by
+ :file:`qpdf/QIntC.hh`, provides safe
+ conversion methods between different integer types. These
+ conversion methods do range checking to ensure that the cast
+ can be performed with no loss of information. Every use of
+ ``static_cast`` in the library was inspected to see if it could
+ use one of these safe converters instead. See :ref:`ref.casting` for additional details.
+
+ - Method ``QPDF::anyWarnings`` tells whether there have been any
+ warnings without clearing the list of warnings.
+
+ - Method ``QPDF::closeInputSource`` closes or otherwise releases
+ the input source. This enables the input file to be deleted or
+ renamed.
+
+ - New methods have been added to ``QUtil`` for converting back
+ and forth between strings and unsigned integers:
+ ``uint_to_string``, ``uint_to_string_base``,
+ ``string_to_uint``, and ``string_to_ull``.
+
+ - New methods have been added to ``QPDFObjectHandle`` that return
+ the value of ``Integer`` objects as ``int`` or ``unsigned int``
+ with range checking and sensible fallback values, and a new
+ method was added to return an unsigned value. This makes it
+ easier to write code that is safe from unintentional data loss.
+ Functions: ``getUIntValue``, ``getIntValueAsInt``,
+ ``getUIntValueAsUInt``.
+
+ - When parsing content streams with
+ ``QPDFObjectHandle::ParserCallbacks``, in place of the method
+ ``handleObject(QPDFObjectHandle)``, the developer may override
+ ``handleObject(QPDFObjectHandle, size_t offset, size_t
+ length)``. If this method is defined, it will
+ be invoked with the object along with its offset and length
+ within the overall contents being parsed. Intervening spaces
+ and comments are not included in offset and length.
+ Additionally, a new method ``contentSize(size_t)`` may be
+ implemented. If present, it will be called prior to the first
+ call to ``handleObject`` with the total size in bytes of the
+ combined contents.
+
+ - New methods ``QPDF::userPasswordMatched`` and
+ ``QPDF::ownerPasswordMatched`` have been added to enable a
+ caller to determine whether the supplied password was the user
+ password, the owner password, or both. This information is also
+ displayed by :command:`qpdf --show-encryption`
+ and :command:`qpdf --check`.
+
+ - Static method ``Pl_Flate::setCompressionLevel`` can be called
+ to set the zlib compression level globally used by all
+ instances of ``Pl_Flate`` in deflate mode.
+
+ - The method ``QPDFWriter::setRecompressFlate`` can be called to
+ tell ``QPDFWriter`` to uncompress and recompress streams
+ already compressed with ``/FlateDecode``.
+
+ - The underlying implementation of QPDF arrays has been enhanced
+ to be much more memory efficient when dealing with arrays with
+ lots of nulls. This enables qpdf to use drastically less memory
+ for certain types of files.
+
+ - When traversing the pages tree, if nodes are encountered with
+ invalid types, the types are fixed, and a warning is issued.
+
+ - A new helper method ``QUtil::read_file_into_memory`` was added.
+
+ - All conditions previously reported by
+ ``QPDF::checkLinearization()`` as errors are now presented as
+ warnings.
+
+ - Name tokens containing the ``#`` character not preceded by two
+ hexadecimal digits, which is invalid in PDF 1.2 and above, are
+ properly handled by the library: a warning is generated, and
+ the name token is properly preserved, even if invalid, in the
+ output. See :file:`ChangeLog` for a more
+ complete description of this change.
+
+ - Bug Fixes
+
+ - A small handful of memory issues, assertion failures, and
+ unhandled exceptions that could occur on badly mangled input
+ files have been fixed. Most of these problems were found by
+ Google's OSS-Fuzz project.
+
+ - When :command:`qpdf --check` or
+ :command:`qpdf --check-linearization` encounters
+ a file with linearization warnings but not errors, it now
+ properly exits with exit code 3 instead of 2.
+
+ - The :samp:`--completion-bash` and
+ :samp:`--completion-zsh` options now work
+ properly when qpdf is invoked as an AppImage.
+
+ - Calling ``QPDFWriter::set*EncryptionParameters`` on a
+ ``QPDFWriter`` object whose output filename has not yet been
+ set no longer produces a segmentation fault.
+
+ - When reading encrypted files, follow the spec more closely
+ regarding encryption key length. This allows qpdf to open
+ encrypted files in most cases when they have invalid or missing
+ ``/Length`` keys in the encryption dictionary.
+
+ - Build Changes
+
+ - On platforms that support it, qpdf now builds with
+ :samp:`-fvisibility=hidden`. If you build qpdf
+ with your own build system, this is now safe to use. This
+ prevents methods that are not part of the public API from being
+ exported by the shared library, and makes qpdf's ELF shared
+ libraries (used on Linux, MacOS, and most other UNIX flavors)
+ behave more like the Windows DLL. Since the DLL already behaves
+ in much this way, it is unlikely that there are any methods
+ that were accidentally not exported. However, with ELF shared
+ libraries, typeinfo for some classes has to be explicitly
+ exported. If there are problems in dynamically linked code
+ catching exceptions or subclassing, this could be the reason.
+ If you see this, please report a bug at
+ https://github.com/qpdf/qpdf/issues/.
+
+ - QPDF is now compiled with integer conversion and sign
+ conversion warnings enabled. Numerous changes were made to the
+ library to make this safe.
+
+ - QPDF's :command:`make install` target explicitly
+ specifies the mode to use when installing files instead of
+ relying on the user's umask. It was previously doing this for some
+ files but not others.
+
+ - If :command:`pkg-config` is available, use it to
+ locate :file:`libjpeg` and
+ :file:`zlib` dependencies, falling back on
+ old behavior if unsuccessful.
+
+ - Other Notes
+
+ - QPDF has been fully integrated into `Google's OSS-Fuzz
+ project <https://github.com/google/oss-fuzz>`__. This project
+ exercises code with randomly mutated inputs and is great for
+ discovering hidden crashes and security issues.
+ Several bugs found by oss-fuzz have already been fixed in qpdf.
+
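+ The range-checked integer conversions mentioned under Library
+ Enhancements can be illustrated with a minimal sketch. The function
+ names ``to_int_checked`` and ``to_uint_checked`` are hypothetical and
+ are not the names used in :file:`qpdf/QIntC.hh`. The sketch only shows
+ the general idea of refusing a cast that would lose information.
+
```cpp
#include <limits>
#include <stdexcept>

// Hypothetical helpers, not the QIntC API: convert a long long to a
// narrower type, throwing instead of silently truncating.
int to_int_checked(long long value)
{
    if (value < std::numeric_limits<int>::min() ||
        value > std::numeric_limits<int>::max()) {
        throw std::range_error("value does not fit in int");
    }
    return static_cast<int>(value);
}

unsigned int to_uint_checked(long long value)
{
    // Negative values can never be represented as unsigned.
    if (value < 0 ||
        value > static_cast<long long>(
                    std::numeric_limits<unsigned int>::max())) {
        throw std::range_error("value does not fit in unsigned int");
    }
    return static_cast<unsigned int>(value);
}
```
+
+ A plain ``static_cast`` would accept out-of-range values and quietly
+ change them. The checked versions turn that case into an exception.
+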
+8.4.2: May 18, 2019
+ This release has just one change: correction of a buffer overrun in
+ the Windows code used to open files. Windows users should take this
+ update. There are no code changes that affect non-Windows releases.
+
+8.4.1: April 27, 2019
+ - Enhancements
+
+ - When :command:`qpdf --version` is run, it will
+ detect if the qpdf CLI was built with a different version of
+ qpdf than the library, which may indicate a problem with the
+ installation.
+
+ - New option :samp:`--remove-page-labels` will
+ remove page labels before generating output. This used to
+ happen if you ran :command:`qpdf --empty --pages ..
+ --`, but the behavior changed in qpdf 8.3.0. This
+ option enables people who were relying on the old behavior to
+ get it again.
+
+ - New option
+ :samp:`--keep-files-open-threshold={count}`
+ can be used to override the number of files that qpdf will use to
+ trigger the behavior of not keeping all files open when merging
+ files. This may be necessary if your system allows fewer than
+ the default value of 200 files to be open at the same time.
+
+ - Bug Fixes
+
+ - Handle Unicode characters in filenames on Windows. The changes
+ to support Unicode on the CLI in Windows broke Unicode
+ filenames for Windows.
+
+ - Slightly tighten logic that determines whether an object is a
+ page. This should resolve problems in some rare files where
+ some non-page objects were passing qpdf's test for whether
+ something was a page, thus causing them to be erroneously lost
+ during page splitting operations.
+
+ - Revert change that included preservation of outlines
+ (bookmarks) in :samp:`--split-pages`. The way
+ it was implemented in 8.3.0 and 8.4.0 caused a very significant
+ degradation of performance for splitting certain files. A
+ future release of qpdf may re-introduce the behavior in a more
+ performant and also more correct fashion.
+
+ - In JSON mode, add missing leading 0 to decimal values between
+ -1 and 1 even if not present in the input. The JSON
+ specification requires the leading 0. The PDF specification
+ does not.
+
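+ The leading-zero fix for JSON output can be sketched as follows. The
+ function ``add_leading_zero`` is a hypothetical illustration and not
+ qpdf's own code. It handles only the two forms described above, a
+ number starting with ``.`` or with ``-.``.
+
```cpp
#include <string>

// Hypothetical helper, not a qpdf API: PDF allows ".5" and "-.5", while
// JSON requires a digit before the decimal point.
std::string add_leading_zero(std::string const& pdf_number)
{
    if (pdf_number.compare(0, 1, ".") == 0) {
        return "0" + pdf_number;
    }
    if (pdf_number.compare(0, 2, "-.") == 0) {
        // Keep the sign, then insert the missing zero.
        return "-0" + pdf_number.substr(1);
    }
    return pdf_number;
}
```
+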
+8.4.0: February 1, 2019
+ - Command-line Enhancements
+
+ - *Non-compatible CLI change:* The qpdf command-line tool
+ interprets passwords given at the command-line differently from
+ previous releases when the passwords contain non-ASCII
+ characters. In some cases, the behavior differs from previous
+ releases. For a discussion of the current behavior, please see
+ :ref:`ref.unicode-passwords`. The
+ incompatibilities are as follows:
+
+ - On Windows, qpdf now receives all command-line options as
+ Unicode strings if it can figure out the appropriate
+ compile/link options. This is enabled at least for MSVC and
+ mingw builds. That means that if non-ASCII strings are
+ passed to the qpdf CLI in Windows, qpdf will now correctly
+ receive them. In the past, they would have either been
+ encoded as Windows code page 1252 (also known as "Windows
+ ANSI") or as something unintelligible. In almost all cases,
+ qpdf is able to properly interpret Unicode arguments now,
+ whereas in the past, it would almost never interpret them
+ properly. The result is that non-ASCII passwords given to
+ the qpdf CLI on Windows now have a much greater chance of
+ creating PDF files that can be opened by a variety of
+ readers. In the past, usually files encrypted from the
+ Windows CLI using non-ASCII passwords would not be readable
+ by most viewers. Note that the current version of qpdf is
+ able to decrypt files that it previously created using the
+ previously supplied password.
+
+ - The PDF specification requires passwords to be encoded as
+ UTF-8 for 256-bit encryption and with PDF Doc encoding for
+ 40-bit or 128-bit encryption. Older versions of qpdf left it
+ up to the user to provide passwords with the correct
+ encoding. The qpdf CLI now detects when a password is given
+ with UTF-8 encoding and automatically transcodes it to what
+ the PDF spec requires. While this is almost always the
+ correct behavior, it is possible to override the behavior if
+ there is some reason to do so. This is discussed in more
+ depth in :ref:`ref.unicode-passwords`.
+
+ - New options
+ :samp:`--externalize-inline-images`,
+ :samp:`--ii-min-bytes`, and
+ :samp:`--keep-inline-images` control qpdf's
+ handling of inline images and possible conversion of them to
+ regular images. By default,
+ :samp:`--optimize-images` now also applies to
+ inline images. These options are discussed in :ref:`ref.advanced-transformation`.
+
+ - Add options :samp:`--overlay` and
+ :samp:`--underlay` for overlaying or
+ underlaying pages of other files onto output pages. See
+ :ref:`ref.overlay-underlay` for
+ details.
+
+ - When opening an encrypted file with a password, if the
+ specified password doesn't work and the password contains any
+ non-ASCII characters, qpdf will try a number of alternative
+ passwords to try to compensate for possible character encoding
+ errors. This behavior can be suppressed with the
+ :samp:`--suppress-password-recovery` option.
+ See :ref:`ref.unicode-passwords` for a full
+ discussion.
+
+ - Add the :samp:`--password-mode` option to
+ fine-tune how qpdf interprets password arguments, especially
+ when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.
+
+ - In the :samp:`--pages` option, it is now
+ possible to copy the same page more than once from the same
+ file without using the previous workaround of specifying two
+ different paths to the same file.
+
+ - In the :samp:`--pages` option, allow use of "."
+ as a shortcut for the primary input file. That way, you can do
+ :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`
+ instead of having to repeat :file:`in.pdf`
+ in the command.
+
+ - When encrypting with 128-bit and 256-bit encryption, new
+ encryption options :samp:`--assemble`,
+ :samp:`--annotate`,
+ :samp:`--form`, and
+ :samp:`--modify-other` allow finer-grained control over
+ permissions. Before, the
+ :samp:`--modify` option only configured certain
+ predefined groups of permissions.
+
+ - Bug Fixes and Enhancements
+
+ - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
+ 8.3.0 had a bug that could cause page splitting and merging
+ operations to drop some font or image resources if the PDF
+ file's internal structure shared these resource lists across
+ pages and if some but not all of the pages in the output did
+ not reference all the fonts and images. Using the
+ :samp:`--preserve-unreferenced-resources`
+ option would work around the incorrect behavior. This bug was
+ the result of a typo in the code and a deficiency in the test
+ suite. The case that triggered the error was known, just not
+ handled properly. This case is now exercised in qpdf's test
+ suite and properly handled.
+
+ - When optimizing images, detect and refuse to optimize images
+ that can't be converted to JPEG because of bit depth or color
+ space.
+
+ - Linearization and page manipulation APIs now detect and recover
+ from files that have duplicate Page objects in the pages tree.
+
+ - When the older option
+ :samp:`--stream-data=compress` was used with object
+ streams, object streams and xref streams were not compressed.
+ This has been fixed.
+
+ - When the tokenizer returns inline image tokens, delimiters
+ following ``ID`` and ``EI`` operators are no longer excluded.
+ This makes it possible to reliably extract the actual image
+ data.
+
+ - Library Enhancements
+
+ - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
+ convert inline images to regular images.
+
+ - Add method ``QUtil::possible_repaired_encodings()`` to generate
+ a list of strings that represent other ways the given string
+ could have been encoded. This is the method the QPDF CLI uses
+ to generate the strings it tries when recovering incorrectly
+ encoded Unicode passwords.
+
+ - Add new versions of
+ ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
+ more granular setting of permissions bits. See
+ :file:`QPDFWriter.hh` for details.
+
+ - Add new versions of the transcoders from UTF-8 to single-byte
+ coding systems in ``QUtil`` that report success or failure
+ rather than just substituting a specified unknown character.
+
+ - Add method ``QUtil::analyze_encoding()`` to determine whether a
+ string has high-bit characters and whether it appears to be UTF-16 or
+ valid UTF-8 encoding.
+
+ - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
+ create a new page that is a "shallow copy" of a page. The
+ resulting object is an indirect object ready to be passed to
+ ``QPDFPageDocumentHelper::addPage()`` for either the original
+ ``QPDF`` object or a different one. This is what the
+ :command:`qpdf` command-line tool uses to copy
+ the same page multiple times from the same file during
+ splitting and merging operations.
+
+ - Add method ``QPDF::getUniqueId()``, which returns a unique
+ identifier for the given QPDF object. The identifier will be
+ unique across the life of the application. The returned value
+ can be safely used as a map key.
+
+ - Add method ``QPDF::setImmediateCopyFrom``. This further
+ enhances qpdf's ability to allow a ``QPDF`` object from which
+ objects are being copied to go out of scope before the
+ destination object is written. If you call this method on a
+ ``QPDF`` instance, objects copied *from* this instance will be
+ copied immediately instead of lazily. This option uses more
+ memory but allows the source object to go out of scope before
+ the destination object is written in all cases. See comments in
+ :file:`QPDF.hh` for details.
+
+ - Add method ``QPDFPageObjectHelper::getAttribute`` for
+ retrieving an attribute from the page dictionary taking
+ inheritance into consideration, and optionally making a copy if
+ your intention is to modify the attribute.
+
+ - Fix long-standing limitation of
+ ``QPDFPageObjectHelper::getPageImages`` so that it now properly
+ reports images from inherited resources dictionaries,
+ eliminating the need to call
+ ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
+ this case.
+
+ - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
+ finding an unused name in a resource dictionary.
+
+ - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
+ generating a form XObject equivalent to a page. The resulting
+ object can be used in the same file or copied to another file
+ with ``copyForeignObject``. This can be useful for implementing
+ underlay, overlay, n-up, thumbnails, or any other functionality
+ requiring replication of pages in other contexts.
+
+ - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
+ generating content stream text that places a given form XObject
+ on a page, centered and fit within a specified rectangle. This
+ method takes care of computing the proper transformation matrix
+ and may optionally compensate for rotation or scaling of the
+ destination page.
+
+ - Build Improvements
+
+ - Add new configure option
+ :samp:`--enable-avoid-windows-handle`, which
+ causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
+ defined. When defined, qpdf will avoid referencing the Windows
+ ``HANDLE`` type, which is disallowed with certain versions of
+ the Windows SDK.
+
+ - For Windows builds, attempt to determine what options, if any,
+ have to be passed to the compiler and linker to enable use of
+ ``wmain``. This causes the preprocessor symbol
+ ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
+ other compilers, you can define this symbol to cause ``wmain``
+ to be used. This is needed to allow the Windows
+ :command:`qpdf` command to receive Unicode
+ command-line options.
+
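+ The geometry behind placing a form XObject "centered and fit within a
+ specified rectangle" can be sketched as below. This is an illustration
+ only. The types ``Rect`` and ``Placement`` and the function
+ ``fit_centered`` are hypothetical and are not qpdf APIs. The sketch
+ ignores rotation and assumes both rectangles have non-zero width and
+ height.
+
```cpp
#include <algorithm>

// Hypothetical types for illustration; not part of qpdf.
struct Rect { double llx; double lly; double urx; double ury; };
struct Placement { double scale; double tx; double ty; };

// Compute a uniform scale and a translation that fit `box` inside
// `target` and center it there. The values correspond to a PDF
// transformation matrix of the form [scale 0 0 scale tx ty].
Placement fit_centered(Rect const& box, Rect const& target)
{
    double const box_w = box.urx - box.llx;
    double const box_h = box.ury - box.lly;
    double const target_w = target.urx - target.llx;
    double const target_h = target.ury - target.lly;
    // Use the smaller ratio so the whole box fits in both directions.
    double const scale = std::min(target_w / box_w, target_h / box_h);
    // Split the leftover space evenly on both sides.
    double const tx =
        target.llx + (target_w - box_w * scale) / 2.0 - box.llx * scale;
    double const ty =
        target.lly + (target_h - box_h * scale) / 2.0 - box.lly * scale;
    return Placement{scale, tx, ty};
}
```
+
+ For a 100 by 200 box placed in a 50 by 50 target at the origin, the
+ sketch yields a scale of 0.25 and a horizontal offset of 12.5.
+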
+8.3.0: January 7, 2019
+ - Command-line Enhancements
+
+ - Shell completion: you can now use :command:`eval $(qpdf
+ --completion-bash)` and :command:`eval $(qpdf
+ --completion-zsh)` to enable shell completion for
+ bash and zsh.
+
+ - Page numbers (also known as page labels) are now preserved when
+ merging and splitting files with the
+ :samp:`--pages` and
+ :samp:`--split-pages` options.
+
+ - Bookmarks are partially preserved when splitting pages with the
+ :samp:`--split-pages` option. Specifically, the
+ outlines dictionary and some supporting metadata are copied
+ into the split files. The result is that all bookmarks from the
+ original file appear, those that point to pages that are
+ preserved work, and those that point to pages that are not
+ preserved don't do anything. This is an interim step toward
+ proper support for bookmarks in splitting and merging
+ operations.
+
+ - Page collation: add new option
+ :samp:`--collate`. When specified, the
+ semantics of :samp:`--pages` change from
+ concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.
+
+ - Generation of information in JSON format, primarily to
+ facilitate use of qpdf from languages other than C++. Add new
+ options :samp:`--json`,
+ :samp:`--json-key`, and
+ :samp:`--json-object` to generate a JSON
+ representation of the PDF file. Run :command:`qpdf
+ --json-help` to get a description of the JSON
+ format. For more information, see :ref:`ref.json`.
+
+ - The :samp:`--generate-appearances` flag will
+ cause qpdf to generate appearances for form fields if the PDF
+ file indicates that form field appearances are out of date.
+ This can happen when PDF forms are filled in by a program that
+ doesn't know how to regenerate the appearances of the filled-in
+ fields.
+
+ - The :samp:`--flatten-annotations` flag can be
+ used to *flatten* annotations, including form fields.
+ Ordinarily, annotations are drawn separately from the page.
+ Flattening annotations is the process of combining their
+ appearances into the page's contents. You might want to do this
+ if you are going to rotate or combine pages using a tool that
+ doesn't understand annotations. You may also want to use
+ :samp:`--generate-appearances` when using this
+ flag since annotations for outdated form fields are not
+ flattened as that would cause loss of information.
+
+ - The :samp:`--optimize-images` flag tells qpdf
+ to recompress every image using DCT (JPEG) compression as
+ long as the image is not already compressed with lossy
+ compression and recompressing the image reduces its size. The
+ additional options :samp:`--oi-min-width`,
+ :samp:`--oi-min-height`, and
+ :samp:`--oi-min-area` prevent recompression of
+ images whose width, height, or pixel area (width × height) are
+ below a specified threshold.
+
+ - The :samp:`--show-object` option can now be
+ given as :samp:`--show-object=trailer` to show
+ the trailer dictionary.
+
+ - Bug Fixes and Enhancements
+
+ - QPDF now automatically detects and recovers from dangling
+ references. If a PDF file contained an indirect reference to a
+ non-existent object, which is valid, when adding a new object
+ to the file, it was possible for the new object to take the
+ object ID of the dangling reference, thereby causing the
+ dangling reference to point to the new object. This case is now
+ prevented.
+
+ - Fixes to form field setting code: strings are always written in
+ UTF-16 format, and checkboxes and radio buttons are handled
+ properly with respect to synchronization of values and
+ appearance states.
+
+ - The ``QPDF::checkLinearization()`` method no longer causes the program
+ to crash when it detects problems with linearization data.
+ Instead, it issues a normal warning or error.
+
+ - Ordinarily qpdf treats an argument of the form
+ :samp:`@file` to mean that command-line options
+ should be read from :file:`file`. Now, if
+ :file:`file` does not exist but
+ :file:`@file` does, qpdf will treat
+ :file:`@file` as a regular option. This
+ makes it possible to work more easily with PDF files whose
+ names happen to start with the ``@`` character.
+
+ - Library Enhancements
+
+ - Remove the restriction in most cases that the source QPDF
+ object used in a ``QPDF::copyForeignObject`` call has to stick
+ around until the destination QPDF is written. The exceptional
+ case is when the source stream gets its data using a
+ ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth
+ discussion, see comments around ``copyForeignObject`` in
+ :file:`QPDF.hh`.
+
+ - Add new method ``QPDFWriter::getFinalVersion()``, which returns
+ the PDF version that will ultimately be written to the final
+ file. See comments in :file:`QPDFWriter.hh`
+ for some restrictions on its use.
+
+ - Add several methods for transcoding strings to some of the
+ character sets used in PDF files: ``QUtil::utf8_to_ascii``,
+ ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
+ ``QUtil::utf8_to_utf16``. For the single-byte encodings that
+ support only limited character sets, these methods replace
+ unsupported characters with a specified substitute.
+
+ - Add new methods to ``QPDFAnnotationObjectHelper`` and
+ ``QPDFFormFieldObjectHelper`` for querying flags and
+ interpretation of different field types. Define constants in
+ :file:`qpdf/Constants.h` to help with
+ interpretation of flag values.
+
+ - Add new methods
+ ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
+ ``QPDFFormFieldObjectHelper::generateAppearance`` for
+ generating appearance streams. See discussion in
+ :file:`QPDFFormFieldObjectHelper.hh` for
+ limitations.
+
+ - Add two new helper functions for dealing with resource
+ dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
+ a list of all second-level keys, which correspond to the names
+ of resources, and ``QPDFObjectHandle::mergeResources()`` merges
+ two resources dictionaries as long as they have non-conflicting
+ keys. These methods are useful for certain types of objects
+ that resolve resources from multiple places, such as form
+ fields.
+
+ - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
+ and
+ ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
+ for handling low-level details of annotation flattening.
+
+ - Add new helper classes: ``QPDFOutlineDocumentHelper``,
+ ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
+ ``QPDFNameTreeObjectHelper``, and
+ ``QPDFNumberTreeObjectHelper``.
+
+ - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
+ representation of the object. Call ``serialize()`` on the
+ result to convert it to a string.
+
+ - Add a simple JSON serializer. This is not a complete or
+ general-purpose JSON library. It allows assembly and
+ serialization of JSON structures with some restrictions, which
+ are described in the header file. This is the serializer used
+ by qpdf's new JSON representation.
+
+ - Add new ``QPDFObjectHandle::Matrix`` class along with a few
+ convenience methods for dealing with six-element numerical
+ arrays as matrices.
+
+ - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
+ the object itself if it is an array, or an array containing the
+ object otherwise. This is a common construct in PDF. This
+ method prevents you from having to explicitly test whether
+ something is a single element or an array.
+
+ - Build Improvements
+
+ - It is no longer necessary to run
+ :command:`autogen.sh` to build from a pristine
+ checkout. Automatically generated files are now committed so
+ that it is possible to build on platforms without autoconf
+ directly from a clean checkout of the repository. The
+ :command:`configure` script detects if the files
+ are out of date when it also determines that the tools are
+ present to regenerate them.
+
+ - Pull requests and the master branch are now built automatically
+ in `Azure
+ Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
+ free for open source projects. The build includes Linux, macOS,
+ Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
+ build. Official qpdf releases are now built with Azure
+ Pipelines.
+
+ - Notes for Packagers
+
+ - A new section has been added to the documentation with notes
+ for packagers. Please see :ref:`ref.packaging`.
+
+ - The qpdf build detects out-of-date automatically generated files. If
+ your packaging system automatically refreshes libtool or
+ autoconf files, it could cause this check to fail. To avoid
+ this problem, pass
+ :samp:`--disable-check-autofiles` to
+ :command:`configure`.
+
+ - If you would like to have qpdf completion enabled
+ automatically, you can install completion files in the
+ distribution's default location. You can find sample completion
+ files to install in the :file:`completions`
+ directory.
+
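+ The difference between concatenation and collation introduced by
+ :samp:`--collate` can be sketched as follows. The sketch assumes that
+ collation means taking one page from each input in turn and continuing
+ with the remaining inputs once a shorter one runs out. The function
+ ``collate_pages`` is a hypothetical illustration, not qpdf's
+ implementation.
+
```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a qpdf API: interleave page lists round-robin
// instead of appending one list after another.
std::vector<std::string>
collate_pages(std::vector<std::vector<std::string>> const& groups)
{
    std::size_t longest = 0;
    for (auto const& group : groups) {
        longest = std::max(longest, group.size());
    }
    std::vector<std::string> out;
    for (std::size_t i = 0; i < longest; ++i) {
        for (auto const& group : groups) {
            // Skip inputs that have already run out of pages.
            if (i < group.size()) {
                out.push_back(group[i]);
            }
        }
    }
    return out;
}
```
+
+ With inputs ``a1 a2 a3`` and ``b1``, concatenation would give
+ ``a1 a2 a3 b1``, while this sketch of collation gives ``a1 b1 a2 a3``.
+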
+8.2.1: August 18, 2018
+ - Command-line Enhancements
+
+ - Add
+ :samp:`--keep-files-open={[yn]}`
+ to override default determination of whether to keep files open
+ when merging. Please see the discussion of
+ :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.
+
+8.2.0: August 16, 2018
+ - Command-line Enhancements
+
+ - Add :samp:`--no-warn` option to suppress
+ issuing warning messages. If there are any conditions that
+ would have caused warnings to be issued, the exit status is
+ still 3.
+
+ - Bug Fixes and Optimizations
+
+ - Performance fix: optimize page merging operation to avoid
+ unnecessary open/close calls on files being merged. This solves
+ a dramatic slow-down that was observed when merging certain
+ types of files.
+
+ - Optimize how memory was used for the TIFF predictor,
+ drastically improving performance and memory usage for files
+ containing high-resolution images compressed with Flate using
+ the TIFF predictor.
+
+ - Bug fix: end of line characters were not properly handled
+ inside strings in some cases.
+
+ - Bug fix: using :samp:`--progress` on very small
+ files could cause an infinite loop.
+
+ - API Enhancements
+
+ - Add new class ``QPDFSystemError``, derived from
+ ``std::runtime_error``, which is now thrown by
+ ``QUtil::throw_system_error``. This enables the triggering
+ ``errno`` value to be retrieved.
+
+ - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
+ ``ClosedFileInputSource`` to stay open during manually
+ indicated periods of high activity, thus reducing the overhead
+ of frequent open/close operations.
+
+ - Build Changes
+
+ - For the mingw builds, change the name of the DLL import library
+ from :file:`libqpdf.a` to
+ :file:`libqpdf.dll.a` to more accurately
+ reflect that it is an import library rather than a static
+ library. This potentially clears the way for supporting a
+ static library in the future, though presently, the qpdf
+ Windows build only builds the DLL and executables.
+
+8.1.0: June 23, 2018
+ - Usability Improvements
+
+ - When splitting files, qpdf detects fonts and images that the
+ document metadata claims are referenced from a page but are not
+ actually referenced and omits them from the output file. This
+ change can cause a significant reduction in the size of split
+ PDF files for files created by some software packages. In some
+ cases, it can also make page splitting slower. Prior versions
+ of qpdf would believe the document metadata and sometimes
+ include all the images from all the other pages even though the
+ pages were no longer present. In the unlikely event that the
+ old behavior should be desired, or if you have a case where
+ page splitting is very slow, the old behavior (and speed) can
+ be enabled by specifying
+ :samp:`--preserve-unreferenced-resources`. For
+ additional details, please see :ref:`ref.advanced-transformation`.
+
+ - When merging multiple PDF files, qpdf no longer leaves all the
+ files open. This makes it possible to merge numbers of files
+ that may exceed the operating system's limit for the maximum
+ number of open files.
+
+ - The :samp:`--rotate` option's syntax has been
+ extended to make the page range optional. If you specify
+ :samp:`--rotate={angle}`
+ without specifying a page range, the rotation will be applied
+ to all pages. This can be especially useful for adjusting a PDF
+ created from a multi-page document that was scanned upside
+ down.
+
+ - When merging multiple files, the
+ :samp:`--verbose` option now prints information
+ about each file as it operates on that file.
+
+ - When the :samp:`--progress` option is
+ specified, qpdf will print a running indicator of its best
+ guess at how far through the writing process it is. Note that,
+ as with all progress meters, it's an approximation. This option
+ is implemented in a way that makes it useful for software that
+ uses the qpdf library; see API Enhancements below.
+
+ - Bug Fixes
+
+ - Properly decrypt files that use revision 3 of the standard
+ security handler but use 40-bit keys (even though revision 3
+ supports 128-bit keys).
+
+ - Limit depth of nested data structures to prevent crashes from
+ certain types of malformed (malicious) PDFs.
+
+ - In "newline before endstream" mode, insert the required extra
+ newline before the ``endstream`` at the end of object streams.
+ This one case was previously omitted.
+
+ - API Enhancements
+
+ - The first round of higher level "helper" interfaces has been
+ introduced. These are designed to provide a more convenient way
+ of interacting with certain document features than using
+ ``QPDFObjectHandle`` directly. For details on helpers, see
+ :ref:`ref.helper-classes`. Specific additional
+ interfaces are described below.
+
+ - Add two new document helper classes: ``QPDFPageDocumentHelper``
+ for working with pages, and ``QPDFAcroFormDocumentHelper`` for
+ working with interactive forms. No old methods have been
+ removed, but ``QPDFPageDocumentHelper`` is now the preferred
+ way to perform operations on pages rather than calling the old
+ methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
+ in the header files direct you to the new interfaces. Please
+ see the header files and :file:`ChangeLog`
+ for additional details.
+
+ - Add three new object helper classes: ``QPDFPageObjectHelper`` for
+ pages, ``QPDFFormFieldObjectHelper`` for interactive form
+ fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
+ three classes are fairly sparse at the moment, but they have
+ some useful, basic functionality.
+
+ - A new example program
+ :file:`examples/pdf-set-form-values.cc` has
+ been added that illustrates use of the new document and object
+ helpers.
+
+ - The method ``QPDFWriter::registerProgressReporter`` has been
+ added. This method allows you to register a function that
+ ``QPDFWriter`` calls to report its estimate of the percentage
+ of output it has written so far. Client programs can
+ use this to implement reasonably accurate progress meters. The
+ :command:`qpdf` command line tool uses this to
+ implement its :samp:`--progress` option.
+
+ - New methods ``QPDFObjectHandle::newUnicodeString`` and
+ ``QPDFObject::unparseBinary`` have been added to allow for more
+ convenient creation of strings that are explicitly encoded
+ using big-endian UTF-16. This is useful for creating strings
+ that appear outside of content streams, such as labels, form
+ fields, outlines, document metadata, etc.
+
+ - A new class ``QPDFObjectHandle::Rectangle`` has been added to
+ ease working with PDF rectangles, which are just arrays of four
+ numeric values.
+
+8.0.2: March 6, 2018
+ - When a loop is detected while following cross reference streams or
+ tables, treat this as damage instead of silently ignoring the
+ previous table. This prevents loss of otherwise recoverable data
+ in some damaged files.
+
+ - Properly handle pages with no contents.
+
+8.0.1: March 4, 2018
+ - Disregard data check errors when uncompressing ``/FlateDecode``
+ streams. This is consistent with most other PDF readers and allows
+ qpdf to recover data from another class of malformed PDF files.
+
+ - On the command line when specifying page ranges, support preceding
+ a page number by "r" to indicate that it should be counted from
+ the end. For example, the range ``r3-r1`` would indicate the last
+ three pages of a document.
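+
+ As a rough illustration of the ``r`` prefix described for 8.0.1, the
+ following Python sketch resolves page tokens against a known page
+ count. It is not qpdf's parser: the names ``resolve_page`` and
+ ``expand_range`` are invented for this example, and the sketch
+ handles only a single page or a single ``first-last`` pair.
+

```python
def resolve_page(token, page_count):
    """Turn one page token into a 1-based page number.

    A leading "r" counts from the end: r1 is the last page,
    r2 is the page before it, and so on.
    """
    if token.startswith("r"):
        page = page_count + 1 - int(token[1:])
    else:
        page = int(token)
    if not 1 <= page <= page_count:
        raise ValueError(f"page {token!r} is outside 1..{page_count}")
    return page


def expand_range(spec, page_count):
    """Expand a 'first-last' range (or a single page) into page numbers."""
    first, _, last = spec.partition("-")
    start = resolve_page(first, page_count)
    end = resolve_page(last, page_count) if last else start
    # Assumption for this sketch: a range whose end precedes its start
    # is walked backwards.
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


print(expand_range("r3-r1", 10))  # [8, 9, 10]
```

+
+ With a 10-page document, ``r3-r1`` resolves to pages 8, 9, and 10,
+ which matches the "last three pages" reading given above.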
+
+8.0.0: February 25, 2018
+ - Packaging and Distribution Changes
+
+ - QPDF is now distributed as an
+ `AppImage <https://appimage.org/>`__ in addition to all the
+ other ways it is distributed. The AppImage can be found in the
+ download area with the other packages. Thanks to Kurt Pfeifle
+ and Simon Peter for their contributions.
+
+ - Bug Fixes
+
+ - ``QPDFObjectHandle::getUTF8Val`` now properly treats
+ non-Unicode strings as encoded with PDF Doc Encoding.
+
+ - Improvements to handling of objects in PDF files that are not
+ of the expected type. In most cases, qpdf will be able to warn
+ for such cases rather than fail with an exception. Previous
+ versions of qpdf would sometimes fail with errors such as
+ "operation for dictionary object attempted on object of wrong
+ type". This situation should be mostly or entirely eliminated
+ now.
+
+ - Enhancements to the :command:`qpdf` Command-line
+ Tool. All new options listed here are documented in more detail in
+ :ref:`ref.using`.
+
+ - The option
+ :samp:`--linearize-pass1={file}`
+ has been added for debugging qpdf's linearization code.
+
+ - The option :samp:`--coalesce-contents` can be
+ used to combine content streams of a page whose contents are an
+ array of streams into a single stream.
+
+ - API Enhancements. All new API calls are documented in their
+ respective classes' header files. There are no non-compatible
+ changes to the API.
+
+ - Add function ``qpdf_check_pdf`` to the C API. This function
+ does basic checking that is a subset of what :command:`qpdf
+ --check` performs.
+
+ - Major enhancements to the lexical layer of qpdf. For a complete
+ list of enhancements, please refer to the
+ :file:`ChangeLog` file. Most of the changes
+ result in improvements to qpdf's ability to handle erroneous
+ files. It is also possible for programs to handle whitespace,
+ comments, and inline images as tokens.
+
+ - New API for working with PDF content streams at a lexical
+ level. The new class ``QPDFObjectHandle::TokenFilter`` allows
+ the developer to provide token handlers. Token filters can be
+ used with several different methods in ``QPDFObjectHandle`` as
+ well as with a lower-level interface. See comments in
+ :file:`QPDFObjectHandle.hh` as well as the
+ new examples
+ :file:`examples/pdf-filter-tokens.cc` and
+ :file:`examples/pdf-count-strings.cc` for
+ details.
+
+7.1.1: February 4, 2018
+ - Bug fix: files whose /ID fields were other than 16 bytes long can
+ now be properly linearized.
+
+ - A few compile and link issues have been corrected for some
+ platforms.
+
+7.1.0: January 14, 2018
+ - PDF files contain streams that may be compressed with various
+ compression algorithms which, in some cases, may be enhanced by
+ various predictor functions. Previously only the PNG up predictor
+ was supported. In this version, all the PNG predictors as well as
+ the TIFF predictor are supported. This increases the range of
+ files that qpdf is able to handle.
+
+ - QPDF now allows a raw encryption key to be specified in place of a
+ password when opening encrypted files, and will optionally display
+ the encryption key used by a file. This is a non-standard
+ operation, but it can be useful in certain situations. Please see
+ the discussion of :samp:`--password-is-hex-key` in
+ :ref:`ref.basic-options` or the comments around
+ ``QPDF::setPasswordIsHexKey`` in
+ :file:`QPDF.hh` for additional details.
+
+ - Bug fix: numbers ending with a trailing decimal point are now
+ properly recognized as numbers.
+
+ - Bug fix: when building qpdf from source on some platforms
+ (especially MacOS), the build could get confused by older versions
+ of qpdf installed on the system. This has been corrected.
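+
+ For readers unfamiliar with the predictors mentioned in the 7.1.0
+ notes, the PNG Up filter (filter type 2 in the PNG specification)
+ stores each byte as the difference from the byte directly above it
+ in the previous row. The Python sketch below undoes that filter. It
+ is an illustration only, not qpdf's implementation: the name
+ ``decode_png_up`` is invented, and the sketch assumes every row is
+ ``row_bytes`` long and is preceded by a filter-type byte of 2.
+

```python
def decode_png_up(data, row_bytes):
    """Undo PNG filter type 2 (Up) on a sequence of encoded rows.

    Each encoded row is one filter-type byte followed by row_bytes
    data bytes.
    """
    stride = row_bytes + 1
    if len(data) % stride != 0:
        raise ValueError("data is not a whole number of rows")
    previous = bytes(row_bytes)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data), stride):
        filter_type = data[start]
        if filter_type != 2:
            raise ValueError(f"sketch handles only filter type 2, got {filter_type}")
        encoded = data[start + 1:start + stride]
        # Reconstructed byte = encoded byte + byte above, modulo 256.
        row = bytes((e + p) & 0xFF for e, p in zip(encoded, previous))
        out += row
        previous = row
    return bytes(out)


encoded_rows = bytes([2, 1, 2, 3, 2, 1, 1, 1])
print(list(decode_png_up(encoded_rows, 3)))  # [1, 2, 3, 2, 3, 4]
```

+
+ The other PNG filter types (Sub, Average, Paeth) and the TIFF
+ predictor follow the same row-by-row pattern with different
+ reference bytes; they are omitted here to keep the sketch short.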
+
+7.0.0: September 15, 2017
+ - Packaging and Distribution Changes
+
+ - QPDF's primary license is now `version 2.0 of the Apache
+ License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
+ than version 2.0 of the Artistic License. You may still, at
+ your option, consider qpdf to be licensed with version 2.0 of
+ the Artistic License.
+
+ - QPDF no longer has a dependency on the PCRE (Perl-Compatible
+ Regular Expression) library. QPDF now has an added dependency
+ on the JPEG library.
+
+ - Bug Fixes
+
+ - This release contains many bug fixes for various infinite
+ loops, memory leaks, and other memory errors that could be
+ encountered with specially crafted or otherwise erroneous PDF
+ files.
+
+ - New Features
+
+ - QPDF now supports reading and writing streams encoded with JPEG
+ or RunLength encoding. Library API enhancements and
+ command-line options have been added to control this behavior.
+ See command-line options
+ :samp:`--compress-streams` and
+ :samp:`--decode-level` and methods
+ ``QPDFWriter::setCompressStreams`` and
+ ``QPDFWriter::setDecodeLevel``.
+
+ - QPDF is much better at recovering from broken files. In most
+ cases, qpdf will skip invalid objects and will preserve broken
+ stream data by not attempting to filter broken streams. QPDF is
+ now able to recover or at least not crash on dozens of broken
+ test files I have received over the past few years.
+
+ - Page rotation is now supported and accessible from both the
+ library and the command line.
+
+ - ``QPDFWriter`` supports writing files in a way that preserves
+ PCLm compliance in support of driverless printing. This is very
+ specialized and is only useful to applications that already
+ know how to create PCLm files.
+
+ - Enhancements to the :command:`qpdf` Command-line
+ Tool. All new options listed here are documented in more detail in
+ :ref:`ref.using`.
+
+ - Command-line arguments can now be read from files or standard
+ input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.
+
+ - :samp:`--rotate`: request page rotation
+
+ - :samp:`--newline-before-endstream`: ensure that
+ a newline appears before every ``endstream`` keyword in the
+ file; used to prevent qpdf from breaking PDF/A compliance on
+ already compliant files.
+
+ - :samp:`--preserve-unreferenced`: preserve
+ unreferenced objects in the input PDF
+
+ - :samp:`--split-pages`: break output into chunks
+ with fixed numbers of pages
+
+ - :samp:`--verbose`: print the name of each
+ output file that is created
+
+ - :samp:`--compress-streams` and
+ :samp:`--decode-level` replace
+ :samp:`--stream-data` for improving granularity
+ of controlling compression and decompression of stream data.
+ The :samp:`--stream-data` option will remain
+ available.
+
+ - When running :command:`qpdf --check` with other
+ options, checks are always run first. This enables qpdf to
+ perform its full recovery logic before outputting other
+ information. This can be especially useful when manually
+ recovering broken files, looking at qpdf's regenerated cross
+ reference table, or other similar operations.
+
+ - Process :samp:`--pages` earlier so that other
+ options like :samp:`--show-pages` or
+ :samp:`--split-pages` can operate on the file
+ after page splitting/merging has occurred.
+
+ - API Changes. All new API calls are documented in their respective
+ classes' header files.
+
+ - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
+ object
+
+ - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
+ appear before ``endstream``
+
+ - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
+ unreferenced objects that appear in the input PDF. The default
+ behavior is to discard them.
+
+ - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
+ available for developers who wish to produce or consume
+ RunLength or DCT stream data directly. The
+ :file:`examples/pdf-create.cc` example
+ illustrates their use.
+
+ - ``QPDFWriter::setCompressStreams`` and
+ ``QPDFWriter::setDecodeLevel`` methods control handling of
+ different types of stream compression.
+
+ - Add new C API functions ``qpdf_set_compress_streams``,
+ ``qpdf_set_decode_level``,
+ ``qpdf_set_preserve_unreferenced_objects``, and
+ ``qpdf_set_newline_before_endstream`` corresponding to the new
+ ``QPDFWriter`` methods.
+
+6.0.0: November 10, 2015
+ - Implement :samp:`--deterministic-id` command-line
+ option and ``QPDFWriter::setDeterministicID`` as well as C API
+ function ``qpdf_set_deterministic_ID`` for generating a
+ deterministic ID for non-encrypted files. When this option is
+ selected, the ID of the file depends on the contents of the output
+ file, and not on transient items such as the timestamp or output
+ file name.
+
+ - Make qpdf more tolerant of files whose xref table entries are not
+ the correct length.
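+
+ The idea behind a deterministic ID can be sketched in a few lines
+ of Python. This is a conceptual illustration, not qpdf's algorithm:
+ the function names are invented, and SHA-256 truncated to 16 bytes
+ is an assumption standing in for whatever digest qpdf actually
+ uses. The point is only that an ID computed from the output bytes
+ is reproducible, while one seeded from the clock and file name is
+ not.
+

```python
import hashlib
import time


def content_based_id(output_bytes):
    """ID derived only from the bytes that will be written."""
    # Assumption: a truncated SHA-256 digest stands in for the real digest.
    return hashlib.sha256(output_bytes).digest()[:16].hex()


def transient_id(output_name):
    """ID derived from transient items: the clock and the output name."""
    seed = f"{time.time_ns()}:{output_name}".encode()
    return hashlib.sha256(seed).digest()[:16].hex()


data = b"%PDF-1.7 example bytes"
# Same content always yields the same ID, regardless of when or
# where the file is written.
print(content_based_id(data) == content_based_id(data))  # True
```

+
+ Reproducible IDs of this kind are what make byte-for-byte
+ comparison of two runs over the same input practical.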
+
+5.1.3: May 24, 2015
+ - Bug fix: fix-qdf was not properly handling files that contained
+ object streams with more than 255 objects in them.
+
+ - Bug fix: qpdf was not properly initializing Microsoft's secure
+ crypto provider on fresh Windows installations that had not had
+ any keys created yet.
+
+ - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
+ the Google Security Team. Please see the ChangeLog for details.
+
+ - Properly handle pages that have no contents at all. There were
+ many cases in which qpdf handled this fine, but a few methods
+ blindly obtained page contents without handling the possibility that
+ there were no contents.
+
+ - Make qpdf more robust for a few more kinds of problems that may
+ occur in invalid PDF files.
+
+5.1.2: June 7, 2014
+ - Bug fix: linearizing files could create a corrupted output file
+ under extremely unlikely file size circumstances. See ChangeLog
+ for details. The odds of getting hit by this are very low, though
+ one person did.
+
+ - Bug fix: qpdf would fail to write files that had streams with
+ decode parameters referencing other streams.
+
+ - New example program: :command:`pdf-split-pages`:
+ efficiently split PDF files into individual pages. The example
+ program does this more efficiently than using :command:`qpdf
+ --pages` to do it.
+
+ - Packaging fix: Visual C++ binaries did not support Windows XP.
+ This has been rectified by updating the compilers used to generate
+ the release binaries.
+
+5.1.1: January 14, 2014
+ - Performance fix: copying foreign objects could be very slow with
+ certain types of files. This was most likely to be visible during
+ page splitting and was due to traversing the same objects multiple
+ times in some cases.
+
+5.1.0: December 17, 2013
+ - Added runtime option (``QUtil::setRandomDataProvider``) to supply
+ your own random data provider. You can use this if you want to
+ avoid using the OS-provided secure random number generation
+ facility or stdlib's less secure version. See comments in
+ include/qpdf/QUtil.hh for details.
+
+ - Fixed image comparison tests to not create 12-bit-per-pixel images
+ since some versions of tiffcmp have bugs in comparing them in some
+ cases. This increases the disk space required by the image
+ comparison tests, which are off by default anyway.
+
+ - Introduce a number of small fixes for compilation on the latest
+ clang in MacOS and the latest Visual C++ in Windows.
+
+ - Be able to handle broken files that end the xref table header with
+ a space instead of a newline.
+
+5.0.1: October 18, 2013
+ - Thanks to a detailed review by Florian Weimer and the Red Hat
+ Product Security Team, this release includes a number of
+ non-user-visible security hardening changes. Please see the
+ ChangeLog file in the source distribution for the complete list.
+
+ - When available, operating system-specific secure random number
+ generation is used for generating initialization vectors and other
+ random values used during encryption or file creation. For the
+ Windows build, this results in an added dependency on Microsoft's
+ cryptography API. To disable the OS-specific cryptography and use
+ the old version, pass the
+ :samp:`--enable-insecure-random` option to
+ :command:`./configure`.
+
+ - The :command:`qpdf` command-line tool now issues a
+ warning when :samp:`--accessibility=n` is specified
+ for newer encryption versions stating that the option is ignored.
+ qpdf, per the spec, has always ignored this flag, but it
+ previously did so silently. This warning is issued only by the
+ command-line tool, not by the library. The library's handling of
+ this flag is unchanged.
+
+5.0.0: July 10, 2013
+ - Bug fix: previous versions of qpdf would lose objects with
+ generation != 0 when generating object streams. Fixing this
+ required changes to the public API.
+
+ - Removed methods from public API that were only supposed to be
+ called by QPDFWriter and couldn't realistically be called anywhere
+ else. See ChangeLog for details.
+
+ - New ``QPDFObjGen`` class added to represent an object
+ ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
+ preferred over ``QPDFObjectHandle::getObjectID()`` and
+ ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
+ for people to accidentally write code that ignores the generation
+ number. See :file:`QPDF.hh` and
+ :file:`QPDFObjectHandle.hh` for additional
+ notes.
+
+ - Add :samp:`--show-npages` command-line option to
+ the :command:`qpdf` command to show the number of
+ pages in a file.
+
+ - Allow omission of the page range within
+ :samp:`--pages` for the
+ :command:`qpdf` command. When omitted, the page
+ range is implicitly taken to be all the pages in the file.
+
+ - Various enhancements were made to support different types of
+ broken files or broken readers. Details can be found in
+ :file:`ChangeLog`.
+
+4.1.0: April 14, 2013
+ - Note to people including qpdf in distributions: the
+ :file:`.la` files generated by libtool are now
+ installed by qpdf's :command:`make install` target.
+ Before, they were not installed. This means that if your
+ distribution does not want to include
+ :file:`.la` files, you must remove them as
+ part of your packaging process.
+
+ - Major enhancement: API enhancements have been made to support
+ parsing of content streams. This enhancement includes the
+ following changes:
+
+ - ``QPDFObjectHandle::parseContentStream`` method parses objects
+ in a content stream and calls handlers in a callback class. The
+ example
+ :file:`examples/pdf-parse-content.cc`
+ illustrates how this may be used.
+
+ - ``QPDFObjectHandle`` can now represent operators and inline
+ images, object types that may only appear in content streams.
+
+ - Method ``QPDFObjectHandle::getTypeCode()`` returns an
+ enumerated type value representing the underlying object type.
+ Method ``QPDFObjectHandle::getTypeName()`` returns a text
+ string describing the name of the type of a
+ ``QPDFObjectHandle`` object. These methods can be used for more
+ efficient parsing and debugging/diagnostic messages.
+
+ - :command:`qpdf --check` now parses all pages'
+ content streams in addition to doing other checks. While there are
+ still many types of errors that cannot be detected, syntactic
+ errors in content streams will now be reported.
+
+ - Minor compilation enhancements have been made to facilitate
+ support for a broader range of compilers and compiler
+ versions.
+
+ - Warning flags have been moved into a separate variable in
+ :file:`autoconf.mk`.
+
+ - The configure flag :samp:`--enable-werror` works
+ for Microsoft compilers.
+
+ - All MSVC CRT security warnings have been resolved.
+
+ - All C-style casts in C++ code have been replaced by C++ casts,
+ and many casts that had been included to suppress higher
+ warning levels for some compilers have been removed, primarily
+ for clarity. Places where integer type coercion occurs have
+ been scrutinized. A new casting policy has been documented in
+ the manual. This is of concern mainly to people porting qpdf to
+ new platforms or compilers. It is not visible to programmers
+ writing code that uses the library.
+
+ - Some internal limits have been removed in code that converts
+ numbers to strings. This is largely invisible to users, but it
+ does trigger a bug in some older versions of mingw-w64's C++
+ library. See :file:`README-windows.md` in
+ the source distribution if you think this may affect you. The
+ copy of the DLL distributed with qpdf's binary distribution is
+ not affected by this problem.
+
+ - The RPM spec file previously included with qpdf has been removed.
+ This is because virtually all Linux distributions include qpdf now
+ that it is a dependency of CUPS filters.
+
+ - A few bug fixes are included:
+
+ - Overridden compressed objects are properly handled. Before,
+ there were certain constructs that could cause qpdf to see old
+ versions of some objects. The most common manifestation of this
+ was loss of filled-in form values for certain files.
+
+ - Installation no longer uses GNU/Linux-specific versions of some
+ commands, so :command:`make install` works on
+ Solaris with native tools.
+
+ - The 64-bit mingw Windows binary package no longer includes a
+ 32-bit DLL.
+
+4.0.1: January 17, 2013
+ - Fix detection of binary attachments in test suite to avoid false
+ test failures on some platforms.
+
+ - Add clarifying comment in :file:`QPDF.hh` to
+ methods that return the user password explaining that it is no
+ longer possible with newer encryption formats to recover the user
+ password knowing the owner password. In earlier encryption
+ formats, the user password was encrypted in the file using the
+ owner password. In newer encryption formats, a separate encryption
+ key is used on the file, and that key is independently encrypted
+ using both the user password and the owner password.
+
+4.0.0: December 31, 2012
+ - Major enhancement: support has been added for newer encryption
+ schemes supported by version X of Adobe Acrobat. This includes use
+ of 127-character passwords, 256-bit encryption keys, and the
+ encryption scheme specified in ISO 32000-2, the PDF 2.0
+ specification. This scheme can be chosen from the command line by
+ specifying use of 256-bit keys. qpdf also supports the deprecated
+ encryption method used by Acrobat IX. This encryption style has
+ known security weaknesses and should not be used in practice.
+ However, such files exist "in the wild," so support for this
+ scheme is still useful. New methods
+ ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
+ and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
+ scheme) have been added to enable these new encryption schemes.
+ Corresponding functions have been added to the C API as well.
+
+ - Full support for Adobe extension levels in PDF version
+ information. Starting with PDF version 1.7, corresponding to ISO
+ 32000, Adobe adds new functionality by increasing the extension
+ level rather than increasing the version. This support includes
+ addition of the ``QPDF::getExtensionLevel`` method for retrieving
+ the document's extension level, addition of versions of
+ ``QPDFWriter::setMinimumPDFVersion`` and
+ ``QPDFWriter::forcePDFVersion`` that accept an extension level,
+ and extended syntax for specifying forced and minimum versions on
+ the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions
+ have been added to the C API as well.
+
+ - Minor fixes to prevent qpdf from referencing objects in the file
+ that are not referenced in the file's overall structure. Most
+ files don't have any such objects, but some files contain
+ unreferenced objects with errors, so these fixes prevent qpdf from
+ needlessly rejecting or complaining about such objects.
+
+ - Add new generalized methods for reading and writing files from/to
+ programmer-defined sources. The method
+ ``QPDF::processInputSource`` allows the programmer to use any
+ input source for the input file, and
+ ``QPDFWriter::setOutputPipeline`` allows the programmer to write
+ the output file through any pipeline. These methods would make it
+ possible to perform any number of specialized operations, such as
+ accessing external storage systems, creating bindings for qpdf in
+ other programming languages that have their own I/O systems, etc.
+
+ - Add new method ``QPDF::getEncryptionKey`` for retrieving the
+ underlying encryption key used in the file.
+
+ - This release includes a small handful of non-compatible API
+ changes. While effort is made to avoid such changes, all the
+ non-compatible API changes in this version were to parts of the
+ API that would likely never be used outside the library itself. In
+ all cases, the altered methods or structures were either parts of
+ the ``QPDF`` class that were public so that ``QPDFWriter`` could
+ call them, or parts of validation code that was
+ over-zealous in reporting problems in parts of the file that would
+ not ordinarily be referenced. In no case did any of the removed
+ methods do anything worse than falsely report error conditions in
+ files that were broken in ways that didn't matter. The following
+ public parts of the ``QPDF`` class were changed in a
+ non-compatible way:
+
+ - Updated nested ``QPDF::EncryptionData`` class to add fields
+ needed by the newer encryption formats. Member variables were
+ changed to private so that future changes will not require
+ breaking backward compatibility.
+
+ - Added additional parameters to ``compute_data_key``, which is
+ used by ``QPDFWriter`` to compute the encryption key used to
+ encrypt a specific object.
+
+ - Removed the method ``flattenScalarReferences``. This method was
+ previously used prior to writing a new PDF file, but it had the
+ undesired side effect of causing qpdf to read objects in the
+ file that were not referenced. Some otherwise valid files have
+ unreferenced objects with errors in them, so this could cause
+ qpdf to reject files that would be accepted by virtually all
+ other PDF readers. In fact, qpdf relied on only a very small
+ part of what flattenScalarReferences did, so only this part has
+ been preserved, and it is now done directly inside
+ ``QPDFWriter``.
+
+ - Removed the method ``decodeStreams``. This method was used by
+ the :samp:`--check` option of the
+ :command:`qpdf` command-line tool to force all
+ streams in the file to be decoded, but it also suffered from
+ the problem of opening otherwise unreferenced streams and thus
+ could report false positives. The
+ :samp:`--check` option now causes qpdf to go
+ through all the motions of writing a new file based on the
+ original one, so it will always reference and check exactly
+ those parts of a file that any ordinary viewer would check.
+
+ - Removed the method ``trimTrailerForWrite``. This method was
+ used by ``QPDFWriter`` to modify the original QPDF object by
+ removing fields from the trailer dictionary that wouldn't apply
+ to the newly written file. This functionality, though generally
+ harmless, was a poor implementation and has been replaced by
+ having QPDFWriter filter these out when copying the trailer
+ rather than modifying the original QPDF object. (Note that qpdf
+ never modifies the original file itself.)
+
+ - Allow the PDF header to appear anywhere in the first 1024 bytes of
+ the file. This is consistent with what other readers do.
+
+ - Fix the :command:`pkg-config` files to list zlib
+ and pcre in ``Requires.private`` to better support static linking
+ using :command:`pkg-config`.
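+
+ The relaxed header rule from the 4.0.0 notes above (the PDF header
+ may appear anywhere in the first 1024 bytes) can be pictured with
+ the short Python sketch below. It is not qpdf's code: the function
+ name ``find_header_offset`` is invented, and the sketch assumes
+ that the ``%PDF-`` marker only needs to begin inside the
+ 1024-byte window.
+

```python
HEADER_MARKER = b"%PDF-"
SEARCH_WINDOW = 1024  # window size taken from the release note


def find_header_offset(data):
    """Return the offset where '%PDF-' begins, or -1 if it does not
    begin within the first SEARCH_WINDOW bytes."""
    # Extend the slice so a marker that starts at the last byte of the
    # window is still seen in full (assumption of this sketch).
    limit = SEARCH_WINDOW + len(HEADER_MARKER) - 1
    return data[:limit].find(HEADER_MARKER)


sample = b"junk line\n%PDF-1.7\n"
print(find_header_offset(sample))  # 10
```

+
+ A reader built this way treats everything before the returned
+ offset as leading garbage rather than rejecting the file outright.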
+
+3.0.2: September 6, 2012
+ - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
+ used with ``QPDFWriter::setStaticID``, which made it pretty much
+ useless. This has been fixed.
+
+ - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
+ text near the header of the PDF file. The intended use case is to
+ insert comments that may be consumed by a downstream application,
+ though other use cases may exist.
+
+3.0.1: August 11, 2012
+ - Version 3.0.0 included addition of files for
+ :command:`pkg-config`, but this was not mentioned
+ in the release notes. The release notes for 3.0.0 were updated to
+ mention this.
+
+ - Bug fix: if an object stream ended with a scalar object not
+ followed by space, qpdf would incorrectly report that it
+ encountered a premature EOF. This bug has been in qpdf since
+ version 2.0.
+
+3.0.0: August 2, 2012
+ - Acknowledgment: I would like to express gratitude for the
+ contributions of Tobias Hoffmann toward the release of qpdf
+ version 3.0. He is responsible for most of the implementation and
+ design of the new API for manipulating pages, and contributed code
+ and ideas for many of the improvements made in version 3.0.
+ Without his work, this release would certainly not have happened
+ as soon as it did, if at all.
+
+ - *Non-compatible API changes:*
+
+ - The method ``QPDFObjectHandle::replaceStreamData`` that uses a
+ ``StreamDataProvider`` to provide the stream data no longer
+ takes a ``length`` parameter. The parameter was removed since
+ this provides the user an opportunity to simplify the calling
+ code. This method was introduced in version 2.2. At the time,
+ the ``length`` parameter was required in order to ensure that
+ calls to the stream data provider returned the same length for a
+ specific stream every time they were invoked. In particular, the
+ linearization code depends on this. Instead, qpdf 3.0 and newer
+ check for that constraint explicitly. The first time the stream
+ data provider is called for a specific stream, the actual length
+ is saved, and subsequent calls are required to return the same
+ number of bytes. This means the calling code no longer has to
+ compute the length in advance, which can be a significant
+ simplification. If your code fails to compile because of the
+ extra argument and you don't want to make other changes to your
+ code, just omit the argument.
+
+ - Many methods take ``long long`` instead of other integer types.
+ Most if not all existing code should compile fine with this
+ change since such parameters had always previously been smaller
+ types. This change was required to support files larger than two
+ gigabytes in size.
+
+ - Support has been added for large files. The test suite verifies
+ support for files larger than 4 gigabytes, and manual testing has
+ verified support for files larger than 10 gigabytes. Large file
+ support is available for both 32-bit and 64-bit platforms as long
+ as the compiler and underlying platforms support it.
+
+ - Support for page selection (splitting and merging PDF files) has
+ been added to the :command:`qpdf` command-line
+ tool. See :ref:`ref.page-selection`.
+
+ - Options have been added to the :command:`qpdf`
+ command-line tool for copying encryption parameters from another
+ file. See :ref:`ref.basic-options`.
+
+ - New methods have been added to the ``QPDF`` object for adding and
+ removing pages. See :ref:`ref.adding-and-remove-pages`.
+
+ - New methods have been added to the ``QPDF`` object for copying
+ objects from other PDF files. See :ref:`ref.foreign-objects`.
+
+ - A new method ``QPDFObjectHandle::parse`` has been added for
+ constructing ``QPDFObjectHandle`` objects from a string
+ description.
+
+ - Methods have been added to ``QPDFWriter`` to allow writing to an
+ already open stdio ``FILE*`` in addition to writing to standard
+ output or a named file. Methods have been added to ``QPDF`` to be
+ able to process a file from an already open stdio ``FILE*``. This
+ makes it possible to read and write PDF from secure temporary
+ files that have been unlinked prior to being fully read or
+ written.
+
+ - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files
+ from scratch. The example
+ :file:`examples/pdf-create.cc` illustrates how
+ it can be used.
+
+ - Several methods that take ``PointerHolder<Buffer>`` can now also
+ accept ``std::string`` arguments.
+
+ - Many new convenience methods have been added to the library, most
+ in ``QPDFObjectHandle``. See :file:`ChangeLog`
+ for a full list.
+
+ - When building on a platform that supports ELF shared libraries
+ (such as Linux), symbol versions are enabled by default. They can
+ be disabled by passing
+ :samp:`--disable-ld-version-script` to
+ :command:`./configure`.
+
+ - The file :file:`libqpdf.pc` is now installed
+ to support :command:`pkg-config`.
+
+ - Image comparison tests are off by default now since they are not
+ needed to verify a correct build or port of qpdf. They are needed
+ only when changing the actual PDF output generated by qpdf. You
+ should enable them if you are making deep changes to qpdf itself.
+ See :file:`README.md` for details.
+
+ - Large file tests are off by default but can be turned on with
+ :command:`./configure` or by setting an environment
+ variable before running the test suite. See
+ :file:`README.md` for details.
+
+ - When qpdf's test suite fails, failures are not printed to the
+ terminal anymore by default. Instead, find them in
+ :file:`build/qtest.log`. For packagers who are
+ building with an autobuilder, you can add the
+ :samp:`--enable-show-failed-test-output` option to
+ :command:`./configure` to restore the old behavior.
+
+2.3.1: December 28, 2011
+ - Fix thread-safety problem resulting from non-thread-safe use of
+ the PCRE library.
+
+ - Made a few minor documentation fixes.
+
+ - Add workaround for a bug that appears in some versions of
+ ghostscript to the test suite.
+
+ - Fix minor build issue for Visual C++ 2010.
+
+2.3.0: August 11, 2011
+ - Bug fix: when preserving existing encryption on encrypted files
+ with cleartext metadata, older qpdf versions would generate
+ password-protected files with no valid password. This operation
+ now works. This bug only affected files created by copying
+ existing encryption parameters; explicit encryption with
+ specification of cleartext metadata worked before and continues to
+ work.
+
+ - Enhance ``QPDFWriter`` with a new constructor that allows you to
+ delay the specification of the output file. When using this
+ constructor, you may now call ``QPDFWriter::setOutputFilename`` to
+ specify the output file, or you may use
+ ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
+ the resulting PDF file to a memory buffer. You may then use
+ ``QPDFWriter::getBuffer`` to retrieve the memory buffer.
+
+ - Add new API call ``QPDF::replaceObject`` for replacing objects by
+ object ID.
+
+ - Add new API call ``QPDF::swapObjects`` for swapping two objects by
+ object ID.
+
+ - Add ``QPDFObjectHandle::getDictAsMap`` and
+ ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
+ dictionary objects as maps and array objects as vectors.
+
+ - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
+ the C API for manipulating string fields of the document's
+ ``/Info`` dictionary.
+
+ - Add functions ``qpdf_init_write_memory``,
+ ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
+ for writing PDF files to a memory buffer instead of a file.
+
+2.2.4: June 25, 2011
+ - Fix installation and compilation issues; no functionality changes.
+
+2.2.3: April 30, 2011
+ - Handle some damaged streams with incorrect characters following
+ the stream keyword.
+
+ - Improve handling of inline images when normalizing content
+ streams.
+
+ - Enhance error recovery to properly handle files that use object 0
+ as a regular object, which is specifically disallowed by the spec.
+
+2.2.2: October 4, 2010
+ - Add new function ``qpdf_read_memory`` to the C API to call
+ ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.
+
+2.2.1: October 1, 2010
+ - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
+ and ``std::cerr`` with other streams for generation of diagnostic
+ messages and error messages. This can be useful for GUIs or other
+ applications that want to capture any output generated by the
+ library to present to the user in some other way. Note that QPDF
+ does not write to ``std::cout`` (or the specified output stream)
+ except where explicitly mentioned in
+ :file:`QPDF.hh`, and that the only use of the
+ error stream is for warnings. Note also that output of warnings is
+ suppressed when ``setSuppressWarnings(true)`` is called.
+
+ - Add new method ``QPDF::processMemoryFile`` for operating on PDF
+ files that are loaded into memory rather than in a file on disk.
+
+ - Give a warning but otherwise ignore empty PDF objects by treating
+ them as null. Empty objects are not permitted by the PDF
+ specification but have been known to appear in some actual PDF
+ files.
+
+ - Handle inline image filter abbreviations when they appear as stream
+ filter abbreviations. The PDF specification does not allow use of
+ stream filter abbreviations in this way, but Adobe Reader and some
+ other PDF readers accept them since they sometimes appear
+ incorrectly in actual PDF files.
+
+ - Implement miscellaneous enhancements to ``PointerHolder`` and
+ ``Buffer`` to support other changes.
+
+2.2.0: August 14, 2010
+ - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
+ ``replaceStreamData``) for creating new streams and replacing
+ stream data. This makes it possible to perform a wide range of
+ operations that were not previously possible.
+
+ - Add new helper method in ``QPDFObjectHandle``
+ (``addPageContents``) for appending or prepending new content
+ streams to a page. This method makes it possible to manipulate
+ content streams without having to be concerned whether a page's
+ contents are a single stream or an array of streams.
+
+ - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
+ which replaces a dictionary key with a given value unless the
+ value is null, in which case it removes the key instead.
+
+ - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
+ which returns the raw (unfiltered) stream data into a buffer. This
+ complements the ``getStreamData`` method, which returns the
+ filtered (uncompressed) stream data and can only be used when the
+ stream's data is filterable.
+
+ - Provide two new examples:
+ :command:`pdf-double-page-size` and
+ :command:`pdf-invert-images` that illustrate the
+ newly added interfaces.
+
+ - Fix a memory leak that would cause loss of a few bytes for every
+ object involved in a cycle of object references. Thanks to Jian Ma
+ for calling my attention to the leak.
+
+2.1.5: April 25, 2010
+ - Remove restriction of file identifier strings to 16 bytes. This
+ unnecessary restriction was preventing qpdf from being able to
+ encrypt or decrypt files with identifier strings that were not
+ exactly 16 bytes long. The specification imposes no such
+ restriction.
+
+2.1.4: April 18, 2010
+ - Apply the same padding calculation fix from version 2.1.2 to the
+ main cross reference stream as well.
+
+ - Since :command:`qpdf --check` only performs limited
+ checks, clarify the output to make it clear that there still may
+ be errors that qpdf can't check. This should make it less
+ surprising to people when another PDF reader is unable to read a
+ file that qpdf thinks is okay.
+
+2.1.3: March 27, 2010
+ - Fix bug that could cause a failure when rewriting PDF files that
+ contain object streams with unreferenced objects that in turn
+ reference indirect scalars.
+
+ - Don't complain about (invalid) AES streams that aren't a multiple
+ of 16 bytes. Instead, pad them before decrypting.
+
+2.1.2: January 24, 2010
+ - Fix bug in padding around first half cross reference stream in
+ linearized files. The bug could cause an assertion failure when
+ linearizing certain unlucky files.
+
+2.1.1: December 14, 2009
+ - No changes in functionality; insert missing include in an internal
+ library header file to support gcc 4.4, and update test suite to
+ ignore broken Adobe Reader installations.
+
+2.1: October 30, 2009
+ - This is the first version of qpdf to include Windows support. On
+ Windows, it is possible to build a DLL. Additionally, a partial
+ C-language API has been introduced, which makes it possible to
+ call qpdf functions from non-C++ environments. I am very grateful
+ to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
+ numerous pre-release versions of this DLL and providing many
+ excellent suggestions on improving the interface.
+
+ For programming to the C interface, please see the header file
+ :file:`qpdf/qpdf-c.h` and the example
+ :file:`examples/pdf-linearize.c`.
+
+ - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
+ downloaded from qpdf's download site. Žarko's Delphi wrapper is
+ released with the same licensing terms as qpdf itself and comes
+ with this disclaimer: "Delphi wrapper unit
+ :file:`qpdf.pas` created by Žarko Gajić
+ (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever
+ purpose you want. No support is provided. Sample code is
+ provided."
+
+ - Support has been added for AES encryption and crypt filters.
+ Although qpdf does not presently support files that use PKI-based
+ encryption, with the addition of AES and crypt filters, qpdf is
+ now able to open most encrypted files created with newer
+ versions of Acrobat or other PDF creation software. Note that I
+ have not been able to get very many files encrypted in this way,
+ so it's possible there could still be some cases that qpdf can't
+ handle. Please report them if you find them.
+
+ - Many error messages have been improved to include more information
+ in hopes of making qpdf a more useful tool for PDF experts to use
+ in manually recovering damaged PDF files.
+
+ - Attempt to avoid compressing metadata streams if possible. This is
+ consistent with other PDF creation applications.
+
+ - Provide new command-line options for AES encryption, cleartext
+ metadata, and setting the minimum and forced PDF versions of
+ output files.
+
+ - Add additional methods to the ``QPDF`` object for querying the
+ document's permissions. Although qpdf does not enforce these
+ permissions, it does make them available so that applications that
+ use qpdf can enforce permissions.
+
+ - The :samp:`--check` option to
+ :command:`qpdf` has been extended to include some
+ additional information.
+
+ - *Non-compatible API changes:*
+
+ - QPDF's exception handling mechanism now uses
+ ``std::logic_error`` for internal errors and
+ ``std::runtime_error`` for runtime errors in favor of the now
+ removed ``QEXC`` classes used in previous versions. The ``QEXC``
+ exception classes predated the addition of the
+ :file:`<stdexcept>` header file to the C++ standard library.
+ Most of the exceptions thrown by the qpdf library itself are
+ still of type ``QPDFExc`` which is now derived from
+ ``std::runtime_error``. Programs that catch an instance of
+ ``std::exception`` and display it by calling the ``what()``
+ method will not need to be changed.
+
+ - The ``QPDFExc`` class now internally represents various fields
+ of the error condition and provides interfaces for querying
+ them. Among the fields is a numeric error code that can help
+ applications act differently on (a small number of) different
+ error conditions. See :file:`QPDFExc.hh` for details.
+
+ - Warnings can be retrieved from qpdf as instances of ``QPDFExc``
+ instead of strings.
+
+ - The nested ``QPDF::EncryptionData`` class's constructor takes an
+ additional argument. This class is primarily intended to be used
+ by ``QPDFWriter``. There's not really anything useful an
+ end-user application could do with it. It probably shouldn't
+ really be part of the public interface to begin with. Likewise,
+ some of the methods for computing internal encryption dictionary
+ parameters have changed to support ``/R=4`` encryption.
+
+ - The method ``QPDF::getUserPassword`` has been removed since it
+ didn't do what people would think it did. There are now two new
+ methods: ``QPDF::getPaddedUserPassword`` and
+ ``QPDF::getTrimmedUserPassword``. The first one does what the
+ old ``QPDF::getUserPassword`` method used to do, which is to
+ return the password with possible binary padding as specified by
+ the PDF specification. The second one returns a human-readable
+ password string.
+
+ - The enumerated types that used to be nested in ``QPDFWriter``
+ have moved to top-level enumerated types and are now defined in
+ the file :file:`qpdf/Constants.h`. This enables them to be
+ shared by both the C and C++ interfaces.
+
+2.0.6: May 3, 2009
+ - Do not attempt to uncompress streams that have decode parameters
+ we don't recognize. Earlier versions of qpdf would have rejected
+ files with such streams.
+
+2.0.5: March 10, 2009
+ - Improve error handling in the LZW decoder, and fix a small error
+ introduced in the previous version with regard to handling full
+ tables. The LZW decoder has been more strongly verified in this
+ release.
+
+2.0.4: February 21, 2009
+ - Include proper support for LZW streams encoded without the "early
+ code change" flag. Special thanks to Atom Smasher who reported the
+ problem and provided an input file compressed in this way, which I
+ did not previously have.
+
+ - Implement some improvements to file recovery logic.
+
+2.0.3: February 15, 2009
+ - Compile cleanly with gcc 4.4.
+
+ - Handle strings encoded as UTF-16BE properly.
+
+2.0.2: June 30, 2008
+ - Update test suite to work properly with a
+ non-:command:`bash`
+ :file:`/bin/sh` and with Perl 5.10. No changes
+ were made to the actual qpdf source code itself for this release.
+
+2.0.1: May 6, 2008
+ - No changes in functionality or interface. This release includes
+ fixes to the source code so that qpdf compiles properly and passes
+ its test suite on a broader range of platforms. See
+ :file:`ChangeLog` in the source distribution
+ for details.
+
+2.0: April 29, 2008
+ - First public release.
diff --git a/manual/weak-crypto.rst b/manual/weak-crypto.rst
new file mode 100644
index 00000000..8902f760
--- /dev/null
+++ b/manual/weak-crypto.rst
@@ -0,0 +1,33 @@
+.. _ref.weak-crypto:
+
+Weak Cryptography
+=================
+
+Starting with version 10.4, qpdf is taking steps to reduce the likelihood
+of a user *accidentally* creating PDF files with insecure cryptography
+but will continue to allow creation of such files indefinitely with
+explicit acknowledgment.
+
+The PDF file format makes use of RC4, which is known to be a weak
+cryptography algorithm, and MD5, which is a weak hashing algorithm. In
+version 10.4, qpdf generates warnings for some (but not all) cases of
+writing files with weak cryptography when invoked from the command-line.
+These warnings can be suppressed using the
+:samp:`--allow-weak-crypto` option.
+
+It is planned for qpdf version 11 to be stricter, making it an error to
+write files with insecure cryptography from the command-line tool in
+most cases without specifying the
+:samp:`--allow-weak-crypto` flag and also to require
+explicit steps when using the C++ library to enable use of insecure
+cryptography.
+
+Note that qpdf must always retain support for weak cryptographic
+algorithms since this is required for reading older PDF files that use
+them. Additionally, qpdf will always retain the ability to create files
+using weak cryptographic algorithms because, as a development tool, qpdf
+explicitly supports creating older or deprecated types of PDF files,
+which are sometimes needed to test or work with older versions of
+software. Even if other cryptography libraries drop support for RC4 or
+MD5, qpdf can always fall back to its internal implementations of those
+algorithms, so they are not going to disappear from qpdf.