author Jay Berkenbilt <ejb@ql.org> 2022-05-30 22:38:17 +0200
committer Jay Berkenbilt <ejb@ql.org> 2022-05-31 02:03:08 +0200
commit 0bd908b550603a6bcc399a825a170a1263378b22
tree d4cf0cdeafa09fb914e744b5708b5bc6e49b16f2 /manual
parent b7bbf12e85fa46e7971d84143d1597c992045af1
Update documentation for qpdf JSON v2
Diffstat (limited to 'manual')
-rw-r--r-- manual/cli.rst 156
-rw-r--r-- manual/design.rst 10
-rw-r--r-- manual/json.rst 798
-rw-r--r-- manual/library.rst 14
-rw-r--r-- manual/object-streams.rst 2
-rw-r--r-- manual/qdf.rst 7
-rw-r--r-- manual/qpdf-job.rst 4
-rw-r--r-- manual/release-notes.rst 9
8 files changed, 804 insertions, 196 deletions
diff --git a/manual/cli.rst b/manual/cli.rst
index 194bfacd..c33a545f 100644
--- a/manual/cli.rst
+++ b/manual/cli.rst
@@ -171,7 +171,9 @@ Related Options
equivalent command-line arguments were supplied. It can be repeated
and mixed freely with other options. Run ``qpdf`` with
:qpdf:ref:`--job-json-help` for a description of the job JSON input
- file format. For more information, see :ref:`qpdf-job`.
+ file format. For more information, see :ref:`qpdf-job`. Note that
+ this is unrelated to :qpdf:ref:`--json` but may be combined with
+ it.
.. _exit-status:
@@ -341,6 +343,17 @@ Related Options
itself. The default provider is always listed first. See
:ref:`crypto` for more information about crypto providers.
+.. qpdf:option:: --job-json-help
+
+ .. help: show format of job JSON
+
+ Describe the format of the QPDFJob JSON input used by
+ --job-json-file.
+
+ Describe the format of the QPDFJob JSON input used by
+ :qpdf:ref:`--job-json-file`. For more information about QPDFJob,
+ see :ref:`qpdf-job`.
+
.. _general-options:
General Options
@@ -852,9 +865,11 @@ Related Options
When uncompressing streams, control which types of compression
schemes should be uncompressed:
- - none: don't uncompress anything. This is the default with --json-output.
+ - none: don't uncompress anything. This is the default with
+ --json-output.
- generalized: uncompress streams compressed with a
- general-purpose compression algorithm. This is the default.
+ general-purpose compression algorithm. This is the default
+ except when --json-output is given.
- specialized: in addition to generalized, also uncompress
streams compressed with a special-purpose but non-lossy
compression scheme
@@ -875,7 +890,8 @@ Related Options
``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define
generalized filters as those to be used for general-purpose
compression or encoding, as opposed to filters specifically
- designed for image data. This is the default.
+ designed for image data. This is the default except when
+ :qpdf:ref:`--json-output` is given.
- :samp:`specialized`: in addition to generalized, decode streams
with supported non-lossy specialized filters; currently this is
@@ -3126,8 +3142,9 @@ Related Options
is usually but not always equal to the file name and is needed by
some of the other options. See also :ref:`attachments`. Note that
this option displays dates in PDF timestamp syntax. When attachment
- information is included in json output (see :ref:`--json`), dates
- are shown in ISO-8601 format.
+ information is included in json output in the ``"attachments"`` key
+ (see :ref:`--json`), dates are shown (just within that object) in
+ ISO-8601 format.
.. qpdf:option:: --show-attachment=key
@@ -3169,14 +3186,11 @@ Related Options
Generate a JSON representation of the file. This is described in
depth in :ref:`json`. The version parameter can be used to specify
- which version of the qpdf JSON format should be output. The only
- supported value is ``1``, but it's possible that a new JSON output
- version will be added in a future version. You can also specify
- ``latest`` to use the latest JSON version. For backward
- compatibility, the default value will remain ``1`` until qpdf
- version 11, after which point it will become ``latest``. In all
- case, you can tell what version of the JSON output you have from
- the ``"version"`` key in the output. Use the
+ which version of the qpdf JSON format should be output. The version
+ may be a number or ``latest``. The default is ``latest``. As of
+ qpdf 11, the latest version is ``2``. If you have code that reads
+ qpdf JSON output, you can tell what version of the JSON output you
+ have from the ``"version"`` key in the output. Use the
:qpdf:ref:`--json-help` option to get a description of the JSON
object.
@@ -3189,11 +3203,11 @@ Related Options
containing descriptive text.
Describe the format of the JSON output by writing to standard
- output a JSON object with the same structure with the same keys as
- the JSON generated by qpdf. In the output written by
- ``--json-help``, each key's value is a description of the key. The
- specific contract guaranteed by qpdf in its JSON representation is
- explained in more detail in the :ref:`json`.
+ output a JSON object with the same structure as the JSON generated
+ by qpdf. In the output written by ``--json-help``, each key's value
+ is a description of the key. The specific contract guaranteed by
+ qpdf in its JSON representation is explained in more detail in the
+ :ref:`json`.
.. qpdf:option:: --json-key=key
@@ -3216,53 +3230,50 @@ Related Options
be shown in the "objects" key of the JSON output. Otherwise, all
objects will be shown.
- This option is repeatable. If given, only specified objects will
- be shown in the "``objects``" key of the JSON output. Otherwise, all
- objects will be shown.
-
-.. qpdf:option:: --job-json-help
-
- .. help: show format of job JSON
-
- Describe the format of the QPDFJob JSON input used by
- --job-json-file.
-
- Describe the format of the QPDFJob JSON input used by
- :qpdf:ref:`--job-json-file`. For more information about QPDFJob,
- see :ref:`qpdf-job`.
+ This option is repeatable. If given, only specified objects will be
+ shown in the ``"objects"`` key of the JSON output. Otherwise, all
+ objects will be shown. For qpdf JSON version 1, this also affects
+ the ``"objectinfo"`` key, which is not present in version 2. This
+ option may be used with :qpdf:ref:`--json` and also with
+ :qpdf:ref:`--json-output`.
.. qpdf:option:: --json-stream-data={none|inline|file}
.. help: how to handle streams in json output
- Control whether streams in json output should be omitted,
- written inline (base64-encoded) or written to a file. If "file"
- is chosen, the file will be the name of the input file appended
- with -nnn where nnn is the object number. The prefix can be
- overridden with --json-stream-prefix.
-
- Control whether streams in json output should be omitted, written
- inline (base64-encoded) or written to a file. If ``file`` is
- chosen, the file will be the name of the input file appended with
- :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The prefix
- can be overridden with :qpdf:ref:`--json-stream-prefix`. This
- option only applies when used with :qpdf:ref:`--json-output`.
+ When used with --json-output, this option controls whether
+ streams in json output should be omitted, written inline
+ (base64-encoded) or written to a file. If "file" is chosen, the
+ file will be the name of the output file appended with -nnn where
+ nnn is the object number. The prefix can be overridden with
+ --json-stream-prefix.
+
+ When used with :qpdf:ref:`--json-output`, this option controls
+ whether streams in JSON output should be omitted, written inline
+ (base64-encoded) or written to a file. If ``file`` is chosen, the
+ file will be the name of the output file appended with
+ :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The stream
+ data file prefix can be overridden with
+ :qpdf:ref:`--json-stream-prefix`. This option only applies when
+ used with :qpdf:ref:`--json-output`.
.. qpdf:option:: --json-stream-prefix=file-prefix
.. help: prefix for json stream data files
- When --json-stream-data=file is given, override the input file
- name as the prefix for stream data files. Whatever is given here
+ When used with --json-output, --json-stream-prefix=file-prefix
+ sets the prefix for stream data files, overriding the default,
+ which is to use the output file name. Whatever is given here
will be appended with -nnn to create the name of the file that
will contain the data for the stream stream in object nnn.
- When :qpdf:ref:`--json-stream-data` is given with the value
- ``file``, override the input file name as the prefix for stream
- data files. Whatever is given here will be appended with
- :samp:`-{nnn}` to create the name of the file that will contain the
- data for the stream stream in object :samp:`{nnn}`. This
- option only applies when used with :qpdf:ref:`--json-output`.
+ When used with :qpdf:ref:`--json-output`,
+ ``--json-stream-prefix=file-prefix`` sets the prefix for stream data
+ files, overriding the default, which is to use the output file
+ name. Whatever is given here will be appended with :samp:`-{nnn}`
+ to create the name of the file that will contain the data for the
+ stream in object :samp:`{nnn}`. This option only applies
+ when used with :qpdf:ref:`--json-output`.
.. qpdf:option:: --json-output[=version]
@@ -3270,44 +3281,45 @@ Related Options
The output file will be qpdf JSON format at the given version.
"version" may be a specific version or "latest" (the default).
- Version 1 is not supported. See also --json-stream-data,
+ The only supported version is 2. See also --json-stream-data,
--json-stream-prefix, and --decode-level.
- The output file will be qpdf JSON format at the given version.
- ``version`` may be a specific version or ``latest`` (the default).
- Version 1 is not supported. See also :qpdf:ref:`--json-stream-data`
- and :qpdf:ref:`--json-stream-prefix`. The default decode level is
- ``none``, but you can override it with :qpdf:ref:`--decode-level`.
- If you want to look at the contents of streams easily as you would
- in QDF mode (see :ref:`qdf`), you can use
- ``--decode-level=generalized`` and ``--json-stream-data=file`` for
- a convenient way to do that.
+ The output file, instead of being a PDF file, will be a JSON file
+ in qpdf JSON format at the given version. ``version`` may be a
+ specific version or ``latest`` (the default). The only supported
+ version is 2. See also :qpdf:ref:`--json-stream-data` and
+ :qpdf:ref:`--json-stream-prefix`. When this option is specified,
+ the default decode level for stream data is ``none``, but you can
+ override it with :qpdf:ref:`--decode-level`. If you want to look at
+ the contents of streams easily as you would in QDF mode (see
+ :ref:`qdf`), you can use ``--decode-level=generalized`` and
+ ``--json-stream-data=file`` for a convenient way to do that.
.. qpdf:option:: --json-input
.. help: input file is qpdf JSON
Treat the input file as a JSON file in qpdf JSON format as
- written by qpdf --json-output. See the "QPDF JSON Format"
+ written by qpdf --json-output. See the "qpdf JSON Format"
section of the manual for information about how to use this
option.
Treat the input file as a JSON file in qpdf JSON format as written
by ``qpdf --json-output``. The input file must be complete and
include all stream data. For information about converting between
- PDF and JSON, please see :ref:`qpdf-json`.
+ PDF and JSON, please see :ref:`json`.
.. qpdf:option:: --update-from-json=qpdf-json-file
.. help: update a PDF from qpdf JSON
- Update a PDF file from a JSON file. Please see the "QPDF JSON
- Format" section of the manual for information about how to use
- this option.
+ Update a PDF file from a JSON file. Please see the "qpdf JSON"
+ chapter of the manual for information about how to use this
+ option.
- This option updates a PDF file from a qpdf JSON file. For a
- information about how to use this option, please see
- :ref:`qpdf-json`.
+ This option updates a PDF file from the specified qpdf JSON file.
+ For information about how to use this option, please see
+ :ref:`json`.
.. _test-options:
@@ -3420,7 +3432,7 @@ Related Options
This is used by qpdf's test suite to check consistency between the
output of ``qpdf --json`` and the output of ``qpdf --json-help``.
- This option causes an extra copy of the generated json to appear in
+ This option causes an extra copy of the generated JSON to appear in
memory and is therefore unsuitable for use with large files. This
is why it's also not on by default.
diff --git a/manual/design.rst b/manual/design.rst
index 5014ce2f..e0b87bf2 100644
--- a/manual/design.rst
+++ b/manual/design.rst
@@ -242,7 +242,7 @@ the current file position. If the token is a not either a dictionary or
array opener, an object is immediately constructed from the single token
and the parser returns. Otherwise, the parser iterates in a special mode
in which it accumulates objects until it finds a balancing closer.
-During this process, the "``R``" keyword is recognized and an indirect
+During this process, the ``R`` keyword is recognized and an indirect
``QPDFObjectHandle`` may be constructed.
The ``QPDF::resolve()`` method, which is used to resolve an indirect
@@ -280,15 +280,15 @@ file.
it is looking before the last ``%%EOF``. After getting to ``trailer``
keyword, it invokes the parser.
-- The parser sees "``<<``", so it calls itself recursively in
+- The parser sees ``<<``, so it calls itself recursively in
dictionary creation mode.
- In dictionary creation mode, the parser keeps accumulating objects
- until it encounters "``>>``". Each object that is read is pushed onto
- a stack. If "``R``" is read, the last two objects on the stack are
+ until it encounters ``>>``. Each object that is read is pushed onto
+ a stack. If ``R`` is read, the last two objects on the stack are
inspected. If they are integers, they are popped off the stack and
their values are used to construct an indirect object handle which is
- then pushed onto the stack. When "``>>``" is finally read, the stack
+ then pushed onto the stack. When ``>>`` is finally read, the stack
is converted into a ``QPDF_Dictionary`` which is placed in a
``QPDFObjectHandle`` and returned.
diff --git a/manual/json.rst b/manual/json.rst
index 9df75168..52a6aedc 100644
--- a/manual/json.rst
+++ b/manual/json.rst
@@ -1,6 +1,9 @@
+.. cSpell:ignore moddifyannotations
+.. cSpell:ignore feff
+
.. _json:
-QPDF JSON
+qpdf JSON
=========
.. _json-overview:
@@ -8,27 +11,540 @@ QPDF JSON
Overview
--------
-Beginning with qpdf version 8.3.0, the :command:`qpdf`
-command-line program can produce a JSON representation of the
-non-content data in a PDF file. It includes a dump in JSON format of all
-objects in the PDF file excluding the content of streams. This JSON
-representation makes it very easy to look in detail at the structure of
-a given PDF file, and it also provides a great way to work with PDF
-files programmatically from the command-line in languages that can't
-call or link with the qpdf library directly. Note that stream data can
-be extracted from PDF files using other qpdf command-line options.
+Beginning with qpdf version 11.0.0, the qpdf library and command-line
+program can produce a JSON representation of a PDF file. qpdf
+version 11 introduces JSON format version 2. Prior to qpdf 11,
+versions 8.3.0 onward had a more limited JSON representation
+accessible only from the command-line. For details on what changed,
+see :ref:`json-v2-changes`. The rest of this chapter documents qpdf
+JSON version 2.
+
+Please note: this chapter discusses *qpdf JSON format*, which
+represents the contents of a PDF file. This is distinct from the
+*QPDFJob JSON format*, which provides a higher-level interface for
+interacting with qpdf the way the command-line tool does. For
+information about that, see :ref:`qpdf-job`.
+
+The qpdf JSON format is specific to qpdf. There are two ways to use
+qpdf JSON:
+
+- The :qpdf:ref:`--json` command-line flag causes creation of a JSON
+ representation of all the objects in a PDF file, excluding stream
+ data. This includes an unambiguous representation of the PDF object
+ structure and also provides JSON-formatted summaries of other
+ information about the file. This functionality is built into
+ ``QPDFJob`` and can be accessed from the ``qpdf`` command-line tool
+ or from the ``QPDFJob`` C or C++ API.
+
+- qpdf can create a JSON file that completely represents a PDF file.
+ You can think of this as using JSON as an *alternative syntax* for
+ representing a PDF file. Using qpdf JSON, it is possible to
+ convert a PDF file to JSON, manipulate the structure or contents of
+ the objects at a low level, and convert the results back to a PDF
+ file. This functionality can be accessed from the command-line with
+ the :qpdf:ref:`--json-output`, :qpdf:ref:`--json-input`, and
+ :qpdf:ref:`--update-from-json` flags, or from the API using the
+ ``QPDF::writeJSON``, ``QPDF::createFromJSON``, and
+ ``QPDF::updateFromJSON`` methods.
+
+.. _json-terminology:
+
+JSON Terminology
+----------------
+
+Notes about terminology:
+
+- In JavaScript and JSON, that thing that has keys and values is
+ typically called an *object*.
+
+- In PDF, that thing that has keys and values is typically called a
+ *dictionary*. An *object* is a PDF object such as integer, real,
+ boolean, null, string, array, dictionary, or stream.
+
+- Some languages that use JSON call an *object* a *dictionary*, a
+ *map*, or a *hash*.
+
+- Sometimes, it's called an *object* if it has fixed keys and a
+ *dictionary* if it has variable keys.
+
+This manual is not entirely consistent about its use of *dictionary*
+vs. *object* because sometimes one term or another is clearer in
+context. Just be aware of the ambiguity when reading the manual. We
+frequently use the term *dictionary* to refer to a JSON object because
+of the consistency with PDF terminology.
+
+.. _what-qpdf-json-is-not:
+
+What qpdf JSON is not
+---------------------
+
+Please note that qpdf JSON offers a convenient syntax for manipulating
+PDF files at a low level using JSON syntax. JSON syntax is much easier
+to work with than native PDF syntax, and there are good JSON libraries
+in virtually every commonly used programming language. Working with
+PDF objects in JSON removes the need to worry about stream lengths,
+cross reference tables, and PDF-specific representations of Unicode or
+binary strings that appear outside of content streams. It does not
+eliminate the need to understand the semantic structure of PDF files.
+Working with qpdf JSON still requires familiarity with the PDF
+specification.
+
+In particular, qpdf JSON *does not* provide any of the following
+capabilities:
+
+- Text extraction. While you could use qpdf JSON syntax to navigate to
+ a page's content streams and font structures, text within pages is
+ still encoded using PDF syntax within content streams, and there is
+ no assistance for text extraction.
+
+- Reflowing text, document structure. qpdf JSON does not add any new
+ information or insight into the content of PDF files. If you have a
+ PDF file that lacks any structural information, qpdf JSON won't help
+ you solve any of those problems.
+
+This is what we mean when we say that JSON provides an *alternative
+syntax* for working with PDF data. Semantically, it is identical to
+native PDF.
.. _qpdf-json:
-QPDF JSON Format
+qpdf JSON Format
----------------
-XXX Write this.
+This section describes how qpdf represents PDF objects in JSON format.
+It also describes how to work with qpdf JSON to create or
+modify PDF files.
+
+.. _json.objects:
+
+qpdf JSON Object Representation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section describes the representation of PDF objects in qpdf JSON
+version 2. PDF objects are represented within the ``"objects"``
+dictionary of a qpdf JSON file. This is true both for PDF serialized
+to JSON (:qpdf:ref:`--json-output`, ``QPDF::writeJSON``) and for objects as
+they appear in the output of ``qpdf`` with the :qpdf:ref:`--json`
+option.
+
+Each key in the ``"objects"`` dictionary is either ``"trailer"`` or a
+string of the form ``"obj:O G R"`` where ``O`` and ``G`` are the
+object and generation numbers and ``R`` is the literal string ``R``.
+This is the PDF syntax for the indirect object reference prepended by
+``obj:``. The value, representing the object itself, is a JSON object
+whose structure is described below.
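As a rough illustration of the key format described above, the following Python sketch splits a key of the ``"objects"`` dictionary into its object and generation numbers. The helper name ``parse_object_key`` is hypothetical and not part of qpdf; the sketch assumes only the ``"trailer"`` and ``"obj:O G R"`` forms stated in the text.

```python
import re

# Non-trailer keys have the form "obj:O G R" (object number, generation, literal R).
_OBJ_KEY = re.compile(r"obj:(\d+) (\d+) R")


def parse_object_key(key):
    """Return (object_number, generation) for "obj:O G R", or None for "trailer"."""
    if key == "trailer":
        return None
    match = _OBJ_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"not a qpdf JSON object key: {key!r}")
    return int(match.group(1)), int(match.group(2))


third_object = parse_object_key("obj:3 0 R")
trailer = parse_object_key("trailer")
```

Returning ``None`` for the trailer is just one possible design choice; it keeps the trailer out of any code that iterates over numbered objects.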
+
+Top-level Stream Objects
+ Stream objects are represented as a JSON object with the single key
+ ``"stream"``. The stream object has a key called ``"dict"`` whose
+ value is the stream dictionary as an object value (described below)
+ with the ``"/Length"`` key omitted. Other keys are determined by the
+ value for json stream data (:qpdf:ref:`--json-stream-data`, or a
+ parameter of type ``qpdf_json_stream_data_e``) as follows:
+
+ - ``none``: stream data is not represented; no other keys are
+ present
+
+ - ``inline``: the stream data appears as a base64-encoded string as
+ the value of the ``"data"`` key
+
+ - ``file``: the stream data is written to a file, and the path to
+ the file is stored in the ``"datafile"`` key. A relative path is
+ interpreted as relative to the current directory when qpdf is
+ invoked.
+
+ Keys other than ``"dict"``, ``"data"``, and ``"datafile"`` are
+ ignored. This is primarily for future compatibility in case a newer
+ version of qpdf includes additional information.
+
+ As with the native PDF representation, the stream data must be
+ consistent with whatever filters and decode parameters are specified
+ in the stream dictionary.
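To make the ``inline`` case more concrete, here is a minimal Python sketch that recovers the bytes of an inline stream. The function name ``inline_stream_bytes`` is hypothetical. The sketch assumes the ``"data"`` key is present and handles only the simple case where ``"/Filter"`` is the single name ``/FlateDecode``; filter arrays and decode parameters are deliberately left out.

```python
import base64
import zlib


def inline_stream_bytes(obj):
    """Return stream bytes for a top-level stream object with inline data."""
    stream = obj["stream"]
    # With json stream data set to "inline", "data" holds base64-encoded bytes.
    raw = base64.b64decode(stream["data"])
    # The data is stored as the stream dictionary describes it, so a
    # /FlateDecode filter means the bytes are still zlib-compressed.
    if stream["dict"].get("/Filter") == "/FlateDecode":
        return zlib.decompress(raw)
    return raw


# Build a small stream object by hand to exercise the helper.
payload = b"BT /F1 24 Tf ET\n"
example = {
    "stream": {
        "data": base64.b64encode(zlib.compress(payload)).decode("ascii"),
        "dict": {"/Filter": "/FlateDecode"},
    }
}
decoded = inline_stream_bytes(example)
```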
+
+Top-level Non-stream Objects
+ Non-stream objects are represented as a dictionary with the single
+ key ``"value"``. Other keys are ignored for future compatibility.
+ The value's structure is described in "Object Values" below.
+
+ Note: in files that use object streams, the trailer "dictionary" is
+ actually a stream, but in the JSON representation, the value of the
+ ``"trailer"`` key is always written as a dictionary (with a
+ ``"value"`` key like other non-stream objects). There will also be
+ a stream object whose key is the object ID of the cross-reference
+ stream, even though this stream will generally be unreferenced. This
+ makes it possible to assume ``"trailer"`` points to a dictionary
+ without having to consider whether the file uses object streams or
+ not. It is also consistent with how ``QPDF::getTrailer`` behaves in
+ the C++ API.
+
+Object Values
+ Within ``"value"`` or ``"stream"."dict"``, PDF objects are
+ represented as follows:
+
+ - Objects of type Boolean or null are represented as JSON objects of
+ the same type.
+
+ - Objects that are numeric are represented as numeric in the JSON
+ without regard to precision. Internally, qpdf stores numeric
+ values as strings, so qpdf will preserve arbitrary precision
+ numerical values when reading and writing JSON. It is likely that
+ other JSON readers and writers will have implementation-dependent
+ ways of handling numerical values that are out of range.
+
+ - Name objects are represented as JSON strings that start with ``/``
+ and are followed by the PDF name in canonical form with all PDF
+ syntax resolved. For example, the name whose canonical form (per
+ the PDF specification) is ``text/plain`` would be represented in
+ JSON as ``"/text/plain"`` and in PDF as ``/text#2fplain``.
+
+ - Indirect object references are represented as JSON strings that
+ look like a PDF indirect object reference and have the form ``"O G
+ R"`` where ``O`` and ``G`` are the object and generation numbers
+ and ``R`` is the literal string ``R``. For example, ``"3 0 R"``
+ would represent a reference to the object with object ID 3 and
+ generation 0.
+
+ - PDF strings are represented as JSON strings in one of two ways:
+
+ - ``"u:utf8-encoded-string"``: this format is used when the PDF
+ string can be unambiguously represented as a Unicode string and
+ contains no unprintable characters. This is the case whether the
+ input string is encoded as UTF-16, UTF-8 (as allowed by PDF
+ 2.0), or PDF doc encoding. Strings are only represented this way
+ if they can be encoded without loss of information.
+
+ - ``"b:hex-string"``: this format is used to represent any binary
+ string value that can't be represented as a Unicode string.
+ ``hex-string`` must have an even number of characters that range
+ from ``a`` through ``f``, ``A`` through ``F``, or ``0`` through
+ ``9``.
+
+ qpdf writes empty strings as ``"u:"``, but both ``"b:"`` and
+ ``"u:"`` are valid representations of the empty string.
+
+ There is full support for UTF-16 surrogate pairs. Binary strings
+ encoded with ``"b:..."`` are the internal PDF representations.
+ As such, the following are equivalent:
+
+ - ``"u:\ud83e\udd54"`` -- representation of U+1F954 as a surrogate
+ pair in JSON syntax
+
+ - ``"b:FEFFD83EDD54"`` -- representation of U+1F954 as the bytes
+ of a UTF-16 string in PDF syntax with the leading ``FEFF``
+ indicating UTF-16
+
+ - ``"b:efbbbff09fa594"`` -- representation of U+1F954 as the
+ bytes of a UTF-8 string in PDF syntax (as allowed by PDF 2.0)
+ with the leading ``EF``, ``BB``, ``BF`` sequence (which is just
+ UTF-8 encoding of ``FEFF``).
+
+ - A JSON string whose contents are ``u:`` followed by the UTF-8
+ representation of U+1F954. This is the potato emoji.
+ Unfortunately, I am not able to render it in the PDF version
+ of this manual.
+
+ - PDF arrays are represented as JSON arrays of objects as described
+ above
+
+ - PDF dictionaries are represented as JSON objects whose keys are
+ the string representations of names and whose values are
+ representations of PDF objects.
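The equivalence of the string forms listed above can be checked with a short Python sketch. The helper ``decode_pdf_string`` is a hypothetical name, not a qpdf API. It assumes its argument is already known to be a PDF string value, so it does not try to distinguish strings from names or indirect references.

```python
import json


def decode_pdf_string(value):
    """Decode a qpdf JSON string value: str for "u:...", bytes for "b:..."."""
    if value.startswith("u:"):
        return value[2:]
    if value.startswith("b:"):
        # bytes.fromhex accepts upper- and lower-case hex digits and
        # raises ValueError for an odd number of digits.
        return bytes.fromhex(value[2:])
    raise ValueError(f"not a qpdf JSON string value: {value!r}")


potato = "\U0001F954"

# JSON parsers combine the surrogate pair into one code point.
from_json = decode_pdf_string(json.loads('"u:\\ud83e\\udd54"'))

# UTF-16 bytes in PDF syntax: a leading FEFF marker, then big-endian UTF-16.
utf16 = decode_pdf_string("b:FEFFD83EDD54")
from_utf16 = utf16[2:].decode("utf-16-be")

# UTF-8 bytes in PDF 2.0 syntax: a leading EF BB BF marker, then UTF-8.
utf8 = decode_pdf_string("b:efbbbff09fa594")
from_utf8 = utf8[3:].decode("utf-8")
```

All three decoded values should compare equal to ``potato``.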
+
+.. _json.output:
+
+qpdf JSON Output
+~~~~~~~~~~~~~~~~
+
+The format of the JSON written by qpdf's :qpdf:ref:`--json-output`
+flag or the ``QPDF::writeJSON`` API call is a JSON object consisting
+of a single key: ``"qpdf-v2"``. Any other top-level keys are ignored.
+While unknown keys in other places are ignored for future
+compatibility, in this case, ignoring other top-level keys is an
+explicit decision to allow users to include other keys for their own
+use. No new top-level keys will be added in JSON version 2.
+
+The ``"qpdf-v2"`` key points to a JSON object with the following keys:
+
+- ``"pdfversion"`` -- a string containing the PDF version as indicated in
+ the PDF header (e.g. ``"1.7"``, ``"2.0"``)
+
+- ``"maxobjectid"`` -- a number indicating the object ID of the
+ highest numbered object in the file. This is provided to make it
+ easier for software that wants to add new objects to the file as you
+ can safely start with one above that number when creating new
+ objects. Note that the value of ``"maxobjectid"`` may be higher than
+ the actual maximum object that appears in the input PDF since it
+ takes into consideration any dangling indirect object references
+ from the original file. This prevents you from unwittingly creating
+ an object that doesn't exist but that is referenced, which may have
+ unintended side effects. (The PDF specification explicitly allows
+ dangling references and says to treat them as nulls. This can happen
+ if objects are removed from a PDF file.)
+
+- ``"objects"`` -- the actual PDF objects as described in
+ :ref:`json.objects`.
+
+Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``.
+As such, none of the things ``QPDFWriter`` does apply. This includes
+recompression of streams, renumbering of objects, anything to do with
+object streams (which are not represented by qpdf JSON at all since
+they are PDF syntax, not semantics), encryption, decryption,
+linearization, QDF mode, etc.
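The use of ``"maxobjectid"`` for allocating new objects might look like the following Python sketch. The function ``add_object`` is hypothetical and operates only on the parsed JSON. It assumes generation 0 for new objects and bumps the local ``"maxobjectid"`` value purely for its own bookkeeping.

```python
def add_object(qpdf_json, value):
    """Add a non-stream object under a fresh object ID and return its reference."""
    body = qpdf_json["qpdf-v2"]
    # Start one above the highest object ID, as the text suggests.
    new_id = body["maxobjectid"] + 1
    body["objects"][f"obj:{new_id} 0 R"] = {"value": value}
    # Keep the local counter current so repeated calls get distinct IDs.
    body["maxobjectid"] = new_id
    return f"{new_id} 0 R"


doc = {"qpdf-v2": {"pdfversion": "1.3", "maxobjectid": 5, "objects": {}}}
first_ref = add_object(doc, {"/Type": "/Example"})
second_ref = add_object(doc, [1, 2, 3])
```

The returned string has the indirect-reference form ``"O G R"``, so it can be stored directly as a value in another object.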
+
+.. _json.example:
+
+qpdf JSON Example
+~~~~~~~~~~~~~~~~~
+
+The JSON below shows an example of a simple PDF file represented in
+qpdf JSON format.
+
+.. code-block:: json
+
+ {
+ "qpdf-v2": {
+ "pdfversion": "1.3",
+ "maxobjectid": 5,
+ "objects": {
+ "obj:1 0 R": {
+ "value": {
+ "/Pages": "2 0 R",
+ "/Type": "/Catalog"
+ }
+ },
+ "obj:2 0 R": {
+ "value": {
+ "/Count": 1,
+ "/Kids": [ "3 0 R" ],
+ "/Type": "/Pages"
+ }
+ },
+ "obj:3 0 R": {
+ "value": {
+ "/Contents": "4 0 R",
+ "/MediaBox": [ 0, 0, 612, 792 ],
+ "/Parent": "2 0 R",
+ "/Resources": {
+ "/Font": {
+ "/F1": "5 0 R"
+ }
+ },
+ "/Type": "/Page"
+ }
+ },
+ "obj:4 0 R": {
+ "stream": {
+ "data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI",
+ "dict": {
+ "/Filter": "/FlateDecode"
+ }
+ }
+ },
+ "obj:5 0 R": {
+ "value": {
+ "/BaseFont": "/Helvetica",
+ "/Encoding": "/WinAnsiEncoding",
+ "/Subtype": "/Type1",
+ "/Type": "/Font"
+ }
+ },
+ "trailer": {
+ "value": {
+ "/ID": [
+ "b:98b5a26966fba4d3a769b715b2558da6",
+ "b:98b5a26966fba4d3a769b715b2558da6"
+ ],
+ "/Root": "1 0 R",
+ "/Size": 6
+ }
+ }
+ }
+ }
+ }
+
+.. _json.input:
+
+qpdf JSON Input
+~~~~~~~~~~~~~~~
+
+Output in the JSON output format described in :ref:`json.output` can
+be used in two different ways:
+
+- By using the :qpdf:ref:`--json-input` flag or calling
+ ``QPDF::createFromJSON`` in place of ``QPDF::processFile``, a qpdf
+ JSON file can be used in place of a PDF file as the input to qpdf.
+
+- By using the :qpdf:ref:`--update-from-json` flag or calling
+ ``QPDF::updateFromJSON`` on an initialized ``QPDF`` object, a qpdf
+ JSON file can be used to apply changes to an existing ``QPDF``
+ object. That ``QPDF`` object can have come from any source including
+ a PDF file, a qpdf JSON file, or the result of any other process
+ that results in a valid, initialized ``QPDF`` object.
+
+Here are some important things to know about qpdf JSON input.
+
+- When a qpdf JSON file is used as the primary input file, it must be
+ complete. This means
+
+ - A PDF version number must be specified with the ``"pdfversion"``
+ key
+
+ - Stream data must be present for all streams
+
+ - The trailer dictionary must be present, though only the
+ ``"/Root"`` key is required.
+
+- Certain fields from the input are ignored whether creating or
+ updating from a JSON file:
+
+ - ``"maxobjectid"`` is ignored, so it is not necessary to update it
+ when adding new objects.
+
+ - ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't
+ put it there when it creates JSON output, and it is not necessary
+ to add it.
+
+ - ``"/Size"`` is ignored if it appears in a trailer dictionary as
+ that is always recomputed by ``QPDFWriter``.
+
+ - Unknown keys at the top level of the file, within ``"objects"``,
+ at the top level of each individual object (inside the object that
+ has the ``"value"`` or ``"stream"`` key) and directly within
+ ``"stream"`` are ignored for future compatibility. You should
+ avoid putting your own values in those places if you wish to avoid
+ risking that your JSON files will not work in future versions of
+ qpdf. The exception to this advice is at the top level of the
+ overall file where it is explicitly supported for you to add your
+ own keys. For example, you could add your own metadata at the top
+ level, and qpdf will ignore it. Note that extra top-level keys are
+ not preserved when qpdf reads your JSON file.
+
+- When qpdf reads a PDF file, the internal object numbers are always
+ preserved. However, when qpdf writes a file using ``QPDFWriter``,
+ ``QPDFWriter`` does its own numbering and, in general, does not
+ preserve input object numbers. That means that a qpdf JSON file that
+ is used to update an existing PDF must have object numbers that
+ match the input file it is modifying. In practical terms, this means
+ that you can't use a JSON file created from one PDF file to modify
+ the *output of running qpdf on that file*.
+
+ To put this more concretely, the following is valid:
+
+ ::
+
+ qpdf --json-output in.pdf pdf.json
+ # edit pdf.json
+ qpdf in.pdf out.pdf --update-from-json=pdf.json
+
+ The following will not produce predictable results because
+ ``out.pdf`` won't have the same object numbers as ``pdf.json`` and
+ ``in.pdf``.
+
+ ::
+
+ qpdf --json-output in.pdf pdf.json
+ # edit pdf.json
+ qpdf in.pdf out.pdf --update-from-json=pdf.json
+ # edit pdf.json again
+ # Don't do this
+ qpdf out.pdf out2.pdf --update-from-json=pdf.json
+
+- When updating from a JSON file (:qpdf:ref:`--update-from-json`,
+ ``QPDF::updateFromJSON``), existing objects are updated in place.
+ This has the following implications:
+
+ - You may omit both ``"data"`` and ``"datafile"`` if the object you
+ are updating is already a stream. In that case the original stream
+ data is preserved. You must always provide a stream dictionary,
+ but it may be empty. Note that an empty stream dictionary will
+ clear the old dictionary. There is no way to indicate that an old
+ stream dictionary should be left alone, so if your intention is to
+ replace the stream data and preserve the dictionary, the
+ original dictionary must appear in the JSON file.
+
+ - You can change one object type to another object type including
+ replacing a stream with a non-stream or a non-stream with a
+ stream. If you replace a non-stream with a stream, you must
+ provide data for the stream.
+
+ - Objects that you do not wish to modify can be omitted from the
+ JSON. That includes the trailer. That means you can use the output
+ of a qpdf JSON file that was written using
+ :qpdf:ref:`--json-object` to have it include only the objects you
+ intend to modify.
+
+ - You can omit the ``"pdfversion"`` key. The input PDF version will
+ be preserved.
+
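The stream-related update rules above can be checked before handing a file to qpdf. The following is a minimal sketch, not part of qpdf: ``check_stream_update`` is a hypothetical helper, and the ``"dict"`` key name for the stream dictionary is an assumption, since this section names only ``"data"`` and ``"datafile"``.

```python
def check_stream_update(entry, target_is_stream):
    """Return a list of problems with one object entry used for an update.

    entry: the JSON object for a single PDF object, expected to hold
           either a "value" key or a "stream" key.
    target_is_stream: True if the object being updated is already a stream.
    """
    problems = []
    if "stream" not in entry:
        # Non-stream objects have no stream-specific rules to check.
        return problems
    stream = entry["stream"]
    # A stream dictionary must always be provided, though it may be empty.
    # NOTE: "dict" is an assumed key name for the stream dictionary.
    if "dict" not in stream:
        problems.append("missing stream dictionary")
    has_data = "data" in stream or "datafile" in stream
    # Data may be omitted only when the existing object is already a stream.
    if not has_data and not target_is_stream:
        problems.append("stream data required when replacing a non-stream")
    return problems
```

An empty list means the entry satisfies the rules described above. It does not mean qpdf will accept the file, since qpdf performs its own validation.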
+.. _json.workflow-cli:
+
+qpdf JSON Workflow: CLI
+~~~~~~~~~~~~~~~~~~~~~~~
+
+This section includes a few examples of using qpdf JSON.
+
+- Convert a PDF file to JSON format, edit the JSON, and convert back
+ to PDF. This is an alternative to using QDF mode (see :ref:`qdf`) to
+ modify PDF files in a text editor. Each method has its own
+ advantages and disadvantages.
+
+ ::
+
+ qpdf --json-output in.pdf pdf.json
+ # edit pdf.json
+ qpdf --json-input pdf.json out.pdf
+
+- Extract only a specific object into a JSON file, modify the object
+ in JSON, and use the modified object to update the original PDF. In
+ this case, we're editing object 4, whatever that may happen to be.
+ You would have to know through some other means which object you
+ wanted to edit, such as by looking at other JSON output or using a
+ tool (possibly but not necessarily qpdf) to identify the object.
+
+ ::
+
+ qpdf --json-output in.pdf pdf.json --json-object=4,0
+ # edit pdf.json
+ qpdf in.pdf --update-from-json=pdf.json out.pdf
+
+ Rather than using :qpdf:ref:`--json-object` as in the above example,
+ you could edit the JSON file to remove the objects you didn't need.
+ You could also just leave them there, though the update process
+ would be slower.
+
+ You could also add new objects to a file by adding them to
+ ``pdf.json``. Just be sure the object number doesn't conflict with
+ an existing object. The ``"maxobjectid"`` field in the original
+ output can help with this. You don't have to update it if you add
+ objects as it is ignored when the file is read back in.
+
+- Use :qpdf:ref:`--json-input` and :qpdf:ref:`--json-output` together
+ to demonstrate preservation of object numbers. In this example,
+ ``a.json`` and ``b.json`` will have the same objects and object
+ numbers. The files may not be identical since strings may be
+ normalized, fields may appear in a different order, etc. However,
+ ``b.json`` and ``c.json`` are probably identical.
+
+ ::
+
+ qpdf --json-output in.pdf a.json
+ qpdf --json-input --json-output a.json b.json
+ qpdf --json-input --json-output b.json c.json
+
+
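When adding new objects as described above, one way to avoid a conflicting object number is to scan the existing ``obj:O G R`` keys. This is a minimal sketch under stated assumptions: ``next_object_key`` is a hypothetical helper, ``objects`` is the already-parsed dictionary of objects from the JSON file, and the optional ``maxobjectid`` argument stands for the value reported in the original full output, which matters if the file was written with ``--json-object`` and holds only a subset of objects.

```python
import re

# Keys for indirect objects look like "obj:4 0 R"; "trailer" does not match.
OBJ_KEY = re.compile(r"^obj:(\d+) (\d+) R$")


def next_object_key(objects, maxobjectid=0):
    """Return an unused "obj:N 0 R" key for a new object."""
    highest = maxobjectid
    for key in objects:
        match = OBJ_KEY.match(key)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"obj:{highest + 1} 0 R"
```

For example, ``next_object_key({"obj:1 0 R": {}, "obj:7 0 R": {}, "trailer": {}})`` yields ``"obj:8 0 R"``.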
+.. _json.workflow-api:
+
+qpdf JSON Workflow: API
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Everything that can be done using the qpdf CLI can be done using the
+C++ API. See comments in :file:`QPDF.hh` for ``writeJSON``,
+``createFromJSON``, and ``updateFromJSON`` for details.
.. _json-guarantees:
-JSON Guarantees
----------------
+JSON Compatibility Guarantees
+-----------------------------
The qpdf JSON representation includes a JSON serialization of the raw
objects in the PDF file as well as some computed information in a more
@@ -37,24 +553,23 @@ format. These guarantees are designed to simplify the experience of a
developer working with the JSON format.
Compatibility
- The top-level JSON object output is a dictionary. The JSON output
- contains various nested dictionaries and arrays. With the exception
- of dictionaries that are populated by the fields of objects from the
- file, all instances of a dictionary are guaranteed to have exactly
- the same keys. Future versions of qpdf are free to add additional
- keys but not to remove keys or change the type of object that a key
- points to. The qpdf program validates this guarantee, and in the
- unlikely event that a bug in qpdf should cause it to generate data
- that doesn't conform to this rule, it will ask you to file a bug
- report.
-
- The top-level JSON structure contains a "``version``" key whose value
- is simple integer. The value of the ``version`` key will be
+ The top-level JSON object is a dictionary (JSON "object"). The JSON
+ output contains various nested dictionaries and arrays. With the
+ exception of dictionaries that are populated by the fields of
+ PDF objects from the file, all instances of a dictionary are
+ guaranteed to have exactly the same keys.
+
+ The top-level JSON structure contains a ``"version"`` key whose
+ value is a simple integer. The value of the ``version`` key will be
incremented if a non-compatible change is made. A non-compatible
change would be any change that involves removal of a key, a change
- to the format of data pointed to by a key, or a semantic change that
- requires a different interpretation of a previously existing key. A
- strong effort will be made to avoid breaking compatibility.
+ to the format of data pointed to by a key, or a semantic change
+ that requires a different interpretation of a previously existing
+ key.
+
+ With a specific qpdf JSON version, future versions of qpdf are free
+ to add additional keys but not to remove keys or change the type of
+ object that a key points to.
Documentation
The :command:`qpdf` command can be invoked with the
@@ -66,28 +581,29 @@ Documentation
- A dictionary in the help output means that the corresponding
location in the actual JSON output is also a dictionary with
- exactly the same keys; that is, no keys present in help are absent
- in the real output, and no keys will be present in the real output
- that are not in help. As a special case, if the dictionary has a
- single key whose name starts with ``<`` and ends with ``>``, it
- means that the JSON output is a dictionary that can have any keys,
- each of which conforms to the value of the special key. This is
- used for cases in which the keys of the dictionary are things like
- object IDs.
+ exactly the same keys; that is, no keys present in help are
+ absent in the real output, and no keys will be present in the
+ real output that are not in help. It is possible for a key to be
+ present and have a value that is explicitly ``null``. As a
+ special case, if the dictionary has a single key whose name
+ starts with ``<`` and ends with ``>``, it means that the JSON
+ output is a dictionary that can have any value as a key. This is
+ used for cases in which the keys of the dictionary are things
+ like object IDs.
- A string in the help output is a description of the item that
appears in the corresponding location of the actual output. The
- corresponding output can have any format.
+ corresponding output can have any value including ``null``.
- An array in the help output always contains a single element. It
indicates that the corresponding location in the actual output is
- also an array, and that each element of the array has whatever
- format is implied by the single element of the help output's
- array.
+ an array of any length, and that each element of the array has
+ whatever format is implied by the single element of the help
+ output's array.
- For example, the help output indicates includes a "``pagelabels``"
+ For example, the help output includes a ``"pagelabels"``
key whose value is an array of one element. That element is a
- dictionary with keys "``index``" and "``label``". In addition to
+ dictionary with keys ``"index"`` and ``"label"``. In addition to
describing the meaning of those keys, this tells you that the actual
JSON output will contain a ``pagelabels`` array, each of whose
elements is a dictionary that contains an ``index`` key, a ``label``
@@ -95,56 +611,13 @@ Documentation
Directness and Simplicity
The JSON output contains the value of every object in the file, but
- it also contains some processed data. This is analogous to how qpdf's
- library interface works. The processed data is similar to the helper
- functions in that it allows you to look at certain aspects of the PDF
- file without having to understand all the nuances of the PDF
+ it also contains some summary data. This is analogous to how qpdf's
+ library interface works. The summary data is similar to the helper
+ functions in that it allows you to look at certain aspects of the
+ PDF file without having to understand all the nuances of the PDF
specification, while the raw objects allow you to mine the PDF for
anything that the higher-level interfaces are lacking.
-.. _json.limitations:
-
-Limitations of JSON Representation
-----------------------------------
-
-There are a few limitations to be aware of with the JSON structure:
-
-- Strings, names, and indirect object references in the original PDF
- file are all converted to strings in the JSON representation. In the
- case of a "normal" PDF file, you can tell the difference because a
- name starts with a slash (``/``), and an indirect object reference
- looks like ``n n R``, but if there were to be a string that looked
- like a name or indirect object reference, there would be no way to
- tell this from the JSON output. Note that there are certain cases
- where you know for sure what something is, such as knowing that
- dictionary keys in objects are always names and that certain things
- in the higher-level computed data are known to contain indirect
- object references.
-
-- The JSON format doesn't support binary data very well. Mostly the
- details are not important, but they are presented here for
- information. When qpdf outputs a string in the JSON representation,
- it converts the string to UTF-8, assuming usual PDF string semantics.
- Specifically, if the original string is UTF-16, it is converted to
- UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
- converted to UTF-8 with that assumption. This causes strange things
- to happen to binary strings. For example, if you had the binary
- string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
- because ``03`` is not a printable character and ``80`` is the bullet
- character in PDF doc encoding and is mapped to the Unicode value
- ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
- convert back from here to a binary string, would have to recognize
- Unicode values whose code points are higher than ``0xFF`` and map
- those back to their corresponding PDF doc encoding characters. There
- is no way to tell the difference between a Unicode string that was
- originally encoded as UTF-16 or one that was converted from PDF doc
- encoding. In other words, it's best if you don't try to use the JSON
- format to extract binary strings from the PDF file, but if you really
- had to, it could be done. Note that qpdf's
- :qpdf:ref:`--show-object` option does not have this
- limitation and will reveal the string as encoded in the original
- file.
-
.. _json.considerations:
JSON: Special Considerations
@@ -157,12 +630,15 @@ be aware of:
- If a PDF file has certain types of errors in its pages tree (such as
page objects that are direct or multiple pages sharing the same
object ID), qpdf will automatically repair the pages tree. If you
- specify ``"objects"`` and/or ``"objectinfo"`` without any other
- keys, you will see the original pages tree without any corrections.
- If you specify any of keys that require page tree traversal (for
- example, ``"pages"``, ``"outlines"``, or ``"pagelabel"``), then
- ``"objects"`` and ``"objectinfo"`` will show the repaired page tree
- so that object references will be consistent throughout the file.
+ specify ``"objects"`` (and, with qpdf JSON version 1, also
+ ``"objectinfo"``) without any other keys, you will see the original
+ pages tree without any corrections. If you specify any of the keys
+ that require page tree traversal (for example, ``"pages"``,
+ ``"outlines"``, or ``"pagelabels"``), then ``"objects"`` (and
+ ``"objectinfo"``) will show the repaired page tree so that object
+ references will be consistent throughout the file. This is not an
+ issue with :qpdf:ref:`--json-output`, which doesn't repair the pages
+ tree.
- While qpdf guarantees that keys present in the help will be present
in the output, those fields may be null or empty if the information
@@ -177,22 +653,128 @@ be aware of:
1. Note that JSON indexes from 0, and you would also use 0-based
indexing using the API. However, 1-based indexing is easier in this
case because the command-line syntax for specifying page ranges is
- 1-based. If you were going to write a program that looked through the
- JSON for information about specific pages and then use the
+ 1-based. If you were going to write a program that looked through
+ the JSON for information about specific pages and then use the
command-line to extract those pages, 1-based indexing is easier.
- Besides, it's more convenient to subtract 1 from a program in a real
- programming language than it is to add 1 from shell code.
+ Besides, it's more convenient to subtract 1 in a real programming
+ language than it is to add 1 in shell code.
- The image information included in the ``page`` section of the JSON
- output includes the key "``filterable``". Note that the value of this
- field may depend on the :qpdf:ref:`--decode-level` that
- you invoke qpdf with. The JSON output includes a top-level key
- "``parameters``" that indicates the decode level used for computing
- whether a stream was filterable. For example, jpeg images will be
- shown as not filterable by default, but they will be shown as
- filterable if you run :command:`qpdf --json
+ output includes the key ``"filterable"``. Note that the value of
+ this field may depend on the :qpdf:ref:`--decode-level` that you
+ invoke qpdf with. The JSON output includes a top-level key
+ ``"parameters"`` that indicates the decode level that was used for
+ computing whether a stream was filterable. For example, jpeg images
+ will be shown as not filterable by default, but they will be shown
+ as filterable if you run :command:`qpdf --json
--decode-level=all`.
- The ``encrypt`` key's values will be populated for non-encrypted
files. Some values will be null, and others will have values that
apply to unencrypted files.
+
+- The qpdf library itself never loads an entire PDF into memory. This
+ remains true for PDF files represented in JSON format. In general,
+ qpdf will hold the entire object structure in memory once a file has
+ been fully read (objects are loaded into memory lazily but stay
+ there once loaded), but it will never have more than two copies of a
+ stream in memory at once. That said, if you ask qpdf to write JSON
+ to memory, it will do so, so be careful about this if you are
+ working with very large PDF files. There is nothing in the qpdf
+ library itself that prevents working with PDF files much larger than
+ available system memory. qpdf can both read and write such files in
+ JSON format. If you need to work with a PDF file's JSON
+ representation in memory, it is recommended that you use either
+ ``none`` or ``file`` as the argument to
+ :qpdf:ref:`--json-stream-data`, or if using the API, use
+ ``qpdf_sj_none`` or ``qpdf_sj_file`` as the JSON stream data value.
+ If using ``none``, you can use other means to obtain the stream
+ data.
+
+.. _json-v2-changes:
+
+Changes from JSON v1 to v2
+--------------------------
+
+The following changes were made to qpdf's JSON output format for
+version 2.
+
+- The representation of objects has changed. For details, see
+ :ref:`json.objects`.
+
+ - The representation of strings is now unambiguous for all strings.
+ Strings are prefixed with either ``u:`` for Unicode strings or
+ ``b:`` for byte strings.
+
+ - Names are shown in qpdf's canonical form rather than in PDF
+ syntax. (Example: the PDF-syntax name ``/text#2fplain`` appeared
+ as ``"/text#2fplain"`` in v1 but appears as ``"/text/plain"`` in
+ v2.)
+
+ - The top-level representation of an object in ``"objects"`` is a
+ dictionary containing either a ``"value"`` key or a ``"stream"``
+ key, making it possible to distinguish streams from other objects.
+
+- The ``"objectinfo"`` key has been removed in favor of a
+ representation in ``"objects"`` that differentiates between a stream
+ and other kinds of objects. In v1, it was not possible to tell a
+ stream from a dictionary within ``"objects"``.
+
+- Within the ``"objects"`` dictionary, keys are now ``"obj:O G R"``
+ where ``O`` and ``G`` are the object and generation number.
+ ``"trailer"`` remains the key for the trailer dictionary. In v1, the
+ ``obj:`` prefix was not present. The rationale for this change is as
+ follows:
+
+ - Having a unique prefix (``obj:``) makes it much easier to search
+ in the JSON file for the definition of an object.
+
+ - Having the key still contain ``O G R`` makes it much easier to
+ construct the key from an indirect reference. You just have to
+ prepend ``obj:``. There is no need to parse the indirect object
+ reference.
+
+- In the ``"encrypt"`` object, the ``"modifyannotations"`` key was
+ misspelled as ``"moddifyannotations"`` in v1. This has been
+ corrected.
+
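The key construction described above (prepend ``obj:`` to an indirect reference) and the ``"value"``/``"stream"`` distinction can be sketched as follows. The helper names are hypothetical, ``objects`` is assumed to be the already-parsed ``"objects"`` dictionary, and indirect references are assumed to appear as ``"O G R"`` strings.

```python
def object_key(ref):
    """Build the "objects" key for an indirect reference such as "4 0 R"."""
    return "obj:" + ref


def lookup(objects, ref):
    """Return (kind, payload) for the referenced object.

    kind is "stream" or "value", depending on which key the entry holds.
    Returns None if the object is not present in the dictionary.
    """
    entry = objects.get(object_key(ref))
    if entry is None:
        return None
    if "stream" in entry:
        return ("stream", entry["stream"])
    return ("value", entry.get("value"))
```

No parsing of the reference is needed, which is the point of keeping ``O G R`` in the key.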
+Motivation for qpdf JSON version 2
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+qpdf JSON version 2 was created to make it possible to manipulate PDF
+files using JSON syntax instead of native PDF syntax. This makes it
+possible to make low-level updates to PDF files from just about any
+programming language or even to do so from the command-line using
+tools like ``jq`` or any editor that's capable of working with JSON
+files. There were several limitations of JSON format version 1 that
+made this impossible:
+
+- Strings, names, and indirect object references in the original PDF
+ file were all converted to strings in the JSON representation. For
+ casual human inspection, this was fine, but in the general case,
+ there was no way to tell the difference between a string that looked
+ like a name or indirect object reference and an actual name or
+ indirect object reference.
+
+- PDF strings were not unambiguously represented in the JSON format.
+ The way qpdf JSON v1 represented a string was to try to convert the
+ string to UTF-8. This was done by assuming a string that was not
+ explicitly marked as Unicode was encoded in PDF doc encoding. The
+ problem is that there is not a perfect bidirectional mapping between
+ Unicode and PDF doc encoding, so if a binary string happened to
+ contain characters that couldn't be bidirectionally mapped, there
+ would be no way to get back to the original PDF string. Even when
+ possible, trying to map from the JSON representation of a binary
+ string back to the original string required knowledge of the mapping
+ between PDF doc encoding and Unicode.
+
+- There was no representation of stream data. If you wanted to extract
+ stream data, you could use :qpdf:ref:`--show-object`, so this wasn't
+ that important for inspection, but it was a blocker for being able
+ to go from JSON back to PDF. qpdf JSON version 2 allows stream data
+ to be included inline as base64-encoded data. There is also an
+ option to write all stream data to external files, which makes it
+ possible to work with very large PDF files in JSON format even with
+ tools that try to read the entire JSON structure into memory.
+
+- The PDF version from the PDF header was not represented in qpdf JSON v1.
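
To illustrate how the v2 ``u:`` and ``b:`` prefixes remove the string ambiguity described above, here is a minimal sketch of decoding a v2 string value. ``decode_pdf_string`` is a hypothetical helper, and treating the text after ``b:`` as hexadecimal digits is an assumption not stated in this section. Check the description of the object representation before relying on it.

```python
def decode_pdf_string(value):
    """Decode a qpdf JSON v2 string value.

    Returns str for "u:" (Unicode) strings, bytes for "b:" (byte) strings,
    and None for anything else (for example, names or indirect references).
    """
    if value.startswith("u:"):
        return value[2:]
    if value.startswith("b:"):
        # ASSUMPTION: the payload after "b:" is a run of hex digits.
        return bytes.fromhex(value[2:])
    return None
```

Because every PDF string carries one of the two prefixes, a JSON string such as ``"/Name"`` or ``"4 0 R"`` can no longer be mistaken for a PDF string.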
diff --git a/manual/library.rst b/manual/library.rst
index 2ff7d59a..72ed1a39 100644
--- a/manual/library.rst
+++ b/manual/library.rst
@@ -70,12 +70,14 @@ Python
qpdf's capabilities with other functionality provided by Python's
rich standard library and available modules.
-Other Languages
- Starting with version 8.3.0, the :command:`qpdf`
- command-line tool can produce a JSON representation of the PDF file's
- non-content data. This can facilitate interacting programmatically
- with PDF files through qpdf's command line interface. For more
- information, please see :ref:`json`.
+Other Languages
+ Starting with version 11.0.0, the :command:`qpdf`
+ command-line tool can produce an unambiguous JSON representation of
+ a PDF file and can also create or update PDF files using this JSON
+ representation. qpdf versions from 8.3.0 through 10.6.3 had a more
+ limited JSON output format. The qpdf JSON format makes it possible
+ to inspect and modify the structure of a PDF file down to the
+ object level from the command-line or from any language that can
+ handle JSON data. Please see :ref:`json` for details.
Wrappers
The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a
diff --git a/manual/object-streams.rst b/manual/object-streams.rst
index 116b2b8f..89687701 100644
--- a/manual/object-streams.rst
+++ b/manual/object-streams.rst
@@ -122,7 +122,7 @@ entries in ``/W`` above. Each entry consists of one or more fields, the
first of which is the type of the field. The number of bytes for each
field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
is omitted and has the default value. The default value for the field
-type is "``1``". All other default values are "``0``".
+type is ``1``. All other default values are ``0``.
PDF 1.5 has three field types:
diff --git a/manual/qdf.rst b/manual/qdf.rst
index ed697bec..fac70e69 100644
--- a/manual/qdf.rst
+++ b/manual/qdf.rst
@@ -28,6 +28,13 @@ able to restore edited files to a correct state. The
arguments. It reads a possibly edited QDF file from standard input and
writes a repaired file to standard output.
+For another way to work with PDF files in an editor, see :ref:`json`.
+Using qpdf JSON format allows you to edit the PDF file semantically
+without having to be concerned about PDF syntax. However, QDF files
+are actually valid PDF files, so the feedback cycle may be faster if
+previewing with a PDF reader. Also, since QDF files are valid PDF, you
+can experiment with all aspects of the PDF file, including syntax.
+
The following attributes characterize a QDF file:
- All objects appear in numerical order in the PDF file, including when
diff --git a/manual/qpdf-job.rst b/manual/qpdf-job.rst
index 43dd7f0e..57bb9f09 100644
--- a/manual/qpdf-job.rst
+++ b/manual/qpdf-job.rst
@@ -27,6 +27,10 @@ executable is available from inside the C++ library using the
- Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h`
+ - Note: this is unrelated to :qpdf:ref:`--json` but can be combined
+ with it. For more information on qpdf JSON (vs. QPDFJob JSON), see
+ :ref:`json`.
+
- The ``QPDFJob`` C++ API
If you can understand how to use the :command:`qpdf` CLI, you can
diff --git a/manual/release-notes.rst b/manual/release-notes.rst
index ad8d14f9..67400364 100644
--- a/manual/release-notes.rst
+++ b/manual/release-notes.rst
@@ -60,7 +60,8 @@ For a detailed list of changes, please see the file
- CLI: breaking changes
- The default json output version when :qpdf:ref:`--json` is
- specified has been changed from ``1`` to ``latest``.
+ specified has been changed from ``1`` to ``latest``, which is
+ now ``2``.
- The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when
explicitly creating files with weak cryptographic algorithms.
@@ -100,7 +101,7 @@ For a detailed list of changes, please see the file
- ``qpdf --list-attachments --verbose`` includes some additional
information about attachments. Additional information about
- attachments is also included in the ``attachments`` json key
+ attachments is also included in the ``attachments`` JSON key
with ``--json``.
- For encrypted files, ``qpdf --json`` reveals the user password
@@ -647,8 +648,8 @@ For a detailed list of changes, please see the file
passwords from files or standard input than using
:samp:`@file` for this purpose.
- - Add some information about attachments to the json output, and
- added ``attachments`` as an additional json key. The
+ - Add some information about attachments to the JSON output, and
+ added ``attachments`` as an additional JSON key. The
information included here is limited to the preferred name and
content stream and a reference to the file spec object. This is
enough detail for clients to avoid the hassle of navigating a