author: Jay Berkenbilt <ejb@ql.org> 2022-07-31 16:34:05 +0200
committer: Jay Berkenbilt <ejb@ql.org> 2022-07-31 22:23:17 +0200
commit: 5f4224f31a500452a4f97f36ed57351b41ca0114
tree: be355db5a4fcc8334410baf76c7eb1330578faa8 /manual
parent: 80acfc3826704064db8cc2f6af0c338b3aa557e7
Simplify --json-output
Now --json-output just changes defaults. Allow output file with --json.
Diffstat (limited to 'manual')
-rw-r--r-- manual/cli.rst 80
-rw-r--r-- manual/json.rst 222
2 files changed, 152 insertions, 150 deletions
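The commit message says that --json-output now only changes defaults and that --json accepts an output file. As a hedged illustration (assuming qpdf 11 or later; the helper name json_commands and the file names are hypothetical), the Python sketch below only assembles the command lines for the workflows touched by this change, using flags that appear in the diff below. It does not run qpdf.

```python
# Hypothetical helper: builds qpdf argument lists for the JSON workflows
# described in this commit. Flags come from the manual text in the diff;
# file names are placeholders supplied by the caller.
def json_commands(pdf_in: str, json_file: str, pdf_out: str) -> dict[str, list[str]]:
    return {
        # --json may now write to an explicit output file instead of
        # always writing to standard output.
        "inspect": ["qpdf", "--json", pdf_in, json_file],
        # --json-output implies --json and switches several defaults
        # toward a faithful, round-trippable representation.
        "serialize": ["qpdf", "--json-output", pdf_in, json_file],
        # Rebuild a PDF from a complete qpdf JSON file (version 2 or later).
        "rebuild": ["qpdf", "--json-input", json_file, pdf_out],
        # Apply edits from JSON back onto the PDF it was generated from.
        "update": ["qpdf", pdf_in, pdf_out, f"--update-from-json={json_file}"],
    }


if __name__ == "__main__":
    for name, argv in json_commands("in.pdf", "pdf.json", "out.pdf").items():
        print(f"{name}: {' '.join(argv)}")
```

Passing argument lists (rather than a single shell string) to something like subprocess.run avoids quoting problems with file names that contain spaces.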
diff --git a/manual/cli.rst b/manual/cli.rst
index 809437b7..8383b87f 100644
--- a/manual/cli.rst
+++ b/manual/cli.rst
@@ -3194,7 +3194,16 @@ Related Options
:qpdf:ref:`--json-help` option to get a description of the JSON
object.
-.. qpdf:option:: --json-help
+ Starting with qpdf 11, when this option is specified, an output
+ file is optional (for backward compatibility) and defaults to
+ standard output. You may specify an output file to write the JSON
+ to a file rather than standard output.
+
+ Stream data is only included if :qpdf:ref:`--json-output` is
+ specified or if a value other than ``none`` is passed to
+ :qpdf:ref:`--json-stream-data`.
+
+.. qpdf:option:: --json-help[=version]
.. help: show format of JSON output
@@ -3202,12 +3211,13 @@ Related Options
output a JSON object with the same keys and with values
containing descriptive text.
- Describe the format of the JSON output by writing to standard
- output a JSON object with the same structure as the JSON generated
- by qpdf. In the output written by ``--json-help``, each key's value
- is a description of the key. The specific contract guaranteed by
- qpdf in its JSON representation is explained in more detail in the
- :ref:`json`.
+ Describe the format of the corresponding version of JSON output by
+ writing to standard output a JSON object with the same structure as
+ the JSON generated by qpdf. In the output written by
+ ``--json-help``, each key's value is a description of the key. The
+ specific contract guaranteed by qpdf in its JSON representation is
+ explained in more detail in the :ref:`json`. The default version of
+ help is version ``2``, as with the :qpdf:ref:`--json` flag.
.. qpdf:option:: --json-key=key
@@ -3233,11 +3243,9 @@ Related Options
objects will be shown.
This option is repeatable. If given, only specified objects will be
- shown in the ``"objects"`` key of the JSON output. Otherwise, all
- objects will be shown. For qpdf JSON version 1, this also affects
- the ``"objectinfo"`` key, which is not present in version 2. This
- option may be used with :qpdf:ref:`--json` and also with
- :qpdf:ref:`--json-output`.
+ shown in the objects dictionary in the JSON output. Otherwise, all
+ objects will be shown. See :ref:`json` for details about the qpdf
+ JSON format.
.. qpdf:option:: --json-stream-data={none|inline|file}
@@ -3281,28 +3289,30 @@ Related Options
.. qpdf:option:: --json-output[=version]
- .. help: serialize to JSON
+ .. help: apply defaults for JSON serialization
- The output file will be qpdf JSON format at the given version.
- "version" may be a specific version or "latest" (the default).
- The only supported version is 2. See also --json-stream-data,
- --json-stream-prefix, and --decode-level.
+ Implies --json=version. Changes default values for certain
+ options so that the JSON output written is the most faithful
+ representation of the original PDF and contains no additional
+ JSON keys. See also --json-stream-data, --json-stream-prefix,
+ and --decode-level.
- The output file, instead of being a PDF file, will be a JSON file
- in qpdf JSON format at the given version. ``version`` may be a
- specific version or ``latest`` (the default). The only supported
- version is 2. See also :qpdf:ref:`--json-stream-data` and
- :qpdf:ref:`--json-stream-prefix`. This option also changes the
- following defaults:
+ Implies :qpdf:ref:`--json` at the specified version. This option
+ changes several default values, all of which can be overridden by
+ specifying the stated option:
- The default value for :qpdf:ref:`--json-stream-data` changes from
``none`` to ``inline``.
- - The default decode level for stream data becomes ``none``, but you can
- override it with :qpdf:ref:`--decode-level`.
+ - The default value for :qpdf:ref:`--decode-level` changes from
+ ``generalized`` to ``none``.
+
+ - By default, only the ``"qpdf"`` key is included in the JSON
+ output, but you can add additional keys with
+ :qpdf:ref:`--json-key`.
- - Only the ``"qpdf"`` key is included in the JSON output, but you
- can add additional keys with :qpdf:ref:`--json-key`.
+ - Excludes the ``"version"`` and ``"parameters"`` keys from the
+ JSON output.
If you want to look at the contents of streams easily as you would
in QDF mode (see :ref:`qdf`), you can use
@@ -3313,15 +3323,15 @@ Related Options
.. help: input file is qpdf JSON
- Treat the input file as a JSON file in qpdf JSON format as
- written by qpdf --json-output. See the "qpdf JSON Format"
- section of the manual for information about how to use this
- option.
+ Treat the input file as a JSON file in qpdf JSON format. See the
+ "qpdf JSON Format" section of the manual for information about
+ how to use this option.
- Treat the input file as a JSON file in qpdf JSON format as written
- by ``qpdf --json-output``. The input file must be complete and
- include all stream data. For information about converting between
- PDF and JSON, please see :ref:`json`.
+ Treat the input file as a JSON file in qpdf JSON format. The input
+ file must be complete and include all stream data. The JSON version
+ must be at least 2. All top-level keys are ignored except for
+ ``"qpdf"``. For information about converting between PDF and JSON,
+ please see :ref:`json`.
.. qpdf:option:: --update-from-json=qpdf-json-file
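The cli.rst changes above turn --json-output into a bundle of default overrides layered on top of --json. The following Python sketch models that resolution as the new text describes it; it is an illustrative assumption, not qpdf's implementation, and the names JsonSettings and resolve_json_settings are hypothetical. An empty key list stands for "all keys".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JsonSettings:
    # Hypothetical model of the settings that govern qpdf's JSON output.
    json_version: Optional[int] = None        # None: no JSON output requested
    stream_data: str = "none"                 # --json-stream-data default
    decode_level: str = "generalized"         # --decode-level default
    keys: list = field(default_factory=list)  # empty list: all keys
    include_version_and_parameters: bool = True


def resolve_json_settings(json_output: Optional[int] = None,
                          json_flag: Optional[int] = None,
                          stream_data: Optional[str] = None,
                          decode_level: Optional[str] = None,
                          json_keys: tuple = ()) -> JsonSettings:
    # --json-output implies --json at the same version.
    version = json_output if json_output is not None else json_flag
    s = JsonSettings(json_version=version)
    if json_output is not None:
        # Defaults changed by --json-output.
        s.stream_data = "inline"
        s.decode_level = "none"
        s.keys = ["qpdf"]
        s.include_version_and_parameters = False
    # Explicitly specified options override any default.
    if stream_data is not None:
        s.stream_data = stream_data
    if decode_level is not None:
        s.decode_level = decode_level
    # --json-key limits output to the named keys; with --json-output they
    # are added alongside "qpdf". dict.fromkeys drops duplicates in order.
    if json_keys:
        s.keys = list(dict.fromkeys([*s.keys, *json_keys]))
    return s
```

For example, resolve_json_settings(json_output=2, decode_level="generalized") keeps inline stream data but restores generalized decoding, mirroring the "can be overridden by specifying the stated option" rule.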
diff --git a/manual/json.rst b/manual/json.rst
index 0becd405..9f1dc489 100644
--- a/manual/json.rst
+++ b/manual/json.rst
@@ -24,27 +24,28 @@ represents the contents of a PDF file. This is distinct from the
interacting with qpdf the way the command-line tool does. For
information about that, see :ref:`qpdf-job`.
-The qpdf JSON format is specific to qpdf. There are two ways to use
-qpdf JSON:
-
-- The :qpdf:ref:`--json` command-line flag causes creation of a JSON
- representation of all the objects in a PDF file, excluding stream
- data. This includes an unambiguous representation of the PDF object
- structure and also provides JSON-formatted summaries of other
- information about the file. This functionality is built into
- ``QPDFJob`` and can be accessed from the ``qpdf`` command-line tool
- or from the ``QPDFJob`` C or C++ API.
-
-- qpdf can create a JSON file that completely represents a PDF file.
- You can think of this as using JSON as an *alternative syntax* for
- representing a PDF file. Using qpdf JSON, it is possible to
- convert a PDF file to JSON, manipulate the structure or contents of
- the objects at a low level, and convert the results back to a PDF
- file. This functionality can be accessed from the command-line with
- the :qpdf:ref:`--json-output`, :qpdf:ref:`--json-input`, and
- :qpdf:ref:`--update-from-json` flags, or from the API using the
- ``QPDF::writeJSON``, ``QPDF::createFromJSON``, and
- ``QPDF::updateFromJSON`` methods.
+The qpdf JSON format is specific to qpdf. With JSON version 2, the
+:qpdf:ref:`--json` command-line flag causes creation of a JSON
+representation of all the objects in a PDF file. This includes an
+unambiguous representation of the PDF object structure and also
+provides JSON-formatted summaries of other information about the file.
+This functionality is built into ``QPDFJob`` and can be accessed from
+the ``qpdf`` command-line tool or from the ``QPDFJob`` C or C++ API.
+
+By default, stream data is omitted, but it can be included by
+specifying the :qpdf:ref:`--json-stream-data` option. With stream data
+included, the generated JSON file completely represents a PDF file.
+You can think of this as using JSON as an *alternative syntax* for
+representing a PDF file. Using qpdf JSON, it is possible to convert a
+PDF file to JSON, manipulate the structure or contents of the objects
+at a low level, and convert the results back to a PDF file. This
+functionality can be accessed from the command-line with the
:qpdf:ref:`--json-input` and :qpdf:ref:`--update-from-json` flags, or
+from the API using the ``QPDF::writeJSON``, ``QPDF::createFromJSON``,
+and ``QPDF::updateFromJSON`` methods. The :qpdf:ref:`--json-output`
+flag changes a handful of defaults so that the resulting JSON is as
+close as possible to the original input and is ready for being
+converted back to PDF.
.. _json-terminology:
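The json.rst changes below move the header fields and the objects dictionary under a two-element ``"qpdf"`` array. A minimal Python sketch of reading that structure follows, assuming a version 2 file written with inline stream data (base64 in each stream's ``"data"`` key). The helper names are hypothetical and only the standard library is used.

```python
import base64
import json
import re

# Keys in the objects dictionary look like "obj:O G R" (or "trailer").
OBJ_KEY = re.compile(r"^obj:(\d+) (\d+) R$")


def load_qpdf_json(text: str) -> tuple[dict, dict, dict]:
    doc = json.loads(text)
    # First element: header-like information; second: objects dictionary.
    header, objects = doc["qpdf"]
    if header.get("jsonversion") != 2:
        raise ValueError("expected qpdf JSON version 2")
    return doc, header, objects


def object_ids(objects: dict) -> list[tuple[int, int]]:
    # Collect (object number, generation) pairs, skipping "trailer".
    ids = []
    for key in objects:
        m = OBJ_KEY.match(key)
        if m:
            ids.append((int(m.group(1)), int(m.group(2))))
    return sorted(ids)


def next_free_object_id(header: dict) -> int:
    # maxobjectid also covers dangling references, so one above it is safe.
    return header["maxobjectid"] + 1


def inline_stream_data(objects: dict) -> dict[str, bytes]:
    # Decode base64 stream data for every stream object that carries it.
    out = {}
    for key, obj in objects.items():
        stream = obj.get("stream")
        if stream is not None and "data" in stream:
            out[key] = base64.b64decode(stream["data"])
    return out


def add_app_metadata(doc: dict, name: str, value) -> None:
    # qpdf promises never to add its own top-level keys starting with "x-".
    doc["x-" + name] = value
```

Because qpdf ignores unknown top-level keys on input, metadata added this way does not interfere with converting the file back to PDF, although it is not preserved when qpdf reads the file.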
@@ -120,18 +121,53 @@ qpdf JSON Object Representation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section describes the representation of PDF objects in qpdf JSON
-version 2. PDF objects are represented within the ``"objects"``
-dictionary of a qpdf JSON file. This is true both for PDF serialized
-to JSON (:qpdf:ref:`--json-output`, ``QPDF::writeJSON``) or objects as
-they appear in the output of ``qpdf`` with the :qpdf:ref:`--json`
-option.
-
-Each key in the ``"objects"`` dictionary is either ``"trailer"`` or a
-string of the form ``"obj:O G R"`` where ``O`` and ``G`` are the
-object and generation numbers and ``R`` is the literal string ``R``.
-This is the PDF syntax for the indirect object reference prepended by
-``obj:``. The value, representing the object itself, is a JSON object
-whose structure is described below.
+version 2. PDF objects are represented within the ``"qpdf"`` entry of
+a qpdf JSON file. The ``"qpdf"`` entry is a two-element array. The
+first element is a dictionary containing header-like information about
+the file such as the PDF version. The second element is a dictionary
+containing all the objects in the PDF file. We refer to this as the
+*objects dictionary*.
+
+The first element contains the following keys:
+
+- ``"jsonversion"`` -- a number indicating the JSON version used for
+ writing. This will always be ``2``.
+
+- ``"pdfversion"`` -- a string containing PDF version as indicated in
+ the PDF header (e.g. ``"1.7"``, ``"2.0"``)
+
+- ``pushedinheritedpageresources`` -- a boolean indicating whether
+ the library pushed inherited resources down to the page level.
+ Certain library calls cause this to happen, and qpdf needs to know
+ when reading a JSON file back in whether it should do this as it may
+ cause certain objects to be renumbered.
+
+- ``calledgetallpages`` -- a boolean indicating whether
+ ``getAllPages`` was called prior to writing the JSON output. This
+ method causes page tree repair to occur, which may renumber some
+ objects (in very rare cases of corrupted page trees), so qpdf needs
+ to know this information when reading a JSON file back in.
+
+- ``"maxobjectid"`` -- a number indicating the object ID of the
+ highest numbered object in the file. This is provided to make it
+ easier for software that wants to add new objects to the file as you
+ can safely start with one above that number when creating new
+ objects. Note that the value of ``"maxobjectid"`` may be higher than
+ the actual maximum object that appears in the input PDF since it
+ takes into consideration any dangling indirect object references
+ from the original file. This prevents you from unwittingly creating
+ an object that doesn't exist but that is referenced, which may have
+ unintended side effects. (The PDF specification explicitly allows
+ dangling references and says to treat them as nulls. This can happen
+ if objects are removed from a PDF file.)
+
+The second element is the objects dictionary. Each key in the objects
+dictionary is either ``"trailer"`` or a string of the form ``"obj:O G
+R"`` where ``O`` and ``G`` are the object and generation numbers and
+``R`` is the literal string ``R``. This is the PDF syntax for the
+indirect object reference prepended by ``obj:``. The value,
+representing the object itself, is a JSON object whose structure is
+described below.
Top-level Stream Objects
Stream objects are represented as a JSON object with the single key
@@ -143,6 +179,7 @@ Top-level Stream Objects
- ``none``: stream data is not represented; no other keys are
present
+ specified.
- ``inline``: the stream data appears as a base64-encoded string as
the value of the ``"data"`` key
@@ -249,57 +286,6 @@ Object Values
the string representations of names and whose values are
representations of PDF objects.
-.. _json.output:
-
-qpdf JSON Output
-~~~~~~~~~~~~~~~~
-
-The format of the JSON written by qpdf's :qpdf:ref:`--json-output`
-flag or the ``QPDF::writeJSON`` API call is a JSON object consisting
-of a single key: ``"qpdf"``. This may be the only key, or it may be
-embedded in the output of ``qpdf --json``. Unknown keys are ignored
-for future compatibility. It is guaranteed that qpdf will never add
-any keys whose names start with ``xdata``, so users are free to add
-their own metadata using keys whose names start with ``xdata`` without
-fear of clashing with a future version of qpdf.
-
-The ``"qpdf"`` key points to a two-element JSON array. The first element is
-a JSON object with the following keys:
-
-- ``"jsonversion"`` -- a number indicating the JSON version used for
- writing. This will always be ``2``.
-
-- ``"pdfversion"`` -- a string containing PDF version as indicated in
- the PDF header (e.g. ``"1.7"``, ``"2.0"``)
-
-- ``pushedinheritedpageresources`` -- a boolean indicating whether
- the library pushed inherited resources down to the page level.
- Certain library calls cause this to happen, and qpdf needs to know
- when reading a JSON file back in whether it should do this as it may
- cause certain objects to be renumbered.
-
-- ``calledgetallpages`` -- a boolean indicating whether
- ``getAllPages`` was called prior to writing the JSON output. This
- method causes page tree repair to occur, which may renumber some
- objects (in very rare cases of corrupted page trees), so qpdf needs
- to know this information when reading a JSON file back in.
-
-- ``"maxobjectid"`` -- a number indicating the object ID of the
- highest numbered object in the file. This is provided to make it
- easier for software that wants to add new objects to the file as you
- can safely start with one above that number when creating new
- objects. Note that the value of ``"maxobjectid"`` may be higher than
- the actual maximum object that appears in the input PDF since it
- takes into consideration any dangling indirect object references
- from the original file. This prevents you from unwittingly creating
- an object that doesn't exist but that is referenced, which may have
- unintended side effects. (The PDF specification explicitly allows
- dangling references and says to treat them as nulls. This can happen
- if objects are removed from a PDF file.)
-
-The second element is a JSON object containing the actual PDF objects
-as described in :ref:`json.objects`.
-
Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``.
As such, none of the things ``QPDFWriter`` does apply. This includes
recompression of streams, renumbering of objects, anything to do with
@@ -325,7 +311,7 @@ qpdf JSON format.
"pdfversion": "1.3",
"pushedinheritedpageresources": false,
"calledgetallpages": false,
- "maxobjectid": 5,
+ "maxobjectid": 5
},
{
"obj:1 0 R": {
@@ -389,8 +375,7 @@ qpdf JSON format.
qpdf JSON Input
~~~~~~~~~~~~~~~
-Output in the JSON output format described in :ref:`json.output` can
-be used in two different ways:
+The qpdf JSON output can be used in two different ways:
- By using the :qpdf:ref:`--json-input` flag or calling
``QPDF::createFromJSON`` in place of ``QPDF::processFile``, a qpdf
@@ -408,8 +393,11 @@ Here are some important things to know about qpdf JSON input.
- When a qpdf JSON file is used as the primary input file, it must be
complete. This means
+ - A JSON version number must be specified with the ``"jsonversion"``
+ key in the first array element
+
- A PDF version number must be specified with the ``"pdfversion"``
- key
+ key in the first array element
- Stream data must be present for all streams
@@ -422,6 +410,9 @@ Here are some important things to know about qpdf JSON input.
- ``"maxobjectid"`` is ignored, so it is not necessary to update it
when adding new objects.
+ - ``"calledgetallpages"`` and ``"pushedinheritedpageresources"`` are
+ treated as false if omitted.
+
- ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't
put it there when it creates JSON output, and it is not necessary
to add it.
@@ -432,14 +423,13 @@ Here are some important things to know about qpdf JSON input.
- Unknown keys at the to top level of the file, within ``objects``,
at the top level of each individual object (inside the object that
has the ``"value"`` or ``"stream"`` key) and directly within
- ``"stream"`` are ignored for future compatibility. You should
- avoid putting your own values in those places if you wish to avoid
- risking that your JSON files will not work in future versions of
- qpdf. The exception to this advice is at the top level of the
- overall file where it is explicitly supported for you to add your
- own keys. For example, you could add your own metadata at the top
- level, and qpdf will ignore it. Note that extra top-level keys are
- not preserved when qpdf reads your JSON file.
+ ``"stream"`` are ignored for future compatibility. This includes
+ other top-level keys generated by ``qpdf`` itself (such as
+ ``"pages"``). As such, those keys don't have to be consistent with
+ the ``"qpdf"`` key if modifying a JSON file for conversion back to
+ PDF. If you wish to store application-specific metadata, you can
do so by adding a key whose name starts with ``x-``. qpdf is
guaranteed not to add any of its own keys that start with ``x-``.
- When qpdf reads a PDF file, the internal object numbers are always
preserved. However, when qpdf writes a file using ``QPDFWriter``,
@@ -458,9 +448,9 @@ Here are some important things to know about qpdf JSON input.
# edit pdf.json
qpdf in.pdf out.pdf --update-from-json=pdf.json
- The following will not produce predictable results because
- ``out.pdf`` won't have the same object numbers as ``pdf.json`` and
- ``in.pdf``.
+ The following will produce unpredictable and probably incorrect
+ results because ``out.pdf`` won't have the same object numbers as
+ ``pdf.json`` and ``in.pdf``.
::
@@ -658,15 +648,16 @@ be aware of:
- If a PDF file has certain types of errors in its pages tree (such as
page objects that are direct or multiple pages sharing the same
object ID), qpdf will automatically repair the pages tree. If you
- specify ``"objects"`` (and, with qpdf JSON version 1, also
+ specify ``"qpdf"`` (or, with qpdf JSON version 1, ``"objects"`` or
``"objectinfo"``) without any other keys, you will see the original
pages tree without any corrections. If you specify any of keys that
require page tree traversal (for example, ``"pages"``,
- ``"outlines"``, or ``"pagelabel"``), then ``"objects"`` (and
- ``"objectinfo"``) will show the repaired page tree so that object
- references will be consistent throughout the file. This is not an
- issue with :qpdf:ref:`--json-output`, which doesn't repair the pages
- tree.
+ ``"outlines"``, or ``"pagelabel"``), then ``"qpdf"`` (and
+ ``"objects"`` and ``"objectinfo"``) will show the repaired page
+ tree so that object references will be consistent throughout the
+ file. You can tell if this has happened by looking at the
+ ``"calledgetallpages"`` and ``"pushedinheritedpageresources"``
+ fields in the first element of the ``"qpdf"`` array.
- While qpdf guarantees that keys present in the help will be present
in the output, those fields may be null or empty if the information
@@ -743,16 +734,17 @@ version 2.
dictionary containing either a ``"value"`` key or a ``"stream"``
key, making it possible to distinguish streams from other objects.
-- The ``"objectinfo"`` key has been removed in favor of a
- representation in ``"objects"`` that differentiates between a stream
- and other kinds of objects. In v1, it was not possible to tell a
- stream from a dictionary within ``"objects"``.
-
-- Within the ``"objects"`` dictionary, keys are now ``"obj:O G R"``
- where ``O`` and ``G`` are the object and generation number.
- ``"trailer"`` remains the key for the trailer dictionary. In v1, the
- ``obj:`` prefix was not present. The rationale for this change is as
- follows:
+- The ``"objectinfo"`` and ``"objects"`` keys have been removed in
+ favor of a representation in ``"qpdf"`` that includes header
+ information and differentiates between a stream and other kinds of
+ objects. In v1, it was not possible to tell a stream from a
+ dictionary within ``"objects"``, and the PDF version was not
+ captured at all.
+
+- Within the objects dictionary, keys are now ``"obj:O G R"`` where
+ ``O`` and ``G`` are the object and generation number. ``"trailer"``
+ remains the key for the trailer dictionary. In v1, the ``obj:``
+ prefix was not present. The rationale for this change is as follows:
- Having a unique prefix (``obj:``) makes it much easier to search
in the JSON file for the definition of an object