author: Jay Berkenbilt <ejb@ql.org> 2022-05-30 22:38:17 +0200
committer: Jay Berkenbilt <ejb@ql.org> 2022-05-31 02:03:08 +0200
commit: 0bd908b550603a6bcc399a825a170a1263378b22
tree: d4cf0cdeafa09fb914e744b5708b5bc6e49b16f2 /TODO
parent: b7bbf12e85fa46e7971d84143d1597c992045af1
Update documentation for qpdf JSON v2
Diffstat (limited to 'TODO')
-rw-r--r-- TODO 201
1 files changed, 26 insertions, 175 deletions
diff --git a/TODO b/TODO
index b52dd93a..7d70e254 100644
--- a/TODO
+++ b/TODO
@@ -2,14 +2,13 @@
Next
====
+Before Release:
+
* At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs
* Stay on top of https://github.com/pikepdf/pikepdf/pull/315
* Release qtest with updates to qtest-driver and copy back into qpdf
-In order:
-* json v2
-
-Other (do in any order):
+Pending changes:
* Good C API for json v2
* QPDFPagesTree -- avoid ever flattening the pages tree.
@@ -50,180 +49,10 @@ Other (do in any order):
* Rework tests so that nothing is written into the source directory.
Ideally then the entire build could be done with a read-only
source tree.
+* Consider adding fuzzer code for JSON
Soon: Break ground on "Document-level work"
-Output JSON v2
-==============
-
-Remaining work:
-
-* Make sure all the information from informational options is
- available in the json output.
-
- * --check: add but maybe not by default?
-
- * --show-linearization: add but maybe not by default? Also figure
- out whether warnings reported for some of the PDF specs (1.7) are
- qpdf problems. This may not be worth adding in the first
- increment.
-
- * --show-xref: add
-
-* Consider having --check, --show-encryption, etc., just select the
- right keys when in json mode. I don't think I want check on by
- default, so that might be different.
-
-* Consider having warnings be included in the json in a "warnings" key
- in json mode.
-
-Notes for documentation:
-
-* Find all mentions of json in the manual and update.
-
-* Document typo fix in encrypt in release notes along with any other
- non-compatible json 2 changes. Scrutinize all the output to decide
- what should change.
-
-* Keys other than "qpdf-v2" are ignored so people can stash their own
- stuff. Unknown keys are ignored at other places for future
- compatibility. Readers of qpdf json should continue to ignore keys
- they don't recognize.
-
-* Change: names are written in canonical form with a leading slash
- just as they are treated in the code. In v1, they were written in
- PDF syntax in the json file. Example: /text#2fplain in pdf will be
- written as /text/plain in json v2 and as /text#2fplain in json v1.
-
-* Document changes to strings, objects, streams, object keys.
-
-* CLI: --json-input, --json-output[=version], --update-from-json. With
- --json-input, the input file is a JSON file instead of a PDF file.
- It must be complete, meaning that a PDF version must be given, all
- streams must have exactly one of data or datafile, and a trailer
- dictionary must be present, even if empty.
-
- With --update-from-json, the JSON file updates objects in place. If
- updating an old stream, if stream data is omitted, the data remains
- untouched. The dictionary is always required. Remember that
- QPDFWriter does not preserve object numbers, though --json-output
- does. Therefore, if you want to update a PDF with a JSON, the input
- to --update-from-json must be the same PDF as the one that
- --json-output was run on previously. Otherwise, object numbers won't
- match. Show this with an example. When updating,
-
-* Certain fields are ignored when reading the JSON. This includes
- maxobjectid, any computed fields in trailer (such as /Size), and all
- /Length keys in stream dictionaries. There is no need for the user
- to correct, remove, or otherwise worry about any values those keys
- might have. The maxobjectid field is present in the original output
- to assist with adding new objects to the file.
-
-* JSON strings within PDF objects:
-
- * "n n R" is an indirect object
-
- * "/Name" is a name in canonical form with a leading slash (like
- "/text/plain"), not PDF syntax (like "/text#2fplain").
-
- * "b:hex-digits" is a binary string ("b:feff03c0"). Hex digits may be
- mixed case. There must be an even number of digits.
-
- * "u:utf-8" is a UTF-8 encoded string ("u:π", "u:\u03c0"). UTF-16
- surrogate pairs are allowed. These are all equivalent: "u:🥔",
- "u:\ud83e\udd54", "b:FEFFD83EDD54", "b:efbbbff09fa594".
-
- * Both "b:" and "u:" are valid representations of the empty string.
-
- * Anything else is an error
-
-* Document use of --json-input and --json-output together to show
- preservation of object numbers. Draw attention to "original object
- ID" comments in qdf as another way to show it.
-
-* Document top-level keys of "qpdf-v2" ("pdfversion", "objects",
- "maxobjectid") noting that "maxobjectid" is ignored when reading.
-
-* Stream data: "data" is base64-encoded stream data. "datafile" is the
- path to a file (relative path recommended but not required)
- containing the binary data. As with any PDF representation, the data
- must be consistent with the filters. --decode-level is honored by
- --json-output.
-
-* Other changes from v1:
-
- * in "objects", keys are "obj:o g R" or "trailer"
-
- * Non-stream objects are dictionaries with a "value" key whose value
- is the object. Stream objects are dictionaries with a "stream" key
- whose value is {"dict": stream-dictionary}. The "/Length" key is
- omitted from the stream dictionary.
-
- * "objectinfo" is gone as it is now possible to tell a stream from a
- non-stream directly. To get stream data, use the --json-output
- option. Note about how "pages" may cause the pages tree to be
- corrected.
-
-For non-streams:
-
- "obj:o g R": {
- "value": ...
- }
-
-For streams:
-
- "obj:o g R": {
- "stream": {
- "dict": { ... stream dictionary ... },
- "data": "base64-encoded data",
- "datafile": "path to base64-encoded data"
- }
- }
-
-Rationale of "obj:o g R" is that indirect object references are just
-"o g R", and so code that wants to resolve one can do so easily by
-just prepending "obj:" and not having to parse or split the string.
-Having a prefix rather than making the key just "o g R" makes it much
-easier to search in the JSON for the definition of an object.
-
-CLI:
-
-Example workflow:
-* qpdf in.pdf --json-output pdf.json
-* edit pdf.json
-* qpdf --json-input pdf.json out.pdf
-
-* qpdf in.pdf --json-output pdf.json
-* edit pdf.json keeping only objects that need to be changed
-* qpdf in.pdf --update-from-json=pdf.json out.pdf
-
-To modify a single object:
-
-* qpdf in.pdf --json-output pdf.json --json-object=o,g
-* edit pdf.json
-* qpdf in.pdf --update-from-json=pdf.json out.pdf
-
-Historical note: you can't create a PDF from v1 json because
-
-* The PDF version header is not recorded
-
-* Strings cannot be unambiguously encoded/decoded
-
- * Can't tell string from name from indirect object
-
- * Strings are treated as PDF doc encoding and output as UTF-8, which
- doesn't work since multiple PDF doc code points are undefined and
- is absurd for binary strings
-
-* There is no representation of stream data
-
-* You can't tell a stream from a dictionary except by looking in both
- "object" and "objectinfo".
-
-* Using "n n R" as a key in "objects" and "objectinfo" makes it hard
- to search for things when viewing the JSON file in an editor.
-
-
QPDFPagesTree
=============
@@ -256,6 +85,28 @@ sure /Count and /Parent are correct.
refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
when done.
+Possible future JSON enhancements
+=================================
+
+* Add to JSON output the information available from a few additional
+ informational options:
+
+ * --check: add but maybe not by default?
+
+ * --show-linearization: add but maybe not by default? Also figure
+ out whether warnings reported for some of the PDF specs (1.7) are
+ qpdf problems. This may not be worth adding in the first
+ increment.
+
+ * --show-xref: add
+
+* Consider having --check, --show-encryption, etc., just select the
+ right keys when in json mode. I don't think I want check on by
+ default, so that might be different.
+
+* Consider having warnings be included in the json in a "warnings" key
+ in json mode.
+
QPDFJob
=======