Major rework -- see long comments

* Replace --create-from-json=file with --json-input, which causes the
  regular input to be treated as json.
* Eliminate --to-json
* In --json=2, bring back "objects" and eliminate "objectinfo". Stream
  data is never present.
* In --json-output=2, write "qpdf-v2" with "objects" and include stream
  data.
author: Jay Berkenbilt <ejb@ql.org> 2022-05-19 00:22:57 +0200
committer: Jay Berkenbilt <ejb@ql.org> 2022-05-20 15:16:25 +0200
commit: 6f43bf8de36b08c55b172b4f4133c79657651666 (patch)
tree: ec17bbf42d9ea78d44ab3b9d2cac363cc6b9bc68 /TODO
parent: 23fc6756f1894e1af35853eb2251f08d5b25cf30 (diff)
download: qpdf-6f43bf8de36b08c55b172b4f4133c79657651666.tar.zst
1 files changed, 35 insertions, 38 deletions
diff --git a/TODO b/TODO
index f3eaebaf..fbf10c05 100644
--- a/TODO
+++ b/TODO
@@ -54,6 +54,10 @@ Soon: Break ground on "Document-level work"
 Output JSON v2
 ==============
 
+Some of this documentation has drifted from the actual implementation.
+
+Make sure pages tree repair generates warnings.
+
 * Reread from perspective of update
 * Test all ignore cases with QTC
 * Test case of correct file with dict before data/datafile
@@ -62,6 +66,7 @@ Output JSON v2
   after creation.
 * Test invalid data, invalid data file
 * Tests: round-trip through json, round-trip through qpdf --qdf
+* Test to see if we get CR/NL on Windows, which is okay
 
 Try to never flatten pages tree. Make sure we do something reasonable
 with pages tree repair. The problem is that if pages tree repair is
@@ -75,6 +80,9 @@ General things to remember:
 * Make sure all the information from --check and other informational
   options (--show-linearization, --show-encryption, --show-xref,
   --list-attachments, --show-npages) is available in the json output.
+  Consider having --check, --show-encryption, etc., just select the
+  right keys when in json mode. I don't think I want check on by
+  default, so that might be different.
 
 * Consider changing the contract to allow fields to be absent even
   when present in the schema. It's reasonable for people to check for
@@ -95,41 +103,32 @@ General things to remember:
 
 JSON to PDF:
 
-Have --create-from-json and --update-from-json. With
---create-from-json, the json file must be complete, meaning all stream
-data, the trailer, and the PDF version must be present. In
---update-from-json, an object explicitly set to null (not "value":
-null) is deleted. For streams with no stream data, the dictionary is
-updated but the data is left untouched. Other things that are omitted
-are left alone. Make sure document that, when writing a PDF file from
-QPDF, there is no expectation of object numbers being preserved. As
-such, --update-from-json can only be used to update the exact file
-that the json was created from. You can put multiple objects in the
-update file, but you can't use a json from one file to update the
-output of a previous update since the object numbers will have
-changed. Note that, when creating from a JSON, object numbers are
-preserved in the resulting QPDF object but still modified by
-QPDFWriter for the output. This would be visible by combining
---to-json and --create-from-json. Also using --qdf with
---create-from-json would show original object IDs in comments. It will
-be important to capture this in the documentation.
+Have --json-input and --update-from-json. With --json-input, the json
+file must be complete, meaning all stream data, the trailer, and the
+PDF version must be present. In --update-from-json, an object
+explicitly set to null (not "value": null) is deleted. For streams
+with no stream data, the dictionary is updated but the data is left
+untouched. Other things that are omitted are left alone. Make sure
+document that, when writing a PDF file from QPDF, there is no
+expectation of object numbers being preserved. As such,
+--update-from-json can only be used to update the exact file that the
+json was created from. You can put multiple objects in the update
+file, but you can't use a json from one file to update the output of a
+previous update since the object numbers will have changed. Note that,
+when creating from a JSON, object numbers are preserved in the
+resulting QPDF object but still modified by QPDFWriter for the output.
+This would be visible by combining --json-output and --json-input.
+Also using --qdf with --create-from-json would show original object
+IDs in comments. It will be important to capture this in the
+documentation.
 
 When reading a JSON string, any string that doesn't look like a name
 or indirect object or start with "b:" or "u:" should be considered an
 error. Just use newUnicodeString on "u:" strings. For "b:" strings,
 decode the bytes with hex_decode and use newString.
 
-For going back from JSON to PDF, we can have
-QPDF::createFromJSON(std::shared_ptr<InputSource>)
-which will have logic similar to copyForeignObject. Note that this
-InputSource is not going to be this->file. We have to keep it
-separately. There's also non-static QPDF::updateFromJSON. Both
-createFromJSON and updateFromJSON will call the same internal method
-with different options. That method will use a reactor that is a
-private QPDF class that just proxies to private QPDF methods.
-
-Test case: combine --create-from-json and --to-json to preservation of
-object numbers. QPDFWriter won't show that although --qdf with the
+Test case: combine --json-input and --json-output to show preservation
+of object numbers. QPDFWriter won't show that although --qdf with the
 original object ID comments would.
 
 The backing input source for createFromJSON is this memory block:
@@ -145,8 +144,7 @@ startxref
 %%EOF
 ```
 
-* Ignore all keys except .qpdf.
-* Verify that .qpdf.jsonVersion is 2
+* Ignore all keys except .qpdf-v2.
 * Set this->m->pdf_version based on the .qpdf.pdfVersion key
 * For each object in .qpdf.objects:
   * Walk through the object detecting any indirect objects. For each
@@ -170,15 +168,14 @@ Documentation:
 
 Serialized PDF:
 
-The JSON output will have a "qpdf" key containing
-* jsonversion
+The JSON output will have a "qpdf-v2" key containing
 * pdfversion
 * maxobjectid
 * objects
 
-The "qpdf" key replaces "objects" and "objectinfo" in v1 JSON.
+In regular json mode, "objectinfo" is gone.
 
-Within .qpdf.objects, the key is "obj:o g R" or "trailer", and the
+Within .objects, the key is "obj:o g R" or "trailer", and the
 value is a dictionary with exactly one of "value" or "stream" as its
 single key.
 
@@ -222,11 +219,11 @@ for when the file is read back in.
 CLI:
 
 Example workflow:
-* qpdf in.pdf --to-json > pdf.json
+* qpdf in.pdf --json-output=2 pdf.json
 * edit pdf.json
-* qpdf --create-from-json=pdf.json out.pdf
+* qpdf --json-input pdf.json out.pdf
 
-* qpdf in.pdf --to-json > pdf.json
+* qpdf in.pdf --json-output=2 pdf.json
 * edit pdf.json keeping only objects that need to be changed
 * qpdf in.pdf --update-from-json=pdf.json out.pdf
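
The string-handling rule in the "JSON to PDF" notes above (a string must look like a name or an indirect object, or start with "b:" or "u:", otherwise it is an error) can be sketched roughly as follows. This is an illustrative Python sketch, not qpdf's implementation: the function name `classify_json_string` is hypothetical, the "o g R" pattern for indirect objects is taken from the key format mentioned in the notes, and treating a leading "/" as the mark of a name is an assumption based on PDF name syntax.

```python
import re

# Assumption: an indirect object reference is written "o g R",
# e.g. "3 0 R" (object number, generation, literal R).
INDIRECT_RE = re.compile(r"[0-9]+ [0-9]+ R")


def classify_json_string(s):
    """Classify a JSON string value, returning a (kind, payload) tuple.

    Hypothetical helper: raises ValueError for any string that is not
    a "u:" string, a "b:" string, an indirect object, or a name.
    """
    if s.startswith("u:"):
        # Unicode string: the text after the prefix is used as-is.
        return ("unicode", s[2:])
    if s.startswith("b:"):
        # Binary string: the text after the prefix is hex-encoded bytes.
        try:
            return ("binary", bytes.fromhex(s[2:]))
        except ValueError as exc:
            raise ValueError("invalid hex in binary string: %r" % s) from exc
    if INDIRECT_RE.fullmatch(s):
        obj, gen, _ = s.split(" ")
        return ("indirect", (int(obj), int(gen)))
    if s.startswith("/"):
        # Assumption: names keep their leading slash, as in PDF syntax.
        return ("name", s)
    raise ValueError("unrecognized string: %r" % s)
```

For example, `classify_json_string("b:68656c6c6f")` yields the bytes for "hello", while a bare string such as "hello" is rejected. In qpdf itself, the notes say the "u:" case maps to newUnicodeString and the "b:" case to hex_decode followed by newString.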