From 62d47bff523ec6b64161651b33bb1563b6a80776 Mon Sep 17 00:00:00 2001
From: Jay Berkenbilt
Date: Sat, 21 May 2022 16:48:36 -0400
Subject: TODO: notes on QPDFPagesTree

---
 TODO | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/TODO b/TODO
index b13deecc..c8fe968c 100644
--- a/TODO
+++ b/TODO
@@ -11,6 +11,7 @@ In order:
 
 Other (do in any order):
 
+* QPDFPagesTree -- avoid ever flattening the pages tree.
 * Check about runpath in the linux-bin distribution. I think the
   appimage build specifically is setting the runpath, which is
   actually desirable in this case. Make sure to understand and
@@ -56,17 +57,8 @@ Output JSON v2
 
 Some of this documentation has drifted from the actual implementation.
 
-Make sure pages tree repair generates warnings.
-
 * Document that /Length is ignored in stream dictionary replacements
 
-Try to never flatten pages tree. Make sure we do something reasonable
-with pages tree repair. The problem is that if pages tree repair is
-done as a side effect of running --json, the qpdf part of the json may
-contain object numbers that aren't there. Maybe we need to indicate
-whether pages tree repair has been done in the json, but this would
-have to be known early in parsing, which is a problem.
-
 General things to remember:
 
 * Make sure all the information from --check and other informational
@@ -240,6 +232,38 @@ Additionally, using "n n R" as a key in "objects" and "objectinfo"
 messes up searching for things.
 
 
+QPDFPagesTree
+=============
+
+Partial work is on the qpdf-pages-tree branch. QPDFPagesTree is mostly
+implemented and mostly tested. There are not enough test cases for
+different kinds of operations (pclm, linearize, json, etc.) with
+non-flat pages trees. Insertion is not implemented.
+
+Page tree repair is silent (no warnings) and has a comment saying that
+we don't need warnings, but I think we should have warnings now that
+we have json v2. The reason is that page tree repair will change
+object numbers, and it's useful to know that.
+
+I'm thinking we will want to keep a pages cache for efficient
+insertion. There's no reason we can't keep a vector of page objects up
+to date and just do a traversal the first time getAllPages is called,
+as we do now. The difference is that we would not flatten the pages
+tree. It would be useful to go through QPDF_pages and reimplement
+everything without calling flattenPagesTree. Then we can remove
+flattenPagesTree, which is private.
+
+In its current state, QPDFPagesTree does not proactively fix /Type or
+repair page objects that are used multiple times. You have to
+traverse the pages tree to trigger this operation. It would be nice to
+do that somewhere, but not more often than necessary, so that
+isPagesObject and isPageObject are reliable (and can be made more so).
+Maybe add a validate or repair function? It should also make sure
+/Count and /Parent are correct.
+
+refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
+when done.
 
 QPDFJob
 =======
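
A minimal C++ sketch of two ideas from the notes above: a page vector filled by one traversal on the first getAllPages() call without flattening the tree, and a repair pass that fixes /Parent and /Count and records a warning for each change. Node, PagesCache, and repair are hypothetical names for illustration only, not qpdf's classes or API. The sketch uses raw pointers instead of QPDFObjectHandle and assumes the tree has no loops and that no page object appears twice; fixing /Type is not shown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a pages tree node (not qpdf's API). It only
// models the /Type, /Kids, /Parent and /Count entries.
struct Node
{
    std::string type;         // "/Pages" for interior nodes, "/Page" for leaves
    std::vector<Node*> kids;  // /Kids (empty for leaves)
    Node* parent = nullptr;   // /Parent (nullptr for the root)
    std::size_t count = 0;    // /Count (only meaningful for /Pages nodes)
};

// Page cache built by a single traversal the first time getAllPages() is
// called. The tree is only read, never flattened, so intermediate /Pages
// nodes stay where they are.
class PagesCache
{
  public:
    explicit PagesCache(Node* root_node) :
        root(root_node)
    {
    }

    std::vector<Node*> const&
    getAllPages()
    {
        if (!valid) {
            pages.clear();
            collect(root);
            valid = true;
        }
        return pages;
    }

    // Simplification: after a structural change, drop the cache and rebuild
    // it on the next call rather than updating the vector in place.
    void
    invalidate()
    {
        valid = false;
    }

  private:
    // Depth-first, left-to-right walk, which yields pages in document order.
    void
    collect(Node* node)
    {
        if (node->type == "/Page") {
            pages.push_back(node);
            return;
        }
        for (Node* kid: node->kids) {
            collect(kid);
        }
    }

    Node* root;
    std::vector<Node*> pages;
    bool valid = false;
};

// Hypothetical repair pass: set each node's /Parent to the node whose /Kids
// lists it and recompute /Count bottom-up, recording a warning for every
// change. Returns the number of /Page leaves under `node`.
std::size_t
repair(Node* node, Node* parent, std::vector<std::string>& warnings)
{
    if (node->parent != parent) {
        warnings.push_back("corrected /Parent");
        node->parent = parent;
    }
    if (node->type == "/Page") {
        return 1;
    }
    std::size_t total = 0;
    for (Node* kid: node->kids) {
        total += repair(kid, node, warnings);
    }
    if (node->count != total) {
        warnings.push_back("corrected /Count");
        node->count = total;
    }
    return total;
}
```

Because the cache holds pointers into the unmodified tree, intermediate /Pages nodes keep their object identities, and any /Count or /Parent fixes show up as explicit warnings instead of happening silently.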