Decide not to allow stream data providers to modify dictionary

author: Jay Berkenbilt <ejb@ql.org> 2020-12-22 21:19:18 +0100
committer: Jay Berkenbilt <ejb@ql.org> 2020-12-26 14:48:20 +0100
commit: 0675a3f61a465f282eba8e1f54bdda3920257959 (patch)
tree: 9b3545baae79933df6bbbc237bdd6b8bdbbfb263 /TODO
parent: cc8895078a1d64928e8ee335f1e8c7d6928de1b3 (diff)
download: qpdf-0675a3f61a465f282eba8e1f54bdda3920257959.tar.zst
1 files changed, 46 insertions, 5 deletions
diff --git a/TODO b/TODO
index 5a3aad47..1479aa56 100644
--- a/TODO
+++ b/TODO
@@ -29,11 +29,6 @@ Candidates for upcoming release
   * big page even with --remove-unreferenced-resources=yes, even with --empty
   * optimize image failure because of colorspace
 
-* Make it possible for StreamDataProvider to modify the stream
-  dictionary in addition to the stream data so it can calculate things
-  about the dictionary at runtime. Will require a small change to
-  QPDFWriter.
-
 * Take flattenRotation code from pdf-split and do something with it,
   maybe adding it to the library. Once there, call it from pdf-split
   and bump up the required version of qpdf.
@@ -558,3 +553,49 @@ I find it useful to make reference to them in this list
    filtering and tokenizer rewrite and should be done in a manner that
    takes advantage of the other lexical features. This sanitizer
    should also clear metadata and replace images.
+
+ * Here are some notes about having stream data providers modify
+   stream dictionaries. I had wanted to add this functionality to make
+   it more efficient to create stream data providers that may
+   dynamically decide what kind of filters to use and that may end up
+   modifying the dictionary conditionally depending on the original
+   stream data. Ultimately I decided not to implement this feature.
+   The notes below explain why.
+
+   * When writing, the way objects are placed into the queue for
+     writing strongly precludes creation of any new indirect objects,
+     or even changing which indirect objects are referenced from which
+     other objects, because we sometimes write as we are traversing
+     and enqueuing objects. For non-linearized files, there is a risk
+     that an indirect object that used to be referenced would no
+     longer be referenced, and whether it was already written to the
+     output file would be based on an accident of where it was
+     encountered when traversing the object structure. For linearized
+     files, the situation is considerably worse. We decide which
+     section of the file to write an object to based on a mapping of
+     which objects are used by which other objects. Changing this
+     mapping could cause an object to appear in the wrong section, to
+     be written even though it is unreferenced, or to be entirely
+     omitted since, during linearization, we don't enqueue new objects
+     as we traverse for writing.
+
+   * There are several places in QPDFWriter that query a stream's
+     dictionary in order to prepare for writing or to make decisions
+     about certain aspects of the writing process. If the stream data
+     provider has the chance to modify the dictionary, every piece of
+     code that gets stream data would have to be aware of this. This
+     would potentially include end user code. For example, any code
+     that called getDict() on a stream before installing a stream data
+     provider and expected that dictionary to be valid would
+     potentially be broken. As implemented right now, you must perform
+     any modifications on the dictionary in advance and provide
+     /Filter and /DecodeParms at the time you install the stream
+     data provider. This means that some computations would have to be
+     done more than once, but for linearized files, stream data
+     providers are already called more than once. If the work done by
+     a stream data provider is especially expensive, it can implement
+     its own cache.
+
+   The implementation of pluggable stream filters includes an example
+   that illustrates how a program might handle making decisions about
+   filters and decode parameters based on the input data.
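
The note's last practical point, that an expensive stream data provider "can implement its own cache" because it may be called more than once, can be sketched roughly as follows. This is a minimal illustration only: `CachingProvider`, `provide()`, `computeCalls()`, and `runLengthEncode()` are hypothetical names invented here and are not part of the qpdf API. A real provider would derive from qpdf's stream data provider class and write to a pipeline rather than return a string.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive transformation of stream data.
// Simple run-length encoding: "aaabcc" becomes "3a1b2c".
std::string runLengthEncode(std::string const& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t j = i;
        while (j < in.size() && in[j] == in[i]) {
            ++j;
        }
        out += std::to_string(j - i);
        out += in[i];
        i = j;
    }
    return out;
}

// Hypothetical provider (NOT the qpdf interface). It computes its output
// the first time it is asked and reuses the result on later calls, so
// being called once per write pass does not repeat the expensive work.
class CachingProvider
{
  public:
    using Compute = std::function<std::string(std::string const&)>;

    CachingProvider(std::string input, Compute compute) :
        input_(std::move(input)),
        compute_(std::move(compute))
    {
    }

    // Return the stream data, computing it only on the first call.
    std::string const& provide()
    {
        if (!cache_) {
            ++compute_calls_;
            cache_ = compute_(input_);
        }
        return *cache_;
    }

    // Number of times the expensive computation actually ran.
    std::size_t computeCalls() const
    {
        return compute_calls_;
    }

  private:
    std::string input_;
    Compute compute_;
    std::optional<std::string> cache_;
    std::size_t compute_calls_{0};
};
```

Under these assumptions, a caller could construct `CachingProvider p("aaabcc", runLengthEncode)`, call `p.provide()` once up front to decide which /Filter and /DecodeParms to set, and let the writer call it again later; the computation runs only once. The cost is that the cached output stays in memory for the life of the provider.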