author | Jay Berkenbilt <ejb@ql.org> | 2020-12-22 21:19:18 +0100 |
---|---|---|
committer | Jay Berkenbilt <ejb@ql.org> | 2020-12-26 14:48:20 +0100 |
commit | 0675a3f61a465f282eba8e1f54bdda3920257959 (patch) | |
tree | 9b3545baae79933df6bbbc237bdd6b8bdbbfb263 /TODO | |
parent | cc8895078a1d64928e8ee335f1e8c7d6928de1b3 (diff) | |
Decide not to allow stream data providers to modify dictionary
Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 51 |
1 files changed, 46 insertions, 5 deletions
@@ -29,11 +29,6 @@ Candidates for upcoming release
    * big page even with --remove-unreferenced-resources=yes, even with --empty
    * optimize image failure because of colorspace
 
-* Make it possible for StreamDataProvider to modify the stream
-  dictionary in addition to the stream data so it can calculate things
-  about the dictionary at runtime. Will require a small change to
-  QPDFWriter.
-
 * Take flattenRotation code from pdf-split and do something with it,
   maybe adding it to the library. Once there, call it from pdf-split
   and bump up the required version of qpdf.
@@ -558,3 +553,49 @@ I find it useful to make reference to them in this list
    filtering and tokenizer rewrite and should be done in a manner that
    takes advantage of the other lexical features. This sanitizer
    should also clear metadata and replace images.
+
+ * Here are some notes about having stream data providers modify
+   stream dictionaries. I had wanted to add this functionality to make
+   it more efficient to create stream data providers that may
+   dynamically decide what kind of filters to use and that may end up
+   modifying the dictionary conditionally depending on the original
+   stream data. Ultimately I decided not to implement this feature.
+   This paragraph describes why.
+
+   * When writing, the way objects are placed into the queue for
+     writing strongly precludes creation of any new indirect objects,
+     or even changing which indirect objects are referenced from which
+     other objects, because we sometimes write as we are traversing
+     and enqueuing objects. For non-linearized files, there is a risk
+     that an indirect object that used to be referenced would no
+     longer be referenced, and whether it was already written to the
+     output file would be based on an accident of where it was
+     encountered when traversing the object structure. For linearized
+     files, the situation is considerably worse. We decide which
+     section of the file to write an object to based on a mapping of
+     which objects are used by which other objects. Changing this
+     mapping could cause an object to appear in the wrong section, to
+     be written even though it is unreferenced, or to be entirely
+     omitted since, during linearization, we don't enqueue new objects
+     as we traverse for writing.
+
+   * There are several places in QPDFWriter that query a stream's
+     dictionary in order to prepare for writing or to make decisions
+     about certain aspects of the writing process. If the stream data
+     provider has the chance to modify the dictionary, every piece of
+     code that gets stream data would have to be aware of this. This
+     would potentially include end user code. For example, any code
+     that called getDict() on a stream before installing a stream data
+     provider and expected that dictionary to be valid would
+     potentially be broken. As implemented right now, you must perform
+     any modifications on the dictionary in advance and provided
+     /Filter and /DecodeParms at the time you installed the stream
+     data provider. This means that some computations would have to be
+     done more than once, but for linearized files, stream data
+     providers are already called more than once. If the work done by
+     a stream data provider is especially expensive, it can implement
+     its own cache.
+
+   The implementation of pluggable stream filters includes an example
+   that illustrates how a program might handle making decisions about
+   filters and decode parameters based on the input data.
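
The note's closing advice, that a stream data provider whose work is expensive can implement its own cache, can be sketched as follows. This is a minimal, self-contained illustration and not qpdf code: `Sink`, `CachingProvider`, and `provide_twice` are hypothetical names that only stand in for the roles of a pipeline, a stream data provider, and a writer that requests the same data more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for an output sink; this is NOT qpdf's Pipeline.
struct Sink {
    std::string data;
    void write(std::string const& bytes) { data += bytes; }
};

// Hypothetical provider that caches the result of an expensive computation
// so that repeated requests (as the note says happen for linearized files)
// only pay for the computation once.
class CachingProvider {
public:
    explicit CachingProvider(std::function<std::string()> produce)
        : produce_(std::move(produce)) {
        assert(static_cast<bool>(produce_)); // a producer callback is required
    }

    // Write the stream data to the sink, computing it on first use only.
    void provide(Sink& sink) {
        if (!cached_) {
            cache_ = produce_();
            cached_ = true;
            ++computations_;
        }
        sink.write(cache_);
    }

    // Number of times the expensive producer has actually run.
    std::size_t computations() const { return computations_; }

private:
    std::function<std::string()> produce_;
    std::string cache_;
    bool cached_ = false;
    std::size_t computations_ = 0;
};

// Simulates a writer that asks the provider for its data twice and
// returns how many times the producer ran.
inline std::size_t provide_twice(CachingProvider& provider, std::string& out) {
    Sink first;
    Sink second;
    provider.provide(first);
    provider.provide(second);
    assert(first.data == second.data); // both passes must see identical bytes
    out = second.data;
    return provider.computations();
}
```

Because the cached bytes never change between calls, any filter and decode parameters chosen in advance stay consistent with what the provider emits on every pass, which is the constraint the note describes.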