authorJay Berkenbilt <ejb@ql.org>2018-01-14 02:18:28 +0100
committerJay Berkenbilt <ejb@ql.org>2018-01-14 02:19:11 +0100
commit512a518dd9327147444d4207cc395bff967d1079 (patch)
treef769b2c5bf47db41becf3d728543a71143262965
parentf34af6b8c18cf035814fa4f02dddb3dc964cfa3c (diff)
downloadqpdf-512a518dd9327147444d4207cc395bff967d1079.tar.zst
Update TODO
-rw-r--r--  TODO  41
1 files changed, 40 insertions, 1 deletions
diff --git a/TODO b/TODO
index eb89388c..33d9fa01 100644
--- a/TODO
+++ b/TODO
@@ -1,6 +1,10 @@
Soon
====
+ * Take changes on encryption-keys branch and make them usable.
+ Replace the hex encoding and decoding piece, and come up with a
+ more robust way of specifying the key.
+
* Consider whether there should be a mode in which QPDFObjectHandle
returns nulls for operations on the wrong type instead of asserting
the type. The way things are wired up now, this would have to be a
@@ -19,7 +23,7 @@ Soon
* Support user-pluggable stream filters. This would enable external
code to provide interpretation for filters that are missing from
- qpdf. Make it possible for user-provided fitlers to override
+ qpdf. Make it possible for user-provided filters to override
built-in filters. Make sure that the pluggable filters can be
prioritized so that we can poll all registered filters to see
whether they are capable of filtering a particular stream.
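The pluggable-filter idea above could be sketched roughly as follows. This is an illustrative design only, not qpdf API: `StreamFilter`, `FilterRegistry`, `registerFilter`, and `UpcaseFilter` are invented names, and the priority-ordered multimap is just one way to let user-provided filters override built-in ones.

```cpp
// Hypothetical sketch of user-pluggable stream filters; all names here
// are invented for illustration and are not part of qpdf.
#include <cassert>
#include <cctype>
#include <map>
#include <memory>
#include <string>

struct StreamFilter
{
    virtual ~StreamFilter() = default;
    // True if this filter can decode the named /Filter entry.
    virtual bool canDecode(std::string const& name) const = 0;
    virtual std::string decode(std::string const& data) const = 0;
};

class FilterRegistry
{
  public:
    // Filters with higher priority are polled first, so a
    // user-provided filter can shadow a built-in one that claims
    // the same /Filter name.
    void registerFilter(int priority, std::shared_ptr<StreamFilter> f)
    {
        filters.insert({priority, std::move(f)});
    }

    // Poll all registered filters in priority order; return the first
    // one that claims the name, or nullptr if none can handle it.
    std::shared_ptr<StreamFilter> find(std::string const& name) const
    {
        for (auto const& iter: filters) {
            if (iter.second->canDecode(name)) {
                return iter.second;
            }
        }
        return nullptr;
    }

  private:
    // Keyed by descending priority.
    std::multimap<int, std::shared_ptr<StreamFilter>,
                  std::greater<int>> filters;
};

// A toy filter that "decodes" by upper-casing, standing in for a real
// decoder a user might supply for a filter qpdf lacks.
struct UpcaseFilter: StreamFilter
{
    bool canDecode(std::string const& name) const override
    {
        return name == "/Upcase";
    }
    std::string decode(std::string const& data) const override
    {
        std::string out;
        for (char c: data) {
            out.push_back(static_cast<char>(
                std::toupper(static_cast<unsigned char>(c))));
        }
        return out;
    }
};
```

A registry like this would let the library poll every registered filter for a given stream and fall through to nullptr (or a built-in passthrough) when nothing claims it.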
@@ -37,6 +41,41 @@ Soon
- See ../misc/broken-files
+Lexical
+=======
+
+Consider rewriting the tokenizer. These are rough ideas at this point.
+I may or may not do this as described.
+
+ * Use flex. Generate the lexer sources from ./autogen.sh and
+ include them in the source package, but do not commit them.
+
+ * Make it possible to run the lexer (tokenizer) over a whole file
+ such that the following things would be possible:
+
+ * Rewrite fix-qdf in C++ so that there is no longer a runtime perl
+ dependency
+
+ * Create a way to filter content streams that could be used to
+ preserve the content stream exactly including spaces but also to
+ do things like replace everything between a detected set of
+ markers. This is to support form flattening. Ideally, it should
+ be possible to use this programmatically on broken files.
+
+ * Make it possible to replace all strings in a file lexically even
+ on badly broken files. Ideally this should work on files that are
+ lacking xref, have broken links, etc., and ideally it should work
+ with encrypted files if possible. This should go through the
+ streams and strings and replace them with fixed or random
+ characters, preferably, but not necessarily, in a manner that
+ works with fonts. One possibility would be to detect whether a
+ string contains characters with normal encoding, and if so, use
+ 0x41. If the string uses character maps, use 0x01. The output
+ should otherwise be unrelated to the input. This could be built
+ after the filtering and tokenizer rewrite and should be done in a
+ manner that takes advantage of the other lexical features. This
+ sanitizer should also clear metadata and replace images.
+
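The string-sanitizing rule described in the last bullet (0x41 for normally encoded strings, 0x01 for strings that go through a character map) could look something like this. The detection heuristic here, a printable-byte ratio, is purely an assumption for the sake of a runnable sketch; a real implementation would consult the font's encoding.

```cpp
// Rough sketch of the sanitizing replacement described above.
// looksLikeNormalEncoding and its threshold are assumptions made for
// this example, not a worked-out design.
#include <cassert>
#include <cctype>
#include <string>

// Guess whether a string's bytes look like a normal single-byte
// encoding (mostly printable) rather than CID bytes produced through
// a character map.
bool looksLikeNormalEncoding(std::string const& s)
{
    if (s.empty()) {
        return true;
    }
    size_t printable = 0;
    for (unsigned char c: s) {
        if (std::isprint(c)) {
            ++printable;
        }
    }
    // Arbitrary 75% threshold, chosen only for illustration.
    return printable * 4 >= s.size() * 3;
}

// Replace every byte so the output is unrelated to the input: 0x41
// ('A') when the string appears normally encoded, 0x01 when it
// appears to go through a character map. Length is preserved so the
// file's lexical structure is undisturbed.
std::string sanitizeString(std::string const& s)
{
    char fill = looksLikeNormalEncoding(s) ? '\x41' : '\x01';
    return std::string(s.size(), fill);
}
```

Keeping the replacement the same length as the original is what would let this run purely lexically, without needing a usable xref table.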
General
=======