author    Jay Berkenbilt <ejb@ql.org>  2019-12-30 15:17:05 +0100
committer Jay Berkenbilt <ejb@ql.org>  2020-01-13 15:18:36 +0100
commit    49f4600dd6feae74079ad3a3678f6a390bb4e3a1 (patch)
tree      156a1eec254ab858262d9408227b7f0d3431f39b /TODO
parent    0ae19c375ebc24c303765953ff127ecb6a4664dc (diff)
download  qpdf-49f4600dd6feae74079ad3a3678f6a390bb4e3a1.tar.zst
TODO: Move lexical stuff and add detail
Diffstat (limited to 'TODO')
-rw-r--r--  TODO  41
1 file changed, 18 insertions, 23 deletions
diff --git a/TODO b/TODO
index 9654059d..4e367cae 100644
--- a/TODO
+++ b/TODO
@@ -59,29 +59,6 @@ C++-11
time.
-Lexical
-=======
-
- * Make it possible to run the lexer (tokenizer) over a whole file
- such that the following things would be possible:
-
- * Rewrite fix-qdf in C++ so that there is no longer a runtime perl
- dependency
-
- * Make it possible to replace all strings in a file lexically even
on badly broken files. Ideally this should work with files that are
- lacking xref, have broken links, etc., and ideally it should work
- with encrypted files if possible. This should go through the
- streams and strings and replace them with fixed or random
- characters, preferably, but not necessarily, in a manner that
- works with fonts. One possibility would be to detect whether a
- string contains characters with normal encoding, and if so, use
- 0x41. If the string uses character maps, use 0x01. The output
- should otherwise be unrelated to the input. This could be built
- after the filtering and tokenizer rewrite and should be done in a
- manner that takes advantage of the other lexical features. This
- sanitizer should also clear metadata and replace images.
-
Page splitting/merging
======================
@@ -407,3 +384,21 @@ I find it useful to make reference to them in this list
* If I ever decide to make appearance stream-generation aware of
fonts or font metrics, see email from Tobias with Message-ID
<5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.
+
+ * Consider creating a sanitizer to make it easier for people to send
+ broken files. Now that we have json mode, this is probably no
+ longer worth doing. Here is the previous idea, possibly implemented
+ by making it possible to run the lexer (tokenizer) over a whole
+ file. Make it possible to replace all strings in a file lexically
even on badly broken files. Ideally this should work with files that
+ lacking xref, have broken links, etc., and ideally it should work
+ with encrypted files if possible. This should go through the
+ streams and strings and replace them with fixed or random
+ characters, preferably, but not necessarily, in a manner that works
+ with fonts. One possibility would be to detect whether a string
+ contains characters with normal encoding, and if so, use 0x41. If
+ the string uses character maps, use 0x01. The output should
+ otherwise be unrelated to the input. This could be built after the
+ filtering and tokenizer rewrite and should be done in a manner that
+ takes advantage of the other lexical features. This sanitizer
+ should also clear metadata and replace images.