From 99101044429c3c91bd11bdd1b26e5b6c2ceb140b Mon Sep 17 00:00:00 2001
From: Jay Berkenbilt
Date: Fri, 2 Feb 2018 18:21:34 -0500
Subject: Implement TokenFilter and refactor Pl_QPDFTokenizer

Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a
TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a
general filter that passes data through a TokenFilter.
---
 ChangeLog | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 256d83ea..20cb0e80 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -107,6 +107,49 @@
 	applications that use page-level APIs in QPDFObjectHandle to be
 	more tolerant of certain types of damaged files.
 
+	* Add QPDFObjectHandle::TokenFilter class and methods to use it to
+	perform lexical filtering on content streams. You can call
+	QPDFObjectHandle::addTokenFilter on a stream object, or you can call
+	the higher level QPDFObjectHandle::addContentTokenFilter on a page
+	object to cause the stream's contents to be passed through a token
+	filter while being retrieved by QPDFWriter or any other consumer.
+	For details on using TokenFilter, please see comments in
+	QPDFObjectHandle.hh.
+
+	* Enhance the string, type QPDFTokenizer::Token constructor to
+	initialize a raw value in addition to a value. Tokens have a
+	value, which is a canonical representation, and a raw value. For
+	all tokens except strings and names, the raw value and the value
+	are the same. For strings, the value excludes the outer delimiters
+	and has non-printing characters normalized. For names, the value
+	resolves non-printing characters. In order to better facilitate
+	token filters that mostly preserve contents and to enable
+	developers to be mostly unconcerned about the nuances of token
+	values and raw values, creating string and name tokens now
+	properly handles this subtlety of values and raw values. When
+	constructing string tokens, take care to avoid passing in the
+	outer delimiters. This has always been the case, but it is now
+	clarified in comments in QPDFObjectHandle.hh::TokenFilter. This
+	has no impact on any existing code unless there's some code
+	somewhere that was relying on Token::getRawValue() returning an
+	empty string for a manually constructed token. The token class's
+	operator== method still only looks at type and value, not raw
+	value. For example, string tokens for <41> and (A) would still be
+	equal because both are representations of the string "A".
+
+	* Add QPDFObjectHandle::isDataModified method. This method just
+	returns true if addTokenFilter has been called on the stream. It
+	enables a caller to determine whether it is safe to optimize away
+	piping of stream data in cases where the input and output are
+	expected to be the same. QPDFWriter uses this internally to skip
+	the optimization of not re-compressing already compressed streams
+	if addTokenFilter has been called. Most developers will not have
+	to worry about this as it is used internally in the library in the
+	places that need it. If you are manually retrieving stream data
+	with QPDFObjectHandle::getStreamData or
+	QPDFObjectHandle::pipeStreamData, you don't need to worry about
+	this at all.
+
 2018-02-04  Jay Berkenbilt
 
 	* Add QPDFWriter::setLinearizationPass1Filename method and
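
The value versus raw value rule for string tokens, as described in the second ChangeLog entry, can be sketched roughly as follows. This is a minimal, self-contained model and not qpdf's QPDFTokenizer::Token: the names Token, TokenType, make_string_token, and check_examples are hypothetical, and the escaping shown (a backslash before parentheses and backslashes only) is an assumed simplification of whatever the library really does.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a tokenizer token; NOT the qpdf class.
enum class TokenType { String, Name, Word };

struct Token {
    TokenType type;
    std::string value;      // canonical form, without outer delimiters
    std::string raw_value;  // text as it would appear in the stream

    // Models the rule in the entry: equality looks at type and value
    // only, never at the raw value.
    bool operator==(Token const& rhs) const {
        return type == rhs.type && value == rhs.value;
    }
};

// Build a string token from a value passed WITHOUT outer delimiters.
// The raw value is derived by adding the delimiters back.
// Assumption: only '(', ')' and '\\' are escaped in this sketch.
Token make_string_token(std::string const& value) {
    std::string raw = "(";
    for (char c : value) {
        if (c == '(' || c == ')' || c == '\\') {
            raw += '\\';
        }
        raw += c;
    }
    raw += ")";
    return Token{TokenType::String, value, raw};
}

// Small self-check mirroring the <41> versus (A) example in the entry.
void check_examples() {
    Token literal = make_string_token("A");
    Token hex{TokenType::String, "A", "<41>"};
    assert(literal.raw_value == "(A)");
    assert(hex.raw_value != literal.raw_value);
    assert(hex == literal);  // same type and value, so equal
}
```

Calling check_examples() from a test or main function exercises the two representations. The point of the design is that a filter can compare tokens by meaning while still being able to write back the original bytes.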