path: root/libqpdf/QPDFTokenizer.cc
Age | Commit message | Author
2019-08-20 | Improve invalid name token warning message | Jay Berkenbilt
This message used to only appear for PDF >= 1.2. The invalid name is valid for PDF 1.0 and 1.1. However, since QPDFWriter may write a newer version, it's better to detect and warn in all cases. Therefore make the warning more informative.
2019-08-20 | Handle invalid name tokens symmetrically for PDF < 1.2 (fixes #332) | Jay Berkenbilt
2019-06-22 | Remove broken QPDFTokenizer::expectInlineImage | Jay Berkenbilt
2019-06-21 | Fix sign and conversion warnings (major) | Jay Berkenbilt
This makes all integer type conversions that have potential data loss explicit with calls that do range checks and raise an exception. After this commit, qpdf builds with no warnings when -Wsign-conversion -Wconversion is used with gcc or clang or when -W3 -Wd4800 is used with MSVC. This significantly reduces the likelihood of potential crashes from bogus integer values. There are some parts of the code that take int when they should take size_t or an offset. Such places would make qpdf not support files with more than 2^31 of something that usually wouldn't be so large. In the event that such a file shows up and is valid, at least qpdf would raise an error in the right spot so the issue could be legitimately addressed rather than failing in some weird way because of a silent overflow condition.
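The kind of explicit, range-checked conversion this commit describes can be sketched as follows. `checked_cast` is a hypothetical helper written for illustration only; it is not taken from qpdf, and the exception type is an assumption.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical helper for illustration; not qpdf's actual API.
// Converts between integer types and throws std::range_error when the
// value does not fit, instead of silently truncating or wrapping.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only supports integer types");
    To result = static_cast<To>(value);
    // The value fits only if it survives the round trip and keeps its sign.
    bool round_trips = (static_cast<From>(result) == value);
    bool same_sign = ((value < From(0)) == (result < To(0)));
    if (!(round_trips && same_sign)) {
        throw std::range_error(
            "integer out of range converting " + std::to_string(value));
    }
    return result;
}
```

With a helper like this, a call such as `checked_cast<int>(some_offset)` fails at the conversion site when the value is too large, which is the "error in the right spot" behavior the message refers to.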
2019-03-12 | Undefined functions because of missing std:: or header. (#295) | Thorsten Schöning
* [bcc32 Error] QPDF.cc(375): E2268 Call to undefined function 'atof'
  Full parser context
  QPDF.cc(358): parsing: void QPDF::parse(const char *)
* [bcc32 Error] QPDFTokenizer.cc(183): E2268 Call to undefined function 'strtol'
  Full parser context
  QPDFTokenizer.cc(163): parsing: void QPDFTokenizer::resolveLiteral()
* [bcc32 Error] pdf-split-pages.cc(52): E2268 Call to undefined function 'exit'
  Full parser context
  pdf-split-pages.cc(50): parsing: void usage()
* PR #295: Including "cstdlib" should be replaced with "stdlib.h" to be more consistent. At the same time I changed the order of the surrounding includes to alphabetical order, because in some files this was already the case.
2019-02-01 | Make inline image token exactly contain the image data | Jay Berkenbilt
Do not include the trailing EI, and handle cases where EI is not preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31 | Improve locating inline image's EI | Jay Berkenbilt
We've actually seen a PDF file in the wild that contained EI surrounded by delimiters inside the image data, which confused qpdf's naive code. This significantly improves EI detection.
2019-01-31 | Refactor QPDFTokenizer's inline image handling | Jay Berkenbilt
Add a version of expectInlineImage that takes an input source and searches for EI. This is in preparation for improving the way EI is found. This commit just refactors the code without changing the functionality and adds tests to make sure the old and new code behave identically.
2019-01-31 | Inline image token value ends with EI, not delimiter | Jay Berkenbilt
The inline image token erroneously included the delimiter that followed EI. The ObjectHandle created from it was correct.
2018-08-06 | Fix EOL handling inside strings (fixes #226) | Jay Berkenbilt
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is to be ignored after backslash.
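The two rules stated above can be sketched in isolation. This is a minimal illustration, not qpdf's code: `normalize_string_eol` and `eol_length` are hypothetical names, the input is assumed to be the body of a literal string without its parentheses, and every escape other than backslash followed by an EOL is copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Length of the end-of-line sequence starting at pos: 2 for CRLF, else 1.
// Assumes data[pos] is CR or LF.
static std::size_t eol_length(std::string const& data, std::size_t pos)
{
    bool crlf = data[pos] == '\r' && pos + 1 < data.size() &&
        data[pos + 1] == '\n';
    return crlf ? 2 : 1;
}

// Illustrative only: applies just the EOL rules to a literal string body.
std::string normalize_string_eol(std::string const& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            char next = in[i + 1];
            if (next == '\r' || next == '\n') {
                // Line continuation: drop the backslash and exactly one EOL.
                i += 1 + eol_length(in, i + 1);
            } else {
                // Some other escape: copy both characters untouched.
                out += c;
                out += next;
                i += 2;
            }
        } else if (c == '\r' || c == '\n') {
            // CR, CRLF, and LF all become a single LF.
            out += '\n';
            i += eol_length(in, i);
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

Note that a backslash followed by two newlines keeps the second newline, since only one EOL is ignored after the backslash.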
2018-05-05 | Fix small logic error in Token construct (fixes #206) | Jay Berkenbilt
The special case around name token was not reachable. This would only affect constructors of name tokens that were represented in non-canonical form such as with a hex substitution for a printable character. The error was harmless but still a bug.
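To make "non-canonical form" concrete: in PDF name syntax, `#` followed by two hex digits stands for the character with that code, so `/A#42C` and `/ABC` denote the same name. The sketch below decodes such substitutions; `canonicalize_name` and `hex_value` are hypothetical helpers for illustration, not the code this commit touched, and other name rules are ignored.

```cpp
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not a hex digit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') { return c - '0'; }
    if (c >= 'a' && c <= 'f') { return c - 'a' + 10; }
    if (c >= 'A' && c <= 'F') { return c - 'A' + 10; }
    return -1;
}

// Illustrative only: decode #XX substitutions in a name token.
// Returns false if a '#' is not followed by two hex digits.
bool canonicalize_name(std::string const& raw, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] != '#') {
            out += raw[i];
            continue;
        }
        int hi = (i + 2 < raw.size()) ? hex_value(raw[i + 1]) : -1;
        int lo = (i + 2 < raw.size()) ? hex_value(raw[i + 2]) : -1;
        if (hi < 0 || lo < 0) {
            return false;
        }
        out += static_cast<char>(hi * 16 + lo);
        i += 2; // skip the two hex digits just consumed
    }
    return true;
}
```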
2018-02-19 | More robust handling of type errors | Jay Berkenbilt
Give objects descriptions and context so it is possible to issue warnings instead of fatal errors for attempts to access objects of the wrong type.
2018-02-19 | Implement TokenFilter and refactor Pl_QPDFTokenizer | Jay Berkenbilt
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a general filter that passes data through a TokenFilter.
2018-02-19 | Inline image token type | Jay Berkenbilt
2018-02-19 | Push QPDFTokenizer members into a nested structure | Jay Berkenbilt
This is for protection against future ABI breaking changes.
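The general pattern behind this kind of change can be sketched as below. `Tokenizer` is a hypothetical stand-in rather than the real QPDFTokenizer, and `std::unique_ptr` is an assumption made for the example; the commit itself does not say which pointer type holds the nested structure.

```cpp
#include <memory>

// Hypothetical stand-in for a public library class.
class Tokenizer
{
  public:
    Tokenizer();
    ~Tokenizer();
    void allowEOF();
    bool eofAllowed() const;

  private:
    // Only this pointer is part of the class layout that callers see.
    class Members;
    std::unique_ptr<Members> m;
};

// In a real library this part would live in the .cc file, so fields can be
// added here later without changing sizeof(Tokenizer).
class Tokenizer::Members
{
  public:
    bool allow_eof = false;
};

Tokenizer::Tokenizer() : m(new Members()) {}
Tokenizer::~Tokenizer() = default;
void Tokenizer::allowEOF() { m->allow_eof = true; }
bool Tokenizer::eofAllowed() const { return m->allow_eof; }
```

Because client code only ever sees a pointer-sized member, adding state to `Members` in a later release does not change the size or layout of the public class.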
2018-02-19 | Lexer enhancements: EOF, comment, space | Jay Berkenbilt
Significant enhancements to the lexer to improve EOF handling and to support comments and spaces as tokens. Various other minor issues were fixed as well.
2018-01-29 | Minor fixes to tokenizer | Jay Berkenbilt
2017-08-22 | Limit token length during xref recovery | Jay Berkenbilt
While scanning the file looking for objects, limit the length of tokens we allow. This prevents us from getting caught up in reading a file character by character while digging through large streams.
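A length cap of this kind might look like the following sketch. `scan_token`, `ScanResult`, and the convention that a limit of 0 means "no limit" are all assumptions made for illustration; the commit does not describe how the limit is passed or what is returned when it is hit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the text read so far and whether the scan
// stopped early because the length limit was reached.
struct ScanResult
{
    std::string text;
    bool truncated;
};

static bool is_pdf_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
}

// Illustrative only: read one whitespace-delimited token starting at pos.
// max_len == 0 means no limit (an assumption for this sketch).
ScanResult scan_token(
    std::string const& data, std::size_t& pos, std::size_t max_len)
{
    ScanResult result{"", false};
    while (pos < data.size() && is_pdf_space(data[pos])) {
        ++pos;
    }
    while (pos < data.size() && !is_pdf_space(data[pos])) {
        if (max_len != 0 && result.text.size() >= max_len) {
            // Give up on this token rather than reading on indefinitely.
            result.truncated = true;
            break;
        }
        result.text += data[pos++];
    }
    return result;
}
```

A recovery scan can then treat a truncated result as "not an object header" and move on, instead of accumulating an arbitrarily long run of stream bytes into one token.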
2017-08-11 | Find startxref without PCRE | Jay Berkenbilt
2017-08-11 | Allow QPDFTokenizer::readToken to return bad tokens | Jay Berkenbilt
Sometimes we want to ignore bad tokens rather than having them throw an exception. A coverage case is commented out here and added in a later commit.
2017-07-30 | Include missing header in QPDFTokenizer.cc (fixes #125) | Pranjal Bhor
Required for strtol()
2017-07-27 | Move lexer helper functions to QUtil | Jay Berkenbilt
2017-07-27 | Remove PCRE from QPDFTokenizer | Jay Berkenbilt
2013-10-18 | Security: replace operator[] with at | Jay Berkenbilt
For std::string and std::vector, replace operator[] with at. This was done using an automated process. See README.hardening for details.
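The practical difference is that `at` checks the index and throws `std::out_of_range`, whereas `operator[]` with an out-of-range index is undefined behavior. A small sketch, using a hypothetical helper `char_at_or` that is not part of qpdf:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative only: return the character at index, or fallback when the
// index is out of bounds. s.at() throws std::out_of_range in that case,
// where s[index] would read outside the string.
char char_at_or(std::string const& s, std::size_t index, char fallback)
{
    try {
        return s.at(index);
    } catch (std::out_of_range const&) {
        return fallback;
    }
}
```

In hardened code the exception would normally propagate and be reported as an error; catching it here just makes the bounds check visible.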
2013-03-04 | Remove all old-style casts from C++ code | Jay Berkenbilt
2013-01-20 | Add QPDFObjectHandle::parseContentStream method | Jay Berkenbilt
This method allows parsing of the PDF objects in a content stream or array of content streams.
2012-08-11 | Bug fix: let EOF resolve literal token | Jay Berkenbilt
Previously only whitespace and comments did it. This fix is needed for object streams whose last object is a literal (name, integer, real, string) not terminated by space or newline.
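The behavior being fixed can be shown with a toy splitter: a literal in progress must be emitted when the input ends, not only when whitespace follows it. `split_literals` is a hypothetical, much simplified stand-in for the tokenizer; it ignores comments, strings, and delimiters.

```cpp
#include <string>
#include <vector>

// Illustrative only: split input into whitespace-separated literals.
std::vector<std::string> split_literals(std::string const& data)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char c : data) {
        bool is_space =
            c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
        if (is_space) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    // End of input resolves a pending literal, just as whitespace would.
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

Without the final check, input such as `1 0 /Name` with no trailing newline would lose `/Name`, which is the object-stream case the message describes.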
2012-08-11 | Refactor: move resolution of literal to its own method | Jay Berkenbilt
2012-07-21 | Move readToken from QPDF to QPDFTokenizer | Jay Berkenbilt
2012-06-20 | ABI change: fix use of off_t, size_t, and integer types | Jay Berkenbilt
Significantly improve the code's use of off_t for file offsets, size_t for memory sizes, and integer types in cases where there has to be compatibility with external interfaces. Rework sections of the code that would have prevented qpdf from working on files larger than 2 (or maybe 4) GB in size.
2011-12-28 | Don't declare any PCRE objects static. | Jay Berkenbilt
2009-10-12 | do DLL_EXPORT only in header files and only at the class or top-level function level | Jay Berkenbilt
git-svn-id: svn+q:///qpdf/trunk@796 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-09-26 | removed qexc; non-compatible ABI change | Jay Berkenbilt
git-svn-id: svn+q:///qpdf/trunk@709 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-08-06 | stick DLL_EXPORT in front of every public method of every public class | Jay Berkenbilt
git-svn-id: svn+q:///qpdf/trunk@688 71b93d88-0707-0410-a8cf-f5a4172ac649
2009-02-21 | fix many typos in comments and strings | Jay Berkenbilt
git-svn-id: svn+q:///qpdf/trunk@651 71b93d88-0707-0410-a8cf-f5a4172ac649
2008-05-04 | missing header files for gcc 4.3 | Jay Berkenbilt
git-svn-id: svn+q:///qpdf/trunk@607 71b93d88-0707-0410-a8cf-f5a4172ac649
2008-04-29 | update release date to actual date [release-qpdf-2.0] | Jay Berkenbilt
git-svn-id: svn+q:///qpdf/trunk@599 71b93d88-0707-0410-a8cf-f5a4172ac649