path: root/libqpdf/QPDFTokenizer.cc
Age  Commit message  Author
2022-08-27  Fix commit b45420a  m-holger
2022-08-27  Add methods InputSource::fastRead, fastUnRead and fastTell  m-holger
Provide buffered input for QPDFTokenizer.
2022-08-25  Remove redundant tests in QPDFTokenizer::readToken  m-holger
2022-08-25  In QPDFTokenizer::readToken move call to getToken out of loop  m-holger
2022-08-25  Remove unnecessary string copy in QPDFTokenizer::getToken  m-holger
2022-08-25  Remove QPDFTokenizer::unread_char  m-holger
2022-08-25  Refactor QPDFTokenizer::betweenTokens()  m-holger
2022-08-25  Refactor QPDFTokenizer::presentEOF  m-holger
2022-08-25  Integrate booleans and null into state machine in QPDFTokenizer  m-holger
2022-08-25  Integrate numbers into state machine in QPDFTokenizer  m-holger
2022-08-25  Integrate names into state machine in QPDFTokenizer  m-holger
2022-08-25  Split QPDFTokenizer::handleCharacter into individual methods  m-holger
2022-08-25  Refactor QPDFTokenizer::inCharCode  m-holger
2022-08-25  Refactor st_top case in QPDFTokenizer::handleCharacter  m-holger
2022-08-25  Refactor QPDFTokenizer::inHexstring  m-holger
2022-08-25  Code tidy: replace if with case statement in QPDFTokenizer::inString  m-holger
2022-08-25  Add state st_string_escape in QPDFTokenizer  m-holger
2022-08-21  Add state st_string_after_cr in QPDFTokenizer  m-holger
2022-08-21  Add state st_char_code in QPDFTokenizer  m-holger
2022-08-21  Add private method QPDFTokenizer::inString  m-holger
2022-08-21  Add private method QPDFTokenizer::inHexstring  m-holger
2022-08-21  Code tidy: replace if with case statement in QPDFTokenizer::handleCharacter  m-holger
2022-08-21  Add private method QPDFTokenizer::handleCharacter  m-holger
2022-08-21  Code tidy: replace if with case statement in QPDFTokenizer::presentCharacter  m-holger
2022-08-20  Avoid shrinking QPDFTokenizer::val and QPDFTokenizer::raw_val  m-holger
2022-08-18  Remove QPDFTokenizer::Members  m-holger
2022-07-26  Code tidy : replace 0 with nullptr or true  m-holger
2022-05-21  Code clean up: use range-style for loops wherever possible  m-holger
Remove variables obsoleted by commit 4f24617.
2022-04-30  Code clean up: use range-style for loops wherever possible  Jay Berkenbilt
Where not possible, use "auto" to get the iterator type. Editorial note: I have avoided this change for a long time because of not wanting to make gratuitous changes to version history, which can obscure when certain changes were made, but with having recently touched every single file to apply automatic code formatting and with making several broad changes to the API, I decided it was time to take the plunge and get rid of the older (pre-C++11) verbose iterator syntax. The new code is just easier to read and understand, and in many cases, it will be more efficient as fewer temporary copies are being made. m-holger, if you're reading, you can see that I've finally come around. :-)
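The two loop styles named in this commit message can be illustrated with a short sketch. The functions `total_length` and `drop_empty` are hypothetical names made up for this example and do not come from qpdf; the first uses a range-style for loop, the second shows a case where the iterator is still needed (erasing while iterating), so `auto` is used for its type instead.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical example: a range-style for loop is enough when the loop
// only reads each element.
std::size_t total_length(const std::vector<std::string>& words)
{
    std::size_t total = 0;
    for (auto const& w : words) {
        total += w.size();
    }
    return total;
}

// Hypothetical example: erasing during iteration needs the iterator
// itself, so a range-style loop is not possible; "auto" spares us from
// spelling out std::list<std::string>::iterator.
void drop_empty(std::list<std::string>& words)
{
    for (auto it = words.begin(); it != words.end();) {
        if (it->empty()) {
            it = words.erase(it); // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}
```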
2022-04-16  Use anonymous namespaces for file-private classes  Jay Berkenbilt
2022-04-16  Use = default and = delete where possible in classes  Jay Berkenbilt
2022-04-09  Replace PointerHolder with std::shared_ptr in library sources only  Jay Berkenbilt
(patrepl and cleanpatch are my own utilities)
    patrepl s/PointerHolder/std::shared_ptr/g {include,libqpdf}/qpdf/*.hh
    patrepl s/PointerHolder/std::shared_ptr/g libqpdf/*.cc
    patrepl s/make_pointer_holder/std::make_shared/g libqpdf/*.cc
    patrepl s/make_array_pointer_holder/QUtil::make_shared_array/g libqpdf/*.cc
    patrepl s,qpdf/std::shared_ptr,qpdf/PointerHolder, **/*.cc **/*.hh
    git restore include/qpdf/PointerHolder.hh
    cleanpatch
    ./format-code
2022-04-04  Programmatically apply new formatting to code  Jay Berkenbilt
Run this:
    for i in **/*.cc **/*.c **/*.h **/*.hh; do
        clang-format < $i >| $i.new && mv $i.new $i
    done
2022-02-08  WHITESPACE ONLY -- expand tabs in source code  Jay Berkenbilt
This commit expands all tabs using an 8-character tab-width. You should ignore this commit when using git blame or use git blame -w. In the early days, I used to use tabs where possible for indentation, since emacs did this automatically. In recent years, I have switched to only using spaces, which means qpdf source code has been a mixture of spaces and tabs. I have avoided cleaning this up because of not wanting gratuitous whitespace changes to cloud the output of git blame, but I changed my mind after discussing with users who view qpdf source code in editors/IDEs that have other tab widths by default and in light of the fact that I am planning to start applying automatic code formatting soon.
2022-02-04  PointerHolder: deprecate getPointer() and getRefcount()  Jay Berkenbilt
Use get() and use_count() instead. Add #define NO_POINTERHOLDER_DEPRECATION to remove deprecation markers for these only. This commit also removes all deprecated PointerHolder API calls from qpdf's code except in PointerHolder's test suite, which must continue to test the deprecated APIs.
2022-01-11  Fix signed/unsigned char warning (fixes #604)  Jay Berkenbilt
2020-10-27  Revert removal of unreadCh change for performance  Jay Berkenbilt
Turns out unreadCh is much more efficient than seek(-1, SEEK_CUR). Update comments and code to reflect this.
2020-10-18  Stop using InputSource::unreadCh  Jay Berkenbilt
2020-04-16  Fix warnings reported by -Wshadow=local (fixes #431)  Jay Berkenbilt
2019-08-20  Improve invalid name token warning message  Jay Berkenbilt
This message used to only appear for PDF >= 1.2. The invalid name is valid for PDF 1.0 and 1.1. However, since QPDFWriter may write a newer version, it's better to detect and warn in all cases. Therefore make the warning more informative.
2019-08-20  Handle invalid name tokens symmetrically for PDF < 1.2 (fixes #332)  Jay Berkenbilt
2019-06-22  Remove broken QPDFTokenizer::expectInlineImage  Jay Berkenbilt
2019-06-21  Fix sign and conversion warnings (major)  Jay Berkenbilt
This makes all integer type conversions that have potential data loss explicit with calls that do range checks and raise an exception. After this commit, qpdf builds with no warnings when -Wsign-conversion -Wconversion is used with gcc or clang or when -W3 -Wd4800 is used with MSVC. This significantly reduces the likelihood of potential crashes from bogus integer values. There are some parts of the code that take int when they should take size_t or an offset. Such places would make qpdf not support files with more than 2^31 of something that usually wouldn't be so large. In the event that such a file shows up and is valid, at least qpdf would raise an error in the right spot so the issue could be legitimately addressed rather than failing in some weird way because of a silent overflow condition.
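As an illustration of what an explicit, range-checked integer conversion can look like, here is a minimal sketch. The name `checked_cast`, its signature, and the choice of `std::range_error` are assumptions made for this example; this is not qpdf's own helper and the commit message does not say how qpdf implements its checks.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not qpdf's API): convert between integer types,
// throwing instead of silently truncating or changing sign.
// Assumes both types are no wider than long long / unsigned long long.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only handles integer types");
    using Lim = std::numeric_limits<To>;

    // Only a signed source can hold a negative value.
    bool const negative =
        std::is_signed<From>::value && static_cast<long long>(value) < 0;

    // Negative values need a signed target with a low enough minimum;
    // non-negative values are compared as unsigned against the maximum.
    bool const fits = negative
        ? (std::is_signed<To>::value &&
           static_cast<long long>(value) >= static_cast<long long>(Lim::min()))
        : (static_cast<unsigned long long>(value) <=
           static_cast<unsigned long long>(Lim::max()));

    if (!fits) {
        throw std::range_error("integer value out of range for target type");
    }
    return static_cast<To>(value);
}
```

With a helper of this kind, an oversized or negative value fails at the conversion site with an exception rather than overflowing silently further along.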
2019-03-12  Undefined functions because of missing std:: or header. (#295)  Thorsten Schöning
* [bcc32 Error] QPDF.cc(375): E2268 Call to undefined function 'atof'
  Full parser context
  QPDF.cc(358): parsing: void QPDF::parse(const char *)
* [bcc32 Error] QPDFTokenizer.cc(183): E2268 Call to undefined function 'strtol'
  Full parser context
  QPDFTokenizer.cc(163): parsing: void QPDFTokenizer::resolveLiteral()
* [bcc32 Error] pdf-split-pages.cc(52): E2268 Call to undefined function 'exit'
  Full parser context
  pdf-split-pages.cc(50): parsing: void usage()
* PR #295: Including "cstdlib" should be replaced with "stdlib.h" to be more consistent.
At the same time I changed the order of the surrounding includes to reflect alphabetical order, because in some files this was already the case.
2019-02-01  Make inline image token exactly contain the image data  Jay Berkenbilt
Do not include the trailing EI, and handle cases where EI is not preceded by a delimiter. Such cases have been seen in the wild.
2019-01-31  Improve locating inline image's EI  Jay Berkenbilt
We've actually seen a PDF file in the wild that contained EI surrounded by delimiters inside the image data, which confused qpdf's naive code. This significantly improves EI detection.
2019-01-31  Refactor QPDFTokenizer's inline image handling  Jay Berkenbilt
Add a version of expectInlineImage that takes an input source and searches for EI. This is in preparation for improving the way EI is found. This commit just refactors the code without changing the functionality and adds tests to make sure the old and new code behave identically.
2019-01-31  Inline image token value ends with EI, not delimiter  Jay Berkenbilt
The inline image token erroneously included the delimiter that followed EI. The ObjectHandle created from it was correct.
2018-08-06  Fix EOL handling inside strings (fixes #226)  Jay Berkenbilt
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is to be ignored after backslash.
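The rule stated above can be sketched in a few lines. The function `normalize_string_eol` is a hypothetical helper written for this illustration, not code from QPDFTokenizer; it assumes its input is the raw body of a literal string and leaves every escape other than backslash-EOL untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not qpdf code): apply the EOL rules to the raw
// body of a literal string. CR, CRLF, and LF each become a single LF;
// a backslash followed by an EOL removes the backslash and exactly one
// EOL. Other escape sequences are copied through unchanged.
std::string normalize_string_eol(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;

    // Consume one EOL (CR, LF, or CRLF) at position i, if present.
    auto skip_eol = [&in, &i]() {
        if (i < in.size() && in[i] == '\r') {
            ++i;
            if (i < in.size() && in[i] == '\n') {
                ++i; // CRLF counts as a single EOL
            }
            return true;
        }
        if (i < in.size() && in[i] == '\n') {
            ++i;
            return true;
        }
        return false;
    };

    while (i < in.size()) {
        if (in[i] == '\\') {
            ++i; // step past the backslash
            if (skip_eol()) {
                continue; // line continuation: drop backslash and one EOL
            }
            out += '\\';
            if (i < in.size()) {
                out += in[i++]; // keep the escaped character as is
            }
        } else if (skip_eol()) {
            out += '\n'; // any EOL style is treated as LF
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

Copying the character after a backslash as part of the escape matters for input such as two backslashes followed by a newline: the second backslash is itself escaped, so the newline that follows is an ordinary EOL and is kept.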
2018-05-05  Fix small logic error in Token construct (fixes #206)  Jay Berkenbilt
The special case around name token was not reachable. This would only affect constructors of name tokens that were represented in non-canonical form such as with a hex substitution for a printable character. The error was harmless but still a bug.