diff options
-rw-r--r-- | ChangeLog | 5 | ||||
-rw-r--r-- | README.maintainer | 4 | ||||
-rw-r--r-- | TODO | 107 | ||||
-rw-r--r-- | manual/qpdf-manual.xml | 160 |
4 files changed, 174 insertions, 102 deletions
@@ -1,3 +1,8 @@ +2013-03-25 Jay Berkenbilt <ejb@ql.org> + + * manual/qpdf-manual.xml: Document the casting policy that is + followed in qpdf's implementation. + 2013-03-11 Jay Berkenbilt <ejb@ql.org> * When creating Windows binary distributions, make sure to only diff --git a/README.maintainer b/README.maintainer index 4be45f51..a1ac5847 100644 --- a/README.maintainer +++ b/README.maintainer @@ -31,7 +31,9 @@ Release Reminders * Check all open issues in the sourceforge trackers and on github. * If any interfaces were added or changed, check C API to see whether - changes are appropriate there as well. + changes are appropriate there as well. If necessary, review the + casting policy in the manual, and ensure that integer types are + properly handled. * Increment shared library version information as needed (libqpdf/build.mk) @@ -1,3 +1,9 @@ +4.1.0 +===== + + * New public interfaces have been added. + + 4.2.0 ===== @@ -38,107 +44,6 @@ - See ../misc/broken-files -4.1.0 -===== - - * Add to documentation, and mention this documentation in - README.maintainer: - - Casting policy. - - The C++ code in qpdf is free of old-style casts except where - unavoidable (e.g. where the old-style cast is in a macro provided - by a third-party header file). When there is a need for a cast, it - is handled, in order of preference by rewriting the code to avoid - the need for a cast, calling const_cast, calling static_cast, - calling reinterpret_cast, or calling some combination of the above. - The casting policy explicitly prohibits casting between sizes for - no purpose other than to quiet a compiler warning when there is no - reasonable chance of a problem resulting. The reason for this - exclusion is that it takes away enabling additional compiler - warnings as a tool for making future improvements to this aspect of - the code and also damages the readability of the code. As a last - resort, a compiler-specific pragma may be used to suppress a - warning that we don't want to fix. Examples may include - suppressing warnings about the use of old-style casts in code that - is shared between C and C++ code. - - There are a few significant areas where casting is common in the qpdf - sources or where casting would be required to quiet higher levels - of compiler warnings but is omitted at present: - - * signed vs. unsigned char. For historical reasons, there are a - lot of places in qpdf's internals that deal with unsigned char, - which means that a lot of casting is required to interoperate - with standard library calls and std::string. In retrospect, - qpdf should have probably used signed char everywhere and just - cast to unsigned char when needed. There are reinterpret_cast - calls to go between char* and unsigned char*, and there are - static_cast calls to go between char and unsigned char. These - should always be safe. - - * non-const unsigned char* used in Pipeline interface. The - pipeline interface has a write() call that uses unsigned char* - without a const qualifier. The main reason for this is to - support pipelines that make calls to third-party libraries, such - as zlib, that don't include const in their interfaces. - Unfortunately, there are many places in the code where it is - desirable to have const char* with pipelines. None of the - pipeline implementations in qpdf currently modify the data - passed to write, and doing so would be counter to the intent of - Pipeline. There are places in the code where const_cast is used - to remove the const-ness of pointers going into Pipelines. This - could be potentially unsafe, but there is adequate testing to - assert that it is safe in qpdf's code. - - * size_t vs. qpdf_offset_t. This is pretty much unavoidable since - offsets are signed types and sizes are unsigned types. Whenever - it is necessary to seek by an amount given by a size_t, it - becomes necessary to mix and match between size_t and - qpdf_offset_t. Additionally, qpdf sometimes treats memory - buffers like files, and those seek interfaces have to be - consistent with file-based input sources. Neither gcc nor MSVC - give warnings for this case by default, but both have warning - flags that can enable this. (MSVC: /W14267 or /W3 (which also - enables some additional warnings that we ignore); gcc: - -Wconversion -Wsign-conversion). This could matter for files - whose sizes are larger than 2^63 bytes, but it is reasonable to - expect that a world where such files are common would also have - larger size_t and qpdf_offset_t types in it. I am not aware of - any cases where 32-bit systems that have size_t smaller than - qpdf_offset_t could run into problems, though I can't - conclusively rule out the possibility. In the event that - someone should produce a file that qpdf can't handle because of - what is suspected to be issues involving the handling of size_t - vs. qpdf_offset_t (such files may behave properly on 64-bit - systems but not on 32-bit systems and may have very large - embedded files or streams, for example), the above mentioned - warning flags could be enabled and all those implicit - conversions could be carefully scrutinized. (I have already - gone through that exercise once in adding support for files > - 4GB in size.) I continue to be commited to supporting large - files on 32-bit systems, but I would not go to any lengths to - support corner cases involving large embedded files or large - streams that work on 64-bit systems but not on 32-bit systems - because of size_t being too small. It is reasonable to assume - that anyone working with such files would be using a 64-bit - system anyway. - - * size_t vs. int. There are some cases where size_t and int or - size_t and unsigned int are used interchangeably. These cases - occur when working with very small amounts of memory, such as - with the bit readers (where we're working with just a few bytes - at a time), some cases of strlen, and a few other cases. I have - scrutinized all of these cases and determined them to be safe, - but there is no mechanism in the code to ensure that new unsafe - conversions between int and size_t aren't introduced short of - good testing and strong awareness of the issues. Again, if any - such bugs are suspected in the future, enable the additional - warning flags and scrutinizing the warnings would be in order. - - * New public interfaces have been added. - - General ======= diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml index 59b6c355..cfa6ec2b 100644 --- a/manual/qpdf-manual.xml +++ b/manual/qpdf-manual.xml @@ -1623,6 +1623,166 @@ outfile.pdf</option> </itemizedlist> </para> </sect1> + <sect1 id="ref.casting"> + <title>Casting Policy</title> + <para> + This section describes the casting policy followed by qpdf's + implementation. This is no concern to qpdf's end users and + largely of no concern to people writing code that uses qpdf, but + it could be of interest to people who are porting qpdf to a new + platform or who are making modifications to the code. + </para> + <para> + The C++ code in qpdf is free of old-style casts except where + unavoidable (e.g. where the old-style cast is in a macro provided + by a third-party header file). When there is a need for a cast, + it is handled, in order of preference, by rewriting the code to + avoid the need for a cast, calling + <function>const_cast</function>, calling + <function>static_cast</function>, calling + <function>reinterpret_cast</function>, or calling some combination + of the above. As a last resort, a compiler-specific + <literal>#pragma</literal> may be used to suppress a warning that + we don't want to fix. Examples may include suppressing warnings + about the use of old-style casts in code that is shared between C + and C++ code. + </para> + <para> + The casting policy explicitly prohibits casting between integer + sizes for no purpose other than to quiet a compiler warning when + there is no reasonable chance of a problem resulting. The reason + for this exclusion is that the practice of adding these additional + casts precludes future use of additional compiler warnings as a + tool for making future improvements to this aspect of the code, + and it also damages the readability of the code. + </para> + <para> + There are a few significant areas where casting is common in the + qpdf sources or where casting would be required to quiet higher + levels of compiler warnings but is omitted at present: + <itemizedlist> + <listitem> + <para> + <type>char</type> vs. <type>unsigned char</type>. For + historical reasons, there are a lot of places in qpdf's + internals that deal with <type>unsigned char</type>, which + means that a lot of casting is required to interoperate with + standard library calls and <type>std::string</type>. In + retrospect, qpdf should have probably used regular (signed) + <type>char</type> and <type>char*</type> everywhere and just + cast to <type>unsigned char</type> when needed, but it's too + late to make that change now. There are + <function>reinterpret_cast</function> calls to go between + <type>char*</type> and <type>unsigned char*</type>, and there + are <function>static_cast</function> calls to go between + <type>char</type> and <type>unsigned char</type>. These should + always be safe. + </para> + </listitem> + <listitem> + <para> + Non-const <type>unsigned char*</type> used in the + <type>Pipeline</type> interface. The pipeline interface has a + <function>write</function> call that uses <type>unsigned + char*</type> without a <type>const</type> qualifier. The main + reason for this is to support pipelines that make calls to + third-party libraries, such as zlib, that don't include + <type>const</type> in their interfaces. Unfortunately, there + are many places in the code where it is desirable to have + <type>const char*</type> with pipelines. None of the pipeline + implementations in qpdf currently modify the data passed to + write, and doing so would be counter to the intent of + <type>Pipeline</type>, but there is nothing in the code to + prevent this from being done. There are places in the code + where <function>const_cast</function> is used to remove the + const-ness of pointers going into <type>Pipeline</type>s. This + could theoretically be unsafe, but there is adequate testing to + assert that it is safe and will remain safe in qpdf's code. + </para> + </listitem> + <listitem> + <para> + <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is + pretty much unavoidable since sizes are unsigned types and + offsets are signed types. Whenever it is necessary to seek by + an amount given by a <type>size_t</type>, it becomes necessary + to mix and match between <type>size_t</type> and + <type>qpdf_offset_t</type>. Additionally, qpdf sometimes + treats memory buffers like files (as with + <type>BufferInputSource</type>, and those seek interfaces have + to be consistent with file-based input sources. Neither gcc + nor MSVC give warnings for this case by default, but both have + warning flags that can enable this. (MSVC: + <option>/W14267</option> or <option>/W3</option>, which also + enables some additional warnings that we ignore; gcc: + <option>-Wconversion -Wsign-conversion</option>). This could + matter for files whose sizes are larger than + 2<superscript>63</superscript> bytes, but it is reasonable to + expect that a world where such files are common would also have + larger <type>size_t</type> and <type>qpdf_offset_t</type> types + in it. On most 64-bit systems at the time of this writing (the + release of version 4.1.0 of qpdf), both <type>size_t</type> and + <type>qpdf_offset_t</type> are 64-bit integer types, while on + many current 32-bit systems, <type>size_t</type> is a 32-bit + type while <type>qpdf_offset_t</type> is a 64-bit type. I am + not aware of any cases where 32-bit systems that have + <type>size_t</type> smaller than <type>qpdf_offset_t</type> + could run into problems. Although I can't conclusively rule + out the possibility of such problems existing, I suspect any + cases would be pretty contrived. In the event that someone + should produce a file that qpdf can't handle because of what is + suspected to be issues involving the handling of + <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files + may behave properly on 64-bit systems but not on 32-bit systems + because they have very large embedded files or streams, for + example), the above mentioned warning flags could be enabled + and all those implicit conversions could be carefully + scrutinized. (I have already gone through that exercise once + in adding support for files larger than 4 GB in size.) I + continue to be commited to supporting large files on 32-bit + systems, but I would not go to any lengths to support corner + cases involving large embedded files or large streams that work + on 64-bit systems but not on 32-bit systems because of + <type>size_t</type> being too small. It is reasonable to + assume that anyone working with such files would be using a + 64-bit system anyway since many 32-bit applications would have + similar difficulties. + </para> + </listitem> + <listitem> + <para> + <type>size_t</type> vs. <type>int</type> or <type>long</type>. + There are some cases where <type>size_t</type> and + <type>int</type> or <type>long</type> or <type>size_t</type> + and <type>unsigned int</type> or <type>unsigned long</type> are + used interchangeably. These cases occur when working with very + small amounts of memory, such as with the bit readers (where + we're working with just a few bytes at a time), some cases of + <function>strlen</function>, and a few other cases. I have + scrutinized all of these cases and determined them to be safe, + but there is no mechanism in the code to ensure that new unsafe + conversions between <type>int</type> and <type>size_t</type> + aren't introduced short of good testing and strong awareness of + the issues. Again, if any such bugs are suspected in the + future, enabling the additional warning flags and scrutinizing + the warnings would be in order. + </para> + </listitem> + </itemizedlist> + </para> + <para> + To be clear, I believe qpdf to be well-behaved with respect to + sizes and offsets, and qpdf's test suite includes actual + generation and full processing of files larger than 4 GB in + size. The issues raised here are largely academic and should not + in any way be interpreted to mean that qpdf has practical problems + involving sloppiness with integer types. I also believe that + appropriate measures have been taken in the code to avoid problems + with signed vs. unsigned integers from resulting in memory + overwrites or other issues with potential security implications, + though there are never any absolute guarantees. + </para> + </sect1> <sect1 id="ref.encryption"> <title>Encryption</title> <para> |