diff options
Diffstat (limited to 'manual')
-rw-r--r-- | manual/qpdf-manual.xml | 195 |
1 files changed, 62 insertions, 133 deletions
diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml index 225153b6..721ce845 100644 --- a/manual/qpdf-manual.xml +++ b/manual/qpdf-manual.xml @@ -3340,139 +3340,68 @@ outfile.pdf</option> and C++ code. </para> <para> - The casting policy explicitly prohibits casting between integer - sizes for no purpose other than to quiet a compiler warning when - there is no reasonable chance of a problem resulting. The reason - for this exclusion is that the practice of adding these additional - casts precludes future use of additional compiler warnings as a - tool for making future improvements to this aspect of the code, - and it also damages the readability of the code. - </para> - <para> - There are a few significant areas where casting is common in the - qpdf sources or where casting would be required to quiet higher - levels of compiler warnings but is omitted at present: - <itemizedlist> - <listitem> - <para> - <type>char</type> vs. <type>unsigned char</type>. For - historical reasons, there are a lot of places in qpdf's - internals that deal with <type>unsigned char</type>, which - means that a lot of casting is required to interoperate with - standard library calls and <type>std::string</type>. In - retrospect, qpdf should have probably used regular (signed) - <type>char</type> and <type>char*</type> everywhere and just - cast to <type>unsigned char</type> when needed, but it's too - late to make that change now. There are - <function>reinterpret_cast</function> calls to go between - <type>char*</type> and <type>unsigned char*</type>, and there - are <function>static_cast</function> calls to go between - <type>char</type> and <type>unsigned char</type>. These should - always be safe. - </para> - </listitem> - <listitem> - <para> - Non-const <type>unsigned char*</type> used in the - <type>Pipeline</type> interface. The pipeline interface has a - <function>write</function> call that uses <type>unsigned - char*</type> without a <type>const</type> qualifier. The main - reason for this is to support pipelines that make calls to - third-party libraries, such as zlib, that don't include - <type>const</type> in their interfaces. Unfortunately, there - are many places in the code where it is desirable to have - <type>const char*</type> with pipelines. None of the pipeline - implementations in qpdf currently modify the data passed to - write, and doing so would be counter to the intent of - <type>Pipeline</type>, but there is nothing in the code to - prevent this from being done. There are places in the code - where <function>const_cast</function> is used to remove the - const-ness of pointers going into <type>Pipeline</type>s. This - could theoretically be unsafe, but there is adequate testing to - assert that it is safe and will remain safe in qpdf's code. - </para> - </listitem> - <listitem> - <para> - <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is - pretty much unavoidable since sizes are unsigned types and - offsets are signed types. Whenever it is necessary to seek by - an amount given by a <type>size_t</type>, it becomes necessary - to mix and match between <type>size_t</type> and - <type>qpdf_offset_t</type>. Additionally, qpdf sometimes - treats memory buffers like files (as with - <type>BufferInputSource</type>, and those seek interfaces have - to be consistent with file-based input sources. Neither gcc - nor MSVC give warnings for this case by default, but both have - warning flags that can enable this. (MSVC: - <option>/W14267</option> or <option>/W3</option>, which also - enables some additional warnings that we ignore; gcc: - <option>-Wconversion -Wsign-conversion</option>). This could - matter for files whose sizes are larger than - 2<superscript>63</superscript> bytes, but it is reasonable to - expect that a world where such files are common would also have - larger <type>size_t</type> and <type>qpdf_offset_t</type> types - in it. On most 64-bit systems at the time of this writing (the - release of version 4.1.0 of qpdf), both <type>size_t</type> and - <type>qpdf_offset_t</type> are 64-bit integer types, while on - many current 32-bit systems, <type>size_t</type> is a 32-bit - type while <type>qpdf_offset_t</type> is a 64-bit type. I am - not aware of any cases where 32-bit systems that have - <type>size_t</type> smaller than <type>qpdf_offset_t</type> - could run into problems. Although I can't conclusively rule - out the possibility of such problems existing, I suspect any - cases would be pretty contrived. In the event that someone - should produce a file that qpdf can't handle because of what is - suspected to be issues involving the handling of - <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files - may behave properly on 64-bit systems but not on 32-bit systems - because they have very large embedded files or streams, for - example), the above mentioned warning flags could be enabled - and all those implicit conversions could be carefully - scrutinized. (I have already gone through that exercise once - in adding support for files larger than 4 GB in size.) I - continue to be committed to supporting large files on 32-bit - systems, but I would not go to any lengths to support corner - cases involving large embedded files or large streams that work - on 64-bit systems but not on 32-bit systems because of - <type>size_t</type> being too small. It is reasonable to - assume that anyone working with such files would be using a - 64-bit system anyway since many 32-bit applications would have - similar difficulties. - </para> - </listitem> - <listitem> - <para> - <type>size_t</type> vs. <type>int</type> or <type>long</type>. - There are some cases where <type>size_t</type> and - <type>int</type> or <type>long</type> or <type>size_t</type> - and <type>unsigned int</type> or <type>unsigned long</type> are - used interchangeably. These cases occur when working with very - small amounts of memory, such as with the bit readers (where - we're working with just a few bytes at a time), some cases of - <function>strlen</function>, and a few other cases. I have - scrutinized all of these cases and determined them to be safe, - but there is no mechanism in the code to ensure that new unsafe - conversions between <type>int</type> and <type>size_t</type> - aren't introduced short of good testing and strong awareness of - the issues. Again, if any such bugs are suspected in the - future, enabling the additional warning flags and scrutinizing - the warnings would be in order. - </para> - </listitem> - </itemizedlist> - </para> - <para> - To be clear, I believe qpdf to be well-behaved with respect to - sizes and offsets, and qpdf's test suite includes actual - generation and full processing of files larger than 4 GB in - size. The issues raised here are largely academic and should not - in any way be interpreted to mean that qpdf has practical problems - involving sloppiness with integer types. I also believe that - appropriate measures have been taken in the code to avoid problems - with signed vs. unsigned integers from resulting in memory - overwrites or other issues with potential security implications, - though there are never any absolute guarantees. + The <classname>QIntC</classname> namespace, provided by + <filename>include/qpdf/QIntC.hh</filename>, implements safe + functions for converting between integer types. These functions do + range checking and throw a <type>std::range_error</type>, which is + subclass of <type>std::runtime_error</type>, if conversion from one + integer type to another results in loss of information. There are + many cases in which we have to move between different integer + types because of incompatible integer types used in interoperable + interfaces. Some are unavoidable, such as moving between sizes and + offsets, and others are there because of old code that is too in + entrenched to be fixable without breaking source compatibility and + causing pain for users. QPDF is compiled with extra warnings to + detect conversions with potential data loss, and all such cases + should be fixed by either using a function from + <classname>QIntC</classname> or a + <function>static_cast</function>. + </para> + <para> + When the intention is just to switch the type because of + exchanging data between incompatible interfaces, use + <classname>QIntC</classname>. This is the usual case. However, + there are some cases in which we are explicitly intending to use + the exact same bit pattern with a different type. This is most + common when switching between signed and unsigned characters. A + lot of qpdf's code uses unsigned characters internally, but + <type>std::string</type> and <type>char</type> are signed. Using + <function>QIntC::to_char</function> would be wrong for converting + from unsigned to signed characters because a negative + <type>char</type> value and the corresponding <type>unsigned + char</type> value greater than 127 <emphasis>mean the same + thing</emphasis>. There are also cases in which we use + <function>static_cast</function> when working with bit fields + where we are not representing a numerical value but rather a bunch + of bits packed together in some integer type. Also note that + <type>size_t</type> and <type>long</type> both typically differ + between 32-bit and 64-bit environments, so sometimes an explicit + cast may not be needed to avoid warnings on one platform but may + be needed on another. A conversion with + <classname>QIntC</classname> should always be used when the types + are different even if the underlying size is the same. QPDF's CI + build builds on 32-bit and 64-bit platforms, and the test suite is + very thorough, so it is hard to make any of the potential errors + here without being caught in build or test. + </para> + <para> + Non-const <type>unsigned char*</type> is used in the + <type>Pipeline</type> interface. The pipeline interface has a + <function>write</function> call that uses <type>unsigned + char*</type> without a <type>const</type> qualifier. The main + reason for this is to support pipelines that make calls to + third-party libraries, such as zlib, that don't include + <type>const</type> in their interfaces. Unfortunately, there are + many places in the code where it is desirable to have <type>const + char*</type> with pipelines. None of the pipeline implementations + in qpdf currently modify the data passed to write, and doing so + would be counter to the intent of <type>Pipeline</type>, but there + is nothing in the code to prevent this from being done. There are + places in the code where <function>const_cast</function> is used + to remove the const-ness of pointers going into + <type>Pipeline</type>s. This could theoretically be unsafe, but + there is adequate testing to assert that it is safe and will + remain safe in qpdf's code. </para> </sect1> <sect1 id="ref.encryption"> |