author: Jay Berkenbilt <ejb@ql.org> 2019-06-21 14:53:08 +0200
committer: Jay Berkenbilt <ejb@ql.org> 2019-06-21 19:17:45 +0200
commit: 6fca27995ea2d4da06bab85e5d7911e01618ddda (patch)
tree: 0265aeb3d27bafade1a19cea2e75016f9676613c /manual/qpdf-manual.xml
parent: cc2e8853b5a87eefa304c03d2d242a339d942c77 (diff)
download: qpdf-6fca27995ea2d4da06bab85e5d7911e01618ddda.tar.zst
Update casting policy in the documentation
Diffstat (limited to 'manual/qpdf-manual.xml')
-rw-r--r-- manual/qpdf-manual.xml | 195
1 files changed, 62 insertions, 133 deletions
diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml
index 225153b6..721ce845 100644
--- a/manual/qpdf-manual.xml
+++ b/manual/qpdf-manual.xml
@@ -3340,139 +3340,68 @@ outfile.pdf</option>
and C++ code.
</para>
<para>
- The casting policy explicitly prohibits casting between integer
- sizes for no purpose other than to quiet a compiler warning when
- there is no reasonable chance of a problem resulting. The reason
- for this exclusion is that the practice of adding these additional
- casts precludes future use of additional compiler warnings as a
- tool for making future improvements to this aspect of the code,
- and it also damages the readability of the code.
- </para>
- <para>
- There are a few significant areas where casting is common in the
- qpdf sources or where casting would be required to quiet higher
- levels of compiler warnings but is omitted at present:
- <itemizedlist>
- <listitem>
- <para>
- <type>char</type> vs. <type>unsigned char</type>. For
- historical reasons, there are a lot of places in qpdf's
- internals that deal with <type>unsigned char</type>, which
- means that a lot of casting is required to interoperate with
- standard library calls and <type>std::string</type>. In
- retrospect, qpdf should have probably used regular (signed)
- <type>char</type> and <type>char*</type> everywhere and just
- cast to <type>unsigned char</type> when needed, but it's too
- late to make that change now. There are
- <function>reinterpret_cast</function> calls to go between
- <type>char*</type> and <type>unsigned char*</type>, and there
- are <function>static_cast</function> calls to go between
- <type>char</type> and <type>unsigned char</type>. These should
- always be safe.
- </para>
- </listitem>
- <listitem>
- <para>
- Non-const <type>unsigned char*</type> used in the
- <type>Pipeline</type> interface. The pipeline interface has a
- <function>write</function> call that uses <type>unsigned
- char*</type> without a <type>const</type> qualifier. The main
- reason for this is to support pipelines that make calls to
- third-party libraries, such as zlib, that don't include
- <type>const</type> in their interfaces. Unfortunately, there
- are many places in the code where it is desirable to have
- <type>const char*</type> with pipelines. None of the pipeline
- implementations in qpdf currently modify the data passed to
- write, and doing so would be counter to the intent of
- <type>Pipeline</type>, but there is nothing in the code to
- prevent this from being done. There are places in the code
- where <function>const_cast</function> is used to remove the
- const-ness of pointers going into <type>Pipeline</type>s. This
- could theoretically be unsafe, but there is adequate testing to
- assert that it is safe and will remain safe in qpdf's code.
- </para>
- </listitem>
- <listitem>
- <para>
- <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
- pretty much unavoidable since sizes are unsigned types and
- offsets are signed types. Whenever it is necessary to seek by
- an amount given by a <type>size_t</type>, it becomes necessary
- to mix and match between <type>size_t</type> and
- <type>qpdf_offset_t</type>. Additionally, qpdf sometimes
- treats memory buffers like files (as with
- <type>BufferInputSource</type>, and those seek interfaces have
- to be consistent with file-based input sources. Neither gcc
- nor MSVC give warnings for this case by default, but both have
- warning flags that can enable this. (MSVC:
- <option>/W14267</option> or <option>/W3</option>, which also
- enables some additional warnings that we ignore; gcc:
- <option>-Wconversion -Wsign-conversion</option>). This could
- matter for files whose sizes are larger than
- 2<superscript>63</superscript> bytes, but it is reasonable to
- expect that a world where such files are common would also have
- larger <type>size_t</type> and <type>qpdf_offset_t</type> types
- in it. On most 64-bit systems at the time of this writing (the
- release of version 4.1.0 of qpdf), both <type>size_t</type> and
- <type>qpdf_offset_t</type> are 64-bit integer types, while on
- many current 32-bit systems, <type>size_t</type> is a 32-bit
- type while <type>qpdf_offset_t</type> is a 64-bit type. I am
- not aware of any cases where 32-bit systems that have
- <type>size_t</type> smaller than <type>qpdf_offset_t</type>
- could run into problems. Although I can't conclusively rule
- out the possibility of such problems existing, I suspect any
- cases would be pretty contrived. In the event that someone
- should produce a file that qpdf can't handle because of what is
- suspected to be issues involving the handling of
- <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
- may behave properly on 64-bit systems but not on 32-bit systems
- because they have very large embedded files or streams, for
- example), the above mentioned warning flags could be enabled
- and all those implicit conversions could be carefully
- scrutinized. (I have already gone through that exercise once
- in adding support for files larger than 4&nbsp;GB in size.) I
- continue to be committed to supporting large files on 32-bit
- systems, but I would not go to any lengths to support corner
- cases involving large embedded files or large streams that work
- on 64-bit systems but not on 32-bit systems because of
- <type>size_t</type> being too small. It is reasonable to
- assume that anyone working with such files would be using a
- 64-bit system anyway since many 32-bit applications would have
- similar difficulties.
- </para>
- </listitem>
- <listitem>
- <para>
- <type>size_t</type> vs. <type>int</type> or <type>long</type>.
- There are some cases where <type>size_t</type> and
- <type>int</type> or <type>long</type> or <type>size_t</type>
- and <type>unsigned int</type> or <type>unsigned long</type> are
- used interchangeably. These cases occur when working with very
- small amounts of memory, such as with the bit readers (where
- we're working with just a few bytes at a time), some cases of
- <function>strlen</function>, and a few other cases. I have
- scrutinized all of these cases and determined them to be safe,
- but there is no mechanism in the code to ensure that new unsafe
- conversions between <type>int</type> and <type>size_t</type>
- aren't introduced short of good testing and strong awareness of
- the issues. Again, if any such bugs are suspected in the
- future, enabling the additional warning flags and scrutinizing
- the warnings would be in order.
- </para>
- </listitem>
- </itemizedlist>
- </para>
- <para>
- To be clear, I believe qpdf to be well-behaved with respect to
- sizes and offsets, and qpdf's test suite includes actual
- generation and full processing of files larger than 4&nbsp;GB in
- size. The issues raised here are largely academic and should not
- in any way be interpreted to mean that qpdf has practical problems
- involving sloppiness with integer types. I also believe that
- appropriate measures have been taken in the code to avoid problems
- with signed vs. unsigned integers from resulting in memory
- overwrites or other issues with potential security implications,
- though there are never any absolute guarantees.
+ The <classname>QIntC</classname> namespace, provided by
+ <filename>include/qpdf/QIntC.hh</filename>, implements safe
+ functions for converting between integer types. These functions do
+ range checking and throw a <type>std::range_error</type>, which is
+ a subclass of <type>std::runtime_error</type>, if conversion from one
+ integer type to another results in loss of information. There are
+ many cases in which we have to move between different integer
+ types because of incompatible integer types used in interoperable
+ interfaces. Some are unavoidable, such as moving between sizes and
+ offsets, and others are there because of old code that is too
+ entrenched to be fixable without breaking source compatibility and
+ causing pain for users. QPDF is compiled with extra warnings to
+ detect conversions with potential data loss, and all such cases
+ should be fixed by either using a function from
+ <classname>QIntC</classname> or a
+ <function>static_cast</function>.
+ </para>
+ <para>
+ When the intention is just to switch the type because of
+ exchanging data between incompatible interfaces, use
+ <classname>QIntC</classname>. This is the usual case. However,
+ there are some cases in which we are explicitly intending to use
+ the exact same bit pattern with a different type. This is most
+ common when switching between signed and unsigned characters. A
+ lot of qpdf's code uses unsigned characters internally, but
+ <type>std::string</type> uses <type>char</type>, which is signed on many platforms. Using
+ <function>QIntC::to_char</function> would be wrong for converting
+ from unsigned to signed characters because a negative
+ <type>char</type> value and the corresponding <type>unsigned
+ char</type> value greater than 127 <emphasis>mean the same
+ thing</emphasis>. There are also cases in which we use
+ <function>static_cast</function> when working with bit fields
+ where we are not representing a numerical value but rather a bunch
+ of bits packed together in some integer type. Also note that
+ <type>size_t</type> and <type>long</type> both typically differ
+ between 32-bit and 64-bit environments, so sometimes an explicit
+ cast may not be needed to avoid warnings on one platform but may
+ be needed on another. A conversion with
+ <classname>QIntC</classname> should always be used when the types
+ are different even if the underlying size is the same. QPDF's CI
+ runs builds on 32-bit and 64-bit platforms, and the test suite is
+ very thorough, so it is hard to make any of the potential errors
+ here without being caught in build or test.
+ </para>
+ <para>
+ Non-const <type>unsigned char*</type> is used in the
+ <type>Pipeline</type> interface. The pipeline interface has a
+ <function>write</function> call that uses <type>unsigned
+ char*</type> without a <type>const</type> qualifier. The main
+ reason for this is to support pipelines that make calls to
+ third-party libraries, such as zlib, that don't include
+ <type>const</type> in their interfaces. Unfortunately, there are
+ many places in the code where it is desirable to have <type>const
+ char*</type> with pipelines. None of the pipeline implementations
+ in qpdf currently modify the data passed to write, and doing so
+ would be counter to the intent of <type>Pipeline</type>, but there
+ is nothing in the code to prevent this from being done. There are
+ places in the code where <function>const_cast</function> is used
+ to remove the const-ness of pointers going into
+ <type>Pipeline</type>s. This could theoretically be unsafe, but
+ there is adequate testing to assert that it is safe and will
+ remain safe in qpdf's code.
</para>
</sect1>
<sect1 id="ref.encryption">
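
The added text distinguishes two kinds of conversion: a range-checked one that throws <type>std::range_error</type> when information would be lost, and a plain <function>static_cast</function> that deliberately keeps the same bit pattern. The following is a minimal, self-contained sketch of that distinction. It does not use or reproduce the <classname>QIntC</classname> implementation: the names `checked_cast`, `is_negative`, `byte_to_char`, and `casting_demo` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: true if v is negative. Written without "v < 0"
// so that it also compiles cleanly for unsigned types.
template <typename T>
bool is_negative(T v)
{
    return !(v > T{}) && (v != T{});
}

// Hypothetical helper (not qpdf's QIntC): convert between integer types,
// throwing std::range_error if the value does not survive the conversion.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "checked_cast only handles integer types");
    To result = static_cast<To>(value);
    // The value must round-trip unchanged and keep its sign.
    bool const round_trips = (static_cast<From>(result) == value);
    bool const same_sign = (is_negative(result) == is_negative(value));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion would lose information");
    }
    return result;
}

// Bit-pattern conversion: no range check, the same 8 bits are kept.
inline char byte_to_char(unsigned char b)
{
    return static_cast<char>(b);
}

// Small demonstration of both styles; returns true if they behave as described.
inline bool casting_demo()
{
    long long const offset = 1234;
    std::size_t const size = checked_cast<std::size_t>(offset); // value fits
    assert(size == 1234u);

    bool threw = false;
    try {
        // A negative offset cannot be represented as a size.
        checked_cast<std::size_t>(static_cast<long long>(-1));
    } catch (std::range_error const&) {
        threw = true;
    }

    // Bytes above 127 must not be range-checked: the bits are the data.
    unsigned char const byte = 0xE9;
    bool const same_bits = (static_cast<unsigned char>(byte_to_char(byte)) == byte);
    return threw && same_bits;
}
```

In this sketch a checked conversion is the default, and the unchecked cast is reserved for the character case where a byte above 127 and the corresponding `char` value mean the same thing.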