summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJay Berkenbilt <ejb@ql.org>2013-03-25 19:37:25 +0100
committerJay Berkenbilt <ejb@ql.org>2013-03-25 19:37:25 +0100
commite8ddac89501e232205e1737a07ddb7d1c2425e4b (patch)
treeacc331b2f8ae7869432e3e6c5c5edef9beceeda3
parent1ec1b12864c60b9cc6750c7f175a9262f2efd558 (diff)
downloadqpdf-e8ddac89501e232205e1737a07ddb7d1c2425e4b.tar.zst
Document casting policy
-rw-r--r--ChangeLog5
-rw-r--r--README.maintainer4
-rw-r--r--TODO107
-rw-r--r--manual/qpdf-manual.xml160
4 files changed, 174 insertions, 102 deletions
diff --git a/ChangeLog b/ChangeLog
index c00f7a18..36f7f051 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2013-03-25 Jay Berkenbilt <ejb@ql.org>
+
+ * manual/qpdf-manual.xml: Document the casting policy that is
+ followed in qpdf's implementation.
+
2013-03-11 Jay Berkenbilt <ejb@ql.org>
* When creating Windows binary distributions, make sure to only
diff --git a/README.maintainer b/README.maintainer
index 4be45f51..a1ac5847 100644
--- a/README.maintainer
+++ b/README.maintainer
@@ -31,7 +31,9 @@ Release Reminders
* Check all open issues in the sourceforge trackers and on github.
* If any interfaces were added or changed, check C API to see whether
- changes are appropriate there as well.
+ changes are appropriate there as well. If necessary, review the
+ casting policy in the manual, and ensure that integer types are
+ properly handled.
* Increment shared library version information as needed
(libqpdf/build.mk)
diff --git a/TODO b/TODO
index 97e7d2c4..ebbd8105 100644
--- a/TODO
+++ b/TODO
@@ -1,3 +1,9 @@
+4.1.0
+=====
+
+ * New public interfaces have been added.
+
+
4.2.0
=====
@@ -38,107 +44,6 @@
- See ../misc/broken-files
-4.1.0
-=====
-
- * Add to documentation, and mention this documentation in
- README.maintainer:
-
- Casting policy.
-
- The C++ code in qpdf is free of old-style casts except where
- unavoidable (e.g. where the old-style cast is in a macro provided
- by a third-party header file). When there is a need for a cast, it
- is handled, in order of preference by rewriting the code to avoid
- the need for a cast, calling const_cast, calling static_cast,
- calling reinterpret_cast, or calling some combination of the above.
- The casting policy explicitly prohibits casting between sizes for
- no purpose other than to quiet a compiler warning when there is no
- reasonable chance of a problem resulting. The reason for this
- exclusion is that it takes away enabling additional compiler
- warnings as a tool for making future improvements to this aspect of
- the code and also damages the readability of the code. As a last
- resort, a compiler-specific pragma may be used to suppress a
- warning that we don't want to fix. Examples may include
- suppressing warnings about the use of old-style casts in code that
- is shared between C and C++ code.
-
- There are a few significant areas where casting is common in the qpdf
- sources or where casting would be required to quiet higher levels
- of compiler warnings but is omitted at present:
-
- * signed vs. unsigned char. For historical reasons, there are a
- lot of places in qpdf's internals that deal with unsigned char,
- which means that a lot of casting is required to interoperate
- with standard library calls and std::string. In retrospect,
- qpdf should have probably used signed char everywhere and just
- cast to unsigned char when needed. There are reinterpret_cast
- calls to go between char* and unsigned char*, and there are
- static_cast calls to go between char and unsigned char. These
- should always be safe.
-
- * non-const unsigned char* used in Pipeline interface. The
- pipeline interface has a write() call that uses unsigned char*
- without a const qualifier. The main reason for this is to
- support pipelines that make calls to third-party libraries, such
- as zlib, that don't include const in their interfaces.
- Unfortunately, there are many places in the code where it is
- desirable to have const char* with pipelines. None of the
- pipeline implementations in qpdf currently modify the data
- passed to write, and doing so would be counter to the intent of
- Pipeline. There are places in the code where const_cast is used
- to remove the const-ness of pointers going into Pipelines. This
- could be potentially unsafe, but there is adequate testing to
- assert that it is safe in qpdf's code.
-
- * size_t vs. qpdf_offset_t. This is pretty much unavoidable since
- offsets are signed types and sizes are unsigned types. Whenever
- it is necessary to seek by an amount given by a size_t, it
- becomes necessary to mix and match between size_t and
- qpdf_offset_t. Additionally, qpdf sometimes treats memory
- buffers like files, and those seek interfaces have to be
- consistent with file-based input sources. Neither gcc nor MSVC
- give warnings for this case by default, but both have warning
- flags that can enable this. (MSVC: /W14267 or /W3 (which also
- enables some additional warnings that we ignore); gcc:
- -Wconversion -Wsign-conversion). This could matter for files
- whose sizes are larger than 2^63 bytes, but it is reasonable to
- expect that a world where such files are common would also have
- larger size_t and qpdf_offset_t types in it. I am not aware of
- any cases where 32-bit systems that have size_t smaller than
- qpdf_offset_t could run into problems, though I can't
- conclusively rule out the possibility. In the event that
- someone should produce a file that qpdf can't handle because of
- what is suspected to be issues involving the handling of size_t
- vs. qpdf_offset_t (such files may behave properly on 64-bit
- systems but not on 32-bit systems and may have very large
- embedded files or streams, for example), the above mentioned
- warning flags could be enabled and all those implicit
- conversions could be carefully scrutinized. (I have already
- gone through that exercise once in adding support for files >
- 4GB in size.) I continue to be commited to supporting large
- files on 32-bit systems, but I would not go to any lengths to
- support corner cases involving large embedded files or large
- streams that work on 64-bit systems but not on 32-bit systems
- because of size_t being too small. It is reasonable to assume
- that anyone working with such files would be using a 64-bit
- system anyway.
-
- * size_t vs. int. There are some cases where size_t and int or
- size_t and unsigned int are used interchangeably. These cases
- occur when working with very small amounts of memory, such as
- with the bit readers (where we're working with just a few bytes
- at a time), some cases of strlen, and a few other cases. I have
- scrutinized all of these cases and determined them to be safe,
- but there is no mechanism in the code to ensure that new unsafe
- conversions between int and size_t aren't introduced short of
- good testing and strong awareness of the issues. Again, if any
- such bugs are suspected in the future, enable the additional
- warning flags and scrutinizing the warnings would be in order.
-
- * New public interfaces have been added.
-
-
General
=======
diff --git a/manual/qpdf-manual.xml b/manual/qpdf-manual.xml
index 59b6c355..cfa6ec2b 100644
--- a/manual/qpdf-manual.xml
+++ b/manual/qpdf-manual.xml
@@ -1623,6 +1623,166 @@ outfile.pdf</option>
</itemizedlist>
</para>
</sect1>
+ <sect1 id="ref.casting">
+ <title>Casting Policy</title>
+ <para>
+ This section describes the casting policy followed by qpdf's
+ implementation. This is no concern to qpdf's end users and
+ largely of no concern to people writing code that uses qpdf, but
+ it could be of interest to people who are porting qpdf to a new
+ platform or who are making modifications to the code.
+ </para>
+ <para>
+ The C++ code in qpdf is free of old-style casts except where
+ unavoidable (e.g. where the old-style cast is in a macro provided
+ by a third-party header file). When there is a need for a cast,
+ it is handled, in order of preference, by rewriting the code to
+ avoid the need for a cast, calling
+ <function>const_cast</function>, calling
+ <function>static_cast</function>, calling
+ <function>reinterpret_cast</function>, or calling some combination
+ of the above. As a last resort, a compiler-specific
+ <literal>#pragma</literal> may be used to suppress a warning that
+ we don't want to fix. Examples may include suppressing warnings
+ about the use of old-style casts in code that is shared between C
+ and C++ code.
+ </para>
+ <para>
+ The casting policy explicitly prohibits casting between integer
+ sizes for no purpose other than to quiet a compiler warning when
+ there is no reasonable chance of a problem resulting. The reason
+ for this exclusion is that the practice of adding these additional
+ casts precludes future use of additional compiler warnings as a
+ tool for making future improvements to this aspect of the code,
+ and it also damages the readability of the code.
+ </para>
+ <para>
+ There are a few significant areas where casting is common in the
+ qpdf sources or where casting would be required to quiet higher
+ levels of compiler warnings but is omitted at present:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <type>char</type> vs. <type>unsigned char</type>. For
+ historical reasons, there are a lot of places in qpdf's
+ internals that deal with <type>unsigned char</type>, which
+ means that a lot of casting is required to interoperate with
+ standard library calls and <type>std::string</type>. In
+ retrospect, qpdf should have probably used regular (signed)
+ <type>char</type> and <type>char*</type> everywhere and just
+ cast to <type>unsigned char</type> when needed, but it's too
+ late to make that change now. There are
+ <function>reinterpret_cast</function> calls to go between
+ <type>char*</type> and <type>unsigned char*</type>, and there
+ are <function>static_cast</function> calls to go between
+ <type>char</type> and <type>unsigned char</type>. These should
+ always be safe.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Non-const <type>unsigned char*</type> used in the
+ <type>Pipeline</type> interface. The pipeline interface has a
+ <function>write</function> call that uses <type>unsigned
+ char*</type> without a <type>const</type> qualifier. The main
+ reason for this is to support pipelines that make calls to
+ third-party libraries, such as zlib, that don't include
+ <type>const</type> in their interfaces. Unfortunately, there
+ are many places in the code where it is desirable to have
+ <type>const char*</type> with pipelines. None of the pipeline
+ implementations in qpdf currently modify the data passed to
+ write, and doing so would be counter to the intent of
+ <type>Pipeline</type>, but there is nothing in the code to
+ prevent this from being done. There are places in the code
+ where <function>const_cast</function> is used to remove the
+ const-ness of pointers going into <type>Pipeline</type>s. This
+ could theoretically be unsafe, but there is adequate testing to
+ assert that it is safe and will remain safe in qpdf's code.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <type>size_t</type> vs. <type>qpdf_offset_t</type>. This is
+ pretty much unavoidable since sizes are unsigned types and
+ offsets are signed types. Whenever it is necessary to seek by
+ an amount given by a <type>size_t</type>, it becomes necessary
+ to mix and match between <type>size_t</type> and
+ <type>qpdf_offset_t</type>. Additionally, qpdf sometimes
+ treats memory buffers like files (as with
+ <type>BufferInputSource</type>, and those seek interfaces have
+ to be consistent with file-based input sources. Neither gcc
+ nor MSVC give warnings for this case by default, but both have
+ warning flags that can enable this. (MSVC:
+ <option>/W14267</option> or <option>/W3</option>, which also
+ enables some additional warnings that we ignore; gcc:
+ <option>-Wconversion -Wsign-conversion</option>). This could
+ matter for files whose sizes are larger than
+ 2<superscript>63</superscript> bytes, but it is reasonable to
+ expect that a world where such files are common would also have
+ larger <type>size_t</type> and <type>qpdf_offset_t</type> types
+ in it. On most 64-bit systems at the time of this writing (the
+ release of version 4.1.0 of qpdf), both <type>size_t</type> and
+ <type>qpdf_offset_t</type> are 64-bit integer types, while on
+ many current 32-bit systems, <type>size_t</type> is a 32-bit
+ type while <type>qpdf_offset_t</type> is a 64-bit type. I am
+ not aware of any cases where 32-bit systems that have
+ <type>size_t</type> smaller than <type>qpdf_offset_t</type>
+ could run into problems. Although I can't conclusively rule
+ out the possibility of such problems existing, I suspect any
+ cases would be pretty contrived. In the event that someone
+ should produce a file that qpdf can't handle because of what is
+ suspected to be issues involving the handling of
+ <type>size_t</type> vs. <type>qpdf_offset_t</type> (such files
+ may behave properly on 64-bit systems but not on 32-bit systems
+ because they have very large embedded files or streams, for
+ example), the above mentioned warning flags could be enabled
+ and all those implicit conversions could be carefully
+ scrutinized. (I have already gone through that exercise once
+ in adding support for files larger than 4&nbsp;GB in size.) I
+ continue to be commited to supporting large files on 32-bit
+ systems, but I would not go to any lengths to support corner
+ cases involving large embedded files or large streams that work
+ on 64-bit systems but not on 32-bit systems because of
+ <type>size_t</type> being too small. It is reasonable to
+ assume that anyone working with such files would be using a
+ 64-bit system anyway since many 32-bit applications would have
+ similar difficulties.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <type>size_t</type> vs. <type>int</type> or <type>long</type>.
+ There are some cases where <type>size_t</type> and
+ <type>int</type> or <type>long</type> or <type>size_t</type>
+ and <type>unsigned int</type> or <type>unsigned long</type> are
+ used interchangeably. These cases occur when working with very
+ small amounts of memory, such as with the bit readers (where
+ we're working with just a few bytes at a time), some cases of
+ <function>strlen</function>, and a few other cases. I have
+ scrutinized all of these cases and determined them to be safe,
+ but there is no mechanism in the code to ensure that new unsafe
+ conversions between <type>int</type> and <type>size_t</type>
+ aren't introduced short of good testing and strong awareness of
+ the issues. Again, if any such bugs are suspected in the
+ future, enabling the additional warning flags and scrutinizing
+ the warnings would be in order.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ To be clear, I believe qpdf to be well-behaved with respect to
+ sizes and offsets, and qpdf's test suite includes actual
+ generation and full processing of files larger than 4&nbsp;GB in
+ size. The issues raised here are largely academic and should not
+ in any way be interpreted to mean that qpdf has practical problems
+ involving sloppiness with integer types. I also believe that
+ appropriate measures have been taken in the code to avoid problems
+ with signed vs. unsigned integers from resulting in memory
+ overwrites or other issues with potential security implications,
+ though there are never any absolute guarantees.
+ </para>
+ </sect1>
<sect1 id="ref.encryption">
<title>Encryption</title>
<para>