From 10fb619d3e0618528b7ac6c20cad6262020cf947 Mon Sep 17 00:00:00 2001 From: Jay Berkenbilt Date: Sat, 18 Dec 2021 09:01:52 -0500 Subject: Split documentation into multiple pages, change theme --- manual/qdf.rst | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 manual/qdf.rst (limited to 'manual/qdf.rst') diff --git a/manual/qdf.rst b/manual/qdf.rst new file mode 100644 index 00000000..b7ee7813 --- /dev/null +++ b/manual/qdf.rst @@ -0,0 +1,96 @@ +.. _ref.qdf: + +QDF Mode +======== + +In QDF mode, qpdf creates PDF files in what we call *QDF +form*. A PDF file in QDF form, sometimes called a QDF +file, is a completely valid PDF file that has ``%QDF-1.0`` as its third +line (after the pdf header and binary characters) and has certain other +characteristics. The purpose of QDF form is to make it possible to edit +PDF files, with some restrictions, in an ordinary text editor. This can +be very useful for experimenting with different PDF constructs or for +making one-off edits to PDF files (though there are other reasons why +this may not always work). Note that QDF mode does not support +linearized files. If you enable linearization, QDF mode is automatically +disabled. + +It is ordinarily very difficult to edit PDF files in a text editor for +two reasons: most meaningful data in PDF files is compressed, and PDF +files are full of offset and length information that makes it hard to +add or remove data. A QDF file is organized in a manner such that, if +edits are kept within certain constraints, the +:command:`fix-qdf` program, distributed with qpdf, is +able to restore edited files to a correct state. The +:command:`fix-qdf` program takes no command-line +arguments. It reads a possibly edited QDF file from standard input and +writes a repaired file to standard output. + +The following attributes characterize a QDF file: + +- All objects appear in numerical order in the PDF file, including when + objects appear in object streams. + +- Objects are printed in an easy-to-read format, and all line endings + are normalized to UNIX line endings. + +- Unless specifically overridden, streams appear uncompressed (when + qpdf supports the filters and they are compressed with a non-lossy + compression scheme), and most content streams are normalized (line + endings are converted to just a UNIX-style linefeeds). + +- All streams lengths are represented as indirect objects, and the + stream length object is always the next object after the stream. If + the stream data does not end with a newline, an extra newline is + inserted, and a special comment appears after the stream indicating + that this has been done. + +- If the PDF file contains object streams, if object stream *n* + contains *k* objects, those objects are numbered from *n+1* through + *n+k*, and the object number/offset pairs appear on a separate line + for each object. Additionally, each object in the object stream is + preceded by a comment indicating its object number and index. This + makes it very easy to find objects in object streams. + +- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens, + and ``endobj`` tokens appear on lines by themselves. A blank line + follows every ``endobj`` token. + +- If there is a cross-reference stream, it is unfiltered. + +- Page dictionaries and page content streams are marked with special + comments that make them easy to find. + +- Comments precede each object indicating the object number of the + corresponding object in the original file. + +When editing a QDF file, any edits can be made as long as the above +constraints are maintained. This means that you can freely edit a page's +content without worrying about messing up the QDF file. It is also +possible to add new objects so long as those objects are added after the +last object in the file or subsequent objects are renumbered. If a QDF +file has object streams in it, you can always add the new objects before +the xref stream and then change the number of the xref stream, since +nothing generally ever references it by number. + +It is not generally practical to remove objects from QDF files without +messing up object numbering, but if you remove all references to an +object, you can run qpdf on the file (after running +:command:`fix-qdf`), and qpdf will omit the now-orphaned +object. + +When :command:`fix-qdf` is run, it goes through the file +and recomputes the following parts of the file: + +- the ``/N``, ``/W``, and ``/First`` keys of all object stream + dictionaries + +- the pairs of numbers representing object numbers and offsets of + objects in object streams + +- all stream lengths + +- the cross-reference table or cross-reference stream + +- the offset to the cross-reference table or cross-reference stream + following the ``startxref`` token -- cgit v1.2.3-70-g09d2