From c8729398ddb9ac82b00bbafaf24e8d37543e5b9e Mon Sep 17 00:00:00 2001
From: Jay Berkenbilt
Date: Tue, 11 Jan 2022 11:49:33 -0500
Subject: Generate help content from manual

This is a massive rewrite of the help text and cli.rst section of the
manual. All command-line flags now have their own help and are
specifically index. qpdf --help is completely redone.
---
 manual/encryption.rst | 327 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 327 insertions(+)
 create mode 100644 manual/encryption.rst

(limited to 'manual/encryption.rst')

diff --git a/manual/encryption.rst b/manual/encryption.rst
new file mode 100644
index 00000000..6ca950f7
--- /dev/null
+++ b/manual/encryption.rst
@@ -0,0 +1,327 @@
+.. _pdf-encryption:
+
+PDF Encryption
+==============
+
+This chapter discusses PDF encryption in a general way with an angle
+toward how it works in :command:`qpdf`. This chapter is not intended
+to replace the PDF specification. Please consult the spec for full
+details.
+
+PDF Encryption Concepts
+-----------------------
+
+Encryption
+  Encryption is the replacement of *clear text* with encrypted text,
+  also known as *ciphertext*. The clear text may be retrieved from the
+  ciphertext if the encryption key is known.
+
+  PDF files consist of an object structure. PDF objects may be of a
+  variety of types including (among others) numbers, boolean values,
+  names, arrays, dictionaries, strings, and streams. In a PDF file,
+  only strings and streams are encrypted.
+
+Security Handler
+  Since the inception of PDF, there have been several modifications to
+  the way files are encrypted. Encryption is handled by a *security
+  handler*. The *standard security handler* is password-based. This is
+  the only security handler implemented by qpdf, and this material is
+  all focused on the standard security handler. There are various
+  flags that control the specific details of encryption with the
+  standard security handler. These are discussed below.
+
+Encryption Key
+  This refers to the actual key used by the encryption and decryption
+  algorithms. It is distinct from the password. The main encryption
+  key is generated at random and stored encrypted in the PDF file. The
+  passwords used to protect a PDF file, if any, are used to protect
+  the encryption key. This design makes it possible to use different
+  passwords (e.g., user and owner passwords) to retrieve the
+  encryption key or even to change the password on a file without
+  changing the encryption key. qpdf can expose the encryption key when
+  run with the :qpdf:ref:`--show-encryption-key` option and can accept
+  a hex-encoded encryption key in place of a password when run with
+  the :qpdf:ref:`--password-is-hex-key` option.
+
+Password Protection
+  Password protection is distinct from encryption. This point is often
+  misunderstood. A PDF file can be encrypted without being
+  password-protected. The intent of PDF encryption was that there
+  would be two passwords: a *user password* and an *owner password*.
+  Either password can be used to retrieve the encryption key. A
+  conforming reader is supposed to obey the security restrictions
+  if the file is opened using the user password but not if the file is
+  opened with the owner password. :command:`qpdf` makes no distinction
+  between which password is used to open the file. The distinction
+  made by conforming readers between the user and owner password is
+  what makes it common to create encrypted files with no password
+  protection. This is done by using the empty string as the user
+  password and some secret string as the owner password. When a user
+  opens the PDF file, the empty string is used to retrieve the
+  encryption key, making the file usable, but a conforming reader
+  restricts certain operations from the user.
+
+What does all this mean? Here are a few things to realize.
+
+- Since the user password and the owner password are both used to
+  recover the single encryption key, there is *fundamentally no way*
+  to prevent an application from disregarding the security
+  restrictions on a file. Any software that can read the encrypted
+  file at all has the encryption key. Therefore, the security of the
+  restrictions placed on PDF files is solely enforced by the software.
+  Any open source PDF reader could be trivially modified to ignore the
+  security restrictions on a file. The PDF specification is clear
+  about this point. This means that PDF restrictions on
+  non-password-protected files only restrict users who don't know how
+  to circumvent them.
+
+- If a file is password-protected, you have to know at least one of
+  the user or owner password to retrieve the encryption key. However,
+  in the case of 40-bit encryption, the actual encryption key is only
+  5 bytes long and can be easily brute-forced. As such, files
+  encrypted with 40-bit encryption are not secure regardless of how
+  strong the password is. With 128-bit encryption, the default
+  security handler uses RC4 encryption, which is also known to be
+  insecure. As such, the only way to securely encrypt a PDF file using
+  the standard security handler (as of the last review of this chapter
+  in 2022) is to use AES encryption. This is the only supported
+  algorithm with 256-bit encryption, and it can be selected to be used
+  with 128-bit encryption as well. However, there is no reason to use
+  128-bit encryption with AES. If you are going to use AES, just use
+  256-bit encryption instead. The security of a 256-bit AES-encrypted
+  PDF file with a strong password is comparable to using a
+  general-purpose encryption tool like :command:`gpg` or
+  :command:`openssl` to encrypt the PDF file with the same password,
+  but the advantage of using PDF encryption is that no software is
+  required beyond a regular PDF viewer.
+
+PDF Encryption Details
+----------------------
+
+This section describes a few details about PDF encryption. It does not
+describe all the details. For that, read the PDF specification. The
+details presented here, however, should go a long way toward helping a
+casual user/developer understand what's going on with encrypted PDF
+files.
+
+Here are more concepts to understand.
+
+Algorithm parameters ``V`` and ``R``
+  There are two parameters that control the details of encryption
+  using the standard security handler: ``V`` and ``R``.
+
+  ``V`` is a code specifying the algorithms that are used for
+  encrypting the file, handling keys, etc. It may have any of the
+  following values:
+
+  - 1: The original algorithm, which encrypted files using 40-bit keys.
+
+  - 2: An extension of the original algorithm allowing longer keys.
+    Introduced in PDF 1.4.
+
+  - 3: An unpublished algorithm that permits file encryption key
+    lengths ranging from 40 to 128 bits. Introduced in PDF 1.4. qpdf
+    is believed to be able to read files with ``V`` = 3 but does not
+    write such files.
+
+  - 4: An extension of the algorithm that allows it to be
+    parameterized by additional rules for handling strings and
+    streams. Introduced in PDF 1.5.
+
+  - 5: An algorithm that allows specification of separate security
+    handlers for strings and streams as well as embedded files, and
+    which supports 256-bit keys. Introduced in PDF 1.7 extension level
+    3 and later extended in extension level 8. This is the encryption
+    system in the PDF 2.0 specification, ISO-32000.
+
+  ``R`` is a code specifying the revision of the standard handler. It
+  is tightly coupled with the value of ``V``. ``R`` may have any of
+  the following values:
+
+  - 2: ``V`` must be 1
+
+  - 3: ``V`` must be 2 or 3
+
+  - 4: ``V`` must be 4
+
+  - 5: ``V`` must be 5; this extension was never fully specified and
+    existed for a short time in some versions of Acrobat.
+    :command:`qpdf` is able to read and write this format, but it
+    should not be used for any purpose other than testing
+    compatibility with the format.
+
+  - 6: ``V`` must be 5. This is the only value that is not deprecated
+    in the PDF 2.0 specification, ISO-32000.
+
+Encryption Dictionary
+  Encrypted PDF files have an encryption dictionary. There are several
+  fields, but these are the important ones for our purposes:
+
+  - ``V`` and ``R`` as described above
+
+  - ``O``, ``U``, ``OE``, ``UE``: values used by the algorithms that
+    recover the encryption key from the user and owner password. Which
+    of these are defined and how they are used vary based on the value
+    of ``R``.
+
+  - ``P``: a bit field that describes which restrictions are in place.
+    This is discussed below in :ref:`security-restrictions`.
+
+Encryption Algorithms
+  PDF files may be encrypted with the obsolete, insecure RC4 algorithm
+  or the more secure AES algorithm. See also :ref:`weak-crypto` for a
+  discussion. 40-bit encryption always uses RC4. 128-bit can use
+  either RC4 (the default for compatibility reasons) or, starting with
+  PDF 1.6, AES. 256-bit encryption always uses AES.
+
+.. _security-restrictions:
+
+PDF Security Restrictions
+-------------------------
+
+PDF security restrictions are described by a bit field whose value is
+stored in the ``P`` field in the encryption dictionary. The value of
+``P`` is used by the algorithms to recover the encryption key given
+the password, which makes the value of ``P`` tamper-resistent.
+
+``P`` is a 32-bit integer, treated as a signed two's-complement number.
+A 1 in any bit position means the permission is granted. The PDF
+specification numbers the bits from 1 (least significant bit) to 32
+(most significant bit) rather than the more customary 0 to 31. For
+consistency with the spec, the remainder of this section uses the
+1-based numbering.
+
+Only bits 3, 4, 5, 6, 9, 10, 11, and 12 are used. All other bits are
+set to 1.
Since bit 32 is always set to 1, the value of ``P`` is
+always a negative number. (:command:`qpdf` recognizes a positive
+number on behalf of buggy writers that treat ``P`` as unsigned. Such
+files have been seen in the wild.)
+
+Here are the meanings of the bit positions. All bits not listed must
+have the value 1 except bits 1 and 2, which must have the value 0.
+However, the values of bits other than those in the list are ignored,
+so having incorrect values probably doesn't break anything in most
+cases. A value of 1 indicates that the permission is granted.
+
+- 3: for ``R`` = 2, printing; for ``R`` >= 3, printing at low
+  resolution
+
+- 4: modifying the document except as controlled by bits 6,
+  9, and 11
+
+- 5: extracting text and graphics for purposes other than
+  accessibility to visually impaired users
+
+- 6: add or modify annotations, fill in interactive form fields;
+  if bit 4 is also set, create or modify interactive form fields
+
+- 9: for ``R`` >= 3, fill in interactive form fields even if bit 6 is
+  clear
+
+- 10: not used; formerly granted permission to extract material
+  for accessibility, but the specification now disallows restriction
+  of accessibility, and conforming readers are to treat this bit as if
+  it is set regardless of its value
+
+- 11: for ``R`` >= 3, assemble document including inserting, rotating,
+  or deleting pages or creating document outlines or thumbnail images
+
+- 12: for ``R`` >= 3, allow printing at full resolution
+
+.. _qpdf-P:
+
+How qpdf handles security restrictions
+--------------------------------------
+
+This section describes exactly what the qpdf library does with regard
+to ``P`` based on the various settings of different security options.
+
+- Start with all bits set except bits 1 and 2, which are cleared
+
+- For ``R`` = 2:
+
+  - ``--print=n``: clear bit 3
+
+  - ``--modify=n``: clear bit 4
+
+  - ``--extract=n``: clear bit 5
+
+  - ``--annotate=n``: clear bit 6
+
+- For ``R`` >= 3:
+
+  - ``--accessibility=n``: for ``R`` = 3, clear bit 10; otherwise,
+    ignore it, so bit 10 is always set if ``R`` >= 4. qpdf allows
+    creating files with bit 10 clear so that it can be used to create
+    test files to ensure that a conforming reader ignores the value of
+    the bit. You should never intentionally clear accessibility.
+
+  - ``--extract=n``: clear bit 5
+
+  - ``--print=none``: clear bits 3 and 12
+
+  - ``--print=low``: clear bit 12
+
+  - ``--modify=none``: clear bits 4, 6, 9, and 11
+
+  - ``--modify=assembly``: clear bits 4, 6, and 9
+
+  - ``--modify=form``: clear bits 4 and 6
+
+  - ``--modify=annotate``: clear bit 4
+
+  - ``--assemble=n``: clear bit 11
+
+  - ``--annotate=n``: clear bit 6
+
+  - ``--form=n``: clear bit 9
+
+  - ``--modify-other=n``: clear bit 4
+
+Options to :command:`qpdf`, both at the CLI and library level, allow
+more granular clearing of permission bits than do most tools,
+including Adobe Acrobat. As such, PDF viewers may respond in
+surprising ways based on options passed to qpdf. If you observe this,
+it is probably not because of a bug in qpdf.
+
+.. _pdf-passwords:
+
+User and Owner Passwords
+------------------------
+
+When you use qpdf to show encryption parameters and you open a file
+with the owner password, sometimes qpdf reveals the user password, and
+sometimes it doesn't. Here's why.
+
+For ``V`` < 5, the user password is actually stored in the PDF file
+encrypted with a key that is derived from the owner password, and the
+main encryption key is encrypted using a key derived from the user
+password. When you open a PDF file, the reader first tries to treat
+the given password as the user password, using it to recover the
+encryption key.
If that works, you're in with restrictions (assuming
+the reader chooses to enforce them). If it doesn't work, then the
+reader treats the password as the owner password, using it to recover
+the user password, and then uses the user password to retrieve the
+encryption key. This is why creating a file with the same user
+password and owner password with ``V`` < 5 results in a file that some
+readers will never allow you to open as the owner. Typically when a
+reader encounters a file with ``V`` < 5, it will first attempt to
+treat the empty string as a user password. If that works, the file is
+encrypted but not password-protected. If it doesn't work, then a
+password prompt is given. Creating a file with an empty owner password
+is like creating a file with the same owner and user password: there
+is no way to open the file as an owner.
+
+For ``V`` >= 5, the main encryption key is independently encrypted
+using the user password and the owner password. There is no way to
+recover the user password from the owner password. Restrictions are
+imposed or not depending on which password was used. In this case, the
+password supplied, if any, is tried both as the user password and the
+owner password, and whichever works is used. Typically the password is
+tried as the owner password first. (This is what the PDF specification
+says to do.) As such, specifying a user password and leaving the owner
+password blank results in a file that is opened as owner with no
+password, effectively rendering the security restrictions useless.
+This is why :command:`qpdf` requires you to pass
+:qpdf:ref:`--allow-insecure` to create a file with an empty owner
+password when 256-bit encryption is in use.
--
cgit v1.2.3-54-g00ecf