.. _pdf-encryption:

PDF Encryption
==============

This chapter discusses PDF encryption in a general way with an angle
toward how it works in :command:`qpdf`. This chapter is not intended
to replace the PDF specification. Please consult the spec for full
details.

PDF Encryption Concepts
-----------------------

Encryption
  Encryption is the replacement of *clear text* with encrypted text,
  also known as *ciphertext*. The clear text may be retrieved from the
  ciphertext if the encryption key is known.

  PDF files consist of an object structure. PDF objects may be of a
  variety of types including (among others) numbers, boolean values,
  names, arrays, dictionaries, strings, and streams. In a PDF file,
  only strings and streams are encrypted.

Security Handler
  Since the inception of PDF, there have been several modifications to
  the way files are encrypted. Encryption is handled by a *security
  handler*. The *standard security handler* is password-based. This is
  the only security handler implemented by qpdf, and this material is
  all focused on the standard security handler. There are various
  flags that control the specific details of encryption with the
  standard security handler. These are discussed below.

Encryption Key
  This refers to the actual key used by the encryption and decryption
  algorithms. It is distinct from the password. The main encryption
  key is generated at random and stored encrypted in the PDF file. The
  passwords used to protect a PDF file, if any, are used to protect
  the encryption key. This design makes it possible to use different
  passwords (e.g., user and owner passwords) to retrieve the
  encryption key or even to change the password on a file without
  changing the encryption key. qpdf can expose the encryption key when
  run with the :qpdf:ref:`--show-encryption-key` option and can accept
  a hex-encoded encryption key in place of a password when run with
  the :qpdf:ref:`--password-is-hex-key` option.

Password Protection
  Password protection is distinct from encryption. This point is often
  misunderstood. A PDF file can be encrypted without being
  password-protected. The intent of PDF encryption was that there
  would be two passwords: a *user password* and an *owner password*.
  Either password can be used to retrieve the encryption key. A
  conforming reader is supposed to obey the security restrictions
  if the file is opened using the user password but not if the file is
  opened with the owner password. :command:`qpdf` makes no distinction
  between which password is used to open the file. The distinction
  made by conforming readers between the user and owner password is
  what makes it common to create encrypted files with no password
  protection. This is done by using the empty string as the user
  password and some secret string as the owner password. When a user
  opens the PDF file, the empty string is used to retrieve the
  encryption key, making the file usable, but a conforming reader
  restricts certain operations from the user.

What does all this mean? Here are a few things to realize.

- Since the user password and the owner password are both used to
  recover the single encryption key, there is *fundamentally no way*
  to prevent an application from disregarding the security
  restrictions on a file. Any software that can read the encrypted
  file at all has the encryption key. Therefore, the security of the
  restrictions placed on PDF files is solely enforced by the software.
  Any open source PDF reader could be trivially modified to ignore the
  security restrictions on a file. The PDF specification is clear
  about this point. This means that PDF restrictions on
  non-password-protected files only restrict users who don't know how
  to circumvent them.

- If a file is password-protected, you have to know at least one of
  the user or owner password to retrieve the encryption key. However,
  in the case of 40-bit encryption, the actual encryption key is only
  5 bytes long and can be easily brute-forced. As such, files
  encrypted with 40-bit encryption are not secure regardless of how
  strong the password is. With 128-bit encryption, the standard
  security handler uses RC4 by default, which is also known to be
  insecure. As such, the only way to securely encrypt a PDF file using
  the standard security handler (as of the last review of this chapter
  in 2022) is to use AES encryption. This is the only supported
  algorithm with 256-bit encryption, and it can be selected to be used
  with 128-bit encryption as well. However, there is no reason to use
  128-bit encryption with AES. If you are going to use AES, just use
  256-bit encryption instead. The security of a 256-bit AES-encrypted
  PDF file with a strong password is comparable to using a
  general-purpose encryption tool like :command:`gpg` or
  :command:`openssl` to encrypt the PDF file with the same password,
  but the advantage of using PDF encryption is that no software is
  required beyond a regular PDF viewer.
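
To put the key lengths above in perspective, here is a small arithmetic
sketch of the brute-force search space for each one. The guessing rate
is a made-up figure chosen only for illustration, and the calculation
covers exhaustive key search only; it says nothing about the separate
weaknesses of RC4.

```python
# Number of possible keys for a given key length in bits.
def keyspace(bits):
    return 2 ** bits

# Hypothetical attacker speed (an assumption, not a measurement).
GUESSES_PER_SECOND = 10 ** 9

def worst_case_seconds(bits, rate=GUESSES_PER_SECOND):
    """Time to try every key of the given length at the assumed rate."""
    return keyspace(bits) / rate

for bits in (40, 128, 256):
    print(bits, keyspace(bits), worst_case_seconds(bits))
```

At the assumed rate, the whole 40-bit key space is exhausted in roughly
18 minutes, while the 128-bit and 256-bit spaces are far beyond any
exhaustive search.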

PDF Encryption Details
----------------------

This section describes a few details about PDF encryption. It does not
describe all the details. For that, read the PDF specification. The
details presented here, however, should go a long way toward helping a
casual user/developer understand what's going on with encrypted PDF
files.

Here are more concepts to understand.

Algorithm parameters ``V`` and ``R``
  There are two parameters that control the details of encryption
  using the standard security handler: ``V`` and ``R``.

  ``V`` is a code specifying the algorithms that are used for
  encrypting the file, handling keys, etc. It may have any of the
  following values:

  .. list-table:: Encryption Algorithms: ``V``
     :widths: 10 80
     :header-rows: 1

     - - V
       - Meaning

     - - 1
       - The original algorithm, which encrypted files using 40-bit keys.

     - - 2
       - An extension of the original algorithm allowing longer keys.
         Introduced in PDF 1.4.

     - - 3
       - An unpublished algorithm that permits file encryption key
         lengths ranging from 40 to 128 bits. Introduced in PDF 1.4.
         qpdf is believed to be able to read files with ``V`` = 3 but
         does not write such files.

     - - 4
       - An extension of the algorithm that allows it to be
         parameterized by additional rules for handling strings and
         streams. Introduced in PDF 1.5.

     - - 5
       - An algorithm that allows specification of separate security
         handlers for strings and streams as well as embedded files,
         and which supports 256-bit keys. Introduced in PDF 1.7
         extension level 3 and later extended in extension level 8.
         This is the encryption system in the PDF 2.0 specification,
         ISO 32000-2.

  ``R`` is a code specifying the revision of the standard handler. It
  is tightly coupled with the value of ``V``. ``R`` may have any of
  the following values:

  .. list-table:: Relationship between ``R`` and ``V``
     :widths: 10 80
     :header-rows: 1

     - - R
       - Expected V

     - - 2
       - ``V`` must be 1

     - - 3
       - ``V`` must be 2 or 3

     - - 4
       - ``V`` must be 4

     - - 5
       - ``V`` must be 5; this extension was never fully specified and
         existed for a short time in some versions of Acrobat.
         :command:`qpdf` is able to read and write this format, but it
         should not be used for any purpose other than testing
         compatibility with the format.

     - - 6
       - ``V`` must be 5. This is the only value that is not
         deprecated in the PDF 2.0 specification, ISO 32000-2.

Encryption Dictionary
  Encrypted PDF files have an encryption dictionary. There are several
  fields, but these are the important ones for our purposes:

  - ``V`` and ``R`` as described above

  - ``O``, ``U``, ``OE``, ``UE``: values used by the algorithms that
    recover the encryption key from the user and owner password. Which
    of these are defined and how they are used vary based on the value
    of ``R``.

  - ``P``: a bit field that describes which restrictions are in place.
    This is discussed below in :ref:`security-restrictions`.

Encryption Algorithms
  PDF files may be encrypted with the obsolete, insecure RC4 algorithm
  or the more secure AES algorithm. See also :ref:`weak-crypto` for a
  discussion. 40-bit encryption always uses RC4. 128-bit can use
  either RC4 (the default for compatibility reasons) or, starting with
  PDF 1.6, AES. 256-bit encryption always uses AES.
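
The coupling between ``R`` and ``V`` described above can be checked
mechanically. The following is a minimal sketch that transcribes the
table into a lookup; the names ``EXPECTED_V`` and ``rv_consistent`` are
invented for this example and are not part of qpdf.

```python
# Allowed values of V for each R, transcribed from the table above.
EXPECTED_V = {
    2: {1},
    3: {2, 3},
    4: {4},
    5: {5},
    6: {5},
}

def rv_consistent(r, v):
    """Return True if the (R, V) pair is one the table allows."""
    return v in EXPECTED_V.get(r, set())

print(rv_consistent(6, 5))  # a 256-bit file using the current revision
print(rv_consistent(2, 2))  # R = 2 requires V = 1
```

An unknown ``R`` is reported as inconsistent rather than raising an
error, which seemed the simplest choice for a sketch.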

.. _security-restrictions:

PDF Security Restrictions
-------------------------

PDF security restrictions are described by a bit field whose value is
stored in the ``P`` field in the encryption dictionary. The value of
``P`` is used by the algorithms to recover the encryption key given
the password, which makes the value of ``P`` tamper-resistant.

``P`` is a 32-bit integer, treated as a signed two's complement number.
A 1 in any bit position means the permission is granted. The PDF
specification numbers the bits from 1 (least significant bit) to 32
(most significant bit) rather than the more customary 0 to 31. For
consistency with the spec, the remainder of this section uses the
1-based numbering.

Only bits 3, 4, 5, 6, 9, 10, 11, and 12 are used. All other bits are
set to 1, except bits 1 and 2, which are set to 0. Since bit 32 is
always set to 1, the value of ``P`` is
always a negative number. (:command:`qpdf` recognizes a positive
number on behalf of buggy writers that treat ``P`` as unsigned. Such
files have been seen in the wild.)
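
The signed interpretation and the 1-based bit numbering can be
illustrated with a short sketch. The helper names are invented for this
example; this is one way to normalize a value written by a tool that
treated ``P`` as unsigned, not a description of qpdf's internals.

```python
def to_signed32(p):
    """Interpret the low 32 bits of p as a signed two's complement value."""
    p &= 0xFFFFFFFF
    return p - 0x100000000 if p & 0x80000000 else p

def bit_is_set(p, bit):
    """Test a bit using the spec's numbering: 1 is least significant, 32 most."""
    return ((p >> (bit - 1)) & 1) == 1

# All bits set except bits 1 and 2, written by an unsigned-minded tool.
unsigned_p = 0xFFFFFFFC
print(to_signed32(unsigned_p))
print(bit_is_set(unsigned_p, 32), bit_is_set(unsigned_p, 1))
```

Both the unsigned value 4294967292 and the signed value -4 normalize to
-4, so a reader that applies this step treats the two files alike.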

Here are the meanings of the bit positions. All bits not listed must
have the value 1 except bits 1 and 2, which must have the value 0.
However, the values of bits other than those in the table are ignored,
so having incorrect values probably doesn't break anything in most
cases. A value of 1 indicates that the permission is granted.

.. list-table:: ``P`` Bit Values
   :widths: 10 80
   :header-rows: 1

   - - Bit
     - Meaning

   - - 3
     - for ``R`` = 2, printing; for ``R`` ≥ 3, printing at low
       resolution

   - - 4
     - modifying the document except as controlled by bits 6,
       9, and 11

   - - 5
     - extracting text and graphics for purposes other than
       accessibility to visually impaired users

   - - 6
     - add or modify annotations, fill in interactive form fields;
       if bit 4 is also set, create or modify interactive form fields

   - - 9
     - for ``R`` ≥ 3, fill in interactive form fields even if bit 6 is
       clear

   - - 10
     - not used; formerly granted permission to extract material for
       accessibility, but the specification now disallows restriction of
       accessibility, and conforming readers are to treat this bit as if
       it is set regardless of its value

   - - 11
     - for ``R`` ≥ 3, assemble document including inserting, rotating,
       or deleting pages or creating document outlines or thumbnail
       images

   - - 12
     - for ``R`` ≥ 3, allow printing at full resolution
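
As an illustration of the table above, the following sketch lists which
of the defined bits are set in a given ``P`` value. The short labels and
the function name are invented for this example, and the labels
paraphrase the table for ``R`` ≥ 3.

```python
# Short labels for the bits defined in the table above (R >= 3 wording).
PERMISSION_BITS = {
    3: "print at low resolution",
    4: "modify other",
    5: "extract text and graphics",
    6: "annotate",
    9: "fill in form fields",
    10: "accessibility (no longer used)",
    11: "assemble document",
    12: "print at full resolution",
}

def granted_bits(p):
    """Return the defined bit numbers (1-based) that are set in p."""
    return [bit for bit in sorted(PERMISSION_BITS) if (p >> (bit - 1)) & 1]

for bit in granted_bits(-3376):
    print(bit, PERMISSION_BITS[bit])
```

The value -3376 has bits 3, 4, 6, 9, 11, and 12 cleared, so only bits 5
and 10 are reported. The function works on either the signed or the
unsigned form of ``P`` because it only inspects the low 12 bits.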

.. _qpdf-P:

How qpdf handles security restrictions
--------------------------------------

This section describes exactly what the qpdf library does with regard
to ``P`` based on the various settings of different security options.

- Start with all bits set except bits 1 and 2, which are cleared

- Clear bits as described in the table below:

  .. list-table:: Command-line Arguments and ``P`` Bit Values
     :widths: 20 25 45
     :header-rows: 1

     - - R
       - Argument
       - Bits Cleared

     - - R = 2
       - ``--print=n``
       - 3

     - - R = 2
       - ``--modify=n``
       - 4

     - - R = 2
       - ``--extract=n``
       - 5

     - - R = 2
       - ``--annotate=n``
       - 6

     - - R = 3
       - ``--accessibility=n``
       - 10

     - - R ≥ 4
       - ``--accessibility=n``
       - ignored

     - - R ≥ 3
       - ``--extract=n``
       - 5

     - - R ≥ 3
       - ``--print=none``
       - 3, 12

     - - R ≥ 3
       - ``--print=low``
       - 12

     - - R ≥ 3
       - ``--modify=none``
       - 4, 6, 9, 11

     - - R ≥ 3
       - ``--modify=assembly``
       - 4, 6, 9

     - - R ≥ 3
       - ``--modify=form``
       - 4, 6

     - - R ≥ 3
       - ``--modify=annotate``
       - 4

     - - R ≥ 3
       - ``--assemble=n``
       - 11

     - - R ≥ 3
       - ``--annotate=n``
       - 6

     - - R ≥ 3
       - ``--form=n``
       - 9

     - - R ≥ 3
       - ``--modify-other=n``
       - 4

Options to :command:`qpdf`, both at the CLI and library level, allow
more granular clearing of permission bits than do most tools,
including Adobe Acrobat. As such, PDF viewers may respond in
surprising ways based on options passed to qpdf. If you observe this,
it is probably not because of a bug in qpdf.
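
The two steps above can be sketched in a few lines. This is an
illustration of the table for ``R`` ≥ 3 only, not qpdf's actual code:
the option strings are used purely as dictionary keys, the
``--accessibility`` rows are left out, and ``compute_p`` is a name
invented for this example.

```python
# Bits cleared by each option for R >= 3, transcribed from the table above.
CLEARED_BITS = {
    "--extract=n": (5,),
    "--print=none": (3, 12),
    "--print=low": (12,),
    "--modify=none": (4, 6, 9, 11),
    "--modify=assembly": (4, 6, 9),
    "--modify=form": (4, 6),
    "--modify=annotate": (4,),
    "--assemble=n": (11,),
    "--annotate=n": (6,),
    "--form=n": (9,),
    "--modify-other=n": (4,),
}

def compute_p(options):
    """Return P as a signed 32-bit value for the given option strings."""
    p = 0xFFFFFFFC  # start with all bits set except bits 1 and 2
    for option in options:
        for bit in CLEARED_BITS[option]:
            p &= ~(1 << (bit - 1))  # clear the 1-based bit
    # Bit 32 is never cleared, so the signed value is always negative.
    return (p & 0xFFFFFFFF) - 0x100000000

print(compute_p([]))
print(compute_p(["--print=none", "--modify=none"]))
```

With no options the result is -4. Passing ``--print=none`` and
``--modify=none`` clears bits 3, 4, 6, 9, 11, and 12, which gives -3376.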

.. _pdf-passwords:

User and Owner Passwords
------------------------

When you use qpdf to show encryption parameters and you open a file
with the owner password, sometimes qpdf reveals the user password, and
sometimes it doesn't. Here's why.

For ``V`` < 5, the user password is actually stored in the PDF file
encrypted with a key that is derived from the owner password, and the
main encryption key is encrypted using a key derived from the user
password. When you open a PDF file, the reader first tries to treat
the given password as the user password, using it to recover the
encryption key. If that works, you're in with restrictions (assuming
the reader chooses to enforce them). If it doesn't work, then the
reader treats the password as the owner password, using it to recover
the user password, and then uses the user password to retrieve the
encryption key. This is why creating a file with the same user
password and owner password with ``V`` < 5 results in a file that some
readers will never allow you to open as the owner. When an empty owner
password is given at file creation, the user password is used as both
the user and owner password. Typically when a reader encounters a file
with ``V`` < 5, it will first attempt to treat the empty string as a
user password. If that works, the file is encrypted but not
password-protected. If it doesn't work, then a password prompt is
given.
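
The trial order for ``V`` < 5 can be modeled with a toy example. This
sketch contains no cryptography: the class below keeps both passwords in
plain text as a stand-in for the derived values a real file stores, and
every name in it is invented for illustration. It models a reader that
stops as soon as the user-password attempt succeeds.

```python
from dataclasses import dataclass

@dataclass
class ToyFileV4:
    """Stand-in for a V < 5 file; real files store derived values instead."""
    user_password: str
    owner_password: str

def make_file(user, owner):
    # An empty owner password means the user password serves as both.
    return ToyFileV4(user, owner or user)

def open_file(f, password):
    """Return 'user', 'owner', or None, following the order in the text."""
    # Step 1: try the password as the user password.
    if password == f.user_password:
        return "user"
    # Step 2: try it as the owner password. In a real file this recovers
    # the user password, which is then used to retrieve the key.
    if password == f.owner_password:
        return "owner"
    return None

print(open_file(make_file("same", "same"), "same"))
print(open_file(make_file("", "secret"), ""))
```

With identical passwords the file always opens as ``user``, which is the
behavior described above. With an empty user password, the empty string
opens the file without a prompt.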

For ``V`` ≥ 5, the main encryption key is independently encrypted
using the user password and the owner password. There is no way to
recover the user password from the owner password. Restrictions are
imposed or not depending on which password was used. In this case, the
password supplied, if any, is tried both as the user password and the
owner password, and whichever works is used. Typically the password is
tried as the owner password first. (This is what the PDF specification
says to do.) As such, specifying a user password and leaving the owner
password blank results in a file that is opened as owner with no
password, effectively rendering the security restrictions useless.
This is why :command:`qpdf` requires you to pass
:qpdf:ref:`--allow-insecure` to create a file with an empty owner
password when 256-bit encryption is in use.
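
The consequence of trying the owner password first can be seen in a toy
model. As before, this sketch does no cryptography and compares plain
strings in place of the independent key recovery a real reader performs;
``open_v5`` is a hypothetical name.

```python
def open_v5(user_password, owner_password, supplied=""):
    """Return 'owner', 'user', or None for a toy V >= 5 file."""
    # The supplied password is tried as the owner password first.
    if supplied == owner_password:
        return "owner"
    # Then it is tried as the user password.
    if supplied == user_password:
        return "user"
    return None

# User password set, owner password left blank: no password is needed,
# and the file opens with owner privileges.
print(open_v5("u", "", supplied=""))

# Empty user password with a secret owner password: opens as user.
print(open_v5("", "secret", supplied=""))
```

The first call returns ``owner`` even though a user password was set,
which is the situation :qpdf:ref:`--allow-insecure` exists to flag.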