From e7b8f297ba92f4cadf88efcb394830dc24d54738 Mon Sep 17 00:00:00 2001
From: Jay Berkenbilt <ejb@ql.org>
Date: Wed, 11 Jul 2012 15:29:41 -0400
Subject: Support copying objects from another QPDF object

This includes QPDF::copyForeignObject and supporting foreign objects
as arguments to addPage*.
---
 TODO | 131 +++++++++++++++++++------------------------------------------------
 1 file changed, 37 insertions(+), 94 deletions(-)

(limited to 'TODO')
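
Reviewer note (not part of the commit): the --pages procedure described
in the TODO hunk below can be sketched as follows. This is a minimal
illustration under stated assumptions, not qpdf code. A document is
modeled as a plain list of page labels, and Doc, PageRange, makeDoc,
selectPages, and assemblePages are hypothetical names that do not exist
in the qpdf API. It only shows the order of operations: trim the first
file to its range, then append copies of the requested pages from the
second file.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a PDF: an ordered list of page labels.
using Doc = std::vector<std::string>;

// Inclusive, 1-based page range, such as "1-5" on the command line.
struct PageRange
{
    std::size_t first;
    std::size_t last;
};

// Build a fake document whose pages are labeled "name:1" .. "name:n".
Doc makeDoc(std::string const& name, std::size_t n)
{
    Doc doc;
    for (std::size_t i = 1; i <= n; ++i)
    {
        doc.push_back(name + ":" + std::to_string(i));
    }
    return doc;
}

// Return the pages of doc selected by range, in document order.
Doc selectPages(Doc const& doc, PageRange range)
{
    if (range.first < 1 || range.first > range.last ||
        range.last > doc.size())
    {
        throw std::out_of_range("page range outside document");
    }
    auto begin = doc.begin() + static_cast<std::ptrdiff_t>(range.first - 1);
    auto end = doc.begin() + static_cast<std::ptrdiff_t>(range.last);
    return Doc(begin, end);
}

// Keep only `keep` from the primary document, then append copies of
// `take` from the other document.
Doc assemblePages(Doc primary, PageRange keep,
                  Doc const& other, PageRange take)
{
    // Step 1: remove all pages from the primary file except the kept range.
    primary = selectPages(primary, keep);
    // Step 2: copy the requested pages from the second file.
    Doc copied = selectPages(other, take);
    // Step 3: add the copied pages after the kept ones.
    primary.insert(primary.end(), copied.begin(), copied.end());
    assert(primary.size() ==
           (keep.last - keep.first + 1) + (take.last - take.first + 1));
    return primary;
}
```

With two 20-page inputs, assemblePages(file1, {1, 5}, file2, {11, 15})
yields ten pages: file1 pages 1 through 5 followed by file2 pages 11
through 15. Everything the TODO lists as preserved from the first file
(trailer, info, encryption, outlines) is outside this model.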

diff --git a/TODO b/TODO
index 0f408351..b29559ee 100644
--- a/TODO
+++ b/TODO
@@ -28,76 +28,54 @@ Next
    can only be used by one thread at a time, but multiple threads can
    simultaneously use separate objects.
 
+ * Write some documentation about the design of copyForeignObject.
 
-Soon
-====
+ * copyForeignObject still to do:
 
- * Provide an option to copy encryption parameters from another file.
-   This would make it possible to decrypt a file, manually work with
-   it, and then re-encrypt it using the original encryption parameters
-   including a possibly unknown owner password.
+    - qpdf command
 
- * See if I can support the new encryption formats mentioned in the
-   open bug on sourceforge.  Check other sourceforge bugs.
+      Command line could be something like
 
- * Splitting/merging concepts
+      --pages [ --new ] { file [password] numeric-range ... } ... --
 
-   newPDF() could create a PDF with just a trailer, no pages, and a
-   minimal info.  Then the page routines could be used to add pages to
-   it.
+      The first file referenced would be the one whose other data would
+      be preserved (like trailer, info, encryption, outlines, etc.).
+      --new as first file would just use an empty file as the starting
+      point.  Be explicit about whether outlines, etc., are handled.
+      They are not handled initially.
 
-   Starting with any pdf, you should be able to copy objects from
-   another pdf.  The copy should be smart about never traversing into
-   a /Page or /Pages.
+      Example: to grab pages 1-5 from file1 and 11-15 from file2
 
-   We could provide a method of copying objects from one PDF into
-   another.  This would do whatever optimization is necessary (maybe
-   just optimizePagesTree) and then traverse the set of objects
-   specified to find all objects referenced by the set.  Each of those
-   would be copied over with a table mapping old ID to new ID.  This
-   would be done from bottom up most likely disallowing cycles or
-   handling them sanely.
+      --pages file1.pdf 1-5 file2.pdf 11-15 --
 
-   Command line could be something like
+      To implement this, we would remove all pages from file1 except
+      pages 1 through 5.  Then we would take pages 11 through 15 from
+      file2, copy them to the file, and add them as pages.
 
-   --pages [ --new ] { file [password] numeric-range ... } ... --
+    - document that makeIndirectObject doesn't handle foreign objects
+      automatically because copying a foreign object is a big enough
+      deal that it should be explicit.  However addPage* does handle
+      foreign page objects automatically.
 
-   The first file referenced would be the one whose other data would
-   be preserved (like trailer, info, encryption, outlines, etc.).
-   --new as first file would just use an empty file as the starting
-   point.
+    - Test /Outlines and see whether there's any point in handling
+      them in the API.  Maybe just copying them over works.  What
+      about command line tool?  Also think about page labels.
 
-   Example: to grab pages 1-5 from file1 and 11-15 from file2
+    - Tests through qpdf command line: copy pages from multiple PDFs
+      starting with one PDF and also starting with empty.
 
-   --pages file1.pdf 1-5 file2.pdf 11-15 --
+ * (Hopefully) Provide an option to copy encryption parameters from
+   another file.  This would make it possible to decrypt a file,
+   manually work with it, and then re-encrypt it using the original
+   encryption parameters including a possibly unknown owner password.
 
-   To implement this, we would remove all pages from file1 except
-   pages 1 through 5.  Then we would take pages 11 through 15 from
-   file2 and add them to a set for transfer.  This would end up
-   generating a list of indirect objects.  We would copy those objects
-   shallowly to the new PDF keeping track of the mapping and replacing
-   any indirect object keys as appropriate, much like QPDFWriter does.
 
-   When all the objects are registered, we would add those pages to
-   the result.
-
-   This approach could work for both splitting and merging.  It's
-   possible it could be implemented now without any new APIs, but most
-   of the work should be doable by the library with only a small set
-   of additions.
+Soon
+====
 
-   newPDF()
-   QPDFObjectCopier c(qpdf1, qpdf2)
-   QPDFObjectHandle obj = c.copyObject(<object from qpdf1>)
-     Without traversing pages, copies all indirect objects referenced
-     by <object from qpdf1> preserving referential integrity and
-     returns an object handle in qpdf2 of the same object.  If called
-     multiple times on the same object, retraverses in case there were
-     changes.
+ * See if I can support the new encryption formats mentioned in the
+   open bug on sourceforge.  Check other sourceforge bugs.
 
-   QPDFObjectHandle obj = c.getMapping(<object from qpdf1>)
-     find the object in qpdf2 corresponding to the object from qpdf1.
-     Return the null object if none.
 
 General
 =======
@@ -110,23 +88,11 @@ General
  * Update qpdf docs about non-ascii passwords.  See thread from
    2010-12-07,08 for details.
 
- * Look at page splitting.  Subramanyam provided a test file; see
-   ../misc/article-threads.pdf.  Email Q-Count: 431864 from
-   2009-11-03.  See also "Splitting by Pages" below.
-
- * Consider writing a PDF merge utility.  With 2.2, it would be
-   possible to have a StreamDataProvider that would allow stream data
-   to be directly copied from one PDF file to another.  One possible
-   strategy would be to have a program that adds all the pages of one
-   file to the end of another file.  The basic
-   strategy would be to create a table that adds new streams to the
-   original file, mapping the new streams' obj/gen to a stream in the
-   file whose pages are being appended.  The StreamDataProvider, when
-   asked, could simply pipe the streams of the file being appended to
-   the provided pipeline and could copy the filter and decode
-   parameters from the original file.  Being able to do this requires
-   a lot of the same logic as being able to do splitting, so a general
-   split/merge program would be a great addition.
+ * Consider impact of article threads on page splitting/merging.
+   Subramanyam provided a test file; see ../misc/article-threads.pdf.
+   Email Q-Count: 431864 from 2009-11-03.  Other things to consider:
+   outlines, page labels, thumbnails, zones.  There are probably
+   others.
 
  * See whether it's possible to remove the call to
    flattenScalarReferences.  I can't easily figure out why I do it,
@@ -279,26 +245,3 @@ Index: QPDFWriter.cc
 
  * From a suggestion in bug 3152169, consisder having an option to
    re-encode inline images with an ASCII encoding.
-
-
-Splitting by Pages
-==================
-
-Although qpdf does not currently support splitting a file into pages,
-the work done for linearization covers almost all the work.  To do
-page splitting.  If this functionality is needed, study
-obj_user_to_objects and object_to_obj_users created in
-QPDF_optimization for ideas.  It's quite possible that the information
-computed by calculateLinearizationData is actually sufficient to do
-page splitting in many circumstances.  That code knows which objects
-are used by which pages, though it doesn't do anything page-specific
-with outlines, thumbnails, page labels, or anything else.
-
-Another approach would be to traverse only pages that are being output
-taking care not to traverse into the pages tree, and then to fabricate
-a new pages tree.
-
-Either way, care must be taken to handle other things such as
-outlines, page labels, thumbnails, threads, zones, etc. in a sensible
-way.  This may include simply omitting information other than page
-content.
-- 
cgit v1.2.3-70-g09d2