
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes every weekday Monday through Friday.
hpr1760 :: pdftk: the PDF Toolkit

Intro to the command-line pdf toolkit

Hosted by Jon Kulp on 2015-05-01 is flagged as Clean and is released under a CC-BY-SA license.
Tags: pdftk, pdf.
The show is available on the Internet Archive at: https://archive.org/details/hpr1760

Duration: 00:20:54

general.

Hacking Apart and Re-Assembling PDFs

Extract pages 3–5 from file foobar.pdf:

pdftk foobar.pdf cat 3-5 output excerpt.pdf

Same thing but also grab the cover page:

pdftk foobar.pdf cat 1 3-5 output excerpt.pdf

Combine multiple PDFs:

pdftk file1.pdf file2.pdf file3.pdf cat output combined.pdf

Reassemble a 50-page document with all of the pages in reverse order. I once actually did this for my wife and she was very grateful: she had scanned an article at the library and it ended up with all of the pages in the wrong order, from last to first. This command solved her problem in about one second:

pdftk wrongorder.pdf cat 50-1 output rightorder.pdf
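If you don't know the page count ahead of time, pdftk accepts the keyword "end" for the last page, so the same reversal can be written without the number 50. Here is a minimal sketch; the function name reverse_pdf is just a hypothetical wrapper for illustration, and the file names are placeholders.

```shell
# Reverse the page order of a PDF without hard-coding the page count.
# "end" is pdftk's keyword for the last page, so "end-1" runs from
# the last page back to the first.
reverse_pdf() {
    src="$1"
    dst="$2"
    pdftk "$src" cat end-1 output "$dst"
}

# Usage:
#   reverse_pdf wrongorder.pdf rightorder.pdf
```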

Check the pdftk man page for all kinds of other manipulations you can do, including "bursting" a PDF into its component pages, rotating pages in any direction, applying password protection, etc.
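To give a rough idea of what those look like, here are sketches of the three operations just mentioned. The function names are hypothetical wrappers, not part of pdftk, and the rotation keyword "east" assumes a reasonably recent pdftk (older releases may spell rotations differently, so check your man page).

```shell
# Split a PDF into one file per page. With no output pattern given,
# pdftk names the pages pg_0001.pdf, pg_0002.pdf, and so on.
burst_pdf() {
    pdftk "$1" burst
}

# Rotate every page 90 degrees clockwise ("east").
rotate_clockwise() {
    pdftk "$1" cat 1-endeast output "$2"
}

# Write a copy that needs a password ($3) to open.
protect_pdf() {
    pdftk "$1" output "$2" user_pw "$3"
}

# Usage:
#   burst_pdf foobar.pdf
#   rotate_clockwise sideways.pdf upright.pdf
#   protect_pdf foobar.pdf foobar_locked.pdf mysecret
```

A password typed on the command line ends up in your shell history, so for anything sensitive you may prefer to pass PROMPT in place of the password and let pdftk ask for it.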

Embedding “Bookmarks” as a Table of Contents

You can also use pdftk to embed a table of contents in a flat PDF file. This is incredibly useful, as it can make large, unwieldy files very easy to navigate. All you have to do is add some bookmark data in a fairly straightforward format as shown below. As a starting point, dump the current metadata of the file with this command:

pdftk foobar.pdf dump_data_utf8

Save the contents of this data dump in a text file and then add bookmark information just below the NumberOfPages value. Here is an excerpt from the huge anthology of public-domain scores I assembled for my music history class:

InfoBegin
InfoKey: ModDate
InfoValue: D:20150106100000-06'00'
InfoBegin
InfoKey: CreationDate
InfoValue: D:20150106100000-06'00'
InfoBegin
InfoKey: Creator
InfoValue: pdftk 2.02 - www.pdftk.com
InfoBegin
InfoKey: Producer
InfoValue: itext-paulo-155 (itextpdf.sf.net-lowagie.com)
PdfID0: ece858bf9affbcad3b575cf3891a187f
PdfID1: 23f89459e103dd43c6e7bc92028245c0
NumberOfPages: 765
BookmarkBegin
BookmarkTitle: Beethoven: Symphony no. 5 in C minor Op. 67
BookmarkLevel: 1
BookmarkPageNumber: 205
BookmarkBegin
BookmarkTitle: Beethoven 5: I. Allegro con brio
BookmarkLevel: 2
BookmarkPageNumber: 205
BookmarkBegin
BookmarkTitle: Beethoven 5: II. Andante con moto
BookmarkLevel: 2
BookmarkPageNumber: 235
BookmarkBegin
BookmarkTitle: Beethoven 5: III. Allegro
BookmarkLevel: 2
BookmarkPageNumber: 256
BookmarkBegin
BookmarkTitle: Beethoven 5: IV. Allegro
BookmarkLevel: 2
BookmarkPageNumber: 275
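Typing these four-line records by hand gets tedious for a long anthology. One possible shortcut is to keep the table of contents as a tab-separated list and let awk expand it. This is only a sketch: the function name make_bookmarks and the column order (level, page, title) are my own assumptions, not anything pdftk requires.

```shell
# Expand "level<TAB>page<TAB>title" lines into pdftk bookmark records.
# Reads from the named files, or from standard input if none are given.
make_bookmarks() {
    awk -F '\t' 'NF >= 3 {
        print "BookmarkBegin"
        print "BookmarkTitle: " $3
        print "BookmarkLevel: " $1
        print "BookmarkPageNumber: " $2
    }' "$@"
}

# Usage:
#   printf '2\t235\tBeethoven 5: II. Andante con moto\n' | make_bookmarks
```

Lines with fewer than three fields are skipped, so blank lines in the list are harmless.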

And here is the command to update the PDF with the table of contents embedded. This tells it to take the input file foobar.pdf and update its metadata using the file foobar.info (with utf8 encoding) and output the results as foobar_with_toc.pdf.

pdftk foobar.pdf update_info_utf8 foobar.info output foobar_with_toc.pdf
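If you would rather not paste the bookmark lines in by hand, the splice can be scripted. The sketch below assumes you have saved the data dump in one file and your bookmark records in another; the function name insert_bookmarks and the file names foobar.dump and bookmarks.txt are hypothetical.

```shell
# Print a pdftk data dump with the contents of a bookmark file
# inserted directly after the NumberOfPages line.
insert_bookmarks() {
    dump="$1"
    marks="$2"
    awk -v marks="$marks" '
        { print }
        /^NumberOfPages:/ {
            # Copy the bookmark file through, line by line.
            while ((getline line < marks) > 0) print line
            close(marks)
        }
    ' "$dump"
}

# Usage, start to finish:
#   pdftk foobar.pdf dump_data_utf8 > foobar.dump
#   insert_bookmarks foobar.dump bookmarks.txt > foobar.info
#   pdftk foobar.pdf update_info_utf8 foobar.info output foobar_with_toc.pdf
```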

Update

I made a screencast as a follow-up, showing the process of embedding bookmarks to make a table of contents: https://m.youtube.com/watch?v=5dv_02v0zzc

