Sometimes somebody sends you a PDF with big white margins that do nothing but annoy you. This often happens when you receive PowerPoint presentations in PDF format, and the margins make it almost impossible to print several slides on one page.

[Figure: PDF Before Conversion]

[Figure: PDF After Conversion]


To avoid this, you can crop the PDF pages to remove those margins.

Getting the bounding box

First run pdftoppm to render the first slide as an image, then open it with GIMP.

felipe@funstation $ pdftoppm -f 1 -l 1 -png clase_1_introduccion.pdf test
felipe@funstation $ gimp test-01.png

Now put the cursor on the top-left corner of the area you want to keep. Look at the bottom of the GIMP window and write down the coordinates shown there. Then do the same with the bottom-right corner. From those numbers you get the (X,Y) coordinates of the top-left corner and, by subtraction, the width and height of the box.

In my case I get:

[Figure: Top Left Coordinates of PDF]

[Figure: Bottom Right Coordinates of PDF]


   top-left-X = 261
   top-left-Y = 198

   bottom-right-X = 1014
   bottom-right-Y = 1449
   width  = bottom-right-X - top-left-X = 1014 - 261 = 753
   height = bottom-right-Y - top-left-Y = 1449 - 198 = 1251
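
The subtraction above can also be left to the shell. This is only a small sketch: crop_box is a hypothetical helper name (it is not part of pdftoppm), and it assumes the four numbers are the pixel coordinates read from GIMP.

```shell
#!/bin/sh
# crop_box X1 Y1 X2 Y2
# Hypothetical helper: takes the top-left (X1,Y1) and bottom-right (X2,Y2)
# corners and prints "X Y WIDTH HEIGHT" for the crop box.
crop_box() {
    x1=$1; y1=$2; x2=$3; y2=$4
    echo "$x1 $y1 $((x2 - x1)) $((y2 - y1))"
}

# Coordinates from the example above
crop_box 261 198 1014 1449
# prints: 261 198 753 1251
```

The four printed values are the ones passed to the -x, -y, -W and -H options of pdftoppm.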

Cropping and saving the slides as images (png)

felipe@funstation $ pdftoppm -x 261 -W 753 -y 198 -H 1251 -png clase_1_introduccion.pdf clase1_temp

If the images don't look nice (low resolution; check this with a viewer that does not anti-alias), you can try increasing the resolution from 150dpi (the default value of pdftoppm) to 600dpi.

The ratio between 600 and 150 is 4:1, so you need to do some complex maths and multiply every value by 4:
    new-top-left-X = top-left-X * 4 = 261 * 4 = 1044
    new-top-left-Y = top-left-Y * 4 = 198 * 4 = 792
    new-width  = width  * 4 =   753 * 4 = 3012
    new-height = height * 4 = 1251 * 4 = 5004

So the new command line is:

felipe@funstation $ pdftoppm -r 600 -x 1044 -W 3012 -y 792 -H 5004 -png clase_1_introduccion.pdf clase1_temp
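
The same scaling works for any resolution, not just 600dpi. A minimal sketch, assuming the coordinates were measured on an image rendered at 150dpi; scale_box is a hypothetical helper name, and the integer division truncates when the new resolution is not a multiple of 150:

```shell
#!/bin/sh
# scale_box NEW_DPI X Y W H
# Hypothetical helper: rescales a crop box measured at 150dpi and prints
# the matching pdftoppm options for NEW_DPI.
scale_box() {
    dpi=$1; x=$2; y=$3; w=$4; h=$5
    base=150   # assumption: resolution at which the box was measured
    echo "-r $dpi -x $((x * dpi / base)) -W $((w * dpi / base)) -y $((y * dpi / base)) -H $((h * dpi / base))"
}

scale_box 600 261 198 753 1251
# prints: -r 600 -x 1044 -W 3012 -y 792 -H 5004

scale_box 300 261 198 753 1251
# prints: -r 300 -x 522 -W 1506 -y 396 -H 2502
```

The printed options go between pdftoppm and -png in the command line above.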

Merging images in a pdf

Using the convert command (ImageMagick) you can reassemble the images into a single PDF:

felipe@funstation $ convert clase1_temp*.png clase_1_introduccion_cropped.pdf

Deleting temporary files

felipe@funstation $ rm -f test-01.png clase1_temp*.png
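
If you crop PDFs often, the crop, merge and delete steps can be generated from a single call. This is just a sketch: print_crop_commands is a hypothetical helper, the _temp and _cropped names are assumptions chosen for illustration, and it only prints the commands so you can review them before running anything. It also assumes a file name without spaces and a crop box measured at the resolution you pass in.

```shell
#!/bin/sh
# print_crop_commands INPUT.pdf X Y W H [DPI]
# Hypothetical helper: prints (does not run) the pdftoppm, convert and rm
# commands for one PDF. DPI defaults to 150, the default value of pdftoppm.
print_crop_commands() {
    input=$1; x=$2; y=$3; w=$4; h=$5; dpi=${6:-150}
    base=${input%.pdf}     # input name without the .pdf extension
    tmp="${base}_temp"     # assumed prefix for the temporary png files
    echo "pdftoppm -r $dpi -x $x -W $w -y $y -H $h -png $input $tmp"
    echo "convert $tmp*.png ${base}_cropped.pdf"
    echo "rm -f $tmp*.png"
}

print_crop_commands clase_1_introduccion.pdf 261 198 753 1251
# prints:
# pdftoppm -r 150 -x 261 -W 753 -y 198 -H 1251 -png clase_1_introduccion.pdf clase_1_introduccion_temp
# convert clase_1_introduccion_temp*.png clase_1_introduccion_cropped.pdf
# rm -f clase_1_introduccion_temp*.png
```

Once the printed commands look right, you can pipe the output to sh to run them.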

Notes

  • Using 600dpi instead of 150dpi increases the size of the temporary files a bit, and the size of the output PDF can grow from 3MB to 14MB. The big problem is that convert is going to eat huge amounts of RAM (it can be several GB), making your computer a bit unstable. Processing time in pdftoppm also increases significantly. I recommend using only 300dpi.
  • We use the -png flag to avoid getting the output files in PPM format. This saves hard drive space for both the temporary images and the final PDF (it really does generate a smaller PDF).

Example Files

TODO

  • Convert the PNG files to JPG before making the PDF and compare the change in quality vs. size