Sunday, November 1, 2009

Creating PDFs from virtually anything with Linux

Recently my wife asked me whether I could join several web pages together into one file for printing and storage. I thought it would take a very long time, but it turned out to be quite easy. Here's a little how-to:

  • Print each page/image/document to a file (I usually give the files auto-incrementing names, e.g. 1, 2, 3, 4, ...)
    • this saves the page as a PostScript file (it even works in Windows)
  • Join all of the PS files together into one big PS file
    • you do this with psmerge, from the psutils package
      • psmerge -oOutputFileName file1 file2 file3
        • you can list as many files as you want, in the order they should appear
        • there is NO space between the -o option and the file name
  • Turn the PS file into a PDF file
    • use the ps2pdf utility (it comes with Ghostscript)
      • ps2pdf psfile.ps pdffile.pdf
This is pretty straightforward, but it can get quite tedious if you have MANY, MANY files to do at once.
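If you'd rather not type the merge and convert commands by hand, the two command-line steps above can be scripted. This is a minimal sketch, assuming psmerge and ps2pdf are installed and on your PATH; the helper names build_commands and run_commands are made up for this example. Building each command as a list of arguments (rather than one long string) means file names with spaces in them don't get split apart.

```python
import subprocess


def build_commands(ps_files, merged_ps, pdf_file):
    """Return the psmerge and ps2pdf command lines as argument lists.

    ps_files:  PostScript files in the order the pages should appear
    merged_ps: name of the combined PostScript file that psmerge writes
    pdf_file:  name of the PDF file that ps2pdf writes
    """
    # psmerge wants the output name glued onto -o, with no space in between
    merge_cmd = ['psmerge', '-o' + merged_ps] + list(ps_files)
    # ps2pdf takes the input PS file followed by the output PDF file
    convert_cmd = ['ps2pdf', merged_ps, pdf_file]
    return [merge_cmd, convert_cmd]


def run_commands(commands):
    """Run each command in turn; stop with an error if one of them fails."""
    for cmd in commands:
        print(' '.join(cmd))
        subprocess.check_call(cmd)
```

For example, run_commands(build_commands(['1', '2', '3'], 'combined.ps', 'combined.pdf')) runs psmerge -ocombined.ps 1 2 3 followed by ps2pdf combined.ps combined.pdf.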

Below is some Python code that makes this process easier. It copies the files to zero-padded names (so they sort in the right order) and then builds and runs the psmerge command, which can get quite long.

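#!/usr/bin/env python3
import os
import shutil
import subprocess

# Folder holding the printed PS files (named 1, 2, 3, ...)
path = '/home/paul/printer/dadsWorksheets/multiSpaceShip/psFiles/'
# The zero-padded copies go here
renamed_dir = os.path.join(path, 'renamed')

rename = True   # set to False to skip the copy/rename step
merge = True    # set to False to skip the merge step

if rename:
    if not os.path.isdir(renamed_dir):
        os.makedirs(renamed_dir)
    for name in sorted(os.listdir(path)):
        src = os.path.join(path, name)
        if not os.path.isfile(src):
            continue  # skip sub-folders such as 'renamed' itself
        # Pad to 3 digits (1 -> 001, 10 -> 010) so '10' no longer sorts before '2'
        new_name = name.zfill(3)
        print(new_name)
        shutil.copy(src, os.path.join(renamed_dir, new_name))

if merge:
    # Thanks to the padding, a plain sort now gives page order;
    # leave out the output of any earlier run
    files = sorted(f for f in os.listdir(renamed_dir) if f != 'Combined')
    # No space between -o and the output file name
    com = ['psmerge', '-oCombined'] + files
    print(' '.join(com))
    # Run inside renamed_dir so the bare file names resolve
    subprocess.call(com, cwd=renamed_dir)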
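The renaming step exists because psmerge joins files in exactly the order you list them, and a plain sort compares names character by character, so '10' comes before '2'. The sketch below shows the problem, the zero-padding fix the script relies on, and an alternative that sorts by the number in each name instead of copying files. It assumes the names start with the page number (like 1, 2, 3 or 1.ps); numeric_key and padded are hypothetical helper names.

```python
import re


def numeric_key(name):
    """Sort key that orders file names by their leading number.

    '2' sorts before '10', and '2.ps' before '10.ps'. Names that do not
    start with a digit (such as an earlier 'Combined' output) go last.
    """
    match = re.match(r'\d+', name)
    if match:
        return (0, int(match.group()), name)
    return (1, 0, name)


def padded(name, width=3):
    """Zero-pad a numeric name the way the renaming step does: '7' -> '007'."""
    return name.zfill(width)


names = ['1', '10', '2', '11', '3']
print(sorted(names))                     # ['1', '10', '11', '2', '3']
print(sorted(names, key=numeric_key))    # ['1', '2', '3', '10', '11']
print(sorted(padded(n) for n in names))  # ['001', '002', '003', '010', '011']
```

With a key like this you could write files.sort(key=numeric_key) and skip the copy step; the padded copies have the advantage that ls and file managers show them in the same order too.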

It's simple and quite crude, but it gets the job done nicely!
Give it a shot and you'll be storing stuff as PDFs in no time!