- Pdftotext
pdftotext is an open source command-line utility for converting PDF files to plain text files, i.e. extracting the text data contained in PDF files. It is freely available and included with many Linux distributions. On Mac OS X (fink install Xpdf) or Windows, it must be installed as part of the Xpdf package.

$ pdftotext file.pdf

This usage produces a text file with the same base name as the input file (here, file.txt).

Wildcards (*), for example

$ pdftotext *pdf

cannot be used to convert multiple files, because pdftotext expects only one input file name. A shell loop is needed for batch conversions, as in

$ for f in *.pdf
> do
>   pdftotext "$f"
> done

for the bash shell. The quotes around $f keep file names that contain spaces intact.

The pdftotext program is part of a larger PDF-related package called Xpdf, which can be downloaded from [http://www.foolabs.com/xpdf/download.html foolabs.com].
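
The batch loop described above can also be wrapped in a small shell function. The following is only a minimal sketch: the function name convert_pdfs and its two directory arguments are hypothetical and not part of pdftotext. It relies on pdftotext accepting an optional second argument that names the output text file.

```shell
# Hypothetical helper (not part of Xpdf): convert every PDF in a source
# directory ($1) to a text file of the same base name in an output
# directory ($2).
convert_pdfs() {
    mkdir -p "$2"
    for f in "$1"/*.pdf; do
        # If no PDF matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        base=$(basename "$f" .pdf)
        # The second argument tells pdftotext where to write the text.
        pdftotext "$f" "$2/$base.txt"
    done
}
```

A call such as convert_pdfs ./papers ./text would then write ./text/NAME.txt for each ./papers/NAME.pdf, assuming both paths are directories the user can read and write.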
Wikimedia Foundation. 2010.