How to count pages in a PDF file? How to determine the number of pages in a PDF file?

How to get number of pages in a PDF file?

I need a command line tool that can determine the number of pages in a PDF, and/or a library that could be used from PHP.
----------------------------------
Can anyone please help me by providing a script to get the number of pages in a PDF file?
----------------------------------
I'm interested in your PHP code for counting the pages of a PDF.
----------------------------------
I need a way to count the number of pages of a PDF in PHP. I've done a bit of Googling and the only things I've found either utilize shell/bash scripts, perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?
----------------------------------

How do you find out how many pages a PDF has? I read the PDF specification, version 1.6, and found this:
PDF uses a "page tree node" to define the ordering of pages in the document. The tree structure allows PDF applications to quickly open a document containing thousands of pages while using little memory.

If a PDF has 63 pages, the page tree node will look like this:

2 0 obj
<< /Type /Pages
/Kids [ 4 0 R 10 0 R]
/Count 63 <---- YES, got it
>>
endobj
[P.S.] A PDF may contain more than one page tree node. The right answer is in the root page tree node: a node that has /Count XX together with a /Parent XXX entry is not the root page tree node.
So you must find the node that has /Count XX and no /Parent entry, and you will get the total number of pages in the PDF.

This works for %PDF-1.0 through %PDF-1.5 files, as long as the page tree is stored as plain (uncompressed) text.

In other words, you simply have to look for "/Count ", and the number of pages follows immediately after it.

For example:

Code:

/Type /Pages /Kids [ 2386 0 R 2388 0 R 2389 0 R 2390 0 R 2391 0 R 2392 0 R 2393 0 R ]
/Count 67

So once you find "/Type /Pages" inside the text of your PDF file, the "/Count " that follows it holds the number of pages.

I have been getting a lot of emails asking me about this issue, saying that there is more than one "/Type /Pages" inside their PDF file.

YES! But only one of those is the ROOT set.

Example Of a ROOT Identifier Set:

Code:

/Type /Pages
/Kids [ 2386 0 R 2388 0 R 2389 0 R 2390 0 R 2391 0 R 2392 0 R 2393 0 R]
/Count 67
>>

Example of what is NOT the root node:

Code:

/Type /Pages
/Kids [ 250 0 R 253 0 R 256 0 R 259 0 R 267 0 R 275 0 R 283 0 R 291 0 R 299 0 R]
/Count 9
/Parent 15770 0 R
>>

Notice that in the second example there is a "/Parent" tag. This means that the second one is NOT the root. The first example however does NOT have a "/Parent", which means it IS the ROOT.

I hope this clarifies it for everyone.
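
The root-node rule above can be sketched in Python. This is only an illustration under stated assumptions, not a real PDF parser: the name find_root_page_count and the SAMPLE bytes are made up for this example, and the sketch assumes the page tree is stored as plain text and that every object ends with the keyword "endobj".

```python
import re

PAGES_RE = re.compile(rb"/Type\s*/Pages\b")
PARENT_RE = re.compile(rb"/Parent\b")
COUNT_RE = re.compile(rb"/Count\s+(\d+)")


def find_root_page_count(data: bytes):
    """Return /Count of the page tree node that has no /Parent, or None."""
    # Splitting on "endobj" yields roughly one object per chunk.
    for chunk in data.split(b"endobj"):
        if not PAGES_RE.search(chunk):
            continue  # not a page tree node
        if PARENT_RE.search(chunk):
            continue  # intermediate node: it has a parent
        match = COUNT_RE.search(chunk)
        if match:
            return int(match.group(1))
    return None


# Hypothetical sample: the non-root node appears first on purpose.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"3 0 obj\n<< /Type /Pages /Kids [ 5 0 R ] /Count 9 /Parent 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [ 3 0 R 4 0 R ] /Count 67 >>\nendobj\n"
)

print(find_root_page_count(SAMPLE))  # prints 67
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the function. A result of None means no uncompressed root node was found.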

----------------------------------

This is a simple PHP function to get the page count from a PDF file. It will fail if the PDF file does not contain a "/Count" entry:

function getNumPagesPdf($filepath) {
    // Open in binary mode; PDF files are not plain text.
    $fp = @fopen($filepath, "rb");
    if (!$fp) {
        return false; // could not open the file
    }
    $max = 0;
    while (($line = fgets($fp)) !== false) {
        // A line may hold several "/Count N" entries; inspect them all.
        if (preg_match_all('/\/Count\s+([0-9]+)/', $line, $matches)) {
            // The root page tree node carries the largest /Count.
            $max = max($max, max(array_map('intval', $matches[1])));
        }
    }
    fclose($fp);

    return $max; // 0 means no "/Count" entry was found
}

----------------------------------

Try this:

<?php
// NOTE: validate $_REQUEST['file'] before using this on a public server.
$file = $_REQUEST['file'];
if (!$fp = @fopen($file, "rb")) {
        echo 'failed opening file ' . htmlspecialchars($file);
}
else {
        $max = 0;
        while (!feof($fp)) {
                $line = fgets($fp, 255);
                if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
                        preg_match('/[0-9]+/', $matches[0], $matches2);
                        // Keep the largest /Count seen so far.
                        if ($max < $matches2[0]) $max = (int) $matches2[0];
                }
        }
        fclose($fp);
        echo 'There ' . ($max == 1 ? 'is ' : 'are ') . $max . ' page' . ($max == 1 ? '' : 's') .
             ' in ' . htmlspecialchars($file) . '.';
}
?>

The /Count entry of each page tree node gives the number of pages beneath that node. The parent node's /Count is the sum of its children's counts, so this script just looks for the largest value, which is the total number of pages.
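
The same "largest /Count wins" heuristic, sketched in Python for comparison. The function names max_count and count_pages are made up for this example. Keep in mind that this is a heuristic: outline (bookmark) dictionaries also carry a /Count entry, so a file with many bookmarks could in principle report too high a number, and files whose page tree is compressed will report 0.

```python
import re

COUNT_RE = re.compile(rb"/Count\s+(\d+)")


def max_count(data: bytes) -> int:
    """Return the largest /Count value found in the raw bytes, or 0."""
    counts = [int(m.group(1)) for m in COUNT_RE.finditer(data)]
    return max(counts, default=0)


def count_pages(path: str) -> int:
    """Read the whole file in binary mode and apply the heuristic."""
    with open(path, "rb") as handle:
        return max_count(handle.read())
```

Reading the whole file at once avoids the risk of a "/Count N" entry being split across two 255-byte reads, at the cost of holding the file in memory.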

----------------------------------

This one does not use any third-party applications:

function getNumPagesInPDF($file) 
{
    if(!file_exists($file))return null;
    if (!$fp = @fopen($file,"r"))return null;
    $max=0;
    while(!feof($fp)) {
            $line = fgets($fp,255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)){
                    preg_match('/[0-9]+/',$matches[0], $matches2);
                    if ($max<$matches2[0]) $max=$matches2[0];
            }
    }
    fclose($fp);
    return (int)$max;

}

----------------------------------

This is a C# example to get the page count from a PDF file,

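// Requires: using System.IO; using System.Text.RegularExpressions;
public int getNumberOfPdfPages(string fileName)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        // Count "/Type /Page" entries; [^s] excludes "/Type /Pages" nodes.
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());

        return matches.Count;
    }
}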
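
For readers who want to try the page-object counting idea outside .NET, here is a rough Python equivalent. It is a variation rather than a literal port: instead of [^s] it uses a negative lookahead, so a "/Type /Page" at the very end of the data is still counted. The name count_page_objects is made up for this example, and the approach only works when page objects are stored uncompressed.

```python
import re

# "/Type /Page" not followed by a letter, which excludes "/Type /Pages".
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_page_objects(data: bytes) -> int:
    """Count leaf page objects in raw, uncompressed PDF bytes."""
    return len(PAGE_RE.findall(data))


# Hypothetical fragment: one page tree node and two leaf pages.
SAMPLE = (
    b"<< /Type /Pages /Kids [ 2 0 R 3 0 R ] /Count 2 >>\n"
    b"<< /Type /Page /Parent 1 0 R >>\n"
    b"<< /Type/Page>>\n"
)

print(count_page_objects(SAMPLE))  # prints 2
```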

----------------------------------

I'm using this code (but it only works with PDF 1.5 or below):

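public int getNumberOfPdfPages(string fileName)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
    {
        // Matches only the compact form "Type/Pages/Count N" (first occurrence).
        Regex regex = new Regex(@"Type/Pages/Count (\d+)");
        Match match = regex.Match(sr.ReadToEnd());
        // Return 0 rather than throwing when the pattern is absent.
        return match.Success ? Int32.Parse(match.Groups[1].Value) : 0;
    }
}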

----------------------------------

This C# source code shows how to count pages in a pdf file,

++++++++++++++++++++++++++++++++++++

// Fragment for finding the number of pages in a given PDF file.
// vfileName and UploadFile() are not shown here and must be defined elsewhere.
string PgCount = string.Empty;
System.IO.FileInfo fextension = new FileInfo(vfileName);
string extension = fextension.Extension;
bool flag = UploadFile(vfileName);
if (extension.Equals(".pdf", StringComparison.OrdinalIgnoreCase))
{
        // Dispose the stream and reader when done.
        using (FileStream fs = new FileStream(vfileName, FileMode.Open, FileAccess.Read))
        using (StreamReader sr = new StreamReader(fs))
        {
                string pdf = sr.ReadToEnd();
                // \s* also matches "/Type/Page" written without a space.
                Regex rx = new Regex(@"/Type\s*/Page[^s]");
                MatchCollection match = rx.Matches(pdf);
                if (flag)
                {
                        PgCount = match.Count.ToString();
                }
        }
}

++++++++++++++++++++++++++++++++++++

I'm using VB.Net instead of C#. Below is the function I created in VB based on the above code:

++++++++++++++++++++++++++++++++++++

Imports System.IO
Imports System.Text.RegularExpressions

Private Function pageCountPDF(ByRef pdfFile As FileInfo) As Integer
     ' Function for finding the number of pages in a given PDF file

     pageCountPDF = 0

     If pdfFile.Exists Then
          Dim fs As FileStream = New FileStream(pdfFile.FullName,
              FileMode.Open, FileAccess.Read)
          Dim sr As StreamReader = New StreamReader(fs)
          Dim pdfMagicNumber(3) As Char

          sr.Read(pdfMagicNumber, 0, 4) ' put the first four characters of
                                        ' the file into the pdfMagicNumber array

          ' Arrays cannot be compared with "=", so compare as strings.
          If New String(pdfMagicNumber) = "%PDF" Then ' a PDF file should
                                                      ' start with %PDF
               Dim pdfContents As String = sr.ReadToEnd()
               Dim rx As Regex = New Regex("/Type\s*/Page[^s]")
               Dim match As MatchCollection = rx.Matches(pdfContents)
               pageCountPDF = match.Count
          Else
               Throw New Exception("File does not appear to be a PDF file (magic number not found).")
          End If
     Else
          Throw New Exception("File does not exist.")
     End If
End Function

++++++++++++++++++++++++++++++++++++

If all of the above functions fail to retrieve the page count from your PDF file, you can use VeryPDF products to read the page count from a PDF file.

Product #1. You can use PDF Split-Merge Command Line to read the page count from a PDF file. You can download and install PDF Split-Merge Command Line from the following URLs:

http://www.verypdf.com/pdfpg/pdfpg.exe
http://www.verypdf.com/app/pdf-split-merge/try-and-buy.html

After you have installed it, you can run or call the following command line to get the page count from a PDF file:

"C:\Program Files (x86)\VeryPDF PDF Split-Merge v3.0\pdfpg.exe" getpagecount D:\downloads\test.pdf

You will get the following message from the above command line:
-----------------------------
There are 13 pages in D:\downloads\test.pdf
Please purchase PDF Split-Merge on www.verypdf.com to remove this message.
-----------------------------

Product #2. You can use the Advanced PDF Tools Command Line product to get the page count from a PDF file:

http://www.verypdf.com/app/advanced-pdf-tools/try-and-buy.html
http://www.verypdf.com/pdfinfoeditor/advanced_pdf_tools_cmd.zip

After you download and unzip it to a folder, you can run the following command line to get the page count from a PDF file:

pdftools.exe -r -i D:\help.pdf | findstr /C:"File Pages Count:"

pdftools.exe can not only get the page count from a PDF file, but also its metadata, paper size, document summary, page layout, and other information.

Product #3. Use PDF Toolbox Command Line to get the number of pages in a PDF file:

http://www.verypdf.com/app/pdftoolbox/try-and-buy.html
http://www.verypdf.com/dl.php?file=pdftoolbox_cmd_win.zip

pdftoolbox.exe D:\test.pdf -getinfo -outfile "D:\_getinfo_out.txt"
findstr /C:NumberOfPages D:\_getinfo_out.txt

    NumberOfPages: 2

Product #4. Use the VeryPDF Cloud API to get the page count from a PDF file.

If your PDF file can be downloaded from a URL, you can use the VeryPDF Cloud API to get its page count. For example, suppose the PDF URL is:

http://online.verypdf.com/examples/cloud-api/verypdf.pdf

Now request the following URL to get the page count for this PDF file:

http://online.verypdf.com/api/?apikey=XXXX-XXXX-XXXX-XXXX&app=getpagecount&infile=http://online.verypdf.com/examples/cloud-api/verypdf.pdf

Product #5. Use the VeryPDF Cloud API to get the page count along with other information from a PDF file:

http://online.verypdf.com/api/?apikey=XXXX-XXXX-XXXX-XXXX&app=pdfinfo&infile=http://online.verypdf.com/examples/cloud-api/verypdf.pdf

You will get information like the following, from which you can easily extract "Pages" and other fields:

Title:          PDF Tools, Document Process Software, Multimedia Applications and Development Packages - VeryPDF
Creator:       
Producer:       VeryPDF
CreationDate:   D:20130712095842-04'00'
Tagged:         no
Form:           none
Pages:          1
Encrypted:      no
Page size:      595 x 842 pts (A4)
MediaBox:       0.00     0.00   595.00   842.00
CropBox:        0.00     0.00   595.00   842.00
BleedBox:       0.00     0.00   595.00   842.00
TrimBox:        0.00     0.00   595.00   842.00
ArtBox:         0.00     0.00   595.00   842.00
File size:      203433 bytes
Optimized:      no
PDF version:    1.4
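
Extracting "Pages" from a listing like the one above could look like this in Python. This is a sketch that assumes the response is plain text with one "Key: value" pair per line, as in the sample; the function name parse_info is made up for this example, and no request to the API is made here.

```python
def parse_info(text: str) -> dict:
    """Split "Key: value" lines into a dictionary of strings."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as
        # "D:20130712095842-04'00'" stay intact.
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


# A few lines copied from the listing above.
SAMPLE = """\
Producer:       VeryPDF
CreationDate:   D:20130712095842-04'00'
Pages:          1
PDF version:    1.4
"""

info = parse_info(SAMPLE)
print(int(info["Pages"]))  # prints 1
```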
