Counting the number of pages in a PDF document
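'-----------------------------------------------------------------
' Counting PDF pages (written 14 Aug 2006)
' Open the PDF in binary mode and look for either:
'   "/N xx"     in the linearization dictionary at the start of a
'               linearized PDF, which holds the page count, or
'   "/Count xx" in the page tree; the largest value belongs to the
'               root node, i.e. the total number of pages.
' Returns 1 if neither is found or the file cannot be read.
'-----------------------------------------------------------------
Public Function PageCount(ByVal sFileName As String) As Long
    Dim nFileNum As Integer
    Dim isOpen As Boolean
    Dim s As String
    Dim pos As Long, dictEnd As Long
    Dim n As Long, maxCount As Long

    On Error GoTo Failed
    PageCount = 1                          ' default when nothing is found

    ' Read the whole file at once (binary mode, shared read access)
    nFileNum = FreeFile
    Open sFileName For Binary Access Read Shared As #nFileNum
    isOpen = True
    s = Space$(LOF(nFileNum))
    Get #nFileNum, , s
    Close #nFileNum
    isOpen = False

    ' 1. Linearized PDF: "/Linearized ... /N pages" near the top of the file
    pos = InStr(1, Left$(s, 1024), "/Linearized")
    If pos > 0 Then
        dictEnd = InStr(pos, s, ">>")
        pos = InStr(pos, s, "/N ")
        If pos > 0 And pos < dictEnd Then
            n = ReadIntAt(s, pos + 2)
            If n > 0 Then
                PageCount = n
                Exit Function
            End If
        End If
    End If

    ' 2. Otherwise take the largest "/Count xx" anywhere in the file
    pos = InStr(1, s, "/Count")
    Do While pos > 0
        n = ReadIntAt(s, pos + 6)
        If n > maxCount Then maxCount = n
        pos = InStr(pos + 6, s, "/Count")
    Loop
    If maxCount > 0 Then PageCount = maxCount
    Exit Function

Failed:
    If isOpen Then Close #nFileNum
    PageCount = 1
End Function

' Returns the unsigned integer starting at position p of s (after
' skipping blanks and line breaks), or -1 if no digit is found there.
Private Function ReadIntAt(ByRef s As String, ByVal p As Long) As Long
    Dim ch As String, v As Long, found As Boolean
    Do While p <= Len(s)
        ch = Mid$(s, p, 1)
        If ch <> " " And ch <> vbCr And ch <> vbLf And ch <> vbTab Then Exit Do
        p = p + 1
    Loop
    Do While p <= Len(s) And v < 100000000     ' stop before Long overflows
        ch = Mid$(s, p, 1)
        If ch < "0" Or ch > "9" Then Exit Do
        v = v * 10 + (Asc(ch) - 48)
        found = True
        p = p + 1
    Loop
    If found Then ReadIntAt = v Else ReadIntAt = -1
End Function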
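For comparison, the /Count scan described in the comments above can be written in C over an in-memory copy of the file. This is a minimal sketch, not the original author's code: the function name max_count_in_buffer is hypothetical, and it assumes the page tree is stored as plain text. PDFs that use compressed object streams (PDF 1.5 and later) can hide the page-tree dictionaries from a raw byte scan, in which case it returns 0. Outline (bookmark) dictionaries also carry /Count entries, so the result is a heuristic rather than a guaranteed page count.

```c
#include <stddef.h>
#include <string.h>

/* Returns the largest non-negative integer that follows a "/Count" key
 * in buf, or 0 if there is none. buf may contain NUL bytes (binary PDF
 * data), so the search is length-based instead of using strstr(). */
long max_count_in_buffer(const char *buf, size_t len)
{
    static const char key[] = "/Count";
    const size_t klen = sizeof key - 1;
    long best = 0;

    for (size_t i = 0; i + klen <= len; i++) {
        if (memcmp(buf + i, key, klen) != 0)
            continue;

        /* The value must be separated from the key by PDF whitespace
         * (NUL, tab, LF, FF, CR or space); "/Count5" is a different name. */
        size_t p = i + klen;
        size_t ws = 0;
        while (p < len && (buf[p] == ' ' || buf[p] == '\t' || buf[p] == '\n' ||
                           buf[p] == '\f' || buf[p] == '\r' || buf[p] == '\0')) {
            p++;
            ws++;
        }
        if (ws == 0)
            continue;

        /* Accept plain digits only: negative values (used by closed
         * outline items) are skipped, as are numbers over 9 digits. */
        long value = 0;
        int digits = 0;
        while (p < len && buf[p] >= '0' && buf[p] <= '9') {
            if (digits < 9)
                value = value * 10 + (buf[p] - '0');
            digits++;
            p++;
        }
        if (digits > 0 && digits <= 9 && value > best)
            best = value;
    }
    return best;
}
```

A return value of 0 means the scan found nothing usable and the caller needs a fallback, such as the Imagick route used in the PHP answer.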
============================================
I actually went with a combined approach. Since I have exec disabled on my server, I wanted to stick with a PHP-based solution, so I ended up with this:
Code:
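function getNumPagesPdf($filepath) {
    // Strip an ImageMagick page selector such as "file.pdf[0]" from the path
    $path = preg_replace('/\[(.*?)\]/', '', $filepath);
    $data = @file_get_contents($path);
    if ($data === false) {
        return 0;                       // file could not be read
    }
    // Fast path: the largest /Count value is the root of the page tree
    $max = 0;
    if (preg_match_all('/\/Count\s+(\d+)/', $data, $matches)) {
        $max = max(array_map('intval', $matches[1]));
    }
    if ($max == 0) {
        // Slow fallback: let ImageMagick open the whole document (not just one page)
        $im = new Imagick($path);
        $max = $im->getNumberImages();
    }
    return $max;
}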
If the scan finds no /Count entries, the function falls back to the Imagick PHP extension. I use this two-stage approach because Imagick is much slower than the plain text scan.
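The fast first pass only needs the raw bytes of the file. Reading the file in fixed-size pieces can split a "/Count 12" token across two reads so that neither piece matches, whereas buffering the whole file avoids that. Below is a minimal C sketch, assuming the PDF fits in memory. read_file_binary is a hypothetical helper name, and the slow fallback (Imagick in the PHP version) is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire file in binary mode ("rb": no newline translation on
 * Windows). Returns a malloc'd buffer and stores its size in *out_len,
 * or returns NULL on failure. The caller must free the buffer. */
char *read_file_binary(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 65536, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {                 /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }

    int failed = ferror(fp);
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Because all the bytes end up in one contiguous buffer, a token can never straddle two reads, at the cost of holding the whole PDF in memory.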
==================================================
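Try this:
<?php
// Note: the file name comes straight from the request. On a public server,
// restrict it to a known directory before opening it.
$file = isset($_REQUEST['file']) ? $_REQUEST['file'] : '';
$name = htmlspecialchars($file);        // escape before echoing it back
if ($file === '' || !($fp = @fopen($file, 'rb'))) {
    echo 'failed opening file ' . $name;
}
else {
    $max = 0;
    while (($line = fgets($fp)) !== false) {
        // A line can contain several /Count entries; keep the largest
        if (preg_match_all('/\/Count\s+(\d+)/', $line, $matches)) {
            $max = max($max, max(array_map('intval', $matches[1])));
        }
    }
    fclose($fp);
    echo 'There ' . ($max == 1 ? 'is ' : 'are ') . $max . ' page' . ($max == 1 ? '' : 's') . ' in ' . $name . '.';
}
?>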
The /Count entry of each node in the PDF page tree gives the number of pages below that node. A parent node's /Count is the sum of its children's, so the root node holds the largest value, and the script simply takes the maximum as the total number of pages.
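To illustrate why the largest /Count equals the page total, here is a small C model of the page tree. It is a sketch with hypothetical names (struct page_node, page_count), not a PDF parser. Each intermediate /Pages node counts the leaf pages beneath it, so the root's value is the sum of its children's values and is never smaller than any other node's.

```c
#include <stddef.h>

/* A hypothetical in-memory model of a PDF page tree. An intermediate
 * node (/Type /Pages) has kids; a leaf page (/Type /Page) has none. */
struct page_node {
    const struct page_node *const *kids;   /* NULL for a leaf page */
    size_t nkids;
};

/* Number of leaf pages under node. For an intermediate node this is the
 * value stored in its /Count entry; leaf pages carry no /Count and
 * simply contribute 1 to their parent's total. */
long page_count(const struct page_node *node)
{
    if (node->kids == NULL)
        return 1;

    long total = 0;
    for (size_t i = 0; i < node->nkids; i++)
        total += page_count(node->kids[i]);  /* parent = sum of children */
    return total;
}
```

For a root with two /Pages children holding 3 and 4 pages, the children store /Count 3 and /Count 4 and the root stores /Count 7, which is the maximum the scan picks up.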
You can also use the Spool File Page Counter SDK to count the pages in a PDF file. It can be downloaded from the following web page:
http://www.verydoc.com/spool-page-count.html
Advanced PDF Tools Command Line can also count PDF pages. It can be downloaded from the following web page:
https://www.verypdf.com/pdfinfoeditor/index.html#dl
I am trying out your PDF Page Counter SDK with VB.NET.
Is it possible to check whether each page in a PDF file is black-and-white or color?
Thanks for your answer.
Yes. You can use the PDF Page Counter SDK to count black-and-white and color pages separately; please refer to the following sample code. The ReadInfoFromAllFormats() function supports PCL, PDF, PS and other formats.
/* The ReadInfo* functions come from the SDK (ReadInfo.dll);
   include the header shipped with the SDK package as well. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("%s C:\\test.ps\n", argv[0]);
        printf("%s C:\\test.pcl\n", argv[0]);
        printf("%s C:\\test.spl\n", argv[0]);
        return 1;
    }
    char *lpInFile = argv[1];
    char drive[_MAX_DRIVE];
    char dir[_MAX_DIR];
    char fname[_MAX_FNAME];
    char ext[_MAX_EXT];
    _splitpath(lpInFile, drive, dir, fname, ext);
    BOOL bIsRenderToPDF = TRUE;
    DWORD bwPageCount = 0;
    DWORD colorPageCount = 0;
    DWORD copyCount = 0;
    double nPageWidth = 0;
    double nPageHeight = 0;
    char szPaperSizeName[200] = {0};
    BOOL bRet = FALSE;
    ReadInfoSetCode("XXXXXXXXXXXXXXXXXXX");
    ReadInfoEnableDebug(1);
    if (!_stricmp(ext, ".ps") || !_stricmp(ext, ".eps"))
        bRet = ReadInfoFromPSFile(lpInFile, bIsRenderToPDF, &bwPageCount, &colorPageCount, &copyCount, &nPageWidth, &nPageHeight, szPaperSizeName);
    else if (!_stricmp(ext, ".pcl"))
        bRet = ReadInfoFromPCLFile(lpInFile, bIsRenderToPDF, &bwPageCount, &colorPageCount, &copyCount, &nPageWidth, &nPageHeight, szPaperSizeName);
    else
        bRet = ReadInfoFromAllFormats(lpInFile, bIsRenderToPDF, &bwPageCount, &colorPageCount, &copyCount, &nPageWidth, &nPageHeight, szPaperSizeName);
    printf("=======================================\n");
    printf("File = '%s'\n", lpInFile);
    printf("Return Value = %s\n", bRet ? "TRUE" : "FALSE");
    printf("bIsRenderToPDF = %d\n", bIsRenderToPDF);
    printf("bwPageCount = %lu\n", bwPageCount);
    printf("colorPageCount = %lu\n", colorPageCount);
    printf("copyCount = %lu\n", copyCount);
    printf("PageWidth = %g\n", nPageWidth);
    printf("PageHeight = %g\n", nPageHeight);
    printf("PaperSizeName = '%s'\n", szPaperSizeName);
    return bRet ? 0 : 1;
}
We can also get the color depth of each page in PCL, PS and PDF files; please email support@verypdf.com and we will assist you further.
VeryPDF
Thanks for your very quick answer.
But I think the code is in C++, which I don't know.
Can you give it to me in VB?
Thanks in advance for your help.
Please refer to the VB code below:
Private Declare Function ReadInfoFromPSFile Lib "ReadInfo.dll" (ByVal fileName As String, ByVal bIsRenderToPDF As Long, _
    ByRef bwPageCount As Long, ByRef colorPageCount As Long, ByRef copyCount As Long, ByRef pagewidth As Double, _
    ByRef pageheight As Double, ByVal paperSizeName As String) As Long
Private Declare Function ReadInfoFromPCLFile Lib "ReadInfo.dll" (ByVal fileName As String, ByVal bIsRenderToPDF As Long, _
    ByRef bwPageCount As Long, ByRef colorPageCount As Long, ByRef copyCount As Long, ByRef pagewidth As Double, _
    ByRef pageheight As Double, ByVal paperSizeName As String) As Long
Private Declare Sub ReadInfoSetCode Lib "ReadInfo.dll" (ByVal strCode As String)

Private Sub Command1_Click()
    Dim bIsRenderToPDF As Long
    Dim bwPageCount As Long
    Dim colorPageCount As Long
    Dim copyCount As Long
    Dim nPageWidth As Double
    Dim nPageHeight As Double
    Dim strPaperSizeName As String
    Dim nRet As Long
    Dim strMsg As String
    Dim strFileName As String
    bIsRenderToPDF = 1
    bwPageCount = 0
    colorPageCount = 0
    copyCount = 0
    nPageWidth = 0
    nPageHeight = 0
    strPaperSizeName = Space$(300)          ' output buffer filled by the DLL
    strFileName = App.Path & "\test_tiger.eps"
    ReadInfoSetCode "XXXXXXXXXXXXXXXXXXXXXX"
    nRet = ReadInfoFromPSFile(strFileName, bIsRenderToPDF, bwPageCount, colorPageCount, copyCount, _
        nPageWidth, nPageHeight, strPaperSizeName)
    strMsg = "FileName = " & strFileName & vbCrLf
    strMsg = strMsg & "bIsRenderToPDF = " & CStr(bIsRenderToPDF) & vbCrLf
    strMsg = strMsg & "bwPageCount = " & CStr(bwPageCount) & vbCrLf
    strMsg = strMsg & "colorPageCount = " & CStr(colorPageCount) & vbCrLf
    strMsg = strMsg & "copyCount = " & CStr(copyCount) & vbCrLf
    strMsg = strMsg & "PageWidth = " & CStr(nPageWidth) & vbCrLf
    strMsg = strMsg & "PageHeight = " & CStr(nPageHeight) & vbCrLf
    ' Cut the returned C string at its terminating null character
    strMsg = strMsg & "PaperSizeName = " & Left$(strPaperSizeName, InStr(strPaperSizeName & vbNullChar, vbNullChar) - 1) & vbCrLf
    MsgBox strMsg
    strFileName = App.Path & "\test_grid.pcl"
    nRet = ReadInfoFromPCLFile(strFileName, bIsRenderToPDF, bwPageCount, colorPageCount, copyCount, _
        nPageWidth, nPageHeight, strPaperSizeName)
    strMsg = "FileName = " & strFileName & vbCrLf
    strMsg = strMsg & "bIsRenderToPDF = " & CStr(bIsRenderToPDF) & vbCrLf
    strMsg = strMsg & "bwPageCount = " & CStr(bwPageCount) & vbCrLf
    strMsg = strMsg & "colorPageCount = " & CStr(colorPageCount) & vbCrLf
    strMsg = strMsg & "copyCount = " & CStr(copyCount) & vbCrLf
    strMsg = strMsg & "PageWidth = " & CStr(nPageWidth) & vbCrLf
    strMsg = strMsg & "PageHeight = " & CStr(nPageHeight) & vbCrLf
    strMsg = strMsg & "PaperSizeName = " & Left$(strPaperSizeName, InStr(strPaperSizeName & vbNullChar, vbNullChar) - 1) & vbCrLf
    MsgBox strMsg
End Sub
You can also download the test package from the following web pages:
http://www.verydoc.com/spool-page-count.html
http://www.verydoc.com/ps-and-pcl-info-sdk.zip
The test package contains examples for C#, VB, VB.NET, VC++ and the SDK/COM interface, which you can download and test on your system.
You can also run the test application in a CMD window to determine whether each PDF page is BW or color; see the following test case:
C:\>E:\ps-and-pcl-info-sdk\bin\C#_ParsingTest.exe D:\temp\TestDoc.pdf
args length is 1
args index 0 is [D:\temp\TestDoc.pdf]
=============================
Page 1 is [Color]
Page 2 is [Color]
Page 3 is [ BW]
Page 4 is [ BW]
Page 5 is [ BW]
Page 6 is [ BW]
Page 7 is [ BW]
=============================
Statistics: bwPageCount=5, colorPageCount=2
File: D:\temp\PoemsTestDoc.pdf
Render To PDF: 1
BW Pages: 5
Color Pages: 2
Width: 0
Height: 0
Paper name:
As you can see, you can easily find out whether each page is BW or color.
Dear Support,
Thanks for your help. I'm a beginner in VB programming, and I don't know how to get the color information for each page in a PDF file, like this:
For Each Page in strFileName
ListBox1.Item.Add(Page, bwPageCount, colorPageCount, PageWidth, PageHeight)
Can you help me?
In the demo version, this information is only printed to the console:
=============================
Page 1 is [Color]
Page 2 is [Color]
Page 3 is [ BW]
Page 4 is [ BW]
Page 5 is [ BW]
Page 6 is [ BW]
Page 7 is [ BW]
=============================
After you purchase it, we will send you a new version of the SDK from which you can get the color information for each page directly; the demo version does not have this function yet.
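Since the demo only prints the per-page result to the console, one workaround is to capture that output (for example by redirecting it to a file) and parse the "Page N is [Color]" and "Page N is [ BW]" lines shown above. Below is a minimal C sketch. The function name classify_page_line is hypothetical, and the line format is taken from the sample output, so another SDK version may print something different.

```c
#include <stdio.h>
#include <string.h>

/* Classifies one line of the demo's console output.
 * Returns 1 for "Page N is [Color]", 0 for "Page N is [ BW]" and -1 for
 * any other line; on a match the page number is stored in *page. */
int classify_page_line(const char *line, int *page)
{
    char tag[16];
    int n;

    /* %15[^]] reads everything up to the closing bracket, e.g. " BW" */
    if (sscanf(line, "Page %d is [%15[^]]", &n, tag) != 2)
        return -1;

    const char *t = tag;
    while (*t == ' ')                 /* the demo pads "BW" with a space */
        t++;

    int result;
    if (strcmp(t, "Color") == 0)
        result = 1;
    else if (strcmp(t, "BW") == 0)
        result = 0;
    else
        return -1;

    *page = n;
    return result;
}
```

For example, run `C#_ParsingTest.exe D:\temp\TestDoc.pdf > result.txt`, read result.txt line by line with fgets, and call classify_page_line on each line to build the list of page numbers and their color flags. Header and statistics lines return -1 and are skipped.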
Dear Support,
Is there a page limit in your SDK?
Thanks for your answer
Could you please let us know which product you are using? Each product has different limitations in its trial version.
I am using the ps-and-pcl-info-sdk with C#_ParsingTest.exe from VB.NET, and it works now.
But if the path or file name contains a space, the script doesn't run.
Can you tell me why?
You need to enclose the input and output file names in double quotes ("...").
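In practice this means each file name must be wrapped in double quotes when the command line is built; otherwise the command line is split at the space. Below is a minimal C sketch of building such a command line. It assumes the usual Microsoft C runtime argument parsing, under which a backslash directly before a quote escapes it, so trailing backslashes are doubled. append_quoted is a hypothetical helper, not part of the SDK.

```c
#include <stddef.h>
#include <string.h>

/* Appends arg to out (capacity outsz, already NUL-terminated) wrapped in
 * double quotes, preceded by a space if out is not empty, so that a path
 * containing spaces reaches the program as one argument. Assumes arg has
 * no double quotes (Windows file names cannot contain them). Trailing
 * backslashes are doubled so they do not escape the closing quote.
 * Returns 0 on success, -1 if out is too small. */
int append_quoted(char *out, size_t outsz, const char *arg)
{
    size_t len = strlen(out);
    size_t alen = strlen(arg);
    size_t trailing = 0;
    while (trailing < alen && arg[alen - 1 - trailing] == '\\')
        trailing++;

    /* separator + two quotes + argument + extra backslashes + NUL */
    size_t need = len + (len > 0) + 2 + alen + trailing + 1;
    if (need > outsz)
        return -1;

    char *p = out + len;
    if (len > 0)
        *p++ = ' ';
    *p++ = '"';
    memcpy(p, arg, alen);
    p += alen;
    for (size_t i = 0; i < trailing; i++)
        *p++ = '\\';
    *p++ = '"';
    *p = '\0';
    return 0;
}
```

Calling it once for the program path and once per file name produces a line such as `"E:\ps-and-pcl-info-sdk\bin\C#_ParsingTest.exe" "D:\my docs\Test Doc.pdf"`, which keeps each path as a single argument.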