Testing that generated PDFs are visually the same


We need to generate various PDF files for the project I’m currently working on. Some of them are around 50 pages long. With documents that large, a change that breaks the output can easily go unnoticed. We wanted to catch those mistakes automatically and avoid the tedious task of reviewing not-so-interesting PDFs (most of them are contracts) after every change, so we decided to add some automated tests. Bonus points if the tests could not only detect that the newly generated PDFs differ from the expected ones but also show us what changed. And we did it.

We created a custom assertion that can be used inside PHPUnit. It lives in a trait, so we can use it in multiple test classes.
The assertion relies on diff-pdf, a CLI tool for visually comparing PDFs. If you want to use the assertion, you must install diff-pdf and make sure it is available on your PATH.
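Because a missing diff-pdf binary simply makes the comparison fail, it can help to check for it up front. The sketch below is a hypothetical helper (not part of the original trait) and assumes a POSIX shell, where `command -v` exits with 0 only when the executable is found on the PATH.

```php
<?php

// Hypothetical helper, not part of the original trait.
// Assumes a POSIX shell: `command -v diff-pdf` exits with 0 only if
// the diff-pdf executable can be found on the PATH.
function isDiffPdfInstalled(): bool
{
    exec('command -v diff-pdf 2>/dev/null', $output, $resultCode);

    return $resultCode === 0;
}
```

In a test class, you could call it from `setUp()` and use PHPUnit's `markTestSkipped()` when it returns false, so a machine without diff-pdf reports skipped tests instead of confusing failures.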

The assertion is inspired by Approval Testing and only takes the path to the new PDF as a parameter. It compares that file with a previously approved PDF stored next to the tests. If they are not visually the same, the assertion fails.
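The comparison itself boils down to one diff-pdf call whose exit code decides the outcome. Here is a minimal standalone sketch of that idea; the function names are hypothetical, and it assumes, as the trait does, that diff-pdf exits with 0 when both files are visually the same. `--output-diff` asks diff-pdf to write a PDF highlighting the differences.

```php
<?php

// Hypothetical helper: builds the diff-pdf command line.
// escapeshellarg() quotes every path so that spaces or other special
// characters in file names cannot break the shell command.
function buildDiffPdfCommand(string $approvedPath, string $receivedPath, string $diffPath): string
{
    return sprintf(
        'diff-pdf --output-diff=%s %s %s 2>&1',
        escapeshellarg($diffPath),
        escapeshellarg($approvedPath),
        escapeshellarg($receivedPath)
    );
}

// Hypothetical helper: runs diff-pdf and treats exit code 0 as "visually the same".
// Any other code (including "command not found") counts as a mismatch.
function pdfsLookTheSame(string $approvedPath, string $receivedPath, string $diffPath): bool
{
    exec(buildDiffPdfCommand($approvedPath, $receivedPath, $diffPath), $output, $resultCode);

    return $resultCode === 0;
}
```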

Here is an example of its usage:

use PHPUnit\Framework\TestCase;

class GeneratePDFTest extends TestCase
{
    use PdfAssertion;

    /**
     * @test
     */
    public function generate_the_PDF_document_with_the_information_we_want(): void
    {
        $PDFGenerator = new PDFGenerator();

        $pdfFilePath = $PDFGenerator->generateFor('Charles');

        $this->verifyPDF($pdfFilePath);
    }
}

First, use the PdfAssertion trait in your test class. Then, in your tests, generate a PDF and call the verifyPDF method with the path of the newly generated file.

When you run the test, the generated PDF is copied to a file ending with .received.pdf in an approval directory located next to the test file. The file name is built from the test name and the index of the verification within that test (for example, generate_the_PDF_document_with_the_information_we_want_1.received.pdf). That file is compared with the file ending with .approved.pdf in the same directory. If they match, the test passes; if they don’t, the test fails. You must keep the .approved.pdf file under version control, as it is the reference for future test runs. The *.received.pdf pattern can be added to your .gitignore.
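To make the naming scheme concrete, here is a hypothetical standalone version of the path logic in the trait's fileName() method: approval directory, cleaned test name, a per-test counter starting at 1, then the suffix. It assumes a `/` separator and a test directory passed in by hand, whereas the trait derives the directory from the test class file.

```php
<?php

// Hypothetical standalone version of the trait's naming scheme:
// <test directory>/approval/<clean test name>_<verification index>.<suffix>.pdf
function approvalFilePath(string $testDirectory, string $cleanTestName, int $verificationIndex, string $suffix): string
{
    return $testDirectory . '/approval/' . $cleanTestName . '_' . $verificationIndex . '.' . $suffix . '.pdf';
}

// The three files involved in the first verifyPDF() call of the example test.
foreach (['received', 'approved', 'diff'] as $suffix) {
    echo approvalFilePath('tests', 'generate_the_PDF_document_with_the_information_we_want', 1, $suffix), PHP_EOL;
}
```

The counter is what lets a single test call verifyPDF() several times: the second call in the same test produces files ending in `_2.received.pdf`, `_2.approved.pdf` and `_2.diff.pdf`.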

The first time you run the test, you’ll get an error explaining that no file has been approved yet, and that you need to review the generated file and create the approved file if you are satisfied with its content. The error message even gives you the copy command to run to create it.
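Since the approved file has the same name as the received one with only the suffix changed, the copy command can be derived from the received path alone. The sketch below is a hypothetical helper (the trait builds its message from both paths instead), requires PHP 8 for `str_ends_with()`, and assumes the `.received.pdf` naming described above.

```php
<?php

// Hypothetical helper: derives the approved path from a received path
// (same name, .approved.pdf instead of .received.pdf) and returns the
// copy command to run once the received PDF has been reviewed.
function approvalCommand(string $receivedPath): string
{
    $suffix = '.received.pdf';

    if (!str_ends_with($receivedPath, $suffix)) {
        throw new InvalidArgumentException("Expected a path ending with $suffix, got $receivedPath");
    }

    $approvedPath = substr($receivedPath, 0, -strlen($suffix)) . '.approved.pdf';

    // escapeshellarg() keeps the command safe to paste even if the path contains spaces.
    return sprintf('cp %s %s', escapeshellarg($receivedPath), escapeshellarg($approvedPath));
}
```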

If the two files do not match and the test fails, diff-pdf produces a file ending with .diff.pdf that shows the differences between them.
That’s super convenient: there is no need to read that entire long contract to discover what changed!
These *.diff.pdf files should also be excluded from version control.

Below is the diff of a PDF that was supposed to render “Bravo Charles” but actually rendered “Bravo Champion.” The end of “Charles” is shown in red, and the end of “Champion” in cyan.

A visual diff example

Here is the code of the trait:


trait PdfAssertion
{
    /**
     * Number of PDFs verified so far in each test, indexed by test name.
     *
     * @var array<string, int>
     */
    private array $PDFVerifiedInTestCount = [];

    private function verifyPDF(string $pdfFilePath): void
    {
        self::assertFileExists($pdfFilePath, 'The PDF file doesn\'t exist');

        $this->incrementPDFVerificationInTestCount();

        // Create the approval directory first: copy() fails if the target directory is missing.
        $this->ensureApprovalDirectoryExists();

        $this->copyToReceivedFile($pdfFilePath);

        $this->ensureApprovedFileExists();

        $this->verifyThatPDFsAreMatching();
    }

    private function verifyThatPDFsAreMatching(): void
    {
        $approvedPDFFilePath = $this->approvedFileName();
        $receivedPDFFilePath = $this->receivedFileName();
        $diffedPDFFileName = $this->diffedFileName();

        // escapeshellarg() quotes each path safely, since test names may contain spaces.
        $command = sprintf(
            'diff-pdf --output-diff=%s %s %s 2>&1',
            escapeshellarg($diffedPDFFileName),
            escapeshellarg($approvedPDFFilePath),
            escapeshellarg($receivedPDFFilePath)
        );

        // diff-pdf exits with 0 when both PDFs are visually the same.
        exec($command, $output, $resultCode);

        self::assertSame(
            0,
            $resultCode,
            sprintf(
                'The generated PDF file is not the same as the approved PDF. Diff is visible here: %s',
                $diffedPDFFileName
            )
        );
    }

    private function ensureApprovedFileExists(): void
    {
        $approvedPDFFilePath = $this->approvedFileName();

        $receivedFileName = $this->receivedFileName();

        self::assertFileExists($approvedPDFFilePath, sprintf(
            <<<EOS
            No approved file exists for PDF %1\$s.
            Please review that file and, if you are satisfied with its content, copy it to %2\$s:
            cp "%1\$s" "%2\$s"
            EOS,
            $receivedFileName,
            $approvedPDFFilePath
        ));
    }

    private function ensureApprovalDirectoryExists(): void
    {
        $approvalDirectory = $this->getSnapshotDirectory();

        if (!is_dir($approvalDirectory)) {
            mkdir($approvalDirectory);
        }
    }

    private function incrementPDFVerificationInTestCount(): void
    {
        $testName = $this->cleanTestName();

        if (!array_key_exists($testName, $this->PDFVerifiedInTestCount)) {
            $this->PDFVerifiedInTestCount[$testName] = 0;
        }

        $this->PDFVerifiedInTestCount[$testName]++;
    }

    private function currentPDFVerificationCount(string $testName): int
    {
        return $this->PDFVerifiedInTestCount[$testName];
    }

    private function copyToReceivedFile(string $pdfFilePath): void
    {
        copy($pdfFilePath, $this->receivedFileName());
    }

    private function approvedFileName(): string
    {
        return $this->fileName('approved');
    }

    private function diffedFileName(): string
    {
        return $this->fileName('diff');
    }

    private function receivedFileName(): string
    {
        return $this->fileName('received');
    }

    // Builds approval/<test name>_<verification index>.<suffix>.pdf
    private function fileName(string $suffix): string
    {
        $approvalDirectory = $this->getSnapshotDirectory();

        $cleanTestName = $this->cleanTestName();

        return $approvalDirectory.'/'.$cleanTestName.'_'
            .$this->currentPDFVerificationCount($cleanTestName)
            .'.'.$suffix.'.pdf';
    }

    // The approval directory sits next to the file of the test class using the trait.
    private function getSnapshotDirectory(): string
    {
        return dirname((new \ReflectionClass($this))->getFileName())
            .DIRECTORY_SEPARATOR
            .'approval';
    }

    private function cleanTestName(): string
    {
        // nameWithDataSet() is provided by PHPUnit's TestCase (PHPUnit 10+).
        return self::cleanFilename($this->nameWithDataSet());
    }

    private static function cleanFilename(string $raw): string
    {
        $file = preg_replace("([^\w\s\d\-_~,;\[\]\(\).])u", '', $raw);

        $file = preg_replace("([\.]{2,})", '', $file);

        return $file;
    }
}
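To check which file name a given test maps to, the two regular expressions of the trait's cleanFilename() can be run on their own. In this extracted copy (the function is lifted out of the trait for illustration), characters such as `#`, quotes and `/` are dropped, and runs of two or more dots are removed so a name can never contain `..`.

```php
<?php

// Same two regular expressions as the trait's cleanFilename(), extracted for illustration.
function cleanFilename(string $raw): string
{
    // Keep only word characters, whitespace and - _ ~ , ; [ ] ( ) . ; drop everything else.
    $file = preg_replace('([^\w\s\d\-_~,;\[\]\(\).])u', '', $raw);

    // Remove runs of two or more dots.
    return preg_replace('([\.]{2,})', '', $file);
}

// A data-provider test name: the "#" is dropped, the spaces are kept.
echo cleanFilename('generate_the_PDF_document with data set #0'), PHP_EOL;
```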

Part of the code is stolen from Spatie’s PHPUnit snapshot assertions library (spatie/phpunit-snapshot-assertions). Thanks to them!

If you look at the code, you’ll notice that the error messages are super helpful. When the PDFs don’t match, they give you the path to the .diff.pdf file, and after your first PDF generation, they give you the copy command to run once you have reviewed the file.

I hope this code will help you save some precious time you’re currently spending looking at boring generated documents!

Cleaning some legacy code is intellectually rewarding; it can even be fun when you use your automated refactoring tools, not to mention that it greatly reduces the risk of mistakes. If you need some help to get started or to deal with your legacy codebase, let's chat and see how I can help.
