AWS Lambda Function – Watermarking a PDF via S3 Trigger in Python

In the previous chapter I covered watermarking a PDF by sending it to a Lambda function in a POST request.

In this lesson we’ll automatically trigger a Lambda function whenever a PDF is uploaded to an S3 bucket, watermark it, and then upload the result to a different bucket.

Previous chapter:

[Part 1] AWS Lambda Function – Watermark a PDF

Overview

  1. Set up a Role with permission to read from and write to S3 buckets and to write to CloudWatch Logs, then have the Lambda function use this Role. Without it, the Lambda function can’t access the S3 buckets through the boto3 library or write to CloudWatch Logs.
  2. Create a Lambda function that runs on Python 3.11.
  3. Finally, set a Trigger that invokes our Lambda function whenever something is uploaded to the S3 bucket.

Create Role & Permissions

To put this simply we’ll create a Role that can access our S3 bucket.

Go to: IAM -> Roles -> Create Role.

“Trusted Entity Type” -> “AWS Service”

From “Common use Cases” select Lambda & go to Next page!
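
Choosing “AWS Service” and Lambda here is what gives the Role its trust policy, the part that says who is allowed to use the Role. As a rough sketch, the document the console generates for this choice looks like the one below (written as a Python dict purely for illustration; you don’t need to create it by hand).

```python
import json

# Trust policy: allows the Lambda service to assume this Role
# on behalf of our function.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```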

Now let’s attach the permissions to the Role.

On the “Add Permissions” page, search for “AmazonS3FullAccess” and check it. Then search for “CloudWatchFullAccess” and check that too.

These are the permissions our Lambda function needs to do what it needs to do!

(You can always create your own policy under Policies and attach it to the Role instead. But for the sake of this tutorial let’s just select the default “AmazonS3FullAccess” and “CloudWatchFullAccess”.)
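
If you do want a tighter policy of your own, here is a minimal sketch of what it might contain, assuming the two bucket names used later in this tutorial. It is not what the tutorial attaches; treat the statements as a starting point and check them against your own setup before relying on them.

```python
import json

# Bucket names used in this tutorial.
SOURCE_BUCKET = "supremecodr-pdfs"
OUTPUT_BUCKET = "supremecodr-watermarked-pdfs"

least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read the uploaded PDFs and watermark.pdf
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{SOURCE_BUCKET}/*",
        },
        {   # Write the watermarked PDFs
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{OUTPUT_BUCKET}/*",
        },
        {   # Let the function write its logs to CloudWatch Logs
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}

print(json.dumps(least_privilege_policy, indent=2))
```

You would paste the printed JSON into the policy editor under IAM -> Policies and attach the resulting policy to the Role.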

Next page!

Finally, on the “Name, review, and create” page, name the Role “s3lambda” and hit Create Role!

The Role and its permissions are set! Once our Lambda function uses this Role, it will be able to access the S3 buckets!

By the way, you can always check what your role is allowed to do on the Policy Simulator page.

Create Lambda Function

We’ll now create a Lambda function called “watermarkPdf” and tie the “s3lambda” Role to it!

If you’re curious about the Lambda function, it’s here:

import io
import logging
from urllib.parse import unquote_plus

import boto3
from pypdf import PdfReader, PdfWriter

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    s3Client = boto3.client('s3')

    # Name of the PDF the user uploaded and that should be watermarked.
    # S3 URL-encodes keys in event notifications (a space arrives as '+'), so decode it first.
    pdfToWatermark = unquote_plus(event['Records'][0]['s3']['object']['key'])

    if pdfToWatermark == 'watermark.pdf':  # Ignore the watermark image itself
        logger.info("Ignoring the watermark image pdf: " + pdfToWatermark)
        return

    # Watermark image pdf
    watermarkImage = s3Client.get_object(Bucket='supremecodr-pdfs', Key='watermark.pdf')
    watermarkPdfImage = watermarkImage['Body'].read()

    origPdfObj = s3Client.get_object(Bucket='supremecodr-pdfs', Key=pdfToWatermark)
    origPdfPages = PdfReader(io.BytesIO(origPdfObj['Body'].read())).pages
    newPdfWriter = PdfWriter()

    for page in origPdfPages:
        # Load a fresh copy of the watermark page each time, because merge_page() modifies it in place
        basePage = PdfReader(io.BytesIO(watermarkPdfImage)).pages[0]
        basePage.merge_page(page)  # Draw the original page's content on top of the watermark
        newPdfWriter.add_page(basePage)

    with io.BytesIO() as output:
        newPdfWriter.write(output)
        output.seek(0)
        s3Client.upload_fileobj(output, "supremecodr-watermarked-pdfs", pdfToWatermark)
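
To make the `event['Records'][0]['s3']['object']['key']` lookup less mysterious, here is a small sketch of the part of an S3 event the handler cares about. The sample event is trimmed down and its values are made up for illustration (real events carry many more fields), and `uploaded_objects` is a hypothetical helper, not part of the function above. One detail worth knowing is that S3 URL-encodes object keys in event notifications, so a file named “my report.pdf” arrives as “my+report.pdf”.

```python
from urllib.parse import unquote_plus

# Trimmed-down S3 "object created" event with made-up values.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "supremecodr-pdfs"},
                "object": {"key": "my+report.pdf", "size": 1024},
            },
        }
    ]
}


def uploaded_objects(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event, so decode them before use.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


print(list(uploaded_objects(sample_event)))
# → [('supremecodr-pdfs', 'my report.pdf')]
```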

Go to Lambda -> Create a Function, and give it a name “watermarkPdf”.

Select runtime Python 3.11

Under “Permissions” check “Use an existing role” and select the role we just created, “s3lambda”. Click Create Function.

Dependency!

Our code has one dependency that the Python 3.11 runtime doesn’t provide by default: pypdf.

You can download the dependencies and the Lambda code here.

(You can always install your dependencies in the same folder as the code, archive it and upload it.)
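
If you go the do-it-yourself route, the usual approach is to run `pip install --target package pypdf`, copy the handler file into the same folder, and zip the folder’s contents. Below is a minimal sketch of the zipping step. The folder name `package`, the handler file name `lambda_function.py` and the helper `build_deployment_zip` are assumptions for illustration, not something the download above prescribes.

```python
import zipfile
from pathlib import Path


def build_deployment_zip(source_dir, zip_path):
    """Zip the contents of source_dir so files sit at the archive root.

    Lambda expects the handler file and the dependency folders at the
    top level of the archive, not nested inside a parent folder.
    Write the zip outside source_dir so it doesn't include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir, e.g. "pypdf/__init__.py"
                archive.write(path, path.relative_to(source).as_posix())
    return zip_path


# Example (assumes ./package holds lambda_function.py and the pypdf folder):
# build_deployment_zip("package", "watermarkPdf.zip")
```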

Upload the archive to Lambda.

The uploaded code and dependencies should look like this

The Lambda function has now been uploaded and its permissions are set! Let’s now add a Trigger that invokes the function whenever something is uploaded to an S3 bucket.

S3 Buckets

Go ahead and create two separate S3 buckets: “supremecodr-pdfs” and “supremecodr-watermarked-pdfs”. Leave all settings on default.

Trigger Lambda Function

We’ll be uploading PDFs to “supremecodr-pdfs” so this is the bucket we want to trigger the Lambda function. The Trigger will invoke the function whenever something is uploaded to the bucket!

This means that if the function watermarked a PDF and re-uploaded it to the same bucket, the Trigger would invoke the function again, and this would go on forever!

For this reason we’ll keep the final PDF in the other bucket called “supremecodr-watermarked-pdfs”, which has no triggers attached!
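
Using a separate output bucket is the main protection against that loop. As an extra safety net you could also check the event before doing any work. Here is a small sketch of such a guard; `should_watermark` is a hypothetical helper and isn’t part of the function shown earlier.

```python
SOURCE_BUCKET = "supremecodr-pdfs"
OUTPUT_BUCKET = "supremecodr-watermarked-pdfs"
WATERMARK_KEY = "watermark.pdf"


def should_watermark(bucket, key):
    """Return True only for PDFs uploaded to the source bucket."""
    if bucket != SOURCE_BUCKET:
        # Never react to our own output (or any other bucket).
        return False
    if key == WATERMARK_KEY:
        # The watermark image lives in the source bucket too; skip it.
        return False
    return key.lower().endswith(".pdf")


print(should_watermark(SOURCE_BUCKET, "invoice.pdf"))    # → True
print(should_watermark(OUTPUT_BUCKET, "invoice.pdf"))    # → False
print(should_watermark(SOURCE_BUCKET, "watermark.pdf"))  # → False
```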

Let’s now set a Trigger on “supremecodr-pdfs”.

On the Lambda function page, expand “Function Overview” if it’s hidden and click “Add Trigger”.

On the next page, select S3 as the source.

Then select the bucket to listen to, that is, the bucket you drop a PDF into to trigger the Lambda function. It’s “supremecodr-pdfs”, so select that.

Select “All object create events” event in “Event Types” and Save it!
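
For the curious, the console steps above roughly correspond to a bucket notification configuration like the one sketched below. The function ARN is a made-up placeholder (the region and account ID are assumptions), and the `.pdf` suffix filter is an optional extra that the console steps above don’t set. The boto3 call is left commented out because it needs real AWS credentials, and outside the console you would also have to grant S3 permission to invoke the function yourself.

```python
# Hypothetical ARN: replace the region and account ID with your own.
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:watermarkPdf"

notification_configuration = {
    "LambdaFunctionConfigurations": [
        {
            "LambdaFunctionArn": FUNCTION_ARN,
            # "All object create events" in the console
            "Events": ["s3:ObjectCreated:*"],
            # Optional: only fire for keys ending in .pdf
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".pdf"}]}
            },
        }
    ]
}

# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="supremecodr-pdfs",
#     NotificationConfiguration=notification_configuration,
# )
```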

Trigger is now set!

Upload PDFs

There’s one last thing left: we need to upload the watermark image “watermark.pdf” to “supremecodr-pdfs” so our Lambda function can use it to watermark other PDFs! If you look at the Python code above you’ll see where it’s read.

Upload “watermark.pdf” to “supremecodr-pdfs”. This will trigger the Lambda function too, but you’ll see in the code above that I’ve added a condition so it doesn’t try to watermark that PDF!

Testing

Lastly, upload an example PDF to “supremecodr-pdfs”. The function should create a new watermarked PDF and automatically upload it to our “supremecodr-watermarked-pdfs” bucket under the same name!

Final result:

That’s all for this tutorial!

Processing a 10 MB, 200+ page PDF

I set the following configs: timeout 10 minutes, memory 1 GB and ephemeral storage (disk) 512 MB.
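
These values are set under Configuration -> General configuration on the function page. If you prefer to script it, the same settings could be expressed as arguments to boto3’s `update_function_configuration`, as in this sketch. The call itself is commented out because it needs AWS credentials, and the function name is assumed to be the “watermarkPdf” one created earlier.

```python
function_settings = {
    "FunctionName": "watermarkPdf",
    "Timeout": 10 * 60,                 # seconds (10 minutes)
    "MemorySize": 1024,                 # MB (1 GB)
    "EphemeralStorage": {"Size": 512},  # MB of /tmp disk space
}

# import boto3
# boto3.client("lambda").update_function_configuration(**function_settings)

print(function_settings["Timeout"])  # → 600
```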

Result:
