
I absolutely hate when someone starts prematurely extracting stuff into microservices. Long live the great monolith architecture and all that. But there comes a point where, because of technology limitations, you have to think outside the box. Well, this is one of those times. Although there was nothing inherently wrong with the approach we had, we hit various limitations on AWS: mostly disk throughput, but also the kerfuffle of calling JavaScript from Ruby to open Chrome, which then hit a Ruby endpoint that ran JavaScript through a Ruby wrapper again. The crazy thing is that this worked for quite a long time, until it didn’t. I’ll leave the debugging story for another day, but this was the first thing deemed worthy of extracting into its own “service”.

So how did I do it? It was really easy and straightforward, to be honest: packaging up some Node.js libraries, adding the specific fonts we use, and pushing everything to AWS using the AWS CLI on my computer. Because I want to keep things secure and don’t want to expose my endpoints to the internet, I’m calling the Lambda through the AWS Ruby SDK and using an IAM role to pass on the credentials. Again, another story of keeping your AWS locked down as much as possible and technically inaccessible to anyone but your app.

I’m using a stripped-down Chromium build (there are package size limitations on AWS Lambda, which you can work around by using S3 as an intermediary, but it’s advisable to keep the layer size as small as possible).

Assuming you have the AWS CLI set up and your credentials in place, let’s start building the Lambda function.

  1. Create the app folder (duh) cd projects && mkdir lambda-puppeteer && cd lambda-puppeteer
  2. Run npm init and fill in all the project details. Enter index.mjs as the main option
  3. Add and install the required packages: npm add @sparticuz/chromium puppeteer-core

Your package.json should look something like this now:

{
  "name": "lambda-puppeteer",
  "version": "1.0.0",
  "description": "AWS Lambda function to convert HTML to PDF using Puppeteer and Chromium",
  "license": "MIT",
  "author": "Donald Duck",
  "main": "index.mjs",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "dependencies": {
    "@sparticuz/chromium": "^138.0.1",
    "puppeteer-core": "^24.12.1"
  }
}

Okay, now that we have taken care of dependencies, let’s add the actual function code. Let’s say you want to send some already-rendered HTML to the function and get a PDF out of it. The function could be modified to accept a URL instead and do its best to render that page to PDF, but that’s something for another article (or just send me an email and I can help you set it up).
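For reference, the event payload the handler expects looks like this (the field names match what the handler code reads; the values are just a sketch, and `options` is passed straight through to Puppeteer’s page.pdf()):

```javascript
// Hypothetical invoke payload: the handler reads `event.html` and `event.options`
const event = {
  html: "<html><body><h1>Hello PDF</h1></body></html>",
  options: {
    format: "A4",            // paper size, forwarded to page.pdf()
    printBackground: true,   // include CSS backgrounds in the output
    margin: { top: "1cm", bottom: "1cm" },
  },
};

// The object must serialize cleanly, since it travels to Lambda as JSON
const payload = JSON.stringify(event);
```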

import chromium from "@sparticuz/chromium";
import puppeteer from "puppeteer-core";
import fs from "fs";

export const handler = async (event) => {
  let result = null;
  let browser = null;

  try {
    // Launch the browser with extra options
    browser = await puppeteer.launch({
      args: [
        ...chromium.args,
        "--disable-gpu",
        "--font-render-hinting=none",
        "--allow-file-access-from-files",
      ],
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath(),
      headless: chromium.headless,
      acceptInsecureCerts: true, // replaces the removed ignoreHTTPSErrors launch option
      devtools: false,
    });

    const page = await browser.newPage();
    await page.setContent(event.html, { waitUntil: "networkidle0" });

    // Save the PDF to a temporary file, merging in any options the caller sent
    const pdfPath = "/tmp/output.pdf";
    const pdfOptions = { ...(event.options || {}), path: pdfPath };
    await page.pdf(pdfOptions);

    // Read the PDF file as a buffer
    const pdfBuffer = fs.readFileSync(pdfPath);
    console.log("PDF generated successfully:", pdfBuffer.slice(0, 100)); // Log the first 100 bytes for debugging

    result = pdfBuffer; // Return the raw PDF buffer

  } catch (error) {
    console.error("Error generating PDF:", error);
    return { statusCode: 500, body: error.toString() };
  } finally {
    // Close the browser whatever the outcome
    if (browser !== null) {
      await browser.close();
    }
  }

  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/pdf",
    },
    body: result.toString("base64"), // Return base64-encoded PDF
    isBase64Encoded: true,
  };
};
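On the caller’s side, the response body needs to be base64-decoded back into binary. Here’s a quick sketch of that step, using a mocked-up response object (the shape matches what the handler above returns, but the body here is sample bytes, not a real PDF):

```javascript
// Mock of the handler's success response; a real invoke returns this shape
const response = {
  statusCode: 200,
  isBase64Encoded: true,
  body: Buffer.from("%PDF-1.7 sample bytes").toString("base64"),
};

// Decode the base64 body back into a binary buffer
const pdfBuffer = Buffer.from(response.body, "base64");

// Every valid PDF starts with the "%PDF-" magic bytes, which makes for a
// cheap sanity check before writing the buffer to disk or streaming it out
const looksLikePdf = pdfBuffer.subarray(0, 5).toString("ascii") === "%PDF-";
```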

Now that all the code is in place, let’s deploy it and make sure we can call it from our application. First you have to understand how all this is structured. AWS Lambda has something called “layers”, and you can attach multiple layers to the same function; anything your function needs loaded (layers are extracted under /opt) can live there. Let’s say you need additional fonts to render the app: put them in a folder named fonts, zip it up, then upload and attach it as a new layer to the function that needs them. A note about fonts: the AWS Lambda runtime has basic Linux fonts installed, but they won’t work for non-Latin languages, and emoji support is minimal. Check out Google’s Noto font family and download what you need.

But be careful: Lambda has a hard limit of 250MB on the unzipped deployment package, so all uncompressed layers plus your function code can’t exceed 250MB combined. More on that topic in another post.

You can even have multiple node_modules layers attached to the same function. To keep things as simple as possible and avoid version mismatches, I opted to use S3 to get around the 50MB limit on directly uploaded zips. So first we are going to create the chrome + puppeteer layer.

Since I’m a big fan of automating all the grunt work using whatever is most convenient (and bash is pretty convenient on unix systems), I made a script to package everything up and deploy a new lambda “layer” to AWS Lambda.

#!/bin/bash

# Install npm dependencies from the package
npm install

# Create a nodejs directory and move node_modules into it
rm -rf nodejs && mkdir -p nodejs && mv node_modules nodejs/

# Zip the nodejs folder
zip -r puppeteer-chrome.zip nodejs

# Upload the zip file to S3 (intermediary step because the archive is too big for direct upload)
aws s3 cp puppeteer-chrome.zip s3://YOUR_S3_BUCKET_NAME/puppeteerLayers/puppeteer-chrome.zip

# Fetch chromium and puppeteer versions for our description
chromium_version=$(jq -r '.dependencies."@sparticuz/chromium"' package.json)
puppeteer_version=$(jq -r '.dependencies."puppeteer-core"' package.json)

description="Puppeteer-core version: ${puppeteer_version}, Chromium version: ${chromium_version}"

# Publish a new layer version from the S3 object
bucketName="YOUR_S3_BUCKET_NAME"
aws lambda publish-layer-version --layer-name puppeteer-chromium-layer --description "$description" --content "S3Bucket=${bucketName},S3Key=puppeteerLayers/puppeteer-chrome.zip" --compatible-runtimes nodejs24.x --compatible-architectures x86_64

Now that we have a puppeteer + chrome layer uploaded (hopefully you got a successful response), we can create our Lambda function using the index.mjs code like this (be sure to have a Lambda execution role created in the AWS Console):

  1. zip function.zip index.mjs
  2. aws lambda create-function --function-name lambda-puppeteer --zip-file fileb://function.zip --handler index.handler --runtime nodejs24.x --role arn:aws:iam::AWS_ACCOUNT_ID:role/YOUR_LAMBDA_EXECUTION_ROLE

Now we only need to attach the correct layer (we are assuming there is only one puppeteer/chrome layer version at this time) to the function:

aws lambda update-function-configuration --function-name lambda-puppeteer --layers arn:aws:lambda:AWS_REGION:AWS_ACCOUNT_ID:layer:puppeteer-chromium-layer:1

Our function is ready to run and you can test it from your application using the AWS SDK of your choice. Since I’m using Ruby on Rails, I’m gonna paste a quick example using the Ruby SDK.

class LambdaPdf
  attr_reader :html, :options

  def initialize(html, options = {})
    @html = html
    @options = options
  end

  def to_pdf
    response = lambda_client.invoke(
      function_name: 'lambda-puppeteer',
      invocation_type: 'RequestResponse', # Synchronous invocation
      log_type: 'Tail', # optional: to include logs in the response
      payload: { html:, options: }.to_json
    )

    # Parse the response
    response_payload = JSON.parse(response.payload.read)

    # Check if the response is base64 encoded
    if response_payload['isBase64Encoded']
      begin
        pdf_content = Base64.strict_decode64(response_payload['body'])
      rescue ArgumentError => e
        raise "ERROR: Decoding base64 PDF content not successful: #{e.message}"
      end
    else
      raise 'ERROR: Response is not base64 encoded'
    end

    pdf_content
  end

  private

  def lambda_client
    @lambda_client ||= Aws::Lambda::Client.new
  end
end
