Creating HAR files with Lighthouse

HAR (HTTP Archive) is a JSON file containing all information about a browser’s interactions with a page. This file is often used for performance analysis. Earlier this year, I shared what kind of information we can get from and today we will automate the HAR creation.

HAR Viewer

There are different ways to automate the HAR creation: puppeteer-har is a NPM package you can add in your tooling or if you are not from the JavaScript world you can use Selenium ↗︎.

I was using puppeteer-har for a few months but then I noticed that the HAR was missing a few files in specific scenarios (ex. a React app with Loadable and React Router). For this reason, I decided to look for analternative and this is how I found the chrome-har-capturer package.

This package works like a charm; it creates a HAR file following the HAR 1.2 spec and all that I need to provide is an array of raw events that comes from the Chrome Debugging Protocol ↗︎. Who provides the raw events? Lighthouse!

Let's take a look at the implementation (from my lighthouse-examples GitHub repository):

const lighthouse = require('lighthouse');
const chromeLauncher = require('chrome-launcher');
const { fromLog } = require('chrome-har-capturer');
const { writeFileSync } = require('fs');
(async () => {
  const chrome = await chromeLauncher.launch({chromeFlags: ['--headless']});
  const options = {port: chrome.port, emulatedFormFactor: 'desktop'};
  const url = ''
  const { artifacts: { devtoolsLogs: { defaultPass } } } = await lighthouse(url, options);
  const har = await fromLog(url, defaultPass);
  writeFileSync('page.har', JSON.stringify(har));
  await chrome.kill();

In my other posts, I shared how to use the lighthouse() function to get all kinds of information: from web vitals metrics to page screenshots. What I didn't mention was the function also keeps the artifacts created by the DevTools protocols. This is what we are storing in line 10 and this is the array of raw events that chrome-har-capturer needs to generate a HAR file.

In line 12, we use the fromLog function to build the HAR object, which we store in the file system in the following line. If you are curious about how the fromLog function works, I would recommend reading the package source-code, in special one of their tests.

Next, the generated HAR is stored in page.har. and we can use it in the HAR Viewer for performance analysis.

Why do we need this?

We can extract a lot of valuable information from HAR files, such as:

  • Protocols being used in the page (http 1.1, http 2, h3-29);
  • Compressed/uncompressed asset sizes;
  • Request timing information (ex.: waiting and downloading times);

With this information, we can identify bottlenecks (ex.what is the slowest request of that URL), find low-hanging fruit (ex.asset compression is one flag away in your build system tool) and prioritize tasks in order to improve performance on our pages.

Can you think of different uses? Let me know in the comments!



Like this content? Buy me a coffeeor share around:

0 Like

0 Reply & Share