HTTP Archive (Har) Analysis

4 min read · Jun 5, 2019

Part two:

HTTP Archive (Har) — Pandas Exploratory Analysis

TLDR; How to read a HAR file and make charts out of the curated data.

medium.com

In this post, I show how to query the contents of a HAR export to quickly analyze all the request/response headers of a captured session. You can use it as a base for querying every request in a session for any header or other data you need from the HTTP Archive.

Google Chrome DevTools is great for inspecting site traffic, debugging, and much more, but sometimes I find myself needing more data from it.

Chrome DevTools | Tools for Web Developers | Google Developers

Get started with Google Chrome's built-in web developer tools.

developers.google.com

Let’s use Medium.com as an example:

Medium seems to use Cloudflare as its CDN (based on the response headers), and Cloudflare uses the response header “cf-ray” to indicate the data center (DC) that the request passed through. If I wanted a quick way to see which DC served the requests, and to check whether more than one DC was used, I would have to open each request in DevTools and read its response headers… but who has time for that?

Thankfully, Google Chrome lets us export the network session to a .har file. That file contains all the request/response headers, so we can list every DC used with the tools jq and awk.

We will be doing the following steps:

Export Session to har.
Parse the har file with JQ
Extract DC name with AWK

Export Session to har.

Open DevTools, go to the Network tab, and refresh your site. The tab will populate with all the requests.

Right-click on any of them and select “Save all as HAR with content”. This exports the session data to a .har file in the directory of your choosing.

Parse the har file with JQ

jq is a JSON processor that lets us easily query for the fields we need and filter the data we get back.

First, install it if you don’t have it. On macOS, you can use Homebrew:

$ brew install jq

The following will look at all the requests with the domain “medium.com” in the downloaded har file and get the value of the response header “cf-ray”.

$ cat ~/Downloads/medium.com.har | jq '.log.entries[] | select(.request.url | contains("medium.com")) | .response.headers[] | select(.name | match("cf-ray";"i")) | .value'

Let’s take it apart:

If we look at the HAR file, it has an array “entries” under “log” that holds every request together with its response.
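To see that structure on a small scale, here is a minimal sketch that lists what a HAR file holds. The two-entry file is a hypothetical stand-in for a real export, and the variable names (har_file, entry_count, urls) are my own.

```shell
# Hypothetical two-entry HAR used as a stand-in; point har_file at your
# own export (for example ~/Downloads/medium.com.har) instead.
har_file="$(mktemp)"
cat > "$har_file" <<'EOF'
{"log": {"entries": [
  {"request": {"url": "https://medium.com/"},
   "response": {"headers": [{"name": "cf-ray", "value": "4e1e3b4b1b3cc5f0-EWR"}]}},
  {"request": {"url": "https://cdn.example.com/app.js"},
   "response": {"headers": []}}
]}}
EOF

# Number of captured requests.
entry_count="$(jq '.log.entries | length' "$har_file")"
echo "entries: $entry_count"

# URL of every captured request, one per line (-r prints raw strings).
urls="$(jq -r '.log.entries[].request.url' "$har_file")"
echo "$urls"
```

Running the two jq commands against a real export is a quick way to confirm the file is valid JSON and contains the requests you expect.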

We will start by telling JQ to get all the requests/responses within entries.

jq '.log.entries[]

But only look at the ones that have medium.com as part of the URL.

| select(.request.url | contains("medium.com"))

Once we have the requests we want to extract data from, we select their response headers.

| .response.headers[]

Then look for the cf-ray header (ignoring case).

| select(.name | match("cf-ray";"i"))

And finally, extract the values.

| .value

We have now successfully extracted the header values we want, but I want to go a bit further and extract the DC name. Each value is a hash followed by “-” and the DC name, as seen below.

Extract DC name with AWK

Pipe the output of the previous jq command into:

| awk -F'-' '{gsub(/"/, "", $2); print $2}' | sort | uniq -c | sort -n

Let’s take it apart:

With AWK, we set “-” as the delimiter with the option -F'-' to split the string, and use gsub(/"/, "", $2) to clean the output by removing the quotes from the second field, which is the one we are going to use.

Since we split the string on “-”, we have 2 fields (shown here before gsub runs, with the quotes still attached):

Hash: “4e1e3b4b1b3cc5f0
DC Name: EWR”
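The split can be tried on its own, without a HAR file. In this sketch the sample values are written the way jq prints them (JSON quotes included). The first is the value shown above, and the other two are made up so that a second DC appears in the count.

```shell
# Sample cf-ray values, quotes included. Only the first comes from the
# article; the other two are invented for illustration.
sample_values='"4e1e3b4b1b3cc5f0-EWR"
"4e1e3b4c2a7dc5f1-EWR"
"4e1e3b4d9f00a1b2-IAD"'

# Show both fields for the first value: $1 keeps its leading quote,
# while $2 loses its trailing quote once gsub has run.
fields="$(printf '%s\n' "$sample_values" | head -n 1 |
  awk -F'-' '{gsub(/"/, "", $2); print "hash=" $1 " dc=" $2}')"
echo "$fields"

# Count requests per DC, least frequent first.
dc_counts="$(printf '%s\n' "$sample_values" |
  awk -F'-' '{gsub(/"/, "", $2); print $2}' | sort | uniq -c | sort -n)"
echo "$dc_counts"
```

Because gsub only touches $2, the hash in $1 keeps its opening quote, which is harmless here since only the DC name is printed.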

Finally, we have the data we want. In my case, I also sorted and counted the instances of each DC to provide additional insight. (The example would have been better if more than one DC had been used :) but you get the point.)
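As a variation, the split and the count can also be done inside jq, which removes the need for the quote clean-up. This is an alternative sketch, not the pipeline used above. The sample HAR is hypothetical (the IAD and LAX entries are made up), and it swaps the regex for an exact, lower-cased name comparison.

```shell
# Hypothetical HAR with three medium.com requests and one third-party one.
har_file="$(mktemp)"
cat > "$har_file" <<'EOF'
{"log": {"entries": [
  {"request": {"url": "https://medium.com/"},
   "response": {"headers": [{"name": "CF-RAY", "value": "4e1e3b4b1b3cc5f0-EWR"}]}},
  {"request": {"url": "https://medium.com/_/api"},
   "response": {"headers": [{"name": "cf-ray", "value": "4e1e3b4c2a7dc5f1-EWR"}]}},
  {"request": {"url": "https://medium.com/about"},
   "response": {"headers": [{"name": "cf-ray", "value": "4e1e3b4d9f00a1b2-IAD"}]}},
  {"request": {"url": "https://cdn.example.com/app.js"},
   "response": {"headers": [{"name": "cf-ray", "value": "4e1e3b4e00112233-LAX"}]}}
]}}
EOF

# Collect the DC code of every medium.com response, then group and count.
dc_counts="$(jq -r '
  [ .log.entries[]
    | select(.request.url | contains("medium.com"))
    | .response.headers[]
    | select(.name | ascii_downcase == "cf-ray")
    | .value | split("-") | last ]
  | group_by(.)
  | map("\(length) \(.[0])")
  | .[]
' "$har_file")"
echo "$dc_counts"
```

Note that group_by orders the groups by DC name rather than by count, so add a sort step if the ordering matters to you.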