If you can write JavaScript, it's more useful to look at web pages as suggestions than as hard, read-only resources. At the end of the day, a page loaded in your browser is malleable: you can change what you see and take what you need from it.

A small example

The other day, I needed to tally question views on a Stack Overflow page. The numbers are in plain sight, and I could have grabbed them manually, but doing so would be tedious and error-prone.

JavaScript can knock this work out easily and with precision.

All you need to start is a way to uniquely identify the HTML elements that contain the info you need. In this situation, I usually find it helpful to use the “Inspect Element” tool in a browser’s developer tools to discover classes or IDs on the elements I want to parse.

In this case, I found that every element holding a view count has the class views.

So step one is to grab all of those elements:

$ document.querySelectorAll(".views");

// NodeList [<div class="views ">, <div class="views ">, <div class="views ">, <div class="views ">, <div class="views ">, …] (15)

I know that I’ll eventually want to collapse all of the captured view counts into a single amount, a total. In JavaScript, when you need to tally values from a list into a single value, Array#reduce is your friend.

But #querySelectorAll returns a NodeList, which doesn’t have the #reduce method on its prototype.

Thanks to ES2015, it’s easy to convert a NodeList into an Array:

$ Array.from(document.querySelectorAll(".views"));

// [<div class="views ">, <div class="views ">, <div class="views ">, <div class="views ">, <div class="views ">, …] (15)

Now you just use #reduce as normal:

Array.from(document.querySelectorAll(".views"))
  .reduce((total, el) => {
    return total + Number(el.innerHTML.replace(/\sviews/, ""));
  }, 0);

Note that Number(el.innerHTML.replace(/\sviews/, "")) strips the " views" substring off the end of each element's string (e.g., "35 views" becomes "35"), and then converts the leftover string to a number (e.g., "35" becomes 35).
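
To see that parsing step in isolation, here is a minimal sketch that pulls it into a helper. The name parseViews is my own and not part of any API, and the sketch assumes the text looks like "35 views":

```javascript
// Hypothetical helper: turn text like "35 views" into the number 35.
function parseViews(text) {
  // Drop the whitespace and the word "views", then trim any leftover padding.
  const digits = text.replace(/\s*views/, "").trim();
  return Number(digits);
}

console.log(parseViews("35 views"));   // 35
console.log(parseViews(" 7 views\n")); // 7
console.log(parseViews("4.1k views")); // NaN, because "4.1k" is not a numeric string
```

Anything that isn't a plain number after the word is stripped comes back as NaN, which is a handy signal that the page contains text you didn't plan for.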

Time investment

This might seem like overkill, and maybe it is in some situations. But my total time to do this was maybe 3-4 minutes.

That’s perhaps slightly longer than it would have taken me to tally the count manually on a single page, but with added benefits:

  1. It’s more accurate, as long as you’re using good judgement to assess the results (i.e., did it do what you think it did?)
  2. Once it works, you have a makeshift tool to repeat the same operation on other pages

In other words, you get some peace of mind, you can move orders of magnitude faster after the up-front scripting, and you avoid the tedium of doing things manually.
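
One way to make the snippet reusable is to wrap it in a small function. This is only a sketch: the name tally and its parameters are hypothetical, and it reads textContent instead of innerHTML on the assumption that the elements contain plain text.

```javascript
// Hypothetical reusable version of the tallying snippet.
// root:     anything with a querySelectorAll method (normally `document`)
// selector: CSS selector for the elements that hold the numbers
// parse:    function that turns one element's text into a number
function tally(root, selector, parse) {
  return Array.from(root.querySelectorAll(selector))
    .reduce((total, el) => total + parse(el.textContent), 0);
}

// In the browser console, on a page like the one in this post:
// tally(document, ".views", (text) => Number(text.replace(/\sviews/, "")));
```

Passing the selector and the parse function as arguments means the same few lines can be pasted into the console on a different page with only the call changed.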

Limitations

Scraping a web page, which is essentially what this is, always comes with some tradeoffs.

One limitation in this particular case is that Stack Overflow abbreviates large numbers. For example, "4,100" is displayed as "4.1k", which Number converts to NaN, so a single abbreviated count turns the whole total into NaN.

This wouldn't be too hard to overcome; it just depends on how deep you want to go for a one-off tool like this.
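
As a rough sketch of what overcoming it could look like, the helper below expands abbreviated counts before they are summed. The name expandCount and the suffix table are assumptions on my part, so check them against what the page really displays. The result is also only approximate, since "4.1k" has already lost the exact figure.

```javascript
// Hypothetical helper: expand abbreviated counts such as "4.1k" into 4100.
// The suffix table is an assumption; adjust it to what the page shows.
const MULTIPLIERS = { k: 1e3, m: 1e6 };

function expandCount(text) {
  // Capture the leading number and an optional standalone k/m suffix.
  const match = /^\s*([\d.,]+)\s*([km])?\b/i.exec(text);
  if (!match) return NaN;
  const value = Number(match[1].replace(/,/g, "")); // "4,100" -> 4100
  const multiplier = match[2] ? MULTIPLIERS[match[2].toLowerCase()] : 1;
  // Round to avoid floating-point noise from the multiplication.
  return Math.round(value * multiplier);
}

console.log(expandCount("4.1k views"));  // 4100
console.log(expandCount("35 views"));    // 35
console.log(expandCount("4,100 views")); // 4100
```

Swapping this in for the Number(...) call in the reduce callback would keep the total numeric on pages that mix plain and abbreviated counts.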


You can bend the web to suit your needs. If you can write JavaScript, you’ve already got powerful tools at your disposal to get what you need, even for small tasks.