
Illuminating acronyms by searching webpages with regex

Like most of us, I come across a great many acronyms as part of my work, in contexts ranging from programming to business and beyond. I regularly run into new acronyms that I don't understand at first glance. In some cases, the same acronym will have a different meaning depending on the context.

Often I end up either visually scanning the webpage I'm on for the acronym's meaning or turning to a web search. I might have to click into a webpage or two and scan each one before I find the meaning behind the cluster of letters in question.

Instead of all that visual scanning, I've started searching the text in webpages using regex[1].

Trying it out

Take "NASA", for example. Even if you don't know what it stands for, you know that you're probably looking for:

  1. A word that starts with "N"
  2. And is followed by other letters
  3. Which are followed by a space

Which gets you to a regular expression like this:

\bN\w+\s

... where:

  • \b is a word boundary
  • \w+ is one or more word characters
  • \s is a whitespace character
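Here's a quick way to see what that building block catches on its own: a minimal sketch using Python's `re` module, run against a made-up sample sentence.

```python
import re

# The building block from above: a word boundary, a capital "N",
# one or more word characters, then a single whitespace character.
pattern = re.compile(r"\bN\w+\s")

# Made-up sample sentence for illustration.
text = "Nobody at the National Aeronautics and Space Administration said no."

print(pattern.findall(text))  # ['Nobody ', 'National ']
```

It also picks up "Nobody", which is expected: on its own the pattern is a loose filter. Matching is case-sensitive, so lowercase words like "no" are skipped.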

Then you repeat that for the rest of the letters in the acronym, leaving off the trailing \s after the last one.

Now let's say the whole incantation in the holy tongue of regular expressions:

\bN\w+\sA\w+\sS\w+\sA\w+

That's quite a regular expression we've cobbled together.
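The "repeat for each letter" rule is mechanical enough to spell out in code. This is a sketch in Python; the helper name `acronym_regex` is hypothetical, and typing the pattern straight into the search tool works just as well.

```python
import re

def acronym_regex(acronym: str) -> str:
    # One "<letter>\w+" piece per letter, joined by "\s" and anchored at a
    # word boundary. re.escape makes any punctuation in the acronym literal.
    # (Hypothetical helper, for illustration only.)
    pieces = [re.escape(letter) + r"\w+" for letter in acronym]
    return r"\b" + r"\s".join(pieces)

print(acronym_regex("NASA"))  # \bN\w+\sA\w+\sS\w+\sA\w+
```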

You don't need a perfect match

That regex might seem long, but chances are you'll find your match before you get anywhere near typing the whole expression.

Which is good, because the regular expression above won't actually match what NASA stands for, which is "National Aeronautics and Space Administration". We didn't account for the "and", because we didn't know it would be there.

This is fine. In this kind of situation, we don't need a perfect match; we're just trying to find information hiding within a document using a method more reliable than scrolling-and-eyeballing. It's good to play around with the regex to see how you can zero in on the info you seek.
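As one example of playing around, this Python sketch shows the strict pattern failing on the real name, next to a looser variant that tolerates one optional extra word between the letter words. The looser pattern is an illustrative assumption, not something the search tool does for you.

```python
import re

expansion = "National Aeronautics and Space Administration"

# The strict pattern: exactly one capitalized word per letter, nothing between.
strict = r"\bN\w+\sA\w+\sS\w+\sA\w+"
print(re.search(strict, expansion))  # None, because "and" breaks the chain

# Illustrative looser variant (an assumption, not from the original post):
# "(?:\w+\s)?" allows one optional extra word, such as "and" or "of",
# between the letter words.
loose = r"\bN\w+\s(?:\w+\s)?A\w+\s(?:\w+\s)?S\w+\s(?:\w+\s)?A\w+"
print(re.search(loose, expansion).group())
```

The trade-off is that the looser pattern also matches more unrelated text, which is one reason the rough, type-as-you-go approach is often good enough.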

Take the NASA Wikipedia article as a playground. The regex above won't match NASA's full name, but that doesn't matter, because while I'm typing out the regex, the regex search tool is already highlighting matching words for me.

By the time I get to this (i.e., just adding an "A" to our first regex):

\bN\w+\sA

... the regex search tool has already narrowed things down to 36 possible matches in a 12,000+ word article:

[Screenshot: regex search matches highlighted in the NASA Wikipedia article]

That's few enough options that I wouldn't mind clicking through to see if what I'm looking for is there.
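The same narrowing effect can be reproduced outside the browser. This Python sketch counts matches as the pattern grows one piece at a time, on a short made-up passage standing in for the much longer Wikipedia article.

```python
import re

# Made-up stand-in for a long article (the real page is 12,000+ words).
text = (
    "NASA was created from the National Advisory Committee for Aeronautics. "
    "Nearly all of the National Aeronautics and Space Administration staff "
    "came from NACA."
)

# Each entry is what the search box might hold after a few more keystrokes.
prefixes = [r"\bN", r"\bN\w+", r"\bN\w+\s", r"\bN\w+\sA"]

for prefix in prefixes:
    count = len(re.findall(prefix, text))
    print(f"{prefix:<12} {count} matches")
```

On this passage the count drops from 5 to 2 by the time the "A" is added; "NACA." falls away at the \s step because it's followed by a period, not whitespace.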


Obviously, NASA and its Wikipedia page make for a contrived example: you might already know what the acronym stands for, and even if you didn't, your search engine would tell you without your having to click on anything:

[Screenshot: DuckDuckGo search results for "NASA"]

Still, searching within web pages using regex is a trick worth keeping in mind. It will come in handy at some point, and deciphering acronyms is just one place where it helps.
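If you ever want the same trick without a browser extension, for instance on text you've already copied out of a page, a rough Python sketch might look like this. The function name and its `width` parameter are hypothetical; it simply returns each match with a little surrounding context so you can eyeball each hit, much like the search tool's highlighting.

```python
import re

def matches_with_context(pattern: str, text: str, width: int = 20) -> list[str]:
    # Hypothetical helper: return each match together with up to `width`
    # characters of text on either side, so hits can be skimmed one by one.
    snippets = []
    for m in re.finditer(pattern, text):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        snippets.append(text[start:end])
    return snippets

page_text = "In 1958 the National Aeronautics and Space Administration (NASA) was formed."

for snippet in matches_with_context(r"\bN\w+\sA", page_text):
    print(snippet)
```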


  1. You'll need a regex find extension in your browser. I'm using "Chrome Regex Search".