An Introduction to Regular Expressions

Teamer Tibebu
Nov 9, 2020

According to Wikipedia, a regular expression (henceforth referred to as RegExp) “is a sequence of characters that define a search pattern. Usually such patterns are used by string-searching algorithms for ‘find’ or ‘find and replace’ operations on strings, or for input validation.” For most programmers, though, RegExp look like a lot of gobbledygook, which is defined as “a language that is meaningless or is made unintelligible by excessive use of abstruse technical terms; nonsense.” But fear not: in this article we will take a somewhat shallow, but informative, dive into the basics of RegExp that will help turn that “gobbledygook” into a helpful tool on your programmer’s tool belt.

RegExp form a small, separate language that is built into JavaScript (JS) and many other languages and systems. A RegExp is a string of text that describes a pattern, which you can use to match, locate, and manage text. Regular expressions can also be used outside of code, for example with command-line tools or in the find-and-replace feature of a text editor like VSCode.

In JavaScript, regular expressions are objects of type RegExp. This means they have an object prototype with associated methods and properties. It also means you may encounter them written either with the constructor or as a literal:

  1. new RegExp('abc')
  2. /abc/
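A quick sketch (plain JavaScript, runnable in Node.js or a browser console) showing that both forms produce RegExp objects with the same pattern and the same prototype methods. The variable names, including the "runtime" prefix, are made up for illustration.

```javascript
// Two ways to create the same pattern
const fromConstructor = new RegExp("abc");
const fromLiteral = /abc/;

// Both are RegExp objects holding the same pattern text
console.log(fromConstructor instanceof RegExp); // true
console.log(fromLiteral instanceof RegExp); // true
console.log(fromConstructor.source === fromLiteral.source); // true

// Both inherit methods from RegExp.prototype, such as test()
console.log(fromLiteral.test("xxabcxx")); // true
console.log(fromConstructor.test("ab c")); // false

// The constructor is handy when the pattern is only known at runtime
const prefix = "Medi"; // hypothetical runtime value
const dynamic = new RegExp(prefix + "[a-zA-Z]*");
console.log(dynamic.source); // Medi[a-zA-Z]*
```

One thing to watch with the constructor: the pattern is an ordinary string, so backslashes must be doubled (new RegExp("\\d") is the same as /\d/).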

Some common use cases are validating user input, renaming files, and finding links/URLs on a page or in an email. In JS, common string methods that make use of RegExp are match, replace, and split. For example, str.match(RegExp) returns an array of matches, or null if none are found (with the g flag covered below, the array holds every match; without it, the array describes only the first match). str.replace(RegExp, replacement) returns a new string in which whatever the RegExp matched has been replaced by the second argument. str.split(RegExp) breaks a string into an array of pieces wherever the RegExp matches.
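Here is a small sketch of those three string methods in action. The sample string is invented for illustration, and \s (any whitespace character) is a shortcut not otherwise covered in this article.

```javascript
const text = "kiwi, strawberry; banana,mango";

// split(): break the string wherever a comma or semicolon (plus any whitespace) appears
const fruits = text.split(/[,;]\s*/);
console.log(fruits); // [ 'kiwi', 'strawberry', 'banana', 'mango' ]

// match() with the g flag: an array of every match
const allAn = text.match(/an/g);
console.log(allAn); // [ 'an', 'an', 'an' ]

// match() returns null when nothing is found
const none = text.match(/cherry/g);
console.log(none); // null

// replace(): returns a NEW string; the original is left unchanged
const replaced = text.replace(/mango/, "papaya");
console.log(replaced); // kiwi, strawberry; banana,papaya
console.log(text); // kiwi, strawberry; banana,mango
```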

RegExp Cheat Sheet

Here’s an indispensable tool that will help us get a foundational understanding of RegExp. Use this as a reference as we take our journey:

What Do Regular Expressions Look Like?

Before we dive into the details, let’s take a quick look at an example: /Medi[a-zA-Z]*/

This small snippet of text describes strings that contain the substring “Medi” followed by zero or more letters. This RegExp would match “Medium”, “Media”, “Medical”, and “Medi”. See the pattern? Note that RegExp are case sensitive, so “medium” would not match, but there is a way around that which makes your matches a bit more flexible. We’ll go over that in a bit.
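To see this for yourself, the sketch below runs the example pattern against the words above. The last lines use a made-up string to show that, because the pattern is not anchored, it also finds “Medi...” inside a longer string and stops at the first non-letter.

```javascript
const medi = /Medi[a-zA-Z]*/;

for (const word of ["Medium", "Media", "Medical", "Medi", "medium"]) {
  console.log(word, medi.test(word));
}
// Medium true
// Media true
// Medical true
// Medi true
// medium false

// Not anchored: the match can start in the middle of the string.
// [a-zA-Z]* stops at "-" because a hyphen is not a letter.
const found = "See Medical-Center".match(medi);
console.log(found[0]); // Medical
console.log(found.index); // 4
```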

Options/Flags

By default, RegExp patterns compare literal characters case-sensitively, and white space in a pattern is interpreted as literal white-space characters. As I mentioned before, there are options (also called flags) that modify these and many other aspects of the default behavior of RegExp. In JavaScript, flags are written after the closing slash of a literal, as in /abc/i, or passed as the second argument to the constructor, as in new RegExp('abc', 'i').

There are more, but in this article we will look at two of the most commonly used flags:

  1. i: this flag makes your RegExp case insensitive. Going back to our previous example, /Medi[a-zA-Z]*/i would match all the strings we pointed out before (“Medium”, “Media”, “Medical”, and “Medi”) and also their lowercase versions (“medium”, “media”, “medical”, and “medi”).
  2. g: this flag matches ALL occurrences of the RegExp pattern. This is important to note: without the g flag, only the first occurrence of the pattern is matched.

For example, given the string “I am writing this article on Medium, which is not a medical device but rather a medium for media.” (disregard the nonsensical nature of the sentence, haha), the pattern /Medi[a-zA-Z]*/ig would match “Medium”, “medical”, “medium”, and “media”.
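A minimal sketch comparing the flag combinations on that same sentence, including how flags are passed to the constructor. Variable names are illustrative only.

```javascript
const sentence =
  "I am writing this article on Medium, which is not a medical device but rather a medium for media.";

// i only: case-insensitive, but stops at the first occurrence
const firstOnly = sentence.match(/Medi[a-zA-Z]*/i);
console.log(firstOnly[0]); // Medium

// i and g together: every occurrence, ignoring case
const all = sentence.match(/Medi[a-zA-Z]*/ig);
console.log(all); // [ 'Medium', 'medical', 'medium', 'media' ]

// g only: still case-sensitive, so the lowercase words are skipped
const caseSensitive = sentence.match(/Medi[a-zA-Z]*/g);
console.log(caseSensitive); // [ 'Medium' ]

// With the constructor, flags go in the second argument
const viaConstructor = new RegExp("Medi[a-zA-Z]*", "ig");
console.log(sentence.match(viaConstructor).length); // 4
```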

Brackets

Square brackets in a RegExp pattern define a set of characters. The set matches exactly one character, and any single character listed between the brackets will match.
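As a small illustration (the gray/grey example is our own, not from the original figures), a set in the middle of a pattern accepts any one of its characters in that position, and only one.

```javascript
// [ae] is a set that matches exactly ONE character: "a" or "e"
const grayOrGrey = /gr[ae]y/;

console.log(grayOrGrey.test("gray")); // true
console.log(grayOrGrey.test("grey")); // true
console.log(grayOrGrey.test("groy")); // false

// The set stands for a single character, so two letters in that spot fail
console.log(grayOrGrey.test("graey")); // false
```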

Carets

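When a caret (^) is the first character inside the square brackets, it negates the set, so it matches any single character that is not listed. (Outside of square brackets, ^ means something different: it anchors the match to the start of the string.)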

Example Use of Caret (^)
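A short sketch of a negated set, using a vowel example chosen for illustration. Note that anything not listed counts, including spaces and punctuation.

```javascript
// [^aeiou] matches one character that is NOT a lowercase vowel
const notVowel = /[^aeiou]/g;

console.log("regex".match(notVowel)); // [ 'r', 'g', 'x' ]
console.log("aeiou".match(notVowel)); // null

// Spaces and punctuation are "not a vowel" too
console.log("a e!".match(notVowel)); // [ ' ', '!' ]
```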

Hyphens

Placing a hyphen (-) between two characters inside square brackets creates a range of characters to be matched, e.g. [a-z] or [0-9]. Ranges follow character-code order, so the first character must not come after the second: [a-9] throws a SyntaxError because “a” comes after “9”. It is safest to connect letters with letters and numbers with numbers; a mixed range such as [A-z] is allowed but also includes the punctuation characters that sit between “Z” and “a”.

Example Use of Hyphen (-)
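The sketch below shows ranges on a couple of invented strings, and how a backwards range fails. The constructor is used for the bad range so the error can be caught at runtime; writing /[a-9]/ as a literal would stop the whole script from parsing.

```javascript
// [0-9] is a range: any single digit
console.log("Room 42B".match(/[0-9]/g)); // [ '4', '2' ]

// Several ranges can share one set: a lowercase letter from a to f, or a digit
console.log("hex: 1f".match(/[a-f0-9]/g)); // [ 'e', '1', 'f' ]

// A backwards range ("a" comes after "9") is rejected with a SyntaxError
let rangeError = null;
try {
  new RegExp("[a-9]");
} catch (err) {
  rangeError = err;
}
console.log(rangeError instanceof SyntaxError); // true
```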

Shortcuts for Common Ranges

There are some helpful shortcuts for creating some of the most common ranges:

  • \d matches any digit, equivalent to [0-9]
  • \D matches anything except digits, equivalent to [^0-9]
  • \w matches word characters (in JS, any ASCII letter, digit, or underscore), equivalent to [a-zA-Z0-9_]
  • \W matches anything except word characters, equivalent to [^a-zA-Z0-9_]
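To make the shortcuts concrete, here is a sketch on a made-up order string. It also shows that \w really is limited to ASCII letters in JS, so an accented letter like “é” is not a word character.

```javascript
const sample = "Order #A17_b, qty 3!";

// \d and [0-9] find the same characters
console.log(sample.match(/\d/g)); // [ '1', '7', '3' ]
console.log(sample.match(/[0-9]/g)); // [ '1', '7', '3' ]

// \D matches every non-digit, so removing those leaves only the digits
console.log(sample.replace(/\D/g, "")); // 173

// \w covers ASCII letters, digits, and underscore only
console.log(/\w/.test("_")); // true
console.log(/\w/.test("é")); // false

// \W is the opposite of \w: removing it keeps only word characters
console.log(sample.replace(/\W/g, "")); // OrderA17_bqty3
```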

Curly Braces { }

Curly braces, placed right after a character, set, or other expression, specify exactly how many times it must occur. For example, \d{2} matches exactly two digits in a row.

To make this more flexible, use a comma to specify a range of counts rather than one exact number: {2,} means two or more times, and {2,4} means two to four times.
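A minimal sketch of the three brace forms on invented strings. Note that a range like {2,4} grabs as many repetitions as it is allowed to before moving on.

```javascript
// {2}: exactly two of the preceding item
console.log("Year 2020".match(/\d{2}/g)); // [ '20', '20' ]

const runs = "a aa aaa aaaaa";

// {2,}: two or more
console.log(runs.match(/a{2,}/g)); // [ 'aa', 'aaa', 'aaaaa' ]

// {2,4}: two to four; "aaaaa" yields "aaaa", and the leftover
// single "a" is too short to match on its own
console.log(runs.match(/a{2,4}/g)); // [ 'aa', 'aaa', 'aaaa' ]
```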

Pipes |

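Similar to the logical OR operator (||) in JavaScript, a pipe represents “or” in a RegExp. So 'kiwi, strawberry, banana, mango'.match(/strawberry|mango/) would match “strawberry”. Only that first match is returned because there is no g flag; with /strawberry|mango/g, both “strawberry” and “mango” would be matched.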
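The same example as a runnable sketch, with and without the g flag:

```javascript
const fruitList = "kiwi, strawberry, banana, mango";

// Without g: only the first alternative found is returned
console.log(fruitList.match(/strawberry|mango/)[0]); // strawberry

// With g: every occurrence of either alternative
console.log(fruitList.match(/strawberry|mango/g)); // [ 'strawberry', 'mango' ]
```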

Question Mark ?

This metacharacter makes the preceding character or set optional: it may occur either 0 or 1 times. This is equivalent to {0,1}.
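A classic illustration (our own choice of example) is matching both spellings of “color”:

```javascript
// u? makes the "u" optional: it may appear 0 or 1 times
const color = /colou?r/;

console.log(color.test("color")); // true
console.log(color.test("colour")); // true
console.log(color.test("colouur")); // false, two u's is more than 1
```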

Asterisk *

Similar to the question mark, the asterisk indicates that the preceding character or set can occur 0 or more times. This is equivalent to {0,}.
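A small sketch with invented words, showing that the asterisk accepts any number of repetitions, including none, and that it behaves like {0,}:

```javascript
// o* allows zero or more "o" characters between "g" and "gle"
const googleLike = /go*gle/;

for (const word of ["ggle", "gogle", "google", "gooooogle"]) {
  console.log(word, googleLike.test(word)); // true for all four
}

// o* is the same as o{0,}
console.log(/go{0,}gle/.test("ggle")); // true
```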

Plus Sign +

Similar to the two above, the plus sign indicates that the preceding character or set can occur 1 or more times. This is equivalent to {1,}.
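To see the difference between * and + side by side, here is a sketch that runs both against the same invented words. The only word that separates them is the one with no “o” at all.

```javascript
const words = ["ggle", "gogle", "google"];

// o* accepts zero o's; o+ requires at least one
const starResults = words.map((w) => /go*gle/.test(w));
const plusResults = words.map((w) => /go+gle/.test(w));

console.log(starResults); // [ true, true, true ]
console.log(plusResults); // [ false, true, true ]
```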

Conclusion

Although this is only an introductory dive into regular expressions, I hope this article has helped you understand RegExp a little better. There is a lot more to learn, but this should serve as a starting point. Be sure to use the cheat sheet given at the beginning as you experiment further with RegExp.
