Crack regular expressions

By Vishnu Jayan on November 10, 2016

Regular expressions are very useful for validating emails, country-specific phone numbers, postal or ZIP codes and so on, and for searching strings, file names and more. Useful as they are, writing a new regular expression is quite difficult for freshers like me. In fact, regex is easy to understand, and it is easy to create a new one when you need it. Here are some tips and tricks for cracking regex that I have used and suggested to my friends.

Let us look into some points. So what is a regular expression?

“Regular expression or regex is a sequence of symbols and characters expressing a string or pattern to be searched for within a longer piece of text.”

It’s the simple answer I got after googling it, and it sounds very simple. If we want to find the string “fox” in “the quick brown fox jumps over the lazy dog”, we can use a simple regex that matches the word “fox”.
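
To make the “fox” search concrete, here is a minimal sketch. It assumes Python and its built-in re module, which is my choice for illustration and not something the figures in this post use.

```python
import re

text = "the quick brown fox jumps over the lazy dog"

# re.search scans the text and returns the first match, or None if nothing matches.
match = re.search(r"fox", text)

if match:
    # span() gives the start and end positions of the match inside the text.
    print(match.group(), match.span())
```

The r"..." prefix is a Python raw string. It is not needed for “fox”, but it keeps backslashes intact in patterns that contain them.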

Then how do we process regular expressions?

A piece of software called a regular expression engine processes the regex. The engine checks that the pattern is valid and then tries to match it against the given strings.
Many regular expression engines are available, and each one differs in how it works and in the pattern syntax it supports. Some commonly used engines and flavors are Perl, PCRE, PHP and POSIX.

Then let’s look at the structure of a regex. The primary attention goes to characters, meaning what we use to build a regex. Commonly we use ASCII characters, including letters, numbers and special characters. Unicode is also used to match text in other languages.

Now let’s crack regular expressions.
We can search for a string directly. That is, we can search for an exact string, like the find option of text editors and word processors.

Figure 1 shows the letters as regular expression

 

Here we search for “abc” in the string and the result is highlighted. We can also provide a number or a special character as the search pattern.

Figure 2 shows the numbers as regular expression

 

Figure 3 shows the special characters as regular expression
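
The same direct search can be tried in code. This is only a sketch in Python; the sample sentence is made up, since the strings in the figures are not reproduced here. One thing to watch: a few special characters (such as “$” and “.”) have their own meaning in a regex, so the sketch lets re.escape add the backslashes for them.

```python
import re

text = "Order #42 costs $19.99 (approx.)"

# Digits are ordinary characters, so they simply match themselves.
number = re.search(r"42", text)

# "#" has no special meaning in a default pattern, so it also matches itself.
hash_sign = re.search(r"#", text)

# "$" and "." are special, so re.escape builds a pattern that treats them literally.
price = re.search(re.escape("$19.99"), text)

print(number.group(), hash_sign.group(), price.group())
```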

 

Simple, huh?
Next, we can look a little deeper at how to create a simple pattern.
First, let’s find the pattern that matches any digit. ‘\d’ is the keyword used to find a digit from 0 to 9. The ‘\’ distinguishes it from the letter ‘d’. Similarly, we can find any non-digit character with ‘\D’.

Figure 4 shows the '\d' and '\D'
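
A small sketch of ‘\d’ and ‘\D’ in Python follows. The sample text is invented for illustration.

```python
import re

text = "Room 101, Floor 7"

digits = re.findall(r"\d", text)      # every single digit in the text
non_digits = re.findall(r"\D", text)  # every character that is not a digit

print(digits)
print("".join(non_digits))
```

Here findall returns each match as a separate list item, so the digits come back one by one.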

 

Got it? Then let’s move to the next important thing: the wildcard, which sounds familiar from card games. Yes, this is a character that can stand in for any other character, and it is denoted by ‘.’. That is, a ‘.’ can represent any digit, letter, special character or whitespace (in most engines, anything except a line break). As with ‘\d’, the backslash changes the meaning: if we want to find a literal ‘.’ in our string, we use ‘\.’.

Figure 5 shows the wild card
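
Here is a quick sketch of the wildcard in Python, with sample strings of my own. It also shows the escaped ‘\.’ matching only a real dot.

```python
import re

# "." stands for any single character; by default Python excludes the line break.
any_char = re.findall(r"c.t", "cat cot c9t c t c\nt")

# "\." matches only a literal dot, so "3x5" is skipped.
literal_dot = re.findall(r"\d\.\d", "3.5 3x5 4.2")

print(any_char)
print(literal_dot)
```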

 

Let’s move on to the next section: matching a particular list of characters.
For this purpose, we use [ ] (brackets). The characters we need to find are enclosed in the [ ].
e.g. [abc] matches a, b or c. No separators are needed; a comma inside the brackets would itself be matched.
Similarly, we can find another interesting symbol here: ^ (hat). Placed right after the opening bracket, it excludes the characters inside the [ ].
e.g. [^abc] matches any character except a, b or c.

Figure 6 shows the [ ] in a regular expression. The first string is matched because it starts with a, then the wildcard, then not p, q or r. The second string fails the conditions.
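
The caption can be tried out in code. The pattern below is my guess at the kind of expression the figure describes (an “a”, then a wildcard, then anything except p, q or r), and the test strings are made up.

```python
import re

# "a", then any character, then any character except p, q or r.
pattern = r"a.[^pqr]"

first = bool(re.match(pattern, "abc"))   # third character is "c"
second = bool(re.match(pattern, "abp"))  # third character is "p"

# A plain class lists the allowed characters; the commas in the text are not matched.
listed = re.findall(r"[abc]", "a,b,c,d")

print(first, second, listed)
```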

It’s easy, isn’t it? We can also specify a range of characters instead of a set of characters in the [ ], which makes our regular expression shorter.

Figure 7 shows an example of ranges of characters. The first part includes the characters from a to d, then 1 to 9, and the last part excludes p to t.
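
As a sketch, the three ranges named in the caption could be written as one pattern like this. The exact pattern in the figure may differ, and the candidate strings are my own.

```python
import re

# One character from a to d, one from 1 to 9, then one that is NOT from p to t.
pattern = r"[a-d][1-9][^p-t]"

results = {}
for candidate in ["b7x", "e7x", "b0x", "b7q"]:
    # fullmatch requires the whole candidate to fit the pattern.
    results[candidate] = bool(re.fullmatch(pattern, candidate))

print(results)
```

Only “b7x” passes: “e” is outside a to d, “0” is outside 1 to 9, and “q” falls in the excluded range p to t.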

 

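Tip: ‘\w’ is a common shorthand for [A-Za-z0-9_], that is, letters, digits and the underscore. It is often used to check that an entered string contains only such “word” characters. Keep in mind that some engines also let ‘\w’ match letters from other languages, so on its own it does not prove that the text is English.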
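
It is worth checking what ‘\w’ accepts in the engine you use. The sketch below is for Python 3, where ‘\w’ also matches the underscore and, for ordinary text patterns, letters outside English unless the re.ASCII flag is given.

```python
import re

# "\w+" grabs runs of word characters; the underscore counts as one.
words = re.findall(r"\w+", "snake_case and CamelCase99")

# By default, Python 3 treats accented letters as word characters too.
unicode_ok = bool(re.fullmatch(r"\w+", "café"))

# With re.ASCII, "\w" is limited to [A-Za-z0-9_].
ascii_ok = bool(re.fullmatch(r"\w+", "café", flags=re.ASCII))

print(words, unicode_ok, ascii_ok)
```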

Let’s learn how to avoid repetition in a regular expression.
For example, suppose I want to validate the zeros in 1,000,000. It would be tedious to write ‘000000’.
Instead, we can use 0{6}. The { } denotes how many times the preceding character or pattern repeats.
Let’s have a look.

Figure 8 shows the above example. A digit after 6 zeros is valid.
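
Here is a sketch of the exact-count form in Python. I am assuming the 1,000,000 example with the commas removed, so the pattern is one digit followed by exactly six zeros; the figure itself may use a different arrangement.

```python
import re

# One digit, then exactly six zeros.
pattern = r"\d0{6}"

million = bool(re.fullmatch(pattern, "1000000"))      # digit + six zeros
hundred_k = bool(re.fullmatch(pattern, "100000"))     # digit + only five zeros
six_zeros = bool(re.fullmatch(r"0{6}", "000000"))     # same as writing 000000

print(million, hundred_k, six_zeros)
```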

 

Here we can also specify limits, meaning a minimum and a maximum value. To do that, we give the lower and upper limits of the count, like {2,6}. This means a minimum of 2 repetitions and a maximum of 6.

Figure 9 shows the above example with limits
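
The limits can be checked with a short sketch. The strings of zeros below are my own test values.

```python
import re

# Between two and six zeros, nothing else.
pattern = r"0{2,6}"

results = {}
for zeros in ["0", "00", "000000", "0000000"]:
    # Key each result by the number of zeros in the candidate.
    results[len(zeros)] = bool(re.fullmatch(pattern, zeros))

print(results)
```

One zero is too few and seven are too many, so only the strings with 2 and 6 zeros pass.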

 

Is that OK? Then let’s move to another simple thing: the Kleene star and the Kleene plus.
Don’t worry, it’s easy! The Kleene star is simply denoted by ‘*’ and the Kleene plus by ‘+’.
The difference is that ‘*’ denotes zero or more repetitions of the preceding character or pattern, while ‘+’ denotes one or more. Let’s have a look.

Figure 10 shows the Kleene star and Kleene plus
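
The difference shows up only when the repeated character is missing, as this small Python sketch suggests. The patterns and strings are my own examples.

```python
import re

results = {}
for text in ["ac", "abc", "abbbc"]:
    star = bool(re.fullmatch(r"ab*c", text))  # zero or more "b" between a and c
    plus = bool(re.fullmatch(r"ab+c", text))  # at least one "b" between a and c
    results[text] = (star, plus)

print(results)
```

Only “ac” tells the two apart: the star accepts it, the plus does not.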

 

Now let’s learn a new thing: optionality. It’s denoted by ‘?’. If we want to match a literal ‘?’, we use ‘\?’.

It makes the preceding character optional, so that character is matched zero or one time.

Figure 11 shows the '?' functionality
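
A classic way to try ‘?’ is a word with two spellings. This sketch uses “color” and “colour”, which are my own examples and not taken from the figure.

```python
import re

# The "u" is optional: zero or one occurrence.
spellings = re.findall(r"colou?r", "color colour colouur")

# An escaped "\?" matches a real question mark.
question_marks = re.findall(r"\?", "Ready? Go!")

print(spellings)
print(question_marks)
```

“colouur” is not matched because ‘?’ allows at most one “u”.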

 

Next come “starts with” and “ends with”. These are common operations in regular expressions. “Starts with” is denoted by ‘^’ and “ends with” by ‘$’.

Figure 12 shows the start and end
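
Here is a sketch of both anchors in Python, reusing the “quick brown fox” sentence from earlier. The six-digit check at the end is a made-up example of using both anchors together to validate a whole string.

```python
import re

sentence = "the quick brown fox"

starts_with_the = bool(re.search(r"^the", sentence))  # "the" is at the start
starts_with_fox = bool(re.search(r"^fox", sentence))  # "fox" is not at the start
ends_with_fox = bool(re.search(r"fox$", sentence))    # "fox" is at the end

# With both anchors, the pattern has to cover the entire string.
six_digits = bool(re.search(r"^\d{6}$", "682030"))
not_six_digits = bool(re.search(r"^\d{6}$", "68203a"))

print(starts_with_the, starts_with_fox, ends_with_fox, six_digits, not_six_digits)
```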

 

These are the things we have to understand properly to write a regex. These rules can be grouped in a regex by using ( ). Try it yourself.
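
As a starting point for trying ( ) yourself, here is a minimal Python sketch. The patterns and the “555-1234” string are invented for illustration.

```python
import re

# (ab)+ repeats the whole group "ab", while ab+ repeats only the "b".
grouped = bool(re.fullmatch(r"(ab)+", "ababab"))
ungrouped = bool(re.fullmatch(r"ab+", "ababab"))

# Groups also remember the text they matched.
match = re.fullmatch(r"(\d{3})-(\d{4})", "555-1234")
first_part, second_part = match.group(1), match.group(2)

print(grouped, ungrouped, first_part, second_part)
```
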
I hope you enjoyed it.
