My Most Used RegEx

It occurred to me the other day while working on a script for a customer that I use this regular expression frequently:

(?<=TOKEN1).*?(?=TOKEN2)

It is very useful when parsing information out of web pages, or when finding elements in web pages.

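It pulls out the text between TOKEN1 and TOKEN2. The tokens can be other pieces of text, HTML tags, or whatever delimits the value you want. Because the tokens sit inside a lookbehind, (?<=...), and a lookahead, (?=...), they are not part of the match, and because .*? is lazy, the match stops at the first TOKEN2 it reaches.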
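To see the pattern outside of any one scripting tool, here is a minimal sketch in Python. The helper name `between` and the sample string are made up for illustration; the only real API used is Python's standard `re` module. The tokens are escaped so that characters such as `<` or `[` are treated literally.

```python
import re

def between(text, token1, token2):
    """Return every run of text found between token1 and token2.

    'between' is a hypothetical helper name, not part of any library.
    """
    # re.escape makes the tokens literal, so regex metacharacters are safe.
    pattern = "(?<=" + re.escape(token1) + ").*?(?=" + re.escape(token2) + ")"
    return re.findall(pattern, text)

# Hypothetical sample text with two delimited values.
results = between("a [one] b [two] c", "[", "]")
print(results)  # ['one', 'two']
```

Note that Python only allows fixed-width lookbehinds. A literal token always satisfies that, which is one reason escaping the tokens is a sensible default here.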

As an example, I recently wrote a script that loops through all the rows in an HTML table, pulls out an order number, and then looks that order number up in an Excel sheet. The order number appeared in a table cell along with other information. It was the first item inside an <i> (italics) tag and was followed by a space and then a hyphen. So I used this to pull it out of the row:

RegEx>(?<=<i>).*?(?= -),this_row,0,matches,nm,0

See how it looks for everything between the '<i>' and ' -' (space then hyphen).
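The same extraction can be sketched in Python. The contents of `this_row` below are an assumption, invented to match the description (order number first inside the <i> tag, followed by a space and a hyphen); the real page markup is not shown in this post.

```python
import re

# Hypothetical row markup, made up to fit the description above.
this_row = '<tr><td><i>A1234 - Blue widget</i></td><td><input id="qty_17" type="text"></td></tr>'

# Same pattern as above: everything after '<i>' up to the first ' -'.
match = re.search(r"(?<=<i>).*?(?= -)", this_row)
order_number = match.group(0) if match else None
print(order_number)  # A1234
```

Checking for `None` matters in a loop over rows, since a header row or an empty row would not contain the <i> tag at all.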

The next thing my code needed to do was find the ID of the single input field in the same row. This input was used to enter the order quantity, obtained from the Excel sheet. The ID is not something we know up front but it's the only input field in the row. So I did this:

RegEx>(?<=id=").*?(?="),theInput,0,matches,nm,0

In other words, pull the text between id=" and the next ", which gives us the input's ID value. We can then use that later to identify and fill the input field.
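Here is the ID lookup as a Python sketch. The markup in `theInput` and the ID value `qty_17` are assumptions for illustration only.

```python
import re

# Hypothetical input tag; the real ID is not known up front.
theInput = '<input id="qty_17" type="text" name="quantity">'

# Everything after 'id="' up to the next double quote.
match = re.search(r'(?<=id=").*?(?=")', theInput)
input_id = match.group(0) if match else None
print(input_id)  # qty_17
```

One caveat with this sketch: the lookbehind only checks that the preceding characters are id=", so an earlier attribute ending the same way (for example data-id="...") would match first. It works here because the text holds a single input with a single such attribute.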

Regular expressions are daunting at first, but eventually you find that a small number of patterns help in many situations. This is one that I often find useful.

What's your oft-used regular expression?
