Grep and Regex an HTML File

Question

I have an HTML file with thousands of lines, in which a line of the following form is repeated:

CODE=12345-ABCDE-12345-ABCDE</div>...<!--This line goes on for hundreds of characters-->

The line starts with "CODE=" every time, and the length of the code is the same every time. The following 28 characters are either letters, numbers, or dashes.

cat mysite.html | grep "CODE="

But I'd like a regex to display everything on the line before </div>.

Thanks!

Simeon Visser · Accepted Answer · 2013-12-21 20:54:02Z

1

You can use cut instead:

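cat mysite.html | cut -c 6-28

This shows characters 6 to 28 of each line. It relies on the fact that the length of CODE= is known (5 characters), as is the length of the code that follows. Note that cut is applied to every line of the file, so filter with grep first if you only want the lines that contain CODE=.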
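A minimal sketch of the cut approach on sample data. The temporary file and its contents are hypothetical, modelled on the example line in the question, and the column numbers assume the 23-character code shown there. Lines are filtered with grep before cut sees them, and the pattern is anchored with ^ because cut works purely by position.

```shell
# Build a small sample file (hypothetical data modelled on the question).
tmp=$(mktemp)
printf '%s\n' \
  '<html>' \
  'CODE=12345-ABCDE-12345-ABCDE</div><p>more markup on the same line</p>' \
  '<p>unrelated line</p>' > "$tmp"

# Keep only lines that start with CODE=, then cut by character position.
# Columns 6-28 are the 23-character code after the 5-character "CODE=".
code_only=$(grep '^CODE=' "$tmp" | cut -c 6-28)

# Columns 1-28 keep the "CODE=" prefix as well.
with_prefix=$(grep '^CODE=' "$tmp" | cut -c 1-28)

echo "$code_only"
echo "$with_prefix"
rm -f "$tmp"
```

If the code length differs in the real file, the upper column bound has to be adjusted to match.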

2 Comments

Goodies Over a year ago

Thanks for the tip! This worked like a charm: cat mysite.html | grep "CODE=" | cut -c 6-29

Chris Over a year ago

@Goodies You don't need to use cat here. grep "CODE=" mysite.html is the same as cat mysite.html | grep "CODE=".

ray · Accepted Answer · 2013-12-22 02:53:53Z

0

You can use sed also:

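sed -rn 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' file

This matches any line starting with CODE= followed by 23 characters, each a letter, number, or dash, followed by </div>, and prints only the part before </div>. The dash is placed last inside the brackets so that it is taken literally; a backslash inside a bracket expression is not an escape character.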
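A small sketch of the sed approach on sample data. The temporary file and its contents are hypothetical, and -E is used here as the spelling of -r that both current GNU and BSD sed accept. The second command is an alternative that does not depend on a fixed code length: it prints everything before the first </div> on lines that start with CODE=.

```shell
# Build a small sample file (hypothetical data modelled on the question).
tmp=$(mktemp)
printf '%s\n' \
  'CODE=12345-ABCDE-12345-ABCDE</div><p>rest of a long line</p>' \
  'CODE=too-short</div>' \
  '<p>unrelated line</p>' > "$tmp"

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded. Only a 23-character code followed by </div> matches.
matched=$(sed -En 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' "$tmp")

# Alternative without a fixed length: on lines starting with CODE=,
# delete from the first </div> to the end of the line and print the rest.
before_div=$(sed -n '/^CODE=/s@</div>.*@@p' "$tmp")

echo "$matched"
printf '%s\n' "$before_div"
rm -f "$tmp"
```

The fixed-length version skips the line whose code is too short, while the alternative prints both CODE= lines.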

edited Dec 22, 2013 at 2:53

answered Dec 22, 2013 at 1:48
