I have an HTML file with thousands of lines, but something is repeated.

CODE=12345-ABCDE-12345-ABCDE</div>...<!--This line goes on for hundreds of characters-->

Now, the line starts with "CODE=" every time, and the length of the code is the same every time. The following 28 characters are either letters, numbers, or dashes.

cat mysite.html | grep "CODE="

But I'd like a regex to display everything on the line BEFORE </div>.

Thanks!

2 Answers

You can use cut instead:

cat myfile.html | cut -c 6-28

This prints characters 6 through 28 of each line. It relies on two known lengths: CODE= is 5 characters, and the code that follows has a fixed length (23 characters in the sample line above).
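As a rough sketch of the idea, the snippet below builds a small throwaway file modeled on the sample line in the question and combines grep with cut. The temporary file and its contents are assumptions made up for illustration; adjust the column range if your real codes have a different length.

```shell
# Build a tiny sample file (hypothetical contents modeled on the question).
sample=$(mktemp)
printf '%s\n' \
  '<html>' \
  'CODE=12345-ABCDE-12345-ABCDE</div><p>trailing markup</p>' \
  '</html>' > "$sample"

# Keep only the lines containing CODE=, then print columns 6-28,
# i.e. the 23 characters that follow the 5-character "CODE=" prefix.
result=$(grep 'CODE=' "$sample" | cut -c 6-28)
echo "$result"

# Clean up the temporary file.
rm -f "$sample"
```

Filtering with grep first matters because cut on its own would print columns 6-28 of every line in the file, not just the CODE= lines.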

2 Comments

Thanks for the tip! This worked like a charm: cat mysite.html | grep "CODE=" | cut -c 6-29
@Goodies You don't need to use cat here. grep "CODE=" mysite.html is the same as cat mysite.html | grep "CODE=".

You can also use sed:

sed -rn 's@^(CODE=[A-Za-z0-9\-]{23})</div>.*@\1@p' file

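This matches any line starting with CODE= followed by 23 characters that are letters, numbers, or dashes, followed by </div>. The -n option together with the trailing p flag prints only the lines where the substitution succeeded, and \1 keeps just the captured CODE=... part.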
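Here is a minimal sketch of the same substitution run against a throwaway file. The file contents are made up for illustration, and it uses -E instead of -r on the assumption that your sed accepts it (recent GNU sed and BSD sed both do). The hyphen sits at the end of the bracket expression, so it needs no backslash.

```shell
# Build a tiny sample file (hypothetical contents modeled on the question).
sample=$(mktemp)
printf '%s\n' \
  '<html>' \
  'CODE=12345-ABCDE-12345-ABCDE</div><p>trailing markup</p>' \
  'CODE=short</div>' \
  '</html>' > "$sample"

# -E enables extended regular expressions; -n plus the trailing p flag
# prints only the lines where the substitution succeeded.
# The second CODE= line is skipped because its code is not 23 characters long.
result=$(sed -En 's@^(CODE=[A-Za-z0-9-]{23})</div>.*@\1@p' "$sample")
echo "$result"

# Clean up the temporary file.
rm -f "$sample"
```

Unlike the cut approach, this keeps the CODE= prefix in the output and ignores lines whose code does not have the expected shape.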
