I need to search for multiple strings in an HTML file, strip the matched portion from each matching line, and save the rest of the line to a file.

My file looks like this:

<td colspan="2" class="suite-unknown">
<td colspan="2" class="suite-fail">
<span style="margin: 2px; padding: 1px">&nbsp;</span>TCS-209
<span style="margin: 2px; padding: 1px">&nbsp;</span>[TC-001] User validates login
<td colspan="2" class="suite-unknown">
<td colspan="2" class="suite-pass">
<span style="margin: 2px; padding: 1px">&nbsp;</span>TCS-210
<span style="margin: 2px; padding: 1px">&nbsp;</span>[TC-002] user close browser

I tried several options, and all of them failed. One attempt:

sed -n ('/<span style="margin: 2px; padding: 1px/p'|'/td colspan="2" class="suite-/p') report.html

Another one:

sed -n '/\/<span style="margin: 2px; padding: 1px\|*td colspan="2" class="suite/p' report.html 

My search keywords are: <span style="margin: 2px; padding: 1px and td colspan="2" class="suite.

Once a line matches, I need to exclude the search keyword from it and print the rest.

The output should look like this:

-unknown
-fail
TCS-209
[TC-001] User validates login
unknown
pass
TCS-210
[TC-002] user close browser

Please help

  • It's usually better to use HTML-aware tools to parse HTML. Commented Sep 21, 2018 at 9:35
  • Which one is that? @choroba Commented Sep 21, 2018 at 9:51
  • I often use xsh which is based on libxml, it can handle html if it's not too terrible. There are probably many more. Commented Sep 21, 2018 at 11:10

1 Answer

sed -n 's/^ *<td colspan="2" class="suite\(.*\)">/\1/p;s/^ *<span style="margin: 2px; padding: 1px.*<\/span>//p' myfile

This is not the best way to extract information from HTML, but it will do for something as simple as this.

curl -s 'https://raw.githubusercontent.com/aruiz-caritsqa/wdio-html-format-reporter/master/wdio-report.html' | sed  -n 's/^ *<td colspan="2" class="suite\(.*\)">/\1/p;s/^ *<span style="margin: 2px; padding: 1px.*<\/span>//p'

gives me

-unknown
some example tests for a readme.md demo
-pass
should be a passing test
-fail
should have a failing test
-pass
Full page screenshot
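
To see how the two substitutions above behave on the sample from the question, here is a minimal self-contained sketch. It assumes the input matches the question's excerpt (the real report may differ), and `extract_report` is a hypothetical helper name used only for illustration. Note that sed takes multiple commands either separated by `;` or as separate `-e` expressions; the parenthesised `('...'|'...')` form from the question is not sed syntax.

```shell
#!/bin/sh
# Sample input copied from the question. Assumption: the real report may
# indent these lines, which the leading ' *' in each pattern allows for.
sample='<td colspan="2" class="suite-unknown">
<td colspan="2" class="suite-fail">
<span style="margin: 2px; padding: 1px">&nbsp;</span>TCS-209
<span style="margin: 2px; padding: 1px">&nbsp;</span>[TC-001] User validates login'

# extract_report is a hypothetical name; it wraps the same two
# substitutions as the command in this answer, one per -e expression.
extract_report() {
    sed -n \
        -e 's/^ *<td colspan="2" class="suite\(.*\)">/\1/p' \
        -e 's/^ *<span style="margin: 2px; padding: 1px.*<\/span>//p'
}

# -n suppresses default output; the p flag prints only lines where a
# substitution actually happened, so unmatched lines are dropped.
result=$(printf '%s\n' "$sample" | extract_report)
printf '%s\n' "$result"
```

The first expression keeps only what follows `suite` (captured by `\(.*\)`), and the second deletes everything up to and including `</span>`. If the leading hyphen is unwanted, moving it outside the capture group (`suite-\(.*\)`) should drop it. To save the result, redirect the output to a file, for example `> output.txt` (a placeholder name).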

6 Comments

It didn't work for me. It printed everything in that page.
Then perhaps your file is not like what you've put in your question. I copied it, pasted it into a file, ran the command above and obtained the result you desired. Perhaps you can upload your real file somewhere and share it with us?
You can find the file here. Thanks for the help.
In your example, you didn't have leading spaces as in the file. I'll correct my command.