I've got a text file containing the HTML source of a web page. Some of its lines contain data-adid="...", and those are the lines I'd like to capture. To do that, I use:
Id=$(grep -m 10 -A 1 "data-adid" Textfile)
to get the first ten results. The variable Id contains the following:
<article class="aditem" data-adid="1234567890" <div class="aditem-image"> --
<article class="aditem" data-adid="2134567890" <div class="aditem-image"> --
<article class="aditem" data-adid="3124567890" <div class="aditem-image"> --
...
I would like to get the following output:
id="1234567890" id="2134567890" id="3124567890"
When using the grep command, I only manage to get the numbers, e.g.
Id2=$(echo $Id | grep -oP '(?<=data-adid=").*?(?=")')
gets 1234567890 2134567890 3124567890
When trying
Id2=$(echo $Id | grep -oP '(?<=data-ad).*?(?=")')
this will only give me id= id= id=
How could the code be changed to get the desired output?
...and please provide clearer input so that we can better understand your question.
"the html-source of a web page": then use an HTML (i.e. XML) aware tool to extract the data, for example xmllint or xmlstarlet.
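One possible way to keep the id="..." part with grep alone is to leave only data-ad in the lookbehind and match id="..." itself, closing quote included. The sketch below assumes GNU grep built with PCRE support (the -P option). The sample text assigned to Id is a hypothetical stand-in for the real grep output, and paste is used only to join the matches on one line.

```shell
#!/usr/bin/env bash
# Hypothetical sample standing in for the output of the first grep in the question.
Id='<article class="aditem" data-adid="1234567890" <div class="aditem-image"> --
<article class="aditem" data-adid="2134567890" <div class="aditem-image"> --
<article class="aditem" data-adid="3124567890" <div class="aditem-image"> --'

# The lookbehind only asserts that "data-ad" precedes the match, so it is not printed.
# id="[^"]*" is the match itself: id=", everything up to the next quote, and that quote.
# paste -s joins the one-match-per-line output of grep -o into a single line.
Id2=$(printf '%s\n' "$Id" | grep -oP '(?<=data-ad)id="[^"]*"' | paste -s -d ' ' -)

echo "$Id2"
```

With the sample above this should print id="1234567890" id="2134567890" id="3124567890". The earlier attempt returned only id= because the lazy .*? stops right before the first quote and the lookahead (?=") does not consume it. For real-world HTML, an XML-aware tool as suggested in the comments is likely the more robust choice.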