I used curl to download an HTML page from homeoint.org/books/boericmm/d.htm and saved it to a file.
The relevant part looks like this:
<p><font size="2"><a href="d/dam.htm" target="_top">DAM</a> ------>
DAMIANA (TURNERA)<br>
<a href="d/daph.htm" target="_top">DAPH</a> ------> DAPHNE INDICA<br>
<a href="d/dig.htm" target="_top">DIG</a> ------> DIGITALIS PURPUREA
(DIGITALIS)<br>
<a href="d/dios.htm" target="_top">DIOS</a> ------> DIOSCOREA VILLOSA<br>
<a href="d/diosm.htm" target="_top">DIOSM</a> ------> DIOSMA LINCARIS<br>
<a href="d/diph.htm" target="_top">DIPH</a> ------> DIPHTHERINUM<br>
<a target="_top" href="d/dol.htm">DOL</a> ------> DOLICHOS PRURIENS
(DOLICHOS PURIENS - MUCUNA)<br>
<a href="d/dor.htm" target="_top">DOR</a> ------> DORYPHORA
DECEMLINEATA (DORYPHORA)<br>
<a href="d/dros.htm" target="_top">DROS</a> ------> DROSERA
ROTUNDIFOLIA (DROSERA)<br>
<a href="d/dubo-m.htm" target="_top">DUBO-M</a> ------> DUBOISIA
MYOPOROIDES (DUBOISIA)<br>
<a href="d/dulc.htm" target="_top">DULC</a> ------> DULCAMARA<br>
</font></p>
For each entry, I need to grep the text between the ">" that ends the "------>" arrow and the following "<br>".
The output should be:
DAMIANA (TURNERA)
DAPHNE INDICA
DIGITALIS PURPUREA (DIGITALIS)
DIOSCOREA VILLOSA
DIOSMA LINCARIS
DIPHTHERINUM
DOLICHOS PRURIENS (DOLICHOS PURIENS - MUCUNA)
DORYPHORA DECEMLINEATA (DORYPHORA)
DROSERA ROTUNDIFOLIA (DROSERA)
DUBOISIA MYOPOROIDES (DUBOISIA)
DULCAMARA
I am trying to use the grep command:
cat d.htm | grep -o -P '(?<=> ).*(?=<br>)'
but the output is incomplete: it seems to skip the entries that wrap onto a second line.
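One possible direction, as a rough sketch rather than a definitive fix: grep matches one line at a time, so a pattern cannot span an entry whose name continues on the next line. The sketch below joins the lines first, then uses a longer lookbehind and a lazy `.*?` so each match stops at the nearest `<br>`. It assumes GNU grep built with `-P` support and that every entry follows the `------> NAME<br>` layout shown above; `extract_names` and `sample.htm` are hypothetical names used only for illustration.

```shell
# Recreate a small sample of the downloaded page (hypothetical file name).
cat > sample.htm <<'EOF'
<p><font size="2"><a href="d/dam.htm" target="_top">DAM</a> ------>
DAMIANA (TURNERA)<br>
<a href="d/daph.htm" target="_top">DAPH</a> ------> DAPHNE INDICA<br>
<a href="d/dig.htm" target="_top">DIG</a> ------> DIGITALIS PURPUREA
(DIGITALIS)<br>
</font></p>
EOF

# extract_names FILE (hypothetical helper)
# 1. Turn CR and LF into spaces so wrapped entries end up on one line.
# 2. Squeeze runs of spaces into a single space.
# 3. Print only the text between "------> " and the nearest "<br>".
extract_names() {
  tr '\r\n' '  ' < "$1" |
    tr -s ' ' |
    grep -oP '(?<=------> ).*?(?=<br>)'
}

extract_names sample.htm
```

On the sample this prints DAMIANA (TURNERA), DAPHNE INDICA and DIGITALIS PURPUREA (DIGITALIS), one per line. The lazy `.*?` matters once everything is on a single line: a greedy `.*` would run to the last `<br>` and return one long match.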