I am trying to extract data from a URL.
What I am trying to do:
1) Read the content line by line and check whether each line contains the desired keyword.
2) If it does, store the content of the previous line (the "GETCONTENT" part) in a list.
The content at the URL has the following format:
<http://www.example.com/XYZ/a-b-c/w#>DONTGETCONTENT
a <http://www.example.com/XYZ/mount/v1#NNNN> ,
<http://www.w3.org/2002/w#Individual> ;
<http://www.w3.org/2000/01/rdf-schema#label>
"some content , "some url content ;
<http://www.example.com/XYZ/log/v1#hasRelation>
<http://www.example.com/XYZ/data/v1#Change> ;
<http://www.example.com/XYZ/log/v1#ServicePage>
<https://dev.org.net/apis/someLabel> ;
<http://www.example.com/XYZ/log/v1#Description>
"Some API Content .
<http://www.example.com/XYZ/model/v1#GETBBBBBB>
a <http://www.w3.org/01/07/w#BBBBBB> ;
<http://www.w3.org/2000/01/schema#domain>
<http://www.example.com/XYZ/data/v1#xyz> ;
<http://www.w3.org/2000/01/schema#label1>
"some content , "some url content ;
<http://www.w3.org/2000/01/schema#range>
<http://www.w3.org/2001/XMLSchema#boolean> ;
<http://www.example.com/XYZ/log/v1#Description>
"Some description .
<http://www.example.com/XYZ/datamodel-ee/v1#GETAAAAAA>
a <http://www.w3.org/01/07/w#AAAAAA> ;
<http://www.w3.org/2000/01/schema#domain>
<http://www.example.com/XYZ/data/v1#Version> ;
<http://www.w3.org/2000/01/schema#label>
"some content ;
<http://www.w3.org/2000/01/schema#range>
<http://www.example.com/XYZ/data/v1#uuu> .
<http://www.example.com/XYZ/datamodel/v1#GETCCCCCC>
a <http://www.w3.org/01/07/w#CCCCCC ,
<http://www.w3.org/2002/07/w#Name>
<http://www.w3.org/2000/01/schema#domain>
<http://www.example.com/XYZ/data/v1#xyz> ;
<http://www.w3.org/2000/01/schema#label1>
"some content , "some url content ;
<http://www.w3.org/2000/01/schema#range>
<http://www.w3.org/2001/XMLSchema#boolean> ;
<http://www.example.com/XYZ/log/v1#Description>
"Some description .
Below is the code I have tried so far, but it prints the entire content of the file:
import re

def read_from_url():
    try:
        from urllib.request import urlopen
    except ImportError:
        from urllib2 import urlopen
    url_link = "examle.com"
    html = urlopen(url_link)
    previous=None
    for line in html:
        previous=line
        line = re.search(r"^(\s*a\s*)|\#GETBBBBBB|#GETAAAAAA|#GETCCCCCC\b",
                         line.decode('UTF-8'))
        print(previous)

if __name__ == '__main__':
    read_from_url()
Expected output:
GETBBBBBB , GETAAAAAA , GETCCCCCC
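For reference, here is a minimal sketch of the two steps described above. In the attempt above, `previous` is set to the current line and printed on every iteration regardless of whether the regex matched, which would explain why everything is printed. The sketch rests on assumptions that are not stated in the question: that the "desired keywords" are the type names on the `a <...#TYPE>` lines (`AAAAAA`, `BBBBBB`, `CCCCCC`), and that the value to keep is the part after `#` on the preceding line. The function name `collect_previous_fragments` and the `KEYWORDS` tuple are made up for illustration.

```python
import re

# Assumption: the "desired keywords" are the type names found on `a <...#TYPE>` lines.
KEYWORDS = ("AAAAAA", "BBBBBB", "CCCCCC")


def collect_previous_fragments(lines, keywords=KEYWORDS):
    """Return the '#fragment' of the line that precedes each keyword line."""
    # A keyword line starts with `a <`, and its URL ends in `#<keyword>`.
    keyword_re = re.compile(
        r"^\s*a\s+<\S*#(?:%s)\b" % "|".join(re.escape(k) for k in keywords)
    )
    # The value to keep is the text between `#` and the closing `>`.
    fragment_re = re.compile(r"#(\w+)>")

    found = []
    previous = None
    for raw in lines:
        # urlopen() yields bytes; plain strings are accepted as well.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if previous is not None and keyword_re.search(line):
            match = fragment_re.search(previous)
            if match:
                found.append(match.group(1))
        # Update `previous` only after the current line has been checked.
        previous = line
    return found
```

Passing the object returned by `urlopen(url_link)` as `lines` and joining the result with `" , "` should, for the sample data shown above, give `GETBBBBBB , GETAAAAAA , GETCCCCCC`. The first block is skipped because its `a` line names `NNNN`, which is not in the keyword list.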
Thanks in advance!!