
I have a text file full of URLs and other text. I want to extract the URLs that come right after

thumbnailUrl\": \

I used this code:

def get_net_target(page):
    start_link=page.find("thumbnailUrl")
    start_quote=page.find('"',start_link)
    end_quote=page.find('"',start_quote+1)
    url=page[start_quote+1:end_quote]
    print url

my_file = open("data.txt")
page = my_file.read()

print(get_net_target(page))

I want output like this:

https://tse3.mm.bing.net/th?id=OIP.Mcbb568859281f5bc7a7f64d8c58d4895H1&pid=Api
https://tse1.mm.bing.net/th?id=OIP.M7ff1f4e880bac2c244c0b6a286cee669o2&pid=Api

....

but I get only:

None

Here are a few lines of the data:

webSearchUrl\": \"https:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=RUc0BARkL2P78A5CI7XPWqhCYAA2XaQLP-fHGdfODEY&v=1&r=https%3a%2f%2fwww.bing.com%2fimages%2fsearch%3fview%3ddetailv2%26FORM%3dOIIRPO%26q%3dshoaibmalik%26id%3d97C5A1ECB43BCDC1B5739F49555CE0C75CEDF83F%26simid%3d607996336242885612&p=DevEx,5006.1\", \"thumbnailUrl\": \"https:\\/\\/tse2.mm.bing.net\\/th?id=OIP.Me19820ab68b4bcc7ec82756b2b5ecffbo1&pid=Api\", \"datePublished\": \"2011-07-08T12:00:00\", \"contentUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=gA9S9qCIF1jvD5yA4V9VOqfrJUxdW2_wyacSDR15Yc8&v=1&r=http%3a%2f%2fwww.forumpakistan.com%2fimages%2fcelebrity-profiles%2fShoaib-Malik-1.jpg&p=DevEx,5008.1\", \"hostPageUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=IODAmtxi3pYzDGhiJcJgCv0fWHEq8hlJauGxRW5o2c4&v=1&r=http%3a%2f%2fok-khan.blogspot.com%2f2011%2f07%2fshoaib-malik.html&p=DevEx,5007.1\", \"contentSize\": \"48445 B\", \"encodingFormat\": \"jpeg\", \"hostPageDisplayUrl\": \"ok-khan.blogspot.com\\/2011\\/07\\/shoaib-malik.html\", \"width\": 500, \"height\": 647, \"thumbnail\": {\"width\": 231, \"height\": 300}, \"imageInsightsToken\": \"ccid_4Zggq2i0*mid_97C5A1ECB43BCDC1B5739F49555CE0C75CEDF83F*simid_607996336242885612\", \"imageId\": \"97C5A1ECB43BCDC1B5739F49555CE0C75CEDF83F\", \"accentColor\": \"3A6491\"}, {\"name\": \"Pakistani Crickert Player: Shoaib Malik\", \"webSearchUrl\": \"https:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=4qc04BUbtNDwiCHco5m3IY_YFqKVaY2q8ZWhX-DvFQs&v=1&r=https%3a%2f%2fwww.bing.com%2fimages%2fsearch%3fview%3ddetailv2%26FORM%3dOIIRPO%26q%3dshoaibmalik%26id%3dF690295FD18526BA8225367169A0664405923A09%26simid%3d608039315980946676&p=DevEx,5012.1\", \"thumbnailUrl\": 
\"https:\\/\\/tse3.mm.bing.net\\/th?id=OIP.Mcbb568859281f5bc7a7f64d8c58d4895H1&pid=Api\", \"datePublished\": \"2012-12-24T12:00:00\", \"contentUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=9psh5pXKn2R_2Zn4-iMzpjDFePVuLSNVJhbVjf2uTI0&v=1&r=http%3a%2f%2fi1.tribune.com.pk%2fwp-content%2fuploads%2f2010%2f10%2fshoaib-malik-640x480.jpg&p=DevEx,5014.1\", \"hostPageUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=-cUvEUoDmZ1OAI-PVQc4MOfS-ELdt5Im521SJ2ZP4j8&v=1&r=http%3a%2f%2fpakistanicricketplayr44410.blogspot.com%2f2012%2f12%2fshoaib-malik.html&p=DevEx,5013.1\", \"contentSize\": \"51986 B\", \"encodingFormat\": \"jpeg\", \"hostPageDisplayUrl\": \"pakistanicricketplayr44410.blogspot.com\\/2012\\/12\\/shoaib-malik.html\", \"width\": 640, \"height\": 480, \"thumbnail\": {\"width\": 300, \"height\": 225}, \"imageInsightsToken\": \"ccid_y7VohZKB*mid_F690295FD18526BA8225367169A0664405923A09*simid_608039315980946676\", \"imageId\": \"F690295FD18526BA8225367169A0664405923A09\", \"accentColor\": \"98AE1D\"}, {\"name\": \"Pakistani Cricket Players: Shoaib Malik\", \"webSearchUrl\": \"https:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=n2Lkz5bg7h-AgbmZE4SnL-_AFBcCgc-_vaiVeAuC84s&v=1&r=https%3a%2f%2fwww.bing.com%2fimages%2fsearch%3fview%3ddetailv2%26FORM%3dOIIRPO%26q%3dshoaibmalik%26id%3d320A83F8A63DED3BD4B4EF926CAA3BE901F9DEA2%26simid%3d608028569977424814&p=DevEx,5018.1\", \"thumbnailUrl\": \"https:\\/\\/tse3.mm.bing.net\\/th?id=OIP.Mb6ca65eda578c80e71f4c3b3193c5b67H1&pid=Api\", \"datePublished\": \"2011-04-17T12:00:00\", \"contentUrl\": 
\"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=TwpcQHy-RdAJUStMisg6zBtjt_j60EStRFRAJS1D69Q&v=1&r=http%3a%2f%2fimages.teamtalk.com%2f08%2f10%2f800x600%2fShoaib-Malik_1264846.jpg&p=DevEx,5020.1\", \"hostPageUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=xICbhyFdmUBblBavcA3pXPdpbOa-1bJuBvP5H6Z0kms&v=1&r=http%3a%2f%2fcricketplayerspk.blogspot.com%2f2011%2f04%2fshoaib-malik.html&p=DevEx,5019.1\", \"contentSize\": \"51243 B\", \"encodingFormat\": \"jpeg\", \"hostPageDisplayUrl\": \"cricketplayerspk.blogspot.com\\/2011\\/04\\/shoaib-malik.html\", \"width\": 800, \"height\": 600, \"thumbnail\": {\"width\": 300, \"height\": 225}, \"imageInsightsToken\": \"ccid_tspl7aV4*mid_320A83F8A63DED3BD4B4EF926CAA3BE901F9DEA2*simid_608028569977424814\", \"imageId\": \"320A83F8A63DED3BD4B4EF926CAA3BE901F9DEA2\", \"accentColor\": \"416838\"}, {\"name\": \"Shoaib Malik in line for Test comeback after 5 years - Sports\", \"webSearchUrl\": \"https:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=7CIa0gvwncEquihLMmMIvtYAAUYZutf8EQr57d8EDO0&v=1&r=https%3a%2f%2fwww.bing.com%2fimages%2fsearch%3fview%3ddetailv2%26FORM%3dOIIRPO%26q%3dshoaibmalik%26id%3d8045A5C7203C2203C8238D9E00905FCB328BD4D9%26simid%3d608033376034882300&p=DevEx,5024.1\", \"thumbnailUrl\": \"https:\\/\\/tse2.mm.bing.net\\/th?id=OIP.M65fe5bf16283dc466e93650fbaef1205o1&pid=Api\", \"datePublished\": \"2015-10-06T04:07:00\", \"contentUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=F2RLPPSfrErnxq7OZt_3mbKbvpJITet7f_kGd90aKlg&v=1&r=http%3a%2f%2fimages.mid-day.com%2fimages%2f2015%2foct%2f6Shoaib-Malik-1.jpg&p=DevEx,5026.1\", \"hostPageUrl\": 
\"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=3V02TER99J6fm2eshh_cv4NCdJELV1DpI1pOmALtDMQ&v=1&r=http%3a%2f%2fwww.mid-day.com%2farticles%2fshoaib-malik-in-line-for-test-comeback-after-5-years%2f16586181&p=DevEx,5025.1\", \"contentSize\": \"119997 B\", \"encodingFormat\": \"jpeg\", \"hostPageDisplayUrl\": \"www.mid-day.com\\/articles\\/shoaib-malik-in-line-for-test-comeback...\", \"width\": 670, \"height\": 746, \"thumbnail\": {\"width\": 269, \"height\": 300}, \"imageInsightsToken\": \"ccid_Zf5b8WKD*mid_8045A5C7203C2203C8238D9E00905FCB328BD4D9*simid_608033376034882300\", \"imageId\": \"8045A5C7203C2203C8238D9E00905FCB328BD4D9\", \"accentColor\": \"304987\"}, {\"name\": \"Gallery > Cricketers > Shoaib Malik > Shoaib Malik high quality! Free ...\", \"webSearchUrl\": \"https:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=A9FD1ucKtYszoNQZ2KEhYMvgMwvJ6AA5d-DFInyr9I4&v=1&r=https%3a%2f%2fwww.bing.com%2fimages%2fsearch%3fview%3ddetailv2%26FORM%3dOIIRPO%26q%3dshoaibmalik%26id%3dB7AD00B57D67FD1664C7BBA404FF6E2679019517%26simid%3d608007657767896024&p=DevEx,5030.1\", \"thumbnailUrl\": \"https:\\/\\/tse3.mm.bing.net\\/th?id=OIP.M5d9fb4d528228cb5c8b9748bff10365bo1&pid=Api\", \"datePublished\": \"2013-05-18T00:44:00\", \"contentUrl\": \"http:\\/\\/www.bing.com\\/cr?IG=4588890DDF1744A79DAEC3DB4C5C87D0&CID=3C16AFB87BB96F70283EA5B77A886E24&rd=1&h=7jwPNSK-kjHNAXQmqBqznMWCB3u4YPz0uHDFoJizw1U&v=1&r=http%3a%2f%2fpak101.com%2fgallery%2fCricketers%2fShoaib_Malik%2f2011%2f9%2f22%2fShoaib_Malik_Picture_9_xmnqf.jpg&p=DevEx,5032.1\", \"hostPageUrl\": \"http:\\/\\/www.bing.com\
  • Please reformat your code and check your indentation. Commented Jan 19, 2017 at 19:33
  • Please supply a few lines of your data file that fail, so we can reproduce the problem. Commented Jan 19, 2017 at 19:41
  • As long as we do not know what the input looks like, nobody can verify your code. Please paste some example lines of your data.txt. Commented Jan 19, 2017 at 19:45
  • Please check my code and data now and suggest a suitable solution. Commented Jan 20, 2017 at 10:00

1 Answer


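Your function prints the URL instead of returning it, so `print(get_net_target(page))` prints `None`; it also stops after the first match. The code below demonstrates two approaches. The first parallels yours, and the second shows an easier way using regular expressions.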

It's worth learning the first way, but the trick is to keep track of your position in the string you're parsing, so that each search resumes where the previous one ended.
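To make the "keep your place" idea concrete, here is a minimal sketch of your function rewritten to collect and return every URL rather than print only the first one. It assumes each URL starts with `http` and ends at the next double quote, and that URLs never legitimately contain a backslash; `get_thumbnail_urls` is a hypothetical name.

```python
def get_thumbnail_urls(page, key='thumbnailUrl'):
    """Return (rather than print) every URL that follows `key` in `page`.

    Assumes each URL starts with 'http' and ends at the next double quote.
    """
    urls = []
    start = 0  # our place in the string; each search resumes from here
    while True:
        k = page.find(key, start)                  # next occurrence of the key
        if k == -1:
            break
        begin = page.find('http', k + len(key))    # first 'http' after the key
        if begin == -1:
            break
        end = page.find('"', begin)                # quote that closes the URL
        if end == -1:
            break
        # Remove backslash escapes such as https:\/\/ or a trailing \ before the quote
        urls.append(page[begin:end].replace('\\', ''))
        start = end                                # resume after this URL
    return urls


sample = '"thumbnailUrl": "https://tse2.mm.bing.net/th?id=OIP.Me19820ab68b4bcc7ec82756b2b5ecffbo1&pid=Api", "datePublished": "2011-07-08T12:00:00"'
for url in get_thumbnail_urls(sample):
    print(url)
```

Because the function returns a list, `print(get_thumbnail_urls(page))` shows the URLs instead of `None`.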

import re

data = '''webSearchUrl\": \"https:\\/\\/w ... p:\\/\\/www.bing.com"'''
# Turn the escaped slashes \/ back into plain /
data = data.replace('\\/', '/')

print('Using roughly your approach ...')

key = 'thumbnailUrl'
start = 0  # our place in the string
while True:
    p = data.find(key, start)            # next occurrence of the key
    if p == -1:
        break
    q = data.find('http', p + len(key))  # start of the URL after the key
    r = data.find('"', q)                # closing quote of the URL
    if q == -1 or r == -1:
        break
    print(data[q:r])
    start = r                            # continue searching after this URL

print('Using a regular expression ...')

thumbnail_re = re.compile(r'thumbnailUrl":\s+"([^"]+)')
for match in thumbnail_re.findall(data):
    print(match)

Outputs are identical:

Using roughly your approach ...
https://tse2.mm.bing.net/th?id=OIP.Me19820ab68b4bcc7ec82756b2b5ecffbo1&pid=Api
https://tse3.mm.bing.net/th?id=OIP.Mcbb568859281f5bc7a7f64d8c58d4895H1&pid=Api
https://tse3.mm.bing.net/th?id=OIP.Mb6ca65eda578c80e71f4c3b3193c5b67H1&pid=Api
https://tse2.mm.bing.net/th?id=OIP.M65fe5bf16283dc466e93650fbaef1205o1&pid=Api
https://tse3.mm.bing.net/th?id=OIP.M5d9fb4d528228cb5c8b9748bff10365bo1&pid=Api
Using a regular expression ...
https://tse2.mm.bing.net/th?id=OIP.Me19820ab68b4bcc7ec82756b2b5ecffbo1&pid=Api
https://tse3.mm.bing.net/th?id=OIP.Mcbb568859281f5bc7a7f64d8c58d4895H1&pid=Api
https://tse3.mm.bing.net/th?id=OIP.Mb6ca65eda578c80e71f4c3b3193c5b67H1&pid=Api
https://tse2.mm.bing.net/th?id=OIP.M65fe5bf16283dc466e93650fbaef1205o1&pid=Api
https://tse3.mm.bing.net/th?id=OIP.M5d9fb4d528228cb5c8b9748bff10365bo1&pid=Api
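The regular expression above matches only after the Python string literal has turned `\"` into `"`. If `data.txt` really contains the backslashes shown in the question (for example `thumbnailUrl\": \"https:\\/\\/...`), which is an assumption here, a pattern that tolerates optional backslashes before each quote may be needed. This is a minimal sketch; `extract_thumbnail_urls` and `urls_from_file` are hypothetical names, and the UTF-8 encoding is assumed.

```python
import re

# Allows zero or more backslashes before each quote, so it matches both
#   "thumbnailUrl": "https://..."   and   \"thumbnailUrl\": \"https:\\/\\/...\"
THUMBNAIL_RE = re.compile(r'thumbnailUrl\\*"\s*:\s*\\*"([^"]*?)\\*"')


def extract_thumbnail_urls(text):
    """Return every thumbnailUrl value in text with backslash escapes removed."""
    # Dropping all backslashes turns https:\\/\\/host\\/th into https://host/th
    return [m.replace('\\', '') for m in THUMBNAIL_RE.findall(text)]


def urls_from_file(path):
    """Read the whole file and extract its thumbnail URLs."""
    with open(path, encoding='utf-8') as f:
        return extract_thumbnail_urls(f.read())


# Sample in the escaped form shown in the question (the raw string keeps the backslashes)
sample = r'\"thumbnailUrl\": \"https:\\/\\/tse2.mm.bing.net\\/th?id=OIP.Me19820ab68b4bcc7ec82756b2b5ecffbo1&pid=Api\", \"datePublished\": \"2011-07-08T12:00:00\"'
for url in extract_thumbnail_urls(sample):
    print(url)
```

Calling `urls_from_file('data.txt')` would then return the list of URLs for the whole file, since `\s*` also skips any line breaks between the key and its value.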