Python: extract text from string

Question

I'm trying to extract the search text from URL requests, but not every parsed dict contains a key with the text, and when I apply {k: v[0] for k, v in parse_qs(str).items()} to the URLs, I lose a lot of requests. So I tried str = urllib.unquote(u[0]) instead. After that I get strings like

смотреть лучше не бывает&clid=1955453&win=176
Jade+Jantzen&ie=utf-8&oe=utf-8&gws_rd=cr&ei=FQB0V9WbIoahsAH5zZGACg
как+скрыть+лопоухость&newwindow=1&biw=1366&bih=657&source=lnms&sa=X&sqi=2&pjf=1&ved=0ahUKEwju5cPJy83NAhUPKywKHVHXBesQ_AUICygA&dpr=1
смотреть лучше не бывает&clid=1955453&win=176
2&clid=1976874&win=85&msid=1467228292.64946.22901.24595&text=как выбрать смартфон
маскаи гейла&lr=10750&clid=1985551-210&win=213

And I want to get

смотреть лучше не бывает
Jade Jantzen
как скрыть лопоухость
смотреть лучше не бывает
как выбрать смартфон
маскаи гейла

Is there any way to extract that?

ElmoVanKielmo · Accepted Answer · 2016-09-26 10:29:40Z

Just split by & and take the first part:

txt = urllib.unquote(u[0]).split("&")[0]

And don't use str as a variable name - it's a built-in type name in Python.
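
For readers on Python 3, where unquote lives in urllib.parse rather than urllib, the same idea can be sketched roughly as below. The function name first_segment is made up for illustration, and the sketch assumes the wanted text is always the part before the first &.

```python
from urllib.parse import unquote


def first_segment(raw):
    """Decode percent-escapes, then keep everything before the first '&'.

    Hypothetical helper; assumes the wanted text is the first segment.
    """
    return unquote(raw).split("&")[0]


# Percent-encoded input is decoded before splitting.
print(first_segment("%D0%BC%D0%B0%D1%81%D0%BA%D0%B0&lr=10750&win=213"))
# Already-decoded input passes through unquote unchanged.
print(first_segment("смотреть лучше не бывает&clid=1955453&win=176"))
```

Note that because decoding happens before the split, an encoded ampersand (%26) inside the text would also act as a separator.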

EDIT: Unfortunately, the line 2&clid=1976874&win=85&msid=1467228292.64946.22901.24595&text=как выбрать смартфон follows a different pattern from the others, and there's no common way to handle it together with them. I was tempted to use a regex to match Cyrillic characters, but then Jade Jantzen wouldn't match. So for this one line, where the desired text is at the end, something like

txt = urllib.unquote(u[0]).split("=")[-1]

would work. Still, you didn't provide any actual criteria for the desired text. As humans, we can see how to transform what you get into what you want for this specific sample, but without clear rules for what to match, we can't provide a complete solution.

I'm aware that some (again, only some) of the lines have "+" in place of " ". This can probably be solved with .replace("+", " ").
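
One possible way to combine the pieces above, offered as a heuristic rather than a complete solution: it assumes the rule is "use the value of a text= parameter if there is one, otherwise take the part before the first &". That rule is only a guess based on the six sample lines. The name extract_query is hypothetical, and the input is assumed to be already unquoted.

```python
def extract_query(decoded):
    """Heuristic: prefer an explicit text= value, else the first '&' segment.

    Assumption (not stated in the question): these two cases cover all inputs.
    """
    parts = decoded.split("&")
    for part in parts:
        key, sep, value = part.partition("=")
        if sep and key == "text":
            return value.replace("+", " ")
    # No text= parameter found: fall back to the leading segment.
    return parts[0].replace("+", " ")


samples = [
    "смотреть лучше не бывает&clid=1955453&win=176",
    "Jade+Jantzen&ie=utf-8&oe=utf-8&gws_rd=cr&ei=FQB0V9WbIoahsAH5zZGACg",
    "2&clid=1976874&win=85&msid=1467228292.64946.22901.24595&text=как выбрать смартфон",
    "маскаи гейла&lr=10750&clid=1985551-210&win=213",
]
for line in samples:
    print(extract_query(line))
```

Replacing every "+" with a space will also alter queries that genuinely contain a plus sign, so this is only safe if such queries don't occur in the data.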

edited Sep 26, 2016 at 10:29

answered Sep 26, 2016 at 10:09


2 Comments

Petr Petrov Over a year ago

And what can you say if the string looks like 213&msid=1466344978.51184.22872.22654&text=дэрил диксон?

ElmoVanKielmo Over a year ago

I overlooked this one line. There will be no generic way to handle this one together with the others. For this one, the split should happen on = and the last part should be taken.
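
To illustrate why the two splits can't be merged into one rule, here is a small sketch comparing them on the string from the comment and on one of the lines from the question. The helper names first_part and last_value are invented for this example.

```python
def first_part(decoded):
    """Everything before the first '&'."""
    return decoded.split("&")[0]


def last_value(decoded):
    """Everything after the last '='."""
    return decoded.split("=")[-1]


comment_line = "213&msid=1466344978.51184.22872.22654&text=дэрил диксон"
question_line = "маскаи гейла&lr=10750&clid=1985551-210&win=213"

print(last_value(comment_line))   # дэрил диксон (wanted)
print(first_part(comment_line))   # 213 (not wanted)
print(last_value(question_line))  # 213 (not wanted)
print(first_part(question_line))  # маскаи гейла (wanted)
```

Each split gives the wanted text for one kind of line and a stray number for the other, so some extra criterion is needed to choose between them.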
