Let's say I have this string, which contains an HTML a tag:
<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>
How do I use a regex in Ruby to extract the text "Berlin-Treptow-Köpenick"?
Thanks! :)
You can use:
html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'
html[/>(.*)</, 1]
#=> "Berlin-Treptow-Köpenick"
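String#[] with a regexp and an integer returns that capture group, or nil if there is no match. Because .* is greedy, it runs from the first '>' to the last '<' in the string. That is harmless for a single tag, but it spans several tags when there are more. The sketch below uses a made-up two-link string (an assumption, not from the question) to compare the greedy pattern with the negated class [^<]*, and uses String#scan to collect every link text:

```ruby
single = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'
# Hypothetical two-link input, made up for illustration only
double = '<a href="x.html">Berlin-Mitte</a> <a href="y.html">Berlin-Pankow</a>'

# Greedy .* runs from the first '>' to the LAST '<' in the string
greedy = double[/>(.*)</, 1]
# => "Berlin-Mitte</a> <a href=\"y.html\">Berlin-Pankow"

# [^<]* cannot cross a '<', so the match ends at the first closing tag
first_text = double[/>([^<]*)</, 1]
# => "Berlin-Mitte"

# scan returns one array per match; flatten turns that into a list of strings
all_texts = double.scan(/<a\b[^>]*>([^<]*)<\/a>/).flatten
# => ["Berlin-Mitte", "Berlin-Pankow"]

puts single[/>([^<]*)</, 1]  # prints Berlin-Treptow-Köpenick
```

The [^<]* version still assumes the link text contains no nested tags; for anything more involved, an HTML parser is the safer choice.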
When your HTML snippets are more complex, I recommend using a library like Nokogiri:
html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'
require 'nokogiri'
Nokogiri::HTML(html).text
#=> "Berlin-Treptow-Köpenick"
I have assumed that the string to be extracted consists of alphanumeric characters (including accented letters) and hyphens, and that it immediately follows the first '>' character.
string =
'<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'
r = /
(?<=\>) # match '>' in a positive lookbehind
[\p{Alnum}-]+ # match one or more alphanumeric characters or hyphens
/x # extended or free-spacing mode
string[r] #=> "Berlin-Treptow-Köpenick"
Note that /[A-Za-z0-9]/ does not match accented characters such as 'ö'.
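To see the difference on the question's string, here is a small comparison of an ASCII-only class and the Unicode property \p{Alnum}, both behind the same lookbehind. Two small liberties are taken: the backslash before '>' is dropped (a plain '>' works in a Ruby lookbehind), and the hyphen is escaped inside the class for readability:

```ruby
string = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

ascii_only = /(?<=>)[A-Za-z0-9\-]+/  # ASCII letters, digits and hyphens only
unicode    = /(?<=>)[\p{Alnum}\-]+/  # any Unicode letter or digit, plus hyphens

p string[ascii_only]  # prints "Berlin-Treptow-K" (the match stops before 'ö')
p string[unicode]     # prints "Berlin-Treptow-Köpenick"
```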
Alternatively, one can use the POSIX syntax:
r = /(?<=\>)[[[:alnum:]]-]+/
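In Ruby, POSIX bracket expressions such as [[:alnum:]] are Unicode-aware, so they also match 'ö'. A short check that the nested form above and a flatter variant (written here as an alternative, with the hyphen escaped) extract the same text:

```ruby
string = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

nested = /(?<=>)[[[:alnum:]]\-]+/  # bracket expression nested inside an outer class
flat   = /(?<=>)[[:alnum:]\-]+/    # the same set written as a single class

p string[nested]  # prints "Berlin-Treptow-Köpenick"
p string[flat]    # prints "Berlin-Treptow-Köpenick"
```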
ActionController::Base.helpers.strip_tags(html)
This Rails helper returns only the text content. For example, with
html = "<a href=\" https://something.com/\"></a><br><strong style=\"color: red;\"><em><del>this</del></em></strong> <strong style=\"color: red;\"><em style=\"color: red;\">works</em></strong>"
strip_tags(html) returns "this works".
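Outside Rails, a rough stdlib-only approximation is to delete everything that looks like a tag with String#gsub. This is a hypothetical helper (naive_strip_tags is not a Rails API), it is not how Rails implements strip_tags, and it does not handle comments, script contents, HTML entities, or a '>' inside an attribute value. A sketch on the same markup, with the final </strong> tag closed:

```ruby
# Hypothetical helper, NOT part of Rails: removes anything shaped like <...>
def naive_strip_tags(html)
  html.gsub(/<[^>]*>/, "")
end

html = "<a href=\" https://something.com/\"></a><br><strong style=\"color: red;\"><em><del>this</del></em></strong> <strong style=\"color: red;\"><em style=\"color: red;\">works</em></strong>"

puts naive_strip_tags(html)  # prints this works
puts naive_strip_tags('<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>')
# prints Berlin-Treptow-Köpenick
```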
'-' following the character '>', but the reader cannot determine whether that will always be the case. Also, when you give an example, it is helpful to assign every input object to a variable (e.g., str = "<a href...") so that readers can refer to those variables in answers and comments without having to define them.