Extract text between two tags using regex in Ruby

Question

Let's say I have this string, which contains an HTML <a> tag:

<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>

How do I use a regex in Ruby to extract the text "Berlin-Treptow-Köpenick"?

Thanks! :)

Possible duplicate of RegEx match open tags except XHTML self-contained tags – Holger Just, Commented Nov 29, 2015 at 21:19
You should specify the extraction rule. For example, it appears from the example that it is the text composed of alphanumeric characters and '-' following the character '>', but the reader cannot determine whether that would always be the case. Also, when you give an example, it is helpful to assign all input objects to variables (e.g., str = "<a href...") so that readers can refer to those variables in answers and comments without having to define them. – Cary Swoveland, Commented Nov 29, 2015 at 22:04
I know this question is pretty old, but I think it's still worth noting: your title clearly states that you want to extract text from between two tags, but the question does not. Furthermore, you don't specify what those tags are. – user16452228, Commented Jan 5, 2022 at 17:14

spickermann · Accepted Answer · 2022-01-04 10:56:46Z

4

You can use:

html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

html[/>(.*)</, 1]
#=> "Berlin-Treptow-Köpenick"
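To explain how html[/>(.*)</, 1] works: String#[] accepts a regex plus a capture-group index and returns that group's text from the first match, or nil if nothing matches. The sketch below writes the same lookup out with String#match, then contrasts the greedy .* with a lazy .*?, which starts to matter once the string holds more than one tag. The two_links input is a made-up example, not from the question.

```ruby
html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

# String#[](regexp, capture) returns capture group 1 of the first match, or nil.
short_form = html[/>(.*)</, 1]

# The same lookup written out with String#match and MatchData#[].
m = html.match(/>(.*)</)
long_form = m && m[1]

# Hypothetical input with two links, to show greedy vs. lazy matching.
two_links = '<a href="x.html">First</a> <a href="y.html">Second</a>'

# Greedy .* runs to the LAST '<', so it swallows the markup in between.
greedy = two_links[/>(.*)</, 1]

# Lazy .*? stops at the FIRST '<' after the first '>'.
lazy = two_links[/>(.*?)</, 1]

puts short_form # Berlin-Treptow-Köpenick
puts long_form  # Berlin-Treptow-Köpenick
puts greedy     # First</a> <a href="y.html">Second
puts lazy       # First
```

For a single tag both forms give the same result; for anything with several tags, the lazy form (or a real parser) is the safer choice.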

When your HTML fragments are more complex, I recommend using a library like Nokogiri:

html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

require 'nokogiri'

Nokogiri::HTML(html).text
#=> "Berlin-Treptow-Köpenick"

edited Jan 4, 2022 at 10:56

answered Nov 29, 2015 at 21:21

spickermann


1 Comment

Andrew Zelenets Over a year ago

This is awesome, but it looks like magic :) Could you please point to the docs or describe how it works?

Cary Swoveland · Accepted Answer · 2015-11-30 06:47:36Z

2

I have assumed that the string to be extracted consists of alphanumeric characters (including accented letters) and hyphens, and that it immediately follows the first instance of the character '>'.

string =
'<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

r = /
    (?<=\>)       # match '>' in a positive lookbehind
    [\p{Alnum}-]+ # match one or more alphanumeric characters and hyphens
    /x            # extended or free-spacing mode

string[r] #=> "Berlin-Treptow-Köpenick"

Note that /[A-Za-z0-9]/ does not match accented characters such as 'ö'.

Alternatively, one can use the POSIX syntax:

r = /(?<=\>)[[[:alnum:]]-]+/
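To make the point about accented letters concrete, the sketch below runs the same lookbehind with an ASCII-only class and with the two Unicode-aware forms from this answer. It assumes the source file and the string are UTF-8, which is Ruby's default.

```ruby
string = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

# ASCII-only class: stops at the first character outside A-Z, a-z, 0-9 or '-'.
ascii = string[/(?<=>)[A-Za-z0-9-]+/]

# Unicode property class: \p{Alnum} also matches letters such as 'ö'.
unicode = string[/(?<=>)[\p{Alnum}-]+/]

# POSIX bracket expression: [[:alnum:]] is Unicode-aware on UTF-8 strings.
posix = string[/(?<=>)[[:alnum:]-]+/]

puts ascii   # Berlin-Treptow-K
puts unicode # Berlin-Treptow-Köpenick
puts posix   # Berlin-Treptow-Köpenick
```

The ASCII version silently truncates the name at "K", which is easy to miss when testing only with plain-ASCII input.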

edited Nov 30, 2015 at 6:47

answered Nov 29, 2015 at 21:42

Cary Swoveland


user2931706 · Accepted Answer · 2015-11-29 21:23:27Z

1

string = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

string.scan(/<a[^>]*>(.+?)<\/a>/).flatten
#=> ["Berlin-Treptow-Köpenick"]
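Because scan returns every match, this approach also works when the string holds several links. A minimal sketch with a made-up two-link input, writing [a] simply as a (both match the same single character). With one capture group, scan yields an array of one-element arrays, which is why flatten is applied. It assumes links are not nested and that no '>' appears inside attribute values.

```ruby
# Hypothetical input containing two links.
links = '<a href="a.html">Mitte</a>, <a href="b.html" class="small_link">Berlin-Treptow-Köpenick</a>'

# Opening <a ...> tag, lazily captured text, closing </a>.
pattern = /<a[^>]*>(.+?)<\/a>/

# With one capture group, scan returns [["Mitte"], ["Berlin-Treptow-Köpenick"]].
nested = links.scan(pattern)

# flatten turns that into a plain array of link texts.
texts = nested.flatten

puts texts.join(", ") # Mitte, Berlin-Treptow-Köpenick
```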

answered Nov 29, 2015 at 21:23

user2931706


user18470550 · Accepted Answer · 2022-03-15 08:55:10Z

0

ActionController::Base.helpers.strip_tags(html)

This helper returns only the text, with all tags removed. For example, given

html = "<a href=\" https://something.com/\"></a> <del>this</del> works</strong"

it returns "this works".
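strip_tags is only available when Rails is loaded. Outside Rails, a rough stand-in is to delete everything that looks like a tag with a regex. This is a naive approximation, not what strip_tags does internally; the method name naive_strip_tags is hypothetical, and the sketch assumes no '>' appears inside attribute values and does not decode HTML entities such as &amp;.

```ruby
html = '<a href="abgeordnete-1128-0----w8397.html" class="small_link">Berlin-Treptow-Köpenick</a>'

# Hypothetical helper: remove every "<...>" run. Assumes no '>' inside
# attribute values; entities like &amp; are left undecoded.
def naive_strip_tags(markup)
  markup.gsub(/<[^>]*>/, '')
end

puts naive_strip_tags(html) # Berlin-Treptow-Köpenick
```

Unlike the Rails helper, this regex leaves an unterminated tag such as the trailing "</strong" in the example above untouched, because it never sees a closing '>'.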

answered Mar 15, 2022 at 8:55

user18470550
