regex string between two strings

Question

How can I get the text between two constant strings?

Example:

<rate curr="KRW" unit="100">19,94</rate>

19,94 is between "<rate curr="KRW" unit="100">" and "</rate>".

Other example: in ABCDEF, the substring between AB and EF is CD.
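For reference, here is a minimal Java sketch of the basic idea applied to both examples. It writes the opening and closing texts literally and puts a lazy capture group (.*?) between them. None of <, >, =, " or / is a regex metacharacter in this position, so only Java's own \" string escaping is needed. The class and method names are illustrative, not from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionExamples {
    // Returns capture group 1 of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Literal start text + lazy (.*?) + literal end text.
        String rate = firstGroup("<rate curr=\"KRW\" unit=\"100\">(.*?)</rate>",
                "<rate curr=\"KRW\" unit=\"100\">19,94</rate>");
        String abef = firstGroup("AB(.*?)EF", "ABCDEF");
        System.out.println(rate); // 19,94
        System.out.println(abef); // CD
    }
}
```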

he com̡e̶s – fredley, Jan 30, 2012 at 12:45

What language/tool are you using? – Qtax, Jan 30, 2012 at 12:48

hsz · Accepted Answer · 2012-01-30 12:46:21Z

5

Try with:

/<rate[^>]*>(.*?)<\/rate>/

However, it is better NOT to use regex to parse HTML.
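A Java sketch of this pattern, for readers not using a /.../-delimited language: Java has no delimiters, so the slash in </rate> needs no escaping. The [^>]* part lets the opening tag carry any attributes; the second <rate> element in the sample input is made up purely to show that. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnyRate {
    // <rate[^>]*> matches an opening tag with any attributes;
    // (.*?) lazily captures up to the first following </rate>.
    private static final Pattern RATE = Pattern.compile("<rate[^>]*>(.*?)</rate>");

    // Collects the captured text of every <rate ...>...</rate> in the input, in order.
    static List<String> rates(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RATE.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The USD element is hypothetical sample data with different attributes.
        String xml = "<rate curr=\"KRW\" unit=\"100\">19,94</rate>"
                + "<rate curr=\"USD\" unit=\"1\">1,23</rate>";
        System.out.println(rates(xml)); // [19,94, 1,23]
    }
}
```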

answered Jan 30, 2012 at 12:46

hsz

1 Comment

user669677 Over a year ago

I'm expecting something like /<rate curr="KRW" unit="100">(.*?)</rate>, not the general form, even though the general one may work.

Will Yu · Accepted Answer · 2013-04-24 15:44:02Z

2

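The way I do it is to match all occurrences with Regex.Matches, where result holds the input text:

using System.Text.RegularExpressions;
MatchCollection matched = Regex.Matches(result, @"(?<=<rate curr=""KRW"" unit=""100"">)(.*?)(?=</rate>)");

Then get the values one by one using matched[i].Groups[1].Value (matched[i].Value gives the same text, because the lookarounds keep the tags out of the match).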
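Java's regex engine also supports lookbehind as long as its length is bounded, which holds for a fixed literal like this one, so the same lookaround pattern carries over almost unchanged. A sketch with illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundRates {
    // (?<=...) requires the opening tag just before the match and (?=...) requires
    // </rate> just after it; neither delimiter becomes part of the match itself.
    private static final Pattern BETWEEN =
            Pattern.compile("(?<=<rate curr=\"KRW\" unit=\"100\">)(.*?)(?=</rate>)");

    static List<String> values(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = BETWEEN.matcher(input);
        while (m.find()) {
            // group() is the whole match; here it equals group(1) because the tags are not consumed.
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<rate curr=\"KRW\" unit=\"100\">19,94</rate>"
                + "<rate curr=\"KRW\" unit=\"100\">19,96</rate>";
        System.out.println(values(input)); // [19,94, 19,96]
    }
}
```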

answered Apr 24, 2013 at 15:44

Will Yu

Oliver · Accepted Answer · 2012-01-30 12:48:37Z

1

If you're analyzing HTML, you're probably better off using JavaScript and the DOM (for example, the .innerHTML property). Regex is a bit overkill.
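For comparison, a Java sketch of the parser route using the JDK's built-in DOM API (javax.xml.parsers). It assumes the input is well-formed XML, which arbitrary HTML often is not; the <rates> root element is a hypothetical wrapper added so the fragment parses as a document. getTextContent() returns the text inside the element. Names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomRates {
    // Returns the text of the first <rate> element whose curr attribute equals currency,
    // or null if there is none. Throws if xml is not well-formed.
    static String rateFor(String xml, String currency) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList rates = doc.getElementsByTagName("rate");
        for (int i = 0; i < rates.getLength(); i++) {
            Element rate = (Element) rates.item(i);
            if (currency.equals(rate.getAttribute("curr"))) {
                return rate.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical <rates> root so the fragment is a complete XML document.
        String xml = "<rates><rate curr=\"KRW\" unit=\"100\">19,94</rate></rates>";
        System.out.println(rateFor(xml, "KRW")); // 19,94
    }
}
```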

answered Jan 30, 2012 at 12:48

Oliver

1 Comment

cmbuckley Over a year ago

+1 and I would express the same sentiment with PHP and strip_tags.

Prashant Bhate · Accepted Answer · 2012-01-30 14:58:56Z

If you want a generic solution, i.e. finding a string between two arbitrary strings, you can use Pattern.quote() (or wrap each string in \Q and \E) to quote the start and end strings, and use (.*?) for a non-greedy match.

See an example of its use in the snippet below:

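import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test; // JUnit 4

public class QuoteTextTest {

    @Test
    public void quoteText(){
        String str1 = "<rate curr=\"KRW\" unit=\"100\">";
        String str2 = "</rate>";

        String input = "<rate curr=\"KRW\" unit=\"100\">19,94</rate>"
                          +"<rate curr=\"KRW\" unit=\"100\"></rate>"
                          +"<rate curr=\"KRW\" unit=\"100\">19,96</rate>";

        // Quote both delimiters so they match literally; (.*?) lazily captures the text between them
        String regex = Pattern.quote(str1)+"(.*?)"+Pattern.quote(str2);
        System.out.println("regex:"+regex);

        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        // find() advances through each non-overlapping match in turn
        while(m.find()){
            String group = m.group(1);
            System.out.println("--"+group);
        }
    }
}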

Output

regex:\Q<rate curr="KRW" unit="100">\E(.*?)\Q</rate>\E
--19,94
--
--19,96

Note: though it's not recommended to use regex to parse entire HTML documents, I think there is no harm in a deliberate use of regex when treating HTML as plain text.
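The same approach can be wrapped in a small reusable helper. This is a sketch with a hypothetical name (between); it is not part of any library. Pattern.quote() keeps regex metacharacters in the delimiters, such as ( ) [ ] . * + ? $, from being interpreted as regex syntax. Pattern.DOTALL is an added design choice that lets (.*?) span line breaks; drop it if matches should stay on one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Between {
    // Returns every substring found between start and end, scanning left to right.
    // Both delimiters are matched literally; (.*?) is non-greedy, so each capture
    // stops at the first end that follows its start.
    static List<String> between(String input, String start, String end) {
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end), Pattern.DOTALL);
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(between("ABCDEF", "AB", "EF"));     // [CD]
        System.out.println(between("f(x) + f(y)", "f(", ")")); // [x, y]
        // With DOTALL a single match can cross the line break.
        System.out.println(between("<b>a\nb</b>", "<b>", "</b>").size()); // 1
    }
}
```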

Joshua Pinter · Accepted Answer · 2013-09-13 20:45:36Z

0

The simple regex you're looking for is:

(?<=<rate curr=\"KRW\" unit=\"100\">)(.*?)(?=</rate>)

In Ruby, for example, this would translate to:

string = '<rate curr="KRW" unit="100">19,94</rate>'

string.match("(?<=<rate curr=\"KRW\" unit=\"100\">)(.*?)(?=</rate>)").to_s
# => "19,94"

Thanks to Will Yu.

answered Sep 13, 2013 at 20:45

Joshua Pinter

Alexandros · Accepted Answer · 2012-01-30 13:03:47Z

-1

I suggest that you use an HTML parser. The grammar that defines HTML is a context-free grammar, which is fundamentally too complex to be parsed by regular expressions. Even if you manage to write a regular expression that achieves what you want, it will probably fail on some corner cases.

For instance, what if you are expected to parse the following HTML?

<rate curr="KRW" unit="100"><rate curr="KRW" unit="100">19,94</rate></rate>

A regular expression may not handle this corner case properly.
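To make this concrete, here is a Java sketch that runs the lazy pattern from the first answer, <rate[^>]*>(.*?)</rate>, on that nested input. The match begins at the outer opening tag but stops at the first (inner) closing tag, so the capture still contains the inner opening tag. Names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedRate {
    // Any <rate ...> opening tag, then lazily up to the first </rate>.
    private static final Pattern RATE = Pattern.compile("<rate[^>]*>(.*?)</rate>");

    // Returns capture group 1 of the first match, or null if there is none.
    static String firstCapture(String input) {
        Matcher m = RATE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String nested = "<rate curr=\"KRW\" unit=\"100\">"
                + "<rate curr=\"KRW\" unit=\"100\">19,94</rate></rate>";
        System.out.println(firstCapture(nested)); // <rate curr="KRW" unit="100">19,94
    }
}
```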

answered Jan 30, 2012 at 13:03

Alexandros

1 Comment

user669677 Over a year ago

It will never look like that in this case, and if it did, I don't care. I just want to know how to use a regexp to find the text between two strings.
