I'm trying to extract the strings 201 and 202 from the HTML response below. So far I have tried the following regex:

punumber=(.+)

but the problem is that there are many instances of punumber on the page, so it also returns values that I don't need.
The strings I need are inside <h3 class="content-title">.

So can someone please help me write a regex to extract the punumber within the h3 class only?

<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/recentIssue.jsp?punumber=201">
    Title 1
    </a>
</h3>

<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/mostRecentIssue.jsp?punumber=202">
    Title 1
    </a>                                    
</h3>

2 Comments

  • @Alies Belik - Thanks for the edits, but I had deliberately left the spaces and newlines in the above code because that is what the actual HTML response looks like. Also, the number of additional newlines differs in each <h3 class="content-title"> block. Commented Jan 18, 2013 at 14:32
  • OK, if you think that's necessary, you can roll back your post to the initial version. Commented Jan 18, 2013 at 14:42

3 Answers

5

This works for me:

Reference Name : test
Regexp : punumber=([^"]+?)"

Template : $1$

Match No : -1

(this will get all values) NV_punumber

With -1, JMeter will create:

  • ${test_1} => 201

  • ${test_2} => 202
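
To see what Match No -1 does, here is a minimal sketch in Python (not JMeter itself) that applies the same pattern to the HTML from the question and numbers the results the way the extractor names its variables. The `variables` dict is only an illustration of the naming scheme; `test_matchNr` mirrors the match-count variable JMeter adds alongside the numbered ones.

```python
import re

# The HTML from the question, used as a stand-in for the sampler response.
html = '''
<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/recentIssue.jsp?punumber=201">
    Title 1
    </a>
</h3>

<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/mostRecentIssue.jsp?punumber=202">
    Title 1
    </a>
</h3>
'''

# Same pattern as in the answer: capture everything up to the closing quote.
matches = re.findall(r'punumber=([^"]+?)"', html)

# Emulate Match No = -1: one numbered variable per match, plus a count.
variables = {f"test_{i}": value for i, value in enumerate(matches, start=1)}
variables["test_matchNr"] = str(len(matches))

print(variables)
# {'test_1': '201', 'test_2': '202', 'test_matchNr': '2'}
```

Note that this pattern matches every punumber in the response, not only the ones inside <h3>.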

6 Comments

Nice one, I like the creation of separate variables. +1
Thanks @PMD UBIK-INGENIERIE - when I use punumber=([^"]+?)" I get about 77 instances of the pattern. What I want are only the ones inside the <h3> tag, and there are 25 of those.
You mean you have the same <a href="/container/mostRecentIssue.jsp?punumber=WWW"> both within <h3> and outside it, and you only want the ones within <h3>, right? Then with only a Regexp Extractor it will be rather hard. You could first extract the content within h3, then use a second Regexp Extractor that works on that variable instead of the sample response. Another option is the new JMeter 2.9 feature based on jQuery/CSS selectors; as it's not released yet, you would need a nightly build (read the build instructions). It seems more appropriate to your use case.
@PMD - I used two regex extractors, one to extract just the <h3> and the next one to extract the punumber. The first one looks like <h3 class="content-title">(.+)\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.* which extracts the whole tag content. Is there a better way to express the regex?
Did you try this: h3 class="content-title">(.+?)</h3
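
The two-step idea from these comments can be sketched in Python as follows. This is an illustration only, not a JMeter configuration: the link with punumber=999 outside any <h3> is a made-up example of the unwanted matches, and `re.DOTALL` is what lets `.` span the line breaks inside each block. In JMeter, as far as I know, the first pattern would need a `(?s)` prefix for the same effect.

```python
import re

# Hypothetical response: one link outside <h3> plus the two blocks from the question.
response = '''
<a href="/container/other.jsp?punumber=999">Elsewhere</a>

<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/recentIssue.jsp?punumber=201">
    Title 1
    </a>
</h3>

<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/mostRecentIssue.jsp?punumber=202">
    Title 1
    </a>
</h3>
'''

# Step 1: lazily capture the body of each content-title heading,
# however many newlines it contains.
blocks = re.findall(r'<h3 class="content-title">(.+?)</h3>', response, flags=re.DOTALL)

# Step 2: run the punumber pattern on each captured block only.
punumbers = [m.group(1) for block in blocks for m in re.finditer(r'punumber=(\d+)', block)]

print(punumbers)  # ['201', '202']
```

The lazy `(.+?)` replaces the chain of `\n.*` groups, so the pattern does not depend on the number of lines in each block.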
2

Here is the regex that works for me :

punumber=(\d+)

If you're parsing HTML, you should consider using something other than regex to extract the info, such as jsoup.
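
As a rough illustration of the parser-based approach, here is a sketch using Python's standard-library `html.parser` as a stand-in for jsoup (jsoup itself is a Java library and is not used here). The class name `PunumberParser`, the `page` string, and the link with punumber=999 are hypothetical; the parser collects punumber only from links that appear inside an <h3 class="content-title">.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class PunumberParser(HTMLParser):
    """Collects punumber query values from links inside content-title headings."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while inside <h3 class="content-title">
        self.punumbers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "content-title" in (attrs.get("class") or "").split():
            self.in_title = True
        elif tag == "a" and self.in_title:
            # Read the value from the parsed query string instead of a regex.
            query = parse_qs(urlparse(attrs.get("href") or "").query)
            self.punumbers.extend(query.get("punumber", []))

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_title = False


# Hypothetical page: one link outside <h3> and one inside.
page = '''
<a href="/container/other.jsp?punumber=999">Elsewhere</a>
<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/recentIssue.jsp?punumber=201">
    Title 1
    </a>
</h3>
'''

parser = PunumberParser()
parser.feed(page)
print(parser.punumbers)  # ['201']
```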

Anyway, here is a JMeter test file with a dummy sampler (with a regex post-processor) simulating your case and a debug sampler that shows the result you want.

http://pastebin.com/Uti8Pv9E

2 Comments

Your regexp is better than mine :-)
Thanks @Ant, when I use punumber=(\d+) I get about 77 instances of the pattern. What I want are only the ones inside the <h3> tag, and there are 25 of those. I will also give the jsoup stuff a shot. Thanks again.
0

In this case you can combine an XPath Extractor using a structured query (to get the href values containing punumber ONLY from links inside <h3> tags) with a Regular Expression Extractor in a ForEach Controller loop that pulls the punumber value out of each href.

. . .
YOUR HTTP REQUEST
    XPath Extractor
    Use Tidy = true
    Reference Name = punum
    XPath Query = //h3[@class="content-title"]/a[text()="Title 1"]/@href
    Default value = NOT_FOUND
ForEach Controller
Input variable prefix = punum
Output variable name = pnum
Add "_" before number = true
    User Parameters
    cnt = ${__counter(FALSE,)}
    Regular Expression Extractor
    Apply to = Jmeter Variable = pnum
    Reference Name = punumber_${cnt}
    Regular Expression = punumber=(\d+)
    Template = $1$
    Match No. = 1
    Default value = NOT_FOUND
    ...
. . .
  1. The XPath Extractor gives you the href values of all the <a> items under <h3> tags as the variables punum_1, punum_2, ..., punum_N.
  2. The ForEach Controller takes each punum_X variable in turn, exposes it as pnum, applies the Regular Expression Extractor to it to get the punumber value, and stores the extracted value as punumber_1, punumber_2, ..., punumber_N (using the counter defined in User Parameters, incremented on each iteration).

NOTE: Since the XPath Extractor is used here to parse an HTML (not XML) response, ensure that the Use Tidy (tolerant parser) option is CHECKED (in the XPath Extractor's control panel).
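
The same flow can be sketched outside JMeter in Python, purely as an illustration. It assumes the markup is already well-formed (which is what Use Tidy takes care of in JMeter), wraps it in a hypothetical <root> element, and drops the text()="Title 1" predicate for simplicity. The link with punumber=999 is a made-up example of a link outside <h3>.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment; the <root> wrapper is added for parsing only.
fragment = '''<root>
<a href="/container/other.jsp?punumber=999">Elsewhere</a>
<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/recentIssue.jsp?punumber=201">
    Title 1
    </a>
</h3>
<h3 class="content-title">
<!--  change when this is completed -->
    <a href="/container/mostRecentIssue.jsp?punumber=202">
    Title 1
    </a>
</h3>
</root>'''

root = ET.fromstring(fragment)

# Step 1 (XPath Extractor): href of every <a> directly under a content-title <h3>.
hrefs = [a.get("href") for a in root.findall(".//h3[@class='content-title']/a")]
punum = {f"punum_{i}": href for i, href in enumerate(hrefs, start=1)}

# Step 2 (ForEach Controller + Regular Expression Extractor):
# apply the regex to each href and store the result under a counter-based name.
punumber = {}
for cnt, pnum in enumerate(punum.values(), start=1):
    match = re.search(r"punumber=(\d+)", pnum)
    punumber[f"punumber_{cnt}"] = match.group(1) if match else "NOT_FOUND"

print(punumber)  # {'punumber_1': '201', 'punumber_2': '202'}
```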

The same test plan is available here: http://db.tt/dnACZtGL (I based it on @ant's from his answer, thanks to him).

1 Comment

I found using two regex extractors easy but this one worked as well.
