I want to extract specific links from a website.

The links look like this:

<a href="1494761,offer-mercedes-used.html">

The links always have the same form, except for the brand name (mercedes in this case).

This works so far, but it only returns the first part of the link:

preg_match_all('/((\d{7}),offer-)/s',$inhalt,$results);

And this returns the first link together with the rest of the page after it :(

preg_match_all('/((\d{7}).*html)/s',$inhalt,$results);

Any ideas?

Note that I use preg_match_all() and not preg_match().

Thanks, Chama

2 Answers

While the lazy quantifier .*? would work (it matches as little as possible), in both cases you should specify a more precise pattern.

Here [\w.-]+ would do. But [^">]+ might also be feasible, if the HTML source is consistent (or you specifically wish to ignore other variations).

preg_match_all('/((\d{7}),offer-[\w.-]+)/s',$inhalt,$results);
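
To make the difference between the greedy, lazy and precise patterns concrete, here is a minimal sketch. The sample string in $inhalt and the variable names $greedy, $lazy and $precise are made up for illustration; only the patterns come from the question and this answer.

```php
<?php
// Hypothetical sample input: two offer links inside some markup.
$inhalt = '<p><a href="1494761,offer-mercedes-used.html">A</a> '
        . '<a href="1494762,offer-bmw-used.html">B</a></p>';

// Greedy: .* runs on to the LAST "html" in the input, so both links
// end up inside one single oversized match.
preg_match_all('/\d{7}.*html/s', $inhalt, $greedy);

// Lazy: .*? stops at the first "html" after each seven-digit number.
preg_match_all('/\d{7}.*?html/s', $inhalt, $lazy);

// Precise: only word characters, dots and hyphens may follow "offer-",
// so the match cannot run past the closing quote of the attribute.
preg_match_all('/(\d{7}),offer-[\w.-]+/', $inhalt, $precise);

print_r($greedy[0]);  // one match spanning both links
print_r($lazy[0]);    // two matches, one per link
print_r($precise[0]); // two matches, the full link targets
print_r($precise[1]); // the seven-digit IDs from capture group 1
```

The precise variant is usually the safer choice, because it fails to match instead of silently swallowing unrelated markup when the page layout changes.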

Trying to parse XML/HTML with regex generally isn't a good idea, but if you're sure the markup will always be formatted consistently, this should capture the href of every link in the content.

/<a href="([^">]+)">/
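
As a rough sketch of how that generic pattern behaves with preg_match_all(), assuming a made-up sample string in $inhalt (the second link is only there to show that every href is captured, not just the offer links):

```php
<?php
// Hypothetical sample input with one offer link and one unrelated link.
$inhalt = '<a href="1494761,offer-mercedes-used.html">A</a>'
        . '<a href="/about.html">B</a>';

preg_match_all('/<a href="([^">]+)">/', $inhalt, $results);

// $results[0] holds the full matches including the <a href="..."> tag,
// $results[1] holds only the captured href values.
print_r($results[1]);
```

Note that this only matches tags written exactly as <a href="...">. A link with an extra attribute before or after href, or with single quotes, would be skipped.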

This one matches the example pattern you gave more strictly, but I'm not sure what variations you might have:

/<a href="([0-9]{7},offer-[a-z]+-used\.html)">/
// [7 digits],offer-[one or more lowercase letters]-used.html
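
If regex on raw HTML feels too fragile, one possible alternative (not something the question asked for) is to let PHP's DOM extension find the links and use the strict pattern only to filter the href values. This is a sketch under assumptions: the sample markup, the $offers array and the extra capture groups for ID and brand are made up for illustration.

```php
<?php
// Hypothetical sample page: two offer links and one unrelated link.
$inhalt = '<html><body>'
        . '<a href="1494761,offer-mercedes-used.html">Mercedes</a>'
        . '<a href="/contact.html">Contact</a>'
        . '<a class="x" href="1494762,offer-bmw-used.html">BMW</a>'
        . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($inhalt);
libxml_clear_errors();

$offers = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Same shape as the strict pattern above, anchored to the whole
    // attribute value; groups 1 and 2 capture the ID and the brand.
    if (preg_match('/^([0-9]{7}),offer-([a-z]+)-used\.html$/', $href, $m)) {
        $offers[] = ['href' => $href, 'id' => $m[1], 'brand' => $m[2]];
    }
}

print_r($offers);
```

Because the parser handles the tag syntax, attribute order and quoting style no longer matter; the regex only has to describe the link target itself.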
