PHP - Extract data from string with regex

Question

I need help with the following. I have a string like this:

<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>

I need to extract the value of the fileName parameter. How can I do this?

I think this is possible with a regex, but I do not know regex well.

Thanks!

Mandatory link to read (twice): stackoverflow.com/a/1732454/393701
– SirDarius, Commented Feb 13, 2014 at 10:36
@SirDarius Did you read it (twice)? And did you read the question? Do you think he wants to write an HTML parser, or does he have a clearly definable problem that can easily be solved with a quick regex? It's tiring and annoying to read this piece thrown in over and over again where it is absolutely unfitting.
– Jonny 5, Commented Feb 13, 2014 at 11:11
@Jonny5 This link has an obvious value, if only for its humorous stance. The problem I have with this specific question lies in its title: "Extract data from string with regex". The question can be solved with a regular expression, but the title assumes that this is the best way to do so and that no other solution should even be considered. The input string here is HTML, so it is probably better to properly locate the content attribute first, and then use a regexp on the attribute value only.
– SirDarius, Commented Feb 13, 2014 at 11:30

user3064914 · Accepted Answer · 2014-02-13 11:53:03Z

1

Try this. The following pattern captures the file name:

/fileName=(.+?)\"/

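<?php
// Single-quoted so the double quotes inside the HTML need no escaping.
$subject = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';
// Lazy quantifier (.+?) stops at the first double quote after the value.
$pattern = '/fileName=(.+?)"/';
preg_match($pattern, $subject, $matches);
// $matches[0] is the full match, $matches[1] is the captured file name.
print_r($matches);
?>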

$matches[1] contains the file name (the first capture group, $1).

demo

edited Feb 13, 2014 at 11:53

answered Feb 13, 2014 at 10:29

user3064914

959 reputation · 1 gold badge · 7 silver badges · 18 bronze badges

5 Comments

carlo9987 Over a year ago

This works, but the output includes the end of the tag (">). How can I remove it? This is the output: somename.pdf">

user3064914 Over a year ago

$1 will contain somename.pdf; see the demo.

carlo9987 Over a year ago

With $1 I have the same problem: the file name followed by ">

carlo9987 Over a year ago

I extracted the file name without the extension (I do not need the extension) with this: '/fileName=(.+).pdf/'. Thank you very much!

Jonny 5 Over a year ago

By default, quantifiers are greedy. To make them lazy (ungreedy), add a ? after the quantifier, e.g. (.*?) or (.+?), so that they consume as few characters as possible before reaching the ". Alternatively, you could use the U (PCRE_UNGREEDY) modifier.
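
To illustrate the greedy versus lazy difference described in this comment, here is a minimal sketch. The input string is hypothetical: it places a second quoted attribute (data-x, invented for illustration) after the file name, on the assumption that a later double quote in the real input is what produced the trailing "> reported above. The variable names are also illustrative.

```php
<?php
// Hypothetical input: another attribute follows, so a later " exists.
$html = '<meta content="5;url=/file/xslt/download/?fileName=somename.pdf" data-x="1">';

// Greedy: (.+) extends to the LAST double quote in the string.
preg_match('/fileName=(.+)"/', $html, $greedy);

// Lazy: (.+?) stops at the FIRST double quote after the value.
preg_match('/fileName=(.+?)"/', $html, $lazy);

// Negated character class: cannot cross a quote, so greediness does not matter.
preg_match('/fileName=([^"]+)/', $html, $negated);

echo $greedy[1], "\n";  // somename.pdf" data-x="1
echo $lazy[1], "\n";    // somename.pdf
echo $negated[1], "\n"; // somename.pdf
```

The negated class is often the more predictable choice here, since it states directly that the value ends at the closing quote.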

JamesG · Accepted Answer · 2014-02-13 10:22:34Z

0

Try something along the lines of:

$str = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';

preg_match('@fileName=(.*)"@', $str, $matches);

print_r($matches);

answered Feb 13, 2014 at 10:22

JamesG

1,718 reputation · 2 gold badges · 28 silver badges · 42 bronze badges

Mahmoud.Eskandari · Accepted Answer · 2014-02-13 10:25:16Z

0

PHP Simple HTML DOM is a clean way to traverse HTML and find elements using selectors similar to jQuery's.
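
PHP Simple HTML DOM is a third-party library, so its API is not shown here. As a rough sketch of the same parse-first idea, the following swaps in PHP's built-in DOMDocument instead, assuming the dom extension is enabled and PHP 7 or later. It locates the meta refresh tag, reads its content attribute, and then parses the URL's query string. The variable names are illustrative and not taken from the answer.

```php
<?php
$html = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$fileName = null;
foreach ($doc->getElementsByTagName('meta') as $meta) {
    // Skip meta tags that are not the refresh directive.
    if (strtolower($meta->getAttribute('http-equiv')) !== 'refresh') {
        continue;
    }
    // content looks like "5;url=/file/xslt/download/?fileName=somename.pdf"
    $content = $meta->getAttribute('content');
    if (preg_match('/url=(.+)$/i', $content, $m)) {
        // Extract the query string, then decode it into an array.
        $query = parse_url(trim($m[1]), PHP_URL_QUERY);
        parse_str((string) $query, $params);
        $fileName = $params['fileName'] ?? null;
    }
    break; // only the first refresh tag is considered
}

echo $fileName, "\n"; // somename.pdf
```

The regex here only has to split the short content attribute value, and parse_str handles URL decoding of the parameter.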

answered Feb 13, 2014 at 10:25

Mahmoud.Eskandari

1,478 reputation · 3 gold badges · 22 silver badges · 37 bronze badges
