0

I need help with the following. I have a string like this:

<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>

I need to extract the fileName parameter. How can I do this?

I think this is possible with a regex, but I do not know regular expressions well.

Thanks!

3
  • Mandatory link to read (twice): stackoverflow.com/a/1732454/393701 Commented Feb 13, 2014 at 10:36
  • 1
    @SirDarius Did you read it (twice)? And did you read the question? Do you think he wants to write an HTML parser, or does he have a clearly definable problem that can easily be solved with a quick regex? It is tiring and annoying to see this link thrown in over and over again where it does not fit at all. Commented Feb 13, 2014 at 11:11
  • @Jonny5 This link has an obvious value, if only for its humorous tone. The problem I have with this specific question lies in its title, "Extract data from string with regex". The question can be solved with a regular expression, but the title clearly assumes that this is the best way to do so and that no other solution should even be considered. The input string here is HTML, so it is probably better to properly locate the content attribute first, and then use a regexp on the attribute value only. Commented Feb 13, 2014 at 11:30

3 Answers

1

Try this. The following pattern captures the file name in its first group:

/fileName=(.+?)"/

<?php
// Single quotes, so the double quotes inside the HTML need no escaping.
$subject = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';
// Lazy quantifier: stop at the first " after the file name.
$pattern = '/fileName=(.+?)"/';
preg_match($pattern, $subject, $matches);
print_r($matches);
?>

$matches[1] (capture group $1) contains the file name.

demo


5 Comments

This works, but the output includes the end of the tag (">). How can I remove it? This is the output: somename.pdf">
$1 will contain somename.pdf; see the demo.
With $1 I still have that problem: the file name followed by ">
I extracted the file name without the extension (I do not need it) with this: '/fileName=(.+).pdf/'. Thank you very much!
By default, quantifiers are greedy. To make them lazy (ungreedy), add a ? after the quantifier, e.g. (.*?) or (.+?), so that they consume as few characters as possible before reaching the next ". Alternatively, you can use the U (PCRE_UNGREEDY) modifier.
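To illustrate the greedy versus lazy difference described in the last comment, here is a minimal sketch. The input string is hypothetical: it assumes a second quoted attribute follows the refresh tag on the same line, which is one way the trailing "> could end up in the capture.

```php
<?php
// Hypothetical input: another tag with quoted attributes follows the URL.
$html = '<meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"><meta name="x" content="y">';

// Greedy: (.+) runs to the LAST double quote on the line.
preg_match('/fileName=(.+)"/', $html, $greedy);

// Lazy: (.+?) stops at the FIRST double quote.
preg_match('/fileName=(.+?)"/', $html, $lazy);

// U modifier (PCRE_UNGREEDY): quantifiers become lazy by default.
preg_match('/fileName=(.+)"/U', $html, $ungreedy);

echo $greedy[1], "\n";   // somename.pdf"><meta name="x" content="y
echo $lazy[1], "\n";     // somename.pdf
echo $ungreedy[1], "\n"; // somename.pdf
```

A negated character class such as ([^"]+) gives the same result as the lazy form here and makes the stopping condition explicit.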
0

Try something along the lines of:

$str = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';

// [^"]* stops at the first closing quote, so a trailing "> is never captured.
preg_match('@fileName=([^"]*)"@', $str, $matches);

print_r($matches);


0

PHP Simple HTML DOM is a clean and convenient way to traverse HTML and find elements using jQuery-like selectors.
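As a rough sketch of the parser-based approach, the example below uses PHP's bundled DOM extension (DOMDocument) rather than the Simple HTML DOM library itself, so it is a substitute technique and not that library's API. It assumes the dom extension is enabled and that the refresh tag's content attribute has the "seconds;url=..." form shown in the question.

```php
<?php
$html = '<!doctype html> <html> <head> <meta charset="utf-8"> <title>Formatting the report</title><meta http-equiv="refresh" content="5;url=/file/xslt/download/?fileName=somename.pdf"> </head>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$fileName = null;
foreach ($doc->getElementsByTagName('meta') as $meta) {
    // Skip every meta tag that is not the refresh tag.
    if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
        continue;
    }
    // content looks like "5;url=/file/xslt/download/?fileName=somename.pdf"
    $content = $meta->getAttribute('content');
    if (preg_match('/url=(.*)$/i', $content, $m)) {
        // Take the query string of the URL and split it into parameters.
        $query = parse_url(trim($m[1]), PHP_URL_QUERY);
        parse_str((string) $query, $params);
        $fileName = $params['fileName'] ?? null;
    }
    break;
}

echo $fileName, "\n"; // somename.pdf
```

The regular expression is applied only to the attribute value, so markup elsewhere in the document cannot affect the match.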
