I have a string in which I store book pages. It looks something like this:

///0///
Page1 Text
///1///
Page2 Text
///2///
Page3 Text
///3///

I want to extract the page texts (Page1 Text, Page2 Text, Page3 Text). Here is the regular expression I am using:

$format = "%///\d*///(.*)///\d*///%";
preg_replace_callback($format, "process_page", $text);

According to this page, I can use a character other than / as the delimiter at the start and end of the expression. So I used % to simplify my pattern, which means I don't have to escape each slash as \/

It seems okay to me, but it returns nothing. Can somebody please tell me where the problem is?

  • Why not delete every line that begins with ///? Commented Mar 8, 2011 at 16:28

3 Answers

I think preg_split might be a better option for you:

$text = '
Page1 Text
///1///
Page2 Text
///2///
Page3 Text
';

$format = "%///\d+///%";
$arr = preg_split($format, $text);

// $arr = Array
// ( 
//     [0] => Page1 Text
//
//     [1] => 
// Page2 Text
// 
//     [2] => 
// Page3 Text
// )

Each page is now in its own array element.
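As a follow-up sketch, the same split can be run on the full string from the question, including the leading ///0/// and trailing ///3/// markers. The trim and empty-element filtering below are additions for illustration, not part of the answer above, and assume that surrounding whitespace in each page is not significant:

```php
<?php
// Sample input in the question's format, with leading and trailing markers.
$text = "///0///\nPage1 Text\n///1///\nPage2 Text\n///2///\nPage3 Text\n///3///\n";

// Split on the page markers.
$parts = preg_split('%///\d+///%', $text);

// Trim whitespace from each piece, drop the empty pieces left before the
// first marker and after the last one, then reindex from 0.
$pages = array_values(array_filter(array_map('trim', $parts), 'strlen'));

print_r($pages);
```

With this input, $pages should hold the three page texts and nothing else.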

1 Comment

Works like a charm. I had been playing with this for more than 10 hours with no result. preg_split is a fantastic function that I was unfortunately not familiar with. Thanks!

I think you need the s modifier: $format = "%///\d*///(.*)///\d*///%s";

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
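A small sketch of the effect, using a shortened version of the question's string (the variable names are only for illustration). Without the modifier the dot cannot cross the newline after the first marker, so nothing matches. With it, the pattern matches, but note that the greedy (.*) then runs up to the last marker, so the single capture also contains the markers in between:

```php
<?php
$text = "///0///\nPage1 Text\n///1///\nPage2 Text\n///2///\n";

// Without s: "." stops at newlines, so the pattern cannot span a page.
$without = preg_match('%///\d*///(.*)///\d*///%', $text, $m1);

// With s: "." also matches newlines, so the pattern can match.
$with = preg_match('%///\d*///(.*)///\d*///%s', $text, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(1)

// The greedy capture spans everything between the first and last marker,
// so it still includes the ///1/// marker.
var_dump(strpos($m2[1], '///1///') !== false); // bool(true)
```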

I'm not sure what you're trying to do, but personally I wouldn't use a regex for this. You know the exact string to look for (e.g. ///4///) and, from there, the end string (///5/// or end of file). A simple substr with strpos might be a better option.
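A minimal sketch of that idea follows. The helper name page_text and its signature are hypothetical, and it assumes markers are numbered consecutively and never appear inside the page text itself:

```php
<?php
// Hypothetical helper: returns the text between ///$n/// and the next
// marker ///($n+1)///, or up to the end of the string if there is no
// next marker. Returns null when the opening marker is absent.
function page_text(string $text, int $n): ?string
{
    $open  = "///{$n}///";
    $start = strpos($text, $open);
    if ($start === false) {
        return null; // opening marker not present
    }
    $start += strlen($open);

    // Look for the next marker only after the opening one.
    $end = strpos($text, '///' . ($n + 1) . '///', $start);

    $chunk = ($end === false)
        ? substr($text, $start)
        : substr($text, $start, $end - $start);

    return trim($chunk);
}

$text = "///0///\nPage1 Text\n///1///\nPage2 Text\n///2///\nPage3 Text\n///3///\n";
echo page_text($text, 1), "\n"; // prints "Page2 Text"
```

This fetches one page at a time by number, which fits when a single page is needed rather than the whole list.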

2 Comments

Actually, a book can contain up to 1000 pages. A regular expression is much easier.
@Mani, what do you mean by "easier"? Is it faster?

I would use something like preg_split (see Tim Cooper's answer).

But for your RegEx, try this:

$format = "%///\d+///(.*?)(?=///\d+///)%s";

This uses a lookahead assertion and the s modifier.
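As a sketch of how this pattern could be applied, preg_match_all collects every page in one call. The lazy (.*?) stops at the first following marker, and because the lookahead does not consume that marker, it is still available as the start of the next match. The sample string and the trim step are assumptions added for illustration:

```php
<?php
$text = "///0///\nPage1 Text\n///1///\nPage2 Text\n///2///\nPage3 Text\n///3///\n";

// Lazy capture up to (but not including) the next marker.
$format = '%///\d+///(.*?)(?=///\d+///)%s';

preg_match_all($format, $text, $matches);

// $matches[1] holds the captured page texts; strip the surrounding newlines.
$pages = array_map('trim', $matches[1]);

print_r($pages);
```

Text after the last marker is not captured, since no marker follows it for the lookahead to find.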

1 Comment

I am a newbie at regular expressions. I learned about assertions recently, but I forgot to use one here. Thanks for the tip.
