
If I have a string like this:

$str = '[tr]Kapadokya[/tr][en]Cappadocia[/en][de]Test[/de]';

I want this:

$array = array(
'tr' => 'Kapadokya',
'en' => 'Cappadocia',
'de' => 'Test');

How do I do this?

  • Please keep in mind that the SO community in general likes to see a bit more effort before a question is asked: What did you try to solve the problem? Did you do some basic research? ... Commented Feb 19, 2015 at 10:35

1 Answer


With a few assumptions about the actual syntax of your BBCode-like string, the following PCRE regular expression might suffice.

<?php
$str = '[tr]Kapadokya[/tr][en]Cappadocia[/en][de]Test[/de]';

$pattern = '!
    \[
        ([^\]]+)
    \]
    (.+)
    \[
      /
        \\1
    \]
!x';

/* alternative, probably better expression (see comments)
$pattern = '!
    \[            (?# pattern starts with a literal [ )
        ([^\]]+)  (?# is followed by one or more characters other than ] - those characters are grouped as subcapture #1, see below )
    \]            (?# is followed by one literal ] )
    (             (?# capture the following characters as subcapture #2 )
      [^[]+       (?# as long as no literal [ is encountered - there must be at least one such character )
    )
    \[            (?# the closing tag starts with a literal [ )
      /           (?# followed by a literal / )
      \1          (?# followed by the same characters as between the opening [...] - that is subcapture #1 )
    \]            (?# and finally a literal ] )
!x';     // the x modifier allows us to make the pattern easier to read because literal white spaces are ignored
*/

preg_match_all($pattern, $str, $matches);
var_export($matches);

prints

array (
  0 => 
  array (
    0 => '[tr]Kapadokya[/tr]',
    1 => '[en]Cappadocia[/en]',
    2 => '[de]Test[/de]',
  ),
  1 => 
  array (
    0 => 'tr',
    1 => 'en',
    2 => 'de',
  ),
  2 => 
  array (
    0 => 'Kapadokya',
    1 => 'Cappadocia',
    2 => 'Test',
  ),
)
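The answer stops at the raw $matches array. The associative array from the question can then be built by pairing subcapture #1 (the tag names) with subcapture #2 (the enclosed texts). A minimal sketch, assuming each tag name occurs only once in the string (with a repeated tag, array_combine keeps only the last value for that key), and using the [^[]+ variant of the pattern written on one line without the x modifier:

```php
<?php
$str = '[tr]Kapadokya[/tr][en]Cappadocia[/en][de]Test[/de]';

// Same as the "alternative" pattern above: [^[]+ stops at the next literal [
$pattern = '!\[([^\]]+)\]([^[]+)\[/\1\]!';

preg_match_all($pattern, $str, $matches);

// $matches[1] holds the tag names, $matches[2] the enclosed texts;
// array_combine() uses the first array as keys and the second as values
$array = array_combine($matches[1], $matches[2]);

var_export($array);
```

For the string from the question this yields 'tr' => 'Kapadokya', 'en' => 'Cappadocia' and 'de' => 'Test', i.e. exactly the array asked for.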

see also: http://docs.php.net/pcre


4 Comments

There is no need to double-escape the backreference, and you should change .+ to [^[]+, or at least to .+?, to limit backtracking (and to cope with several translations on the same line).
a) Yes, probably; I never would, though, because I was raised as a C developer, where it was (back in those days, sigh :D) an error. But you're right. b) No, I don't want to; that's the liberty I put into "a few assumptions" ;-) But yes, using [^[]+ can easily be considered better.
We indeed have little information about the exact syntax of this BBCode-like format, but it seems to consist of consecutive translations of the same term, so I assumed in turn that these tags can't contain nested tags. But the future is open to all possibilities.
And your points are completely valid and have therefore been incorporated into the answer.
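The difference discussed in these comments shows up as soon as the same tag occurs twice in one string. The sketch below runs the three variants mentioned (greedy .+, lazy .+?, and [^[]+) over a hypothetical input with a repeated [en] tag; the patterns are written on one line without the x modifier, and the variable names are illustrative only:

```php
<?php
// Hypothetical input: the same tag appears twice on one line
$str = '[en]first[/en][en]second[/en]';

$patterns = [
    'greedy' => '!\[([^\]]+)\](.+)\[/\1\]!',    // original pattern
    'lazy'   => '!\[([^\]]+)\](.+?)\[/\1\]!',   // .+? variant from the comments
    'class'  => '!\[([^\]]+)\]([^[]+)\[/\1\]!', // [^[]+ alternative
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $str, $matches);
    // $matches[2] holds the text captured between opening and closing tag
    $results[$name] = $matches[2];
    echo $name . ': ' . implode(' | ', $matches[2]) . "\n";
}
```

The greedy version captures first[/en][en]second as a single match, because .+ runs to the end of the line and only backtracks as far as the last [/en]. Both other variants return first and second separately. Note that [^[]+ additionally requires the texts to contain no [ at all, which is the no-nesting assumption mentioned above.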
