I have an HTML string like this:

<match id="18" srs="ICC Womens World Cup Qualifier, 2010" mchDesc="BANW vs PMGW" mnum="4th Match">

Using PHP, how can I parse this string into an accessible structure of key/value pairs, such as:

array(
    "id"=>"18", 
    "srs"=>"ICC Womens World Cup Qualifier, 2010", 
    "mchDesc"=>"BANW vs PMGW", 
    "mnum"=>"4th Match"
);

Output:

Array
(
    [id] => 18
    [srs] => ICC Womens World Cup Qualifier, 2010
    [mchDesc] => BANW vs PMGW
    [mnum] => 4th Match
)

2 Answers

Using DOMDocument and DOMAttr:

$str = '<match id="18" srs="ICC Womens World Cup Qualifier, 2010" mchDesc="BANW vs PMGW" mnum="4th Match">';
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($str);

$result = [];

foreach($dom->getElementsByTagName('match')->item(0)->attributes as $attr) {
    $result[$attr->name] = $attr->value;
}

print_r($result);

The main advantage is that it doesn't care whether attribute values are enclosed in single quotes, double quotes, or no quotes at all, or whether there are spaces before or after the equals sign.
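As a rough illustration of that tolerance, the sketch below runs the same DOMDocument approach on a tag that mixes quoting styles. The helper name `attributesOf` and the sample tag are made up for this example, and the attribute names are kept lowercase on the assumption that an HTML parser may normalize their case.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the attributes
// of the first <$tagName> element found in $html as a name => value array.
function attributesOf(string $html, string $tagName): array
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally; <match> is not a standard HTML tag.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    $node = $dom->getElementsByTagName($tagName)->item(0);
    if ($node !== null) {
        foreach ($node->attributes as $attr) {
            $result[$attr->name] = $attr->value;
        }
    }
    return $result;
}

// Made-up input mixing single quotes, double quotes, an unquoted value,
// and spaces around the equals sign.
$tag = "<match id='18' srs = \"ICC Womens World Cup Qualifier, 2010\" mnum=4>";
print_r(attributesOf($tag, 'match'));
```

Returning an empty array when the tag is missing avoids a fatal error on `->attributes` if the input does not contain the expected element.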

This should work:

(\w+)\=\"([a-zA-Z0-9 ,.\/&%?=]+)\"

Code PHP:

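<?php
// Group 1 captures the attribute name; group 2 captures a double-quoted
// value made up only of the characters listed in the character class.
$re = '/(\w+)\=\"([a-zA-Z0-9 ,.\/&%?=]+)\"/m';
$str = '<match id="18" srs="ICC Womens World Cup Qualifier, 2010" mchDesc="BANW vs PMGW" mnum="4th Match">';

// $matches[1] holds every attribute name, $matches[2] every value.
preg_match_all($re, $str, $matches);

// Pair names with values to build the associative array.
$c = array_combine($matches[1], $matches[2]);

print_r($c);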

Output (from a run on a longer input string with more attributes than the $str shown above):

Array
(
    [id] => 18
    [srs] => ICC Womens World Cup Qualifier, 2017
    [mchDesc] => BANW vs PMGW
    [mnum] => 4th Match, Group B
    [type] => ODI
    [vcity] => Colombo
    [vcountry] => Sri Lanka
    [grnd] => Colombo Cricket Club Ground
    [inngCnt] => 0
    [datapath] => google.com/j2me/1.0/match/2017/
)

Ideone: http://ideone.com/OQ7Ko1

Regex101: https://regex101.com/r/lyMmKF/7

6 Comments

Everything works, but when the string becomes "<match id="18" type="ODI" srs="ICC Womens World Cup Qualifier, 2017" mchDesc="BANW vs PMGW" mnum="4th Match, Group B" vcity="Colombo" vcountry="Sri Lanka" grnd="Colombo Cricket Club Ground" inngCnt="0" datapath="google.com/j2me/1.0/match/2017/…">", the datapath attribute is not parsed.
@masumbillah fixed. ^_^ and added support for "google.com/j2me/1.0/match/2017/index.php?=potato" and "google.com/j2me/1.0/match/2017/index.php?=potato&?watermelon=true"
If a pattern starts with \w+, you don't need to put a word boundary before. But if you want to reduce the number of steps, you can put one at the very beginning of the pattern (outside of the parenthesis).
But I cannot parse datapath="dhttp://google.com/j2me/1.0/match/2017/2017_ICC_WOMENS_WORLDCUP_QUALIFIER/BANW_PMGW_FEB07/" @Edulynch
I think the regex should be (\w+)\=\"([a-zA-Z0-9 ,._:\/&%?=]+)\" @Edulynch
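
The comments above extend the character class each time a value contains a new character (such as `:` or `_`). One alternative, sketched here as an assumption and not as the answerer's own fix, is to accept any run of characters other than a double quote as the value. The function name `parseAttributes` and the sample URL are hypothetical, and the pattern only handles double-quoted values with `\w`-style names.

```php
<?php
// Hypothetical helper: extracts name="value" pairs from a tag string.
// Assumption: values are always double-quoted and contain no double quotes.
function parseAttributes(string $tag): array
{
    // (\w+)      attribute name
    // \s*=\s*    equals sign, optionally surrounded by whitespace
    // "([^"]*)"  value: everything up to the next double quote
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $tag, $matches);

    // Pair each captured name with its captured value.
    return array_combine($matches[1], $matches[2]);
}

// Made-up input whose datapath contains ':' and '_' characters.
$str = '<match id="18" mchDesc="BANW vs PMGW" datapath="http://example.com/a_b/c:d/?x=1&y=2">';
print_r(parseAttributes($str));
```

Unlike the DOMDocument approach, this sketch does not cover single-quoted or unquoted values, so it fits best when the input format is known and fixed.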