13

I want to parse WordPress-style shortcodes, including their attributes:

Input:

[include file="header.html"]

I need the output as an array containing the function name ("include") and the attributes with their values. Any help will be appreciated.

Thanks

2
  • 3
    Please try my library, it's standalone and already battle-tested in production: github.com/thunderer/Shortcode . If you need anything, let me know! Commented Sep 12, 2015 at 15:12
  • @TomaszKowalczyk thanks! :) Commented Mar 20, 2017 at 10:58

7 Answers

11

Here's a utility class that we used on our project. It matches all shortcodes in a string (including HTML) and returns an associative array with each shortcode's name, attributes and content.

final class Parser {

    // Regex101 reference: https://regex101.com/r/pJ7lO1
    const SHORTOCODE_REGEXP = "/(?P<shortcode>(?:(?:\\s?\\[))(?P<name>[\\w\\-]{3,})(?:\\s(?P<attrs>[\\w\\d,\\s=\\\"\\'\\-\\+\\#\\%\\!\\~\\`\\&\\.\\s\\:\\/\\?\\|]+))?(?:\\])(?:(?P<content>[\\w\\d\\,\\!\\@\\#\\$\\%\\^\\&\\*\\(\\\\)\\s\\=\\\"\\'\\-\\+\\&\\.\\s\\:\\/\\?\\|\\<\\>]+)(?:\\[\\/[\\w\\-\\_]+\\]))?)/u";

    // Regex101 reference: https://regex101.com/r/sZ7wP0
    // The value group repeats with * (not +) so one-character values such as id=8 also match.
    const ATTRIBUTE_REGEXP = "/(?<name>\\S+)=[\"']?(?P<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))*.)[\"']?/u";

    public static function parse_shortcodes($text) {
        preg_match_all(self::SHORTOCODE_REGEXP, $text, $matches, PREG_SET_ORDER);
        $shortcodes = array();
        foreach ($matches as $i => $value) {
            $shortcodes[$i]['shortcode'] = $value['shortcode'];
            $shortcodes[$i]['name'] = $value['name'];
            if (isset($value['attrs'])) {
                $attrs = self::parse_attrs($value['attrs']);
                $shortcodes[$i]['attrs'] = $attrs;
            }
            if (isset($value['content'])) {
                $shortcodes[$i]['content'] = $value['content'];
            }
        }

        return $shortcodes;
    }

    private static function parse_attrs($attrs) {
        preg_match_all(self::ATTRIBUTE_REGEXP, $attrs, $matches, PREG_SET_ORDER);
        $attributes = array();
        foreach ($matches as $i => $value) {
            $key = $value['name'];
            $attributes[$i][$key] = $value['value'];
        }
        return $attributes;
    }
}

print_r(Parser::parse_shortcodes('[include file="header.html"]'));

Output:

Array
(
    [0] => Array
        (
            [shortcode] => [include file="header.html"]
            [name] => include
            [attrs] => Array
                (
                    [0] => Array
                        (
                            [file] => header.html
                        )
                )
        )
)

3 Comments

This is great, but is there a way to do it as a replace? As it stands, it only handles the shortcode and discards any other text around it.
@REPTILE The question asked for the shortcode to be parsed and output as an associative array. You can then take each element and produce whatever output you like, for example with str_replace($shortcode, $compiled_shortcode, $string), which searches for the shortcode and replaces it with the output you have produced for it. Normally $string is the entire HTML, such as the post_content.
Unfortunately this does not work when a value is only one character long (e.g. id=8 or id="8"). I think I fixed it: regex101.com/r/sZ7wP0/6
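
As an illustration of the str_replace() idea from the comment above, here is a minimal sketch. The $parsed array is hard-coded in the shape that Parser::parse_shortcodes() prints, and render_shortcode() plus the HTML comment it returns are hypothetical placeholders, not part of the answer.

```php
<?php
// Hard-coded stand-in for the parser's output (same shape as printed above).
$parsed = [
    [
        'shortcode' => '[include file="header.html"]',
        'name'      => 'include',
        'attrs'     => [['file' => 'header.html']],
    ],
];

// Hypothetical renderer: turns one parsed shortcode into replacement text.
function render_shortcode(array $shortcode): string
{
    // Flatten the list of single-pair arrays into one name => value map.
    $attrs = [];
    foreach ($shortcode['attrs'] ?? [] as $pair) {
        $attrs = array_merge($attrs, $pair);
    }
    if ($shortcode['name'] === 'include' && isset($attrs['file'])) {
        return '<!-- contents of ' . $attrs['file'] . ' -->';
    }
    // Unknown shortcodes are left as they were.
    return $shortcode['shortcode'];
}

$html = '<p>Before [include file="header.html"] after</p>';
foreach ($parsed as $shortcode) {
    $html = str_replace($shortcode['shortcode'], render_shortcode($shortcode), $html);
}
echo $html, "\n";
```

Note that the shortcode capture group in the regex above allows one optional leading whitespace character, so with real parser output that character would be replaced as well unless it is trimmed first.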
4

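Using a small GetBetween() helper:

// Returns the text between the first $start and the following $end, or '' if $start is not found
function GetBetween($content, $start, $end) {
    $r = explode($start, $content);
    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}

$code = '[include file="header.html"]';
$innerCode = GetBetween($code, '[', ']');
$innerCodeParts = explode(' ', $innerCode);

$command = $innerCodeParts[0];

// Note: this handles exactly one attribute whose value contains no spaces
$attributeAndValue = $innerCodeParts[1];
$attributeParts = explode('=', $attributeAndValue);
$attribute = $attributeParts[0];
$attributeValue = str_replace('"', '', $attributeParts[1]);

echo $command . ' ' . $attribute . '=' . $attributeValue;
// this will result in: include file=header.html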

$command will be "include"

$attribute will be "file"

$attributeValue will be "header.html"

3 Comments

I can see error: Fatal error: Call to undefined function GetBetween() in wp_parse.php on line 4
Read the first line of my answer, you need to paste this code into your file: function GetBetween($content,$start,$end){ $r = explode($start, $content); if (isset($r[1])){ $r = explode($end, $r[1]); return $r[0]; } return ''; }
@ShahzabAsif I didn't test my code so let me know if it works for you.
3

This is actually tougher than it might appear on the surface. Andrew's answer works, but begins to break down if square brackets appear in the source text [like this, for example]. WordPress works by pre-registering a list of valid shortcodes, and only acting on text inside brackets if it matches one of these predefined values. That way it doesn't mangle any regular text that might just happen to have a set of square brackets in it.

The actual source code of the WordPress shortcode engine is fairly robust, and it doesn't look like it would be all that tough to modify the file to run by itself -- then you could use that in your application to handle the tough work. (If you're interested, take a look at get_shortcode_regex() in that file to see just how hairy the proper solution to this problem can actually get.)

A very rough answer to your question using WordPress's shortcodes.php would look something like:

// Define the shortcode
function include_shortcode_func($attrs) {
    $data = shortcode_atts(array(
        'file' => 'default'
    ), $attrs);

    return "Including File: {$data['file']}";
}
add_shortcode('include', 'include_shortcode_func');

// And then run your page content through the filter
echo do_shortcode('This is a document with [include file="header.html"] included!');

Again, not tested at all, but it's not a very hard API to use.
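
For anyone who wants the "only act on registered names" idea without pulling in WordPress, here is a minimal standalone sketch. It is not the WordPress implementation: run_shortcodes() and the $handlers array are hypothetical names, and it only understands double-quoted attribute values and self-closing shortcodes.

```php
<?php
// Hypothetical registry: shortcode name => handler receiving the attributes.
$handlers = [
    'include' => function (array $attrs): string {
        $file = $attrs['file'] ?? 'default';
        return "Including File: {$file}";
    },
];

function run_shortcodes(string $text, array $handlers): string
{
    if ($handlers === []) {
        return $text;
    }
    // Only registered names go into the pattern, so other [bracketed text] is untouched.
    $names = implode('|', array_map(function ($name) {
        return preg_quote($name, '/');
    }, array_keys($handlers)));
    $pattern = '/\[(' . $names . ')(\s[^\]]*)?\]/';

    return preg_replace_callback($pattern, function (array $m) use ($handlers) {
        $attrs = [];
        // Collect name="value" pairs (double-quoted values only).
        preg_match_all('/([\w-]+)\s*=\s*"([^"]*)"/', $m[2] ?? '', $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $attrs[$pair[1]] = $pair[2];
        }
        return $handlers[$m[1]]($attrs);
    }, $text);
}

echo run_shortcodes(
    'This is a document with [include file="header.html"] included, [like this, for example].',
    $handlers
), "\n";
```

The second pair of brackets is left alone because "like" is not a registered name.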

1 Comment

It would have been better if you would have included a working example.
3

I also needed this functionality in my PHP framework. This is what I wrote, and it works pretty well. It uses anonymous functions, which I really like (they are a bit like callback functions in JavaScript).

<?php
//The content which should be parsed
$content = '<p>Hello, my name is John an my age is [calc-age day="4" month="10" year="1991"].</p>';
$content .= '<p>Hello, my name is Carol an my age is [calc-age day="26" month="11" year="1996"].</p>';

//The array with all the shortcode handlers. This is just a regular associative array with anonymous functions as values. A very cool new feature in PHP, just like callbacks in JavaScript or delegates in C#.
$shortcodes = array(
    "calc-age" => function($data){
        $content = "";
        //Calculate the age
        if(isset($data["day"], $data["month"], $data["year"])){
            $age = date("Y") - $data["year"];
            if(date("m") < $data["month"]){
                $age--;
            }
            if(date("m") == $data["month"] && date("d") < $data["day"]){
                $age--;
            }
            $content = $age;
        }
        return $content;
    }
);
//http://stackoverflow.com/questions/18196159/regex-extract-variables-from-shortcode
function handleShortcodes($content, $shortcodes){
    //Loop through all shortcodes
    foreach($shortcodes as $key => $function){
        $dat = array();
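        // preg_quote() keeps shortcode names with regex metacharacters from breaking the pattern
        preg_match_all("/\[".preg_quote($key, "/")." (.+?)\]/", $content, $dat);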
        if(count($dat) > 0 && $dat[0] != array() && isset($dat[1])){
            $i = 0;
            $actual_string = $dat[0];
            foreach($dat[1] as $temp){
                $temp = explode(" ", $temp);
                $params = array();
                foreach ($temp as $d){
                    list($opt, $val) = explode("=", $d);
                    $params[$opt] = trim($val, '"');
                }
                $content = str_replace($actual_string[$i], $function($params), $content);
                $i++;
            }
        }
    }
    return $content;
}
echo handleShortcodes($content, $shortcodes);
?>

The result:
Hello, my name is John an my age is 22.
Hello, my name is Carol an my age is 17.

1 Comment

Great piece of code, thanks! I did notice that if you change the explode line to $temp = explode('" ', $temp); then you can have spaces in the quoted values.
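
To show what the explode('" ', ...) suggestion from the comment does, here is a small sketch. parse_params() is a hypothetical helper, and it assumes every value is double-quoted and that pairs are separated by exactly one space.

```php
<?php
// Hypothetical helper: split an attribute string on '" ' so quoted values may contain spaces.
function parse_params(string $attrString): array
{
    $params = [];
    foreach (explode('" ', $attrString) as $pair) {
        // Limit to two parts so an "=" inside a value is preserved.
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $params[trim($parts[0])] = trim($parts[1], '"');
        }
    }
    return $params;
}

print_r(parse_params('source="myimage.jpg" alt="My Image"'));
```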
0

I modified the function above to parse the attributes with WordPress's shortcode_parse_atts() function:

function extractThis($short_code_string) {
    $shortocode_regexp = "/(?P<shortcode>(?:(?:\\s?\\[))(?P<name>[\\w\\-]{3,})(?:\\s(?P<attrs>[\\w\\d,\\s=\\\"\\'\\-\\+\\#\\%\\!\\~\\`\\&\\.\\s\\:\\/\\?\\|]+))?(?:\\])(?:(?P<content>[\\w\\d\\,\\!\\@\\#\\$\\%\\^\\&\\*\\(\\\\)\\s\\=\\\"\\'\\-\\+\\&\\.\\s\\:\\/\\?\\|\\<\\>]+)(?:\\[\\/[\\w\\-\\_]+\\]))?)/u";
    preg_match_all($shortocode_regexp, $short_code_string, $matches, PREG_SET_ORDER);
    $shortcodes = array();
    foreach ($matches as $i => $value) {
        $shortcodes[$i]['shortcode'] = $value['shortcode'];
        $shortcodes[$i]['name'] = $value['name'];
        if (isset($value['attrs'])) {
            $attrs = shortcode_parse_atts($value['attrs']);
            $shortcodes[$i]['attrs'] = $attrs;
        }
        if (isset($value['content'])) {
            $shortcodes[$i]['content'] = $value['content'];
        }
    }
    return $shortcodes;
}

I hope this helps everyone :)

0

This updates @Duco's snippet. It explodes the attributes on spaces, which breaks when a value contains a space, as in:

[Image source="myimage.jpg" alt="My Image"]

Updated version:

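// Defined at top level: a function declared inside handleShortcodes() would be
// redeclared (fatal error) the second time handleShortcodes() is called.
function read_attr($attr) {
    $atList = [];

    if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {
        for ($i = 0; $i < count($m[0]); $i++) {
            if ($m[3][$i])
                $atList[$m[3][$i]] = null;   // bare attribute without a value
            else
                $atList[$m[1][$i]] = $m[2][$i];
        }
    }
    return $atList;
}

function handleShortcodes($content, $shortcodes){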
    //Loop through all shortcodes
    foreach($shortcodes as $key => $function){
        $dat = array();
        preg_match_all("/\[".$key."(.*?)\]/", $content, $dat);

        if(count($dat) > 0 && $dat[0] != array() && isset($dat[1])){
            $i = 0;
            $actual_string = $dat[0];
            foreach($dat[1] as $temp){
                $params = read_attr($temp);
                $content = str_replace($actual_string[$i], $function($params), $content);
                $i++;
            }
        }
    }
    return $content;
}
$content = '[Image source="myimage.jpg" alt="My Image"]';

Attributes passed to the handler for this shortcode:

Array
(
    [source] => myimage.jpg
    [alt] => My Image
)

Updated (Feb 11, 2020)
The following regex in preg_match_all only matches shortcodes that have attributes:

preg_match_all("/\[".$key." (.+?)\]/", $content, $dat);

To make it also work with plain shortcodes such as [contact-form] or [mynotes], change it to:

preg_match_all("/\[".$key."(.*?)\]/", $content, $dat);
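
A quick way to see the difference between the two patterns is to run both against the same text. This is only a sketch; the sample string and the variable names $withSpace and $relaxed are made up for illustration.

```php
<?php
$content = 'Plain [contact-form] and [contact-form id="7"] here.';
$key = preg_quote('contact-form', '/');

// Original pattern: requires a space and at least one character after the name.
preg_match_all("/\[" . $key . " (.+?)\]/", $content, $withSpace);

// Relaxed pattern: the attribute part may be empty.
preg_match_all("/\[" . $key . "(.*?)\]/", $content, $relaxed);

print_r($withSpace[0]); // only the shortcode that has attributes
print_r($relaxed[0]);   // both shortcodes
```

One caveat: the relaxed pattern has no boundary after the name, so it would also match a longer name such as [contact-forms].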

0

I just had the same problem. For what I have to do, I am going to take advantage of existing XML parsers instead of writing my own regex. I am sure there are cases where it won't work.

example.php

<?php

$file_content = '[include file="header.html"]';

// convert the string into xml
$xml = str_replace("[", "<", str_replace("]", "/>", $file_content));

$doc = new SimpleXMLElement($xml);

echo "name: " . $doc->getName() . "\n";
foreach($doc->attributes() as $key => $value) {
    echo "$key: $value\n";
}

$ php example.php
name: include
file: header.html

To make it work on Ubuntu, I think you have to install the PHP XML extension:

sudo apt-get install php-xml

(thanks https://drupal.stackexchange.com/a/218271)

If you have lots of these strings in a file, then I think you can still do the find-and-replace and then just treat it all as XML.
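
A minimal sketch of that multi-shortcode case, assuming the text contains no other "<", "&" or square brackets (any of those would break the XML). The wrapping root element and the $result layout are my own choices for illustration.

```php
<?php
$text = '[include file="header.html"] some text [image source="a.jpg" alt="My Image"]';

// Turn every shortcode into a self-closing tag and wrap it all in a single root element.
$xml = '<root>' . str_replace(['[', ']'], ['<', '/>'], $text) . '</root>';
$doc = new SimpleXMLElement($xml);

$result = [];
foreach ($doc->children() as $node) {
    $attrs = [];
    foreach ($node->attributes() as $key => $value) {
        $attrs[$key] = (string) $value;
    }
    $result[] = ['name' => $node->getName(), 'attrs' => $attrs];
}
print_r($result);
```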
