Insert HTML formatted string into another string

Question

I have two strings. One of them contains <em> tags, is completely lowercase, and doesn't contain delimiters or common words like 'the', 'in', etc. The other has none of those restrictions. An example:

$str1 = 'world <em>round</em>';
$str2 = 'World - is Round';

I want to turn $str2 into 'World - is <em>Round</em>' by checking which lowercase word in $str1 carries the <em> tag. So far I've done the following, but it fails if the number of words isn't equal in both strings.

public static function applyHighlighingOnDisplayName($str1, $str2) {
    $str1_w = explode(' ', $str1);
    $str2_w = explode(' ', $str2);
    for ($i=0; $i<count($str1_w); $i++) {
       if (strpos($str1_w[$i], '<em>') !== false) {
            $str2_w[$i] = '<em>' . $str2_w[$i] . '</em>';
       }
    }
    return implode(' ', $str2_w);
}

$str1 = '<em>cup</em> <em>cakes</em>' & $str2 = 'Cup Cakes':

applyHighlighingOnDisplayName($str1, $str2) : '<em>Cup</em> <em>Cakes</em>': Correct

$str1 = 'cup <em>cakes</em>' & $str2 = 'The Cup Cakes':

applyHighlighingOnDisplayName($str1, $str2) : 'The <em>Cup</em> Cakes': Incorrect

How should I change my approach?

Can you fix the formatting on your question so it's clearer what is code and what isn't? Also, will both words ALWAYS be in $str2 -- i.e. do you need to check that the non-<em> word is present?
– i alarmed alien, Commented Oct 29, 2014 at 14:25
You could try using a regular expression to find the word that is wrapped by <em>.
– benjrb, Commented Oct 29, 2014 at 14:26

motanelu · Accepted Answer · 2014-10-29 15:37:54Z

1

Like others said, regex is the solution. Here is a working example with detailed comments:

$string1 = 'world <em>round</em>';
$string2 = 'World is - Round';

// extract what's in between <em> and </em> - it will be stored in $matches[1]
preg_match('/<em>(.+)<\/em>/i', $string1, $matches);

if (!$matches) {
    echo 'The first string does not contain <em>';
    exit();
}

// replace what we found in the previous operation
// preg_quote() escapes the match and the "/" delimiter; \b on both sides limits it to whole words
$newString = preg_replace('/\b' . preg_quote($matches[1], '/') . '\b/i', '<em>$0</em>', $string2);
echo $newString;

Details at:

Later edit - cover multiple cases:

$string1 = 'world <em>round</em> not <em>flat</em>';
$string2 = 'World is - Round not Flat! Round, ok?';

// extract what's in between <em> and </em> - it will be stored in $matches[1]
preg_match_all('/<em>(.+?)<\/em>/i', $string1, $matches);

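// preg_match_all() always fills $matches with (possibly empty) sub-arrays, so test the captures
if (empty($matches[1])) {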
    echo 'The first string does not contain <em>';
    exit();
}

foreach ($matches[1] as $match) {
    // replace what we found in the previous operation
    $string2 = preg_replace('/\b' . preg_quote($match, '/') . '\b/i', '<em>$0</em>', $string2);
}

echo $string2;
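
A possible single-pass variant of the loop above, offered as a sketch rather than as part of the original answer. Because each preg_replace() call in the loop rescans the already modified string, a highlighted term such as "em" would also match inside tags inserted by an earlier iteration. Joining all terms into one alternation avoids the rescan. The function name highlightTerms is hypothetical, and the sketch assumes every term starts and ends with a word character so that \b behaves as expected.

```php
<?php
// Hypothetical helper (not from the original answer): highlight every
// <em>-wrapped term of $marked inside $display in a single regex pass.
function highlightTerms(string $marked, string $display): string
{
    // Collect the text between each <em>...</em> pair.
    if (!preg_match_all('/<em>(.+?)<\/em>/i', $marked, $matches)) {
        return $display; // nothing to highlight
    }

    // Longest terms first, so a longer term wins over a shorter prefix
    // when both could match at the same position.
    $terms = array_values(array_unique($matches[1]));
    usort($terms, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape each term (including the "/" delimiter) before joining.
    $quoted = array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // One pass over $display: inserted tags are never scanned again.
    return preg_replace($pattern, '<em>$0</em>', $display);
}

echo highlightTerms(
    'world <em>round</em> not <em>flat</em>',
    'World is - Round not Flat! Round, ok?'
), PHP_EOL;
// World is - <em>Round</em> not <em>Flat</em>! <em>Round</em>, ok?
```

The original casing of $display is preserved because the replacement reuses the matched text ($0) instead of the lowercase term.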

edited Oct 29, 2014 at 15:37

answered Oct 29, 2014 at 14:36

motanelu


3 Comments

user188995 Over a year ago

@motanelu: What if the string contains multiple instances of <em> tags? E.g.: $str1 = 'The Cup Cakes' $str2 = 'cup cakes'?

motanelu Over a year ago

It will work, since the replacement is done in a case insensitive manner (see requirement in the question).

user188995 Over a year ago

@motanelu: Can I also apply <em> to cases where there are special characters like '? E.g.: users and User's?

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

Your current method is dependent on the number of words in the strings; a better solution would be to use regular expressions to do the matching for you. The following version will work safely even if you have emphasized words that are substrings of other emphasized words (e.g. "cat" and "cat's cradle" or "cat-litter").

function applyHighlighingOnDisplayName($str1, $str2) {

    # if we have strings surrounded by <em> tags...
    if (preg_match_all("#<em>(.+?)</em>#", $str1, $match)) {

        ## sort the match strings by length, descending
        usort($match[1], function($a,$b){ return strlen($b) - strlen($a); } );

        # all the match words are in $match[1]
        foreach ($match[1] as $m) {
            # replace every match with a string that is very unlikely to occur
            # this prevents \b matching the start or end of <em> and </em>
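            # preg_quote() keeps regex metacharacters in the match from being interpreted
            $str2 = preg_replace("#\b(" . preg_quote($m, '#') . ")\b#i",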
                "ZZZZ$1ZZZZ",
                $str2);
        }
        # replace ZZZZ with the <em> tags
        return preg_replace("#ZZZZ(.*?)ZZZZ#", "<em>$1</em>", $str2);
    }
    return $str2;
}

$str1 = 'cup <em>cakes</em>';
$str2 = 'Cup Cakes';

print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;

$str2 = 'The Cup Cakes';

print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;

Output:

Cup <em>Cakes</em>
The Cup <em>Cakes</em>

Two strings with no <em>'d words:

$str1 = 'cup cakes';
$str2 = 'Cup Cakes';

print applyHighlighingOnDisplayName($str1, $str2) . PHP_EOL;

Output:

Cup Cakes

Now something rather trickier: lots of short words where one word is a substring of all the other words:

$str1 = '<em>i</em> <em>if</em> <em>in</em> <em>i\'ve</em> <em>is</em> <em>it</em>';

$str2 = 'I want to make the str2 as "World - is Round", by comparing which lowercase word in the str1 contains the em tag. So far, I\'ve done the following, but it fails if number of words aren\'t equal in both strings.';

Output:

<em>I</em> want to make the str2 as "World - <em>is</em> Round", by comparing which lowercase word <em>in</em> the str1 contains the em tag. So far, <em>I've</em> done the following, but <em>it</em> fails <em>if</em> number of words aren't equal <em>in</em> both strings.

edited Jun 20, 2020 at 9:12

CommunityBot


answered Oct 29, 2014 at 14:35

i alarmed alien


3 Comments

user188995 Over a year ago

What if the string contains multiple instances of <em> tags? E.g.: $str1 = 'The Cup Cakes' $str2 = 'cup cakes'?

user188995 Over a year ago

@i alarmed alien: Can I also apply <em> to cases where there are special characters like '? E.g.: users and User's?

i alarmed alien Over a year ago

@user188995 As you can see in the example I posted, you can run it on strings with ' in them, but it assumes that you're not removing characters like ' that are grammatically important. users and user's have very different meanings--I'm not sure it's wise to strip these words of their semantics.
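
To illustrate the point about apostrophes, here is a small sketch (not from the thread) of how \b treats them. The apostrophe is a non-word character, so with whole-word matching "user's" and "users" only match their exact spelling. The sample sentence and the variable names are made up for the illustration.

```php
<?php
// Made-up sample text containing both spellings.
$display = "User's guide for new users";

// The possessive term matches only the possessive form.
$possessive = preg_replace(
    '/\b' . preg_quote("user's", '/') . '\b/i',
    '<em>$0</em>',
    $display
);

// The plural term matches only the plural form.
$plural = preg_replace(
    '/\b' . preg_quote('users', '/') . '\b/i',
    '<em>$0</em>',
    $display
);

echo $possessive, PHP_EOL; // <em>User's</em> guide for new users
echo $plural, PHP_EOL;     // User's guide for new <em>users</em>
```

Treating the two spellings as equivalent would need an explicit rule on top of this, which is the semantic question raised in the comment above.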

Marc B · Accepted Answer · 2014-10-29 14:27:08Z

0

It's because your highlighting code is expecting a 1:1 correspondence between word positions in the two strings:

cup <em>cakes</em>
 1        2
Cup     Cakes

but on your incorrect sample:

cup <em>cakes</em>
 1        2            3
The      Cup         Cakes

i.e. you find <em> at word #2, so you highlight word #2 in the other string - but in that string, word #2 is Cup.

A better algorithm would be to strip the html from your original string, so you end up with just cup cakes. Then you look for cup cakes in the other string, and highlight the second word of that location. That'll compensate for any "motion" within the string caused by extra (or fewer) words.
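
The idea described above could be sketched roughly as follows. This is one reading of the suggestion, not code from the answer: the function name highlightByPhrase is hypothetical, and the sketch assumes that words are separated by single spaces, that each <em> pair wraps exactly one word, and that the stripped phrase appears as a contiguous run of words in the other string. If the phrase is not found, the display string is returned unchanged.

```php
<?php
// Hypothetical helper: locate the tag-free phrase inside $display,
// then highlight the words whose counterparts in $marked carry <em>.
function highlightByPhrase(string $marked, string $display): string
{
    $markedWords  = explode(' ', $marked);
    $plainWords   = array_map('strip_tags', $markedWords); // e.g. ['cup', 'cakes']
    $displayWords = explode(' ', $display);
    $plainPhrase  = implode(' ', $plainWords);
    $n            = count($plainWords);

    // Slide a window of $n words over $display to find where the phrase starts.
    for ($start = 0; $start + $n <= count($displayWords); $start++) {
        $window = implode(' ', array_slice($displayWords, $start, $n));
        if (strcasecmp($window, $plainPhrase) !== 0) {
            continue; // phrase does not start here
        }

        // Found the offset: copy the highlighting across, shifted by $start.
        foreach ($markedWords as $i => $word) {
            if (stripos($word, '<em>') !== false) {
                $displayWords[$start + $i] = '<em>' . $displayWords[$start + $i] . '</em>';
            }
        }
        break;
    }

    return implode(' ', $displayWords);
}

echo highlightByPhrase('cup <em>cakes</em>', 'The Cup Cakes'), PHP_EOL;
// The Cup <em>Cakes</em>
```

Because it needs a contiguous match, this sketch does not handle the 'World - is Round' example from the question, where extra tokens sit between the words; the regex-based answers cope with that case.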

answered Oct 29, 2014 at 14:27

Marc B

