17

After TONS of research, I have found how to parse emoji in realtime using the Twemoji library.

Now, I need to figure out how to identify if there's emoji within some text, grab the position of that emoji and execute the parsing function.

Example text:

It is a great day 😀.

I need to find the 😀 within the whole string, then use the following function to get its hex code, convert that to the surrogate pair, and parse it with the Twemoji library.

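function entityForSymbolInContainer(text) {
    // Read the first code point of the given text and return it as hex,
    // left-padded with zeros to at least four digits.
    var code = text.codePointAt(0);
    var codeHex = code.toString(16);
    while (codeHex.length < 4) {
        codeHex = "0" + codeHex;
    }

    return codeHex;
}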

// Get the emoji's hex code
var emojiHex = entityForSymbolInContainer(data.message.body);
// Given a hex code point, returns the UTF-16 surrogate pair
var emojiChars = twemoji.convert.fromCodePoint(emojiHex);
// Given a generic string, replaces all emoji with an <img> tag
var emojiHtml = twemoji.parse(emojiChars);

I am using the following check to see if there's emoji within the text. Problem is that for a simple grinning face (😀) it doesn't alert me. However, if I type in the "shirt and tie" (👔) it will alert me to that.

var string = "It is a great day 😀.";
var emojiRegex = /([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g;

if (string.match(emojiRegex)) {
    alert("emoji found");
}

Please help on the issue of the regex not picking up the emoji. After that, I should be able to just find that within the string.

Thank you!

3
  • 😀 is U+1F600, which is encoded in UTF-16 as \uD83D\uDE00. Your regular expression does not consider that to be an emoji. (It stops at \uD83D\uDDFF.) Commented May 7, 2016 at 14:41
  • Ok, thanks for the clarification. Do you know of a resource for a more complete regex? Commented May 7, 2016 at 14:54
  • 1
    Before you can develop an algorithm to detect emoji, you first need to have a clear definition as to what an emoji is. Whoever wrote that regex didn't consider U+1F600 to be an emoji. Commented May 7, 2016 at 16:33
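
The first comment can be checked with a few lines. This is only a sketch: the variable names are illustrative, and the arithmetic is the standard UTF-16 surrogate formula rather than anything taken from the question's code.

```javascript
// Compute the UTF-16 surrogate pair for U+1F600 by hand.
const codePoint = 0x1f600;
const offset = codePoint - 0x10000;
const high = 0xd800 + (offset >> 10);  // lead surrogate
const low = 0xdc00 + (offset & 0x3ff); // trail surrogate
console.log(high.toString(16), low.toString(16)); // d83d de00

// The regex from the question, without the g flag so test() keeps no state.
const questionRegex = /([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/;

const grinningFace = String.fromCodePoint(0x1f600); // 😀
const shirtAndTie = String.fromCodePoint(0x1f454);  // 👔, encoded as \uD83D\uDC54

console.log(questionRegex.test(grinningFace)); // false: \uDE00 is above \uDDFF
console.log(questionRegex.test(shirtAndTie));  // true: \uDC54 is inside the range
```

So the shirt and tie matches only because its trail surrogate happens to fall inside the \uD83D branch of the pattern.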

7 Answers 7

10

Nowadays with ES2018 we can use Unicode Property Escapes in a regex match:

\p{…}

For simple emojis it would be:

"Be kind 😊, smile".match(/\p{Emoji}+/gu)

For emojis including glyphs glued with ZERO WIDTH JOINER like 👨‍👩‍👧‍👦 it can be:

"My Family 👨‍👩‍👧‍👦".match(/[\p{Emoji}\u200d]+/gu)

1 Comment

The second regex breaks when multiple emojis are together. The following works: \p{Emoji}(\u200d\p{Emoji})*, because it only matches an emoji list concatenated with zero-width joiners, rather than just having them mixed in anywhere.
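
The difference described in the comment above can be seen in a small sketch. The test strings are built from code point escapes (an illustrative choice, not part of the answer) so that the ZWJ sequence is unambiguous.

```javascript
// Build the test strings from code points so the sequences are unambiguous.
const ZWJ = "\u200D";
const family = ["\u{1F468}", "\u{1F469}", "\u{1F467}", "\u{1F466}"].join(ZWJ); // 👨‍👩‍👧‍👦
const smile = "\u{1F60A}"; // 😊
const text = "Hi " + family + smile;

// Character-class version from the answer: adjacent emojis merge into one match.
const merged = text.match(/[\p{Emoji}\u200d]+/gu);

// Version from the comment: a ZWJ is only consumed between two emojis.
const separate = text.match(/\p{Emoji}(\u200d\p{Emoji})*/gu);

console.log(merged.length);   // 1
console.log(separate.length); // 2
```

With the comment's pattern, the family sequence and the smiley come back as two separate matches.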
10

In 2021 the best way to do this is to use the Unicode support in regular expressions: the u flag introduced in ES6, together with the Unicode property escapes added in ES2018.

It's as simple as using this regular expression:

/(\p{Emoji_Presentation}|\p{Extended_Pictographic})/gu

For example, this simple function removes all emojis from a string:

function removeEmojis(str) {
    var emojiRE = /(\p{Emoji_Presentation}|\p{Extended_Pictographic})/gu;
    return str.replace(emojiRE, '');
}

removeEmojis('This ❌ h🅰s some 😱 emojis inside'); //'This  hs some  emojis inside'

It uses both the Emoji_Presentation and the Extended_Pictographic properties, so it doesn't count digits, # and * in the search, as indicated by the Unicode standard.
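
As a quick illustration of that point, the sketch below compares the broader \p{Emoji} property with the pattern used here. The names broad and narrow are only illustrative.

```javascript
const broad = /\p{Emoji}/u;
const narrow = /\p{Emoji_Presentation}|\p{Extended_Pictographic}/u;

// The last entry is 😱 (U+1F631).
for (const ch of ["7", "#", "*", "\u{1F631}"]) {
  console.log(JSON.stringify(ch), broad.test(ch), narrow.test(ch));
}
// "7", "#" and "*" print true false; the emoji prints true true.
```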

We can use the abbreviations for these properties too, for a shorter regular expression:

/(\p{EPres}|\p{ExtPict})/gu

You can test drive it here:

function removeEmojis(str) {
    var emojiRE = /\p{EPres}|\p{ExtPict}/gu;
    return str.replace(emojiRE, '');
}

var testStr = 'This ❌ h🅰s some 😱 emojis inside';
console.log('Test string: ' + testStr);
console.log('Result: ' + removeEmojis(testStr));
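
Since the question also asks for the position of each emoji, the same pattern can be combined with String.prototype.matchAll. This is a minimal sketch: findEmojiPositions is a hypothetical helper name, and ZWJ sequences would be reported as separate pieces because the pattern matches one code point at a time.

```javascript
// Hypothetical helper: list each emoji match with its position in the string.
function findEmojiPositions(str) {
  const emojiRE = /\p{Emoji_Presentation}|\p{Extended_Pictographic}/gu;
  return Array.from(str.matchAll(emojiRE), (m) => ({
    emoji: m[0],
    index: m.index,      // offset in UTF-16 code units
    length: m[0].length, // 2 for astral emojis such as 😀
  }));
}

const positions = findEmojiPositions("It is a great day \u{1F600}.");
console.log(positions); // one entry: 😀 at index 18, length 2
```

The index and length can be passed to slice or substring to cut the emoji out before handing it to another function.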

1 Comment

This is good, but doesn't include emojis that use zero-width joiners. I ended up using: /(\p{EPres}|\p{ExtPict})(\u200d(\p{EPres}|\p{ExtPict}))*/gu
6

The post linked below gives a very comprehensive regex for matching emojis, with a very good explanation. The author bases the regex on the one published by the lodash library.

(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])

https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb

Comments

3

This determines whether there is an emoji in the string.

var unified_emoji_ranges = ['\ud83c[\udf00-\udfff]','\ud83d[\udc00-\ude4f]','\ud83d[\ude80-\udeff]'];

var reg = new RegExp(unified_emoji_ranges.join('|'), 'g');

var string = "It is a great day 😀.";

if (string.match(reg)) {
    alert("emoji found");
}

2 Comments

Please add an explanation :)
didn't work for me with ✅
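
The ✅ case from the comment above can be reproduced as follows. This is a sketch under the assumption that the commenter used the same three ranges; the property-escape check at the end is borrowed from another answer for comparison.

```javascript
const ranges = ['\ud83c[\udf00-\udfff]', '\ud83d[\udc00-\ude4f]', '\ud83d[\ude80-\udeff]'];
const reg = new RegExp(ranges.join('|')); // no g flag, so test() is stateless

const checkMark = '\u2705'; // ✅ is a single code unit in the Basic Multilingual Plane

console.log(checkMark.length);         // 1
console.log(reg.test(checkMark));      // false: none of the three ranges covers U+2705
console.log(reg.test('\ud83d\ude00')); // true: 😀 falls in the second range
console.log(/\p{Extended_Pictographic}/u.test(checkMark)); // true
```

All three ranges expect a surrogate pair starting with \ud83c or \ud83d, so emojis encoded as a single code unit are never matched.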
1

You can match any RGI emoji, including multi-code-point sequences, with the \p{RGI_Emoji} Unicode property of strings (note that it requires the v flag):

let text = "It is a great day 😀."
text = text.replace(/\p{RGI_Emoji}/vg,(m) => {
      return '\\u'+m.split("").map(x => x.charCodeAt(0).toString(16)).join('\\u');
    })
console.log(text)

The result is It is a great day \ud83d\ude00..
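
Engines that predate the v flag reject such a regex literal at parse time. One way around that is to build the regex at run time and fall back to another pattern. This is a sketch only: buildEmojiRegex is a hypothetical helper, and the fallback (taken from another answer) is narrower because it matches single code points rather than whole sequences.

```javascript
// Hypothetical helper: prefer \p{RGI_Emoji} (needs the v flag), otherwise fall
// back to the property-based pattern from another answer.
function buildEmojiRegex() {
  try {
    return new RegExp("\\p{RGI_Emoji}", "gv");
  } catch (err) {
    return /\p{Emoji_Presentation}|\p{Extended_Pictographic}/gu;
  }
}

const text = "It is a great day \u{1F600}.";
const escaped = text.replace(buildEmojiRegex(), (m) =>
  m.split("").map((unit) => "\\u" + unit.charCodeAt(0).toString(16)).join("")
);
console.log(escaped); // It is a great day \ud83d\ude00.
```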

Comments

0

The problem:

JavaScript defines strings as sequences of UTF-16 code units, not as sequences of characters or code points.

(quoted from source below)

You have to set up the RegExp with surrogate pairs:

I found a good solution and explanation in "parsing emoji unicode in javascript", which works without an extra library. There is also an online Surrogate Pair Calculator.

And in your case:

/\uD83D\uDE00/
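
A short sketch of that pattern in use, next to the equivalent code point escape, which needs the u flag. The sample string is the one from the question.

```javascript
const text = "It is a great day \u{1F600}.";

// Surrogate-pair form from the answer; works without any flag.
console.log(/\uD83D\uDE00/.test(text)); // true

// Code point escape; requires the u flag.
console.log(/\u{1F600}/u.test(text)); // true

// Position of the emoji, counted in UTF-16 code units.
const position = text.search(/\uD83D\uDE00/);
console.log(position); // 18
```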

regex101

Comments

0

In case anyone is still looking for a JS solution to find emojis in a string:

You can use the emoji-regex library.

Here is an example that converts every emoji in a given string to the hexadecimal representation of its first code point:

import emojiRegex  from 'emoji-regex/RGI_Emoji.js';
const emojiRegexPattern = emojiRegex();
const stringThatMightHaveEmojis = ...; //some string that can contain emoji's..

const converted = stringThatMightHaveEmojis.replace(emojiRegexPattern, (m) => {
    // codePointAt(0) only returns the first code point of a multi-code-point emoji
    return m.codePointAt(0).toString(16);
});

There are more examples in the documentation of the library.

A helpful article I stumbled upon that explains emoji parsing and codePointAt can be found here.

Comments
