JavaScript: extract strings that match a regex from text

Question

I'm trying to get a set of strings from a paragraph that match the format of 4chan's quotes: >>1111111 where it starts with >> followed by 7 digits.

>>1111000
>>1111001
Yes, I agree with those sentiments.

Both >>1111000 and >>1111001 should be extracted from the text above; I would then strip each match down to its digits.

Do you already have any code written? And how will you handle duplicate matches? You will probably need an indexed collection.
– xakal, Commented Nov 23, 2020 at 9:37
Repeated quotes will be filtered out before storing in the database, since one quotation per post is enough.
– SS-Salt, Commented Nov 23, 2020 at 9:52
I am aware of the match() function in JS, but I don't know the most appropriate regex for the pattern.
– SS-Salt, Commented Nov 23, 2020 at 9:53
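
The duplicate check mentioned in these comments could be sketched with a Set. This is only an illustration: the helper name `uniqueQuoteIds` is hypothetical, and the pattern is the simplest reading of the format described in the question (>> followed by 7 digits).

```javascript
// Hypothetical helper: collect each quoted post number once, in first-seen order.
function uniqueQuoteIds(text) {
  const ids = new Set();
  // matchAll requires the g flag; m[1] is the captured 7-digit number.
  for (const m of text.matchAll(/>>(\d{7})/g)) {
    ids.add(m[1]); // a Set ignores values it already holds
  }
  return [...ids];
}

console.log(uniqueQuoteIds('>>1111000 >>1111001 >>1111000')); // ["1111000", "1111001"]
```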

Brian Lee · Accepted Answer · 2020-11-23 09:43:32Z

1

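You can use the following, which matches lines that start with two > characters followed by exactly 7 digits:

// ^ and $ anchor the pattern to a whole line; the m flag makes them apply per line,
// and the g flag returns every match instead of only the first.
const regex = /^[>]{2}[\d]{7}$/gm;

const text = `>>1234567
>>6548789
foo barr`;

const matches = text.match(regex);

console.log(matches); // [">>1234567", ">>6548789"]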


2 Comments

SS-Salt Over a year ago

This only seems to work on strings in a specific format. When the text is changed to '>>1234567 >>6548789 foo barr' (same line, separated by whitespace), matches is null.

Brian Lee Over a year ago

If you change the regex to /[>]{2}[\d]{7}/gm it will not explicitly look for lines starting with >> and ending with 7 digits, so it should work if it's all on one line.
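
As a minimal sketch of the unanchored variant described in this comment: without ^ and $ the pattern can match anywhere in a line, and a capture group around the digits avoids a separate split step afterwards. The helper name `extractQuoteIds` and the `(?!\d)` guard against longer digit runs are assumptions added for illustration, not part of the original answer.

```javascript
// Hypothetical helper: returns the 7-digit ids of every >>NNNNNNN quote in the text.
function extractQuoteIds(text) {
  // >> followed by exactly 7 digits; (?!\d) rejects an 8th digit (an added assumption).
  const regex = />>(\d{7})(?!\d)/g;
  // matchAll yields one match object per hit; index 1 holds the captured digits.
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

const ids = extractQuoteIds('>>1234567 >>6548789 foo barr');
console.log(ids); // ["1234567", "6548789"]
```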

spyshiv · Accepted Answer · 2020-11-23 09:45:07Z

1

You can use this regex, where [0-9] matches any digit (add the g flag to collect every occurrence):

/[>]{2}[0-9]{7}/

edited Nov 23, 2020 at 9:45

answered Nov 23, 2020 at 9:35


1 Comment

SS-Salt Over a year ago

How do I make it so that the regex accepts those patterns when in the same line as words but separated by whitespaces?
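
One possible way to handle this, sketched under the assumption that a quote should only count when it stands as its own whitespace-separated token: wrap the pattern in lookarounds. The variable names and the sample sentence are made up for illustration, and lookbehind `(?<!...)` needs an ES2018-capable engine.

```javascript
// (?<!\S) = preceded by whitespace or the start of the text;
// (?!\S)  = followed by whitespace or the end of the text.
const tokenRegex = /(?<!\S)>>\d{7}(?!\S)/g;

const line = 'I agree with >>1234567 and >>6548789 but not x>>1111111 or >>22222222';
const quotes = line.match(tokenRegex) || []; // match() returns null when nothing matches
console.log(quotes); // [">>1234567", ">>6548789"]
```

If quotes glued to other characters should also count, the lookarounds can simply be dropped.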

Flo H · Accepted Answer · 2020-11-23 11:07:46Z

There appear to be some answers already, but since it's a topic I would like to understand better, here are my two cents. In the past this answer has helped me a lot, and online regex sites such as this one are also great.

 

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <title>Parse Test</title>
    
    </head>
    <body>
        <div >
            <text id="ToParse">
    
                >>1111000 <br>
                >>1111001 <br>
                Yes, I agree with those sentiments.
    
            </text>
        </div>
    
    <script>
    // innerHTML would return the text with ">" escaped as "&gt;",
    // so read textContent to get the raw ">>" characters.
    var body = '';
    try {
        body = document.getElementById('ToParse').textContent;
        console.log(body);
    } catch (err) {
        console.log('could not read body: ' + err);
    }

    function parseBody(body) {
        // Group 1 is the ">>" marker, group 2 is the 7-digit post number.
        const regex = /(>>)(\d{7})/g;
        let m;

        // With the g flag, each exec() call resumes after the previous match.
        while ((m = regex.exec(body)) !== null) {
            // The result can be accessed through the `m`-variable.
            m.forEach((match, groupIndex) => {
                console.log(`Found match, group ${groupIndex}: ${match}`);
            });
        }
    }

    parseBody(body);
    </script>
    
    
    
    
    </body>
    </html>

benhatsor · Accepted Answer · 2020-11-23 11:14:23Z

0

Like @spyshiv said, you can match the string like so:

var string = '>>1111000';
var matches = string.match(/[>]{2}[0-9]{7}/); // [0-9] matches any digit
console.log(matches);

