Context

I'm new to regex (still practicing) and I'm trying to extract script src or link href values from tags (for educational purposes).

Given the following HTML:

<!-- hello -->
    <script src="1.js"></script>
    <script src="2.js"></script>
    <link rel="stylesheet" href="3.css"/>
<!-- world -->

I'd like to get

an array of: 1.js, 2.js and 3.css

I've tried

This is the regex I've come up with so far, but it isn't good enough.

/(?:<!-- hello -->\s*?)([\s\S]*?)(?:\s?<!-- world -->)/gmi

Of course, I have to replace [\s\S]*? with something better but I've tried a lot of combinations and none of them worked.

Regards.

Update

Only scripts and links between the <!-- hello --> and <!-- world --> comments should be matched.

The following should not match:

<!-- foo-->
    <script src="4.js"></script>
    <script src="5.js"></script>
    <link rel="stylesheet" href="6.css"/>
<!-- bar-->
  • You have completed Step 1. Now proceed to Step 2: just use /(?:src|href)="([^"]*)"/g and grab the Group 1 values. Commented Aug 28, 2017 at 7:47
  • Without completing the first step, isn't it possible to grab the values directly? Commented Aug 28, 2017 at 7:56
  • In JS, no. Commented Aug 28, 2017 at 7:56
  • Oh, understood! Please post it as an answer and I'll accept that. Commented Aug 28, 2017 at 7:58
  • I suggest that you write it yourself, this way you will learn better. Just use my comment for guidance. Commented Aug 28, 2017 at 7:59

3 Answers

1

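Use the regex

<!-- hello -->([\S\s]+)<!-- world -->

to capture the block between the two comments in group 1.

Then run this regex on the captured block:

<(?:script src.*|link.*href)="(\w+\.\w+)

Each file name is in group 1 of the corresponding match. Note that (\w+\.\w+) only matches simple names such as 1.js; values containing slashes or hyphens would need a broader pattern such as [^"]+.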
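
As a rough sketch of how the two regexes above could be chained in JavaScript: the variable names (html, block, files) are made up for illustration, and the sample input is the HTML from the question plus the block that should be ignored.

```javascript
// Sample input taken from the question (the foo/bar block must be ignored).
const html = `<!-- hello -->
    <script src="1.js"></script>
    <script src="2.js"></script>
    <link rel="stylesheet" href="3.css"/>
<!-- world -->
<!-- foo-->
    <script src="4.js"></script>
    <script src="5.js"></script>
    <link rel="stylesheet" href="6.css"/>
<!-- bar-->`;

// Step 1: capture everything between the two marker comments (group 1).
const blockMatch = /<!-- hello -->([\S\s]+)<!-- world -->/.exec(html);
const block = blockMatch ? blockMatch[1] : "";

// Step 2: collect group 1 of every tag match inside the captured block.
const tagRegex = /<(?:script src.*|link.*href)="(\w+\.\w+)/g;
const files = Array.from(block.matchAll(tagRegex), (m) => m[1]);

console.log(files);
```

Because [\S\s]+ is greedy, step 1 runs up to the last <!-- world --> in the input; if the markers can appear more than once, a lazy [\S\s]+? is probably the safer choice.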

1 Comment

You can extract the text between the <!-- xxx --> comments and then apply this regex to the resulting string.
1

So, as Wiktor Stribiżew mentioned, it has to be done in steps, because in JS it's not possible to get the result with a single regex.

First you have to grab the content between the <!-- hello --> and <!-- world --> comments, and then run a global search on that result.
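
A minimal sketch of those two steps, assuming the regex from the question for step 1 and the one from Wiktor's comment for step 2. The function name extractAssets is hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: returns the src/href values found between the
// <!-- hello --> and <!-- world --> comments of the given HTML string.
function extractAssets(html) {
  // Step 1: the question's regex, without the g flag, so that exec()
  // returns the first block with its capture group.
  const block = /(?:<!-- hello -->\s*?)([\s\S]*?)(?:\s?<!-- world -->)/im.exec(html);
  if (!block) return [];

  // Step 2: global search on the captured block; group 1 is the value.
  const attr = /(?:src|href)="([^"]*)"/g;
  const values = [];
  let m;
  while ((m = attr.exec(block[1])) !== null) {
    values.push(m[1]);
  }
  return values;
}

const sample = `<!-- hello -->
    <script src="1.js"></script>
    <script src="2.js"></script>
    <link rel="stylesheet" href="3.css"/>
<!-- world -->
<!-- foo-->
    <script src="4.js"></script>
<!-- bar-->`;

console.log(extractAssets(sample));
```

The lazy [\s\S]*? in step 1 stops at the first <!-- world -->, so anything after it, such as the foo/bar block, never reaches step 2.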

Thanks Wiktor Stribiżew!

2 Comments

If it's the correct solution, mark it as the answer :)
Can't yet; I have to wait at least two days :-(
0

If you have an element like

<name attribute=value attribute="value" attribute='value'>

this regex can be used to find each attribute name and value in turn:

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

Applied on:

<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href='test.html' class="xyz">

it would yield:

'href' => 'test.html'
'class' => 'xyz'
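
For illustration, here is one way the regex above might be applied in JavaScript. The helper name parseAttributes is an assumption made for this sketch, and it collects the name/value pairs into a plain object.

```javascript
// The attribute regex from above, with the g flag so every pair is found.
const attrRegex = /(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?/g;

// Hypothetical helper: maps each attribute name (group 1) to its
// value (group 2) for a single tag string.
function parseAttributes(tag) {
  const attrs = {};
  for (const m of tag.matchAll(attrRegex)) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

const tags = [
  "<a href=test.html class=xyz>",
  '<a href="test.html" class="xyz">',
  "<a href='test.html' class=\"xyz\">",
];

for (const tag of tags) {
  console.log(parseAttributes(tag));
}
```

One caveat: the pattern requires a value of at least two characters (the repeated group plus the final dot), so a one-character value such as id=1 would not be picked up.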

1 Comment

You can try something like regexr.com or regex101.com, depending on your needs :-)
