RegEx help to replace partial url (Notepad++)

Question

I have text similar to this throughout my file:

<td>
[<a href="/abc123/handouts/files/directory1/somename.pdf" target="_blank">Slides</a> ]  [ [<a href="/abc123/handouts/files/directory2/somename2.pdf" target="_blank">Handout</a> ]</td>

<td>
[<a href="/abc123/handouts/files/directory3/somename343.pdf" target="_blank">Slides</a> ]  [ <a href="/abc123/handouts/files/directory5/somename2324.pdf" target="_blank">Handout</a> ]
</td>

Everything after the "/abc123/handouts/files/" text will be different (directory and .pdf name).

I can't seem to figure out how to replace JUST the "directory3/somename343.pdf" portion with, say, "XXXXXXX".

My attempts have either produced nothing or removed the rest of the line after the first match.

My attempt:

Search For:

<a href="/abc123/handouts/files/.*."

Replace with:

<a href="/abc123/handouts/files/xxxxxxx"

This leaves me with:

[ <a href="/abc123/handouts/files/xxxxxxx">Handout</a> ]

which completely removes the first link on the line.

What am I doing wrong, and more importantly, how is it done correctly?

Thanks!

What language are you using? Is sed ok?

– helloV, Commented Dec 23, 2014 at 18:58

brandonscript · Accepted Answer · 2014-12-23 19:35:25Z

2

Your regular expression is greedy (the * without a ?), so .* matches as much as it can: everything up to the last quote on the line, well past the .pdf. To make it non-greedy:

<a href="\/abc123\/handouts\/files\/.*?"

This matches everything inside the quotes and stops at the first closing quote instead of the last one on the line. Then replace with:

<a href="/abc123/handouts/files/xxxxxxx"

Here's regex101 for you to see: https://regex101.com/r/oY8pI8/2
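
To make the greedy versus non-greedy difference concrete, here is a minimal sketch in JavaScript, used only because it is easy to run. Notepad++ uses a different regex engine, but as far as I know it treats `.*` and `.*?` the same way. The variable names are made up for illustration, and the sample line is the second one from the question.

```javascript
// Sample line from the question: two links on a single line.
const line =
  '[<a href="/abc123/handouts/files/directory3/somename343.pdf" target="_blank">Slides</a> ]  ' +
  '[ <a href="/abc123/handouts/files/directory5/somename2324.pdf" target="_blank">Handout</a> ]';

const replacement = '<a href="/abc123/handouts/files/xxxxxxx"';

// Greedy: .* runs to the LAST quote on the line, swallowing the first link
// and the start of the second one.
const greedy = line.replace(/<a href="\/abc123\/handouts\/files\/.*"/g, replacement);

// Lazy: .*? stops at the FIRST closing quote, so each href is replaced on its own.
const lazy = line.replace(/<a href="\/abc123\/handouts\/files\/.*?"/g, replacement);

console.log(greedy);
console.log(lazy);
```

The greedy result should reproduce the symptom from the question (only the "Handout" link survives), while the lazy result keeps both links and their target attributes.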

edited Dec 23, 2014 at 19:35

answered Dec 23, 2014 at 19:00


1 Comment

whispers Over a year ago

Answer accepted. Thanks, direct and to the point. (Good to know about the needed ? in my attempt!)

Anthony Roberts · Answer · 2014-12-23 19:19:20Z

0

JavaScript version for string replacement:

// Group 1 is the fixed prefix; group 2 is any sub-directories plus the file name.
var re = /"(\/abc123\/handouts\/files\/)((?:[a-zA-Z0-9]*\/)*[a-zA-Z0-9]*\.[A-Za-z]{3,4})"/g;
var str = '"/abc123/handouts/files/directory1/somename.pdf"';
// Keep the prefix ($1) and replace the rest of the path.
var newstr = str.replace(re, '"$1XXXXX"');
alert(newstr);

In essence, the pattern is broken up into three parts. The initial grab:

"(\/abc123\/handouts\/files\/)

A non-capturing group to find further folders:

(?:[a-zA-Z0-9]*\/)*

The specific document format (file name, a literal dot, then a 3 to 4 letter extension):

[a-zA-Z0-9]*\.[A-Za-z]{3,4}

Note that the trailing folders and the document format are wrapped together within one group:

((?:[a-zA-Z0-9]*\/)*[a-zA-Z0-9]*\.[A-Za-z]{3,4})

Captures will thus be ordered as follows:

0 - Entire match
1 - Initial folder match
2 - Trailing directory and file name match
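
As a rough sketch of the same capture-group idea applied to a whole line with two links: the negated class `[^"]*` below is a simplification I am assuming, not part of the answer's pattern, and the variable names are hypothetical. It matches anything up to the closing quote, so the directory and file name do not have to be spelled out.

```javascript
// First sample line from the question: two links on a single line.
const html =
  '[<a href="/abc123/handouts/files/directory1/somename.pdf" target="_blank">Slides</a> ]  ' +
  '[ <a href="/abc123/handouts/files/directory2/somename2.pdf" target="_blank">Handout</a> ]';

// Group 1 captures the fixed prefix. [^"]* matches the rest of the path,
// stopping just before the closing quote. The g flag handles every link.
const hrefPattern = /"(\/abc123\/handouts\/files\/)[^"]*"/g;

// $1 re-inserts the captured prefix; the rest of the path becomes XXXXX.
const rewritten = html.replace(hrefPattern, '"$1XXXXX"');

console.log(rewritten);
```

Because a negated class cannot cross a quote, this variant does not depend on greedy or lazy quantifier behaviour at all.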

edited Dec 23, 2014 at 19:19

answered Dec 23, 2014 at 19:10


1 Comment

whispers Over a year ago

Aarrrgh.. sorry guys. I should have known better and added Notepad++ to the title (sorry). I just discovered regex101.com today, but I didn't really get how it works (like how you replace things). @remus - I'll give your suggestion a try. Thanks!
