
I have text similar to this throughout my file:

<td>
[<a href="/abc123/handouts/files/directory1/somename.pdf" target="_blank">Slides</a> ]  [ [<a href="/abc123/handouts/files/directory2/somename2.pdf" target="_blank">Handout</a> ]</td>

<td>
[<a href="/abc123/handouts/files/directory3/somename343.pdf" target="_blank">Slides</a> ]  [ <a href="/abc123/handouts/files/directory5/somename2324.pdf" target="_blank">Handout</a> ]
</td>

Everything after the "/abc123/handouts/files/" text will be different (directory and .pdf name).

I can't seem to figure out how to replace JUST the "directory3/somename343.pdf" portion with, say, "XXXXXXX".

My attempts have either produced nothing or removed the rest of the line after the first match.

My attempt:

Search For:

<a href="/abc123/handouts/files/.*."

Replace with:

<a href="/abc123/handouts/files/xxxxxxx"

leaves me with this:

[ <a href="/abc123/handouts/files/xxxxxxx">Handout</a> ]

completely removing the first link on the line.

What am I doing wrong? And more to the point, how is it done correctly?

Thanks!

  • What language are you using? Is sed ok? Commented Dec 23, 2014 at 18:58

2 Answers


Your regular expression is greedy (the * without a ?), so .* matches as much as it can, running past the first .pdf to the last quote on the line. To make it non-greedy, add a ? after the *:

<a href="\/abc123\/handouts\/files\/.*?"

This matches everything up to and including the first closing quote, and no further. Then replace with:

<a href="/abc123/handouts/files/xxxxxxx"

Here's regex101 for you to see: https://regex101.com/r/oY8pI8/2
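
To see the difference outside an editor, here is a small JavaScript sketch run against the first sample line from the question. The variable names are mine, and the `[^"]*` variant is an alternative I am adding rather than part of the answer above; on this input it should behave the same as `.*?`.

```javascript
// First sample line from the question (two links on one line).
const line = '[<a href="/abc123/handouts/files/directory1/somename.pdf" target="_blank">Slides</a> ]  [ [<a href="/abc123/handouts/files/directory2/somename2.pdf" target="_blank">Handout</a> ]</td>';

const replacement = '<a href="/abc123/handouts/files/xxxxxxx"';

// Greedy: .* runs on to the last quote on the line, so the match
// swallows the whole first link and most of the second one.
const greedy = line.replace(/<a href="\/abc123\/handouts\/files\/.*"/g, replacement);

// Lazy: .*? stops at the first quote after the prefix, so each href
// is matched and replaced on its own.
const lazy = line.replace(/<a href="\/abc123\/handouts\/files\/.*?"/g, replacement);

// Alternative (my assumption, not from the answer): a negated class
// can never cross a quote, whether or not the engine supports lazy quantifiers.
const negated = line.replace(/<a href="\/abc123\/handouts\/files\/[^"]*"/g, replacement);

console.log(greedy);  // only the Handout link text survives
console.log(lazy);    // both links kept, both paths replaced
console.log(negated === lazy);
```

The `g` flag is there so every link on the line is replaced; in an editor, "Replace All" plays the same role.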


1 Comment

Answer accepted. Thanks, direct and to the point. (Good to know about the needed ? in my attempt!)

JavaScript version for string replacement.

// Prefix is captured as group 1; folders plus file name as group 2.
var re = /"(\/abc123\/handouts\/files\/)((?:[a-zA-Z0-9]*\/)*[a-zA-Z0-9]*\.[a-zA-Z]{3,4})"/;
var str = '"/abc123/handouts/files/directory1/somename.pdf"';
// $1 puts the fixed prefix back; group 2 is replaced by XXXXX.
var newstr = str.replace(re, '"$1XXXXX"');
alert(newstr);

In essence the pattern above breaks into three parts. The initial grab of the fixed prefix:

"(\/abc123\/handouts\/files\/)

A non-capturing group to find any further folders:

(?:[a-zA-Z0-9]*\/)*

The file name and extension (digits are allowed in the name so that names like somename343.pdf match, and the dot is escaped so it only matches a literal "."):

[a-zA-Z0-9]*\.[a-zA-Z]{3,4}

Note that the further folders and the file name are wrapped together within one capturing group:

((?:[a-zA-Z0-9]*\/)*[a-zA-Z0-9]*\.[a-zA-Z]{3,4})

Captures are thus ordered as follows:

0 - Entire match
1 - Initial folder match
2 - Trailing directory and file name match
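
To check the capture order, the sketch below runs a pattern of the same three-part shape against the second sample line from the question. Assumptions on my part: the file-name class allows digits, the dot is escaped, the variable names are illustrative, and a `g` copy of the pattern is built so that every link on the line is replaced rather than only the first.

```javascript
// Same three-part shape: fixed prefix, optional folders, file name with extension.
const re = /"(\/abc123\/handouts\/files\/)((?:[a-zA-Z0-9]*\/)*[a-zA-Z0-9]*\.[a-zA-Z]{3,4})"/;

// Second sample line from the question (two links).
const both = '[<a href="/abc123/handouts/files/directory3/somename343.pdf" target="_blank">Slides</a> ]  [ <a href="/abc123/handouts/files/directory5/somename2324.pdf" target="_blank">Handout</a> ]';

// Without the g flag, match() returns the first match with its groups.
const m = both.match(re);
console.log(m[0]); // entire match, including the surrounding quotes
console.log(m[1]); // "/abc123/handouts/files/"
console.log(m[2]); // "directory3/somename343.pdf"

// A global copy of the same pattern replaces every href on the line;
// $1 keeps the prefix and group 2 is swapped for the placeholder.
const reAll = new RegExp(re.source, "g");
const replaced = both.replace(reAll, '"$1XXXXX"');
console.log(replaced);
```

Keeping the prefix in its own group means the replacement string never has to repeat the literal path.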

1 Comment

Aarrrgh, sorry guys. I should have known to mention Notepad++ in the title (sorry). I just discovered regex101.com today, but I didn't really get how it works (like how you replace things). @remus - I'll give your suggestion a try. Thanks!
