Edit: I've realized I made a mistake when explaining myself. Apologies for that.

Most of the artifacts come from this path:

D:\Folder1\Folder2\Folder3\Folder4\Folder5\

The path then branches into artifact folders and their sub-folders, like this:

D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.2\data.xxx

I would appreciate help with the following:

I have a list (around 5k rows) of paths to different artifacts, each present in several versions. For example:

D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.2\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.2\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.3\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.3\data.xxx

My goal is this:

D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx

Basically, I want to scope it down to just one version per artifact folder.

I've tried finding ^(.*)(\n\1)+$ and replacing with $1, but that obviously didn't work. So I was wondering if you have an idea how to approach this. I'd greatly appreciate any help, thanks!

  • Does it mean you have duplicated line pairs with a different number at the end (followed by \)? Commented Apr 1, 2022 at 13:38
  • Yes, that is correct Wiktor. Commented Apr 1, 2022 at 13:39

3 Answers

You can use

Find what: ^(.*\.)(\d+)\\[^\\\n]+(\n\1\d+\\[^\\\n]+)+$
  Replace: $1$2\\

See the regex demo. Details:

  • ^ - start of a line (it is the default ^ behavior in Visual Studio Code)
  • (.*\.) - Group 1: any zero or more chars other than line break chars, as many as possible, and then a . char
  • (\d+) - Group 2: one or more digits
  • \\ - a \ char
  • [^\\\n]+ - one or more chars other than \ and a line break
  • (\n\1\d+\\[^\\\n]+)+ - Group 3 capturing one or more sequences of a line break and then the value captured into Group 1, one or more digits, a \ char and then one or more chars other than \ and a line break
  • $ - end of a line.
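
To try this find/replace pair outside the editor, here is a minimal Python sketch. It assumes that Python's re module in re.MULTILINE mode treats this pattern the same way the editor does; the function name collapse_versions and the shortened sample list are illustrative, not part of the answer. Note that with the replacement exactly as given, each kept line stops at the backslash after the version directory, so the trailing file name is not re-emitted.

```python
import re

# Find pattern copied from the answer; re.MULTILINE makes ^ and $ match per line.
FIND = re.compile(r'^(.*\.)(\d+)\\[^\\\n]+(\n\1\d+\\[^\\\n]+)+$', re.MULTILINE)


def collapse_versions(text: str) -> str:
    """Replace each run of same-prefix lines with Group 1 + Group 2 + a backslash."""
    # The editor's $1$2\\ is written \1\2\\ in Python (the final \\ yields one \).
    return FIND.sub(r'\1\2\\', text)


# Sample built from the question's path layout (shortened to two folders).
BASE = r'D:\Folder1\Folder2\Folder3\Folder4\Folder5'
sample = '\n'.join([
    BASE + r'\ArtifactFolder\Artifact\Artifact-1.0\data.xxx',
    BASE + r'\ArtifactFolder\Artifact\Artifact-1.1\data.xxx',
    BASE + r'\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx',
    BASE + r'\ArtifactFolder2\Artifact\Artifact-1.2\data.xxx',
])

print(collapse_versions(sample))
```

If the file name should be kept, it would need its own capture group in the find pattern and a matching reference in the replacement.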

12 Comments

Wiktor, I've tested the regex you provided, but it doesn't seem to work. That may be my fault: most of the content is under D:\Folder\Folder2\Folder3\Folder4\Folder5, which then breaks into subfolders and only then into artifacts, so a path looks like D:\Folder\Folder2\Folder3\Folder4\Folder5\Folder6\Artifact\Artifact-1.0. However, there might be more subfolders before the artifact. I hope I've explained myself correctly now.
@crNh Sorry, that is not clear, you need to update the question.
just done that, sorry.
@crNh That's fine, see my update.
A small change to Wiktor's example seems to be what you want: see regex101.com/r/BKTY9u/1

Here is another attempt, see regex101 demo.

The basic idea is to isolate someText-\d+\. in capture group 2.

Then look for the text captured in group 2 in the following lines, using the backreference \2. What precedes or follows it in those lines can vary.

Find: ^(.*\\(?=.*\\))(.*-\d+\.)(.*\\?.*)(\n.*\2.*)*
Replace: $1$2$3

So here is the most interesting part: ^(.*\\(?=.*\\))(.*-\d+\.)

This will get your Artifact-1. or Artifact-17. or someText-2. into capture group 2. Group 1 ends with the positive lookahead (?=.*\\), which requires one more \ later in the line, so group 1 stops at the second-to-last \ and the following group 2 (.*-\d+\.) can only match in the last directory. Then (.*\\?.*) gathers the rest of that line into group 3.

Finally, (\n.*\2.*)* checks whether the text captured in group 2 appears, via the backreference \2, in any following lines. [Technically, that backreference could match anywhere in a line, even at the beginning; that can be fixed if necessary, so let me know if you need that for your data. See the safer regex101 demo if 'someText-\d.' could appear anywhere and should be ignored when it is not in the last directory, and use that find.]
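
A minimal Python sketch of the same find and replace, for checking it outside the editor. The paths and the artifact names Alpha and Beta are hypothetical, chosen so that each artifact's last directory has a distinct name: as noted above, \2 may match anywhere in a following line, so artifacts with identical names under different parent folders would be merged too.

```python
import re

# Find pattern copied from the answer; re.MULTILINE makes ^ match at each line start.
FIND = re.compile(r'^(.*\\(?=.*\\))(.*-\d+\.)(.*\\?.*)(\n.*\2.*)*', re.MULTILINE)


def keep_first_line_per_artifact(text: str) -> str:
    """Collapse each run of lines sharing the group 2 text into its first line."""
    # $1$2$3 in the editor corresponds to \1\2\3 in Python: the first line, reassembled.
    return FIND.sub(r'\1\2\3', text)


# Hypothetical sample data (not taken from the question).
lines = [
    r'D:\Repo\Group1\Alpha\Alpha-1.0\data.xxx',
    r'D:\Repo\Group1\Alpha\Alpha-1.1\data.xxx',
    r'D:\Repo\Group2\Beta\Beta-1.1\data.xxx',
    r'D:\Repo\Group2\Beta\Beta-1.2\data.xxx',
]

# Keeps the Alpha-1.0 line and the Beta-1.1 line.
print(keep_first_line_per_artifact('\n'.join(lines)))
```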

3 Comments

This is perfect Mark, thank you! If you want, you can send over the other version with backreference just to see difference but this is it :)
You can see what I meant in regex101.com/r/nSoLFl/1. It covers the case where, for example, a line has just artifact-1.1\data.xxx without the preceding path info, or has it somewhere other than the last directory, and you don't want to do anything to such a line. The find regex is then a little more complicated.
great, I will stick with the first option you gave me. Thanks once again :)

You cannot use a single capture group for the whole line, as in ^(.*): you want to repeat only the part before the last dot using a backreference, and that will not work if the whole line is captured.

Therefore, you have to capture the digits of the first line in a separate capture group to keep them in the replacement.

If you want to match all following lines with the same text before the last dot, you can use a repeating group:

^\s*(.*\.)(\d+\\[^\\\r\n]*)(?:\r?\n\s*\1\d*\\[^\\\r\n]*)+

The pattern matches:

  • ^ Start of a line (the editor, or a multiline flag, makes ^ a line anchor)
  • \s* Match optional whitespace chars
  • (.*\.) Capture group 1, match till the last dot that is followed by digits and a \
  • (\d+\\[^\\\r\n]*) Capture group 2, match 1+ digits, \ and optional chars other than \ or a newline
  • (?: Non capture group
    • \r?\n\s*\1 Match a newline, optional whitespace chars and a backreference to group 1
    • \d*\\[^\\\r\n]* Same pattern as in the first part, except that the digits are optional here (\d* instead of \d+)
  • )+ Close the non capture group and repeat 1+ times

See a regex demo.

In the replacement, use the two capture groups: $1$2

The result will look like

D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder\Artifact\Artifact-1.0\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx
D:\Folder1\Folder2\Folder3\Folder4\Folder5\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx
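
The pattern and replacement can be checked with a short Python sketch. It assumes Python's re module in re.MULTILINE mode; the function name keep_first_version and the BASE prefix variable are illustrative only. The input is the eight-line list from the question.

```python
import re

# Pattern copied from the answer; with re.MULTILINE, ^ matches at each line start.
PATTERN = re.compile(
    r'^\s*(.*\.)(\d+\\[^\\\r\n]*)(?:\r?\n\s*\1\d*\\[^\\\r\n]*)+',
    re.MULTILINE,
)


def keep_first_version(text: str) -> str:
    """Keep only the first line of each run that shares the group 1 prefix."""
    # $1$2 in the editor corresponds to \1\2 in Python.
    return PATTERN.sub(r'\1\2', text)


BASE = r'D:\Folder1\Folder2\Folder3\Folder4\Folder5'
paths = [BASE + tail for tail in (
    r'\ArtifactFolder\Artifact\Artifact-1.0\data.xxx',
    r'\ArtifactFolder\Artifact\Artifact-1.1\data.xxx',
    r'\ArtifactFolder\Artifact\Artifact-1.2\data.xxx',
    r'\ArtifactFolder2\Artifact\Artifact-1.1\data.xxx',
    r'\ArtifactFolder2\Artifact\Artifact-1.2\data.xxx',
    r'\ArtifactFolder2\Artifact\Artifact-1.3\data.xxx',
    r'\ArtifactFolder3\Artifact\Artifact-1.2\data.xxx',
    r'\ArtifactFolder3\Artifact\Artifact-1.3\data.xxx',
)]

# Prints the three lines listed above, one per artifact folder.
print(keep_first_version('\n'.join(paths)))
```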

8 Comments

sorry, wasn't on point with my explanation. Updated my question, please check it again.
@crNh Do you mean that there can be more chars other than `\` after it like this? regex101.com/r/OmMwGg/1
updated it once again, to make it crystal clear. But yes that's it.
@crNh Like this ^\s*(.*\.)(\d+)(\\[^\\\r\n]*)(?:\n\1\d+\\[^\\\r\n]*)+ regex101.com/r/7G0pqG/1
@crNh Should the \data.xxx also be the same for all parts? Or only the part before the last dot?