I have a .csv file, and for each line I need to append regex matches as new columns after the original columns. Here is part of the .csv file:

"Event";"User";"Description"   
"stock_change";"[email protected]";"Change Product Teddy-Bear (Shop ID: AR832H0823)"
"stock_update";"[email protected]";"Update Product 30142_Pen (Shop ID: GI8759)"

Here are the two regex patterns whose matches I want to extract from each row and add as new columns (one column per pattern):

(?<=Product\s)\w.*?(?=\s*\(Shop)

(?<=Shop ID:\s)\w.*?(?=\))
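To make the intended extraction concrete, here is a minimal Python sketch. Python is used only to illustrate what the two patterns match; it is not batch, and the helper name add_columns is hypothetical. Python's re module can run these patterns unchanged because both look-behinds have a fixed width.

```python
import re

# The two patterns from the question, unchanged. Python's re module accepts
# them because each look-behind has a fixed width ("Product " / "Shop ID: ").
PRODUCT = re.compile(r"(?<=Product\s)\w.*?(?=\s*\(Shop)")
SHOP_ID = re.compile(r"(?<=Shop ID:\s)\w.*?(?=\))")


def add_columns(line):
    """Append both matches to the line as new quoted, ';'-separated columns."""
    product = PRODUCT.search(line)
    shop_id = SHOP_ID.search(line)
    if product is None or shop_id is None:
        return line  # e.g. the header row: left as it is
    return f'{line};"{product.group()}";"{shop_id.group()}"'


rows = [
    '"stock_change";"[email protected]";"Change Product Teddy-Bear (Shop ID: AR832H0823)"',
    '"stock_update";"[email protected]";"Update Product 30142_Pen (Shop ID: GI8759)"',
]
for row in rows:
    print(add_columns(row))
```

Each row gets ;"Teddy-Bear";"AR832H0823" or ;"30142_Pen";"GI8759" appended, while a line without a match (such as the header) is returned unchanged.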

The result should look like this (the header row is not important):

"stock_change";"[email protected]";"Change Product Teddy-Bear (Shop ID: AR832H0823)";"Teddy-Bear";"AR832H0823"  
"stock_update";"[email protected]";"Update Product 30142_Pen (Shop ID: GI8759)";"30142_Pen";"GI8759"

Sorry, I'm a beginner at batch scripting. Thanks in advance!

3 Answers

Windows batch does not have a native regex find/replace utility. The only regex utility is FINDSTR, which is extremely limited and non-standard. It can only print entire lines that match the search; it cannot print just the matching portion.

You could use PowerShell.

But I would use JREPL.BAT, a purely script-based utility (hybrid JScript/batch) that works on any Windows machine from XP onward. It uses ECMA regular expressions, so there is no look-behind, but it has plenty of power for this task.

jrepl "Product\s(\S+?)\s*\(Shop ID:\s(.*?)\)\q$" "$&;\q$1\q;\q$2\q" /a /x /f test.csv /o -

The /a switch discards unchanged lines, which effectively removes the header line. The /o - option overwrites the original file with the output. The /x switch enables extended escape sequences, thus enabling \q for ".
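Since JScript cannot be run here directly, the following is a rough Python analogue of the jrepl command above, meant only to show the logic; it is not JREPL's own code, and the names jrepl_like, PATTERN and REPLACEMENT are hypothetical. It uses the same search pattern with \q already expanded to a literal quote, captures the product name and shop ID as groups instead of using look-behind, and (like /a) drops lines where no replacement happened.

```python
import re

# JREPL's search pattern with \q already expanded to a literal double quote.
# There is no look-behind: the product name and shop ID are capture groups.
PATTERN = re.compile(r'Product\s(\S+?)\s*\(Shop ID:\s(.*?)\)"$')

# Python's \g<0> plays the role of JREPL's $& (the whole match).
REPLACEMENT = r'\g<0>;"\1";"\2"'


def jrepl_like(lines):
    """Rough analogue of the jrepl command: replace in place, drop unchanged lines."""
    out = []
    for line in lines:
        new, count = PATTERN.subn(REPLACEMENT, line)
        if count:  # like /a: keep only lines where a replacement happened
            out.append(new)
    return out


sample = [
    '"Event";"User";"Description"',
    '"stock_change";"[email protected]";"Change Product Teddy-Bear (Shop ID: AR832H0823)"',
    '"stock_update";"[email protected]";"Update Product 30142_Pen (Shop ID: GI8759)"',
]
for line in jrepl_like(sample):
    print(line)
```

Because the match runs from "Product" to the end of the line, appending to $& effectively appends the two new columns to the whole line, and the header disappears because nothing in it matches.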

Use call jrepl if you put the command in a batch script; because JREPL.BAT is itself a batch file, control would not return to your script without CALL.

Full documentation is available from the command line via jrepl /?, or jrepl /?? for paged output.

1 Comment

Thanks dbenham, I was surprised to see that the JREPL tool was developed by you :) Wondering if I can use the same tool for this issue too? stackoverflow.com/questions/35828741/…

You can do it with this GNU sed command:

sed -r 's/^.*Product (.+) \(Shop ID: (.+)\)"$/&;\"\1\";\"\2\"/g' shop.csv
  • it captures the text between "Product " and " (Shop ID: " into \1, and the text between "Shop ID: " and the closing )" into \2
  • the replacement uses & (the whole matched line, since the pattern is anchored with ^ and $) and appends ;"\1";"\2" to it
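As an illustration of how this substitution behaves, here is a hedged Python rendering of the same expression (the helper name sed_like is hypothetical; this is not sed itself). Python's \g<0> corresponds to sed's &, and re.sub leaves a non-matching line untouched, just as sed prints such lines unchanged.

```python
import re

# The sed expression in Python syntax. Because it is anchored with ^ and $,
# the whole match (\g<0>, sed's &) is the whole line.
SED_PATTERN = re.compile(r'^.*Product (.+) \(Shop ID: (.+)\)"$')


def sed_like(line):
    # re.sub returns the line unchanged when the pattern does not match,
    # mirroring sed, which prints non-matching lines (like the header) as-is.
    return SED_PATTERN.sub(r'\g<0>;"\1";"\2"', line)


for line in [
    '"Event";"User";"Description"',
    '"stock_change";"[email protected]";"Change Product Teddy-Bear (Shop ID: AR832H0823)"',
]:
    print(sed_like(line))
```

Unlike the JREPL command with /a, this keeps the header line. Also note that the sed command as written prints to standard output rather than modifying shop.csv in place.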

1 Comment

Worth noting that this is an external program that he's going to have to download.

This problem can be solved very simply, without a regex, using this batch file:

@echo off

(for /F "skip=1 tokens=1-3 delims=;" %%a in (input.csv) do (
   for /F "tokens=3,6 delims=() " %%d in (%%c) do (
      echo %%a;%%b;%%c;"%%d";"%%e"
   )
)) > output.csv
move /Y output.csv input.csv

Result:

"stock_change";"[email protected]";"Change Product Teddy-Bear (Shop ID: AR832H0823)";"Teddy-Bear";"AR832H0823"
"stock_update";"[email protected]";"Update Product 30142_Pen (Shop ID: GI8759)";"30142_Pen";"GI8759"

However, if some lines do not follow the format of the example data (lines that a regex could process correctly but this code cannot), the code may need adjusting. Depending on how the data differs, the problem may not be solvable with a pure batch file.
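To illustrate that caveat, here is a small Python sketch (not batch; the helper names and the second, space-containing description are hypothetical) that imitates how the inner FOR /F with tokens=3,6 and delims=() splits the description, and compares it with the question's look-behind patterns.

```python
import re


def for_f_tokens(text):
    # FOR /F with delims=() treats each of "(", ")" and space as a delimiter,
    # collapses consecutive delimiters and skips leading ones, so splitting on
    # runs of them and dropping empty strings yields the same tokens.
    return [t for t in re.split(r"[() ]+", text) if t]


def token_columns(description):
    tokens = for_f_tokens(description)
    return tokens[2], tokens[5]  # batch tokens=3,6 are 1-based


def regex_columns(description):
    # The look-behind patterns from the question.
    product = re.search(r"(?<=Product\s)\w.*?(?=\s*\(Shop)", description)
    shop_id = re.search(r"(?<=Shop ID:\s)\w.*?(?=\))", description)
    return product.group(), shop_id.group()


for desc in [
    "Change Product Teddy-Bear (Shop ID: AR832H0823)",  # from the sample data
    "Change Product Teddy Bear (Shop ID: AR832H0823)",  # hypothetical: space in name
]:
    print(token_columns(desc), regex_columns(desc))
```

For the sample data both approaches agree. For the hypothetical product name containing a space, the token approach yields ('Teddy', 'ID:') while the regex still yields ('Teddy Bear', 'AR832H0823'), which is the kind of difference that would require adjusting the batch code.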
