bash regex string replace in a single column within a csv

Question

I have a csv where I need to trim some text from a column probably using a regex. This is an example:

test.csv:

Number	Text	Group
1	the id is G11 (sometimes)	A
2	the id is G12	B
3	the id is G15	C
4	the id is Z13 (always)	D

I want to extract only the code (G11, G12, G15, Z13) from the Text column and replace that column's value with it, so that I would get:

Number	Text	Group
1	G11	A
2	G12	B
3	G15	C
4	Z13	D

I have tried using forms of grep, awk, and sed but haven't got anything to work. I'm unfamiliar with bash, but essentially I want a new csv that looks like table 2.

This is my attempt; I'm unsure how to work on only the Text column:

sed 's/([A-Z]\d{2}).*/([A-Z]\d{2})' test.csv > test2.csv

sseLtaH · Accepted Answer · 2022-08-02 00:33:48Z

2

Using sed

sed -E '1!s/([^a-z]*)[^A-Z]*([^ ]* )[^|]*/\1\2/'

Number	Text	Group
1	G11	A
2	G12	B
3	G15	C
4	Z13	D

1! - Apply the substitution to every line except line 1, so the header is left untouched.

([^a-z]*) - Capture everything up to the first lowercase letter; the back reference \1 puts it back in the replacement.

[^A-Z]* - Match everything up to the next capital letter. It is outside the parentheses, so it is dropped from the output.

([^ ]* ) - Capture everything up to and including the next space; the back reference \2 puts it back in the replacement.

[^|]* - Match everything up to the next pipe symbol (or to the end of the line if there is none). It is not captured, so it is dropped as well.

Any part of the line that the pattern does not match is left unchanged.
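The sed expression above expects a space after the code and a pipe after the Text value. If the file is instead tab-separated, as the sample appears here, a field-based tool may be easier to restrict to one column. The following is only a sketch under that assumption: it uses awk rather than sed, the temporary file and the variable names input and result are illustrative, and it assumes the code is always one capital letter followed by two digits.

```shell
# Build a small tab-separated sample matching the question (assumed layout).
input=$(mktemp)
printf 'Number\tText\tGroup\n' > "$input"
printf '1\tthe id is G11 (sometimes)\tA\n' >> "$input"
printf '2\tthe id is G12\tB\n' >> "$input"
printf '3\tthe id is G15\tC\n' >> "$input"
printf '4\tthe id is Z13 (always)\tD\n' >> "$input"

# NR > 1 skips the header row; match() searches only field 2 (Text).
# When a code is found, field 2 is replaced by the matched substring.
result=$(awk 'BEGIN { FS = OFS = "\t" }
NR > 1 && match($2, /[A-Z][0-9][0-9]/) { $2 = substr($2, RSTART, RLENGTH) }
{ print }' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

On a real file, the same awk program could be run as awk '...' test.csv > test2.csv. Because only $2 is reassigned, the Number and Group columns pass through unchanged, and rows without a code keep their original Text value.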

edited Aug 2, 2022 at 0:33

answered Aug 2, 2022 at 0:09

sseLtaH


3 Comments

lmmjrf Over a year ago

What does the 1! do at the start? Also, I need to get rid of the bracketed (sometimes) and (always) in my final output.

sseLtaH Over a year ago

@lmmjrf Do not match line 1, so no modification is carried out on that line

sseLtaH Over a year ago

@lmmjrf Is the bracket with the contents literally in your data? You can try this sed -E '1!s/([^a-z]*)[^A-Z]*([^ ]* )[^|]*/\1\2/' input_file
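The effect of the 1! address asked about in the comments can be seen in isolation with a tiny sketch. The two-line input and the variable name demo are made up for illustration:

```shell
# Two identical sample lines; 1! negates the address "line 1",
# so the substitution is applied only to the second line.
demo=$(printf 'id is here\nid is here\n' | sed '1!s/id/ID/')
printf '%s\n' "$demo"
```

The first line is printed unchanged, while the second has id replaced with ID.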
