1

I have a line of text within a text file. The line looks something like this:

xxxx,xxxxx,xxxxxx,xxxxx,xxxx,NL-1111 xx,xxxx,xxx

The NL- prefix is a country identifier, so it could be any country code. I would like to remove the NL- part from the line so it looks like this:

xxxx,xxxxx,xxxxxx,xxxxx,xxxx,1111 xx,xxxx,xxx

And write the file afterwards.

Thanks in advance.

2
  • Did you try anything? Commented Jan 14, 2015 at 8:14
  • @ChrisMaes Yes, I played with sed and awk, but I'm not sure which approach to use. I don't work with bash that often. Commented Jan 14, 2015 at 8:15

4 Answers

2

Another solution, close to the sed ones but using perl:

perl -i -pe "s/(?<=,)[a-zA-Z]{2}-//g" file.txt

It uses a lookbehind assertion, so you don't need to repeat the comma in the replacement part.


2 Comments

This is a good solution, +1. It works on all platforms, unlike sed-in-place (sed -i); see my answer for details.
@bgoldst There are also some differences between platforms with perl -i: on some (my Windows cmd, for example), you need to specify a backup pattern after -i (running without a backup is not possible). With a backup pattern it works everywhere, but you get a backup file.
2

Something like this, using sed:

sed -i 's/,[A-Z][A-Z]-\([0-9]\+\)/,\1/i' file.txt

,[A-Z][A-Z]-\([0-9]\+\) searches for a comma, two letters, a dash, and one or more digits.

,\1 keeps only the comma and the digits.

i ignores case on the letters.

Thank you to @chris for proofreading.
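The sed command above relies on \+ and the trailing i flag, both of which are GNU sed extensions. A rough equivalent that avoids them is sketched below: it spells out both letter cases in the bracket expression and writes "one or more digits" as a digit followed by zero or more digits. The sample line and variable names are made up for illustration, and the command reads standard input rather than editing a file.

```shell
# Hypothetical sample line with a lowercase country prefix.
sample='aaaa,bbbbb,cccccc,ddddd,eeee,nl-1111 xx,ffff,ggg'

# [A-Za-z] replaces the case-insensitive flag;
# [0-9][0-9]* replaces [0-9]\+ using only basic regular expression syntax.
result=$(printf '%s\n' "$sample" | sed 's/,[A-Za-z][A-Za-z]-\([0-9][0-9]*\)/,\1/')

printf '%s\n' "$result"
```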

1 Comment

Better solution with regex. Is the < necessary?
2

I think the simplest solution here is to read the file into a shell variable, apply the pattern substitution form of parameter expansion, and write the result straight back:

line="$(<file)"; echo "${line/[a-zA-Z][a-zA-Z]-}" >|file;

I would warn you against solutions that use sed-in-place functionality. I've found that sed behavior differs on different platforms with respect to the -i option. On Mac you have to give an empty argument ('') to the -i option, while on Cygwin you must not have an empty argument following the -i. To get platform compatibility you'd have to test what platform you're on.
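One way to sidestep the -i differences entirely is to let sed write to a temporary file and then move that file over the original. The following is only a sketch under a few assumptions: mktemp is available, the scratch directory and file names are hypothetical, and the sample line is made up to resemble the one in the question.

```shell
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
file="$workdir/file.txt"
printf '%s\n' 'aaaa,bbbbb,cccccc,ddddd,eeee,NL-1111 xx,ffff,ggg' > "$file"

# Write the edited text to a temporary file, then replace the original
# only if sed succeeded. No -i option is involved.
tmp=$(mktemp "$workdir/edit.XXXXXX")
sed 's/,[A-Za-z][A-Za-z]-/,/' "$file" > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
```

Note that mv replaces the original file, so it ends up with the temporary file's permissions; copying the contents back with cat "$tmp" > "$file" keeps the original permissions instead.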

4 Comments

Not sure this will work, since there is 'XXX,XXX,...' on the same line before NL-
It works. The dash at the end of the pattern protects the rest of the string.
This won't work if a prefix has more than two characters: it deletes only the last two characters plus the dash, whereas such a prefix should be left untouched.
The prefix is only two characters. Probably ISO 3166-1 alpha-2.
1

sed might do the trick. This removes the country prefix from strings such as ",NL-" or ",BE-" wherever they appear in the file, keeping the comma:

sed -i 's/,[A-Z][A-Z]-/,/' file.txt

3 Comments

Yes, this works if the NL- part were static. The problem is that it could be anything: NL-, BE-, DE-, etc.
You should not remove the comma, I think.
thanks for the comments; I adapted it slightly so it won't remove the comma, and will remove other countries as well... The regex by @Jasen might be better...
