I'm working on a bash script that applies a series of regex substitutions to a file containing a list of links, in order to clean the file up. At the moment I do everything manually in Kate with find/replace, but having it as a script would be more convenient. Since I'm fairly new to bash scripting, I'm asking for help.

Example list of URLs:

0: "/suburl0"
​
1: "/suburl1"
​
2: "/suburl2"
​
3: "/suburl3"
​
4: "/suburl4"

The script I currently have:

#!/bin/bash
awk '[^\x00-\x7F]+' $1 #there are non-ascii chars in the file, so clean it out
awk 'NF' $1 # remove non-character lines
awk '^[0-900]{0,3}: ' $1 #delete all those number infront of the link
awk '"' $1 # remove those quotation marks
awk '!seen[$0]++' $1 #remove duplicate lines
awk '{print "http://example.com/" $0}' $1 #prepend the full url to the suburl

The goal is to apply all of those regexes to the file, so that the file ends up cleaned.

My guess is that I'm not redirecting the output of awk correctly, but when I tried to pipe it into the file, the file contained nothing but empty lines.

  • Each awk invocation produces modified output but leaves the input file untouched. You have several options: 1) redirect the output of each awk invocation to a file and have the next invocation work on that file; 2) pipe the output of each awk into the following awk invocation without giving the later ones a file argument, so that they read standard input, which is fed by the previous command's output (the first must of course still take the file as input, and the last one's output can be redirected to a file); 3) use a single awk invocation that performs all the actions. Commented Nov 27, 2019 at 13:35
  • Note that most of your awk commands aren't correct either. You might want to run your commands one at a time on your input file and check whether each produces the expected result. Commented Nov 27, 2019 at 13:38
  • Could you please post a sample of your input and the expected output in your question? Please also make sure you wrap your samples and code in CODE TAGS. Commented Nov 27, 2019 at 14:04
  • Your awk scripts don't do what the comments next to them suggest you think they do. Commented Nov 27, 2019 at 14:22
  • @Aaron When I run them separately, an error occurs: awk '{print [^\x00-\x7F]+/}' testfile fails with "^ backslash not last character on line". The syntax of the regex should be correct, since it works in Kate without a problem. @RavinderSingh13 As I mentioned, the input is the lines of the file above, for example 0: "/suburl0" and 1: "/suburl1"; the output should be example.com/suburl0 and example.com/suburl1. @Arkku As I mentioned, I'm fairly new to shell scripting. Doing those regexes manually in Kate works. Commented Nov 27, 2019 at 14:55

1 Answer

A more or less direct translation of what you wanted, without restricting it to awk:

cat "$1" \
        | tr -cd '[:print:][:space:]' \
        | grep . \
        | sed -r 's/^[0-9]{1,3}: //' \
        | tr -d '"' \
        | sort -u \
        | awk '{print "http://example.com" $0}'

Note that sort -u changes the order of the lines; I am assuming the order doesn't matter.
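If the order does matter, one option is to swap sort -u for the awk '!seen[$0]++' idiom from the question, which prints each line only the first time it appears. The following is a minimal sketch of that variant, not a tested drop-in: the function name clean_links_keep_order is made up for illustration, and LC_ALL=C is an added assumption so that tr works byte by byte regardless of the locale.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption, not from the answer):
# same clean-up steps as the pipeline above, but duplicates are removed
# without sorting, so the original line order is kept.
clean_links_keep_order() {
    LC_ALL=C tr -cd '[:print:][:space:]' < "$1" \
        | grep . \
        | sed 's/^[0-9]\{1,3\}: //' \
        | tr -d '"' \
        | awk '!seen[$0]++' \
        | awk '{print "http://example.com" $0}'
}

# Usage (links.txt is a placeholder file name):
#   clean_links_keep_order links.txt > links.clean.txt
```

The sed expression here uses the basic-regex form \{1,3\}, so it does not need the -r option.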

Also note that sed -r is a GNU extension.

A slightly simplified and more portable version:

cat "$1" \
        | tr -cd '[:graph:]\n' \
        | grep . \
        | tr -d '"' \
        | sort -u \
        | sed 's,^[0-9]*:,http://example.com,'

Output:

http://example.com/suburl0
http://example.com/suburl1
http://example.com/suburl2
http://example.com/suburl3
http://example.com/suburl4
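One of the comments on the question suggests a single awk invocation that performs all the actions. A rough sketch of that idea follows, with a second helper that writes the result back to the file by way of a temporary file, since redirecting a command's output onto its own input file truncates the file before it is read. Both function names are hypothetical, LC_ALL=C is an assumption to make awk treat the input as bytes, and [0-9]+ is used instead of an interval such as {1,3} because some older awk implementations do not support intervals.

```shell
#!/bin/bash
# Hypothetical function name: one awk pass that does every clean-up step.
clean_links_awk() {
    LC_ALL=C awk '
        {
            gsub(/[^ -~]/, "")       # drop every byte outside printable ASCII
            sub(/^[0-9]+: */, "")    # drop the leading "N: " index
            gsub(/"/, "")            # drop the quotation marks
        }
        NF && !seen[$0]++ {          # skip blank lines and repeated lines
            print "http://example.com" $0
        }
    ' "$1"
}

# Hypothetical function name: clean a file "in place". The output goes to
# a temporary file first and is copied back only if awk succeeded.
clean_links_in_place() {
    local tmp status
    tmp=$(mktemp) || return 1
    clean_links_awk "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (links.txt is a placeholder file name):
#   clean_links_in_place links.txt
```

Unlike the sort -u pipelines, this version keeps the lines in their original order.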

1 Comment

"sed -r is GNU": I suggest using sed -E instead. It works with both modern GNU sed and BSD sed, and it is consistent with grep's flags. It won't work with older GNU sed versions, where you need -r, and it's not POSIX-defined either, but on reasonably modern systems it is more likely to work without your having to know which sed you're coding for.
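To illustrate the comment, here are the two spellings of the index-stripping substitution side by side, as a small sketch. The function names are made up for this example; the first relies on -E being available in the installed sed, while the second uses basic-regex syntax and needs no option at all.

```shell
#!/bin/bash
# Hypothetical names. Both functions read lines on standard input and
# strip a leading one-to-three-digit index followed by ": ".

# Extended regular expressions via -E (modern GNU sed and BSD sed).
strip_index_ere() {
    sed -E 's/^[0-9]{1,3}: //'
}

# Basic regular expressions: the braces are escaped, no option is needed.
strip_index_bre() {
    sed 's/^[0-9]\{1,3\}: //'
}

# Usage:
#   printf '%s\n' '12: "/suburl12"' | strip_index_ere
```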
