Apply bash script with awk-commands to file

Question

I'm currently working on a bash script that applies a list of regexes to a file of links to clean it up. At the moment I do it all manually in Kate with find/replace, but having it as a script would be more comfortable. Since I'm fairly new to bash scripting, I'm asking for help.

Example list of urls:

0: "/suburl0"

1: "/suburl1"

2: "/suburl2"

3: "/suburl3"

4: "/suburl4"

The script I currently have:

#!/bin/bash
awk '[^\x00-\x7F]+' $1 # there are non-ASCII chars in the file, so clean them out
awk 'NF' $1 # remove empty lines
awk '^[0-900]{0,3}: ' $1 # delete all the numbers in front of the link
awk '"' $1 # remove those quotation marks
awk '!seen[$0]++' $1 #remove duplicate lines
awk '{print "http://example.com/" $0}' $1 #prepend the full url to the suburl

The goal is to apply all those regexes to the file so that it ends up cleaned.

My guess is that I'm not redirecting the output of awk correctly, but when I tried to write it back into the file, the file ended up containing just empty lines.
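The empty result is most likely caused by the shell rather than by awk: in awk '...' file > file the shell truncates file before awk opens it, so awk reads nothing. Below is a minimal sketch of the usual workaround: write to a temporary file first and copy the result back. The helper name apply_in_place is hypothetical, and the filter command is assumed to read standard input.

```bash
#!/bin/bash
# Hypothetical helper: apply_in_place FILE COMMAND [ARGS...]
# Runs COMMAND with FILE on standard input and replaces FILE's contents with
# the output. Writing straight back (cmd file > file) fails because the shell
# truncates FILE before cmd reads it.
apply_in_place() {
    local file=$1 tmp
    shift
    tmp=$(mktemp) || return 1
    if "$@" < "$file" > "$tmp"; then
        # Copy back with cat instead of mv so FILE keeps its own permissions;
        # the temp file is only removed if the copy succeeded.
        cat -- "$tmp" > "$file" && rm -f -- "$tmp"
    else
        # The filter failed: leave FILE untouched.
        rm -f -- "$tmp"
        return 1
    fi
}
```

For example, apply_in_place links.txt awk 'NF' would remove the blank lines from links.txt in place.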

Each awk invocation produces modified output but leaves the input file untouched. You have multiple solutions: 1) redirect the output of each awk invocation to a file and have the next invocation work on that file; 2) pipe the output of each awk into the following awk invocation and don't give them a file argument: they'll work on their standard input, populated by the previous one's output. Of course the first must still take the file as input, and the last one's output can be redirected to a file; 3) use a single awk invocation that does all the actions. — Aaron
– Aaron, Commented Nov 27, 2019 at 13:35
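A sketch of option 3, one awk program doing every cleanup step from the question in order, is shown below. It assumes input lines shaped like 0: "/suburl0", the base URL http://example.com, and an awk that supports POSIX bracket classes such as [:print:]. The function name clean_links is illustrative, not from the thread.

```bash
#!/bin/bash
# Sketch of option 3: a single awk program that performs every cleanup step.
# Assumes input lines like   0: "/suburl0"   and an awk with POSIX bracket
# classes. With no file argument, awk reads standard input.
clean_links() {
    LC_ALL=C awk '
    {
        gsub(/[^[:print:]]/, "")       # drop non-printable bytes; in the C locale this covers every non-ASCII byte
        sub(/^[0-9]+: */, "")          # drop the leading "N: " index
        gsub(/"/, "")                  # drop the quotation marks
        if (NF == 0) next              # skip lines that are now blank
        if (seen[$0]++) next           # skip duplicates, keeping the first occurrence
        print "http://example.com" $0  # prepend the base URL
    }' "$@"
}
```

Unlike a sort -u based pipeline, this keeps the lines in their original order. Usage would be clean_links links.txt > cleaned.txt.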
Note that most of your awk commands aren't correct either. You might want to test your commands one at a time on your input file and check whether they produce the expected result. — Aaron
– Aaron, Commented Nov 27, 2019 at 13:38
Could you please post a sample of your input and expected output in your question? Please make sure you wrap your samples/code in CODE TAGS. — RavinderSingh13
– RavinderSingh13, Commented Nov 27, 2019 at 14:04
Your awk scripts don't do what the comments next to them suggest you think they do. — Arkku
– Arkku, Commented Nov 27, 2019 at 14:22
@Aaron when I run them separately, an error occurs: awk '{print [^\x00-\x7F]+/}' testfile fails with "backslash not last character on line". The syntax of the regex should be correct, since it works in Kate without a problem. @RavinderSingh13 as I mentioned, the input is the lines in the file above, for example: 0: "/suburl0" 1: "/suburl1"; the output should be: example.com/suburl0 example.com/suburl1. @Arkku as I mentioned, I'm fairly new to shell scripting. Doing those regexes manually in Kate works. — Lukas S
– Lukas S, Commented Nov 27, 2019 at 14:55
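The error most likely comes from awk syntax rather than the regex itself: in awk a regular expression has to sit between two slashes (the command above is missing the opening one), and even then a bare /regex/ only selects lines, whereas Kate's find/replace deletes what it matches. A short sketch of the difference; the function names are illustrative.

```bash
#!/bin/bash
# In awk, a bare /regex/ is a pattern: it selects whole lines, which are
# printed unchanged. It does not delete anything.
lines_with_digits() { awk '/[0-9]/' "$@"; }

# To delete the matched text, call gsub() (all matches) or sub() (first match)
# on the line, then print the edited line.
strip_quotes() { awk '{ gsub(/"/, ""); print }' "$@"; }
```

For example, printf '0: "/x"\n' | strip_quotes prints 0: /x.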

root · Accepted Answer · 2019-11-29 03:09:41Z


A more-or-less translation of what you wanted, without restricting to awk:

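tr -cd '[:print:][:space:]' < "$1" |   # delete everything except printable characters and whitespace
        grep . |                        # drop empty lines
        sed -r 's/^[0-9]{1,3}: //' |    # strip the leading "N: " index
        tr -d '"' |                     # remove the quotation marks
        sort -u |                       # remove duplicate lines (this also sorts them)
        awk '{print "http://example.com" $0}'   # prepend the base URL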

Note that sort will change the order; I am assuming the order doesn't matter.
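If the original order does matter, one option is to swap sort -u for the asker's own awk '!seen[$0]++' idiom, which prints a line only the first time it appears. A sketch under the same assumptions as the pipeline above; the function name clean_links_ordered is hypothetical, and LC_ALL=C is added so that tr works on bytes regardless of the user's locale.

```bash
#!/bin/bash
# Variant of the first pipeline that keeps the original line order:
# sort -u is replaced by awk '!seen[$0]++', which prints a line only the
# first time it is seen. LC_ALL=C makes tr treat every non-ASCII byte as
# non-printable, so those bytes are deleted.
clean_links_ordered() {
    LC_ALL=C tr -cd '[:print:][:space:]' < "$1" |
        grep . |                           # drop empty lines
        sed 's/^[0-9][0-9]*: //' |         # strip the "N: " index (portable BRE)
        tr -d '"' |                        # remove the quotation marks
        awk '!seen[$0]++ { print "http://example.com" $0 }'   # dedupe, prepend URL
}
```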

Also note that sed -r is a GNU extension.
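Where sed -r is not available, the same substitution can be written as a POSIX basic regular expression, in which the interval {1,3} is spelled \{1,3\}. A minimal sketch; strip_index is an illustrative name.

```bash
#!/bin/bash
# Portable basic-regex spelling of  sed -r 's/^[0-9]{1,3}: //'
# In a BRE the interval {1,3} must be written with backslashes: \{1,3\}.
strip_index() {
    sed 's/^[0-9]\{1,3\}: //' "$@"
}
```

A line with four or more leading digits, such as 1234: "/c", is left unchanged, just as with the -r version.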

A slightly simplified and more portable version:

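tr -cd '[:graph:]\n' < "$1" |   # keep only visible characters and newlines (spaces are dropped too)
        grep . |                 # drop empty lines
        tr -d '"' |              # remove the quotation marks
        sed 's,^[0-9]*:,http://example.com,' |   # replace the "N:" index with the base URL
        sort -u                  # dedupe after the index is gone, so identical URLs collapse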

Output:

http://example.com/suburl0
http://example.com/suburl1
http://example.com/suburl2
http://example.com/suburl3
http://example.com/suburl4


Aaron · Over a year ago

"sed -r is GNU": I suggest using sed -E instead. It works with both modern GNU sed and BSD sed, and it's consistent with grep's flags. It won't work with older GNU sed versions, where you need -r instead, and it's not POSIX-defined either, but on reasonably modern systems it has a better chance of working without your having to know which sed you're coding for.
