0

I have a file which contains only lines of the form

new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003

Is it possible to parse this output with bash into a form like

7,59,0.876,0.000433344,0.00003

so that I can then read it into Python?

8 Answers

3
sed 's/[^0-9,;.]//g;y/;/,/' YourFile
  1. Remove every character that is not a digit, comma, period, or semicolon
  2. Change every ; to ,
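
Since the values end up in Python anyway, the same two steps can also be done there directly. This is only a minimal sketch, assuming one record per line in the format from the question; the function name `clean_line` is made up for illustration.

```python
import re

def clean_line(line: str) -> str:
    # Step 1: drop every character that is not a digit, comma, semicolon, or period.
    kept = re.sub(r"[^0-9,;.]", "", line)
    # Step 2: turn the remaining semicolons into commas.
    return kept.replace(";", ",")

sample = "new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003"
print(clean_line(sample))  # 7,59,0.876,0.000433344,0.00003
```

As with the sed version, a label that itself contains a digit or a period would leak into the output, so this relies on the labels being plain names such as lim, dim and r_d.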

2 Comments

Nice. Can you explain the y/;/,/ instead of s/;/,/g?
y means transliterate, like tr, whereas s means substitute. y works character by character, whereas s matches the whole pattern. So s/12/34/ replaces the string 12 with 34, whereas y/12/34/ changes every 1 to 3 AND every 2 to 4. y always applies to all occurrences and is a bit faster.
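
The difference between the two commands can be illustrated in Python, where `str.replace` behaves like a global s and `str.translate` behaves like y. The input string below is made up for the illustration.

```python
text = "12 21 12"

# Like sed's s/12/34/g: the two-character string "12" is replaced as a unit.
substituted = text.replace("12", "34")

# Like sed's y/12/34/: each character is mapped on its own (1 -> 3, 2 -> 4).
transliterated = text.translate(str.maketrans("12", "34"))

print(substituted)     # 34 21 34
print(transliterated)  # 34 43 34
```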
1

If the contents are in exactly the format you mentioned, you could try the sed command below:

$ sed 's/^[^(]*(\([^)]*\))\s*;\s*\S*\s*=\s*\(\S\+\)\s*;\s*\S*\s*=\s*\(\S\+\)\s*;\s*\S*\s*=\s*\(\S\+\)$/\1,\2,\3,\4/' file
7,59,0.876,0.000433344,0.00003
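
The same strict, anchored matching can be sketched in Python. The pattern below loosely mirrors the sed expression and is an assumption about the line layout (a parenthesised pair followed by exactly three "name = value" parts); `LINE_RE` and `parse_strict` are hypothetical names.

```python
import re

# Anchored pattern: "(a,b)" followed by three "; name = value" parts.
LINE_RE = re.compile(
    r"^[^(]*\(([^)]*)\)"           # everything up to and including "(7,59)"
    r"\s*;\s*\S+\s*=\s*(\S+)"      # "; lim = 0.876"
    r"\s*;\s*\S+\s*=\s*(\S+)"      # "; dim = 0.000433344"
    r"\s*;\s*\S+\s*=\s*(\S+)\s*$"  # "; r_d = 0.00003"
)

def parse_strict(line):
    """Return the comma-separated values, or None if the line does not fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return ",".join(m.groups())

sample = "new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003"
print(parse_strict(sample))          # 7,59,0.876,0.000433344,0.00003
print(parse_strict("unrelated 42"))  # None
```

Unlike the character-filtering approaches, a line that does not fit the layout is rejected instead of being silently turned into a shorter row.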

3 Comments

Note your final version (quite similar to mine, by the way) does not need the /g in sed, because it is executed just once.
@fedorqui Yep, you're right, but your grep command will also match incorrect numbers like 99...9, so I posted this. I'll remove it if you insist.
No, no need to remove the full approach... it was just that the last part (piping to sed to format the output) was quite similar. Regarding my solution, yes, it is true that it doesn't handle 99..., etc. correctly, but it should be fine for the sample numbers described by the OP.
1

Using sed:

sed 's/[^0-9,.][^0-9,.]*/ /g' input

For better formatting:

sed 's/[^0-9,.][^0-9,.]*/ /g' input | column -to,

Gives:

7,59,0.876,0.000433344,0.00003

4 Comments

Post the output of your commands. I get 7,59 0 876 0 000433344 0 00003. So every . is gone and only one , is left.
@Jotne, ah missed the dots, fixed now
But it's still not: 7,59,0.876,0.000433344,0.00003. This sed 's/[^0-9,.][^0-9,.]*/ /g;s/ /,/g' will help some, but gives an extra , at the beginning. Sorry to be picky :)
@Jotne, thanks again, modified the column command to add the commas.
0

You can grep for numbers:

$ grep -o '[0-9.]*' file
7
59
0.876
0.000433344
0.00003

The -o flag tells grep to print only the matched parts of each line. This way, you get all your values without the surrounding text.

If you want it comma-separated, pipe to tr to replace every newline with a comma, and finally to sed to replace the last comma with a newline:

$ grep -o '[0-9.]*' file | tr -s '\n' ',' | sed 's/,$/\n/'
7,59,0.876,0.000433344,0.00003
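
Note that tr replaces every newline, so a file with several lines comes out as one long row. A per-line variant of the same "extract the matches" idea can be sketched in Python with `re.findall`; the function name `numbers_per_line` and the two-line sample are made up for illustration.

```python
import re

def numbers_per_line(text):
    """Return one comma-joined row per input line that contains numbers."""
    rows = []
    for line in text.splitlines():
        # Same idea as grep -o: keep only the runs of digits and periods.
        found = re.findall(r"[0-9.]+", line)
        if found:
            rows.append(",".join(found))
    return rows

data = (
    "new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003\n"
    "new file (8,60) ; lim = 0.5 ; dim = 0.25 ; r_d = 0.125\n"
)
for row in numbers_per_line(data):
    print(row)
```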

7 Comments

What if there are multiple lines in the file?
@pomeh it will still work. Test it with dummy data :)
This does not work for me with multiple lines, I got: 7,59,0.876,0.000433344,0.00003,7,59,0.876,0.000433344,0.00003,[...]
@pomeh it is not quite clear what you mean. If you refer to the trailing comma, it is handled by the sed at the end.
Here's what I get: sebsauvage.net/paste/…
0

Also, GNU awk with FPAT:

awk -v FPAT="[0-9.]+" '{for(i=1;i<=NF;i++)printf "%s%s", $i,(i!=NF?",":"\n")}'

test:

$ echo "new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003"|awk -v FPAT="[0-9.]+" '{for(i=1;i<=NF;i++)printf "%s%s", $i,(i!=NF?",":"\n")}'      
7,59,0.876,0.000433344,0.00003

The FPAT could be made better.
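
One way the field pattern could be tightened, sketched here in Python rather than awk: require digits, optionally followed by a single period and more digits, so that runs such as 99...9 are not accepted as one field. This is an assumption about what "better" means; it still ignores signs and exponents, and the names `NUMBER` and `fields` are hypothetical.

```python
import re

# Digits, optionally followed by one period and more digits.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def fields(line):
    """Return the number-like fields of a line, defined by content."""
    return NUMBER.findall(line)

sample = "new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003"
print(",".join(fields(sample)))  # 7,59,0.876,0.000433344,0.00003
print(fields("a 99...9 b"))      # ['99', '9']
```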

1 Comment

It may be worth mentioning that you need GNU awk 4.0 or newer to use FPAT.
0

Many solutions, only perl missing ;)

perl -nlE '$,=",";say m/[\d.]+/g'
  • set the output field separator ($,) to ,
  • match only numbers (returns a list)
  • print the list

or (of course) @neronlevelu's solution

perl -plE 's/[^\d,;.]//g;y/;/,/'
  • remove anything that isn't a digit, comma, semicolon, or period
  • change ; to , (y transliterates every occurrence of a character in the search list to the corresponding character in the replacement list, like tr)

0

Using gnu awk:

cat file

new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003
new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003
new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003
new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003
new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003
new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003
new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003

awk -F ' *[=()] *' -v RS=' ; |\n' -v OFS= -v ORS= 'NF{print $2, (NR%4==0)? "\n":","}' file
7,59,0.876,0.000433344,0.00003
7,59,0.876,0.000433344,0.00003
7,59,0.876,0.000433344,0.00003
7,59,0.876,0.000433344,0.00003
7,59,0.876,0.000433344,0.00003
7,59,0.876,0.000433344,0.00003
7,59,0.876,0.000433344,0.00003
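
The record-splitting idea (cut on the semicolons, then take what follows each =) can be sketched in Python as well. It assumes every line has a parenthesised pair followed by "name = value" parts; `parse_record` is a hypothetical name, and a line without parentheses would raise ValueError.

```python
def parse_record(line):
    """Split on ';', then pull the values out of each part."""
    head, *pairs = [part.strip() for part in line.split(";")]
    # head looks like "new file (7,59)": keep what is between the parentheses.
    inside = head[head.index("(") + 1 : head.rindex(")")]
    values = inside.split(",")
    for pair in pairs:
        # "lim = 0.876" -> "0.876"
        _, _, value = pair.partition("=")
        values.append(value.strip())
    return values

sample = "new file (7,59) ; lim = 0.876 ; dim = 0.000433344 ; r_d = 0.00003"
print(",".join(parse_record(sample)))  # 7,59,0.876,0.000433344,0.00003
```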

5 Comments

Does not work for me: the output has a trailing comma on a new line. Also, this does not work with multiple lines.
@pomeh It does work with multiple lines also. Also what is your awk version? Are you using gnu awk?
Here is the output I get: sebsauvage.net/paste/… My awk version is 3.1.7, running on CentOS 6.5.
The correct link is: sebsauvage.net/paste/… (no expires timeout set)
@pomeh: Thanks for providing input data. Based on that I have edited my awk command. Check my answer now.
0
$ sed -r 's/[^0-9.]+/,/g;s/^,//' file
7,59,0.876,0.000433344,0.00003

$ awk -F'[^0-9.]+' -v OFS=',' '{$1=$1;sub(/^,/,"")} 1' file
7,59,0.876,0.000433344,0.00003

$ sed -r 's/[^0-9.,;]+//g;s/;/,/g' file
7,59,0.876,0.000433344,0.00003

$ awk -F';' -v OFS=',' '{$1=$1;gsub(/[^0-9.,]/,"")} 1' file
7,59,0.876,0.000433344,0.00003

Personally I prefer the last two, as they don't add a comma and then remove it again, which always feels kind of kludgy and error-prone.
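
Once any of these commands has produced the comma-separated lines, reading them into Python could look roughly like this. It is a sketch that assumes five columns per row, with the first two being integers and the rest floats; `load_rows` is a hypothetical helper, and `io.StringIO` stands in for the real output file.

```python
import csv
import io

# Stand-in for the file produced by one of the commands above.
csv_text = (
    "7,59,0.876,0.000433344,0.00003\n"
    "7,59,0.876,0.000433344,0.00003\n"
)

def load_rows(handle):
    """Read five-column rows and convert them to numbers."""
    rows = []
    for a, b, lim, dim, r_d in csv.reader(handle):
        # Assumption: the pair in parentheses holds integers.
        rows.append((int(a), int(b), float(lim), float(dim), float(r_d)))
    return rows

rows = load_rows(io.StringIO(csv_text))
print(rows[0])
```

With a real file, the `io.StringIO(...)` argument would be replaced by a handle from `open(...)`, ideally opened with `newline=""` as the csv module documentation recommends.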
