I'm trying to filter a massive CSV export containing phone records. The file contains 97 columns, and the number of rows depends on how many calls have been made. Currently there are 838,239 rows. Excel has trouble loading this much data, so I've turned to Linux.

Out of the 97 columns, I'm only interested in columns 13, 28 and 53. I have managed to extract them using:

cut -d, -f 1-12,14-27,29-52,54-97 --complement cdr_export.csv >> filtered_CDR.CSV
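
For reference, the same three columns can probably be selected by naming them directly, which avoids the long --complement list. The sketch below is only an illustration: it builds a hypothetical one-row, 97-field sample (fields named c1 to c97) in place of cdr_export.csv, and it assumes no field contains a quoted comma, since cut does not understand CSV quoting.

```shell
# sample is a hypothetical one-row stand-in for cdr_export.csv: fields c1..c97.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 97; i++) printf "%s%s", "c" i, (i < 97 ? "," : "\n") }' > "$sample"

# Select fields 13, 28 and 53 directly; no --complement list is needed.
picked=$(cut -d, -f 13,28,53 "$sample")
printf '%s\n' "$picked"

rm -f "$sample"
```

Note that >> appends to filtered_CDR.CSV, so running the extraction twice adds the rows a second time; > overwrites the file instead.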

I'm stuck on how to filter the rows.

Telephone       ic_hg       og_hg
111111111111    server03    slo.out
222222222222    HG_1        server02
333333333333    HG_1        server03
444444444444    Trunk       server02
555555555555    Trunk       server03
666666666666    server614   slo.out
777777777777    HG_1        server563
888888888888    server563   slo.out
999999999999    HG_2        server563

The only data I need is:

  • Any telephone number
  • ic_hg = HG_1 or HG_2, and og_hg = slo.out

Example:

222222222222 HG_1 slo.out

222222222222 HG_2 slo.out

Any other combinations can be removed.

  • Your desired output is NOT csv, btw, but apparently fixed-width. Commented Sep 13, 2022 at 18:04

2 Answers

You can easily do that with awk:

awk '($2 == "HG_1" || $2 == "HG_2") || $3 == "slo.out" { print $1 }' filtered_CDR.CSV
111111111111
222222222222
333333333333
666666666666
777777777777
888888888888
999999999999

5 Comments

This doesn't quite work as expected. If the data is saved into a text file, it filters the data without issue; if it's in an Excel CSV with the data in separate columns, it returns no results. I also didn't explain myself very well: og_hg must always equal slo.out.
You can pipe your previous command into awk to avoid saving into a text file: cut … | awk '(…) { print $1 }'. Actually, you can avoid using cut altogether and have awk process just the columns you need: replace $1 with $13 to reference the 13th column instead of the 1st, and so on. If og_hg must always be equal to slo.out, you just need to turn the or (||) preceding that condition into an and (&&).
Hey etuardu, sorry, I'm really stuck on turning the || into an AND condition. awk '($2 == "HG_1" || $2 == "HG_2") && $3 == "slo.out" { print $1 }' filtered_CDR.CSV produced no results.
Your command seems to be correct. Please double check if such condition is satisfied for any line in your input data
Running awk '{ print $1 }' showed me that my data was separated by commas. Starting the command with awk -F',' makes this work perfectly. Sorry, my fault, and thanks for the help.
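
Putting the comments above together, the working command would look roughly like the sketch below. The sample file is hypothetical: it mirrors the table in the question but is comma-separated and includes rows that actually satisfy the condition, and it assumes no field contains a quoted comma.

```shell
# Hypothetical 3-column sample, comma-separated as the real extract turned out to be.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Telephone,ic_hg,og_hg
111111111111,server03,slo.out
222222222222,HG_1,slo.out
333333333333,HG_1,server03
999999999999,HG_2,slo.out
EOF

# -F',' sets the field separator; && requires both conditions to hold.
matches=$(awk -F',' '($2 == "HG_1" || $2 == "HG_2") && $3 == "slo.out" { print $1 }' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Without -F',' awk splits on whitespace, so each whole line lands in $1 and the comparisons on $2 and $3 never match.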

Just pipe it through a grep.

cut -d, -f 1-12,14-27,29-52,54-97 --complement cdr_export.csv |
  grep -E ',HG_[12],slo[.]out$' >> filtered_CDR.CSV

or maybe an awk

awk -F, '$53 ~ /^slo[.]out$/ && $28 ~ /^HG_[12]$/ {
     print $13 "\t" $28 "\t" $53 }' cdr_export.csv >> filtered_CDR.CSV

I used tabs in the awk output. YMMV.
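
If the result should stay comma-separated and keep its header row, a variant along these lines might work. This is a sketch under assumptions: make_row is a hypothetical helper that only builds a 97-column test file, the real input would be cdr_export.csv, and fields are assumed not to contain quoted commas, which plain -F, splitting cannot handle.

```shell
# make_row is a hypothetical helper: it emits one 97-field CSV row with its three
# arguments in columns 13, 28 and 53 and "x" in every other column.
make_row() {
    awk -v tel="$1" -v ic="$2" -v og="$3" 'BEGIN {
        for (i = 1; i <= 97; i++) {
            v = "x"
            if (i == 13) v = tel
            if (i == 28) v = ic
            if (i == 53) v = og
            printf "%s%s", v, (i < 97 ? "," : "\n")
        }
    }'
}

sample=$(mktemp)
{
    make_row Telephone ic_hg og_hg
    make_row 222222222222 HG_1 slo.out
    make_row 333333333333 HG_1 server03
    make_row 999999999999 HG_2 slo.out
} > "$sample"

# OFS=, makes print join the selected fields with commas instead of tabs.
filtered=$(awk -F, -v OFS=, '
    NR == 1 { print $13, $28, $53; next }    # keep the header row
    $28 ~ /^HG_[12]$/ && $53 == "slo.out" { print $13, $28, $53 }
' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

Reading the full export in a single awk pass also removes the need for the intermediate cut step.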
