
I have made a shell script that is supposed to extract data with certain field names and put them in a CSV file.

An example input file may have the following lines:

                  user_name: [email protected]
                      EMAIL: [email protected]
                 FIRST_NAME: jonathan
                  LAST_NAME: doestein
              CREATION_DATE: 2013-08-01 01:08:52
        REGISTRATION_STATUS: Y
                     VENDOR: vendorname

This will repeat itself 'n' times.

This is an excerpt of the script I wrote so far:

#!/bin/sh

echo "Please enter input file name."
read input_variable
echo "You entered: $input_variable"

echo "Please enter a name of the new output file."
read output_file
touch $output_file
echo "The output file name is going to be $output_file"

echo "Extracting files..."  ;

awk '$1 ~ /^(user_name:|EMAIL:|FIRST_NAME:|LAST_NAME:|CREATION_DATE:|REGISTRATION_STATUS:)$/{printf "%s,",$2} $1 ~ /REGISTRATION_STATUS:/{print $2}' $input_variable >> $output_file.ib ;

Data does print to my output file, which must have a .csv extension so that a GUI can open it. However, when I open the file in a GUI such as OpenOffice Calc, many records are concatenated into a single row, while other lines start a new row as they are supposed to.

For example, the one line might look like the following:

[email protected],noreally51,noway,username,username...x40 or so

username,username,username.... That is, it lists about 40-50 usernames all in one row, then finally goes to the next line and prints the information.

I would like to add column names to the output file:

VENDOR,user_name,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS

I can't figure out how to do that.

Thank you for your time and all of your support!

I edited my script as follows:

#!/bin/sh

echo "Please enter input file name."
read input_variable
echo "You entered: $input_variable"

echo "Please enter a name of the new output file."
touch output_file
read $output_file
echo "The output file name is going to be $output_file"

echo "Processing data extraction..." ;

awk -F": " n=25 -v 'NR<=n {h[NR-1]=$1} {a[NR%n-1]=$2} $1~/VENDOR/ && !hp{for(k=0;k<n;k++) printf "%s ", h[k] $input_variable && print "";hp=1} $1~/VENDOR/{for(k=0;k<n;k++) printf "%s ", a[k] && print ""}' data | column -t $input_variable ;

echo "Done."

This at least prints data to the $output_file. However, the data in the $output_file looks like:

??ࡱ?;?? ????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????Root Entry????????????????????????????????????????????????????????????????

@karakfa

This is the contents of the script I have. I noticed that more than the first line of your script in your answer changed. So, I amended my script to the following:

#!/bin/sh

echo "Please enter input file name."
read input_variable
echo "You entered: $input_variable"

echo "Please enter a name of the new output file."
touch output_file
read $output_file
echo "The output file name is going to be ${output_file}"

echo "Processing data extraction..." ;

cat $input_variable | awk -F": " -v OFS="," -v n=25
  'NR<=n{sub(/^ */,"",$1);h[NR-1]=$1}
        {a[(NR-1)%n]=$2}
$1~/VENDOR/ && !hp{line=h[0];
                  for(k=1;k<n;k++) line=line OFS h[k];
                  print line;hp=1
                 }
      $1~/VENDOR/{line=a[0];
                  for(k=1;k<n;k++) line=line OFS a[k];
                  print line}' $input_variable ;
echo "Done."

The output was:

Please enter input file name.
inputfile.txt
You entered: allgmail.com_accounts.txt
Please enter a name of the new output file.
outputfile.csv
The output file name is going to be 
Processing data extraction...
awk: no program given

./scriptname: line 23: NR<=n{sub(/^ */,"",$1);h[NR-1]=$1} 
          {a[(NR-1)%n]=$2} 
  $1~/VENDOR/ && !hp{line=h[0]; 
                    for(k=1;k<n;k++) line=line OFS h[k];
                    print line;hp=1
                   }  
        $1~/VENDOR/{line=a[0];
                    for(k=1;k<n;k++) line=line OFS a[k];
                    print line}: No such file or directory
Done.

I did not find any articles about 'awk: no program given' error. Do you know what I am doing incorrectly?

The error mentions 'line 23', which is the following:

 print line}' $input_variable ;

Then, I noticed that it also says the following on the last line:

print line}: No such file or directory

This occurs with or without 'cat $input_variable |' before awk. Normally, awk works fine on my OS. It is a Mac 10.11.1 (15B42). Is #!/bin/sh incorrect?

I look forward to your thoughts. Thank you!

  • The problem is in the input file (encoding or binary file). Is it a text file? Commented Dec 4, 2015 at 22:09
  • It is a text file, CSV. Commented Dec 5, 2015 at 22:51
  • Your output is weird. First try cat "input", then try the awk command without the >> redirection. Commented Dec 6, 2015 at 8:41

2 Answers


Why don't you use echo before awk?

echo VENDOR,user_name,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS > file
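A minimal sketch of how the header line and the extraction could fit together, assuming every record ends with a VENDOR line as in the question's sample. The file names and the two sample records are made up for illustration. The awk part looks values up by field name rather than using the question's original pattern, so the columns come out in the header's order:

```shell
#!/bin/sh
# Hypothetical file names for this sketch.
input=records_sample.txt
output=records_sample.csv

# Two made-up records in the question's "NAME: value" layout.
cat > "$input" <<'EOF'
                  user_name: jdoe
                      EMAIL: jdoe@example.com
                 FIRST_NAME: jonathan
                  LAST_NAME: doestein
              CREATION_DATE: 2013-08-01 01:08:52
        REGISTRATION_STATUS: Y
                     VENDOR: vendorname
                  user_name: asmith
                      EMAIL: asmith@example.com
                 FIRST_NAME: anna
                  LAST_NAME: smith
              CREATION_DATE: 2014-02-03 10:00:00
        REGISTRATION_STATUS: N
                     VENDOR: othervendor
EOF

# Write the header first; '>' creates the file or truncates an old one.
echo "VENDOR,user_name,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS" > "$output"

# Append one row per record. Splitting on ": " keeps the time of day
# in CREATION_DATE, and the VENDOR line (last in each record) triggers the print.
awk -F': ' -v OFS=',' '
    { sub(/^[ \t]+/, "", $1); v[$1] = $2 }
    $1 == "VENDOR" {
        print v["VENDOR"], v["user_name"], v["FIRST_NAME"], v["LAST_NAME"], v["CREATION_DATE"], v["REGISTRATION_STATUS"]
    }
' "$input" >> "$output"
```

Using '>' for the header and '>>' for the rows means rerunning the script starts from a fresh file instead of appending to old output.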

4 Comments

Thank you. Any advice on the rest of the issue, or do you have any questions about it?
Are all records on the same line?
The records seem unorganized. That one line, and many others, have the same issue, while other lines appear to do what they are supposed to do, which is to have just the username, vendor, creation date, first name, last name, and email address. And some of those shorter lines still contain information that makes no sense.
I can't say. I pasted several records based on your example, and your extraction works perfectly.
2

If all your fields are always present, you can try the following awk script. The number of fields per record is set as a variable (7 in this case), and "VENDOR" is used as the marker for the last field of a record.

UPDATE: I didn't notice the CSV output requirement at first; the output field separator is now a comma.

$ awk -F": " -v OFS="," -v n=7 \
    'NR<=n{sub(/^ */,"",$1);h[NR-1]=$1} 
          {a[(NR-1)%n]=$2} 
 $1~/VENDOR/ && !hp{line=h[0]; 
                    for(k=1;k<n;k++) line=line OFS h[k];
                    print line;hp=1
                   }  
        $1~/VENDOR/{line=a[0];
                    for(k=1;k<n;k++) line=line OFS a[k];
                    print line}' inputfilename


user_name,EMAIL,FIRST_NAME,LAST_NAME,CREATION_DATE,REGISTRATION_STATUS,VENDOR
[email protected],[email protected],jonathan,doestein,2013-08-01 01:08:52,Y,vendorname

The script builds the header from the first n lines, prints it once when the first record is complete, and then prints each record when its final field is seen.

To move the last field to the front, change the code to

line=h[n-1];
for(k=0;k<n-1;k++) line=line OFS h[k];

for both occurrences (change the array name from "h" to "a" in the second instance). The loop now starts at k=0 so that the first field is not skipped.
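For reference, here is a sketch of the whole script with the last field moved to the front, wrapped in a hypothetical shell function (reorder_csv is a name made up for this example) and run on a made-up one-record sample file. It assumes 7 fields per record, as in the sample data:

```shell
#!/bin/sh
# Hypothetical sample file for this sketch (one made-up record).
input=reorder_sample.txt
cat > "$input" <<'EOF'
                  user_name: jdoe
                      EMAIL: jdoe@example.com
                 FIRST_NAME: jonathan
                  LAST_NAME: doestein
              CREATION_DATE: 2013-08-01 01:08:52
        REGISTRATION_STATUS: Y
                     VENDOR: vendorname
EOF

# reorder_csv FILE: print a header and one CSV row per record, VENDOR first.
reorder_csv() {
    awk -F': ' -v OFS=',' -v n=7 '
        NR <= n { sub(/^ */, "", $1); h[NR-1] = $1 }   # field names from the first record
                { a[(NR-1) % n] = $2 }                 # values of the current record
        $1 ~ /VENDOR/ && !hp {
            line = h[n-1]                              # last field goes first
            for (k = 0; k < n-1; k++) line = line OFS h[k]
            print line; hp = 1
        }
        $1 ~ /VENDOR/ {
            line = a[n-1]
            for (k = 0; k < n-1; k++) line = line OFS a[k]
            print line
        }
    ' "$1"
}

reorder_csv "$input"
```

Because the fields are addressed by position, this only works when every record has exactly n lines in the same order.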

4 Comments

I get 'awk: can't open file data'. I added 'cat $input_variable |' in front of 'awk -F'... but got the same error. Also, how do I get this to print to a CSV file or to $output_file? Should my last line be '> $output_file'?
"data" is the name of my file. You have to substitute your own file name, or pipe the data into the awk script (and remove "data").
I found that 'n=7' does not work because I posted only part of the data on Stack Overflow, the part I thought was relevant. I added my edited script to my original question.
Please see the updated script. The output field separator is now defined as a comma. 7 should work for the given data set.
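On the 'awk: no program given' error in the question: the message is consistent with the newline right after '-v n=25' ending the awk command, so awk starts without a program and the shell then tries to run the quoted program text as a command of its own ("No such file or directory"). A trailing backslash keeps the program on the same command. The empty "The output file name is going to be" line fits 'read $output_file', which expands the still-empty variable instead of naming it. A minimal sketch under those assumptions, with extract_csv as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: extract_csv INPUT OUTPUT N
# Writes a header plus one CSV row per N-line record to OUTPUT.
extract_csv() {
    # The trailing backslash keeps the quoted program on the same command as awk.
    awk -F': ' -v OFS=',' -v n="$3" \
        'NR <= n { sub(/^ */, "", $1); h[NR-1] = $1 }
                 { a[(NR-1) % n] = $2 }
         $1 ~ /VENDOR/ && !hp {
             line = h[0]
             for (k = 1; k < n; k++) line = line OFS h[k]
             print line; hp = 1
         }
         $1 ~ /VENDOR/ {
             line = a[0]
             for (k = 1; k < n; k++) line = line OFS a[k]
             print line
         }' "$1" > "$2"
}

# In the question's script the names would come from the prompts, e.g.:
#   read input_variable
#   read output_file          # no "$" in front of the variable name
#   extract_csv "$input_variable" "$output_file" 25
```

Quoting "$1" and "$2" also keeps file names with spaces intact, and '>' sends the result to the chosen output file instead of the terminal.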
