I have the following text line:

"Field1":"Data1","Field2":"Data2","Field3":"Data3","Field4":"Data4" ...

And I need to generate the following INSERT statement:

INSERT INTO data (Field1,Field2,Field3,Field4 ... ) VALUES(Data1,Data2,Data3,Data4 ... );

Any ideas on how to do it in Bash?

Thanks in advance!

  • The proper Right Answer is not to use bash for this at all. See item 7 in mywiki.wooledge.org/BashWeaknesses; the Right Thing is to use a language with database bindings that natively support bound parameters. Anything else opens you up to xkcd.com/327 bugs (by making your code's correctness dependent on the individual database's character set conversion, quoting, and similar logic). Commented Jan 2, 2014 at 15:27
  • If you want automatic type recognition from MySQL when you use a constructed INSERT statement, you have to convert inputs into type forms that MySQL understands (see item 8 in the list by @CharlesDuffy). For a "generic" CSV-to-Postgres script, I ended up with flags to denote column types as boolean, dates, strings, and numerics. For example, you might want to treat "0" as a boolean sometimes (outputting false) or leave it as an integer, depending on the table column's type. That's not something a script can "magically" know from the raw inputs. Commented Jan 2, 2014 at 15:42
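The type-flag idea in the comment above can be sketched in Bash as a small formatter that takes the column type from the caller instead of guessing it. The helper name sql_value and the flags num, bool and str are hypothetical (dates are left out), and string quoting follows the standard SQL rule of doubling embedded single quotes. This does not replace the bound parameters recommended in the first comment: in its default SQL mode MySQL also treats backslashes as escape characters, so string values containing backslashes are not handled safely here.

```bash
#!/usr/bin/env bash
# sql_value TYPE RAW (hypothetical helper): print RAW as a SQL literal for the
# given column TYPE. The caller supplies TYPE because a script cannot infer it
# from the raw text (is "0" a number or a boolean?).
sql_value() {
    local type=$1 raw=$2 q="'"
    local num_re='^[+-]?[0-9]+(\.[0-9]+)?$'   # optional sign, digits, optional fraction
    case $type in
        num)
            if [[ $raw =~ $num_re ]]; then
                printf '%s' "$raw"
            else
                printf 'sql_value: not numeric: %s\n' "$raw" >&2
                return 1
            fi
            ;;
        bool)
            case $raw in
                1|t|true|TRUE)   printf 'TRUE' ;;
                0|f|false|FALSE) printf 'FALSE' ;;
                *) printf 'sql_value: not boolean: %s\n' "$raw" >&2; return 1 ;;
            esac
            ;;
        str)
            # standard SQL string literal: wrap in single quotes, double any inside
            printf '%s' "$q${raw//$q/$q$q}$q"
            ;;
        *)
            printf 'sql_value: unknown type: %s\n' "$type" >&2
            return 1
            ;;
    esac
}

# Example: the raw "0" becomes FALSE because that column is declared bool.
printf 'INSERT INTO data (name,age,active) VALUES(%s,%s,%s);\n' \
    "$(sql_value str "O'Brien")" "$(sql_value num 42)" "$(sql_value bool 0)"
# prints: INSERT INTO data (name,age,active) VALUES('O''Brien',42,FALSE);
```

The column names name, age and active in the example are made up for illustration.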

4 Answers

$ cat file
"Field1":"Data1","Field2":"Data2","Field3":"Data3","Field4":"Data4"
$
$ cat tst.awk
BEGIN { FS="^\"|\"[:,]\"|\"$" }
{
    fields = values = ""
    for (i=2; i<NF; i+=2) {
        fields = fields (i>2 ? "," : "") $i
        values = values (i>2 ? "," : "") $(i+1)
    }
    printf "INSERT INTO data (%s) VALUES(%s);\n", fields, values
}
$
$ awk -f tst.awk file
INSERT INTO data (Field1,Field2,Field3,Field4) VALUES(Data1,Data2,Data3,Data4);
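The question's target output leaves the values unquoted, which is only valid SQL for numbers. If the values are strings, a variant of tst.awk along these lines could emit them as single-quoted literals. It is a sketch that assumes every value is a string and that no field name or value itself contains a double quote; the function name to_insert_quoted is hypothetical, and instead of the anchored FS it strips the outer quotes and calls split(). Doubling single quotes is the standard SQL rule, but MySQL in its default mode also treats backslashes as escapes, so this is still no substitute for bound parameters.

```bash
#!/usr/bin/env bash
# to_insert_quoted (hypothetical name): reads lines like "Field1":"Data1",...
# on stdin and prints one INSERT per line, with every value written as a
# single-quoted SQL string literal (embedded single quotes are doubled).
to_insert_quoted() {
    awk -v q="'" '
    {
        line = $0
        sub(/^"/, "", line)                 # drop the leading quote
        sub(/"$/, "", line)                 # drop the trailing quote
        n = split(line, part, /"[:,]"/)     # Field1, Data1, Field2, Data2, ...
        fields = values = ""
        for (i = 1; i < n; i += 2) {
            v = part[i+1]
            gsub(q, q q, v)                 # double embedded single quotes
            sep = (i > 1 ? "," : "")
            fields = fields sep part[i]
            values = values sep q v q
        }
        printf "INSERT INTO data (%s) VALUES(%s);\n", fields, values
    }'
}

printf '%s\n' '"Field1":"Data1","Field2":"Data2","Field3":"Data3","Field4":"Data4"' | to_insert_quoted
# prints: INSERT INTO data (Field1,Field2,Field3,Field4) VALUES('Data1','Data2','Data3','Data4');
```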

You could try this awk command:

$ cat file
"Field1":"Data1","Field2":"Data2","Field3":"Data3","Field4":"Data4"
$ awk -F'[:"]+' '{s=(NR>1?",":""); fields=fields s $2;data=data s $3}END{printf "INSERT INTO data(%s) VALUES(%s);\n", fields,data}' RS="," file
INSERT INTO data(Field1,Field2,Field3,Field4) VALUES(Data1,Data2,Data3,Data4);

Or, a bit more readable:

#!/usr/bin/awk -f
BEGIN {
    FS ="[:\"]+";
    RS=",";
}
{
    s=(NR>1?",":"")
    fields=fields s $2
    data=data s $3
}
END{
    printf "INSERT INTO data(%s) VALUES(%s);\n", fields,data
}

Save it in a file named script.awk, make it executable with chmod +x script.awk, and run it like this:

./script.awk file

Since you specifically asked for a BASH solution (rather than awk, perl, or python):

data='"Field1":"Data1","Field2":"Data2","Field3":"Data3","Field4":"Data4"'

data=${data//,/$'\n'}     # replace comma with new-lines
data=${data//\"/}         # remove the quotes
fields='' items=''        # start from empty strings, not inherited values
while IFS=':' read -r field item
do
    if [[ -n $fields ]]
    then
        fields="$fields,$field"
        items="$items,$item"
    else
        fields=$field
        items=$item
    fi
done < <(echo "$data")

stmt="INSERT INTO data ($fields) VALUES($items);"
echo "$stmt"
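The same pure-Bash approach can also be written with arrays, which removes the first-iteration special case: read -a splits the line on commas, ${pair%%:*} and ${pair#*:} take the text before and after the first colon, and "${array[*]}" joins the elements with the first character of IFS. A sketch, wrapped in a hypothetical function line_to_insert, with the same assumption as above that no field or value contains a comma or a double quote:

```bash
#!/usr/bin/env bash
# line_to_insert (hypothetical helper): pure-Bash conversion of one line like
# "Field1":"Data1","Field2":"Data2" into an INSERT statement, using arrays.
line_to_insert() {
    local line=${1//\"/}                 # remove all double quotes
    local -a pairs fields=() items=()
    local pair
    IFS=',' read -r -a pairs <<< "$line" # split into Field:Data pairs
    for pair in "${pairs[@]}"; do
        fields+=("${pair%%:*}")          # text before the first colon
        items+=("${pair#*:}")            # text after the first colon
    done
    local IFS=','                        # "${arr[*]}" joins with the first char of IFS
    printf 'INSERT INTO data (%s) VALUES(%s);\n' "${fields[*]}" "${items[*]}"
}

line_to_insert '"Field1":"Data1","Field2":"Data2","Field3":"Data3","Field4":"Data4"'
# prints: INSERT INTO data (Field1,Field2,Field3,Field4) VALUES(Data1,Data2,Data3,Data4);
```

Because only the first colon separates field from value, a value such as "12:30" survives intact.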

Comments

will fail if input contains backslashes due to missing -r arg for read, will produce different outputs on different systems depending on which version of echo is installed, and will have other unexpected results due to non-quoted shell variables. Don't use shell for parsing text files - it's what awk was invented to do and does best.
@EdMorton ...actually, the places where quoting is missing here are all on the right-hand side of assignments or inside [[ ]] (which has special parse-stage behavior). Those aren't string-split or glob-expanded, so quoting isn't needed there anyway. Also, the shell-builtin version of echo shadows the system binary, so echo's behavior in bash is consistent. You're right that read should have -r, but the other criticisms are mistaken.
@CharlesDuffy I've never heard before that echo is guaranteed to be consistent when used in bash scripts. OK, thanks for the info. Does bash's echo reproduce the text as-is in the output? For example, does it print \n when it sees \n in the input, or does it convert it to a newline character?
@EdMorton If you want something guaranteed to be consistent and safe, printf %s "$var" is the best practice. However, echo "$foo" can be considered guaranteed to emit the contents of the variable foo byte-for-byte so long as those contents are not precisely -n, -e or -E.
printf is a built-in in bash, as is echo. It is also a built-in in ksh93 but not ksh88 where the program printf is used. I accept the use of -r on read. Modifying code.
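To see the difference discussed in these comments, a small loop can print each value through the builtin echo and through printf '%s\n'. The sample values are arbitrary, and the notes below assume bash's default options (in particular xpg_echo off), under which echo does not interpret backslash sequences unless given -e.

```bash
#!/usr/bin/env bash
# Print each value via the builtin echo and via printf '%s\n', in brackets.
for var in 'Data1' '-n' 'a\nb'; do
    printf 'echo:   [%s]\n' "$(echo "$var")"
    printf 'printf: [%s]\n' "$(printf '%s\n' "$var")"
done
# For '-n', echo treats the value as an option and prints nothing: [] vs [-n].
# For 'a\nb', with xpg_echo off both print the backslash sequence unchanged.
```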
sed -n 's/$/) VALUES(/
: next
   s/"\([^"]*\)":"\([^"]*\)"\(.*\)) VALUES(\(.*\)/\1\3) VALUES(\4,\2/
   t next
   s/VALUES(,/VALUES(/
   s/.*/INSERT INTO data (&)/
   p
   ' YourFile

This assumes there is no double quote (") inside any data value and no literal ") VALUES(" anywhere in the line (both cases could be handled if needed).
