
I have numerous CSV files that will form the basis of a MySQL database. My problem is as follows:

The input CSV files are of the format:

TIME | VALUE PARAM 1 | VALUE PARAM 2 | VALUE PARAM 3 | ETC.
0.00001 | 10 | 20 | 30 | etc.

This is not the structure I want to use in the database. There I would like one big table for all of the data, structured something like:

TIME | PARAMETER | VALUE | Unit of Measure | Version

This means that I would like to insert the combination of TIME and VALUE PARAM 1 from the CSV into the table, then the combination of TIME and VALUE PARAM 2, and so on, and so on.

I haven't done anything like this before, but could a possible solution be to set up a Bash script that loops through the columns and on each iteration inserts the combination of time + value into my database?

I have a reasonable understanding of MySQL, but very limited knowledge of Bash scripting. I also couldn't find a way to do this with the MySQL LOAD DATA INFILE command alone.

If you need more info to help me out, I'm happy to provide it!

Regards,

Erik

2 Answers


I do this all day, every day, and as a rule have the most success with the least headaches by using LOAD DATA INFILE into a temporary table, then leveraging the power of MySQL to get the data into the final table/format. Details at this answer.

To illustrate this further: we process log files for every video event of 80K high schools/colleges around the country (that's every pause/play/seek/stop/start for hundreds of thousands of videos).

They're served from a number of different servers, depending on the type of video (WMV, FLV, MP4, etc.), so there's some 200GB to handle every night, with each format having a different log layout. The old way we did it with CSV/PHP took literally days to finish; changing it to LOAD DATA INFILE into temporary tables, unifying those into a second, standardized temporary table, and then using SQL to group and otherwise slice and dice the data cut the execution time to a few hours.
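To sketch the pattern for your case (table and column names below are illustrative, based on the layout in your question, not anything real):

-- 1. Stage the raw CSV as-is in a table that mirrors its wide layout.
--    (Names and types here are assumptions; adjust to your data.)
CREATE TEMPORARY TABLE staging (
    time   DECIMAL(12,6),
    param1 DECIMAL(12,4),
    param2 DECIMAL(12,4),
    param3 DECIMAL(12,4)
);

LOAD DATA INFILE '/path/to/input.csv'
INTO TABLE staging
FIELDS TERMINATED BY '|'
IGNORE 1 LINES;

-- 2. Unpivot into the narrow final table, one SELECT per parameter column.
INSERT INTO final_table (time, parameter, value, unit, version)
SELECT time, 'PARAM 1', param1, '[unit]', '[version]' FROM staging
UNION ALL
SELECT time, 'PARAM 2', param2, '[unit]', '[version]' FROM staging
UNION ALL
SELECT time, 'PARAM 3', param3, '[unit]', '[version]' FROM staging;

Once the data is staged, you can inspect, trim, and reshape it with plain SQL instead of fighting the file format.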


2 Comments

Thanks Greg, I will look into that. I guess I would still need to loop through the columns, but now in the temp table. That should be possible with a procedure, right?
Yes...the net effect of this is that you've got your raw data in a far more manageable format that you can inspect and manipulate with much more ease and accuracy. Looping/saving/etc. will depend ultimately on what you want for your finished product. Hope this helps and that you'll "Accept" my answer.

It would probably be easiest to preprocess your CSV with an awk script first, and then (as Greg P said) use LOAD DATA LOCAL INFILE. If I understand your requirements correctly, this awk script should work:

#!/usr/bin/awk -f
BEGIN { FS = "|" }

# Header line: remember the label of each value column.
NR == 1 {
    for(col = 2; col <= NF; col++) label[col] = $col
    printf("TIME | PARAM | VALUE | UNIT | VERSION\n")
    next
}

# Data lines: emit one TIME/PARAM/VALUE row per value column.
{
    for(col = 2; col <= NF; col++) {
        printf("%s | %s | %s | [unit] | [version]\n", $1, label[col], $col)
    }
}

Output:

$ ./test.awk test.in
TIME | PARAM | VALUE | UNIT | VERSION
0.00001  |  VALUE PARAM 1  |  10  | [unit] | [version]
0.00001  |  VALUE PARAM 2  |  20  | [unit] | [version]
0.00001  |  VALUE PARAM 3  |  30  | [unit] | [version]
0.00001  |  ETC.   |  etc. | [unit] | [version]

Then

mysql> LOAD DATA LOCAL INFILE 'processed.csv'
    -> INTO TABLE `table`
    -> FIELDS TERMINATED BY '|'
    -> IGNORE 1 LINES;

(Note: I haven't tested the MySQL)
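For reference, the target table might look something like this (column names and types are my guesses from your question, so tune them to your data):

-- Hypothetical target table matching TIME | PARAM | VALUE | UNIT | VERSION.
CREATE TABLE `table` (
    time      DECIMAL(12,6) NOT NULL,
    parameter VARCHAR(64)   NOT NULL,
    value     DECIMAL(12,4),
    unit      VARCHAR(16),
    version   VARCHAR(16)
);

Also note that the awk output keeps the spaces around each '|', so you may want to TRIM() the values, e.g. by loading into user variables and adding a SET clause to the LOAD DATA statement.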

1 Comment

Thanks Kevin! This should work as well. I guess (as Greg said) you have a bit more power over the raw data from a temp table. But both your approaches are what I was looking for.
