How to import BLOB formatted as CSV to postgres

Question

I have a csv file that is an output from a BLOB store. The csv contains 6 related tables. Not all records utilise the 6 tables but all records do use table 1. I would like to import table 1 into postgres. Data is described as following

Files are ASCII text files comprising variable length fields delimited by an asterisk. The files have an extension of “csv”. Records are separated by a Carriage Return/Line Feed. No data items should contain an asterisk.

Further information is given in the technical arrangement.

Technical Arrangement

The technical arrangement of our summary data is as follows: Fields are in the exact order in which they are listed in this file specification. Records are broken down into Types. Each Type represents a different part of the record. Each record will begin with Type ‘01’ data For each record type ‘01’, there is one or more record type ‘02’s containing Survey Line Item data. There may be zero or more of record types ’03’ and '06'. There may be zero or one of record types '04' and '05'. If a record type '06' exists, there will be one record type '07' The end of a record is only indicated by the next row of Type ‘01’ data or the end of the file. You should use this information to read the file into formal data structures.

I'm new to databases and want to know how to tackle this, i understand that postgres has python and java connectors which in turn have ways to read blob data. Is that the best approach?

EDIT Sample data, one entry comprising 2 record types then 1 containing all 7 record types ;

01*15707127000*8227599000*0335*The Occupier*3****MARKET STREET**BRACKNELL*BERKS*RG12 1JG*290405*Shop And Premises*60.71*14872*14872*14750*2017*Bracknell Forest*00249200003001*20994339144*01-APR-2017**249*NIA*330.00
02*1*Ground*Retail Zone A*29.42*330.00*9709
02*2*Ground*Retail Zone B*31.29*165.00*5163
01*15707136000*492865165*0335**7-8****CHARLES SQUARE**BRACKNELL*BERKS*RG12 1DF*290405*Shop And Premises*325.10*34451*32921*32750*2017*Bracknell Forest*00215600007806*21012750144*01-APR-2017**249*NIA*260.00
02*1*Ground*Retail Zone A*68.00*260.00*17680
02*2*Ground*Remaining Retail Zone*83.50*32.50*2714
02*3*Ground*Office*7.30*26.00*190
02*4*First*Locker Room (Female)*3.20*13.00*42
02*5*First*Locker Room (Male)*5.80*13.00*75
02*6*First*Mess/Staff Room*11.50*13.00*150
02*7*Ground*Internal Storage*7.80*26.00*203
02*8*Ground*Retail Zone B*68.10*130.00*8853
02*9*Ground*Retail Zone C*69.90*65.00*4544
03*Air Conditioning System*289.5*7.00*+2027
06*Divided or split unit*-5.00%
06*Double unit*-5.00%
07*36478*-3557`

Post sample data

Clodoaldo Neto
– Clodoaldo Neto

2017-04-18 17:40:34 +00:00
Commented Apr 18, 2017 at 17:40 — Clodoaldo Neto
– Clodoaldo Neto, Commented Apr 18, 2017 at 17:40

klin · Accepted Answer · 2017-04-18 20:27:24Z

2

Copy the text file to an auxiliary table with a single text column:

drop table if exists text_buffer;
create table text_buffer(text_row text);
copy text_buffer from '/data/my_file.csv';

Transform the text column to text array skipping rows you do not need. You'll be able to select any element as a new column with a given name and type, e.g.:

select 
    cols[2]::bigint as bigint1,
    cols[3]::bigint as bigint2,
    cols[4]::text as text1,
    cols[5]::text as text2
    -- specify name and type of any column you need 
from text_buffer,
lateral string_to_array(text_row, '*') cols -- transform text column to text array
where left(text_row, 2) = '01';             -- get only rows for table1

   bigint1   |  bigint2   | text1 |    text2     
-------------+------------+-------+--------------
 15707127000 | 8227599000 | 0335  | The Occupier
 15707136000 |  492865165 | 0335  | 
(2 rows)

answered Apr 18, 2017 at 20:27

klin

123k15 gold badges240 silver badges262 bronze badges

Sign up to request clarification or add additional context in comments.

9 Comments

mapping dom Over a year ago

I get DROP TABLE CREATE TABLE ERROR: extra data after last expected column

mapping dom Over a year ago

Presumably i can repeat the second step if i want to keep the remaining records as well.

klin Over a year ago

The error is caused by the tab character. Change the delimiter to the character which is certainly not in the file, e.g. copy text_buffer from '/data/my_file.csv' delimiter '|';. Yes, you can repeat step 2 for other tables.

mapping dom Over a year ago

Thank you, that works. I've realised there's nothing in the subsequent records to tie them back to the lead record. Can i do something like this; where cols[2-30]::...apply to record 1 cols [30-35] apply to record 2 'from text_buffer,lateral string_to_array(text_row, '*') cols,where left(text_row, 2) = '01' then where left(text_row, 2) = '02' etc. ?

mapping dom Over a year ago

Hmm, apparently not.

|

Collectives™ on Stack Overflow

How to import BLOB formatted as CSV to postgres

1 Answer 1

9 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

9 Comments

Your Answer

Sign up or log in

Post as a guest

Related