0

I have a csv file that is an output from a BLOB store. The csv contains 6 related tables. Not all records utilise the 6 tables but all records do use table 1. I would like to import table 1 into postgres. Data is described as following

Files are ASCII text files comprising variable length fields delimited by an asterisk. The files have an extension of “csv”. Records are separated by a Carriage Return/Line Feed. No data items should contain an asterisk.

Further information is given in the technical arrangement.

Technical Arrangement

The technical arrangement of our summary data is as follows: Fields are in the exact order in which they are listed in this file specification. Records are broken down into Types. Each Type represents a different part of the record. Each record will begin with Type ‘01’ data For each record type ‘01’, there is one or more record type ‘02’s containing Survey Line Item data. There may be zero or more of record types ’03’ and '06'. There may be zero or one of record types '04' and '05'. If a record type '06' exists, there will be one record type '07' The end of a record is only indicated by the next row of Type ‘01’ data or the end of the file. You should use this information to read the file into formal data structures.

I'm new to databases and want to know how to tackle this, i understand that postgres has python and java connectors which in turn have ways to read blob data. Is that the best approach?

EDIT Sample data, one entry comprising 2 record types then 1 containing all 7 record types ;

01*15707127000*8227599000*0335*The Occupier*3****MARKET STREET**BRACKNELL*BERKS*RG12 1JG*290405*Shop And Premises*60.71*14872*14872*14750*2017*Bracknell Forest*00249200003001*20994339144*01-APR-2017**249*NIA*330.00
02*1*Ground*Retail Zone A*29.42*330.00*9709
02*2*Ground*Retail Zone B*31.29*165.00*5163
01*15707136000*492865165*0335**7-8****CHARLES SQUARE**BRACKNELL*BERKS*RG12 1DF*290405*Shop And Premises*325.10*34451*32921*32750*2017*Bracknell Forest*00215600007806*21012750144*01-APR-2017**249*NIA*260.00
02*1*Ground*Retail Zone A*68.00*260.00*17680
02*2*Ground*Remaining Retail Zone*83.50*32.50*2714
02*3*Ground*Office*7.30*26.00*190
02*4*First*Locker Room (Female)*3.20*13.00*42
02*5*First*Locker Room (Male)*5.80*13.00*75
02*6*First*Mess/Staff Room*11.50*13.00*150
02*7*Ground*Internal Storage*7.80*26.00*203
02*8*Ground*Retail Zone B*68.10*130.00*8853
02*9*Ground*Retail Zone C*69.90*65.00*4544
03*Air Conditioning System*289.5*7.00*+2027
06*Divided or split unit*-5.00%
06*Double unit*-5.00%
07*36478*-3557`
1
  • Post sample data Commented Apr 18, 2017 at 17:40

1 Answer 1

2

Copy the text file to an auxiliary table with a single text column:

drop table if exists text_buffer;
create table text_buffer(text_row text);
copy text_buffer from '/data/my_file.csv';

Transform the text column to text array skipping rows you do not need. You'll be able to select any element as a new column with a given name and type, e.g.:

select 
    cols[2]::bigint as bigint1,
    cols[3]::bigint as bigint2,
    cols[4]::text as text1,
    cols[5]::text as text2
    -- specify name and type of any column you need 
from text_buffer,
lateral string_to_array(text_row, '*') cols -- transform text column to text array
where left(text_row, 2) = '01';             -- get only rows for table1

   bigint1   |  bigint2   | text1 |    text2     
-------------+------------+-------+--------------
 15707127000 | 8227599000 | 0335  | The Occupier
 15707136000 |  492865165 | 0335  | 
(2 rows)
Sign up to request clarification or add additional context in comments.

9 Comments

I get DROP TABLE CREATE TABLE ERROR: extra data after last expected column
Presumably i can repeat the second step if i want to keep the remaining records as well.
The error is caused by the tab character. Change the delimiter to the character which is certainly not in the file, e.g. copy text_buffer from '/data/my_file.csv' delimiter '|';. Yes, you can repeat step 2 for other tables.
Thank you, that works. I've realised there's nothing in the subsequent records to tie them back to the lead record. Can i do something like this; where cols[2-30]::...apply to record 1 cols [30-35] apply to record 2 'from text_buffer,lateral string_to_array(text_row, '*') cols,where left(text_row, 2) = '01' then where left(text_row, 2) = '02' etc. ?
Hmm, apparently not.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.