I have ~100 Postgres .dump from different sources. They all have the same schema, just a single table, and a few hundred to a few hundred thousand rows. However, the data was collected at different locations and now needs to all be combined.

So I'd like to merge all the rows from all the databases into one single database, ignoring the ID key. What would be a decent way to do this? I may collect more data in the future from more sources, so it's likely to be a process I need to repeat.

  • In future, try collecting data in a more practical format than a database dump. A dump is designed to reproduce a database exactly. It's not good for merging, etc. I suspect your best bet will be to restore each dump to a staging DB then use ETL tools (CloverETL, Talend Studio, Pentaho Kettle, etc) or custom scripting to merge into the main DB. But I'm not sure. Commented Jan 7, 2015 at 7:03

1 Answer

If needed, use pg_restore to convert the dumps into SQL.

Run the SQL dump through

 sed -n '/^COPY .* FROM stdin;$/,/^\\\.$/p'

Since your data contains only one table, this yields exactly the COPY command needed to load the data. Send that output to your database to load it.
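Since this will be repeated as new dumps arrive, the two steps could be wrapped in small shell functions. This is only a sketch under some assumptions: the function names `extract_copy` and `dump_to_copy` are made up for illustration, files ending in `.sql` are treated as plain SQL and everything else as a pg_restore-readable archive, and `-f -` (write to stdout) assumes PostgreSQL 12 or later. The extracted COPY block still carries the original ID values, so if the ID column has a primary-key or unique constraint, colliding rows will be rejected unless you load into a staging table without that constraint first.

```shell
# Print only the "COPY ... FROM stdin;" block (header line, data rows,
# and the terminating "\." line) from a plain-SQL dump on standard input.
extract_copy() {
    sed -n '/^COPY .* FROM stdin;$/,/^\\\.$/p'
}

# Print the COPY block for one dump file.
# Assumption: *.sql files are already plain SQL; anything else is an
# archive that pg_restore can read. "-f -" sends the SQL to stdout
# (PostgreSQL 12+); "--data-only" skips the schema statements.
dump_to_copy() {
    case "$1" in
        *.sql) extract_copy < "$1" ;;
        *)     pg_restore --data-only -f - "$1" | extract_copy ;;
    esac
}
```

With the table already created in the target database, a loop such as `for f in dumps/*; do dump_to_copy "$f" | psql -d merged; done` would then load every file in turn; the `dumps/` directory and the `merged` database name are placeholders.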
