
I have a large dataset of events in a Postgres database that is too large to analyze in memory. Therefore I would like to quantize the datetimes to a regular interval and perform group by operations within the database prior to returning results. I thought I would use SqlSoup to iterate through the records in the appropriate table and make the necessary transformations. Unfortunately I can't figure out how to perform the iteration in such a way that I'm not loading references to every record into memory at once. Is there some way of getting one record reference at a time in order to access the data and update each record as needed?
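For concreteness, here is a rough sketch of the two things I'm trying to do, written against plain SQLAlchemy (which SqlSoup wraps) since I know how to express it there. The table, columns, and connection string are all made up, and I haven't verified the streaming part — it's my best reading of the docs:

```python
# A rough sketch, untested -- the table name (events), column name
# (occurred_at), and connection string are placeholders for my real schema.
from sqlalchemy import MetaData, Table, create_engine, func, select

engine = create_engine("postgresql://user:pass@localhost/mydb")
events = Table("events", MetaData(), autoload_with=engine)

# Plan A: quantize and aggregate inside Postgres, so only one row per
# time bucket ever crosses the wire. date_trunc() snaps each timestamp
# to the start of its hour.
bucket = func.date_trunc("hour", events.c.occurred_at)
stmt = select(bucket.label("bucket"), func.count().label("n")).group_by(bucket)

with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row.bucket, row.n)

# Plan B: if each row really has to be visited in Python, stream the
# result with a server-side cursor (stream_results) so psycopg2 doesn't
# buffer the entire table client-side. This is the part I'm unsure about.
with engine.connect().execution_options(stream_results=True) as conn:
    for row in conn.execute(select(events)):
        ...  # inspect/update each record here, ideally batching the updates
```

Plan B is where I'm stuck: is a server-side cursor like this the right way to get one record at a time, or is there a SqlSoup-level idiom for it?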

Any suggestions would be most appreciated!

Chris

  • A code sample showing the basic problem would allow someone to make a concrete suggestion. Commented Apr 28, 2012 at 1:52
  • This is very vague. Why do you want to perform "row at a time" processing (iterating)? Is your data actually a graph with records "pointing" to multiple other records without any grouping or nesting? And: 10^7 records is not big for a database. Commented Apr 28, 2012 at 11:49

1 Answer


After talking with some folks, it's pretty clear the better answer is to use Pig to process and aggregate my data locally. At the scale I'm operating at, it wasn't clear Hadoop was the appropriate tool to be reaching for. One person I talked to about this suggested Pig would be orders of magnitude faster than in-DB operations at that scale, which is about 10^7 records.
