Nothing will be faster than a single sequential scan for reading a whole table, at least not until PostgreSQL 9.6, which will introduce parallel sequential scans.
It would be tempting to split the table by ctid, the physical location of a tuple in the table, but PostgreSQL doesn't optimise access by ctid for any operator other than = (only an equality comparison can use a TID scan), so a range condition still ends up as a sequential scan with a filter:
test=> EXPLAIN SELECT * FROM large WHERE ctid BETWEEN '(390, 0)' AND '(400,0)';
┌────────────────────────────────────────────────────────────────────┐
│                             QUERY PLAN                             │
├────────────────────────────────────────────────────────────────────┤
│ Seq Scan on large  (cost=0.00..1943.00 rows=500 width=8)           │
│    Filter: ((ctid >= '(390,0)'::tid) AND (ctid <= '(400,0)'::tid)) │
└────────────────────────────────────────────────────────────────────┘
(2 rows)
The same holds for inserts: while I cannot show numbers, I'm pretty sure that a single process INSERTing or COPYing into a table will not be slower than several processes all loading data into the same table.
Since the bottleneck seems to be the processing of the rows between the SELECT at the origin and the INSERT at the destination, I'd suggest the following (a code sketch follows the list):
- Have one thread that performs a single SELECT * FROM all_emails.
- Create a number of threads that can perform the expensive processing in parallel.
- The first thread distributes the result rows to the parallel workers in a round-robin fashion.
- Yet another thread collects the results of the parallel workers and composes them into input for a COPY tablename FROM STDIN statement that it executes.
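Roughly like this, as a minimal sketch in Python with psycopg2, keeping the source table all_emails and the target table name "tablename" from above. The connection strings, the batch size, the queue sizes and process_row() are placeholders for your environment, and because CPython threads don't run Python code in parallel (GIL), truly CPU-bound processing would need multiprocessing or a processing function that releases the GIL.

import io
import queue
import threading

import psycopg2

SOURCE_DSN = "dbname=source"   # placeholder connection strings
TARGET_DSN = "dbname=target"
N_WORKERS = 4                  # number of parallel processing threads
BATCH_ROWS = 10000             # rows per COPY batch
SENTINEL = None                # marks "no more rows"

def process_row(row):
    # placeholder for the expensive per-row processing
    return row

def reader(work_queues):
    # one thread, one SELECT; a named (server-side) cursor streams the rows
    conn = psycopg2.connect(SOURCE_DSN)
    try:
        with conn, conn.cursor(name="reader") as cur:
            cur.execute("SELECT * FROM all_emails")
            for i, row in enumerate(cur):
                # hand the rows to the workers round robin
                work_queues[i % len(work_queues)].put(row)
    finally:
        conn.close()
        for q in work_queues:          # tell every worker we are done
            q.put(SENTINEL)

def worker(work_queue, out_queue):
    # the expensive processing happens here, in parallel
    while True:
        row = work_queue.get()
        if row is SENTINEL:
            out_queue.put(SENTINEL)
            return
        out_queue.put(process_row(row))

def writer(out_queue):
    # collect the processed rows and feed them to COPY ... FROM STDIN
    conn = psycopg2.connect(TARGET_DSN)
    try:
        with conn, conn.cursor() as cur:
            finished = 0
            buf, buffered = io.StringIO(), 0
            while finished < N_WORKERS:
                row = out_queue.get()
                if row is SENTINEL:
                    finished += 1
                    continue
                # text COPY format: tab-separated columns, one row per line
                # (no NULL or escape handling in this sketch)
                buf.write("\t".join(map(str, row)) + "\n")
                buffered += 1
                if buffered >= BATCH_ROWS:
                    buf.seek(0)
                    cur.copy_expert("COPY tablename FROM STDIN", buf)
                    buf, buffered = io.StringIO(), 0
            if buffered:
                buf.seek(0)
                cur.copy_expert("COPY tablename FROM STDIN", buf)
    finally:
        conn.close()

def main():
    work_queues = [queue.Queue(maxsize=1000) for _ in range(N_WORKERS)]
    out_queue = queue.Queue(maxsize=1000)
    threads = [threading.Thread(target=reader, args=(work_queues,)),
               threading.Thread(target=writer, args=(out_queue,))]
    threads += [threading.Thread(target=worker, args=(q, out_queue))
                for q in work_queues]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    main()

Flushing the COPY every BATCH_ROWS rows keeps memory bounded while still giving COPY reasonably large chunks to work with; tune that value, the queue sizes and N_WORKERS to the real row width and processing cost.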