Company Cybertec prepared new extension for parallel run of aggregations. Here are some my first experiences with it (written 2015-01-13).

Overview:

 

Installation:

  • Installation of agg extension is still quite a problem because pg 9.5 is freshly released and agg extension is only in beta version and must be installed using “make”.
  • Finally I was able to make everything running on CentOS7 with group “Development Tools” installed.
  • Packages of Pg9.5 have a little bit different names, there is no “server-dev” package available, only “devel”.
  • Only after all that I was able to create extension agg on database.

 

Tests:

  • To activate agg extension we must set parameter agg.hash_workers to more than 0.
  • I tested with agg.hash_workers=4 and compared with database without agg.

 

Tested query:

explain analyze
select unixtimestamp, count(*)
from mypartitionedtable
group by unixtimestamp

Select without agg:

QUERY PLAN
HashAggregate  (cost=193753.62..193755.62 rows=200 width=8) (actual time=1195.002..1195.036 rows=159 loops=1)
Group Key: mypartitionedtable.unixtimestamp
->  Append  (cost=0.00..188288.74 rows=1092975 width=8) (actual time=0.020..955.800 rows=1091774 loops=1)
->  Seq Scan on mypartitionedtable  (cost=0.00..0.00 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=1)
->  Seq Scan on mypartitionedtable_1444135500000  (cost=0.00..1569.02 rows=9002 width=8) (actual time=0.019..6.363 rows=9002 loops=1)
->  Seq Scan on mypartitionedtable_1444136400000  (cost=0.00..2521.40 rows=14540 width=8) (actual time=0.006..10.155 rows=14540 loops=1)
->  Seq Scan on mypartitionedtable_1444137300000  (cost=0.00..2462.98 rows=14298 width=8) (actual time=0.006..10.526 rows=14298 loops=1)
... and similar rows ...

Select with agg:

QUERY PLAN
Custom Scan (ParallelHashAggregate)  (cost=193753.62..193755.62 rows=200 width=8) (actual time=451.256..451.287 rows=159 loops=1)
->  Append  (cost=0.00..188288.74 rows=1092975 width=8) (never executed)
->  Seq Scan on mypartitionedtable  (cost=0.00..0.00 rows=1 width=8) (never executed)
->  Seq Scan on mypartitionedtable_1444135500000  (cost=0.00..1569.02 rows=9002 width=8) (never executed)
->  Seq Scan on mypartitionedtable_1444136400000  (cost=0.00..2521.40 rows=14540 width=8) (never executed)
->  Seq Scan on mypartitionedtable_1444137300000  (cost=0.00..2462.98 rows=14298 width=8) (never executed)
->  Seq Scan on mypartitionedtable_1444138200000  (cost=0.00..2408.83 rows=13983 width=8) (never executed)
... and similar rows ...

First results:

  • Explain plan with agg shows ParallelHashAggregate but obviously it is not able to show separate workers and shows only main loop which did not executed seq scans therefore they are marked as (never executed).
  • Total time of run with agg is better – explain plan shows ca 1.2 sec without agg vs. 0.4 sec with agg. Queries itself showed 0.9 s without agg vs. 0.4 s with agg.