Tuesday, September 10, 2013

Netezza: List Skew Tables Using Query

Below will will find out tables having data slices with "data slices contain more than 3 time of perfectly distributed data for a table"

In the bellow query I have used "3*((cast(100 as decimal(5))/(select count(*) from _V_DUAL_DSLICE)))" when means below:
"3 * 100 / total number of data slices", that is 3 times of a 100% data that is distributed evenly on all the available data slices.


select database_name,table_name,dsid,ALLOCATED_BYTES,USED_BYTES,PCT_DATA_IN_DSID 
from 
(select  the_table.database as database_name,
the_table.objname table_name,
        storage.dsid,
storage.ALLOCATED_BYTES,
storage.USED_BYTES,
storage.ALLOCATED_BYTES*100/sum(ALLOCATED_BYTES) over (partition by database_name,table_name) PCT_DATA_IN_DSID -- % of total size in this data slice
  FROM  _V_OBJ_RELATION_XDB the_table LEFT OUTER JOIN _V_SYS_OBJECT_DSLICE_INFO storage 
        on (  the_table.objid = storage.tblid  and storage.tblid > 200000)) TBL
  where --DATABASE_NAME = '' and -- if want to specify database name
PCT_DATA_IN_DSID > 3*((cast(100 as decimal(5))/(select count(*) from _V_DUAL_DSLICE)))  -- data slices contain more than 3 time of perfectly distributed data for a table
and ALLOCATED_BYTES > 100*1024*1024 --100 MB
--and table_name = '' --if want to specify tables name

--and DSID=630 --if want to specify Dataslice
  order by PCT_DATA_IN_DSID;