Hello Everyone,
As you know, Redshift is used as a data warehouse (DWH), and for one reason or another its disk can fill up 🙂
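Before anything else, it is worth checking how full the cluster really is. A minimal sketch against stv_partitions (same system table the big query below uses for total capacity; double-check the numbers on your own cluster):

-- Rough cluster-wide disk usage, in GB and percent.
-- part_begin = 0 avoids double counting mirrored partitions.
SELECT SUM(capacity)/1024 AS capacity_gb,
       SUM(used)/1024 AS used_gb,
       (SUM(used)::DECIMAL / SUM(capacity) * 100)::DECIMAL(5,2) AS pct_used
FROM stv_partitions
WHERE part_begin = 0;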
Create a superuser for these kinds of queries, or log in with a Redshift superuser.
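If you need to create one, something like this should work (the user name and password below are just placeholders, pick your own):

-- CREATEUSER is what grants superuser privileges.
CREATE USER bietltools_admin PASSWORD 'ChangeMe1234' CREATEUSER;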
Let's start;
Execute the query below to list the tables in Redshift, together with their schemas, sorted by size.
SELECT TRIM(pgn.nspname) AS SCHEMA,
       TRIM(a.name) AS "table",
       id AS TableId,
       decode(pgc.reldiststyle, 0, 'EVEN', 1, det.distkey, 8, 'ALL') AS DistKey,
       decode(pgc.reldiststyle, 8, NULL, dist_ratio.ratio::DECIMAL(20,4)) AS Skew,
       det.head_sort AS "SortKey",
       det.n_sortkeys AS "#SKs",
       CASE WHEN pgc.reldiststyle = 8 THEN a.rows_all_dist ELSE a.rows END AS rows,
       b.mbytes,
       decode(det.max_enc, 0, 'N', 'Y') AS Enc,
       det.pct_enc,
       decode(b.mbytes, 0, 0, ((b.mbytes/part.total::DECIMAL)*100)::DECIMAL(20,2)) AS pct_of_total,
       (CASE WHEN a.rows = 0 THEN NULL
             ELSE CASE WHEN pgc.reldiststyle = 8
                       THEN ((a.rows_all_dist - pgc.reltuples)::DECIMAL(20,3) / a.rows_all_dist::DECIMAL(20,3)*100)::DECIMAL(20,2)
                       ELSE ((a.rows - pgc.reltuples)::DECIMAL(20,3) / a.rows::DECIMAL(20,3)*100)::DECIMAL(20,2)
                  END
        END) AS pct_stats_off,
       CASE WHEN pgc.reldiststyle = 8
            THEN decode(det.n_sortkeys, 0, NULL, DECODE(a.rows_all_dist, 0, 0, (a.unsorted_rows_all_dist::DECIMAL(32)/a.rows_all_dist)*100))::DECIMAL(20,2)
            ELSE decode(det.n_sortkeys, 0, NULL, DECODE(a.rows, 0, 0, (a.unsorted_rows::DECIMAL(32)/a.rows)*100))::DECIMAL(20,2)
       END AS pct_unsorted
FROM (SELECT db_id, id, name,
             SUM(ROWS) AS ROWS,
             MAX(ROWS) AS rows_all_dist,
             SUM(ROWS) - SUM(sorted_rows) AS unsorted_rows,
             MAX(ROWS) - MAX(sorted_rows) AS unsorted_rows_all_dist
      FROM stv_tbl_perm a
      GROUP BY db_id, id, name) AS a
JOIN pg_class AS pgc ON pgc.oid = a.id
JOIN pg_namespace AS pgn ON pgn.oid = pgc.relnamespace
LEFT OUTER JOIN (SELECT tbl, COUNT(*) AS mbytes
                 FROM stv_blocklist
                 GROUP BY tbl) b ON a.id = b.tbl
INNER JOIN (SELECT attrelid,
                   MIN(CASE attisdistkey WHEN 't' THEN attname ELSE NULL END) AS "distkey",
                   MIN(CASE attsortkeyord WHEN 1 THEN attname ELSE NULL END) AS head_sort,
                   MAX(attsortkeyord) AS n_sortkeys,
                   MAX(attencodingtype) AS max_enc,
                   SUM(CASE WHEN attencodingtype <> 0 THEN 1 ELSE 0 END)::DECIMAL(20,3)/COUNT(attencodingtype)::DECIMAL(20,3)*100.00 AS pct_enc
            FROM pg_attribute
            GROUP BY 1) AS det ON det.attrelid = a.id
INNER JOIN (SELECT tbl, MAX(Mbytes)::DECIMAL(32)/MIN(Mbytes) AS ratio
            FROM (SELECT tbl, TRIM(name) AS name, slice, COUNT(*) AS Mbytes
                  FROM svv_diskusage
                  GROUP BY tbl, name, slice)
            GROUP BY tbl, name) AS dist_ratio ON a.id = dist_ratio.tbl
JOIN (SELECT SUM(capacity) AS total
      FROM stv_partitions
      WHERE part_begin = 0) AS part ON 1 = 1
WHERE mbytes IS NOT NULL
  AND pgc.relowner > 1
-- AND pgn.nspname = 'schema'   -- schemaname
-- AND a.name LIKE 'table%'     -- tablename
-- AND det.max_enc = 0          -- non-compressed tables
ORDER BY mbytes DESC;
First, find which tables are not being used and remove them. For example:
DROP TABLE bietltools.orders;
-- or, if you want to keep the table but reclaim all of its space:
TRUNCATE bietltools.orders;
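If you are not sure which tables are unused, one hedged starting point is to look for tables with no scans in the recent query history. STL tables only keep a few days of log, so treat this as a hint, not proof:

-- Tables with no scans in the retained STL history, largest first.
SELECT ti."schema", ti."table", ti.size AS size_mb
FROM svv_table_info ti
LEFT JOIN (SELECT DISTINCT tbl FROM stl_scan) s ON s.tbl = ti.table_id
WHERE s.tbl IS NULL
ORDER BY ti.size DESC;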
Second, vacuum the tables that need it; you can start from the ones holding the most data.
vacuum full bietltools.customer to 100 percent;
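If you are not sure which tables need it, svv_table_info has an unsorted column; the 20 percent threshold below is just my own assumption, adjust it:

-- Tables with the most unsorted data are good VACUUM candidates.
SELECT "schema", "table", size AS size_mb, unsorted AS pct_unsorted
FROM svv_table_info
WHERE unsorted > 20
ORDER BY unsorted DESC;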
Third, the difficult part: encoding.
ANALYZE COMPRESSION bietltools.bietltools_reporting COMPROWS 10000
COMPROWS is important: it caps how many rows ANALYZE COMPRESSION samples, otherwise your disk usage will hit 100 🙂
When you get the results, the steps are as below;
1 - alter table bietltools.bietltools_reporting add column new_tooladded_time timestamp ENCODE zstd;
2 - update bietltools.bietltools_reporting set new_tooladded_time = tooladded_time;
3 - alter table bietltools.bietltools_reporting drop column tooladded_time;
4 - alter table bietltools.bietltools_reporting rename column new_tooladded_time to tooladded_time;
I wrote zstd in bold on purpose; Redshift has encoding types for every data type, and you have to figure out and set the best encoding for each column.
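To see what encodings your columns currently have before rebuilding them, you can check pg_table_def (it only shows tables on your search_path, hence the SET; schema and table names below are from my example above):

-- Current column encodings for one table; 'none' means the column is uncompressed.
SET search_path TO bietltools;
SELECT "column", type, encoding
FROM pg_table_def
WHERE tablename = 'bietltools_reporting';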
One more useful query below, as a bonus;
Show database sizes
select sum(mbytes)/1024 as db_size_in_gb, database
from (select trim(pgdb.datname) as Database,
             trim(a.name) as Table,
             b.mbytes
      from stv_tbl_perm a
      join pg_database as pgdb on pgdb.oid = a.db_id
      join (select tbl, count(*) as mbytes
            from stv_blocklist
            group by tbl) b on a.id = b.tbl
      where a.slice = 0
      order by db_id, name)
group by database;
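And if you want the same breakdown per schema instead of per database, here is a small variation (my own sketch, built on the same system tables):

-- Approximate size per schema, in GB.
select trim(pgn.nspname) as schema_name,
       sum(b.mbytes)/1024 as schema_size_in_gb
from stv_tbl_perm a
join pg_class pgc on pgc.oid = a.id
join pg_namespace pgn on pgn.oid = pgc.relnamespace
join (select tbl, count(*) as mbytes
      from stv_blocklist
      group by tbl) b on a.id = b.tbl
where a.slice = 0
group by 1
order by 2 desc;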
Welcome to the black box of AWS