Hello Everyone,
As you all know, Redshift is used for data warehousing (DWH), and for one reason or another the disk can fill up 🙂
Create a superuser for these kinds of queries, or log in with an existing Redshift superuser.
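If you need to create one, here is a minimal sketch (the user name and password are hypothetical; CREATEUSER is the Redshift option that grants superuser privileges):

-- Create a superuser for admin queries
CREATE USER dba_etl PASSWORD 'Str0ngPassword1' CREATEUSER;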
Let's start.
Execute the query below to list every table with its schema in Redshift, sorted by size (largest first).
SELECT TRIM(pgn.nspname) AS SCHEMA,
TRIM(a.name) AS TABLE,
id AS TableId,
decode(pgc.reldiststyle,
0, 'EVEN',
1,det.distkey ,
8,'ALL'
) AS DistKey,
decode(pgc.reldiststyle,
8,NULL,
dist_ratio.ratio::DECIMAL(20,4)
) AS Skew,
det.head_sort AS "SortKey",
det.n_sortkeys AS "#SKs",
CASE WHEN pgc.reldiststyle = 8 THEN a.rows_all_dist ELSE a.rows END AS rows,
b.mbytes,
decode(det.max_enc,
0,'N',
'Y'
) AS Enc,
det.pct_enc,
decode(b.mbytes,
0,0,
((b.mbytes/part.total::DECIMAL)*100)::DECIMAL(20,2)
) AS pct_of_total,
(CASE WHEN a.rows = 0 THEN NULL ELSE
CASE WHEN pgc.reldiststyle = 8 THEN ((a.rows_all_dist - pgc.reltuples)::DECIMAL(20,3) / a.rows_all_dist::DECIMAL(20,3)*100)::DECIMAL(20,2)
ELSE ((a.rows - pgc.reltuples)::DECIMAL(20,3) / a.rows::DECIMAL(20,3)*100)::DECIMAL(20,2) END END
) AS pct_stats_off,
CASE WHEN pgc.reldiststyle = 8
THEN decode( det.n_sortkeys,0, NULL,DECODE( a.rows_all_dist,0,0, (a.unsorted_rows_all_dist::DECIMAL(32)/a.rows_all_dist)*100))::DECIMAL(20,2)
ELSE decode( det.n_sortkeys,0, NULL,DECODE( a.rows,0,0, (a.unsorted_rows::DECIMAL(32)/a.rows)*100))::DECIMAL(20,2) END
AS pct_unsorted
FROM (SELECT db_id,
id,
name,
SUM(ROWS) AS ROWS,
MAX(ROWS) AS rows_all_dist,
SUM(ROWS) - SUM(sorted_rows) AS unsorted_rows,
MAX(ROWS) - MAX(sorted_rows) AS unsorted_rows_all_dist
FROM stv_tbl_perm a
GROUP BY db_id,
id,
name) AS a
JOIN pg_class AS pgc ON pgc.oid = a.id
JOIN pg_namespace AS pgn ON pgn.oid = pgc.relnamespace
LEFT OUTER JOIN (SELECT tbl, COUNT(*) AS mbytes FROM stv_blocklist GROUP BY tbl) b ON a.id = b.tbl
INNER JOIN (SELECT attrelid,
MIN(CASE attisdistkey WHEN 't' THEN attname ELSE NULL END) AS "distkey",
MIN(CASE attsortkeyord WHEN 1 THEN attname ELSE NULL END) AS head_sort,
MAX(attsortkeyord) AS n_sortkeys,
MAX(attencodingtype) AS max_enc,
SUM(case when attencodingtype <> 0 then 1 else 0 end)::DECIMAL(20,3)/COUNT(attencodingtype)::DECIMAL(20,3) *100.00 as pct_enc
FROM pg_attribute
GROUP BY 1) AS det ON det.attrelid = a.id
INNER JOIN (SELECT tbl,
MAX(Mbytes)::DECIMAL(32) /MIN(Mbytes) AS ratio
FROM (SELECT tbl,
TRIM(name) AS name,
slice,
COUNT(*) AS Mbytes
FROM svv_diskusage
GROUP BY tbl,
name,
slice)
GROUP BY tbl,
name) AS dist_ratio ON a.id = dist_ratio.tbl
JOIN (SELECT SUM(capacity) AS total
FROM stv_partitions
WHERE part_begin = 0) AS part ON 1 = 1
WHERE mbytes IS NOT NULL
AND pgc.relowner > 1
-- and pgn.nspname = 'schema' -- schemaname
-- and a.name like 'table%' -- tablename
-- and det.max_enc = 0 -- non-compressed tables
ORDER BY mbytes DESC;
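Before hunting individual tables, it can also help to check overall disk usage per node; here is a minimal sketch against STV_PARTITIONS, the same system view the query above uses for total capacity:

-- Per-node disk usage; capacity and used are in 1 MB blocks
SELECT owner AS node,
       SUM(capacity) / 1024 AS capacity_gb,
       SUM(used) / 1024 AS used_gb,
       (SUM(used)::DECIMAL / SUM(capacity) * 100)::DECIMAL(5,2) AS pct_used
FROM stv_partitions
WHERE part_begin = 0
GROUP BY owner
ORDER BY owner;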
First, find the tables you are no longer using and remove them.
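If you are not sure which tables are unused, one rough sketch is to look for tables with no recent scans. Note that STL_SCAN keeps only a few days of history, so this can only prove recent use, not long-term disuse:

-- Tables with no scans in the retained STL_SCAN window, largest first
SELECT ti."schema", ti."table", ti.size AS size_mb
FROM svv_table_info ti
LEFT JOIN (SELECT DISTINCT tbl FROM stl_scan) s ON s.tbl = ti.table_id
WHERE s.tbl IS NULL
ORDER BY ti.size DESC;

Once you have your candidates, drop or truncate them: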
DROP TABLE bietltools.orders;
-- or, to keep the table definition but free the data blocks:
TRUNCATE bietltools.orders;
Second, vacuum the tables that need it; you can start with the largest ones.
VACUUM FULL bietltools.customer TO 100 PERCENT;
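To see which tables most need a vacuum, SVV_TABLE_INFO already tracks the unsorted percentage; a quick sketch:

-- Tables with more than 10% unsorted rows, largest first
SELECT "schema", "table", size AS size_mb, unsorted AS pct_unsorted
FROM svv_table_info
WHERE unsorted > 10
ORDER BY size DESC;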
Third, the difficult part: encoding.
ANALYZE COMPRESSION
bietltools.bietltools_reporting
COMPROWS 10000;
COMPROWS (the number of rows to sample) is important; without it, the analysis itself can push your disk usage to 100 🙂
When you get the results (a suggested encoding and an estimated size reduction per column), the steps are as follows:
1 - alter table bietltools.bietltools_reporting add column new_tooladded_time timestamp ENCODE zstd;
2 - update bietltools.bietltools_reporting set new_tooladded_time = tooladded_time;
3 - alter table bietltools.bietltools_reporting drop column tooladded_time;
4 - alter table bietltools.bietltools_reporting rename column new_tooladded_time to tooladded_time;
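As a side note, newer Redshift releases can change a column's encoding in place, which avoids the four-step dance above; verify that your cluster supports it before relying on it:

-- In-place encoding change (recent Redshift releases only)
ALTER TABLE bietltools.bietltools_reporting
  ALTER COLUMN tooladded_time ENCODE zstd;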
I highlighted zstd for a reason: Redshift has several encoding types for each data type, and you have to figure out and set the best one for your column.
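To check what each column currently uses, PG_TABLE_DEF shows the encoding per column (note that it only lists tables whose schema is on your search_path):

-- Current column encodings for one table
SELECT "column", type, encoding
FROM pg_table_def
WHERE schemaname = 'bietltools'
  AND tablename = 'bietltools_reporting';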
One more useful query below, as a bonus:
Show database sizes
select sum(mbytes)/1024 as db_size_in_gb, database from (
select trim(pgdb.datname) as Database,
trim(a.name) as Table, b.mbytes
from stv_tbl_perm a
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes
from stv_blocklist group by tbl) b on a.id=b.tbl
where a.slice=0
order by db_id, name)
group by database;
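If you want the same breakdown per schema instead of per database, SVV_TABLE_INFO makes it a one-liner; a minimal sketch:

-- Total size per schema, in GB (size is in 1 MB blocks)
SELECT "schema", SUM(size) / 1024 AS schema_size_gb
FROM svv_table_info
GROUP BY "schema"
ORDER BY schema_size_gb DESC;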
Welcome to the black box of AWS