Introduction

This article discusses DB2 LUW Deep Compression from the perspective of someone who planned, organized, communicated and coordinated the implementation of this in development, test, regression test and production environments for multi-terabyte data warehouse databases.

Pros and cons

DB2 LUW Deep Compression reduces the amount of disk space needed to store information in databases and improves performance through faster I/O and more efficient use of memory. Here is a list of our production databases showing the size before and after compression along with the compression ratios and space savings we accrued.

We measured performance improvements in the 5% to 30% range. We determined that more rows were stored in buffer pools and that application memory high water marks were lower. We reduced our application memory configuration and increased our buffer pool memory configuration to better utilize the memory we had available on servers. Overall performance of online and batch SQL has improved significantly and better memory management by DB2 has helped us avoid having to purchase more memory to handle increased work loads. The combined improvement in I/O and memory utilization of DB2 LUW Deep Compression has made this a good investment for our large databases.

DB2 LUW Deep Compression uses a bit more CPU, requires more of a conversion effort if system managed tablespaces (SMS) are being used and costs more to license than DB2 LUW without compression. We measured CPU utilization increases of 2-5% after implementing compression and suspect this cost is due to the decompression that is done before information is returned to applications. In our environments the improved I/O and memory management performance far outweigh this small CPU cost.

Environment

AIX 5.3.0.0 Technology Level 6

DB2 V9.1 FIX PAK 2
DB2 Database Partitioning: “Licensed”
DB2 Storage Optimization: “Licensed”

Detail Steps

  • Alter database managed tablespace (DMS) from REGULAR to LARGE.
  • Alter DMS tablespace to AUTORESIZE YES.
  • Alter tables in DMS tablespace to COMPRESS YES and VALUE COMPRESSION.
  • Run reorg using the temporary tablespace and RESETDICTIONARY options.
  • Run DB2 list tablespaces show detail command.
  • Recycle the DB2 instance.
  • Run reorg in place to free up space inside DMS tablespaces.
  • Run DB2 list tablespaces show detail command.
  • Recycle the DB2 instance.
  • Alter tablespaces to reduce the size of containers to recover disk space.
  • Use the copy/rename table approach for large tables in DMS tablespaces.
  • Create new DMS LARGE tablespaces to replace SMS tablespaces.
  • Use the copy/rename table approach for tables in SMS tablespaces.

Recommendations and Information

  • Start with the smallest DMS tablespace first and work toward the largest.
  • Start with the smallest table in a tablespace first and work toward the largest table in that tablespace. DB2 will not implement compression for a table if it does not have enough rows to warrant this. We found it faster and easier to alter all tables to be compressed and to simply let DB2 decide whether to build the compression dictionary and compress the data during the reorg.
  • Using the smallest to largest approach helps in several ways. First, experience is gained while working with smaller units of work. Second, if you have multiple tables in a tablespace, DB2 will recover more space for reuse after compression is implemented due to the way DB2 stores data internally in tablespaces. Third, using the smallest table to the largest table will allow you to free up disk space as you work through the project. This space can be reused to support the additional space you may need when using the copy/rename table approach for large tables or for tables you are converting from SMS to DMS tablespaces.
  • Complete the alter tablespace to LARGE, alter table and reorgs with the temporary tablespace and resetdictionary options for all tables in a tablespace before altering the next tablespace to LARGE.
  • Plan tablespace alters, table alters and reorgs during times when tables are not in great demand. We found that we could run alters and reorgs for small to medium size tables (less than a billion rows) during the day when we have a lot of SQL selects running against our data warehouses. We did not run alters or reorgs during our nightly batch and replication cycle windows. We had a few -911 time out roll back errors on alters and reorgs and simply had to rerun these. We had a few times when people using our data warehouses got a -911 time out roll back error. We avoided most of this by using the copy/rename table approach for large tables (more than a billion rows).
  • The alter DMS tablespace AUTORESIZE YES option makes DMS tablespaces behave like SMS tablespaces because they will automatically increase in size when more space is needed.
  • The reorg with the resetdictionary option builds the compression dictionary and compresses the data in the table if DB2 thinks there is enough data to make this worthwhile. Pages of data are converted from regular record identifiers (RIDS) to large RIDS whether compressed or not. Large RIDS result in more data being stored per page and this combined with compression reduces the amount of disk space needed to store the information and it reduces the number of I/O’s required to return the information.
  • The db2 list tablespaces show detail command resets some internal pointers in tablespaces after they have been converted to large RIDS with compressed tables. This is required in order to be able to recover space for reuse.
  • Recycling the DB2 instance flushes the bufferpools and DB2 will use these more efficiently with compressed data when the DB2 instance is started. Tablespace information in memory is also refreshed after the instance is recycled.
  • The second table reorg will lower the high watermark in the table and tablespace and allow you to reduce tablespace containers to recover space for reuse. Be sure to reorg the smallest tables first and work your way toward the largest. This approach will recover the most space due to the way information is stored internally in tablespaces that contain many tables.
  • Alter the DMS tablespace to reduce the size of container files to recover disk space for reuse when you have completed implementing compression for all tables in the tablespace. We found that doing this with a script that loops and attempts to reduce container sizes using smaller values works best. Containers are reduced in size until there is no longer free space to allow this and then the script will get warning messages as it works it’s way down in size. DB2 version 9.5 will automatically reduce the size of DMS tablespaces defined with the AUTORESIZE YES feature. You have to run DB2 alter commands to reduce tablespace container sizes for tablespaces with volatile information in releases before DB2 v9.5.
  • Space recovered after compressing small and medium tables can be used to increase file systems that have temporary tablespace container files. This can ensure enough temporary work space is available to rebuild multi-billion row indexes of the largest tables in a database.
  • Additional space recovered after compressing small and medium tables can be used to support the copy/rename approach on large tables and on tables converted from SMS to DMS tablespaces.
  • The copy/rename approach simply means to create a new DMS LARGE tablespace, a new table and new indexes, to copy the data from the current table to the new table and to implement compression on the new table. Then the current table can be renamed to an old name and the new table renamed to the current name. Since the rename takes seconds, this avoids long outages when you need to implement compression against large tables. The old table and tablespace can be dropped.
  • Reorgs should usually be run serially and not in parallel against multiple tables at the same time if you want to reduce the impact the compression project has on other users of the database.
  • Recycling instances can be coordinated based on production service level agreements.

Summary

It took us about a week to implement compression in each of our development and test environments. It took about two weeks to implement compression in each of our user regression test environments. It took from four to six weeks to implement compression in each of our production environments. We only ran alters and reorgs during the day and not at night during our nightly batch and replication cycles. Coordinating instance outages increased the elapsed time.

The amount of space we recovered for reuse and the performance improvement we have seen through reduced I/O’s and better database memory utilization make DB2 LUW Deep Compression well worth the investment of time and money.

Script Examples

The script examples below can help provide information that can be used to plan and manage projects to implement DB2 LUW Deep Compression. DBA’s can combine and enhance these scripts to generate the commands needed to alter tablespaces, alter tables and reorganize tables in order to convert regular record identifiers to large record identifiers in tablespaces and to build compression dictionaries and compress data in tables. Ordering the generated command output so that tablespaces and tables are converted from smallest to largest will help you recover more space for reuse during a compression project. Small and medium tablepsaces and tables can be converted using the generated commands and large tables can be skipped in this process and converted using the copy/rename table approach mentioned earlier in this article.

(Replace the word Database in the script with your database name.)

db2list_candidate_tablespaces_for_conversion.ksh

#  Description      : This script lists tablespaces that are candidates for
#                   : conversion from SMS to DMS and from DMS regular to
#                   : DMS large in support of a project to implement large
#                   : record identifiers so that DB2 will store more rows
#                   : per page.   This effort could be part of a project
#                   : to implement compression because table compression
#                   : requires large record identifiers. (RIDS)
db2 connect to Database
echo "Listing candidate tablespaces for conversion."
db2 -x "select TBSP_TYPE,sum(TBSP_USED_SIZE_KB) as TBSP_USED_SIZE_KB,
char(TBSP_NAME,20) as TBSP_NAME,TBSP_CONTENT_TYPE,TBSP_PAGE_SIZE,TBSP_AUTO_RESIZE_ENABLED 
from sysibmadm.tbsp_utilization 
where TBSP_CONTENT_TYPE not like '%TEMP%' 
group by TBSP_TYPE,TBSP_NAME,TBSP_CONTENT_TYPE,
TBSP_PAGE_SIZE,TBSP_AUTO_RESIZE_ENABLED" > FileName.dat
sort FileName.dat > FileName.txt
cat FileName.txt
rm FileName.dat
echo "Candidate tablespaces for conversion located in FileName.txt"

db2list_candidate_tables_for_compression.ksh

#  Description      : This script lists candidate tables for compression.
#                   : A text file is generated with information about the
#                   : server, instance, database, # tablespaces, tablespace
#                   : types (D=DMS, S=SMS), schema, table name and row
#                   : count (cardinality) from DB2 SYSCAT catalog views.
#                   : Output can be imported into a spreadsheet
#                   : and used to help plan a compression project.
#
db2 connect to Database
echo "Identifying tables that are candidates for compression."
db2 -x "select a.tbspace,b.tbspacetype,digits(a.card),a.tabschema,a.tabname 
from syscat.tables a,syscat.tablespaces b 
where a.tabschema not like 'SYS%' 
and  a.tbspace=b.tbspace and a.card > 0 
and a.type in ('T','S') 
order by a.card desc,a.tbspace,a.tabschema,a.tabname" > FileName.dat
cat FileName.dat | while read Tbspace TbspaceType Card TabSchema TabName
do
        echo '"'$(hostname)'","'${DB2INSTANCE}'","'Database'","'${Tbspace}'","
'${TbspaceType}'","'${TabSchema}'","'${TabName}'","'${Card}'"' | tee -a FileName_excel.txt
done
rm FileName.dat
echo "Candidate tables for compression located in FileName_excel.txt"

Space savings

DB2 LUW Terabytes Terabytes Percent Partitions
Database Before After compressed  
prod001 20 9 55% 16
prod002 8 4 50% 11
prod003 12 6 50% 11
prod004 22 10 55% 7
prod005 18 7 61% 16
Total 80 36 55%  
Space savings 44 Terabytes