Oracle LOB compression: basic and "cmap"
Continuing the topic begun in the article about XMLTYPE, I will discuss the BLOB and CLOB types of Oracle RDBMS, which were once modeled after the large objects of Oracle Rdb (originally DEC VAX Rdb/VMS, a product that gave Oracle RDBMS not only large objects but also, among other things, the cost-based optimizer) to replace the LONG and LONG RAW types.
Now we will focus on Oracle LOB compression, which is available for SecureFile LOBs. As we already saw in the article dedicated to XMLTYPE, it compresses even small binary XML files well. As with the XMLTYPE analysis, we will examine this through the lens of how the data is written to the redo log.
Let's create a table with three LOB columns: one storing content with compression set to LOW, one with compression set to MEDIUM, and one with compression set to HIGH.
create table TEST_COMPRESSION (
  ID varchar2(255) not null,
  COMPRESSION_LOW blob,
  COMPRESSION_MEDIUM blob,
  COMPRESSION_HIGH blob,
  constraint TEST_COMPRESSION_PK primary key(ID)
)
lob(COMPRESSION_LOW) store as securefile SF_COMPRESSION_LOW (
  disable storage in row compress low)
lob(COMPRESSION_MEDIUM) store as securefile SF_COMPRESSION_MEDIUM (
  disable storage in row compress medium)
lob(COMPRESSION_HIGH) store as securefile SF_COMPRESSION_HIGH (
  disable storage in row compress high)
/
alter table TEST_COMPRESSION add supplemental log data (ALL) columns
/
Now let's insert a single row into this table, writing down the SCN before and after the insert.
REM Please write down this SCN
select current_scn from v$database
/
insert into TEST_COMPRESSION values('C1000',UTL_RAW.CAST_TO_RAW(RPAD('G',1000,'G')),UTL_RAW.CAST_TO_RAW(RPAD('G',1000,'G')),UTL_RAW.CAST_TO_RAW(RPAD('G',1000,'G')))
/
commit
/
REM Please write down this SCN
select current_scn from v$database
/
Now we are ready to obtain a redo log dump for our insert operation, either with the standard Oracle command
ALTER SYSTEM DUMP LOGFILE '/Users/averemee/polyxena/oracle/oradata/KAFKA19/KAFKA19/redo03.log' SCN MIN 47651026 SCN MAX 47652496;
or with a utility included in my open-source CDC solution, which produces additional output about certain data changes:
java -cp oracdc-kafka-2.8.2-standalone.jar solutions.a2.cdc.oracle.utils.file.OraRedoLogFile -f /Users/averemee/polyxena/oracle/oradata/KAFKA19/KAFKA19/redo03.log -o test_compression.trc --start-scn 47651026 --end-scn 47652496 -r -b
Nothing unexpected here: in the OP:26.6 change below (you can find the Java parsing code for this change in Layer 26 opcodes 2 and 6)
REDO RECORD - Thread:1 RBA: 0x0002e0.000018c9.0094 LEN: 0x0200 VLD: 0x01 CON_UID: 3517775930
SCN: 0x0000000002d8c8a9 SUBSCN: 4
CHANGE #1 CON_ID:3 TYP:10 CLS:1 AFN:12 DBA:0x0300032e OBJ:77346 SCN:0x0000000002d8c8a9 SEQ:1 OP:26.6 ENC:0 RBL:0 FLG:0x0000
KDLI common [12]
op 0x06 [BIMG]
type 0x20 [data]
flg0 0x02
flg1 0x00
psiz 8060
poff 128
dba 0x0300032e
KDLI fpload [11.32]
bsz 8192
scn 0x0000000002d8c8a9
xid 0x0002.015.00000c8c
objd 77346
KDLI load data [4.56]
bdba [0x0300032e]
kdlich
flg0 0x20 [ver=0 typ=data lock=n]
flg1 0x00
scn 0x0000000002d8c8a9
lid 00000001000000106578
spare 0x00000000
kdlidh
flg2 0x00 [ver=0 lid=short-rowid hash=n cmap=n pfill=n]
flg3 0x00
pskip 0
sskip 0
hash 0000000000000000000000000000000000000000
hwm 317
spr 0
KDLI data load
78 9c ed d3 41 09 00 30 0c c0 40 79 a5 fe 0d 55 c5 18 84 3b 05 f9 64 07 a8 db
...........
...........
we see the expected zlib signature 78 9c, which according to the List of file signatures denotes zlib "Default Compression (no preset dictionary)".
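To make the signature tangible, here is a minimal, self-contained Java sketch (class name and structure are mine, not part of oracdc) that reproduces the 78 9c header with java.util.zip at default settings and shows how such a zlib-wrapped payload can be inflated back into plain data:

import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibSignatureDemo {

    // Inflate a zlib-wrapped payload, such as the bytes shown in the
    // "KDLI data load" section of the OP:26.6 change above.
    static byte[] inflate(byte[] compressed) throws Exception {
        Inflater inflater = new Inflater();            // expects the zlib wrapper (78 9c ...)
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && inflater.needsInput()) {
                break;                                 // truncated input
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // The same content as in the test insert: 1000 'G' bytes
        byte[] original = new byte[1000];
        Arrays.fill(original, (byte) 'G');

        // Compress with zlib default settings; the output starts with 78 9c,
        // the same header we see in the redo dump
        Deflater deflater = new Deflater();
        deflater.setInput(original);
        deflater.finish();
        byte[] compressed = new byte[1024];
        int len = deflater.deflate(compressed);
        deflater.end();

        System.out.printf("zlib header: %02x %02x%n",
                compressed[0] & 0xFF, compressed[1] & 0xFF);

        // Round-trip back to the original 1000 bytes
        byte[] restored = inflate(Arrays.copyOf(compressed, len));
        System.out.println("restored " + restored.length + " bytes, equal = "
                + Arrays.equals(original, restored));
    }
}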
Inserting 100,000,000 bytes of data into the column with COMPRESSION HIGH enabled yields much more interesting and, at first glance, harder-to-interpret results.
REM Please write down this SCN
select current_scn from v$database
/
declare
  LARGE_BLOB blob;
  I binary_integer;
begin
  LARGE_BLOB := EMPTY_BLOB();
  DBMS_LOB.CREATETEMPORARY(LARGE_BLOB, false);
  for I in 1 .. 100000 loop
    DBMS_LOB.WRITEAPPEND(
      LARGE_BLOB,
      1000,
      UTL_RAW.CAST_TO_RAW(RPAD('K',1000,'G')));
  end loop;
  insert into TEST_COMPRESSION(ID, COMPRESSION_HIGH)
    values('C100000000', LARGE_BLOB);
  commit;
end;
/
REM Please write down this SCN
select current_scn from v$database
/
After obtaining the redo dump, we see that the data no longer begins with the zlib signature 78 9c but instead starts with DEADBEEF.
REDO RECORD - Thread:1 RBA: 0x0002e0.000148ce.0164 LEN: 0x203c VLD: 0x01 CON_UID: 3517775930
SCN: 0x0000000002d9a521 SUBSCN: 4
CHANGE #1 CON_ID:3 TYP:10 CLS:1 AFN:12 DBA:0x03000106 OBJ:77346 SCN:0x0000000002d9a521 SEQ:1 OP:26.6 ENC:0 RBL:0 FLG:0x0000
KDLI common [12]
op 0x06 [BIMG]
type 0x20 [data]
flg0 0x02
flg1 0x00
psiz 8060
poff 128
dba 0x03000106
KDLI fpload [11.32]
bsz 8192
scn 0x0000000002d9a521
xid 0x000a.00a.000009df
objd 77346
KDLI load data [4.56]
bdba [0x03000106]
kdlich
flg0 0x20 [ver=0 typ=data lock=n]
flg1 0x00
scn 0x0000000002d9a521
lid 0000000100000010657a
spare 0x00000000
kdlidh
flg2 0x10 [ver=0 lid=short-rowid hash=n cmap=y pfill=n]
flg3 0x00
pskip 0
sskip 0
hash 0000000000000000000000000000000000000000
hwm 8060
spr 0
KDLI data load
de ad be ef 01 00 00 00 00 00 3f f3 e0 00 00 00 00 00 00 31 bf 00 10 00 a5 00
00 27 0f 00 04 0e fc 00 00 03 1e 00 04 0e fc 00 00 03 1e 00 04 0e fc 00 00 03
1e 00 04 0e fc 00 00 03 1d 00 04 0e fc 00 00 03 1d 00 04 0e fc 00 00 03 1e 00
04 0e fc 00 00 03 1e 00 04 0e fc 00 00 03 1e 00 04 0e fc 00 00 03 1e 00 04 0e
fc 00 00 03 1b 00 04 0e fc 00 00 03 1b 00 04 0e fc 00 00 03 1c 00 04 0e fc 00
00 03 1c 00 04 0e fc 00 00 03 1d 00 04 0e fc 00 00 03 1e 00 03 13 1c 00 00 02
65 47 47 47 47 47 47 47 47 78 9c ed d3 31 0d 00 20 10 c0 40 79 3f e0 df 0f 0a
...........
...........
And, naturally, there are many such OP:26.6 changes (you can find the Java parsing code for this change in Layer 26 opcodes 2 and 6), which is understandable: compressing 100,000,000 bytes of data in a single thread is both slow and memory-intensive, so the data is split into parts. But what exactly is inside this magical DEADBEEF? Note also that for the kdlidh attributes in this case flg2 contains cmap=y, not cmap=n as it does for simple compression using zlib.
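Before digging into the payload itself, the two dumps above already suggest a cheap way to tell the chunk formats apart: flg2 was 0x00 (cmap=n) for the plain zlib chunk and 0x10 (cmap=y) for the DEADBEEF chunk, and the cmap payload begins with the magic bytes DE AD BE EF. A small sketch; the 0x10 bit and the magic check are inferred only from these two dumps, not from any documented format:

// Hedged helper: classify an OP:26.6 "KDLI data load" payload based only on
// what the two dumps above show.
public final class KdliChunkKind {

    public static final int FLG2_CMAP_BIT = 0x10;   // flg2 0x10 -> "cmap=y" in the dump

    public static boolean isCmapByFlag(int flg2) {
        return (flg2 & FLG2_CMAP_BIT) != 0;
    }

    public static boolean startsWithDeadBeef(byte[] payload) {
        return payload.length >= 4
                && (payload[0] & 0xFF) == 0xDE
                && (payload[1] & 0xFF) == 0xAD
                && (payload[2] & 0xFF) == 0xBE
                && (payload[3] & 0xFF) == 0xEF;
    }

    public static boolean startsWithZlibHeader(byte[] payload) {
        return payload.length >= 2
                && (payload[0] & 0xFF) == 0x78
                && (payload[1] & 0xFF) == 0x9C;
    }
}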
Fortunately, Oracle's whitepaper provides one hint: "Finally, Advanced LOB Compression HIGH achieves the highest storage savings but incurs the most CPU overhead." The second hint is inside each chunk that begins with DEADBEEF: numerous occurrences of the already familiar zlib signature 78 9c (on my test instance, 16 of them per chunk).
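Putting the two hints together, the simplest (admittedly brute-force) way to peek inside a DEADBEEF chunk is to scan it for 78 9c markers and try to inflate each candidate segment. The header that follows DEADBEEF presumably encodes the segment sizes (that is what the decoding effort mentioned below is about), but for a first look a scan is enough. A sketch, again assuming nothing beyond what the dumps show:

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Brute-force peek into a "cmap" chunk: find every 78 9c marker and try to
// inflate from there. A real parser would read the segment layout from the
// header that follows DEADBEEF instead of scanning.
public final class CmapChunkScanner {

    public static List<byte[]> inflateAllSegments(byte[] chunk) {
        List<byte[]> segments = new ArrayList<>();
        for (int pos = 0; pos + 1 < chunk.length; pos++) {
            if ((chunk[pos] & 0xFF) == 0x78 && (chunk[pos + 1] & 0xFF) == 0x9C) {
                byte[] inflated = tryInflate(chunk, pos);
                if (inflated != null) {
                    segments.add(inflated);
                }
            }
        }
        return segments;
    }

    // Attempt to inflate a zlib stream starting at offset; returns null if the
    // bytes at this offset are not a valid, complete zlib stream.
    private static byte[] tryInflate(byte[] chunk, int offset) {
        Inflater inflater = new Inflater();
        inflater.setInput(chunk, offset, chunk.length - offset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0) {
                    return null;          // needs more input or a dictionary: give up
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException dfe) {
            return null;                  // false positive: 78 9c occurred inside data
        } finally {
            inflater.end();
        }
    }
}

Applied to the chunks from this dump, a scan like this should locate the 16 signature occurrences noted above and inflate each segment back to plain data; the proper, offset-driven parsing is what the linked Java code does.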
A bit of decoding effort, and....
You can view the Java code here, and the full test data example and JUnit test here.
Feel free to reach out if you're curious about anything regarding Oracle internals!
