DB_BLOCK_CHECKSUM vs DB_BLOCK_CHECKING

I’ve been reading the guide Oracle Database High Availability Best Practices. In the Chapter 8 about Data Guard, it mentions two parameters for Redo Apply best practices: DB_BLOCK_CHECKSUM and DB_BLOCK_CHECKING and suggests to set them to FULL and MEDIUM/FULL respectively.

It’s quite confusing when looking at the definition of two parameters:

DB_BLOCK_CHECKING = { FALSE | OFF | LOW | MEDIUM | TRUE | FULL }

DB_BLOCK_CHECKING specifies whether Oracle Database performs block checking for database blocks. So when it is set to non False/OFF, Oracle checks a block by going through the data in the block, making sure it is logically self-consistent. Block checking can often prevent memory and data corruption. Block checking typically causes 1% to 10% overhead in most applications, depending on workload and the parameter value. Specific DML overhead may be higher. The more updates or inserts in a workload, the more expensive it is to turn on block checking. You should set DB_BLOCK_CHECKING to FULL if the performance overhead is acceptable.

DB_BLOCK_CHECKSUM = { OFF | FALSE | TYPICAL | TRUE | FULL }

DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read – only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum. In FULL mode, Oracle also verifies the checksum before a change application from update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log.
Most of the log block checksum is done by the generating foreground processes, while the LGWR or the LGWR slave processes (LGnn processes) perform the rest of the work, for better CPU and cache efficiency.
If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.
Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk. Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In the FULL mode it causes 4% to 5% overhead. Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.

What are the difference between them? Why does Oracle need two different parameters for block checking? AskTom has a good explaination for them:

“So, the checksum will ensure that what was WRITTEN OUT is what gets read back in. It is useful to detect corruptions introduced by IO operations (bad disk, bad cables, bad hardware). It will not detect that a block in memory is already corrupt (bad memory chips, bad software, etc). The checksum is useful to find faulty disk operations.

The block checking is useful to verify that a block is physically “correct” — that it most likely is not corrupted. If a varchar2 field on a block has a length > 4000 bytes — we know it is corrupt. If the head of the block doesn’t match the tail, we know it is corrupt. And so on. Block checking validates the integrity of the block.

The two parameters do different checks (CHECKSUM doesn’t validate only makes sure what we wrote is what we read, CHECKING doesn’t ensure what we read is what we wrote but rather that what we read was a VALID block) at different times.”

In the same section of the guide, it suggests to set the 3rd parameter DB_LOST_WRITE_PROTECT to FULL on the standby database to enable Oracle to detect writes that are lost in the I/O subsystem.

Actually there is a parameter DB_ULTRA_SAFE introduced in 11g which controls 3 parameters above.

DB_ULTRA_SAFE = { OFF | DATA_ONLY | DATA_AND_INDEX }

It sets the default values for other parameters (DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT) that control protection levels — an integrated mechanism to provide protection from various possible data corruptions. So you can either use this parameter to set 3 parameters to the pre-defined values at the same time or set them individually with this parameter set to OFF.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s