ORA-16624: broker protocol version mismatch detected

Last week I did an Oracle rolling upgrade for an Oracle Data Guard environment, and after the upgrade I ran into the following error when doing a switchover from PROD2 to PROD1 (the two databases in the broker configuration, each on its own server; both instances use the Oracle SID PROD):

ORA-16624: broker protocol version mismatch detected

This turned out to be an interesting problem caused by a small configuration detail that was overlooked. The Oracle Data Guard (ODG) environment had been running Oracle 12cR1 on Oracle Linux 6.10; it now runs Oracle 19c (19.8.0.0.0) on Oracle Linux 7.8. The whole environment was, and still is, on VMware. The upgrade was fairly complex because it involved not only an Oracle version upgrade but also a Linux version upgrade.

To reduce downtime and eliminate the need for extra storage, we used the following rolling upgrade strategy:

  1. stood up a new server with the Oracle 12cR1 binaries on Linux 7.8;
  2. moved the database storage (VMware vmdk disk files) over and got the 12c physical standby running on the new OS;
  3. followed the Oracle MAA manual rolling upgrade method to convert it to a logical standby and upgraded it to 19c;
  4. switched over the database roles, making the logical standby the primary;
  5. flashed back the new logical standby (the old primary) and converted it to a physical standby;
  6. forklifted the old primary to 19c on Linux 7.

After both databases were upgraded and running on 19c, I recreated the ODG broker configuration (it had been disabled and removed because we didn’t have an Active Data Guard license to use the DBMS_ROLLING package).

The interesting issue happened when I switched the database back from PROD2 to PROD1: PROD1 became the primary database, but the broker hung while starting up PROD2 as the standby.

At first I looked at the PROD2 alert log, and the instance seemed to have actually started. The Oracle background processes were running, but when I logged in with SQL*Plus it said “Connected to an idle instance.” That looked really weird.

I was not sure what had happened, so I pressed Ctrl-C to exit the hanging broker CLI (DGMGRL). When I displayed the configuration, I saw “ORA-16624”:

DGMGRL> show configuration;
 Configuration - DG_PROD
 Protection Mode: MaxPerformance
   Members:
   PROD1 - Primary database
     PROD2 - Physical standby database (disabled)
       ORA-16624: broker protocol version mismatch detected
 Fast-Start Failover:  Disabled
 Configuration Status:
 SUCCESS   (status updated 26 seconds ago)

Since SQL*Plus reported “an idle instance”, I went ahead and killed PMON and SMON on PROD2 and started the database in mount mode. I decided to recreate the broker configuration and run a few more switchover tests to see how it went. Again, the switchover from PROD1 to PROD2 was fine, but from PROD2 to PROD1 the same problem happened: DGMGRL hung.

DGMGRL> switchover to "PROD1";
 Performing switchover NOW, please wait…
 New primary database "PROD1" is opening…
 Operation requires start up of instance "PROD" on database "PROD2"
 Starting instance "PROD"…
 Connected to an idle instance.
 ORACLE instance started.
 Connected to "PROD2"
 Database mounted.
 Connected to "PROD2"
 ^CORA-01013: user requested cancel of current operation
 ORA-06512: at "SYS.X$DBMS_DRS", line 443
 ORA-06512: at line 8

I pressed Ctrl-C to exit DGMGRL and checked the configuration again: it showed “ORA-16624: broker protocol version mismatch detected”. SQL*Plus again reported an idle instance, yet according to the DGMGRL messages and the alert log PROD2 was mounted, and the Oracle background processes were running.

ORA-16624 indicated there was some kind of version mismatch. What was going on?

Because I ran the 12cR1-to-19c upgrade on the PROD2 server (PROD1 was brought to 19c through ODG redo apply), there are no 12cR1 binaries on the PROD1 server, but the 12cR1 binaries are still installed on the PROD2 server.

Could it be that the Oracle 12c binary was used to start up the database on the PROD2 server when switching over from PROD2 to PROD1? And why didn’t the issue happen when switching over in the other direction, from PROD1 to PROD2?

I examined the alert log on the PROD2 server again, especially the lines from the startup mount during the switchover to standby. Yes, the Oracle 12c binary had been used! I saw the following messages during the startup:

ORACLE_HOME = /opt/oracle/121020
.
.
Using parameter settings in server-side spfile /opt/oracle/121020/dbs/spfilePROD.ora
.
.
Completed: alter database mount

I also confirmed it by checking which oracle binary the SMON process was actually running; 18427 was the process ID of SMON in the following output:

[oracle@ol7db02 ~]$ ls -l /proc/18427/exe         
lrwxrwxrwx. 1 oracle oinstall 0 Nov 18 09:00 /proc/18427/exe -> /opt/oracle/121020/bin/oracle

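When I temporarily set ORACLE_HOME to the 12c home, I was NOT connecting to an idle instance any more: SQL*Plus saw the mounted database, and I was able to shut it down. That explains the earlier “idle instance” message: a local SQL*Plus connection locates the instance through both ORACLE_SID and ORACLE_HOME, so from the 19c environment an instance started from the 12c home looks idle.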
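The same /proc check can be run against every SMON process on a host at once. This is a minimal Python sketch, assuming Linux with /proc mounted and that it runs as the oracle software owner or root (otherwise the exe link cannot be read). The helper names smon_homes and home_from_exe are hypothetical, and the ORACLE_HOME is simply inferred by stripping /bin/oracle from the binary path.

```python
import os

SMON_PREFIX = "ora_smon_"  # Oracle names the SMON background process ora_smon_<SID>
DELETED = " (deleted)"     # suffix Linux adds to exe links whose file was replaced


def home_from_exe(exe_path):
    """Infer ORACLE_HOME from an oracle binary path such as <home>/bin/oracle."""
    if exe_path.endswith(DELETED):
        exe_path = exe_path[: -len(DELETED)]
    suffix = "/bin/oracle"
    if exe_path.endswith(suffix):
        return exe_path[: -len(suffix)]
    return None  # not an Oracle database binary


def smon_homes(proc_root="/proc"):
    """Map each running SID to (SMON pid, ORACLE_HOME of the binary it runs)."""
    result = {}
    if not os.path.isdir(proc_root):
        return result  # not Linux, or /proc is not mounted
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "cmdline"), "rb") as f:
                # Oracle rewrites argv, so the process name is the first word here
                words = f.read().replace(b"\0", b" ").decode(errors="replace").split()
            if not words or not words[0].startswith(SMON_PREFIX):
                continue
            exe = os.readlink(os.path.join(proc_root, pid, "exe"))
        except OSError:
            # Process exited meanwhile, or we may not read its exe link
            # (run as the oracle software owner or root).
            continue
        sid = words[0][len(SMON_PREFIX):]
        result[sid] = (int(pid), home_from_exe(exe))
    return result
```

Looping over smon_homes().items() and printing each SID, pid and home is enough for a quick check. Run on the PROD2 server during the hang, it should have reported PROD with the /opt/oracle/121020 home, matching the ls -l output above.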

Obviously something had been missed during the 12c-to-19c upgrade on the PROD2 server. ORACLE_SID, ORACLE_HOME, oratab, etc. were all set up correctly, which is why 19c could be started manually or at OS startup without any issues. After reviewing the whole upgrade process and comparing it with what had been done on PROD1, I remembered that the listener file had not been updated!

The listener had a static service entry configured for the Oracle Data Guard broker, and it still had the old 12c ORACLE_HOME path:

SID_LIST_LISTENER =
   (SID_LIST =
     (SID_DESC =
       (SID_NAME = PROD)
       (GLOBAL_DBNAME = PROD2_DGMGRL)
       (ORACLE_HOME = /opt/oracle/121020)
     )
 )

Because I hadn’t removed the Oracle 12c binaries yet, the broker was able to use them and brought up the database with the 12c binary.
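A leftover like this can be caught before the broker trips over it by comparing the static entries with the home the database is supposed to run from. This is a minimal Python sketch, assuming listener.ora uses the nested-parentheses layout shown above; it does not handle comments or other quirks of the format. The function names and the expected 19c home you pass in are hypothetical.

```python
import re

# Matches simple "(KEY = value)" pairs such as (ORACLE_HOME = /opt/oracle/121020)
PAIR = re.compile(r"\(\s*(\w+)\s*=\s*([^()\s]+)\s*\)")


def sid_desc_blocks(text):
    """Yield each balanced "(SID_DESC ...)" block found in listener.ora text."""
    for m in re.finditer(r"\(\s*SID_DESC\b", text, re.IGNORECASE):
        depth = 0
        for i in range(m.start(), len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:
                    yield text[m.start(): i + 1]
                    break


def static_entries(text):
    """Return one dict per SID_DESC, e.g. {"SID_NAME": ..., "ORACLE_HOME": ...}."""
    return [
        {key.upper(): value for key, value in PAIR.findall(block)}
        for block in sid_desc_blocks(text)
    ]


def stale_static_homes(text, expected_home):
    """Return the SID_DESC entries whose ORACLE_HOME is not expected_home."""
    want = expected_home.rstrip("/")
    return [
        entry for entry in static_entries(text)
        if entry.get("ORACLE_HOME", "").rstrip("/") != want
    ]
```

Fed the PROD2 listener.ora and the 19c home, stale_static_homes returns the PROD2_DGMGRL entry; a non-empty result means the broker would start that SID from a different binary than the one in your environment.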

Why didn’t the issue happen when switching over in the other direction, from PROD1 to PROD2? Because a switchover to primary is not a complete restart: the standby just transitions its role, updates some parameters and opens the database from the mounted state. A switchover to standby, however, requires the broker to shut down the old primary and start it again. So switching PROD2 from standby to primary caused no issue, while switching PROD2 from primary to standby exposed it. And because all my other settings were correct, a manual switchover without the broker would not have revealed the issue either.
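The reasoning above condenses into a tiny decision function. This is only an illustrative Python model of the behaviour described in this post, not broker code; the function name is hypothetical, and so is the 19c home path, which the post does not give.

```python
# Illustrative model only. HOME_12C comes from this post; HOME_19C is an assumed path.
HOME_12C = "/opt/oracle/121020"
HOME_19C = "/opt/oracle/19c"  # hypothetical: the post does not give the 19c path


def binary_after_switchover(new_role, running_home, static_home):
    """Return the ORACLE_HOME whose binary runs the instance after a broker switchover.

    new_role:     "primary" or "standby", the role this database switches to
    running_home: home of the binary the instance runs before the switchover
    static_home:  ORACLE_HOME in the static SID_DESC the broker connects through
    """
    if new_role == "standby":
        # Old primary: the broker shuts it down and restarts it through the
        # static listener entry, so the listener's ORACLE_HOME picks the binary.
        return static_home
    if new_role == "primary":
        # Standby to primary: no restart, the mounted instance is simply opened.
        return running_home
    raise ValueError(f"unknown role: {new_role!r}")
```

With PROD2 started manually from the 19c environment and its static entry still on 121020, binary_after_switchover("primary", HOME_19C, HOME_12C) returns the 19c home, while binary_after_switchover("standby", HOME_19C, HOME_12C) returns the 12c home, which is the asymmetry seen in the tests. Once the listener entry points at the 19c home, both directions return the 19c home.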

The fix was easy once the root cause was identified:

  1. Updated listener.ora with the correct ORACLE_HOME path;
  2. Reloaded the listener on PROD2;
  3. Enabled database PROD2 in the broker configuration.
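Step 1 can be scripted if several servers need the same change. This is a minimal Python sketch, assuming you know the old and new home paths and where the listener.ora in use lives; the helper names are hypothetical, and the file is only rewritten after a .bak copy is saved.

```python
import re
import shutil


def retarget_static_homes(text, old_home, new_home):
    """Replace (ORACLE_HOME = old_home) with new_home; return (new_text, count).

    Only exact matches are replaced, so /opt/oracle/1210 does not touch
    /opt/oracle/121020 or any other longer path that starts with it.
    """
    pattern = re.compile(
        r"(\(\s*ORACLE_HOME\s*=\s*)" + re.escape(old_home) + r"(\s*\))",
        re.IGNORECASE,
    )
    # A function replacement avoids backslash handling in new_home.
    return pattern.subn(lambda m: m.group(1) + new_home + m.group(2), text)


def fix_listener_ora(path, old_home, new_home):
    """Rewrite listener.ora in place after saving a .bak copy; return the count."""
    with open(path) as f:
        text = f.read()
    new_text, count = retarget_static_homes(text, old_home, new_home)
    if count:
        shutil.copy2(path, path + ".bak")
        with open(path, "w") as f:
            f.write(new_text)
    return count
```

After fix_listener_ora reports a non-zero count, steps 2 and 3 remain manual: `lsnrctl reload` on PROD2, then `enable database "PROD2";` in DGMGRL.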