This story is about a project whose system ran on the Solaris SPARC platform. I was told the system was a Solaris zone, and I had access to the global zone. From the global zone, I could see three non-global zones, and one of them was the system I needed to work on. One day, just as I had almost finished my tasks and brought up the application, the system froze: no command worked because swap was full. Every command simply returned “fork: not enough space” in BASH, or “cannot fork: no swap space” in SH. The cause was the other two zones. They were for testing and had not been memory-capped as they were supposed to be.
I could not do anything, so I sent a request to the system administrator to restart the global zone. The next day, he replied that he had restarted the global zone and capped the other two zones.
The problem was fixed! So where was the interesting part of the story?
Well, when I looked again afterward, I noticed my SSH session to the global zone was still there and had recovered. I could run commands in that session again. It had not been disconnected at all, which would have happened if the system had been rebooted! Then I checked the system uptime: it had been up for 301 days! Purely out of curiosity, I asked the administrator how he had done the reboot to release the resources. He replied that the global zone was just a Solaris 10 container (don’t be confused here: the term “container” is obsolete and “zone” should be used instead) sitting on a Solaris 11 system. He had rebooted the Solaris 11 system.
At this point, I was totally confused because:
- It sounded like Solaris 11 should be the global zone. Why did I see Solaris 10 as the global zone with 3 non-global zones?
- How could rebooting Solaris 11 release the resources without rebooting Solaris 10?
- What was the real relationship between the Solaris 11 system and the Solaris 10 system?
I had not known that Solaris 11 existed until he mentioned it, and I had no access to it. He was not sure what had happened either. Fortunately, he sent me a screen capture showing the uptime of Solaris 11 to prove the reboot. In the picture he also showed how he connected to the Solaris 10 system. I noticed this was different from a normal zone login using zlogin. What he used was:
telnet 0 5000
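Bingo! That revealed their virtualization architecture. They were using Oracle VM Server for SPARC, a type 1 (bare-metal) hypervisor, and the Solaris 11 and Solaris 10 systems were two logical domains. The telnet command was connecting to the Solaris 10 domain's virtual console, which the control domain's virtual network terminal server daemon (vntsd) exposes on local telnet ports starting at 5000 by default. Basically, “Each logical domain can be stopped, started, and rebooted independently of each other without requiring you to perform a power cycle of the server.” The main difference is that the Solaris 11 system is also the control domain. That explained why rebooting Solaris 11 did not reboot Solaris 10 but still released the resources. The Solaris zones I had been working on were just another layer of virtualization inside the Solaris 10 domain. They use OS-level virtualization that shares the Solaris 10 kernel rather than a type 2 hosted hypervisor.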