I had a fairly memorable 21st birthday, but not for the reasons you might assume. I was working in support for BEA at the time and we managed to have a simultaneous production outage at two of our largest clients, one in Hong Kong and the other in Australia.
The Hong Kong one was quite nasty. A database issue had caused the failure of a large Oracle Tuxedo-based environment. It took only a few minutes to sort the database issue out, but we then needed to reboot Tuxedo, and that took over 45 minutes. The problem was that this system controlled a major logistics provider and access for all of their trucks to the facility. Within 45 minutes we had created a traffic jam that extended onto a major Hong Kong freeway and created chaos.
The Melbourne issue was slightly less serious, just a Telco billing systems outage! The good news is at least that one didn’t cause a traffic jam.
I have not-so-fond memories of trying to eat birthday cake while remotely debugging a Tuxedo instance in Hong Kong and dealing with an irate Russian from the Telco, all of which we eventually managed to sort out.
The Hong Kong experience taught me a valuable lesson though. It doesn’t matter how quickly you fix the problem; if it takes another 45 minutes just to start the system, you still have a large outage window. Which brings us neatly to the topic of this article.
We are currently working with a client who is rolling out a large Oracle SOA Suite environment based on Oracle WebLogic Server. This environment is going to have around 50 virtual machines and 40 to 50 WebLogic Server instances in the domain, so it raises all sorts of management challenges. To help manage the solution they are using ConfigNOW for central distribution of the environments, application deployments and general management tasks.
One of the common problems that you face when working with an environment this size is just getting it started. The WebLogic Node Manager becomes essential, but beyond that you still need to be able to start 50-odd WebLogic Server instances in a reasonable amount of time. With each instance taking several minutes to start, the only viable option is to start them in parallel.
Fortunately, WebLogic gives you a neat way to tackle this problem through the WebLogic Scripting Tool (WLST), and we are using it in conjunction with ConfigNOW to create a solution that can boot a large number of servers in parallel.
One trick is to use the block flag on the start command in WLST, as follows:[code]
start('myManagedServer', 'Server', block='false')[/code]
myManagedServer should be replaced with the name of your server, and setting the block flag to false ensures that the start command returns without waiting for the server to start. Effectively this gives you an asynchronous, threaded-style calling behavior: you can place a number of these calls in a loop to rapidly start a number of servers at once.
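A loop of non-blocking start calls might look like the following minimal sketch. The `start_all` helper and the server names are hypothetical; the `start_fn` parameter stands in for WLST's `start` command so that the function can also be exercised outside a WLST session.

```python
def start_all(server_names, start_fn, block='false'):
    """Send a start request to every server without waiting in between.

    start_fn is expected to behave like WLST's start(name, type, block=...).
    Returns the names for which a start request was issued.
    """
    requested = []
    for name in server_names:
        # With block='false' the call returns once the request has been sent,
        # so the next server is asked to start while this one is still booting.
        start_fn(name, 'Server', block=block)
        requested.append(name)
    return requested


# Inside a connected WLST session this could be called as (hypothetical names):
#   start_all(['soa_server1', 'soa_server2', 'osb_server1'], start)
```

Because the calls return immediately, the loop itself says nothing about whether the servers came up; their state still has to be checked afterwards.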
If your entire domain runs as a cluster, the other option is to start the cluster as a single unit, which works fine if every server in the domain is defined in a cluster. This domain, however, is not like that.
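For a domain that mixes clustered and standalone servers, one way to combine the two approaches is to issue a single cluster-level start per cluster and a server-level start for everything else. The sketch below only plans the calls. `plan_start_calls` and all server and cluster names are hypothetical, and each resulting pair would be passed to WLST's `start(name, type)`, which accepts 'Cluster' as well as 'Server' as the type.

```python
def plan_start_calls(server_to_cluster):
    """Return (name, type) pairs suitable for WLST's start(name, type).

    server_to_cluster maps each managed server name to its cluster name,
    or to None when the server does not belong to a cluster.
    """
    calls = []
    seen_clusters = []
    for server in sorted(server_to_cluster):
        cluster = server_to_cluster[server]
        if cluster is None:
            # Standalone server: it needs its own start request
            calls.append((server, 'Server'))
        elif cluster not in seen_clusters:
            # First member of this cluster: one request covers all members
            seen_clusters.append(cluster)
            calls.append((cluster, 'Cluster'))
    return calls


# Hypothetical domain layout for illustration
layout = {'soa_server1': 'soa_cluster', 'soa_server2': 'soa_cluster', 'bam_server1': None}
planned = plan_start_calls(layout)
```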
In the end, what we have done is create a custom ConfigNOW command that:
- Looks for the availability of the WebLogic Admin Server
- Starts the Admin Server if it is not available via the ConfigNOW start_wls_admin command
- Iterates through the managed servers, starting each in parallel based on the configuration properties
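The availability check in the first step can be as simple as attempting a TCP connection to the Admin Server's listen address and port. A minimal sketch in plain Python, where `is_listening` and the timeout value are illustrative choices rather than part of ConfigNOW:

```python
import socket


def is_listening(address, port, timeout=5.0):
    """Return True if a TCP connection to address:port succeeds."""
    s = socket.socket()
    s.settimeout(timeout)
    try:
        s.connect((address, port))
        reachable = True
    except socket.error:
        # Connection refused, host unreachable or timed out
        reachable = False
    s.close()
    return reachable
```

An open port only shows that something is listening. It does not prove that the Admin Server has finished starting.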
The script is below. To use it with your ConfigNOW instance, simply save it into the /confignow/custom/commands folder. Also, if you are planning to use the script on Windows, you will need to adjust it to run ConfigNOW.cmd rather than ConfigNOW.sh as the script currently does. You will also need the Node Manager installed and running in order for this script to work.
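One way to handle the Windows adjustment without editing the script per platform is to pick the launcher from the operating system name. This is a sketch under assumptions: `launch_command` is a hypothetical helper, and the `os_name` argument would come from `java.lang.System.getProperty('os.name')` under WLST (Jython reports `os.name` as 'java') or from `platform.system()` in standard Python.

```python
import os


def launch_command(home, os_name, environment, configuration):
    """Build the ConfigNOW command line that starts the Admin Server."""
    if os_name.lower().startswith('windows'):
        launcher = 'ConfigNOW.cmd'
    else:
        launcher = 'ConfigNOW.sh'
    return os.path.join(home, launcher) + ' start_wls_admin ' + environment + ' ' + configuration


# Hypothetical values for illustration
print(launch_command('/opt/confignow', 'Linux', 'dev', 'simple'))
```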
If you have any questions, feel free to post them in the comments[code]
# Copyright (c) 2007-2012 Integral Technology Solutions Pty Ltd,
# All Rights Reserved.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS.
# IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE
# LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# FOR FURTHER INFORMATION PLEASE SEE THE INTEGRAL TECHNOLOGY SOLUTIONS
# END USER LICENSE AGREEMENT (EULA).
"""Startup all servers in a domain"""
import os
import socket


# Assumption: ConfigNOW invokes run(config) for a custom command, and `log` plus the
# WLST commands (start, cd, domainRuntime) are available in the script's namespace.
def run(config):
    # Get the admin server details
    admin_name = config.getProperty('wls.admin.name')
    log.debug("Admin server's name is '" + admin_name + "'")

    # Collect list of servers from the properties files
    serversString = config.getProperty('wls.servers')
    servers = serversString.split(",")

    # Get the start sequentially flag, which is passed to start() as its block argument
    startSequentially = config.getProperty('wls.servers.startSequentially')
    if startSequentially is None or startSequentially == '':
        startSequentially = 'true'

    # Attempt to connect to admin server, first we check to see if the port is there
    # if it is then we try to connect, if not we initiate a boot sequence
    s = socket.socket()
    address = config.getProperty('wls.admin.listener.address')
    port = int(config.getProperty('wls.admin.listener.port'))
    try:
        s.connect((address, port))
        s.close()
    except socket.error:
        # If we got here, then the socket is not available so we call start_wls_admin for this environment
        configNowHome = config.getProperty('ConfigNOW.home')
        configNowEnvironment = config.getProperty('ConfigNOW.environment')
        configNowConfiguration = config.getProperty('ConfigNOW.configuration')
        configNOWLaunchCommand = configNowHome + '/ConfigNOW.sh start_wls_admin ' + configNowEnvironment + ' ' + configNowConfiguration
        log.info("The WebLogic Admin Server is not running, so we will use ConfigNOW to start it")
        log.info("check the log files for more details.")
        log.info("Running:" + configNOWLaunchCommand)
        # Assumption: the launch command is executed as a shell command
        os.system(configNOWLaunchCommand)

    # start() and the status lookup below need a WLST session connected to the Admin Server;
    # the connect(...) call and its credentials are not part of this listing.

    for currentServer in servers:
        # Resolve the server's name, preferring the replacement name when one is set
        currentServerName = config.getProperty('wls.server.' + currentServer + '.replace.name')
        if currentServerName is None or currentServerName == '':
            currentServerName = config.getProperty('wls.server.' + currentServer + '.name')
        if currentServerName != admin_name:
            serverStatus = getServerStatus(currentServerName)
            if serverStatus.upper() != 'RUNNING':
                log.info("Sending startup command to server '" + currentServerName + "'")
                start(currentServerName, 'Server', block=startSequentially)
            else:
                log.info("..not sending startup command to server '" + currentServerName + "' due to state of '" + serverStatus + "'")


def getServerStatus(server):
    log.debug('Getting status of server ' + server)
    # ServerLifeCycleRuntimes lives in the domain runtime tree
    domainRuntime()
    cd('/ServerLifeCycleRuntimes/' + server)