Friday, January 12, 2018

Memory management in TBSM



TBSM needs tight control of its memory consumption. No, it’s not a C/C++-level challenge, but TBSM is based on the Netcool/Impact policy engine, where developers can write their own IPL or JavaScript policies that can do practically anything those languages allow. That makes it especially worth taking a closer look at any unstable condition such code can create.


TBSM is not very fragile, though. Yes, you can run out of available memory if you write a large while loop over large objects. And yes, you need to make sure that NULL values don’t leak out of your TBSM rules: check for them in your policies and replace them with other values, such as 0 or an empty string (""). These are the two major things to watch out for. In this post I’d like to concentrate on the second one.


First, here are some hints on measuring memory consumption in TBSM 6.1.1. It runs on the embedded edition of IBM WebSphere Application Server (WAS), so we have access to a few regular WAS tools, such as wsadmin scripting. I created this simple Jython script to get all the memory information I need from the server:



# Configured heap settings, as stored in the server configuration
server = AdminConfig.getid('/Server:server1/')
jvm = AdminConfig.list('JavaVirtualMachine', server)

print 'initialHeapSize: ' + AdminConfig.showAttribute(jvm, 'initialHeapSize')
print 'maximumHeapSize: ' + AdminConfig.showAttribute(jvm, 'maximumHeapSize')

# Runtime values read from the running JVM MBean, converted from bytes to MB
jvmName = AdminControl.completeObjectName('WebSphere:type=JVM,process=server1,*')
heapSize = long(AdminControl.getAttribute(jvmName, "heapSize")) / 1024 / 1024
freeMem = long(AdminControl.getAttribute(jvmName, "freeMemory")) / 1024 / 1024
maxMem = long(AdminControl.getAttribute(jvmName, "maxMemory")) / 1024 / 1024
totalMem = long(AdminControl.invoke(jvmName, 'getTotalMemory')) / 1024 / 1024

print 'Max available memory [MB]: ' + str(maxMem)
print 'Current Heap size [MB]: ' + str(heapSize)
print 'Current Total Memory size [MB]: ' + str(totalMem)
print 'Current Free Memory size [MB]: ' + str(freeMem)
# Used = allocated heap minus its free part; unallocated = what is left up to the maximum
print 'Current Used memory [MB]: ' + str(totalMem - freeMem)
print 'Current unallocated memory [MB]: ' + str(maxMem - (totalMem - freeMem))



Let’s save the script as getWASCurrentHeapUtil.py. To make it easier to run, I created this simple shell script:

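#!/bin/sh
# Runs the Jython script through wsadmin; adjust the credentials to your TIP administrator account
"$TIP_HOME/bin/wsadmin.sh" -username tipadmin -password tipadmin -f "$(dirname "$0")/getWASCurrentHeapUtil.py" -lang jython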
Save the shell script as getWASCurrentHeapUtil.sh in the same directory and make it executable (chmod +x getWASCurrentHeapUtil.sh).

The shell script is meant for Linux systems, but the Jython script itself is platform-independent, so on Windows you could just as well call it from a PowerShell or batch script that invokes wsadmin.bat.
As you may expect, the shell script needs to be run by the user who owns, or has access rights to, TBSM’s TIP_HOME directory.

An example of executing the script:

$ ./getWASCurrentHeapUtil.sh
WASX7209I: Connected to process "server1" on node TBSMNode using SOAP connector;  The type of process is: UnManagedProcess
initialHeapSize: 256
maximumHeapSize: 1536
Max available memory [MB]: 4096
Current Heap size [MB]: 811
Current Total Memory size [MB]: 811
Current Free Memory size [MB]: 394
Current Used memory [MB]: 417
Current unallocated memory [MB]: 3679

Here is a quick explanation of what the script returns (sample values from the run above in parentheses):
- initialHeapSize (256): the -Xms (initial heap size) JVM parameter as stored in the server’s configuration.
- maximumHeapSize (1536): the -Xmx (maximum heap size) JVM parameter as stored in the server’s configuration.
- Max available memory [MB] (4096): the maximum heap size the running JVM actually uses, i.e. the -Xmx passed to the JVM at startup as an extra parameter. It can differ from the configured maximumHeapSize, as it does in this sample.
- Current Heap size [MB] (811): the current heap allocation. It always lies between initialHeapSize and Max available memory [MB]; the JVM grows or shrinks it automatically as needed.
- Current Total Memory size [MB] (811): the total memory currently allocated to the heap. It normally matches Current Heap size [MB], as it does in this sample.
- Current Free Memory size [MB] (394): the free memory within the current heap.
- Current Used memory [MB] (417): the occupied memory within the current heap (Total minus Free).
- Current unallocated memory [MB] (3679): the total memory the server can still occupy at any time. It is Current Free Memory size [MB] plus the difference between Max available memory [MB] and Current Heap size [MB]: 394 + (4096 - 811) = 3679. This is your real available memory.
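The relationships between these values are plain arithmetic. As a quick check, here is a minimal Python sketch (not part of TBSM or WAS; the helper name heap_figures is hypothetical) that takes the raw values in MB, derives used and unallocated memory the same way the Jython script does, and confirms that unallocated memory equals the free part of the current heap plus the headroom the heap can still grow into. It assumes, as in the sample, that the current heap size equals total memory.

```python
def heap_figures(total_mb, free_mb, max_mb):
    """Derive used, unallocated and headroom memory (all values in MB).

    Hypothetical helper mirroring the arithmetic in getWASCurrentHeapUtil.py:
    used = total - free, unallocated = max - used.
    Assumes the current heap size equals total memory, as in the sample run.
    """
    used_mb = total_mb - free_mb
    unallocated_mb = max_mb - used_mb
    # Room the heap can still grow into: max minus the current heap size.
    headroom_mb = max_mb - total_mb
    # Unallocated memory is free heap space plus that growth headroom.
    assert unallocated_mb == free_mb + headroom_mb
    return {"used": used_mb, "unallocated": unallocated_mb, "headroom": headroom_mb}


# Sample values from the run above.
print(heap_figures(total_mb=811, free_mb=394, max_mb=4096))
# {'used': 417, 'unallocated': 3679, 'headroom': 3285}
```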
 
So what you’re really interested in is:
- Watch the Current unallocated memory [MB] value.
- Take corrective action on your TBSM configuration (rules, policies) if it drops to 10-15% of Max available memory [MB].
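This check is easy to automate, for example from a cron job that runs getWASCurrentHeapUtil.sh and captures its output. Below is a minimal Python sketch, assuming the output has the "<label> [MB]: <number>" format shown earlier; the function names and the 15% default threshold are illustrative choices, not something TBSM provides.

```python
import re

# Matches lines such as "Max available memory [MB]: 4096".
LINE_RE = re.compile(r"^\s*(.+?) \[MB\]:\s*(-?\d+)\s*$")


def parse_heap_output(text):
    """Return {label: value_in_mb} for every '<label> [MB]: <n>' line."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def needs_action(values, threshold=0.15):
    """True when unallocated memory has dropped to `threshold` of max or below."""
    max_mb = values["Max available memory"]
    unallocated_mb = values["Current unallocated memory"]
    return unallocated_mb <= threshold * max_mb


sample = """initialHeapSize: 256
Max available memory [MB]: 4096
Current Heap size [MB]: 811
Current unallocated memory [MB]: 3679"""

figures = parse_heap_output(sample)
print(needs_action(figures))  # False: 3679 MB is roughly 90% of 4096 MB
```

Lines without an "[MB]" label, such as initialHeapSize or the WASX7209I connection message, are simply skipped by the parser.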
 
Note: if you’ve split TBSM across two servers, the data server and the legacy dashboard server (TIP), and you’re also using Jazz for Service Management/DASH, you may want to install this Jython script on all three servers (just don’t forget to adapt the paths in the shell script).

Note: JVM memory consumption can also be viewed in the TBSM UI (TIP).



TBSM configuration best practices
There are two general best practices for TBSM configuration: one concerns NULL values that may occur in the data sources TBSM extracts from, the other concerns NULL values already stored in TBSM fields and parameters and processed by functions. The general rule is: avoid NULLs and replace them with 0 for numbers and an empty string ("") for strings.
My biggest gains in keeping TBSM memory consumption under control have come from stricter handling of NULL values in the data TBSM processes at every stage, from extraction to presentation in DASH. Thanks to these two simple techniques (it’s not always obvious where you need them; you have to work closely with the data and adapt the techniques accordingly), I achieved flat memory consumption, at a level of 10-20%, for a stable service model over months, and it’s still going. In that model no new services are created dynamically or manually; only data-fetcher-driven and PassToTBSM-driven numerical rules and formulas calculate new outputs for DASH and logging. Before that, I occasionally saw significant growth in memory consumption (usually associated with more NULLs in the source data), which behaved like a memory leak and, in the most extreme cases, ended in OutOfMemory errors. So it’s a great improvement and worth a closer look.

For DB2, for example, try the COALESCE function in your SQL SELECT statements. COALESCE returns the first of its arguments that is not NULL, so it lets you substitute a numeric or character default for a NULL; the default must be compatible with the column’s data type.

   SELECT COALESCE(INT(MY_FIELD), 0) AS MY_FIELD FROM MY_TABLE

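Other RDBMSs have equivalents: COALESCE itself is part of the SQL standard, and there are vendor-specific functions such as NVL in Oracle or ISNULL in Microsoft SQL Server.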
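To see COALESCE at work without a DB2 instance at hand, here is a small Python sketch using the standard-library sqlite3 module. It illustrates standard COALESCE semantics, not DB2 itself: SQLite has no INT() function, so the sketch uses CAST(... AS INTEGER) instead. MY_TABLE and MY_FIELD are the same placeholder names as in the query above.

```python
import sqlite3


def fetch_with_defaults(rows):
    """Load rows into an in-memory table and read them back with NULLs replaced by 0."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MY_TABLE (MY_FIELD INTEGER)")
        # Python None is stored as SQL NULL.
        conn.executemany(
            "INSERT INTO MY_TABLE (MY_FIELD) VALUES (?)", [(r,) for r in rows]
        )
        # COALESCE returns its first non-NULL argument, so a NULL becomes 0.
        cursor = conn.execute(
            "SELECT COALESCE(CAST(MY_FIELD AS INTEGER), 0) AS MY_FIELD "
            "FROM MY_TABLE ORDER BY rowid"
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


print(fetch_with_defaults([5, None, 7]))  # [5, 0, 7]
```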

Inside policy-based numerical formulas and rules, check for NULL values and replace them accordingly. Make every effort to keep NULL values out of the policy’s mathematical operations, as they can cause a lot of issues later.
 
To replace NULL values read from service instance attributes with a non-NULL default:

// <Attribute> is a placeholder for your service instance attribute name
if (ServiceInstance.<Attribute> == NULL) {
   Status = 0;
} else {
   Status = int(ServiceInstance.<Attribute>);
}

To replace NULL values obtained from other numerical or text rules:

if (InstanceNode.<rule_name>.Value != NULL) {
   Status = InstanceNode.<rule_name>.Value;
} else {
   // Default for numerical rules; use "" instead for text rules
   Status = 0;
}
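For readers who want to try the pattern outside Impact, here is the same idea as a language-neutral illustration: a minimal Python sketch in which None stands in for an IPL NULL. The helper name non_null and its type-based defaults (0 for numerical rules, "" for text rules, following the general rule above) are hypothetical and not part of the Impact policy API.

```python
def non_null(value, numeric=True):
    """Return value, or a type-appropriate default when it is None (NULL).

    Hypothetical helper: 0 for numerical rules, "" for text rules.
    """
    if value is None:
        return 0 if numeric else ""
    return value


# A numerical formula that would raise TypeError if a None reached the sum.
readings = [12, None, 30]
total = sum(non_null(r) for r in readings)
print(total)  # 42
print(repr(non_null(None, numeric=False)))  # ''
```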


That’s it. I hope you liked this article. Please leave a comment if you have any, and meanwhile, cheers!

