Tuesday, May 10, 2016

Unique Grand Children Count in TBSM


Tivoli Business Service Manager can calculate amazing things for you, if you need them. This is thanks to the powerful rules engine at the heart of TBSM, as well as the Netcool/Impact policy engine running just under the hood of every TBSM edition. You can present your calculation results later on a dashboard or in reports, depending on whether you’re after a real-time scorecard or historical KPI reports.
In this article, I’ll show how you can use the TBSM rules engine to calculate a unique grandchildren count for a grandparent-level service instance. It isn’t really documented anywhere and the case isn’t very popular, but in case you need it, you can find it here.
In this material I will use the following hierarchy of three templates:
  • T_NetworkSite – acting as grandparent template level
  • T_Interface – acting as parent template level
  • T_Router – acting as child template level
An Interface as a parent to a Router? you may ask. It is not really what’s promoted in various documents, and definitely not what the IBM documentation article behind Figure 1 shows.
Well, this depends very much on what and how you want to present in TBSM dashboards, so it depends on what your business service is about. The example in that article concentrates more on VPN services:
Figure 1. Source: https://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/bsma/10/images/bsma_cust_sm_network_topology.jpg

In my example, I’m concentrating on Layer 2 connectivity. In other words: I cannot connect to my network site, or it is unavailable, if all router interfaces are down. All router interfaces can be down while the routers themselves are up; it doesn’t matter, it means the same thing for the service: an outage. And if whole routers get switched off, their interfaces go down too, so my network site will be unavailable as well.
Figure 2. Templates hierarchy used in this material
The desired effect is the following:
  • There is one grandparent KrakowSite
  • There are 2 routers in total
  • There are 4 interfaces in total, 2 per router
Figure 3. Access to Krakow network site - business service sample diagram
In other words, KrakowSite should report 4 installed interfaces but only 2 router devices. The scorecard below is what we will be building during this exercise.

Figure 4. Target scorecard to build
Before I continue, I need to introduce the Heartbeat and PassToTBSM concepts.

PassToTBSM and Heartbeat

PassToTBSM is an Impact function that can be used to send any data from a Netcool/Impact policy straight to TBSM. It doesn’t have to be the same Impact that runs jointly with TBSM on the same server; it can be a standalone Impact server too (but I haven’t tried that). It can be either Impact 6.1 or Impact 7.1 (the latter was announced not to have PassToTBSM, but I hear it’s still there; not tested by myself though).
A policy that sends data to TBSM with the PassToTBSM function can look as follows:
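// Current time: the value that changes on every run
Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

// Assumption: the line creating ev was lost from this listing; NewObject() stands in for it
ev = NewObject();
ev.timestamp = String(Time);
ev.bsm_identity = "AnyChild";
PassToTBSM(ev); // send the event to TBSM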

So we construct an IPL policy in which we take the current time (it is important to have at least one changing value, I’ll explain why in another article on this blog) and specify the service instance identifier that the affected service instance is expected to have defined for its incoming status rules or numerical rules. Because I’m going to affect two routers, RouterA and RouterB, I specify something generic like “AnyChild”. I could also send two events to TBSM, one with ev.bsm_identity="RouterA" and the other with ev.bsm_identity="RouterB". In large implementations it is easier to specify something generic like AnyChild and add such an identifier to every service instance automatically during the import process via SCR API/XMLtoolkit.
Let me call the policy TBSMTreeRulesHeartbeat.
Such a policy now needs to be called by an Impact service:

Figure 5. Impact service to run the heartbeat policy
Note: Alternatively, a data fetcher could be used, which can also be scheduled to run every 30 seconds, or even once a day at 12:00 AM or at another time. However, I wanted to show the PassToTBSM function in action, and in large solutions you may not want to run an SQL SELECT statement against a database simply to drive such a heartbeat. You could also create a policy fetcher, but that takes more skill since there’s no UI for it in TBSM.
Note: Such a service doesn’t really need to be added to any Impact project.
Now, in order to use such a service and policy in a numerical rule in TBSM, you do two things: you set that service as the data source and set the mapping. I have created my HeartbeatRule in TBSM with the following settings:
Figure 6. Numerical Rule with heartbeat service as data feed

Then in Customize Fields form you should have:
Figure 7. Custom fields mapping

Save this rule to your LEAF template:
Figure 8. Heartbeat rule in the LEAF template definition

And the last thing: don’t forget to make sure your service instances have “AnyChild” instance identifier specified:
Figure 9. Adding new instance ID - AnyChild
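
To make the role of the shared identifier more tangible, here is a minimal Python sketch of the matching idea. It is only a simplified model and not TBSM code: the deliver function, the dictionary layout and the RouterC instance are assumptions made up for this illustration.

```python
def deliver(event, instances):
    """Return the names of instances whose identifiers include the event's bsm_identity."""
    return [name for name, identifiers in instances.items()
            if event["bsm_identity"] in identifiers]


# Each instance keeps its own identifier; some also carry the generic one.
instances = {
    "RouterA": {"RouterA", "AnyChild"},
    "RouterB": {"RouterB", "AnyChild"},
    "RouterC": {"RouterC"},  # hypothetical instance without the generic identifier
}

event = {"timestamp": "12:00:00", "bsm_identity": "AnyChild"}
matched = deliver(event, instances)
print(matched)
```

In this model one heartbeat event reaches every instance tagged with AnyChild, which is why a single generic identifier is enough for both routers.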

What is it for? you may ask.
The answer is: we will be calculating the unique number of grandchildren in one of the TBSM functions. All functions in TBSM need a trigger, which is an input value that changes, in order to return a fresh value. If the input value doesn’t change, you won’t see a new value on the output. The output can stay the same, but your rule won’t run at all if you don’t trigger it from outside somehow. Example? Sure:
On the next level in the templates hierarchy there will be a NumberOfRouters rule defined (and the heartbeat rule too):
Figure 10. T_Interface template's rules list

Let’s see inside the NumberOfRouters rule:
Figure 11. NumberOfRouters rule definition

This rule will return the output value of the function NumberOfAllChildren, defined in the policy NumericAttributeFunctions.ipl, every time the HeartbeatRule triggers it.
In other words, the number of routers below the interfaces won’t change in the output of this function, even if it really changes (grows or shrinks), unless the rule is kicked again.
So you need that extra rule on the children level, like HeartbeatRule, running periodically every 30 seconds and returning a fresh timestamp every time, so that its output differs on every run.
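
The trigger behaviour described above can be modelled with a few lines of Python. This is a toy sketch under my own assumptions (the Rule class and its feed method are invented for this illustration) and not how TBSM is implemented internally; it only shows why an unchanged input leaves a stale output.

```python
class Rule:
    """Toy model of a rule that only recalculates when its input changes."""

    def __init__(self, compute):
        self.compute = compute    # function that produces the rule output
        self.last_input = None    # last trigger value seen
        self.output = None        # last calculated output

    def feed(self, trigger_value):
        # Recalculate only if the trigger value differs from the previous one.
        if trigger_value != self.last_input:
            self.last_input = trigger_value
            self.output = self.compute()
        return self.output


children = ["RouterA"]
number_of_children = Rule(lambda: len(children))

number_of_children.feed("12:00:00")          # first heartbeat: counts 1 child
children.append("RouterB")                   # the tree changes...
stale = number_of_children.feed("12:00:00")  # same timestamp: no recalculation
fresh = number_of_children.feed("12:00:30")  # new timestamp: recalculated
print(stale, fresh)
```

The second child only shows up in the output once a new timestamp arrives, which is exactly the job of the heartbeat.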
Why so much hassle, you may say?
Why not use ServiceInstance.NUMCHILDREN inside a policy-based numerical formula?
Well, first of all, a numerical formula is also a rule, and it also needs a trigger to run. Every rule in TBSM needs a trigger to run. I can dedicate a special post to that topic.
Second of all, I do use ServiceInstance.NUMCHILDREN; check out my policy function:
function NumberOfAllChildren(ChildrenStatusArray, AllChildrenArray, ServiceInstance, Status) {
   // Return the number of direct children of this service instance
   Status = ServiceInstance.NUMCHILDREN;
}

So this policy, I mean this function, will return the NUMCHILDREN value any time you trigger the rule.
The main reason for that hassle is that, unfortunately, you cannot use NUMCHILDREN directly on a scorecard; you can only return it in rules. And rules need a trigger. NUMCHILDREN also isn’t an additional attribute, which could be shown directly in a JazzSM dashboard.
Is it clear? I know, it’s a bit weird, but only at first sight.
You may also wonder: why am I using ServiceInstance.NUMCHILDREN? Is there any other attribute that returns the same value? Why am I using TIP and not JazzSM in my examples? The answers are: there’s no additional attribute holding the number of children that you could return in JazzSM directly, without wrapping it in a rule (and in TIP you cannot return an additional attribute without packing it in a rule). So you have two choices:
  1.    Use ServiceInstance object’s field NUMCHILDREN – see above
  2.    Use a policy that will iterate through an array of children objects of your service instance and return the array’s length.
As you can see, it’s still a policy, so a numerical aggregation rule or a numerical formula rule must still be used. There’s no other way really: rules are your way and you need to trigger them.

Recalculate correct number of objects after server restart

There’s an alternative to the Heartbeat rule. From TBSM 6.1.1 FP2 on, you can run a recalculation policy and associate it with the server start, run it manually from time to time, or schedule it with an Impact service. There are actually two policies: one just for leaf nodes and the other for all nodes. (The assignments of Type and Filter are not shown in the listings below.)
Leaf nodes only
log("Recalc Leaf Node Only. Policy Start." );
GetByFilter(Type, Filter, false);
log("Recalc Leaf Node Only. Policy Finish." );
All nodes
log("Recalc All Nodes. Policy Start." );
GetByFilter(Type, Filter, false);
log("Recalc All Nodes. Policy Finish." );

This alternative is documented here:
The difference between my heartbeat solution and the policies above is that my heartbeat is selective: I decide which elements of the service tree will be recalculated (not just the leaves, but also not the entire service tree) and when (not just during a restart but every now and then). This is important, because the number of children on some intermediate levels may change independently of the number of children on the leaf level, and I still need to trigger that change. At the same time, it’s an effort for TBSM to recalculate the whole tree, especially if I have 100k instances in my service tree. That’s why I prefer to make it selective, so I use the Heartbeat concept.

Unique grandchildren count rule

Now that we have the children count rule created and triggered, it’s time to get the unique grandchildren count rule.
What’s the difference?
It’s simple: you don’t want to sum up your children’s children counts, because every Interface will report that it has 1 child Router, which gives you 4 routers while the true number is just 2.

So you need a smart Impact policy that will calculate that for you.

Since we’re clear on what rules need to be created on the Router level and the Interfaces level, it’s time to present rules on the NetworkSite template level:
Figure 12. Rules defined inside T_NetworkSite template

The NumberOfInterfaces rule just calculates the number of interfaces below the network site, and inside that rule the same function NumberOfAllChildren is called from NumericAttributeFunctions.ipl. The trigger should be a heartbeat rule again, since the number of interfaces inside the site may change independently. As you could see above, I defined a heartbeat rule inside the T_Interface template and called it HeartbeatRuleIfc.
The more interesting rule is UniqueGrandChildren, which runs another function from the NumericAttributeFunctions policy, called NumberOfUniqueGrandChildren:

function NumberOfUniqueGrandChildren(ChildrenStatusArray, AllChildrenArray, ServiceInstance, Status) {
   i = 0;

   uniquegrandchildrenarray = {};
   log("MP: "+ServiceInstance);
   while(i<length(ServiceInstance.CHILDINSTANCEBEANS)) {
      child = ServiceInstance.CHILDINSTANCEBEANS[i];
      log("Child "+child.DISPLAYNAME+" of grand parent "+ServiceInstance.DISPLAYNAME+" was found.");

      j = 0;
      while(j<length(child.CHILDINSTANCEBEANS)) {
         grandchild = child.CHILDINSTANCEBEANS[j];
         log("Child "+grandchild.DISPLAYNAME+" of child "+child.DISPLAYNAME+" was found.");

         // Test whether the currently analyzed grandchild has already occurred
         k = 0;
         occurence = 0;
         while(k<length(uniquegrandchildrenarray)) {
            if(uniquegrandchildrenarray[k].SERVICEINSTANCEID == grandchild.SERVICEINSTANCEID) {
               // if yes, mark occurence = 1 (true); the commented line below would also exit this loop early
               occurence = 1;
               // k = length(uniquegrandchildrenarray); //uncomment this line to speed up in case of large child arrays
               log("Duplicate found: "+uniquegrandchildrenarray[k].SERVICEINSTANCEID+" and "+grandchild.SERVICEINSTANCEID+". Skipping.");
            }
            k = k + 1;
         }

         // Add the grandchild only if it was not seen before
         if(occurence == 0) {
            uniquegrandchildrenarray = uniquegrandchildrenarray + grandchild;
            log("Unique grand child found: "+grandchild.DISPLAYNAME+". Added to the list.");
         }
         j = j + 1;
      }
      i = i + 1;
   }
   Status = length(uniquegrandchildrenarray);
   log("Grand parent "+ServiceInstance.DISPLAYNAME+" has # grand unique children "+Status);
}

So basically the function traverses the service tree two levels down to the grandchildren level and collects the grandchildren in an array, comparing them by SERVICEINSTANCEID. Every grandchild that is already in the array is flagged as a duplicate and skipped. Every new grandchild is added to the array. The size of the array is the returned value.
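
If you want to check the logic outside of TBSM, the same counting idea can be sketched in Python. This is only a model under my own assumptions: the service tree is a plain dictionary, the interface names IfA1, IfA2, IfB1 and IfB2 are hypothetical, and a set replaces the inner duplicate-check loop of the IPL function.

```python
def number_of_unique_grandchildren(site):
    """Count distinct grandchildren by identifier, two levels below `site`."""
    seen = set()
    for child in site["children"]:
        for grandchild in child["children"]:
            seen.add(grandchild["id"])  # a set keeps each identifier only once
    return len(seen)


router_a = {"id": "RouterA", "children": []}
router_b = {"id": "RouterB", "children": []}

# KrakowSite: 4 interfaces, 2 per router (interface names are made up).
krakow_site = {"id": "KrakowSite", "children": [
    {"id": "IfA1", "children": [router_a]},
    {"id": "IfA2", "children": [router_a]},
    {"id": "IfB1", "children": [router_b]},
    {"id": "IfB2", "children": [router_b]},
]}

naive = sum(len(child["children"]) for child in krakow_site["children"])
unique = number_of_unique_grandchildren(krakow_site)
print(naive, unique)  # 4 2
```

Summing the children counts gives 4, while counting distinct identifiers gives the expected 2 routers.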

Is it simple? Not so much, but it’s probably one of those functions you implement once and use all the time, so it’s worth learning. Let’s see the rule at the end:
Figure 13. NumberOfUniqueGrandChildren rule

So this is your desired effect:
Figure 14. Unique GrandChildrenCount on the scorecard

I hope that you like this type of small hint on how to achieve something useful in TBSM. If so, please comment and I'll try to post as many posts of this type as I can. Thanks!
