Introduction
Tivoli Business Service Manager (TBSM) can calculate surprisingly useful things for you when you need them. This is thanks to the powerful rules engine at the core of TBSM, as well as the Netcool/Impact policy engine that runs under the hood of every TBSM edition. You can later present the calculation results on a dashboard or in reports, depending on whether you want a real-time scorecard or historical KPI reports.
In this article, I’ll show how you can use the TBSM rules engine to calculate a unique count of grandchildren for a grandparent-level service instance. It isn’t really documented anywhere and the use case isn’t very common, but in case you need it, you can find it here.
In this material I will use the following hierarchy of three templates:
- T_NetworkSite – acting as grandparent template level
- T_Interface – acting as parent template level
- T_Router – acting as child template level
An Interface as the parent of a Router? you may ask. It is not really what the various documents promote, and definitely not what is documented here:
Well, this depends very much on what you want to present in TBSM dashboards and how, so it depends on what your business service is about. The example in the article I mention above concentrates more on VPN services:
Figure 1.
Source: https://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/bsma/10/images/bsma_cust_sm_network_topology.jpg
In my example, I’m concentrating on Layer 2 connectivity. In other words, I cannot connect to my network site, or it is unavailable, if all router interfaces are down. The interfaces can all be down while the routers themselves are up; it doesn’t matter, because it means the same thing for the service: an outage. And if whole routers get switched off, their interfaces go down too, so my network site will be unavailable as well.
Figure 2.
Templates hierarchy used in this material
The desired effect is the following:
- There is one grandparent KrakowSite
- There are 2 routers in total
- There are 4 interfaces in total, 2 per router
In other words, KrakowSite should report 4 installed interfaces but only 2 router devices. The scorecard below is what we will be building during this exercise.
Figure 4.
Target scorecard to build
Before I continue, I need to introduce the Heartbeat and PassToTBSM concepts.
PassToTBSM and Heartbeat
PassToTBSM is an Impact function that can be used to send any data from a Netcool/Impact policy straight to TBSM. It doesn’t have to be the Impact instance running together with TBSM on the same server; it can be a standalone Impact server too (but I haven’t tried that). It can also be either Impact 6.1 or Impact 7.1 (the latter was announced not to have PassToTBSM, but I hear it’s still there; I haven’t tested it myself).
A policy that sends data to TBSM with PassToTBSM function
can be as follows:
Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");
ev = NewEvent("TBSMTreeRuleHeartbeatService");
ev.timestamp = String(Time);
ev.bsm_identity = "AnyChild";
PassToTBSM(ev);
So we construct an IPL policy in which we take the current time (it is important to have at least one changing value; I’ll explain why in another article on this blog) and specify the service instance identifier that the affected service instance is expected to have defined for its incoming status rules or numerical rules. Because I’m going to affect two routers, RouterA and RouterB, I specify something generic like “AnyChild”. I could also send two events to TBSM, one with ev.bsm_identity="RouterA" and the other with ev.bsm_identity="RouterB". In large implementations it is easier to specify something generic like AnyChild and add that identifier to every service instance automatically during the import process via the SCR API/XML toolkit.
I’ll call the policy TBSMTreeRulesHeartbeat.
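To make the identifier matching concrete, here is a small Python model of the idea (it is not IPL and not how TBSM works internally): an event carrying a bsm_identity reaches every service instance that lists that identifier. The `deliver` function and the dictionary layout are hypothetical and exist only for illustration.

```python
# Toy model of identifier matching. The data layout and the deliver() helper
# are assumptions for illustration, not TBSM internals.
instances = {
    "RouterA": {"identifiers": {"RouterA", "AnyChild"}},
    "RouterB": {"identifiers": {"RouterB", "AnyChild"}},
    "KrakowSite": {"identifiers": {"KrakowSite"}},
}


def deliver(event, instances):
    """Return the sorted names of instances whose identifiers include the event's bsm_identity."""
    return sorted(
        name
        for name, inst in instances.items()
        if event["bsm_identity"] in inst["identifiers"]
    )


# One generic heartbeat event reaches both routers.
print(deliver({"bsm_identity": "AnyChild", "timestamp": "12:00:00"}, instances))
# A specific identifier reaches only the matching router.
print(deliver({"bsm_identity": "RouterA", "timestamp": "12:00:00"}, instances))
```

In this model a single generic identifier replaces one event per router, which is the reason for choosing something like AnyChild in larger service trees.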
Figure 5.
Impact service to run the heartbeat policy
Note. Alternatively, a data fetcher could be used, which can also be scheduled to run every 30 seconds, or once a day at 12:00 AM or at another time. However, I wanted to show the PassToTBSM function in action, and in large solutions you may not want to run an SQL SELECT statement against a database simply to drive such a heartbeat. You could also create a policy fetcher, but that takes more skill since there’s no UI for it in TBSM.
Note. Such a service doesn’t really need to be added to any Impact project.
Now, in order to use such a service and policy in a numerical rule in TBSM, you do two things: you set that service as the data source and you set the field mapping. I have created my HeartbeatRule in TBSM with the following settings:
Then in Customize Fields form you should have:
Save this rule to your LEAF template:
And the last thing: don’t forget to make sure your service instances have “AnyChild” instance identifier specified:
What is it for? you may ask.
The answer is: we will be calculating the unique number of grandchildren in one of the TBSM functions. Every function in TBSM needs a trigger, that is, an input value that changes, in order to return a fresh value. If the input value doesn’t change, you won’t see a new value on the output. The output may well be the same value as before, but your rule won’t run at all unless something triggers it from outside. Example? Sure:
On the next level in the templates hierarchy there will be a NumberOfRouters rule defined (and a heartbeat rule too):
Let’s see inside the NumberOfRouters rule:
This rule will return the output value from the function NumberOfAllChildren defined in the policy NumericalAttributeFunctions.ipl every time the HeartbeatRule triggers it.
In other words, the output of this function won’t reflect a change in the number of routers below the interfaces, even if that number really changes (grows or shrinks), unless the rule is kicked again.
So you need that extra rule on the children level, like HeartbeatRule, running periodically every 30 seconds and returning a fresh timestamp each time, so that its output value differs on every run.
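The trigger behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, assuming that a rule recomputes solely when its input value changes; the `Rule` class is hypothetical and is not TBSM code.

```python
class Rule:
    """Toy rule that recomputes only when its trigger input changes.

    This models the behaviour described in the text; it is an assumption,
    not the TBSM implementation.
    """

    def __init__(self, compute):
        self.compute = compute
        self.last_trigger = None
        self.output = None

    def on_input(self, trigger):
        # Recalculate only if the incoming trigger value differs from the last one.
        if trigger != self.last_trigger:
            self.last_trigger = trigger
            self.output = self.compute()
        return self.output


children = ["RouterA"]
number_of_routers = Rule(lambda: len(children))

print(number_of_routers.on_input("12:00:00"))  # first heartbeat: rule runs
children.append("RouterB")                     # the tree changes...
print(number_of_routers.on_input("12:00:00"))  # ...but same input, so the old value stays
print(number_of_routers.on_input("12:00:30"))  # new timestamp: rule runs again
```

The second call still returns the old count because nothing changed on the input side, which is why the heartbeat has to carry a value that differs on every run.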
Why so much hassle? you may say.
Why not use ServiceInstance.NUMCHILDREN inside a policy-based numerical formula?
Well, first of all, a numerical formula is also a rule, and it also needs a trigger to run. Every rule in TBSM needs a trigger to run. I can dedicate a separate post to that topic.
Second, I do use ServiceInstance.NUMCHILDREN; check out my policy function:
function NumberOfAllChildren(ChildrenStatusArray, AllChildrenArray, ServiceInstance, Status) {
    Status = ServiceInstance.NUMCHILDREN;
}
So this policy, or rather this function, will return the NUMCHILDREN value any time you trigger the rule.
The main reason for the hassle is that, unfortunately, you cannot use NUMCHILDREN directly on a scorecard; you can only return it from rules. And rules need a trigger. NUMCHILDREN also isn’t an additional attribute, which could be shown directly in a JazzSM dashboard.
Is it clear? I know, it’s a bit weird, but only at first sight.
You may also wonder: why am I using ServiceInstance.NUMCHILDREN? Is there any other attribute that returns the same value? Why am I using TIP and not JazzSM in my examples? The answer is: there’s no additional attribute holding anything like the number of children that you could show in JazzSM directly, without wrapping it in a rule (and in TIP you cannot show an additional attribute without wrapping it in a rule at all). So you have two choices:
- Use the ServiceInstance object’s NUMCHILDREN field – see above
- Use a policy that iterates through the array of child objects of your service instance and returns the array’s length.
As you can see, it is still a policy, so a numerical aggregation rule or a numerical formula rule must still be used. There’s really no other way: rules are the way to go, and you need to trigger them.
Recalculate correct number of objects after server restart
There’s an alternative to the Heartbeat rule. Starting with TBSM 6.1.1 FP2, you can run a recalculation policy and associate it with server start, run it manually from time to time, or schedule it with an Impact service. There are actually two policies: one for all nodes and the other just for leaf nodes.
Leaf nodes only:
USE_SHARED_SCOPE;
Type = "StateModel";
Filter = "RECALCSTATENODESLEAF";
log("Recalc Leaf Node Only. Policy Start.");
GetByFilter(Type, Filter, false);
log("Recalc Leaf Node Only. Policy Finish.");
All nodes:
USE_SHARED_SCOPE;
Type = "StateModel";
Filter = "RECALCSTATENODESALL";
log("Recalc All Nodes. Policy Start.");
GetByFilter(Type, Filter, false);
log("Recalc All Nodes. Policy Finish.");
This alternative is documented here:
The difference between my heartbeat solution and the policies documented above is that my heartbeat function is selective: I decide which elements of the service tree will be recalculated (not just the leaves, but also not the entire service tree) and when (not just during a restart, but every now and then). This is important because the number of children on some intermediate levels may change independently of changes in the number of children on the leaf level, and I still need to trigger a recalculation for that change. At the same time, recalculating the whole tree is a real effort for TBSM, especially if I have 100k instances in my service tree. That’s why I prefer to be selective, so I use the Heartbeat concept.
Unique grandchildren count rule
Now that we have the children count rule created and triggered, it’s time to build the unique grandchildren count rule.
What’s the difference?
It’s simple: you don’t want to sum up your children’s children counts, because every Interface will report that it has 1 router, which gives you 4 routers while the true number is just 2.
So you need a smart Impact policy that will calculate that for you.
Now that we’re clear on which rules need to be created on the Router level and the Interface level, it’s time to present the rules on the NetworkSite template level:
The NumberOfInterfaces rule just calculates the number of interfaces below the network site; inside that rule the same NumberOfAllChildren function is called from NumericAttributeFunctions.ipl. The trigger should again be a heartbeat rule, since the number of interfaces inside the site may change independently. As you could see above, I defined a heartbeat rule inside the T_Interface template and called it HeartbeatRuleIfc.
The more interesting rule is UniqueGrandChildren, which runs
another function from the NumericAttributeFunctions policy, called
NumberOfUniqueGrandChildren:
function NumberOfUniqueGrandChildren(ChildrenStatusArray, AllChildrenArray, ServiceInstance, Status) {
    i = 0;
    uniquegrandchildrenarray = {};
    log("MP: "+ServiceInstance);
    while(i<length(ServiceInstance.CHILDINSTANCEBEANS)) {
        child = ServiceInstance.CHILDINSTANCEBEANS[i];
        log("Child "+child.DISPLAYNAME+" of grand parent "+ServiceInstance.DISPLAYNAME+" was found.");
        j = 0;
        while(j<length(child.CHILDINSTANCEBEANS)) {
            grandchild = child.CHILDINSTANCEBEANS[j];
            log("Child "+grandchild.DISPLAYNAME+" of child "+child.DISPLAYNAME+" was found.");
            // Testing if currently analyzed child has already occurred
            k = 0;
            occurence = 0;
            while(k<length(uniquegrandchildrenarray)) {
                if(uniquegrandchildrenarray[k].SERVICEINSTANCEID == grandchild.SERVICEINSTANCEID) {
                    // if yes, mark occurence = 1 (true) and finish analyzing further, so exit this loop
                    occurence = 1;
                    // k = length(uniquegrandchildrenarray); //uncomment this line to speed up in case of large child arrays
                    log("Duplicate found: "+uniquegrandchildrenarray[k].SERVICEINSTANCEID+" and "+grandchild.SERVICEINSTANCEID+". Skipping.");
                }
                k=k+1;
            }
            if(occurence == 0) {
                uniquegrandchildrenarray = uniquegrandchildrenarray + grandchild;
                log("Unique grand child found: "+grandchild.DISPLAYNAME+". Added to the list.");
            }
            j = j + 1;
        }
        i = i + 1;
    }
    Status = length(uniquegrandchildrenarray);
    log("Grand parent "+ServiceInstance.DISPLAYNAME+" has # grand unique children "+Status);
}
So basically the function traverses the service tree two levels down, to the grandchildren level, and collects the grandchildren into an array, comparing them by their SERVICEINSTANCEID. A grandchild that is already in the array is flagged as a duplicate and skipped; every new grandchild is added to the array. The size of the array is the returned value.
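For readers who want to check the arithmetic outside TBSM, here is a minimal Python sketch of the same idea on the example tree from this article. It is a model, not IPL: the dictionary layout, the interface names and both helper functions are assumptions made up for illustration, and a set stands in for the duplicate-checking loop.

```python
# Example tree from the article: 4 interfaces, each with one router child.
# Interface names and the dictionary layout are hypothetical.
tree = {
    "KrakowSite": ["IfcA1", "IfcA2", "IfcB1", "IfcB2"],
    "IfcA1": ["RouterA"],
    "IfcA2": ["RouterA"],
    "IfcB1": ["RouterB"],
    "IfcB2": ["RouterB"],
}


def naive_grandchildren(tree, root):
    """Sum the children counts of each child; shared grandchildren are counted repeatedly."""
    return sum(len(tree.get(child, [])) for child in tree.get(root, []))


def unique_grandchildren(tree, root):
    """Count each grandchild once, no matter how many parents it has."""
    seen = set()
    for child in tree.get(root, []):
        seen.update(tree.get(child, []))
    return len(seen)


print(naive_grandchildren(tree, "KrakowSite"))   # 4: every interface reports its router
print(unique_grandchildren(tree, "KrakowSite"))  # 2: RouterA and RouterB
```

The set makes the duplicate check a single lookup instead of a scan over the collected array, which is the same speed-up the commented-out line in the policy is after for large child arrays.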
Is it simple? Not quite, but it’s probably one of those functions you implement once and use all the time, so it’s worth learning about. Let’s see the rule at the end:
So this is your desired effect:
I hope you like this type of small hint on how to achieve something useful in TBSM. If so, please comment and I'll try to publish as many posts of this type as I can. Thanks!