Tivoli Business Service Manager (TBSM) can calculate a great deal for you, if you need it. This is thanks to the powerful rules engine at the heart of TBSM, as well as the Netcool/Impact policy engine that runs under the hood of every TBSM edition. You can present your calculation results later on a dashboard or in reports, depending on whether you want a real-time scorecard or historical KPI reports.
In this article, I’ll show how to calculate a total event count throughout a multi-level service tree. TBSM doesn’t do this right after a fresh install, because it doesn’t provide the right rules out of the box. TBSM also doesn’t ship with any predefined service tree, so to see this working you need to do both: add the rules to your service templates, and import or create by hand a service tree structure to test them on.
In this material, I’ll create a simple, multi-level service tree consisting of 3 levels of instances, and I’ll use my own template, T_Regions. To repeat this exercise you can also simply reuse the template SCR_ServiceComponentRawStatusTemplate, which comes with every TBSM installation and is widely used in integrations with Tivoli Application Dependency Discovery Manager (TADDM). The key thing is that your template must:
- Have at least 1 incoming status rule
- Be in use across the whole service tree, so that all service instances on all levels in your service tree implement that template.
Figure 2. Incoming Status Rule body used in this exercise
Figure 3. Simple service tree used in this material
Make note. This document reimplements functionality that already exists: TBSM calculates the total number of events on every service tree level and stores the result in the numRawEventsInt parameter. This parameter is visible as the last value in the RAD_prototype widget, which is typically used on Custom Canvases on TBSM dashboards created in Tivoli Integrated Portal. But that parameter value isn’t accessible to numerical rules or policies for further processing.
Figure 4. numRawEventsInt value used on RAD_prototype widget
The newest add-on to TBSM, the debug Spy tools, also offers a parameter for every service tree level, called Matching Events. Although that value is also correct, it isn’t accessible from numerical rules or policies either.
Make note. There is a BSM Accelerator template called BSMAccelerator_EventCount, which was designed to present the correct number of events for every service instance. However, it was tailored to the needs and service tree structure of BSM Accelerator and doesn’t scale to service trees of arbitrary depth. Some of the concepts introduced to support the BSM Accelerator package will nevertheless be covered in this document. If you want to read more, see this document:
Make note. TBSM 6.1.1 FP3 is a prerequisite for all rules described in this material to work correctly. However, installing Fix Pack 4 or higher is highly recommended to get the latest improvements.
TBSM runs an Impact service called TBSMOMNIbusEventReader, which comes with the product out of the box. It reads events from Netcool/OMNIbus on a regular basis (every 3000 milliseconds by default) and uses a special predefined filter to find the events that TBSM should process.
Here’s the default filter:
(Class <> 12000) AND (Type <> 2) AND ((Severity <> RAD_RawInputLastValue) or (RAD_FunctionType = '')) AND (RAD_SeenByTBSM = 0) AND (BSM_Identity <> '')
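To make the filter easier to read, here is a minimal Python sketch of the same conditions applied to an event represented as a dictionary. It is only a standalone model, not TBSM or OMNIbus code: the function name passes_reader_filter and the sample events are made up for illustration, and SQL NULL handling is ignored.

```python
def passes_reader_filter(event: dict) -> bool:
    """Model of the default TBSMOMNIbusEventReader filter quoted above."""
    return (
        event["Class"] != 12000
        and event["Type"] != 2
        # Either the severity changed since TBSM last saw it,
        # or the event has not been assigned a function type yet.
        and (
            event["Severity"] != event["RAD_RawInputLastValue"]
            or event["RAD_FunctionType"] == ""
        )
        and event["RAD_SeenByTBSM"] == 0
        and event["BSM_Identity"] != ""
    )


# Hypothetical sample events, for illustration only.
new_event = {
    "Class": 0,
    "Type": 1,
    "Severity": 5,
    "RAD_RawInputLastValue": 0,
    "RAD_FunctionType": "",
    "RAD_SeenByTBSM": 0,
    "BSM_Identity": "Malopolska",
}
already_seen = dict(new_event, RAD_SeenByTBSM=1)
no_identity = dict(new_event, BSM_Identity="")

print(passes_reader_filter(new_event))
print(passes_reader_filter(already_seen))
print(passes_reader_filter(no_identity))
```

Only the first sample event passes; the other two fail on RAD_SeenByTBSM and BSM_Identity respectively.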
All events which pass that filter are processed further by TBSM service template rules, specifically by a special kind of rule called an Incoming Status Rule. The most typical incoming status rule, ComponentRawEventStatusRule, is predefined inside the SCR_ServiceComponentRawStatusTemplate template. It has a precondition, called a discriminator, which filters out every event let through by the event reader that doesn’t have one of the following classes:
- TPC Rules (89200)
- IBM Tivoli Monitoring Agent (87723)
- Predictive Events (89300)
- IBM Tivoli Monitoring (87722)
- Default Class (0)
- TME10tecad (6601)
- Tivoli Application Dependency Discovery Manager (87721)
- Precision [Start] (8000)
- MTTrapd (300)
- Precision [End] (8049)
Make note. In my example, my Incoming Status rule simply expects Default Class (0) in all my test events.
This is not the end. There’s one more filter, called the event identification field. By default TBSM looks for its value in the event field called BSM_Identity. The value expected in that field is the event identifier of a service instance, which by default is the same as the service instance name. So the event identifiers for my simple service tree are the following:
Service instance name | Event identifier
Europe | Europe
Poland | Poland
Malopolska | Malopolska
In this material I will not discuss how to maintain event identifiers, how many event identifiers you can have, or how to set up event identifiers in XMLtoolkit configuration files (if you’re interested in those topics, please see my private blog entry: http://www.marcinpaluch.pl/wordpress/?p=231). I will also not discuss how event severity may affect service instance status. I use the defaults in my example and don’t focus on that area here.
To sum it up: there are 3 filters your event has to pass before it affects your service instance:
a) The TBSMOMNIbusEventReader’s filter
b) The Incoming Status Rule discriminator / event class filter
c) The event identifier
If your event makes it through all the filters, you can call it a service instance affecting event.
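The three filters can be pictured as a chain in which the first failing stage rejects the event. The Python sketch below models that chain outside TBSM. The names route_event, ACCEPTED_CLASSES and IDENTIFIER_TO_INSTANCE are hypothetical, the class set follows my example rule (Default Class only), and the identifiers are the ones from my simple tree.

```python
# Hypothetical configuration mirroring the example in this material.
ACCEPTED_CLASSES = {0}  # discriminator: Default Class (0) only
IDENTIFIER_TO_INSTANCE = {
    "Europe": "Europe",
    "Poland": "Poland",
    "Malopolska": "Malopolska",
}


def route_event(event: dict):
    """Return (instance_name, verdict); instance_name is None on rejection."""
    # a) The event reader filter (same conditions as the default filter).
    reader_ok = (
        event["Class"] != 12000
        and event["Type"] != 2
        and (
            event["Severity"] != event["RAD_RawInputLastValue"]
            or event["RAD_FunctionType"] == ""
        )
        and event["RAD_SeenByTBSM"] == 0
        and event["BSM_Identity"] != ""
    )
    if not reader_ok:
        return None, "rejected by event reader filter"

    # b) The Incoming Status Rule discriminator (event class filter).
    if event["Class"] not in ACCEPTED_CLASSES:
        return None, "rejected by discriminator"

    # c) The event identifier must match a service instance.
    instance = IDENTIFIER_TO_INSTANCE.get(event["BSM_Identity"])
    if instance is None:
        return None, "no matching event identifier"

    return instance, "service instance affecting event"


base = {
    "Class": 0,
    "Type": 1,
    "Severity": 3,
    "RAD_RawInputLastValue": 0,
    "RAD_FunctionType": "",
    "RAD_SeenByTBSM": 0,
    "BSM_Identity": "Poland",
}

print(route_event(base))
print(route_event(dict(base, Class=300)))
print(route_event(dict(base, BSM_Identity="Atlantis")))
```

Returning the verdict next to the instance name makes it easy to see which of the three stages stopped a given test event.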
This doesn’t have to mean that your event changes your service instance status. It only means that your event was processed by the Incoming Status Rule implemented in your service instance’s template. If you use TBSM 6.1.1 FP4, you can use the Service Model Spy tool to see that your Incoming Status rule updated various attributes, such as Matching Events (a number), Max Event Status (the event’s severity) and a timestamp of when the rule processed the event.
In this material I’ll call the Matching Events parameter the EventCount.
Now, why Multilevel event count?
Every service instance can have its own individual EventCount. Every level of the service tree can contain more than one service instance, and the best way to sum them up is to calculate their sum on their parent level. The parent service instance may also implement a template with an Incoming Status rule, and therefore it can have its own individual EventCount. That parent service instance can in turn be one of many parent service instances, so the best way to sum those up is to calculate a TotalEventCount on the grandparent service instance level. And so on. So the multi-level event count is a feature that calculates the total number of events being processed by TBSM in the whole service tree.
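Put as a formula, the total for a service instance is its own EventCount plus the totals of all its children. Here is a minimal Python sketch of that recursive definition. It is a model only, not TBSM code: the sibling instance Germany and all the counts are made up so that the example shows siblings being summed.

```python
# Hypothetical tree: parent -> list of children.
TREE = {
    "Europe": ["Poland", "Germany"],
    "Poland": ["Malopolska"],
    "Germany": [],
    "Malopolska": [],
}

# Hypothetical own EventCount per service instance.
OWN_EVENTS = {"Europe": 1, "Poland": 2, "Germany": 3, "Malopolska": 4}


def total_event_count(name: str) -> int:
    """Own events of this instance plus the totals of all its children."""
    return OWN_EVENTS[name] + sum(total_event_count(c) for c in TREE[name])


for instance in TREE:
    print(instance, total_event_count(instance))
```

With these numbers Malopolska totals 4, Poland 6, Germany 3 and Europe 10.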
Why would you need it? There are several possible use cases:
- Service tree consistency check and verification: in a development phase, to see whether all levels of your service tree get processed correctly
- Statistics: to see the current, true load on TBSM by source, class, alert type or any other event field, in order to analyze event storms and their causes
- Operations monitoring: for example, to compare the total event count with the total count of acknowledged events and the total count of events escalated by opening an incident
- Service component quality monitoring: especially important when service components are managed or provided by a 3rd party provider, so that you can assess how much trouble each of them gives your company or your operations team
Once the use case is agreed, you may want to use this
material to start collecting your Total event counts in order to present them
on a dashboard or in a report. Let me now explain to you how to set it up.
As the first step, let’s make sure I’m collecting the event count for each of my service tree elements. Let me create a new rule for that: OwnEvents.
Make note. This
step has a prerequisite: I need to have my Incoming Status rule already
created.
This is perhaps not well documented, but every Incoming
Status Rule can be used in a Numerical Formula rule to get the number of events
processed. It is documented in this technote:
So let me do exactly what the technote does. Here is my numerical formula, a rule called OwnEvents, which returns only the non-clear event count via the Incoming Status Rule’s parameter NumEventsSevGE2 (available by default since TBSM 6.1.1 FP1). Whenever my Incoming Status Rule processes another event with severity 1 or higher, the output of my numerical formula refreshes and increases by 1.
Figure 5. OwnEvents rule settings
And on my scorecard:
Figure 6. OwnEvents in a scorecard
Let’s send a test event to the last level now:
Figure 7. Sending test event
Figure 8. Test event settings
Figure 9. OwnEvents after sending test event
As you can see, the event’s severity was passed all the way up the service tree, which is why the icon in the Events column changed color to purple from the bottom level right up to the top one.
After I sent a critical event to the 2nd level, the icons from the 2nd level up to the top one changed their color to red.
Figure 10. OwnEvents after sending 2nd test event
Make note. In
order to perform this exercise, I haven’t created a status propagation rule.
And I will not!
Take a look at the OwnEvents column. Even though the status was propagated through the service tree from bottom to top, the OwnEvents rule worked for every level individually. Europe shows a bad status in the Events column, but the OwnEvents column shows that 0 events affected that level.
Now, let’s try to make every level aware of the events happening on the level below it.
Prepare the following policy:
/* trigger_totalevents */
log("Triggered: "+ServiceInstance.STATEMODELNODE.trigger_totalevents.Value);

/* Start from this instance's own event count (0 if the rule has no value yet). */
Status = 0;
si = ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+")";
if(ServiceInstance.STATEMODELNODE.count_ownevents.Value <> NULL) {
    Status = Int(ServiceInstance.STATEMODELNODE.count_ownevents.Value);
}
log("Service instance: "+si+" own events count: "+Status);

/* Add the contribution of every direct child. */
i = 0;
while (ServiceInstance.CHILDINSTANCEBEANS[i] <> NULL) {
    ci = ServiceInstance.CHILDINSTANCEBEANS[i].SERVICEINSTANCENAME+" ("+ServiceInstance.CHILDINSTANCEBEANS[i].DISPLAYNAME+")";
    if(ServiceInstance.CHILDINSTANCEBEANS[i].NUMCHILDREN > 0) {
        /* The child has children of its own: take its already calculated total. */
        grandChildEvents = 0;
        if(ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_totalevents.Value <> NULL) {
            grandChildEvents = Int(ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_totalevents.Value);
        }
        log("Service instance: "+si+", child: "+ci+" children events: "+grandChildEvents);
        Status = Status + grandChildEvents;
    } else {
        /* Leaf child: take its own event count. */
        childOwnEvents = 0;
        if(ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_ownevents.Value <> NULL) {
            childOwnEvents = Int(ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_ownevents.Value);
        }
        log("Service instance: "+si+", child: "+ci+" own events: "+childOwnEvents);
        Status = Status + childOwnEvents;
    }
    i = i + 1;
}
log("Service instance: "+si+" total events count: "+Status);
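To show what the policy computes for a single service instance, here is a small Python model of the same logic. It is a sketch outside TBSM: the Instance class and the function name are made up, own_events and total_events stand in for the count_ownevents and count_totalevents rule values, and the counts are illustrative. Like the policy, it looks only one level down and relies on each child with children already holding its own total.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    name: str
    own_events: Optional[int] = None    # stands in for count_ownevents
    total_events: Optional[int] = None  # stands in for count_totalevents
    children: List["Instance"] = field(default_factory=list)


def count_totalevents(node: Instance) -> int:
    """One evaluation of the policy for a single service instance."""
    status = node.own_events if node.own_events is not None else 0
    for child in node.children:
        if child.children:
            # Child has children: use its previously calculated total.
            value = child.total_events
        else:
            # Leaf child: use its own event count.
            value = child.own_events
        status += value if value is not None else 0
    return status


malopolska = Instance("Malopolska", own_events=2)
poland = Instance("Poland", own_events=2, children=[malopolska])
europe = Instance("Europe", own_events=1, children=[poland])

# Evaluated too early, Europe sees no total for Poland yet.
stale_europe_total = count_totalevents(europe)

# Evaluated bottom-up, each level picks up the total of the level below.
poland.total_events = count_totalevents(poland)
europe.total_events = count_totalevents(europe)

print(stale_europe_total, poland.total_events, europe.total_events)
```

The stale value illustrates why a parent has to be re-evaluated whenever a child's total changes, which in TBSM is the job of the numerical aggregation rule that triggers this policy.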
I called this policy count_totalevents_policy_1 and saved it in a numerical formula rule called count_totalevents.
Figure 11. TotalEvents rule settings
At the same time, create another rule, a numerical aggregation rule, and point it at the rule you just created within the same template. Make sure you name this rule exactly as indicated in the header of the policy in the numerical formula you created a moment ago (trigger_totalevents).
Figure 12. TriggerTotalEvents rule settings
By the end you should have the following list of rules in your template:
Figure 13. T_Regions template complete rules set
Make note. After you create a template rule that points to the same template as its child template, the template disappears from the templates list in the service navigator portlet. To fix this, add that template to any other template by associating it via any type of status propagation rule:
Figure 14. T_Regions template associated to templateFinder
And this is the result you should eventually see in your scorecard:
Figure 15. TotalEvents column in a scorecard
It looks like the concept works fine. Let’s test it further by sending another event to every level, starting with Malopolska, then Poland, then Europe.
Figure 16. TotalEvents column after sending more test events
It looks correct: every level’s OwnEvents count increased by 1, and I have 5 events in total in the entire tree: 2 on the leaf, another 2 in the middle and 1 on the root level.
Let’s add a new level below Malopolska and call it Krakow. This simulates expanding the service tree, e.g. after a fresh import from TADDM or a CMDB.
Figure 17. OwnEvents and TotalEvents after adding a new child service
Let’s now send a new event with Severity 3 to Krakow:
Figure 18. OwnEvents and TotalEvents after sending a test event to the new child service
The new event affected Krakow and was correctly included in the TotalEvents count on every level. Let’s now create one level above all the others, called Earth:
Figure 19. OwnEvents and TotalEvents after adding a new root service
Adding Earth didn’t change the TotalEvents count, of course, but the current maximum was reflected on the new top (root) level. Let’s send another event to Poland:
Figure 20. OwnEvents and TotalEvents after sending test events to the new root service
The total event count increased by 1 again. Only Europe’s
OwnEvents column value increased by 1.
Let’s now remove Krakow from the leaf level to see whether the TotalEvents count decreases by 1:
Figure 21. OwnEvents and TotalEvents after removing the child service from the tree
So it is correct again: after removing Krakow with its 1 event, the overall TotalEvents count dropped by 1 too and now equals 6.
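The totals reported in this walkthrough can be replayed with a few lines of Python. This is only an arithmetic check outside TBSM: the data structures and the total function are made up for illustration, and the starting counts are the ones stated above (2 on the leaf, 2 in the middle, 1 on the root). The seventh event is placed on Poland here, but the level that receives it doesn't change the total.

```python
# Tree state after the first five test events.
children = {"Europe": ["Poland"], "Poland": ["Malopolska"], "Malopolska": []}
own = {"Europe": 1, "Poland": 2, "Malopolska": 2}


def total(name: str) -> int:
    """Own events plus the totals of all descendants."""
    return own[name] + sum(total(child) for child in children[name])


history = [total("Europe")]  # starting point

# Add Krakow below Malopolska, with no events yet.
children["Malopolska"].append("Krakow")
children["Krakow"] = []
own["Krakow"] = 0
history.append(total("Europe"))

# Send one event to Krakow.
own["Krakow"] += 1
history.append(total("Europe"))

# Add Earth as the new root above Europe.
children["Earth"] = ["Europe"]
own["Earth"] = 0
history.append(total("Earth"))

# Send one more event into the tree.
own["Poland"] += 1
history.append(total("Earth"))

# Remove Krakow together with its one event.
children["Malopolska"].remove("Krakow")
del children["Krakow"], own["Krakow"]
history.append(total("Earth"))

print(history)
```

The recorded totals are 5, 5, 6, 6, 7 and 6, which matches the sequence described in the text.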