Friday, May 13, 2016

Triggering TBSM Rules

Introduction

Tivoli Business Service Manager can calculate amazing things for you, if you only need them. This is thanks to the powerful rules engine being the key part of TBSM as well as the Netcool/Impact policies engine running  just under the hood together with every TBSM edition. You can present your calculation results later on on a dashboard or in reports, depending if you think of a real time scorecard or historical KPI reports.
In this article, I’ll show how you can make TBSM to process its inputs by triggering various template rules in various ways. It is something that isn’t really well documented or at least it isn’t well documented in a single place.

Status, numerical and text rules triggers

In this chapter  I’ll show three kinds of rules (status, numerical and text) and I’ll show how TBSM triggers them, so processes the input data, runs them and returns the outputs.
In general these three techniques always kick off TBSM rules based on the same two conditions: time and a new value. Here they are:
  1. Omnibus Event Reader service (for incoming status rules or numerical rules or text rules based on Omnibus ObjectServer as the data feed)
  2. TBSM data fetchers (for numerical or text rules with fetcher based data feed)
TBSM/Impact services of type Policy activator  (using Impact policy with PassToTBSM function calls to send data to numerical or text rules)
Figure 1. Three techniques of triggering status, numerical and text rules in TBSM



Make note. There are ITM policy fetchers as well as highly undocumented any-policy fetchers configurable in TBSM, I’ll not comment on them in this material, however their basis is just like any fetcher: the time.
Let’s take a look at the first type of rules triggers, the most popular, the OMNIbus Event Reader service in Impact, widely used in TBSM to process events stored in ObjectServer against the service trees.

OMNIbus Event Reader

Omnibus Event Reader is an Impact service running regularly by default every 3 seconds in order to connect Netcool/OMNIbus for getting events stored in the Objectserver memory that might be affecting TBSM service tree elements. It selects the events based on the following default filter:

(Class <> 12000) AND (Type <> 2) AND ((Severity <> RAD_RawInputLastValue) or (RAD_FunctionType = '')) AND (RAD_SeenByTBSM = 0) AND (BSM_Identity <> '')

(Severity <> RAD_RawInputLastValue) is the condition ensuring that event will be tested against containing a new value in the Severity field comparing to the previous event.
The Event Reader itself can be found in Impact UI server within Services page in TIP among other services included in the Impact project called TBSM_BASE:
Figure 2. Configuration of TBSMOMNIbusEventReader service


Make note. TBSM allows you configuring other event readers but you can use just one same time.
All incoming status rules use this Event Reader by default. There’s embedded mapping between the name of the Event Reader and the Data source field in Status/Text/Numerical rules, hence just “ObjectServer” caption occurs in the new rule form:
Figure 3. Screenshot of New Incoming status rule form

Policy activators  

Impact services of type Policy activators simply call a policy every defined period of time and run that policy.
Figure 4. Screenshot of a policy activator service configuration for TBSMTreeRuleHeartbeat

The policy needs to be created earlier. In order to trigger TBSM rules, it is required to call PassToTBSM() function and contain in its argument an object. Let’s say this is my TBSMTreeRulesHeartbeat policy:

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");
randNum = Random(100000);

ev=NewEvent("TBSMTreeRuleHeartbeatService");
ev.timestamp = String(Time);
ev.bsm_identity = "AnyChild";
ev.randNum =  randNum;
PassToTBSM(ev);
log(ev); 

In my example, a new value generates every time when the policy is activated by using GetDate() function. Pay attention to field called ev.bsm_identity. I’ll be referring to this field later on. For simplification this field has always value “AnyChild”. 

Make note. Unlike TBSM OMNIbus event reader, the policy activating services, also policies themselves don’t have to be included in TBSM_BASE impact project.
Netcool/Impact policies give you freedom of reaching to any data source, via SQL or SOAP or XML or REST or command like or anyhow you like, and processing any data you can see useful to process in TBSM. The only requirement is passing that data to TBSM via PassToTBSM function.

TBSM data fetchers

TBSM datafetchers combine an Impact policy with DirectSQL function call and Impact policy activator service. Additionally, data fetchers have a mini concept of schedule, it means you can set their run time interval to specific hour and minute and run it once a day (i.e. 12:00 am daily). It also allows postponing or rushing next runtime in case of longer taking SQL queries.
Make note. Data fetchers can be SQL fetchers, ITM policy fetchers or any policy fetchers, unfortunately the TBSM GUI was never fully adjusted to reconfigure ITM policy fetcher and was never enabled to allow configuration of any policy fetchers and in case of the last two you’ve got to perform few manual and command-line level steps instead. Fortunately, the PassToTBSM function available in Impact 6.x policies can be used instead and the any-policy fetchers aren’t that useful anymore.
Every fetcher by default runs every 5 minutes.
Figure 5. Screenshot of data fetcher Heartbeat fetcher


In the example presented on the screenshot above data fetcher connects DB2 database and runs DB2 SQL query. The new value is ensured every time in this example by calling the native DB2 random function and the whole query is the following:

select 100000*rand() as "randNum", 'AnyChild' as "bsm_identity" from sysibm.sysdummy1

Pay attention to field bsm_identity. It always returns the same value “AnyChild”, just like the policy explained before.


Triggering status, numerical, text and formula rules  

In the previous chapter I presented various rules triggering methods. It’s now time to show how those triggers work in real. I’ll create a template with 5 numerical and text rules (I don’t want to change any instance status, so I won’t create any incoming status rule this time) and additionally with 3 supportive formulas and I’ll present the output values of all those rules on a scorecard in Tivoli Integrated Portal. Below you can see my intended template with rules:
Figure 6. Screenshot of t_triggerrulestest template with rules

OMNIbus Event Reader-based rules

Let’s start from 2 rules utilizing the TBSM OMNibus Event Reader service. Like I said, I won’t use a status rule, but I’ll use one text and one numerical rule, in order to return the last event’s severity and the last event’s summary. Before I do it, let me configure my simnet probe that will be sending random events to my ObjectServer. My future service instance implementing the t_triggerrulestest template will be called TestInstance (or at least it will have such a value in its event identifiers), I want one event of each type and various probability that such event will be sent:

#######################################################################
#
# Format:
#
#       Node Type Probability
#
# where Node        => Node name for alarm
#       Type        =>  0 - Link Up/Down
#                      1 - Machine Up/Down
#                      2 - Disk space
#                      3 - Port Error
#       Probability => Percentage (0 - 100%)
#
#######################################################################

TestInstance 0 10
TestInstance 1 15
TestInstance 2 20
TestInstance 3 25


Let’s see if it works:
Figure 7. Screenshot of Event Viewer presenting test events sent by simnet probe.

So now my two rules, I marked important settings with green. So my Data feed is ObjectServer, the source is SimNet probe, the field containing service identifiers is Node and the output value taken back is Severity:
Figure 8. Screenshot of the rule numr_event_lastseverity form
Same here, this time the rule is Text rule and I have a fancy output expression, it is:


'AlertGroup: '+AlertGroup+', AlertKey: '+AlertKey+', Summary: '+Summary

The rule itself:

Figure 9. Screenshot of the rule txtr_event_lastsummary

Fetcher-based rule

That was easy. Now something little bit more complicated, the data fetcher. I already have my datafetcher created and show above in this material, let’s check if it works fine, the logs shows the fetcher is fine, i.e. if it fetches data every 5 minutes:

1463089289272[HeartbeatFetcher]Fetching from TBSMComponentRegistry has started on Thu May 12 23:41:29 CEST 2016
1463089289287[HeartbeatFetcher]Fetched successfully on Thu May 12 23:41:29 CEST 2016 with 1 row(s)
1463089289287[HeartbeatFetcher]Fetching duration: 00:00:00s
1463089289412[HeartbeatFetcher]1 row(s) processed successfully on Thu May 12 23:41:29 CEST 2016. Duration: 00:00:00s. The entire process took 00:00:00s
1463089589412[HeartbeatFetcher]Fetching from TBSMComponentRegistry has started on Thu May 12 23:46:29 CEST 2016
1463089589427[HeartbeatFetcher]Fetched successfully on Thu May 12 23:46:29 CEST 2016 with 1 row(s)
1463089589427[HeartbeatFetcher]Fetching duration: 00:00:00s
1463089589558[HeartbeatFetcher]1 row(s) processed successfully on Thu May 12 23:46:29 CEST 2016. Duration: 00:00:00s. The entire process took 00:00:00s

And the data preview looks good too:
Figure 10. The Heartbeat fetcher output data preview
My next rule will be just one and it will be a numerical rule to return the randNum value, I marked important settings in green again, so I select HeartbeatFetcher as the Data Feed, I select bsm_identity as service event identifier and randNum as the output value:
Figure 11. Screenshot of numr_fetcher_randNum rule

Policy activated rules

Last but not least I will create two rules getting data from my policy activated by my custom Impact Service. I did show the policy and the service in the previous chapter, let’s just make sure they both work ok. This is how the service works, every 5 minutes I get my policy activated and every time it returns another value in the randNum field:

12 maj 2016 23:41:29,652: [TBSMTreeRulesHeartbeat][pool-7-thread-87]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=71648, timestamp=23:41:29, bsm_identity=AnyChild)
12 maj 2016 23:46:29,673: [TBSMTreeRulesHeartbeat][pool-7-thread-87]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=8997, timestamp=23:46:29, bsm_identity=AnyChild)
12 maj 2016 23:51:29,674: [TBSMTreeRulesHeartbeat][pool-7-thread-91]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=73560, timestamp=23:51:29, bsm_identity=AnyChild)
12 maj 2016 23:56:29,700: [TBSMTreeRulesHeartbeat][pool-7-thread-91]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=60770, timestamp=23:56:29, bsm_identity=AnyChild)
13 maj 2016 00:01:29,724: [TBSMTreeRulesHeartbeat][pool-7-thread-92]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=55928, timestamp=00:01:29, bsm_identity=AnyChild)

Let’s then create the rules. I will have two rules again, one numerical and one text. The numerical rule will have the TBSMTreeRuleHeartbeatService as the Data Feed, the bsm_identiy field will be selected as the service event identifier field and randNum field will be my output:
Figure 12. Screenshot of numr_heartbeat_randNum rule

Make note. Every time you add another field to your policy activated by your service, make sure that new field is mapped to the right data type in the Customize Fields form. You will need to add that field first:
Figure 13. Screenshot of CustomizedFields form

And the second rule looks the following, this time it is a text rule and I return the timestamp value:
Figure 14. Screenshot of txtr_heartbeat_lasttime rule

Formula rules  

The last rules I’ll create will be three formula, policy-based, text rules. Each of them will go to another rules create previously and “spy” on their activity. Let’s see the first example:

Figure 15. Screenshot of nfr_triggered_by_events rule


This rule will use policy and will be a text rule. It is important to mark those fields before continuing, later the Text Rule field greys out and inactivates. After ticking both fields I click on the Edit policy button. All three rules will look the same at this level; hence I won’t include all 3 screenshots as just their names will differ. I’ll create another policy in IPL for each of them. Here’s the mapping:

Rule name
Policy name
nfr_triggered_by_events
p_triggered_by_events
nfr_triggered_by_fetcher
p_triggered_by_fetcher
nfr_triggered_by_service
p_triggered_by_service

Each policy of those three will look similar, it will just look after different rules created so far. The p_triggered_by_events policy will do this:
// Trigger numr_event_lastseverity
// Trigger txtr_event_lastsummary

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

Status = Time;
log("TestInstance triggered by SimNet events at "+Time);
log("Output value of numr_event_lastseverity: "+InstanceNode.numr_event_lastseverity.Value);
log("Output value of txtr_event_lastsummary: "+InstanceNode.txtr_event_lastsummary.Value);

Policy p_triggered_by_fetcher will do this:
// Trigger numr_fetcher_randnum

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

Status = Time;
log("TestInstance triggered by HeartbeatFetcher at "+Time);
log("Output value of numr_fetcher_randnum: "+InstanceNode.numr_fetcher_randnum.Value); 

And policy p_triggered_by_service this:

// Trigger numr_heartbeat_randnum
// Trigger txtr_heartbeat_lasttime

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

Status = Time;
log("TestInstance triggered by TBSMHeartbeatService at "+Time);
log("Output value of numr_heartbeat_randnum: "+InstanceNode.numr_heartbeat_randnum.Value);
log("Output value of txtr_heartbeat_lasttime: "+InstanceNode.txtr_heartbeat_lasttime.Value);

You can notice that each policy starts from a comment section. This is important. This is how the formula rules get triggered. It is enough to mention another rule by its name in a comment to trigger your formula every time that other referred rule returns another output value. This is why we have the randnum-related rules in every formula. Those rules are designed to return another value every time they run. Just the first rule isn’t the same, but I assume it will trigger every time a combination of Summary, AlertGroup and AlertKey fields value in the source event changes.
The trigger numerical or text rules are also mentioned later when these policies call them and obtain their output values in order i.e. to put those values into log file. But it is not necessary to trigger my formulas. I log those trigger text and numerical rules outputs for troubleshooting purposes only.
The purpose of these 3 policies and 3 formulas is to report on time when the numerical or text values worked for the last time.
Below you can see an example of one of the policies in actual form.
Figure 16. Screenshot of one of the policies text

Testing the triggers

Now it’s time to test the trigger rules, the triggers and troubleshoot in case 

Triggering rules in normal operations
In order to do that we will need a service instance implementing our newly created template. I call it TestInstance and this is its base configuration:
Figure 17. Screenshot of configuration of service instance TestInstance – templates

It is important to make sure that the right event identifiers were selected in the Identification Fields tab. I need to remember what bsm_identity I set in all rules, it is mainly AnyChild (the policy and the fetcher) and TestInstance (the SimNet probe).
Figure 18. Screenshot of configuration of service instance TestInstance – identifiers


Make note. In real life your instance will have its individual identifiers like TADDM GUID or ServiceNow sys_id. It is important to find a match between that value and the affecting events or matching KPIs and if this is necessary to define new identifiers, which will ensure such a match.
Let’s see if it works in general. I created a scorecard and a page to present on all values of my new instance. I’ll put on top also fragments of my formula related policy logs to see if the data returned in policies and timestamps match:
Figure 19. Screenshot of the scorecard with policy logs on top

Let’s take a closer look at the first section. Same event arrived just once but since formula is triggered by two rules it was triggered twice in a row. In general the last event arrived at 20:27:00 and its severity was 4 (major) and the summary was on Link Down. Both rules numr_event_lastseverity and txtr_event_lastsummary triggered m formula correctly.

The next section is about the fetcher, the latest random number is 16589,861 and the rule numr_fetcher_randnum triggered my formula correctly. 
The last is the policy activated rule and formula, let’s see. This time I have two rules again and they both triggered the formula correctly. The last run was at 20:26:30. I have two different randnum values in both runs. This is caused by referring to numerical rules twice in my formula policy.

Triggering rules after TBSM server restart



I’ll now show a problem that TBSM has with rules that are not triggered by any trigger. Like I said in the previous chapters, TBSM needs rules to be triggered every now and then but also the value to change between triggers, in order to return the value again.
It causes some issues in TBSM server restart situations. If a value hasn’t changed before server restart and is still the same after the restart, TBSM may be unable to display or return it correctly, if the rule used to return it is not triggered. Server restart situation means clearing TBSM memory so no output values of no rules are preserved for after the server restart.
Here’s an example. I’ll create one new formula rule with this policy in my test template:

Status = ServiceInstance.NUMCHILDREN;
log("Service instance "+ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+") has children: "+Status);

Here’s the rule itself:
Figure 20. Screenshot of nfr_numchildren rule configuration
As next step, I add one more column to my scorecard to show the output of the newly created rule. I also created 3 service instances and made them a child to TestInstance instance.

Figure 21. Screenshot of the scorecard shows 3 children count

Also my formula policy log will return number 3:

13 maj 2016 12:17:56,664: [p_numchildren][pool-7-thread-34 [TransBlockRunner-2]]Parser log: Service instance TestInstance (TestInstance) has children: 3

Now, if I only restart TBSM server, the value shown will be 0 and I will see no new entry in the log:
Figure 22. Screenshot of the scorecard after server restart shows 0 children

I can change this situation by taking one of three actions:
  1. Adding new or removing old child instances from Testinstance
  2. Modifying the formula policy
  3. Introducing a trigger to the formula policy

However two first options don’t protect me from another server restart situation occurring.
Let’s say I add another child instance. This is how the scorecard will look like:

Before the restart
However, after the restart


Or I may want to modify my rule. After saving my changes, the value will display correctly. However another server restart will reset it back to 0 again anyway.
So let’s say I change my policy to this:
Status = ServiceInstance.NUMCHILDREN;
log("Service instance "+ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+") has children: "+Status);
log("Service instance ID: "+ServiceInstance.SERVICEINSTANCEID);

And my policy log now contains two entries per run:
13 maj 2016 12:56:07,023: [p_numchildren][pool-7-thread-4 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4
13 maj 2016 12:56:07,023: [p_numchildren][pool-7-thread-4 [TransBlockRunner-1]]Parser log: Service instance ID: 163

But the situation before and after the restart is the same:
Before the restart
After the restart


It’s not a frequent situation though. If your rules are normally event-triggered rules or data fetcher triggered rules you can expect frequent updates to your output values even after your TBSM server restarts.  Just in case you want to present an output value in a rule that normally is not triggered, make sure you include a reference to a trigger in your rule. Let’s use one of the triggers we configured previously in my new formula policy:

// Trigger by numr_fetcher_randnum    
// Trigger by numr_heartbeat_randnum

Status = ServiceInstance.NUMCHILDREN;
log("Service instance "+ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+") has children: "+Status);
log("Service instance ID: "+ServiceInstance.SERVICEINSTANCEID); 

You can already notice by following the log of the policy that there are many entries per every policy run, precisely as many entries as many times the formula was triggered by one of the trigger rules.
The first pair of entries was added after saving the rule. The next 2 pairs were added in result of the triggers working fine:


13 maj 2016 13:22:36,837: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4
13 maj 2016 13:22:36,837: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance ID: 163
13 maj 2016 13:24:12,833: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4
13 maj 2016 13:24:12,833: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance ID: 163
13 maj 2016 13:24:18,465: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4
13 maj 2016 13:24:18,465: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance ID: 163
 
Let’s make the final test, so TBSM server restart:
Before the restart
After the restart


This excercise ends my material for tonight. I'll continue in another material on triggering the status propagation rules and numeric aggregation rules. See you soon!
mp

No comments: