Service management: 2016

Thursday, December 15, 2016

Time related functions in Netcool/Impact

This time I'd like to share some time functions I've developed over the years for my TBSM and Impact projects, for both service models and dashboards. I hope you'll like them and use them too. If so, please let me know!

1. Getting time zone and Daylight Saving Time offsets straight

There's a brilliant page collecting many useful examples of Netcool/Impact policy functions and code fragments, called the Policy Language Code Library; you can find it here.

However, among those examples I couldn't find a simple function that would give me my local time zone and daylight saving time offsets.

We often work with timestamps representing status changes or other metrics that are expressed in the GMT+0 time zone, no matter where the server is located. In my view that approach has its inconveniences, and I personally prefer UNIX epoch timestamps. Either way, GMT+0 is not a time you can present as-is to your clients on dashboards designed for everyday use. So let's try to get the local time out of it!

There's one inconvenience I haven't solved yet: obtaining the local time zone of the server automatically from a JavaCall, so if you know a way, please post it here. What I did was assume that the IPL developers implementing this code know the server time zone and simply pass it in as a function parameter. In case developers don't know how time zones are named or which time zones are actually supported, there's a function to get them all. It is also mentioned in the Policy Language Code Library, but let me quote it here too:

function getTimeZones(tzList)
{
tzList = javacall("java.util.TimeZone", null, "getAvailableIDs", {});
}

Executing this function and logging the output will give you the whole array of time zones your Impact server supports. It has 617 entries, so I won't paste an example, to keep this post from getting too long ;)

I'll be using the time zone "Poland" in my examples further on.
So here we go: I'm using examples from the Policy Language Code Library, just put together into something I need. The only input parameter is the local server time zone, and the two output parameters are the offsets: the time zone offset vs. GMT+0 and the Daylight Saving Time offset.

The offsets in the function above are in milliseconds. If you need the same values in seconds, simply divide them by 1000:

function f_getZoneAndDSTOffsetsInSeconds(i_TZ,o_ZoneOffset,o_DSTOffset) {
   f_getZoneAndDSTOffsetsInMilis(i_TZ,o_ZoneOffset,o_DSTOffset);
   o_ZoneOffset=o_ZoneOffset/1000;
   o_DSTOffset=o_DSTOffset/1000;
}
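If you want to cross-check the offsets outside Impact, here is a minimal sketch of the same idea in Python (not IPL): the standard zone offset and the DST offset of a named time zone at a given moment. The function name `zone_and_dst_offsets_in_seconds` is made up for this illustration, and it uses `Europe/Warsaw`, the tz database name behind the "Poland" alias, because some systems no longer ship the legacy alias. Treat it as a sanity check under those assumptions, not as the policy code itself.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, relies on the system tz database


def zone_and_dst_offsets_in_seconds(tz_name, epoch):
    """Return (standard zone offset, DST offset) in seconds at a UNIX epoch."""
    moment = datetime.fromtimestamp(epoch, tz=ZoneInfo(tz_name))
    dst_offset = int(moment.dst().total_seconds())
    # utcoffset() already includes DST, so subtract it to get the plain zone offset
    zone_offset = int(moment.utcoffset().total_seconds()) - dst_offset
    return zone_offset, dst_offset


# 15 May 2016 09:42:15 GMT is in summer time, 15 Dec 2016 09:42:15 GMT is not
print(zone_and_dst_offsets_in_seconds("Europe/Warsaw", 1463305335))
print(zone_and_dst_offsets_in_seconds("Europe/Warsaw", 1481794935))
```

For Poland this should report a one-hour zone offset in both cases, plus one hour of DST in May and none in December.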

2. Maximum duration of months in days

I'll focus on days, since they have been the most useful for my dashboards and charts so far, but based on the examples below you can easily construct your own policy to calculate the duration of any period in days, hours, minutes, etc.

First, something trivial, but why not simply find this post and copy-paste it into your code? ;)
Getting the maximum number of days in any month. As input you need the year as an integer (in yyyy format) and the month as a simple short integer in the range 1-12.
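As a quick way to verify the numbers your policy returns, here is a small Python sketch (not IPL) of the same calculation. The function name `max_days_in_month` is only illustrative; the standard `calendar` module handles the leap-year rules.

```python
import calendar


def max_days_in_month(year, month):
    """Number of days in the given month; year is yyyy, month is 1-12."""
    # monthrange() returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1]


for year, month in [(2015, 2), (2016, 2), (2016, 4), (2016, 12)]:
    print(year, month, max_days_in_month(year, month))
```

The four sample months cover the 28, 29, 30 and 31 day cases.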

The next function is a bit different: it returns the maximum number of days in the current month. This is useful for a dashboard that shows the current month's data and needs to spread it across all days, even if some of them are still in the future.

Now another, slightly different function that gives you the same information but for the month that has just passed. This is also useful for dashboards on which you want to compare this month's and last month's performance of your services, etc. This function needs to be smarter, as last month could be December of the previous year, which in turn could be 1999 (or 2099). I'm not saying you'll need this function in A.D. 2100, but let's be correct and precise even in hopeless cases, as a matter of good principles ;)

If you want to test this function, I have a bunch of examples for months with 28, 29, 30 and 31 days:

n = GetDate();
//n = 1463305335; // Sun, 15 May 2016 09:42:15 GMT
//n = 1455529335; // Mon, 15 Feb 2016 09:42:15 GMT
//n = 1458034935; // Tue, 15 Mar 2016 09:42:15 GMT
//n = 1426412535; // Sun, 15 Mar 2015 09:42:15 GMT
//n = 953113335; // Wed, 15 Mar 2000 09:42:15 GMT
f_getLastMonthMaxDays(n,ylm,lm,lmmd);
log("ylm:"+ylm+", lm:"+lm+", lmmd:"+lmmd);
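To see what results to expect from those test timestamps, here is a hedged Python sketch (not IPL) of the "last month" logic. The name `last_month_max_days` is made up for this illustration, and the sketch reads the timestamp in GMT, which is an assumption; a policy working in local time could differ for timestamps right at a month boundary.

```python
import calendar
from datetime import datetime, timezone


def last_month_max_days(epoch):
    """Return (year of last month, last month 1-12, days in last month), in GMT."""
    now = datetime.fromtimestamp(epoch, tz=timezone.utc)
    if now.month == 1:
        # January rolls back to December of the previous year
        year, month = now.year - 1, 12
    else:
        year, month = now.year, now.month - 1
    return year, month, calendar.monthrange(year, month)[1]


for n in (1463305335, 1455529335, 1458034935, 1426412535, 953113335):
    print(n, last_month_max_days(n))
```

The five timestamps are the ones listed above, so the printed tuples can be compared directly with the `ylm`, `lm` and `lmmd` values in the policy log.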

3. Duration of days in seconds

It is very useful to be able to calculate the precise duration of a day in seconds.
Someone may say: hey, I know my day is 24 hours long, every hour is 60 minutes long and every minute is 60 seconds long, which gives me 86400 seconds a day!
True, but let's also consider countries in which the time changes to winter time in October (or November) and back to summer time in March (or April).

Therefore, because of the change to Daylight Saving Time or back, days on Earth can be 23, 24 or 25 hours long!

What's the easiest way of calculating the real day duration in seconds?
My answer is: comparing the two UNIX epoch times that mark the beginning and the end of the day. I take the beginning of the day to be 00:00:00. On a 24-hour clock the end of the day is at 23:59:59. Let's skip milliseconds this time.

The function below actually returns two useful values: the day duration in seconds and in hours. This way you'll get days that are 82,800, 86,400 or 90,000 seconds long, that is 23, 24 or 25 hours long respectively:

function f_getMaxDayDurationInSeconds(DayID, DayBeginningEpoch, DayEndEpoch, MaxDayDurationInSeconds, MaxDayFull3600sLongHours) {

   MaxDayDurationInSeconds = Int(DayEndEpoch - DayBeginningEpoch + 1);
   MaxDayFull3600sLongHours = (MaxDayDurationInSeconds/3600);
   log("The day "+DayID+" was "+MaxDayDurationInSeconds+" seconds long.");
   log("The day "+DayID+" was "+MaxDayFull3600sLongHours+" hours long.");
}
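The function above expects the two epoch values as input. As a rough illustration of where they could come from, here is a Python sketch (not IPL) that derives the local 00:00:00 and 23:59:59 epochs for a given day and then applies the same "end minus beginning plus one" arithmetic. The helper names are hypothetical, and `Europe/Warsaw` stands in for the "Poland" zone.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+, relies on the system tz database


def day_bounds_epoch(year, month, day, tz_name):
    """Return (epoch of local 00:00:00, epoch of local 23:59:59) for one day."""
    begin = datetime(year, month, day, tzinfo=ZoneInfo(tz_name))
    next_begin = begin + timedelta(days=1)  # wall-clock midnight of the next day
    # timestamp() applies the UTC offset valid at each midnight separately
    return int(begin.timestamp()), int(next_begin.timestamp()) - 1


def day_duration_in_seconds(begin_epoch, end_epoch):
    # Both bounds are inclusive, hence the +1
    return end_epoch - begin_epoch + 1


for day in [(2016, 3, 27), (2016, 10, 30), (2016, 12, 15)]:
    begin, end = day_bounds_epoch(*day, "Europe/Warsaw")
    seconds = day_duration_in_seconds(begin, end)
    print(day, seconds, "seconds,", seconds / 3600, "hours")
```

In 2016 the clocks in Poland changed on 27 March and 30 October, so those two days are the short and the long one, and 15 December is an ordinary day.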

For calculating the duration of months, which may contain 23-hour or 25-hour days as well as ordinary 24-hour ones, you can use a similar function:

function f_getMaxMonthDurationInSeconds(MonthID,MonthBeginningEpoch,MonthEndEpoch,MaxMonthDurationInSeconds) {
   MaxMonthDurationInSeconds = MonthEndEpoch-MonthBeginningEpoch+1;
   MaxMonthFull24HLongDays = Int(MaxMonthDurationInSeconds/86400);
   MaxMonthFull3600sLongHours = Int(((MaxMonthDurationInSeconds-MaxMonthFull24HLongDays*86400)/3600)+0.5);

   log("The month "+MonthID+" was "+MaxMonthDurationInSeconds+" seconds long.");
   log("The month "+MonthID+" was "+MaxMonthFull24HLongDays+" days and "+MaxMonthFull3600sLongHours+" hours long.");
}
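To make the days-and-hours split easier to follow, here is the same arithmetic as a small Python sketch (not IPL). The function name `split_month_duration` is made up, and rounding the leftover to the nearest hour is my reading of the `+0.5` in the policy above.

```python
def split_month_duration(duration_in_seconds):
    """Split a month duration into whole 24-hour days and leftover whole hours."""
    full_days, remainder = divmod(duration_in_seconds, 86400)
    # Round the leftover seconds to the nearest hour
    full_hours = int(remainder / 3600 + 0.5)
    return full_days, full_hours


# A 31-day month that lost one hour to the spring time change
print(split_month_duration(31 * 86400 - 3600))
# A 31-day month that gained one hour from the autumn time change
print(split_month_duration(31 * 86400 + 3600))
```

A 31-day month with the spring change therefore shows up as 30 days and 23 hours, and one with the autumn change as 31 days and 1 hour.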

This is especially useful for calculating the base for your service availability figures.

I hope you enjoyed this post. If you have more examples, or you find the examples I posted not working or incomplete, please let me know, thanks!

Friday, December 9, 2016

Availability charts in JazzSM based on TBSM service instance status data in Metric History DB

If you're not running IBM Tivoli Monitoring (or APM), and so cannot plug in TBSM's Tivoli Agent or start collecting data in Tivoli Data Warehouse, but would still like to collect this data historically and present useful charts such as Availability [%], outage counts, MTTRs and total outages, see these examples I built for my project. I use the TBSM metric history DB, the Impact UI data provider, JazzSM and, of course, a TBSM service model, that is templates and instances. I don't need ITM/TDW (though I have another version of this solution which can run on ITM/TDW too ;) Let me know if you're interested in implementing these charts in your TBSM/JazzSM!

Availability, outage count, MTTR and outage duration charts built in TBSM and JazzSM.

Thursday, November 10, 2016

Quick cheat sheet on creating an ObjectServer in Windows

Quick creation:

C:\Program Files\IBM\tivoli\netcool\omnibus\bin>nco_dbinit.exe -server COL1

Error Opening Log File C:\Program Files\IBM\tivoli\netcool\omnibus\LOG\NCOMS_audit_file.log
2016-11-10T11:32:48: Information: I-MEM-003-001: Creating 'C:\Program Files\IBM\tivoli\netcool\omnibus\db\COL1'
2016-11-10T11:32:48: Information: I-MEM-003-004: Using lock to chain ratio of '32'.
2016-11-10T11:32:49: Information: I-STO-104-006: Creating internal database persist
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table persist.restrictions
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.connections
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.properties
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.profiles
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.restrictions
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.security_permissions
2016-11-10T11:32:49: Information: I-DBI-003-001: Processing system file: C:\Program Files\IBM\tivoli\netcool\omnibus\ETC\system.sql
2016-11-10T11:32:49: Information: I-STO-104-006: Creating internal database security
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.owners
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.users
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.roles
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.role_grants
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.permissions
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.groups
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.group_members
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table security.restriction_filters
2016-11-10T11:32:49: Information: I-STO-104-006: Creating internal database transfer
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.users
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.roles
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.role_grants
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.permissions
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.groups
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.group_members
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.restrictions
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table transfer.security_restrictions
2016-11-10T11:32:49: Information: I-DBI-003-002: Processing application file: C:\Program Files\IBM\tivoli\netcool\omnibus\ETC\application.sql
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database alerts
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.status
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.journal
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.details
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.objclass
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.objmenus
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.objmenuitems
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.resolutions
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.conversions
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.col_visuals
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.colors
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.iduc_messages
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.application_types
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database service
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table service.status
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database tools
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table tools.actions
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table tools.action_access
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table tools.menus
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table tools.menu_items
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table tools.menu_defs
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table tools.prompt_defs
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database master
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table master.permissions
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table master.profiles
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table master.servergroups
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table master.class_membership
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database iduc_system
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table iduc_system.channel
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table iduc_system.channel_interest
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table iduc_system.channel_summary
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table iduc_system.channel_summary_cols
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table iduc_system.iduc_stats
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database custom
2016-11-10T11:32:49: Information: I-STO-104-005: User root creating database precision
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table precision.entity_service
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table precision.service_details
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table precision.service_affecting_event
2016-11-10T11:32:49: Information: I-DBI-003-003: Processing desktop file: C:\Program Files\IBM\tivoli\netcool\omnibus\ETC\desktop.sql
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table persist.procedures
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.procedures
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.sql_procedures
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.external_procedures
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.procedure_parameters
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table persist.signals
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table persist.triggers
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table persist.trigger_groups
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.primitive_signals
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.primitive_signal_parameters
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.triggers
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.database_triggers
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.signal_triggers
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.temporal_triggers
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.trigger_groups
2016-11-10T11:32:49: Information: I-STO-104-015: Creating table catalog.trigger_stats
2016-11-10T11:32:49: Information: I-DBI-003-004: Processing automation file: C:\Program Files\IBM\tivoli\netcool\omnibus\ETC\automation.sql
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group system_watch
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group connection_watch
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group security_watch
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group default_triggers
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group compatibility_triggers
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group audit_config
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group iduc_triggers
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group gateway_triggers
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group primary_only
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group sae
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger system_watch_startup
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger system_watch_shutdown
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger connection_watch_disconnect
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger connection_watch_connect
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger security_watch_security_failure
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger new_row
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger deduplication
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger state_change
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger deduplicate_details
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger service_insert
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger service_reinsert
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger service_update
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger clean_details_table
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger clean_journal_table
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger delete_clears
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger escalate_off
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger expire
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger flash_not_ack
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.problem_events
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger generic_clear
2016-11-10T11:32:49: Information: I-PRO-007-004: User root creating procedure send_email
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger mail_on_critical
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group automatic_backup_system
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table alerts.backup_state
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger backup_state_integrity
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger automatic_backup
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger backup_succeeded
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger backup_failed
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group profiler_triggers
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger profiler_toggle
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger profiler_report
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger profiler_group_report
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group trigger_stat_reports
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger trigger_stats_report
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table master.stats
2016-11-10T11:32:49: Information: I-STO-104-014: User root creating table master.activity_stats
2016-11-10T11:32:49: Information: I-AUT-104-001: User root creating primitive signal stats_reset
2016-11-10T11:32:49: Information: I-AUT-005-005: User root creating trigger group stats_triggers
2016-11-10T11:32:49: Information: I-AUT-005-015: User root altering trigger group stats_triggers
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger new_status_inserts
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger dedup_status_inserts
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger journal_inserts
2016-11-10T11:32:49: Information: I-AUT-005-009: User root creating database trigger details_inserts
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger statistics_gather
2016-11-10T11:32:49: Information: I-AUT-005-010: User root creating temporal trigger statistics_cleanup
2016-11-10T11:32:49: Information: I-AUT-005-011: User root creating signal trigger stats_reset
2016-11-10T11:32:50: Information: I-STO-104-014: User root creating table alerts.login_failures
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger disable_user
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger reset_user
2016-11-10T11:32:50: Information: I-AUT-005-010: User root creating temporal trigger disable_inactive_users
2016-11-10T11:32:50: Information: I-AUT-005-010: User root creating temporal trigger webtop_compatibility
2016-11-10T11:32:50: Information: I-AUT-005-015: User root altering trigger group audit_config
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger audit_config_create_object
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger audit_config_alter_object
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger audit_config_drop_object
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger audit_config_alter_property
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger audit_config_permission_denied
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_create_class
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_alter_class
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_drop_class
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_create_menu
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_alter_menu
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_drop_menu
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_create_conv
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_alter_conv
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_drop_conv
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_create_col_visual
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_alter_col_visual
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_drop_col_visual
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_create_tool
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_alter_tool
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_drop_tool
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_create_prompt
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_alter_prompt
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger audit_config_drop_prompt
2016-11-10T11:32:50: Information: I-PRO-007-004: User root creating procedure jinsert
2016-11-10T11:32:50: Information: I-AUT-005-010: User root creating temporal trigger iduc_messages_tblclean
2016-11-10T11:32:50: Information: I-PRO-007-004: User root creating procedure automation_disable
2016-11-10T11:32:50: Information: I-PRO-007-004: User root creating procedure automation_enable
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger backup_counterpart_down
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger backup_counterpart_up
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger backup_startup
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger disconnect_iduc_missed
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger iduc_stats_insert
2016-11-10T11:32:50: Information: I-AUT-005-009: User root creating database trigger deduplicate_iduc_stats
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger iduc_stats_update
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger pass_deletes
2016-11-10T11:32:50: Information: I-AUT-005-011: User root creating signal trigger resync_finished
2016-11-10T11:32:50: Information: I-AUT-005-010: User root creating temporal trigger update_service_affecting_events
2016-11-10T11:32:50: Information: I-DBI-003-006: Processing security file: C:\Program Files\IBM\tivoli\netcool\omnibus\ETC\security.sql
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role CatalogUser
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role AlertsUser
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role AlertsProbe
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role AlertsGateway
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role ChannelUser
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role DatabaseAdmin
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role AutoAdmin
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role SecurityAdmin
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role ToolsAdmin
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role DesktopAdmin
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role ISQL
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role ISQLWrite
2016-11-10T11:32:50: Information: I-OBJ-100-008: User root creating role ChannelAdmin
2016-11-10T11:32:50: Information: I-OBJ-100-011: User root creating group Probe
2016-11-10T11:32:50: Information: I-OBJ-100-011: User root creating group Gateway
2016-11-10T11:32:50: Information: I-OBJ-100-011: User root creating group ISQLWrite
2016-11-10T11:32:50: Information: I-OBJ-100-011: User root creating group ISQL
2016-11-10T11:32:50: Information: I-PRO-007-004: User root creating procedure setup_group_conversions
2016-11-10T11:32:50: Information: I-PRO-007-005: User root dropping procedure setup_group_conversions

C:\Program Files\IBM\tivoli\netcool\omnibus\bin>nco_store_resize.exe -server COL1 -messagelevel debug

2016-11-10T11:33:05: Information: I-MEM-003-004: Using lock to chain ratio of '32'.
2016-11-10T11:33:05: Debug: D-ETC-004-049: THREAD MGR: started thread REGIONCHKPT (0100EFF8)
2016-11-10T11:33:05: Debug: D-ETC-004-050: THREAD MGR: thread REGIONCHKPT (0100EFF8) running
2016-11-10T11:33:05: Information: I-REG-002-015: Restoring master_store from TAB file
2016-11-10T11:33:05: Debug: D-REG-002-010: CHKPT: started
2016-11-10T11:33:05: Information: I-REG-005-001: Verifing region 'master_store'...
2016-11-10T11:33:05: Debug: D-REG-002-011: CHKPT: sleeping ...
2016-11-10T11:33:05: Debug: D-REG-002-020: Attempting to sync region "master_store"
2016-11-10T11:33:05: Information: I-STR-001-006: Successfully extended store 'table_store' to 500MB

C:\Program Files\IBM\tivoli\netcool\omnibus\bin>nco_objserv.exe /install /cmdline "-name COL1" /instance COL1

Netcool/OMNIbus Object Server - Version 7.3.1
(C) Copyright IBM Corp. 1994, 2007

Installing the Netcool/OMNIbus Object Server (COL1) service...
The Netcool/OMNIbus Object Server (COL1) service was successfully installed

C:\Program Files\IBM\tivoli\netcool\omnibus\bin>net start NCOObjectServer$COL1

The Netcool/OMNIbus Object Server (COL1) service is starting.
The Netcool/OMNIbus Object Server (COL1) service was started successfully.

C:\Program Files\IBM\tivoli\netcool\omnibus\bin>isql.bat -S COL1 -U root -P ""

1> select count(*) from alerts.status;

2> go

COUNT( * )

-----------

(1 row affected)

Tuesday, June 14, 2016

In my next materials - discovering TBSM in TADDM and uploading back to TBSM

I plan to publish a new post, this time about TBSM as a business application in TADDM, discovered and modeled with the grouping composer in TADDM 7.2.2 and imported via XMLtoolkit to TBSM again for self-monitoring with the Tivoli Agent for TBSM. Stay tuned!

Sunday, May 15, 2016

TBSM 6.1.1 Fix Pack 4 - post-installation manual steps

The new TBSM 6.1.1 Fix Pack 4 was released at the end of April, as I mentioned in my previous post.
You can also find a post about it on the IBM developerWorks site:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Business%20Service%20Manager1/page/Advanced%20Topics

Apart from the obvious fixes, the fix pack contains a few new functions, and some of them require manual installation steps from administrators. Are they worthwhile? Let's check together.

1. IV65545 - TBSMEVENTREADER STOPS BECAUSE EVENT PROCESSOR BLOCKING ON RADEVENTSTORE

If you want to understand the context of the issue behind this fix (APAR), see here:
http://www-01.ibm.com/support/docview.wss?uid=swg1IV65545

Let me quote:

"TBSM stops processing events because it is blocking on the TBSM
database radevent table.  Eventually all connections to the
database are used up and the processing stops.

This causes the event reader for TBSM stops reasding events
because the queue to process them is full - this is usually the
first symptom."

So we have a typical performance issue here: if there are too many events to process because you have too many rules or service instances defined in TBSM, the amount of data that needs to be stored in the RADEVENTSTORE table of the TBSM database (typically a mapping between events and instances) will overwhelm the database if it is left with its default configuration. A new, custom configuration is recommended instead.
The Fix Pack Readme, available here:
http://www-01.ibm.com/support/docview.wss?uid=swg24041505

lacks the Microsoft Windows instructions, so just in case, here's my recipe:
a) click Start menu -> IBM DB2 -> DB2COPY1 (Default) (or whatever your DB2 copy name is)
b) choose Command Window - Administrator
c) in the command line tool, go to the following directory (assuming your TBSM_HOME is in Program Files):

cd "\Program Files\IBM\tivoli\tipv2\profiles\TIPProfile\installedApps\TIPCell\isc.ear\sla.war\install"

C:\Program Files\IBM\tivoli\tipv2\profiles\TIPProfile\installedApps\TIPCell\isc.
ear\sla.war\install>db2 connect to tbsm

   Database Connection Information

Database server        = DB2/NT64 10.1.0
SQL authorization ID   = MPALUCH
Local database alias   = TBSM

C:\Program Files\IBM\tivoli\tipv2\profiles\TIPProfile\installedApps\TIPCell\isc.
ear\sla.war\install>type AddRADEventStoreIndex.sql | db2
(c) Copyright IBM Corporation 1993,2007
Command Line Processor for DB2 Client 10.1.0

You can issue database manager commands and SQL statements from the command
prompt. For example:
    db2 => connect to sample
    db2 => bind sample.bnd

For general help, type: ?.
For command help, type: ? command, where command can be
the first few keywords of a database manager command. For example:
? CATALOG DATABASE for help on the CATALOG DATABASE command
? CATALOG          for help on all of the CATALOG commands.

To exit db2 interactive mode, type QUIT at the command prompt. Outside
interactive mode, all commands must be prefixed with 'db2'.
To list the current command option settings, type LIST COMMAND OPTIONS.

For more detailed help, refer to the Online Reference Manual.

db2 => DB20000I The SQL command completed successfully.
db2 => db2 => SQL2314W Some statistics are in an inconsistent state. The newly
collected
"INDEX" statistics are inconsistent with the existing "TABLE" statistics.
SQLSTATE=01650
db2 => db2 => DB20000I The RUNSTATS command completed successfully.
db2 =>
C:\Program Files\IBM\tivoli\tipv2\profiles\TIPProfile\installedApps\TIPCell\isc.
ear\sla.war\install>

Next, open your TIP with the TBSM and Impact UI, go to System Configuration -> Event Automation and select the Data Model page. Then change the project to TBSM_BASE and find the TBSMDatabase data source.

The last step is setting the lock timeout to 30. Re-use your DB2 session; if you lost it, connect to the TBSM database first and then set the lock timeout:

db2 => connect to tbsm

Database Connection Information

Database server        = DB2/NT64 10.1.0
SQL authorization ID   = MPALUCH
Local database alias   = TBSM

db2 => set current lock timeout 30
DB20000I The SQL command completed successfully.
db2 =>

2. Adding the new "Delete Service Instance" menu action.

Another manual step you want to take is adding this new action item to the right-click menu in your TIP-based ServiceInstance tree template. Removing a selected instance used to be quite painful, especially if you have thousands of instances: going to that big container-like page listing all instances, with check boxes to select those you want to delete, is a hassle. Let's see if setting up the new menu item is easy.
So basically this is what you want in your ServiceInstance.xml at the end:

<?xml version="1.0" encoding="UTF-8"?><treeTemplate name="ServiceInstance">
<columnList>
<column displayName="State"/>
<column displayName="Time"/>
<column displayName="Events"/>
</columnList>
<templateTreeMapping primaryTemplateName="DefaultTag">
<treeColumn displayName="State" attribute="serviceStatusImage"/>
<treeColumn displayName="Time" attribute="slaStatusImage"/>
<treeColumn displayName="Events" attribute="rawEventsImage"/>
<actionMapping clickType="~popupMenu" actionName="ShowServiceInstanceEditor"/>
<actionMapping clickType="~popupMenu" actionName = "DeleteServiceInstance"/>
<actionMapping clickType="~popupMenu" actionName="InstantiateOneHopServiceMap"/>
<actionMapping clickType="~popupMenu" actionName="ChooseChildrenTool"/>
<actionMapping clickType="~popupMenu" actionName="ShowMemberTemplates"/>
<actionMapping clickType="~popupMenu" actionName="ViewTools"/>
<actionMapping clickType="~popupMenu" actionName="IntegrationTools"/>
<actionMapping clickType="~popupMenu" actionName="MaintTools"/>
</templateTreeMapping>
</treeTemplate>

If, for any reason, something like this happens after you have put both artifacts back into DB2 and tried to reinitialize your canvas:
C:\Program Files\IBM\tivoli\tbsm\bin>rad_reinitcanvas.bat tipadmin smartway
Final result - Failure : 1

simply restart your TIPProfile. That will do the job, and afterwards you should see the new menu item in the Service Navigator portlet in TIP while looking at Services:

What is slightly weird is that if you have the service open in edit mode, it stays open in the editor even after you delete it:

No worries though, even if you try to save it, TBSM won't let you.

3. The service SLAPrunePolicyActivator

This one is quite important. You may remember or be aware of the issue explained here:
http://www.ibm.com/support/knowledgecenter/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/customization/bsmc_slac_maint.html

You're safe if you don't use SLA calculations in TBSM. But if you do, this article is nice enough to tell you which tables to clean manually from time to time. So finally we have an automation doing that work for you! Let's see if it's easy to set up.

As a prerequisite, you need to install the new policies shipped with TBSM 6.1.1 FP4 (just one is new, the others come from the previous 6.1.1 Fix Packs). Unfortunately, something went wrong on my TBSM on Windows:

C:\Program Files\IBM\tivoli\tbsm\bin>ApplyPoliciesFor611FP4.bat <user> <pass>
"running rad_discover_schema on SLA_OutageView"
Final result - Failure : 1
"running rad_discover_schema on SLA_ArchiveView"
Final result - Failure : 1
PUSH policy: ResEnrichReaderRecycle.js --project TBSM_BASE
Final result - Failure : 1
PUSH policy: AV_GetColorForSeverity.ipl --project TBSM_BASE
Final result - Failure : 1
PUSH policy: SLAPrune.ipl --project TBSM_BASE
Final result - Failure : 1
trigger policy: ResEnrichReaderRecycle
Final result - Failure : 1
NOTE: If there are failures during the PUSH, it is likely due to policy being l
ocked!
if this is the case, unlock the policy and re-run this script!
C:\Program Files\IBM\tivoli\tbsm\bin>

So I'll use the Impact UI instead:

Once I have the policy uploaded, I configure the service. I think it makes great sense to put that service into the TBSM_BASE project as well. Let's see how it works. The service's log is not very self-explanatory:

May 14, 2016 11:36:22 PM CEST[SLAPrunePolicyActivator]SLAPrunePolicyActivator: executed
May 14, 2016 11:36:22 PM CEST[SLAPrunePolicyActivator]SLAPrunePolicyActivator: started

and the policy log is not very talkative either about the effort or about how many entries there were in the 7 tables it pruned:

14 maj 2016 23:36:22,401: [SLAPrune][pool-7-thread-63]Parser log: Starting policy SLAPrune
14 maj 2016 23:36:22,401: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_DurationCountWatchers with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,417: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_TasksOfDurationCountWatcher with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,417: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_IncidentCountWatchers with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,432: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_TasksOfIncidentCountWatchers with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,432: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_CumulDurationRuleState with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,448: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_CumulRuleArchive with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,448: [SLAPrune][pool-7-thread-63]Parser log: Calling BatchDelete for DataType SLA_EventStore with filter: timestamp < 1460669782401
14 maj 2016 23:36:22,464: [SLAPrune][pool-7-thread-63]Parser log: Finished policy SLAPrune and exiting now...
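
The cutoff in the filter is a UNIX epoch timestamp in milliseconds. As a quick sanity check, here is a small sketch (in Python rather than IPL, and using only the values visible in the logs above) that compares the cutoff with the time the policy started. The +2 hours offset is my assumption for CEST, taken from the service log:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the SLAPrune log above.
CUTOFF_MS = 1460669782401                  # "timestamp < 1460669782401"
CEST = timezone(timedelta(hours=2))        # assumed offset of the log's time zone
run_time = datetime(2016, 5, 14, 23, 36, 22, 401000, tzinfo=CEST)


def cutoff_to_datetime(cutoff_ms, tz):
    """Convert an epoch timestamp in milliseconds to an aware datetime."""
    # Split into whole seconds and milliseconds to avoid float rounding.
    seconds, millis = divmod(cutoff_ms, 1000)
    return datetime.fromtimestamp(seconds, tz) + timedelta(milliseconds=millis)


cutoff = cutoff_to_datetime(CUTOFF_MS, CEST)
retention = run_time - cutoff

print("Cutoff:   ", cutoff.isoformat())
print("Retention:", retention)
```

For this particular run the difference comes out to exactly 30 days, which suggests (but does not prove) that the policy prunes entries older than 30 days.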

Too late I realized that I could have checked the radeventstore data volume before running the service. I could only check it afterwards:

db2 => select count(*) from tbsmbase.radeventstore
1
-----------
0
1 record(s) selected.
db2 =>

Well, next time. I believe it works OK ;)

4. TBSMDBChecker

Last but not least, I'm not sure why the post-install steps are spread across the readme file. This one isn't listed together with those mentioned above, so you need to read the readme quite carefully in order not to miss it.

Copying the TBSM DB checker tools from the Fix Pack zip to the TBSM tbsmdb directory could be done by the installer, but it isn't, maybe because the tbsmdb directory belongs to the TBSM DB management tool and might be owned by a different user than the one running TBSM.

TBSM_Check_DB.bat, TBSM_Check_DB.sh and TBSMDBChecker.props are identical to those you already have. So the only difference is in tbsmDatabaseChecker.xml, which contains a few extra sections related to checking whether the user provided a password. Checking the TBSM DB is a rather simple task and should look like this (if everything works OK):

C:\Program Files\IBM\tivoli\tbsmdb\bin>TBSM_Check_DB.bat
Calling ant script tbsmDatabaseChecker.xml to check the databases
    [input] Enter administrative database userid. This user should have SYSADMIN
or SECADMIN authority and will be used to check authority of data server userid
:
db2admin
Enter the password for administrative database userid db2admin:
    [input] Enter database userid that will be used to connect the database from
the TBSM data server: [db2admin]

Enter the password for data server database userid db2admin:
    [input] Enter database(s) to be checked:
    [input]    S = Service Model
    [input]    H = Metric History
    [input]    M = Metric Marker
    [input] [ALL]
S
     [echo] Executing main TBSM Database checker task at 2016/05/15 00:17:13.994

     [echo] Checking of the following TBSM database was successful: TBSM
     [echo] See the stdout and stderr logs for more detail. The logs can be foun
d in: C:\Program Files\IBM\tivoli\tbsmdb\logs.
     [echo] TBSM Database checker ended at 2016/05/15 00:17:17.075

BUILD SUCCESSFUL
Total time: 25 seconds
C:\Program Files\IBM\tivoli\tbsmdb\bin>

If you want to learn more about the TBSM database checker tool, read this technote:
http://www-01.ibm.com/support/docview.wss?uid=swg24031960

5. Discovery library toolkit and Monitoring Agent for TBSM fix pack installation

The installation itself goes just as described; I'm adding these steps to the list only to make the picture of the manual post-installation steps complete. There's no APAR for the Monitoring Agent for TBSM in Fix Pack 4, hence the one included in Fix Pack 1 is still the latest, and if you have it already installed, you're done with it. As for the discovery library toolkit, there's an update in Fix Pack 4 to install.

That's it. A few manual post-install steps are always easy to overlook when reading a long readme, so I hope someone finds this overview useful.

mp

Friday, May 13, 2016

Triggering TBSM Rules

Introduction

Tivoli Business Service Manager can calculate amazing things for you, whenever you need them. This is thanks to the powerful rules engine, which is the key part of TBSM, as well as the Netcool/Impact policy engine running under the hood of every TBSM edition. You can later present your calculation results on a dashboard or in reports, depending on whether you are after a real-time scorecard or historical KPI reports.

In this article, I’ll show how you can make TBSM process its inputs by triggering various template rules in various ways. This isn’t really well documented, or at least it isn’t well documented in a single place.

Status, numerical and text rules triggers

In this chapter I’ll present three kinds of rules (status, numerical and text) and show how TBSM triggers them, that is, how it processes the input data, runs the rules and returns the outputs.

There are three triggering techniques, and all of them kick off TBSM rules based on the same two conditions: time and a new value. Here they are:

OMNIbus Event Reader service (for incoming status rules, numerical rules or text rules with the OMNIbus ObjectServer as the data feed)
TBSM data fetchers (for numerical or text rules with a fetcher-based data feed)
TBSM/Impact services of type Policy Activator (using an Impact policy with PassToTBSM function calls to send data to numerical or text rules)

Figure 1. Three techniques of triggering status, numerical and text rules in TBSM

Make note. There are also ITM policy fetchers and the barely documented any-policy fetchers configurable in TBSM. I won’t comment on them in this material; their basis, however, is the same as for any fetcher: time.

Let’s take a look at the first type of rule trigger, the most popular one: the OMNIbus Event Reader service in Impact, widely used in TBSM to process events stored in the ObjectServer against the service trees.

OMNIbus Event Reader

The OMNIbus Event Reader is an Impact service that by default runs every 3 seconds and connects to Netcool/OMNIbus to get the events stored in ObjectServer memory that might affect TBSM service tree elements. It selects the events based on the following default filter:

(Class <> 12000) AND (Type <> 2) AND ((Severity <> RAD_RawInputLastValue) or (RAD_FunctionType = '')) AND (RAD_SeenByTBSM = 0) AND (BSM_Identity <> '')

(Severity <> RAD_RawInputLastValue) is the condition ensuring that an event is picked up only if its Severity field contains a new value compared to the previous one.
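
To make the boolean structure of that filter easier to read, here is a minimal sketch in Python (not IPL or ObjectServer SQL). It mirrors only the logic of the filter above; the event dictionaries and the "SomeFunction" value of RAD_FunctionType are hypothetical test data, not values taken from a real ObjectServer:

```python
def matches_default_filter(event):
    """Return True if the event passes the default Event Reader filter.

    `event` is a plain dict standing in for an ObjectServer event row.
    """
    return (
        event["Class"] != 12000
        and event["Type"] != 2
        and (
            event["Severity"] != event["RAD_RawInputLastValue"]
            or event["RAD_FunctionType"] == ""
        )
        and event["RAD_SeenByTBSM"] == 0
        and event["BSM_Identity"] != ""
    )


# Hypothetical event: severity changed from 3 to 4, not yet seen by TBSM.
base = {
    "Class": 0,
    "Type": 1,
    "Severity": 4,
    "RAD_RawInputLastValue": 3,
    "RAD_FunctionType": "SomeFunction",   # placeholder, non-empty
    "RAD_SeenByTBSM": 0,
    "BSM_Identity": "TestInstance",
}

same_severity = dict(base, RAD_RawInputLastValue=4)   # no new value
no_identity = dict(base, BSM_Identity="")             # nothing to match on

print(matches_default_filter(base))            # True
print(matches_default_filter(same_severity))   # False
print(matches_default_filter(no_identity))     # False
```

Note that an unchanged Severity still passes when RAD_FunctionType is empty, because the two conditions are joined with OR.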

The Event Reader itself can be found in the Impact UI on the Services page in TIP, among the other services included in the Impact project called TBSM_BASE:

Figure 2. Configuration of TBSMOMNIbusEventReader service

Make note. TBSM allows you to configure other event readers, but you can use just one at a time.

All incoming status rules use this Event Reader by default. There’s an embedded mapping between the name of the Event Reader and the Data source field in Status/Text/Numerical rules, hence just the “ObjectServer” caption appears in the new rule form:

Figure 3. Screenshot of New Incoming status rule form

Policy activators

Impact services of type Policy Activator simply run a given policy at a defined time interval.

Figure 4. Screenshot of a policy activator service configuration for TBSMTreeRuleHeartbeat

The policy needs to be created first. In order to trigger TBSM rules, the policy has to call the PassToTBSM() function and pass an object as its argument. Let’s say this is my TBSMTreeRulesHeartbeat policy:

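// Current time in seconds, formatted as a local time string
Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");
// Random number, so that a new value is sent on every run
randNum = Random(100000);

// Build the event object; the argument shows up as PollerName in the log
ev=NewEvent("TBSMTreeRuleHeartbeatService");
ev.timestamp = String(Time);
ev.bsm_identity = "AnyChild";
ev.randNum = randNum;
// Send the object to TBSM and log it
PassToTBSM(ev);
log(ev);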

In my example, a new value is generated every time the policy is activated, thanks to the GetDate() and Random() functions. Pay attention to the field called ev.bsm_identity. I’ll be referring to it later on. For simplicity, this field always has the value “AnyChild”.

Make note. Unlike the TBSM OMNIbus Event Reader, neither the policy-activating services nor the policies themselves have to be included in the TBSM_BASE Impact project.

Netcool/Impact policies give you the freedom to reach any data source, via SQL, SOAP, XML, REST, the command line or any other way you like, and to process any data you find useful in TBSM. The only requirement is passing that data to TBSM via the PassToTBSM function.

TBSM data fetchers

TBSM data fetchers combine an Impact policy with a DirectSQL function call and an Impact policy activator service. Additionally, data fetchers have a simple concept of a schedule: you can set a fetcher to run once a day at a specific hour and minute (e.g. 12:00 am daily). A fetcher also allows postponing or bringing forward the next run in case of long-running SQL queries.

Make note. Data fetchers can be SQL fetchers, ITM policy fetchers or any-policy fetchers. Unfortunately, the TBSM GUI was never fully adjusted to reconfigure ITM policy fetchers and was never enabled to configure any-policy fetchers, so for the last two you’ve got to perform a few manual, command-line level steps instead. Fortunately, the PassToTBSM function available in Impact 6.x policies can be used instead, so the any-policy fetchers aren’t that useful anymore.

Every fetcher by default runs every 5 minutes.

Figure 5. Screenshot of data fetcher Heartbeat fetcher

In the example presented in the screenshot above, the data fetcher connects to a DB2 database and runs a DB2 SQL query. A new value is ensured on every run by calling the native DB2 random function. The whole query is the following:

select 100000*rand() as "randNum", 'AnyChild' as "bsm_identity" from sysibm.sysdummy1

Pay attention to the bsm_identity field. It always returns the same value “AnyChild”, just like in the policy explained before.

Triggering status, numerical, text and formula rules

In the previous chapter I presented various methods of triggering rules. It’s now time to show how those triggers work in practice. I’ll create a template with 5 numerical and text rules (I don’t want to change any instance status, so I won’t create any incoming status rule this time) plus 3 supporting formulas, and I’ll present the output values of all those rules on a scorecard in Tivoli Integrated Portal. Below you can see my intended template with rules:

Figure 6. Screenshot of t_triggerrulestest template with rules

OMNIbus Event Reader-based rules

Let’s start with 2 rules utilizing the TBSM OMNIbus Event Reader service. Like I said, I won’t use a status rule; I’ll use one numerical rule and one text rule to return the last event’s severity and the last event’s summary. Before I do that, let me configure my simnet probe, which will be sending random events to my ObjectServer. My future service instance implementing the t_triggerrulestest template will be called TestInstance (or at least it will have such a value in its event identifiers). I want one event of each type, each with a different probability of being sent:

#######################################################################
# Format:
#       Node Type Probability
# where Node => Node name for alarm
#       Type => 0 - Link Up/Down
#               1 - Machine Up/Down
#               2 - Disk space
#               3 - Port Error
#       Probability => Percentage (0 - 100%)
#######################################################################
TestInstance 0 10
TestInstance 1 15
TestInstance 2 20
TestInstance 3 25

Let’s see if it works:

Figure 7. Screenshot of Event Viewer presenting test events sent by simnet probe.

Now my two rules. I marked the important settings in green. My Data Feed is ObjectServer, the source is the SimNet probe, the field containing service identifiers is Node and the output value taken back is Severity:

Figure 8. Screenshot of the rule numr_event_lastseverity form

Same here, but this time the rule is a text rule and I have a fancy output expression:

'AlertGroup: '+AlertGroup+', AlertKey: '+AlertKey+', Summary: '+Summary

The rule itself:

Figure 9. Screenshot of the rule txtr_event_lastsummary

Fetcher-based rule

That was easy. Now something a little bit more complicated: the data fetcher. I already have my data fetcher created and shown earlier in this material, so let’s check in the logs that it works fine, i.e. that it fetches data every 5 minutes:

1463089289272[HeartbeatFetcher]Fetching from TBSMComponentRegistry has started on Thu May 12 23:41:29 CEST 2016

1463089289287[HeartbeatFetcher]Fetched successfully on Thu May 12 23:41:29 CEST 2016 with 1 row(s)

1463089289287[HeartbeatFetcher]Fetching duration: 00:00:00s

1463089289412[HeartbeatFetcher]1 row(s) processed successfully on Thu May 12 23:41:29 CEST 2016. Duration: 00:00:00s. The entire process took 00:00:00s

1463089589412[HeartbeatFetcher]Fetching from TBSMComponentRegistry has started on Thu May 12 23:46:29 CEST 2016

1463089589427[HeartbeatFetcher]Fetched successfully on Thu May 12 23:46:29 CEST 2016 with 1 row(s)

1463089589427[HeartbeatFetcher]Fetching duration: 00:00:00s

1463089589558[HeartbeatFetcher]1 row(s) processed successfully on Thu May 12 23:46:29 CEST 2016. Duration: 00:00:00s. The entire process took 00:00:00s
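
Each log line starts with a UNIX epoch timestamp in milliseconds, so the interval can be checked without reading the dates by eye. The following is just a sketch in Python (not part of TBSM), run against the two "has started" lines copied from the log above; the regular expression is my own guess at the line layout based on this sample only:

```python
import re

# Two "has started" lines copied from the fetcher log above.
LOG = """\
1463089289272[HeartbeatFetcher]Fetching from TBSMComponentRegistry has started on Thu May 12 23:41:29 CEST 2016
1463089589412[HeartbeatFetcher]Fetching from TBSMComponentRegistry has started on Thu May 12 23:46:29 CEST 2016
"""

# Assumed layout: epoch milliseconds, then [fetcher name], then the message.
START_LINE = re.compile(r"^(\d+)\[(\w+)\]Fetching from .* has started")


def start_times_ms(log_text, fetcher):
    """Collect the epoch-millisecond start times of one fetcher's runs."""
    times = []
    for line in log_text.splitlines():
        match = START_LINE.match(line)
        if match and match.group(2) == fetcher:
            times.append(int(match.group(1)))
    return times


starts = start_times_ms(LOG, "HeartbeatFetcher")
# Differences between consecutive runs, converted to seconds.
intervals_s = [(later - earlier) / 1000 for earlier, later in zip(starts, starts[1:])]
print(intervals_s)  # [300.14]
```

So the two runs are 300.14 seconds apart, which matches the default 5 minute interval.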

And the data preview looks good too:

Figure 10. The Heartbeat fetcher output data preview

Next comes just one rule, a numerical rule returning the randNum value. I marked the important settings in green again: I select HeartbeatFetcher as the Data Feed, bsm_identity as the service event identifier and randNum as the output value:

Figure 11. Screenshot of numr_fetcher_randNum rule

Policy activated rules

Last but not least, I will create two rules getting data from my policy activated by my custom Impact service. I showed the policy and the service in the previous chapter, so let’s just make sure they both work OK. This is how the service works: every 5 minutes my policy gets activated, and every time it returns another value in the randNum field:

12 maj 2016 23:41:29,652: [TBSMTreeRulesHeartbeat][pool-7-thread-87]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=71648, timestamp=23:41:29, bsm_identity=AnyChild)

12 maj 2016 23:46:29,673: [TBSMTreeRulesHeartbeat][pool-7-thread-87]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=8997, timestamp=23:46:29, bsm_identity=AnyChild)

12 maj 2016 23:51:29,674: [TBSMTreeRulesHeartbeat][pool-7-thread-91]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=73560, timestamp=23:51:29, bsm_identity=AnyChild)

12 maj 2016 23:56:29,700: [TBSMTreeRulesHeartbeat][pool-7-thread-91]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=60770, timestamp=23:56:29, bsm_identity=AnyChild)

13 maj 2016 00:01:29,724: [TBSMTreeRulesHeartbeat][pool-7-thread-92]Parser log: (PollerName=TBSMTreeRuleHeartbeatService, randNum=55928, timestamp=00:01:29, bsm_identity=AnyChild)

Let’s create the rules then. I will have two rules again, one numerical and one text. The numerical rule will have TBSMTreeRuleHeartbeatService as the Data Feed, the bsm_identity field selected as the service event identifier field and the randNum field as my output:

Figure 12. Screenshot of numr_heartbeat_randNum rule

Make note. Every time you add another field to your policy activated by your service, make sure that new field is mapped to the right data type in the Customize Fields form. You will need to add that field first:

Figure 13. Screenshot of CustomizedFields form

The second rule looks as follows. This time it is a text rule and I return the timestamp value:

Figure 14. Screenshot of txtr_heartbeat_lasttime rule

Formula rules

The last rules I’ll create will be three policy-based text formula rules. Each of them will refer to other rules created previously and “spy” on their activity. Let’s see the first example:

Figure 15. Screenshot of nfr_triggered_by_events rule

This rule will use a policy and will be a text rule. It is important to tick those two fields before continuing, as the Text Rule field later greys out and becomes inactive. After ticking both fields, I click the Edit policy button. All three rules will look the same at this level, hence I won’t include all 3 screenshots, as just their names will differ. I’ll create a separate policy in IPL for each of them. Here’s the mapping:

Rule name	Policy name
nfr_triggered_by_events	p_triggered_by_events
nfr_triggered_by_fetcher	p_triggered_by_fetcher
nfr_triggered_by_service	p_triggered_by_service

The three policies will look similar; each will just watch different rules created so far. The p_triggered_by_events policy will do this:

// Trigger numr_event_lastseverity
// Trigger txtr_event_lastsummary

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

Status = Time;
log("TestInstance triggered by SimNet events at "+Time);
log("Output value of numr_event_lastseverity: "+InstanceNode.numr_event_lastseverity.Value);
log("Output value of txtr_event_lastsummary: "+InstanceNode.txtr_event_lastsummary.Value);

Policy p_triggered_by_fetcher will do this:

// Trigger numr_fetcher_randnum

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

Status = Time;
log("TestInstance triggered by HeartbeatFetcher at "+Time);
log("Output value of numr_fetcher_randnum: "+InstanceNode.numr_fetcher_randnum.Value);

And policy p_triggered_by_service this:

// Trigger numr_heartbeat_randnum
// Trigger txtr_heartbeat_lasttime

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");

Status = Time;
log("TestInstance triggered by TBSMHeartbeatService at "+Time);
log("Output value of numr_heartbeat_randnum: "+InstanceNode.numr_heartbeat_randnum.Value);
log("Output value of txtr_heartbeat_lasttime: "+InstanceNode.txtr_heartbeat_lasttime.Value);

You can see that each policy starts with a comment section. This is important, because this is how the formula rules get triggered. It is enough to mention another rule by its name in a comment to trigger your formula every time that referred rule returns a new output value. This is why most of the formulas refer to the randnum-related rules: those rules are designed to return a different value every time they run. Only the first formula is different, as it has no randnum rule, but I assume it will trigger every time the combination of the Summary, AlertGroup and AlertKey field values in the source event changes.

The triggering numerical and text rules are also mentioned later in these policies, where their output values are obtained, e.g. to put them into the log file. But that is not necessary to trigger my formulas. I log the outputs of those triggering rules for troubleshooting purposes only.
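
How TBSM really detects these references is not something I can show here, so treat the following only as a conceptual sketch in Python of the idea "a rule name mentioned anywhere in the policy text, comments included, counts as a reference". The function name, the whole-word matching and the case-insensitive comparison are all my assumptions:

```python
import re

# Rule names from the test template used in this article.
KNOWN_RULES = [
    "numr_event_lastseverity",
    "txtr_event_lastsummary",
    "numr_fetcher_randNum",
    "numr_heartbeat_randNum",
    "txtr_heartbeat_lasttime",
]

# Shortened copy of the p_triggered_by_service policy text.
POLICY_TEXT = """\
// Trigger numr_heartbeat_randnum
// Trigger txtr_heartbeat_lasttime

Seconds = GetDate();
Time = LocalTime(Seconds, "HH:mm:ss");
Status = Time;
"""


def referenced_rules(policy_text, known_rules):
    """Return the known rule names that appear anywhere in the policy text.

    Comments are deliberately not stripped, and names are compared
    case-insensitively as whole words (both are assumptions of this sketch).
    """
    found = []
    for rule in known_rules:
        pattern = r"\b" + re.escape(rule) + r"\b"
        if re.search(pattern, policy_text, flags=re.IGNORECASE):
            found.append(rule)
    return found


print(referenced_rules(POLICY_TEXT, KNOWN_RULES))
```

In this model the two heartbeat rules are found purely because of the comment lines, which is the effect described above.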

The purpose of these 3 policies and 3 formulas is to report the time when the numerical or text rules last fired.

Below you can see an example of one of the policies in actual form.

Figure 16. Screenshot of one of the policies text

Testing the triggers

Now it’s time to test the trigger rules and the triggers, and to troubleshoot in case of problems.

Triggering rules in normal operations

In order to do that we will need a service instance implementing our newly created template. I call it TestInstance and this is its base configuration:

Figure 17. Screenshot of configuration of service instance TestInstance – templates

It is important to make sure that the right event identifiers are selected in the Identification Fields tab. I need to remember what bsm_identity I set in all rules: it is AnyChild (the policy and the fetcher) and TestInstance (the SimNet probe).

Figure 18. Screenshot of configuration of service instance TestInstance – identifiers

Make note. In real life your instance will have its individual identifiers like a TADDM GUID or a ServiceNow sys_id. It is important to find a match between that value and the affecting events or matching KPIs and, if necessary, to define new identifiers that will ensure such a match.

Let’s see if it works in general. I created a scorecard and a page to present all values of my new instance. On top I’ll also put fragments of my formula-related policy logs to see if the data returned in the policies and the timestamps match:

Figure 19. Screenshot of the scorecard with policy logs on top

Let’s take a closer look at the first section. The same event arrived just once, but since the formula is triggered by two rules, it was triggered twice in a row. The last event arrived at 20:27:00, its severity was 4 (major) and the summary was about Link Down. Both rules, numr_event_lastseverity and txtr_event_lastsummary, triggered my formula correctly.

The next section is about the fetcher. The latest random number is 16589,861 and the rule numr_fetcher_randnum triggered my formula correctly.
The last section is the policy-activated rules and formula. This time I have two rules again and they both triggered the formula correctly. The last run was at 20:26:30. I have two different randnum values in the two runs. This is caused by referring to numerical rules twice in my formula policy.

Triggering rules after TBSM server restart

I’ll now show a problem that TBSM has with rules that are not triggered by any trigger. Like I said in the previous chapters, TBSM needs a rule to be triggered every now and then, and also the value to change between triggers, in order to return the value again.

This causes some issues when the TBSM server is restarted. If a value hasn’t changed before the restart and is still the same after it, TBSM may be unable to display or return it correctly if the rule used to return it is not triggered. A server restart clears TBSM memory, so no rule output values are preserved across the restart.
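
The behavior described above can be pictured with a toy model. This is a sketch in Python of my mental model only, not of how TBSM is implemented: rule outputs live in memory, a rule is recomputed only when one of its triggers fires, and a restart empties the memory. All class, rule and trigger names here are hypothetical:

```python
class ToyRuleEngine:
    """Toy model: outputs are kept in memory and recomputed only on a trigger."""

    def __init__(self, formulas):
        # formulas: rule name -> (function computing the output, set of trigger names)
        self.formulas = formulas
        self.outputs = {}  # in-memory only, lost on restart

    def fire(self, trigger):
        """Recompute every formula that lists `trigger` among its triggers."""
        for name, (compute, triggers) in self.formulas.items():
            if trigger in triggers:
                self.outputs[name] = compute()

    def restart(self):
        """Simulate a server restart: all cached outputs are dropped."""
        self.outputs = {}

    def value(self, name):
        # An output not computed since the restart is shown as 0.
        return self.outputs.get(name, 0)


children = ["child1", "child2", "child3"]
engine = ToyRuleEngine({
    "untriggered": (lambda: len(children), {"rule_saved"}),
    "triggered": (lambda: len(children), {"rule_saved", "heartbeat"}),
})

engine.fire("rule_saved")   # saving a rule computes it once
before = (engine.value("untriggered"), engine.value("triggered"))
engine.restart()
engine.fire("heartbeat")    # a periodic trigger keeps firing after the restart
after = (engine.value("untriggered"), engine.value("triggered"))
print(before, after)        # (3, 3) (0, 3)
```

In this model only the formula that lists a periodic trigger recovers its value after the restart, while the other one stays at 0 until something triggers it again.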

Here’s an example. I’ll create one new formula rule with this policy in my test template:

Status = ServiceInstance.NUMCHILDREN;

log("Service instance "+ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+") has children: "+Status);

Here’s the rule itself:

Figure 20. Screenshot of nfr_numchildren rule configuration

As the next step, I add one more column to my scorecard to show the output of the newly created rule. I also created 3 service instances and made them children of TestInstance.

Figure 21. Screenshot of the scorecard shows 3 children count

Also my formula policy log will return number 3:

13 maj 2016 12:17:56,664: [p_numchildren][pool-7-thread-34 [TransBlockRunner-2]]Parser log: Service instance TestInstance (TestInstance) has children: 3

Now, if I just restart the TBSM server, the value shown will be 0 and I will see no new entry in the log:

Figure 22. Screenshot of the scorecard after server restart shows 0 children

I can change this situation by taking one of three actions:

Adding new or removing old child instances from TestInstance
Modifying the formula policy
Introducing a trigger to the formula policy

However, the first two options don’t protect me from another server restart.

Let’s say I add another child instance. This is what the scorecard will look like:

Before the restart	However, after the restart

Or I may want to modify my rule. After saving my changes, the value will display correctly. However another server restart will reset it back to 0 again anyway.

So let’s say I change my policy to this:

Status = ServiceInstance.NUMCHILDREN;

log("Service instance "+ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+") has children: "+Status);

log("Service instance ID: "+ServiceInstance.SERVICEINSTANCEID);

And my policy log now contains two entries per run:

13 maj 2016 12:56:07,023: [p_numchildren][pool-7-thread-4 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4

13 maj 2016 12:56:07,023: [p_numchildren][pool-7-thread-4 [TransBlockRunner-1]]Parser log: Service instance ID: 163

But the situation before and after the restart is the same:

Before the restart	After the restart

It’s not a frequent situation though. If your rules are normally event-triggered or data fetcher-triggered, you can expect frequent updates to your output values even after your TBSM server restarts. But if you want to present an output value in a rule that normally is not triggered, make sure you include a reference to a trigger in your rule. Let’s use some of the triggers we configured previously in my new formula policy:

// Trigger by numr_fetcher_randnum
// Trigger by numr_heartbeat_randnum

Status = ServiceInstance.NUMCHILDREN;
log("Service instance "+ServiceInstance.SERVICEINSTANCENAME+" ("+ServiceInstance.DISPLAYNAME+") has children: "+Status);
log("Service instance ID: "+ServiceInstance.SERVICEINSTANCEID);

By following the policy log you can already notice that there are many entries, precisely one pair for each time the formula was triggered by one of the trigger rules.

The first pair of entries was added after saving the rule. The next 2 pairs were added as a result of the triggers working fine:

13 maj 2016 13:22:36,837: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4

13 maj 2016 13:22:36,837: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance ID: 163

13 maj 2016 13:24:12,833: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4

13 maj 2016 13:24:12,833: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance ID: 163

13 maj 2016 13:24:18,465: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance TestInstance (TestInstance) has children: 4

13 maj 2016 13:24:18,465: [p_numchildren][pool-7-thread-3 [TransBlockRunner-1]]Parser log: Service instance ID: 163

Let’s make the final test, a TBSM server restart:

Before the restart	After the restart

This exercise ends my material for tonight. In another material I'll continue with triggering the status propagation rules and numeric aggregation rules. See you soon!
mp