Feature #1367

add notifications to stalking hosts/services, not only logging/event handlers

Added by dnsmichi about 3 years ago. Updated over 2 years ago.

Status: Resolved
Start date: 03/30/2011
Priority: Normal
Due date:
Assignee: dnsmichi
% Done: 100%

Category: Notifications
Target version: Icinga 1.x - 1.6

Description

currently, stalked hosts/services are matched like this:

if(temp_service->state_type==HARD_STATE && state_change==FALSE && state_was_logged==FALSE && compare_strings(old_plugin_output,temp_service->plugin_output)){
  • HARD state
  • no state change
  • state not logged
  • output changed

if this is true, the next decision depends on

  • state
  • stalk_on_$state

if this matches, the event is logged.

log_service_event(temp_service);
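The matching rules above can be sketched as a small standalone function. This is only an illustration: `struct mini_service`, `is_stalking_event` and the constant values are assumptions made for the sketch, not the core's actual definitions, and `strcmp` stands in for `compare_strings`.

```c
#include <string.h>

/* Placeholder constants for illustration; the real ones live in the core headers. */
#define FALSE 0
#define TRUE 1
#define SOFT_STATE 0
#define HARD_STATE 1

enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical, trimmed-down stand-in for the service object. */
struct mini_service {
    int state_type;
    int current_state;
    int stalk_on_ok;
    int stalk_on_warning;
    int stalk_on_unknown;
    int stalk_on_critical;
    const char *plugin_output;
};

/* TRUE when the output changed while the service stayed in the same
 * HARD state, nothing was logged yet, and stalking is enabled for
 * the current state. */
int is_stalking_event(const struct mini_service *svc, int state_change,
                      int state_was_logged, const char *old_plugin_output)
{
    if (svc->state_type != HARD_STATE || state_change != FALSE ||
        state_was_logged != FALSE)
        return FALSE;

    /* unchanged output is not a stalking event */
    if (strcmp(old_plugin_output, svc->plugin_output) == 0)
        return FALSE;

    switch (svc->current_state) {
    case STATE_OK:       return svc->stalk_on_ok ? TRUE : FALSE;
    case STATE_WARNING:  return svc->stalk_on_warning ? TRUE : FALSE;
    case STATE_UNKNOWN:  return svc->stalk_on_unknown ? TRUE : FALSE;
    case STATE_CRITICAL: return svc->stalk_on_critical ? TRUE : FALSE;
    default:             return FALSE;
    }
}
```

With this shape, the snmp uptime case reads as: same HARD OK state, different output, `stalk_on_ok` set, so the event qualifies.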

if stalking event handlers are enabled, you can use an event handler assigned to the host/service.

but what if you want to get notified about that? current example: snmp uptime. it will always stay in the same state, but the output will change. such things are interesting, even if they won't generate a normal alarm.

so by adding an

service_notification(temp_service,NOTIFICATION_NORMAL,NULL,NULL,NOTIFICATION_OPTION_NONE);

it should go through the normal notification logic and checks. although it needs to be evaluated which temp_service attributes need to be set for a notification to match.

and it should be made a cfg option.

as a side note, stalking is only enabled for HARD states, but what if the service is always OK and remains in a SOFT state? it would be worthwhile to test if stalking is also possible for soft states!


Related issues

Related to Core - Bug #1744: reduce notification load by moving notification viability... Resolved 07/22/2011
Related to Docs - Feature #2046: add stalking notifications (icinga.cfg) Resolved 11/01/2011

Associated revisions

Revision 17628022
Added by dnsmichi over 2 years ago

  • core: add notifications to stalking hosts/services, not only logging/event handlers #1367

for further details, please refer to
https://dev.icinga.org/issues/1367

refs #1367

History

#1 Updated by dnsmichi almost 3 years ago

  • Target version changed from 1.4 to 1.5

needs more investigation.

#2 Updated by dnsmichi almost 3 years ago

  • Target version changed from 1.5 to 1.6

#3 Updated by dnsmichi over 2 years ago

this must pass the viability checks for the host/service notification, and if so you could even control that per contact using a new notification_option: if this is set, the contact gets added to the notification list in memory, and after that the notification is actually invoked.

if we make it a global option, we have to make sure that it passes everything else.

#4 Updated by ares over 2 years ago

Any news? I'd vote +1 for this. We need this for raid array monitoring (when a service is acked because of 1 disk failure and another disk gets broken, we must receive a notification - based just on output change, not state change)

#5 Updated by dnsmichi over 2 years ago

i'm still undecided whether to make it a config option like

stalking_notifications_for_hosts
stalking_notifications_for_services

or similar in icinga.cfg (like stalking event handlers), so that every stalked host or service would trigger, or to do it the notification_options way and add a new option that allows such a filter just for chosen hosts or services.

stalking event handlers have one major advantage - you can already define an event handler. for a notification, the exact state must have been matched.

on the stalking side of life you have something like

  • hosts
    • stalking_options [o,d,u]
  • services
    • stalking_options [o,w,u,c]

and if stalking for that state is enabled, the notification pattern must match too, like a normal notification for that host/service.

so e.g. when enabling stalking on a critical service, the associated contacts receiving a notification must then be able to pass viability checks like normal. this is where special contacts just for stalking might be needed. so adding that will require some doc updates too.

and, what bugs me the most - how to identify a stalking notification? populate the subject differently based on the host being stalked?

#6 Updated by dnsmichi over 2 years ago

this requires a new notification type to be sent, defined in icinga.h

#define NOTIFICATION_STALKING           16211

will be passed when calling host|service_notification() function

#7 Updated by dnsmichi over 2 years ago

we're keeping the rest of the notifications safe,

        /* should the notification number be increased? */
        if (type == NOTIFICATION_NORMAL || (options & NOTIFICATION_OPTION_INCREMENT)) {

using our own type, and NOTIFICATION_OPTION_NONE
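A minimal sketch of why a separate type leaves the notification counter alone. Only the value 16211 comes from this issue; the other constant values and the helper name `next_notification_number` are assumptions for illustration.

```c
/* Placeholder values for illustration, except NOTIFICATION_STALKING. */
#define NOTIFICATION_NORMAL            0
#define NOTIFICATION_STALKING          16211
#define NOTIFICATION_OPTION_NONE       0
#define NOTIFICATION_OPTION_INCREMENT  4

/* Hypothetical helper: returns the notification number after an attempt.
 * Only normal notifications, or attempts that explicitly ask for an
 * increment, bump the counter. */
int next_notification_number(int current, int type, int options)
{
    if (type == NOTIFICATION_NORMAL || (options & NOTIFICATION_OPTION_INCREMENT))
        return current + 1;
    return current;
}
```

Passing `NOTIFICATION_STALKING` together with `NOTIFICATION_OPTION_NONE` leaves the number unchanged, so a stalking notification does not disturb the numbering that normal notifications and escalations rely on.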

#8 Updated by dnsmichi over 2 years ago

furthermore, we can't actually pass the notification viability checks. a stalking notification without any proper handling would result in the following

[1320141124.052582] [032.0] [pid=5989] ** Service Notification Attempt ** Host: '1367_host_001', Service: '1367_ok_01', Type: 16211, Options: 0, Current State: 0, Last Notification: Thu Jan  1 01:00:00 1970
[1320141124.052607] [032.1] [pid=5989] We shouldn't notify about this recovery.
[1320141124.052612] [032.0] [pid=5989] Notification viability test failed.  No notification will be sent out.

so in order to make this happen, we change the code in base/notifications.c a bit and handle NOTIFICATION_STALKING the same as NOTIFICATION_CUSTOM - it won't hit the place for checking if a NOTIFICATION_NORMAL either way.
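The idea can be sketched roughly as follows. This is a simplified model, not the code in base/notifications.c: the function name, the `problem_was_notified` parameter and all constant values except 16211 are assumptions, and the recovery filter shown is only one plausible reading of the "We shouldn't notify about this recovery." message above.

```c
/* Placeholder values for illustration, except NOTIFICATION_STALKING. */
#define NOTIFICATION_NORMAL    0
#define NOTIFICATION_CUSTOM    99
#define NOTIFICATION_STALKING  16211
#define STATE_OK               0

/* Hypothetical, reduced viability check: returns 1 if the notification
 * may go out, 0 otherwise. */
int notification_is_viable(int type, int current_state, int problem_was_notified)
{
    /* special types leave before the filters meant for normal notifications */
    if (type == NOTIFICATION_CUSTOM || type == NOTIFICATION_STALKING)
        return 1;

    /* assumed recovery filter: an OK state is only worth a notification
     * if a problem notification went out before */
    if (current_state == STATE_OK && !problem_was_notified)
        return 0;

    return 1;
}
```

In this model a stalked service that has always been OK fails the recovery filter as a normal notification, but passes once its type is handled like a custom one.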

#9 Updated by dnsmichi over 2 years ago

  • % Done changed from 0 to 50

how to test

install the testconfig from the wiki.

icinga.cfg

stalking_notifications_for_hosts=1
stalking_notifications_for_services=1

1367.cfg

# templates
define contact{
        name                            generic-contact-1367         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }

define host{
  name                           generic-host-1367
  check_interval                 1
  check_period                   24x7
  event_handler_enabled          1
  failure_prediction_enabled     1
  flap_detection_enabled         1
  max_check_attempts             5
  notification_interval          0
  notification_options           d,u,r,f
  notification_period            24x7
  notifications_enabled          1
  process_perf_data              1
  register                       0
  retain_nonstatus_information   1
  retain_status_information      1
  retry_interval                 1
}

define service{
  name                           generic-service-1367
  active_checks_enabled          1
  check_freshness                0
  check_interval                 1
  check_period                   24x7
  event_handler_enabled          1
  failure_prediction_enabled     1
  flap_detection_enabled         1
  is_volatile                    0
  max_check_attempts             3
  notification_interval          0
  notification_options           w,u,c,r,f
  notification_period            24x7
  notifications_enabled          1
  obsess_over_service            1
  parallelize_check              1
  passive_checks_enabled         1
  process_perf_data              1
  register                       0
  retain_nonstatus_information   1
  retain_status_information      1
  retry_interval                 1
}

# hosts

define host{
  use                            generic-host-1367
  host_name                      1367_host_001
  address                        127.0.0.1
  alias                          1367_up_001
  check_command                  test-check-host-alive!up
  check_period                   24x7
  # stalk on up (o), down (d), unreachable (u)
  stalking_options               o,d,u
  # simulate passive check with output change only
  active_checks_enabled          0
  passive_checks_enabled         1
  # set special contactgroup
  contact_groups                 test_group_1367_hosts
}

# services

define service{
  service_description            1367_ok_01
  host_name                      1367_host_001
  use                            generic-service-1367
  check_command                  check_service!ok
  # stalk it ok, warning, unknown, critical
  stalking_options               o,w,u,c
  # simulate passive check with output change only
  active_checks_enabled          0
  passive_checks_enabled         1
  # set special contactgroup
  contact_groups                 test_group_1367_services
}

define service{
  service_description            1367_ok_02
  host_name                      1367_host_001
  use                            generic-service-1367
  check_command                  check_service!ok
  # stalk it ok, warning, unknown, critical
  stalking_options               o,w,u,c
  # simulate passive check with output change only
  active_checks_enabled          0
  passive_checks_enabled         1
  # set special contactgroup
  contact_groups                 test_group_1367_services
}

# contacts
define contactgroup{
        contactgroup_name       test_group_1367_hosts
        alias                   stalking notifications test
        members                 test_contact_1367_stalk_all,test_contact_1367_stalk_hosts
        }

define contactgroup{
        contactgroup_name       test_group_1367_services
        alias                   stalking notifications test
        members                 test_contact_1367_stalk_all,test_contact_1367_stalk_services
        }

define contact{
        contact_name            test_contact_1367_stalk_all
        use                     generic-contact-1367
        alias                   1367_stalk_all
        email                   root@localhost
        }

define contact{
        contact_name            test_contact_1367_stalk_hosts
        use                     generic-contact-1367
        alias                   1367_stalk_all
        email                   root@localhost
        }

define contact{
        contact_name            test_contact_1367_stalk_services
        use                     generic-contact-1367
        alias                   1367_stalk_all
        email                   root@localhost
        }

  • send custom check results via cmd.cgi / gui, and only change the check output, nothing else.
  • tail -f /var/log/messages | grep 1367
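If you would rather script the check results than click through cmd.cgi, the external command line seen in the syslog output can be built by hand. The sketch below only formats the line in the `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output` form; the function name is an assumption, and writing the line to the external command file of your installation is left out on purpose.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats one passive service check result as an
 * external command line. Returns 0 on success, -1 if the buffer is too
 * small or formatting failed. */
int format_passive_service_result(char *buf, size_t len, time_t when,
                                  const char *host_name,
                                  const char *service_description,
                                  int return_code, const char *plugin_output)
{
    int n = snprintf(buf, len, "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
                     (long)when, host_name, service_description,
                     return_code, plugin_output);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

For the test here, keep the return code at 0 and vary only the last field, so that nothing but the check output changes between submissions.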

output

Nov  1 11:01:37 imagine icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;1367_host_001;1367_ok_01;0;change my output2|
Nov  1 11:01:37 imagine icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;1367_host_001;1367_ok_02;0;change my output2|
Nov  1 11:01:44 imagine icinga: PASSIVE SERVICE CHECK: 1367_host_001;1367_ok_01;0;change my output2
Nov  1 11:01:44 imagine icinga: SERVICE ALERT: 1367_host_001;1367_ok_01;OK;HARD;1;change my output2
Nov  1 11:01:44 imagine icinga: SERVICE NOTIFICATION: test_contact_1367_stalk_services;1367_host_001;1367_ok_01;OK;notify-service-by-email;change my output2
Nov  1 11:01:44 imagine icinga: SERVICE NOTIFICATION: test_contact_1367_stalk_all;1367_host_001;1367_ok_01;OK;notify-service-by-email;change my output2
Nov  1 11:01:44 imagine icinga: PASSIVE SERVICE CHECK: 1367_host_001;1367_ok_02;0;change my output2
Nov  1 11:01:44 imagine icinga: SERVICE ALERT: 1367_host_001;1367_ok_02;OK;HARD;1;change my output2
Nov  1 11:01:44 imagine icinga: SERVICE NOTIFICATION: test_contact_1367_stalk_services;1367_host_001;1367_ok_02;OK;notify-service-by-email;change my output2
Nov  1 11:01:44 imagine icinga: SERVICE NOTIFICATION: test_contact_1367_stalk_all;1367_host_001;1367_ok_02;OK;notify-service-by-email;change my output2

from the debuglog, it looks different than previous debug logs because we already changed the notification viability in the git tree with #1744

[1320141704.275686] [032.0] [pid=11833] ** Service Notification Attempt ** Host: '1367_host_001', Service: '1367_ok_01', Type: 16211, Options: 0, Current State: 0, Last Notification: Thu Jan  1 01:00:00 1970
[1320141704.275702] [032.0] [pid=11833] Notification viability test passed.
[1320141704.275707] [032.1] [pid=11833] Current notification number: 0 (unchanged)
[1320141704.275710] [032.2] [pid=11833] Creating list of contacts to be notified.
[1320141704.275714] [032.1] [pid=11833] Service notification will NOT be escalated.
[1320141704.275718] [032.1] [pid=11833] Adding normal contacts for service to notification list.
[1320141704.275722] [032.2] [pid=11833] Adding members of contact group 'test_group_1367_services' for service to notification list.
[1320141704.275725] [032.2] [pid=11833] ** Checking service notification viability for contact 'test_contact_1367_stalk_all'...
[1320141704.275731] [032.2] [pid=11833] Adding contact 'test_contact_1367_stalk_all' to notification list.
[1320141704.275735] [032.2] [pid=11833] ** Checking service notification viability for contact 'test_contact_1367_stalk_services'...
[1320141704.275740] [032.2] [pid=11833] Adding contact 'test_contact_1367_stalk_services' to notification list.
[1320141704.275750] [032.2] [pid=11833] ** Notifying contact 'test_contact_1367_stalk_services'
[1320141704.275761] [032.2] [pid=11833] Raw notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
[1320141704.275785] [032.2] [pid=11833] Processed notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: STALKING\n\nService: 1367_ok_01\nHost: 1367_up_001\nAddress: 127.0.0.1\nState: OK\n\nDate/Time: Tue Nov 1 11:01:44 CET 2011\n\nAdditional Info:\n\nchange my output2\n" | @MAIL_PROG@ -s "** STALKING Service Alert: 1367_up_001/1367_ok_01 is OK **" root@localhost
[1320141704.301726] [032.2] [pid=11833] ** Notifying contact 'test_contact_1367_stalk_all'
[1320141704.301774] [032.2] [pid=11833] Raw notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
[1320141704.301919] [032.2] [pid=11833] Processed notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: STALKING\n\nService: 1367_ok_01\nHost: 1367_up_001\nAddress: 127.0.0.1\nState: OK\n\nDate/Time: Tue Nov 1 11:01:44 CET 2011\n\nAdditional Info:\n\nchange my output2\n" | @MAIL_PROG@ -s "** STALKING Service Alert: 1367_up_001/1367_ok_01 is OK **" root@localhost
[1320141704.326601] [032.0] [pid=11833] 2 contacts were notified.
[1320141704.326968] [032.0] [pid=11833] ** Service Notification Attempt ** Host: '1367_host_001', Service: '1367_ok_02', Type: 16211, Options: 0, Current State: 0, Last Notification: Thu Jan  1 01:00:00 1970
[1320141704.326996] [032.0] [pid=11833] Notification viability test passed.
[1320141704.327008] [032.1] [pid=11833] Current notification number: 0 (unchanged)
[1320141704.327020] [032.2] [pid=11833] Creating list of contacts to be notified.
[1320141704.327031] [032.1] [pid=11833] Service notification will NOT be escalated.
[1320141704.327043] [032.1] [pid=11833] Adding normal contacts for service to notification list.
[1320141704.327053] [032.2] [pid=11833] Adding members of contact group 'test_group_1367_services' for service to notification list.
[1320141704.327064] [032.2] [pid=11833] ** Checking service notification viability for contact 'test_contact_1367_stalk_all'...
[1320141704.327082] [032.2] [pid=11833] Adding contact 'test_contact_1367_stalk_all' to notification list.
[1320141704.327095] [032.2] [pid=11833] ** Checking service notification viability for contact 'test_contact_1367_stalk_services'...
[1320141704.327112] [032.2] [pid=11833] Adding contact 'test_contact_1367_stalk_services' to notification list.
[1320141704.327142] [032.2] [pid=11833] ** Notifying contact 'test_contact_1367_stalk_services'
[1320141704.327164] [032.2] [pid=11833] Raw notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
[1320141704.327249] [032.2] [pid=11833] Processed notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: STALKING\n\nService: 1367_ok_02\nHost: 1367_up_001\nAddress: 127.0.0.1\nState: OK\n\nDate/Time: Tue Nov 1 11:01:44 CET 2011\n\nAdditional Info:\n\nchange my output2\n" | @MAIL_PROG@ -s "** STALKING Service Alert: 1367_up_001/1367_ok_02 is OK **" root@localhost
[1320141704.347860] [032.2] [pid=11833] ** Notifying contact 'test_contact_1367_stalk_all'
[1320141704.347904] [032.2] [pid=11833] Raw notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
[1320141704.348015] [032.2] [pid=11833] Processed notification command: /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: STALKING\n\nService: 1367_ok_02\nHost: 1367_up_001\nAddress: 127.0.0.1\nState: OK\n\nDate/Time: Tue Nov 1 11:01:44 CET 2011\n\nAdditional Info:\n\nchange my output2\n" | @MAIL_PROG@ -s "** STALKING Service Alert: 1367_up_001/1367_ok_02 is OK **" root@localhost
[1320141704.360213] [032.0] [pid=11833] 2 contacts were notified.

actually, the notification script would need a separate handler for the notification type, so that STALKING could be wrapped somehow. the problem would remain that the core can't keep the current and previous check output and put them into a macro that is available to notifications.
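One way such a handler could look, sketched in C for consistency with the core even though notification commands are usually shell one-liners. The function name and the wording of the STALKING subject are made up for illustration; the default subject follows the mail command shown in the debug log.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subject builder: gives STALKING its own wording and keeps
 * the default subject for every other notification type. Returns the
 * snprintf result. */
int build_subject(char *buf, size_t len, const char *notification_type,
                  const char *host_alias, const char *service_desc,
                  const char *service_state)
{
    if (strcmp(notification_type, "STALKING") == 0)
        return snprintf(buf, len, "** Output changed: %s/%s (still %s) **",
                        host_alias, service_desc, service_state);

    return snprintf(buf, len, "** %s Service Alert: %s/%s is %s **",
                    notification_type, host_alias, service_desc, service_state);
}
```

This only changes how the notification is labelled. It does not solve the missing previous check output, since that value never reaches the handler.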

with the icinga.cfg feature disabled

syslog

Nov  1 11:20:17 imagine icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;1367_host_001;1367_ok_01;0;change me again|
Nov  1 11:20:17 imagine icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;1367_host_001;1367_ok_02;0;change me again|
Nov  1 11:20:21 imagine icinga: PASSIVE SERVICE CHECK: 1367_host_001;1367_ok_01;0;change me again
Nov  1 11:20:21 imagine icinga: SERVICE ALERT: 1367_host_001;1367_ok_01;OK;HARD;1;change me again
Nov  1 11:20:21 imagine icinga: PASSIVE SERVICE CHECK: 1367_host_001;1367_ok_02;0;change me again
Nov  1 11:20:21 imagine icinga: SERVICE ALERT: 1367_host_001;1367_ok_02;OK;HARD;1;change me again

icinga.debug

Pattern not found  (press RETURN)

don't stalk on OK

define service{
  service_description            1367_ok_02
  host_name                      1367_host_001
  use                            generic-service-1367
  check_command                  check_service!ok
  # stalk on warning, unknown, critical only (no longer on ok)
  #stalking_options               o,w,u,c
  stalking_options               w,u,c
  # simulate passive check with output change only
  active_checks_enabled          0
  passive_checks_enabled         1
  # set special contactgroup
  contact_groups                 test_group_1367_services
}

so the first service will notify (and also log the ALERT because stalking is enabled), but the second just stays calm

Nov  1 11:22:52 imagine icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;1367_host_001;1367_ok_01;0;svc01 will notify, svc02 is not stalking on ok|
Nov  1 11:22:52 imagine icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;1367_host_001;1367_ok_02;0;svc01 will notify, svc02 is not stalking on ok|
Nov  1 11:23:01 imagine icinga: PASSIVE SERVICE CHECK: 1367_host_001;1367_ok_01;0;svc01 will notify, svc02 is not stalking on ok
Nov  1 11:23:01 imagine icinga: SERVICE ALERT: 1367_host_001;1367_ok_01;OK;HARD;1;svc01 will notify, svc02 is not stalking on ok
Nov  1 11:23:01 imagine icinga: SERVICE NOTIFICATION: test_contact_1367_stalk_services;1367_host_001;1367_ok_01;OK;notify-service-by-email;svc01 will notify, svc02 is not stalking on ok
Nov  1 11:23:02 imagine icinga: SERVICE NOTIFICATION: test_contact_1367_stalk_all;1367_host_001;1367_ok_01;OK;notify-service-by-email;svc01 will notify, svc02 is not stalking on ok
Nov  1 11:23:02 imagine icinga: PASSIVE SERVICE CHECK: 1367_host_001;1367_ok_02;0;svc01 will notify, svc02 is not stalking on ok

this can be tested with all states. the triggering code is nearly the same as for the event handlers, it just needs to be enabled globally in icinga.cfg because it's an opt-in feature next to the normal stalking logging.
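The opt-in behaviour can be sketched like this: logging always happens on a stalking match, and the notification is only attempted when the global switch is on. The stub functions and counters are stand-ins invented for the sketch, and the constant values other than 16211 are placeholders.

```c
/* Placeholder value for illustration; only 16211 appears in this issue. */
#define NOTIFICATION_STALKING    16211
#define NOTIFICATION_OPTION_NONE 0

/* Mirrors the icinga.cfg switch stalking_notifications_for_services. */
static int stalking_notifications_for_services = 0;

/* Hypothetical counters standing in for real logging and notification. */
static int logged_events = 0;
static int sent_notifications = 0;

static void log_service_event_stub(void)
{
    logged_events++;
}

static void service_notification_stub(int type, int options)
{
    (void)options;
    if (type == NOTIFICATION_STALKING)
        sent_notifications++;
}

/* Called once a stalking match was detected for a service. */
void handle_service_stalking_match(void)
{
    /* normal stalking behaviour: always log the event */
    log_service_event_stub();

    /* opt-in: only notify when enabled globally */
    if (stalking_notifications_for_services)
        service_notification_stub(NOTIFICATION_STALKING, NOTIFICATION_OPTION_NONE);
}
```

This matches the two test runs above: with the switch off only the SERVICE ALERT line shows up, with it on the SERVICE NOTIFICATION lines follow.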

#10 Updated by dnsmichi over 2 years ago

possible todos - how to filter assigned contacts for that host/service for stalking notifications.

  • use the same notification_options for the states
  • enable that contact explicitly to receive stalking notifications?

#11 Updated by dnsmichi over 2 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 50 to 100

discussed that with my colleague, and the initial implementation will make that notification dependent on enabling stalking, and on enabling stalking notifications overall. the main problem with notification_options will be the filter for "ok", and various other logic might need to be different. so we'll leave that as it is for now. for further discussions, please open a new issue with new ideas/questions.

#12 Updated by ares over 2 years ago

Just fyi we tested this and it works perfectly for our needs. Thank you very much.

#13 Updated by dnsmichi over 2 years ago

thanks for the feedback - very nice indeed that others can benefit too :-)
