Feature #2537

add trigger_time to downtimes to allow calculating the end time of flexible downtimes

Added by dnsmichi almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Start date: 04/22/2012
Priority: Urgent
Due date:
Assignee: dnsmichi
% Done: 90%
Category: Scheduled Downtime
Target version: Icinga 1.x - 1.7

Description

As we have learned in #2536, the core does not keep track of the downtime trigger time; only the start, end, and entry times are kept.

The problem is that a flexible downtime whose duration is less than the end-start time window gets relooped/rescheduled for the host/service, even after it has already lasted one full duration.

To allow fixing #2536, we need to add an entry to the downtime section. My proposal is "trigger_time", which gets populated only once, when the downtime is started (triggered); when the downtime ends, it must be reset to 0L.

As this is an objects change again, the new field must be appended at the end of the downtime struct, following the existing workaround that keeps the ABI compatible.

Furthermore, this requires changes to the event broker, and IDOUtils must recognize trigger_time as well.


Related issues

Related to Core - Bug #2536: scheduled_downtime_depth falsely incremented if in flexib... Resolved 04/22/2012
Related to IDOUtils - Feature #2539: add is_in_effect and trigger_time to scheduleddowntime an... Resolved 04/23/2012
Related to Classic UI - Feature #2538: add is_in_effect and trigger_time to downtime view for ht... Resolved 04/22/2012

Associated revisions

Revision f03dbcdb
Added by dnsmichi almost 2 years ago

core: add trigger_time to downtimes to allow calculating of flexible downtimes endtime #2537

in order to fix #2536 we must introduce this as a
feature. fetching the actual time when a downtime
is started (triggered) is mandatory for calculating
the end of flexible downtimes, especially when the
duration is less than the end-start time window.
due to the bug found in #2536, the downtime does not
end at trigger time + duration (because that time is
unknown), but waits for the end time provided by the
external command.
this leads to rescheduling the flexible downtime
duration by duration until the end time, incrementing
the scheduled_downtime_depth counter on every loop.
the docs clearly state that a flexible downtime only
lasts one duration and then exits. the overlapping
is NOT what we want.

furthermore, with this introduced as the basis for
fixing issue #2536, we can re-use it in future
changes to show the user the actual time a downtime
was entered, rather than the start time, which could
be somewhere in the past.

entry_time is only the time when the command was
sent and does not help here.

refs #2537
refs #2536

Revision 8d315d03
Added by dnsmichi almost 2 years ago

core: fix scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #2536

since we now have support for the trigger_time
of a scheduled downtime, we can decide whether
a flexible downtime is to be ended after trigger
time + duration or not.

adding further TAP tests is currently not possible,
as we would have to work around the host not being
found in the skiplist in handle_scheduled_downtime.
that is a pita and requires more rework into clearly
abstracted functionality, beyond the checks for
existing hosts and services, to create the relation
to the downtime being handled.

issue #2536 holds further analysis and debug logs
from the tests, which have been enhanced in #2537 as well.

refs #2536
refs #2537

Revision e17125c3
Added by dnsmichi almost 2 years ago

fix copy paste error in reading trigger_time from status.dat #2537

refs #2537

Revision b7a29ac4
Added by dnsmichi almost 2 years ago

idoutils: add is_in_effect and trigger_time to scheduleddowntime and downtimehistory tables #2539 - MF

requires moving the neb callback so that it is made
after all necessary data has been fetched when
starting/triggering a downtime.

for all 3 rdbms, the scheduleddowntime and
downtimehistory tables get populated.

db sql and upgrade scripts require tests!

refs #2537
refs #2539

Revision 2bfc1d46
Added by dnsmichi almost 2 years ago

fix int vs unsigned long mismatches from previous commits #2536 #2537

refs #2536
refs #2537

Revision 51997db4
Added by dnsmichi almost 2 years ago

core: add trigger_time to downtimes to allow calculating of flexible downtimes endtime #2537

refs #2537
refs #2536

Conflicts:

Changelog

Revision dc1569b6
Added by dnsmichi almost 2 years ago

core: fix scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #2536

refs #2536
refs #2537

Conflicts:

Changelog

Revision 8422f242
Added by dnsmichi almost 2 years ago

fix int vs unsigned long mismatches from previous commits #2536 #2537

refs #2536
refs #2537

History

#1 Updated by dnsmichi almost 2 years ago

Adding this requires further changes:

  • xsddefault.c - save and read status data
  • xrddefault.c - save and read retained data over restarts
  • nebstructs.h+broker.h+broker.c - pass trigger_time to the neb api and struct so neb modules can match on it
    • with that, we will add is_in_effect as well.
hostdowntime {
        host_name=localhost
        downtime_id=4
        entry_time=1335122669
        start_time=1335122648
        end_time=1335123848
        triggered_by=0
        fixed=0
        duration=180
        is_in_effect=1
        author=icinga
        comment=test flex fix
        trigger_time=1335122710
        }

[1335122669.965720] [512.0] [pid=2290] Scheduled Downtime Details:
[1335122669.965723] [512.0] [pid=2290]  Type:        Host Downtime
[1335122669.965725] [512.0] [pid=2290]  Host:        localhost
[1335122669.965728] [512.0] [pid=2290]  Fixed/Flex:  Flexible
[1335122669.965731] [512.0] [pid=2290]  Start:       04-22-2012 21:24:08
[1335122669.965733] [512.0] [pid=2290]  End:         04-22-2012 21:44:08
[1335122669.965736] [512.0] [pid=2290]  Duration:    0h 3m 0s
[1335122669.965738] [512.0] [pid=2290]  Downtime ID: 4
[1335122669.965741] [512.0] [pid=2290]  Trigger ID:  0
[1335122710.055156] [512.0] [pid=2290] Flexible downtime (id=4) for host 'localhost' starting now...
[1335122710.055162] [512.0] [pid=2290] Host 'localhost' starting flexible scheduled downtime (id=4) with depth=0, starttime=1335122648, entrytime=1335122669, endtime=1335123848, duration=180.
[1335122710.055166] [512.0] [pid=2290] Host 'localhost' has entered a period of scheduled downtime (id=4) at triggertime=1335122710.
[1335122890.014894] [512.0] [pid=2290] Host 'localhost' ending flexible scheduled downtime (id=4) with depth=1, starttime=1335122648, entrytime=1335122669, triggertime=1335122710, endtime=1335123848, duration=180.
[1335122890.014901] [512.0] [pid=2290] Host 'localhost' has exited from a period of scheduled downtime (id=4).
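The log can be checked against the stored fields with a little arithmetic (values copied from the hostdowntime entry above). The configured window is 20 minutes (21:24:08 to 21:44:08), an entry_time-based estimate would end the downtime 41 seconds too early, and trigger_time + duration matches the timestamp of the "ending flexible scheduled downtime" line (1335122890.014894) to the second:

```latex
\begin{aligned}
\mathit{end\_time} - \mathit{start\_time} &= 1335123848 - 1335122648 = 1200\,\mathrm{s} = 20\,\mathrm{min} \\
\mathit{entry\_time} + \mathit{duration} &= 1335122669 + 180 = 1335122849 \\
\mathit{trigger\_time} + \mathit{duration} &= 1335122710 + 180 = 1335122890
\end{aligned}
```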

#2 Updated by dnsmichi almost 2 years ago

  • Category changed from Downtimes to Scheduled Downtime
  • Priority changed from Normal to Urgent

#3 Updated by dnsmichi almost 2 years ago

To clarify what we do to fix #2536:

        /* have we come to the end of the scheduled downtime? */
        if (temp_downtime->is_in_effect == TRUE && ( /* downtime needs to be in effect and ... */
                (temp_downtime->fixed == TRUE && current_time >= temp_downtime->end_time) || /* fixed downtime, endtime means end of downtime */
                (temp_downtime->fixed == FALSE && current_time >= (temp_downtime->trigger_time+temp_downtime->duration)) /* flexible downtime, endtime of downtime is trigger_time+duration */
                )){

Once the flexible downtime has been triggered, we check whether current_time is greater than or equal to trigger_time (the time the flexible downtime started) plus its duration. This guarantees that the downtime lasts exactly one duration, after which it can safely expire.

Of course, this change requires further tests for all variants.

#4 Updated by dnsmichi almost 2 years ago

  • Status changed from Assigned to Feedback
  • % Done changed from 0 to 90

tests required.

#5 Updated by dnsmichi almost 2 years ago

  • Status changed from Feedback to Resolved

Works for me, as this only adds a new attribute. The if condition itself is handled in #2536.
