Bug #531

Icinga shuts down, when starting ido2db

Added by ABauer almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Start date: 06/23/2010
Priority: Urgent
Due date:
Assignee: dnsmichi
% Done: 100%
Category: Event Broker
Target version: 1.0.2
Icinga Version:
OS Version:

Description

Something weird just occurred:

As you can see, ido2db hadn't been running for quite some time. When I started ido2db, a SIGSEGV occurred.

Did I do something wrong, or is it a bug?

[1277295748] idomod: Still unable to connect to data sink.  24198 items lost, 5000 queued items to flush.
[1277295764] idomod: Still unable to connect to data sink.  24454 items lost, 5000 queued items to flush.
[1277295780] idomod: Still unable to connect to data sink.  24602 items lost, 5000 queued items to flush.
[1277295796] idomod: Still unable to connect to data sink.  24860 items lost, 5000 queued items to flush.
[1277295811] Warning: Return code of 255 for check of service 'Syslog_Critical_Errors' on host 'AppDirector1' was out of bounds.
[1277295812] idomod: Successfully connected to data sink.  25087 items lost, 5000 queued items to flush.
[1277295812] Caught SIGSEGV, shutting down...
[1277296082] Icinga 1.0.1 starting... (PID=533)
[1277296082] Local time is Wed Jun 23 14:28:02 CEST 2010
[1277296082] LOG VERSION: 2.0
[1277296082] livestatus: Version 1.1.3 initializing. Socket path: '/usr/local/icinga/var/rw/live'
[1277296082] livestatus: Livestatus has been brought to you by Mathias Kettner
[1277296082] livestatus: Please visit us at http://mathias-kettner.de/
[1277296082] livestatus: Removed old left over socket file /usr/local/icinga/var/rw/live
[1277296082] livestatus: Created UNIX control socket at /usr/local/icinga/var/rw/live
[1277296082] livestatus: Opened UNIX socket /usr/local/icinga/var/rw/live
[1277296082] livestatus: successfully finished initialization
[1277296082] Event broker module '/usr/local/lib/mk-livestatus/livestatus.o' initialized successfully.
[1277296082] idomod: IDOMOD 1.0.1 (03-03-2010) Copyright (c) 2005-2008 Ethan Galstad (nagios@nagios.org), Copyright (c) 2009-2010 Icinga Development Team (http://www.icinga.org))
[1277296082] idomod: Successfully connected to data sink.  0 queued items to flush.
[1277296082] Event broker module '/usr/local/icinga/bin/idomod.o' initialized successfully.
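The counters in the idomod lines above can be pulled out with a small script when comparing several runs. This is only a sketch for reading this log format: the function name `parse_idomod_line` and the regular expression are made up for illustration and are not part of Icinga or idomod.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the idomod status lines shown in this ticket.
# The "items lost" part is optional because it is absent right after startup.
IDOMOD_RE = re.compile(
    r"^\[(?P<ts>\d+)\] idomod: (?P<msg>.*?)\s+"
    r"(?:(?P<lost>\d+) items lost, )?(?P<queued>\d+) queued items to flush\.$"
)


def parse_idomod_line(line):
    """Return (utc_time, message, items_lost, items_queued), or None."""
    m = IDOMOD_RE.match(line.strip())
    if m is None:
        return None
    # The bracketed number is a Unix timestamp; convert it to UTC.
    when = datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc)
    lost = int(m.group("lost")) if m.group("lost") else 0
    return when, m.group("msg"), lost, int(m.group("queued"))


if __name__ == "__main__":
    sample = ("[1277295812] idomod: Successfully connected to data sink.  "
              "25087 items lost, 5000 queued items to flush.")
    print(parse_idomod_line(sample))
```

Lines that are not idomod status lines, such as the "Caught SIGSEGV" entry, simply return None.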

Associated revisions

Revision b3a9fcbb
Added by dnsmichi almost 4 years ago

remove catching a SIGSEGV on not dumping core and running as daemon

-* core: only catch SIGSEGV if we're not dumping core and running as a daemon (Andreas Ericsson)

applying this causes idomod's 'error writing to data sink'
problems to result in a caught SIGSEGV that shuts down the
core. Although this is obviously a problem in idomod, the change
is removed for the upcoming release.
It has to be resolved in idomod itself afterwards.

Kudos to Andreas Bauer for reporting that.

fixes #531
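For readers unfamiliar with the removed changelog entry, the condition it describes can be sketched as follows. This is not the core's code (the core is written in C). The names `should_catch_sigsegv`, `setup_sighandler`, `daemon_mode` and `daemon_dumps_core` are assumptions chosen for illustration, and the sketch only shows when a handler would be installed, not what a real segfault handler should do.

```python
import signal


def should_catch_sigsegv(daemon_mode, daemon_dumps_core):
    """Condition from the changelog entry: running as a daemon and not dumping core."""
    return daemon_mode and not daemon_dumps_core


def setup_sighandler(daemon_mode, daemon_dumps_core, handler):
    """Install handler for SIGSEGV only when the condition holds.

    Returns True if the handler was installed, False otherwise.
    """
    if should_catch_sigsegv(daemon_mode, daemon_dumps_core):
        signal.signal(signal.SIGSEGV, handler)
        return True
    return False
```

With the entry reverted, the core no longer installs such a handler, so a segfault triggered from a module is not turned into the "Caught SIGSEGV, shutting down..." path.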

Revision 53b0014c
Added by dnsmichi almost 4 years ago

make state based escalation ranges optional by configure

currently, the object definitions used by mk_livestatus
are directly copied from nagios 3.2.0 which leads to the
problem that different exported symbols and variables are
expected.

the state based escalation ranges change that, and this
will lead to mk_livestatus throwing a segfault and
producing a core dump.

in order to give the mk_livestatus developer more time to
resolve this issue, the original patch for #306 has been
reworked into optional selection through configure.

this will be changed when mk_livestatus becomes ready
to fully support icinga core.

refs #306
refs #531
refs #535

History

#1 Updated by dnsmichi almost 4 years ago

  • Status changed from New to Feedback

hm, this is not good: leaving idomod loaded while ido2db is not available as its data sink. as you can see, the idomod buffer is completely full and every object being processed is lost. but starting ido2db shouldn't cause a sigsegv that forces a restart.

this is really weird and needs more testing. are you able to reproduce that with the debug level in icinga.cfg and ido2db.cfg turned up to the highest?
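The "items lost" and "queued items" counters in the log can be pictured as a bounded buffer that drops entries once it is full. The sketch below is an illustration under assumptions, not idomod's implementation: the class name `SinkBuffer` is hypothetical, the limit of 5000 is taken from the log output, and dropping the oldest item first is an assumption that this ticket does not confirm.

```python
from collections import deque


class SinkBuffer:
    """Hypothetical bounded queue that counts the items it had to drop."""

    def __init__(self, max_items=5000):
        self.items = deque()
        self.max_items = max_items
        self.lost = 0

    def push(self, item):
        # Assumption: when the buffer is full, the oldest item is discarded.
        if len(self.items) >= self.max_items:
            self.items.popleft()
            self.lost += 1
        self.items.append(item)

    def flush(self, write):
        """Send queued items through write(); stop at the first failure."""
        sent = 0
        while self.items:
            if not write(self.items[0]):
                break
            self.items.popleft()
            sent += 1
        return sent


if __name__ == "__main__":
    buf = SinkBuffer(max_items=3)
    for i in range(5):
        buf.push(i)
    print(buf.lost, list(buf.items))
```

While the sink is unreachable, `lost` keeps growing and the queue stays at its limit, which matches the shape of the "N items lost, 5000 queued items to flush" messages.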

#2 Updated by dnsmichi almost 4 years ago

  • Status changed from Feedback to Assigned
  • Assignee set to dnsmichi
  • Priority changed from Normal to Urgent
  • Target version set to 1.0.2

i can reproduce that, and i am pretty sure i know how to resolve it.

[1277461852] Icinga 1.0.1 starting... (PID=21855)
[1277461852] Local time is Fri Jun 25 12:30:52 CEST 2010
[1277461852] LOG VERSION: 2.0
[1277461852] idomod: IDOMOD 1.0.1 (03-03-2010) Copyright (c) 2005-2008 Ethan Galstad (nagios@nagios.org), Copyright (c) 2009-2010 Icinga Development Team (http://www.icinga.org))
[1277461852] idomod: Successfully connected to data sink.  0 queued items to flush.
[1277461852] Event broker module '/usr/bin/idomod.o' initialized successfully.
[1277461852] livestatus: Version 1.1.6 initializing. Socket path: '/var/nagios/rw/live'
[1277461852] livestatus: Livestatus has been brought to you by Mathias Kettner
[1277461852] livestatus: Please visit us at http://mathias-kettner.de/
[1277461852] livestatus: Created UNIX control socket at /var/nagios/rw/live
[1277461852] livestatus: Opened UNIX socket /var/nagios/rw/live
[1277461852] livestatus: successfully finished initialization
[1277461852] Event broker module '/usr/lib64/mk-livestatus/livestatus.o' initialized successfully.
[1277461853] Finished daemonizing... (New PID=21858)
[1277461853] livestatus: Starting 10 client threads
[1277461853] livestatus: Entering main loop, listening on UNIX socket. PID is 21858
[1277461912] idomod: Error writing to data sink!  Some output may get lost...
[1277461912] idomod: Please check remote ido2db log, database connection or SSL Parameters
[1277461919] Caught SIGSEGV, shutting down...

https://git.icinga.org/?p=icinga-core.git;a=blobdiff;f=base/utils.c;h=cc67ebcc4d95061d17bcf2d51f4e87a9a8ee31f1;hp=33a0c1d8ff8e55c6205b34acfbf8a1e5744435af;hb=e50ac31e1e8698ec2597ba2e071e5a2a70d6a293;hpb=1b066981c68feb7ac85bf56ea933c8483dac3413

reverted that fix - it does not really make sense that a nebmodule causes a SIGSEGV on the core in daemon mode.

now it's looking fine again.

[1277463651] Icinga 1.0.1 starting... (PID=3107)
[1277463651] Local time is Fri Jun 25 13:00:51 CEST 2010
[1277463651] LOG VERSION: 2.0
[1277463651] livestatus: Version 1.1.6 initializing. Socket path: '/var/nagios/rw/live'
[1277463651] livestatus: Livestatus has been brought to you by Mathias Kettner
[1277463651] livestatus: Please visit us at http://mathias-kettner.de/
[1277463651] livestatus: Created UNIX control socket at /var/nagios/rw/live
[1277463651] livestatus: Opened UNIX socket /var/nagios/rw/live
[1277463651] livestatus: successfully finished initialization
[1277463651] Event broker module '/usr/lib64/mk-livestatus/livestatus.o' initialized successfully.
[1277463651] idomod: IDOMOD 1.0.1 (03-03-2010) Copyright (c) 2005-2008 Ethan Galstad (nagios@nagios.org), Copyright (c) 2009-2010 Icinga Development Team (http://www.icinga.org))
[1277463651] idomod: Successfully connected to data sink.  0 queued items to flush.
[1277463651] Event broker module '/usr/bin/idomod.o' initialized successfully.
[1277463651] Finished daemonizing... (New PID=3110)
[1277463651] livestatus: Starting 10 client threads
[1277463651] livestatus: Entering main loop, listening on UNIX socket. PID is 3110
[1277463674] idomod: Error writing to data sink!  Some output may get lost...
[1277463674] idomod: Please check remote ido2db log, database connection or SSL Parameters
[1277463690] idomod: Successfully reconnected to data sink!  3570 items lost, 5000 queued items to flush.
[1277463742] idomod: Successfully flushed 5000 queued items to data sink.

this originates from idomod and needs to be resolved there, of course - in a dev branch.

thanks for testing and reporting back!

#3 Updated by dnsmichi almost 4 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 0 to 100
