Bug #702

Solaris 10: Bus Error (core dumped) when starting icinga

Added by antonxx over 3 years ago. Updated over 3 years ago.

Status: Resolved
Start date: 08/11/2010
Priority: High
Due date:
Assignee: dnsmichi
% Done: 100%
Category: Other
Target version: 1.2 (Stable)
Icinga Version:
OS Version:

Description

Hi,

I have now followed the same steps on Solaris that I used when compiling on Linux.

My current status:

Icinga with the classic web interface works on SUSE Linux 11.1 64 bit.

On solaris 10 (sparc) I stumble over the step in
the quickstart documentation:

---------------------------------------
#> /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg

Icinga 1.0.2
Copyright (c) 2009-2010 Icinga Development Team (http://www.icinga.org)
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 06-30-2010
License: GPL

Bus Error (core dumped)
#>
---------------------------------------

When looking at the dump, I see:

--------------------------------------
#> pstack /var/core/core_utuw57_icinga_0_0_1281357332_10848

core '/var/core/core_utuw57_icinga_0_0_1281357332_10848' of 10848:      /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg
 ff056e5c _morecore (2011, b2648, b4438, ff1392ac, 7ffffc00, 0) + 178
 ff05662c _malloc_unlocked (600, b2648, 1, b2648, ff1303a8, 0) + 1fc
 ff05635c _smalloc (10, 0, da08c, ff056538, ffffffff, ff139214) + 4c
 ff056414 malloc   (9, 1, d9fd8, 0, ff1303a8, ff13a518) + 4c
 ff0693dc strdup   (8b6e0, 0, 0, 0, 0, 99) + c
 0003f5e8 init_macrox_names (b01d4, 0, 0, 0, 0, 0) + 28
 0003ff80 init_macros (ffffffff, 3c, 1e, 1, aec00, ae800) + 4
 0004ba8c reset_variables (b1050, 0, e, ff13a050, 1, 1) + 460
 0001ef90 main     (aec00, ffbffb9c, ffbffbac, b01c8, ff350100, 0) + a98
 0001e390 _start   (0, 0, 0, 0, 0, 0) + 5c
-------------------------------------------

Note: I just compiled Nagios 3.2.1 + the Nagios plugins 1.4.15 (the same plugins used with Icinga)
and that system works on Solaris ... so it must be something that changed since the fork...


Related issues

Duplicated by Core - Bug #572: Segmentation Fault (core dumped) on solaris 10 (x86) whil... Closed 07/05/2010

Associated revisions

Revision 69d5fab5
Added by dnsmichi over 3 years ago

disable eventprofiler on Solaris gcc3, preventing core dumps #702 #572

eventprofiler patch is doing some realloc/malloc magic outside of
main core process, allocating memory in a way it will fail on
Solaris gcc3 (while gcc4 works). Made optional by compiler
detection flag.

refs #702
refs #572

Revision 8ca33ed1
Added by dnsmichi over 3 years ago

move eventprofiler init after config parsing, checking if enabled, making it optional all over #572 #702

refs #572
refs #702
refs #312

History

#1 Updated by dnsmichi over 3 years ago

how is this built? which compiler?

#2 Updated by dnsmichi over 3 years ago

ok, some things to think about.

this patch adds profiler_init() to icinga.c without any check whether it is enabled or disabled
https://git.icinga.org/?p=icinga-core.git;a=commitdiff;h=bea4a961cfdacc1eefe7791849649e42408580d4

that could be a possible leak causing solaris to dump core.

although the trace points towards the config verification.
https://git.icinga.org/?p=icinga-core.git;a=blob;f=base/icinga.c;h=4392d77bb5b0267a74aea5b190033eceb98d76f9;hb=HEAD#l489
compared to this
http://git.nagiosprojects.org/?p=nagios.git;a=blob;f=base/nagios.c;h=79c38b3525e841a8cd124b397fe7daccada908b5;hb=HEAD#l477

but as a matter of fact, the output in #572 suggests that the drop_privileges function, with its getgid and getuid calls, is faulty.

this leads to the following ideas:

  • wrong free / wrong call, something changed in icinga core?
  • does configure link the wrong library for the user checks?
  • does the sun studio compiler use cflags in a different way? check configure+makefile

and check what has changed since 1.0.1, as that was working fine.

#3 Updated by dnsmichi over 3 years ago

and furthermore, after dropping the privileges, perhaps the reading of the objects fails in some way?

read_object_config_data => xodtemplate.

#4 Updated by antonxx over 3 years ago

GCC version.

Note: I compile as normal user.

As normal user I get:

gcc --version
gcc (GCC) 3.4.6
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

as root I get

gcc --version
sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

does this help?

#5 Updated by raindog over 3 years ago

I've encountered the same issue on Solaris 10 with Icinga 1.0.3, compiled as non-root - core with cgis only.
Version 1.0.1 works fine. Never tested 1.0.2.

bash-3.00$ /sw_ux/scripts/icinga checkconfig
Running configuration check...Bus Error - core dumped
CONFIG ERROR! Check your icinga configuration.

bash-3.00$ gcc --version
gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

There is also another issue when compiling that has been around for a while. I know the workaround of copying snprintf.o to the common directory.

bash-3.00$ make all
cd ./base && make
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c broker.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c nebmods.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o ../common/shared.o ../common/shared.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c checks.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c config.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c commands.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c events.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c flapping.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c logging.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o macros-base.o ../common/macros.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c netutils.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c notifications.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c sehandlers.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o skiplist.o ../common/skiplist.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c utils.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c profiler.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o retention-base.o sretention.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xretention-base.o ../xdata/xrddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o comments-base.o ../common/comments.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xcomments-base.o ../xdata/xcddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o objects-base.o ../common/objects.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xobjects-base.o ../xdata/xodtemplate.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o statusdata-base.o ../common/statusdata.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xstatusdata-base.o ../xdata/xsddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o perfdata-base.o perfdata.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xperfdata-base.o ../xdata/xpddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o downtime-base.o ../common/downtime.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xdowntime-base.o ../xdata/xdddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c ../common/snprintf.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -o icinga icinga.c broker.o nebmods.o ../common/shared.o checks.o config.o commands.o events.o flapping.o logging.o macros-base.o netutils.o notifications.o sehandlers.o skiplist.o utils.o profiler.o retention-base.o xretention-base.o comments-base.o xcomments-base.o objects-base.o xobjects-base.o statusdata-base.o xstatusdata-base.o perfdata-base.o xperfdata-base.o downtime-base.o xdowntime-base.o ../common/snprintf.o -lm -lsocket -lnsl -lpthread -ldl -lrt -lnsl -lsocket
gcc: ../common/snprintf.o: No such file or directory
*** Error code 1
make: Fatal error: Command failed for target `icinga'
Current working directory /sw_ux/downloads/icinga-1.0.3/base
*** Error code 1
make: Fatal error: Command failed for target `all'

#6 Updated by dnsmichi over 3 years ago

  • Priority changed from Urgent to High

the snprintf target was an attempt to fix this for solaris in the issues below. it has not been touched since, as no further feedback came in.

https://dev.icinga.org/issues/521
https://dev.icinga.org/issues/524
https://dev.icinga.org/issues/526

changes remain in https://git.icinga.org/?p=icinga-core.git;a=shortlog;h=refs/heads/mfriedrich/sun

it would be great if you can test that branch, and report feedback on this.

besides - are there any ready-to-use solaris vm's available?

#7 Updated by raindog over 3 years ago

Tried your changes for the snprintf issue ...
bash-3.00$ make all
cd ./common && make
gcc -fPIC -g -O2 -DHAVE_CONFIG_H -c -o snprintf.o snprintf.c
cd ./base && make
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c broker.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c nebmods.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o ../common/shared.o ../common/shared.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c checks.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c config.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c commands.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c events.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c flapping.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c logging.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o macros-base.o ../common/macros.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c netutils.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c notifications.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c sehandlers.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o skiplist.o ../common/skiplist.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c utils.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c profiler.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o retention-base.o sretention.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xretention-base.o ../xdata/xrddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o comments-base.o ../common/comments.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xcomments-base.o ../xdata/xcddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o objects-base.o ../common/objects.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xobjects-base.o ../xdata/xodtemplate.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o statusdata-base.o ../common/statusdata.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xstatusdata-base.o ../xdata/xsddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o perfdata-base.o perfdata.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xperfdata-base.o ../xdata/xpddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o downtime-base.o ../common/downtime.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -c -o xdowntime-base.o ../xdata/xdddefault.c
gcc -g -O2 -DHAVE_CONFIG_H -DNSCORE -o icinga icinga.c broker.o nebmods.o ../common/shared.o checks.o config.o commands.o events.o flapping.o logging.o macros-base.o netutils.o notifications.o sehandlers.o skiplist.o utils.o profiler.o retention-base.o xretention-base.o comments-base.o xcomments-base.o objects-base.o xobjects-base.o statusdata-base.o xstatusdata-base.o perfdata-base.o xperfdata-base.o downtime-base.o xdowntime-base.o ../common/snprintf.o -lm -lsocket -lnsl -lpthread -ldl -lrt -lnsl -lsocket
Undefined                       first referenced
 symbol                             in file
vasprintf                           logging.o
asprintf                            /var/tmp//ccQCWsIE.o
ld: fatal: Symbol referencing errors. No output written to icinga
collect2: ld returned 1 exit status
*** Error code 1
make: Fatal error: Command failed for target `icinga'
Current working directory /sw_ux/downloads/icinga-core/base
*** Error code 1
make: Fatal error: Command failed for target `all'
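
The linker output above shows vasprintf and asprintf unresolved on this system. A minimal sketch of a fallback built on vsnprintf() follows. It is not the project's snprintf.c; the sketch_ names are hypothetical, and it assumes a C99-conforming vsnprintf() that returns the required length when given a zero-sized buffer, which should be verified on the target libc.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fallback names, chosen so they cannot clash with a libc
 * that already provides asprintf()/vasprintf(). */
int sketch_vasprintf(char **out, const char *fmt, va_list ap)
{
    va_list ap2;
    int len;

    assert(out != NULL && fmt != NULL);
    *out = NULL;

    /* First pass: measure. Assumes vsnprintf() reports the needed
     * length for a zero-sized buffer (C99 behaviour). */
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap2);
    va_end(ap2);
    if (len < 0)
        return -1;

    *out = malloc((size_t)len + 1);
    if (*out == NULL)
        return -1;

    /* Second pass: format into the exactly sized buffer. */
    len = vsnprintf(*out, (size_t)len + 1, fmt, ap);
    if (len < 0) {
        free(*out);
        *out = NULL;
    }
    return len;
}

int sketch_asprintf(char **out, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = sketch_vasprintf(out, fmt, ap);
    va_end(ap);
    return len;
}
```

The caller owns the returned buffer and has to free() it, the same contract as the GNU/BSD functions of the same name.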

#8 Updated by antonxx over 3 years ago

dnsmichi wrote:

...

besides - are there any ready-to-use solaris vm's available?

After registration you can go to:

http://www.oracle.com/technetwork/server-storage/solaris/downloads/index.html

and here you can grab a virtualbox appliance (get it from www.virtualbox.org).

After unzipping the zip file, start your virtualbox and go to:

File -> import appliance

You can use this vm for free, but as I understand, only for development purposes,
so you are not allowed to set up a production system.

(By the way oracle just announced they would stop OpenSolaris!)

#9 Updated by Meier over 3 years ago

It is already known that the change in question happened between 1.0.1 and 1.0.2

https://dev.icinga.org/issues/572#note-11

Why is this not a duplicate of https://dev.icinga.org/issues/572 ?

#10 Updated by Meier over 3 years ago

antonxx wrote:

dnsmichi wrote:

...

besides - are there any ready-to-use solaris vm's available?

After registration you can go to:

http://www.oracle.com/technetwork/server-storage/solaris/downloads/index.html

and here you can grab a virtualbox appliance (get it from www.virtualbox.org).

After unzipping the zip file, start your virtualbox and go to:

File -> import appliance

You can use this vm for free, but as I understand, only for development purposes,
so you are not allowed to set up a production system.

(By the way oracle just announced they would stop OpenSolaris!)

And they just released Solaris 10u9. Also there are some plans about Solaris Express.

#11 Updated by LarsEngels over 3 years ago

FWIW: I got the same error on a SPARC machine (Solaris 10 Update 7, Compiler: gcc 3.4.6).

#12 Updated by LarsEngels over 3 years ago

gdb shows:

gdb ./icinga /var/core/core_ecpmon01_icinga_0_0_1284475992_11339
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
Reading symbols from /lib/libm.so.2...done.
Loaded symbols for /lib/libm.so.2
Reading symbols from /lib/libsocket.so.1...done.
Loaded symbols for /lib/libsocket.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libpthread.so.1...
warning: Lowest section in /lib/libpthread.so.1 is .dynamic at 00000074
done.
Loaded symbols for /lib/libpthread.so.1
Reading symbols from /usr/local/lib/libltdl.so.7...done.
Loaded symbols for /usr/local/lib/libltdl.so.7
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/local/ssl/lib/libssl.so.0.9.8...done.
Loaded symbols for /usr/local/ssl/lib/libssl.so.0.9.8
Reading symbols from /usr/local/ssl/lib/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/local/ssl/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libc.so.1...done.
Loaded symbols for /lib/libc.so.1
Reading symbols from /usr/local/lib/libgcc_s.so.1...done.
Loaded symbols for /usr/local/lib/libgcc_s.so.1
Reading symbols from /lib/libaio.so.1...done.
Loaded symbols for /lib/libaio.so.1
Reading symbols from /lib/libmd.so.1...done.
Loaded symbols for /lib/libmd.so.1
Reading symbols from /lib/libdl.so.1...
warning: Lowest section in /lib/libdl.so.1 is .hash at 000000b4
done.
Loaded symbols for /lib/libdl.so.1
Reading symbols from /platform/sun4v/lib/libc_psr.so.1...done.
Loaded symbols for /platform/SUNW,SPARC-Enterprise-T5120/lib/libc_psr.so.1
Reading symbols from /lib/ld.so.1...done.
Loaded symbols for /lib/ld.so.1
Core was generated by `./icinga /usr/local/icinga/etc/icinga.cfg'.
Program terminated with signal 10, Bus error.
[New process 76875 ]
#0 0xfee570c4 in _morecore () from /lib/libc.so.1
(gdb) bt
#0 0xfee570c4 in _morecore () from /lib/libc.so.1
#1 0xfee5689c in _malloc_unlocked () from /lib/libc.so.1
#2 0xfee565cc in _smalloc () from /lib/libc.so.1
#3 0xfee56684 in malloc () from /lib/libc.so.1
#4 0xfee6964c in strdup () from /lib/libc.so.1
#5 0x0003fbac in init_macrox_names () at ../common/macros.c:2509
#6 0x00040544 in init_macros () at ../common/macros.c:2464
#7 0x0004c06c in reset_variables () at utils.c:4735
#8 0x0001ebb4 in main (argc=720896, argv=0xffbff6bc, env=0xb0000) at icinga.c:657
(gdb)

#13 Updated by LarsEngels over 3 years ago

common/macros.c line 2509

add_macrox_name(HOSTNAME);

Macro:
#define add_macrox_name(name) macro_x_names[MACRO_##name] = strdup(#name)
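
For readers unfamiliar with the two preprocessor operators in that macro, here is a minimal sketch of what it expands to. The enum is a hypothetical, cut-down stand-in for the real macro index table, and the helper function names are invented for illustration. In standard C, #name becomes an ordinary string literal, so strdup() receives a valid constant string.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down index table; the real list is much longer. */
enum { MACRO_HOSTNAME = 0, MACRO_HOSTALIAS = 1, MACRO_X_COUNT = 2 };

static char *macro_x_names[MACRO_X_COUNT];

/* Same shape as the macro quoted above: ## pastes the token into the
 * index name, # turns the token into a string literal. */
#define add_macrox_name(name) macro_x_names[MACRO_##name] = strdup(#name)

/* Fills the table; returns 0 on success, -1 if any strdup() failed. */
int init_names_sketch(void)
{
    /* expands to: macro_x_names[MACRO_HOSTNAME] = strdup("HOSTNAME") */
    add_macrox_name(HOSTNAME);
    add_macrox_name(HOSTALIAS);

    for (int i = 0; i < MACRO_X_COUNT; i++)
        if (macro_x_names[i] == NULL)
            return -1;
    return 0;
}

/* Bounds-checked lookup of a stored name. */
const char *macro_name_sketch(int idx)
{
    assert(idx >= 0 && idx < MACRO_X_COUNT);
    return macro_x_names[idx];
}
```

After init_names_sketch() returns 0, macro_name_sketch(MACRO_HOSTNAME) yields the string "HOSTNAME".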

#14 Updated by dnsmichi over 3 years ago

  • Assignee set to dnsmichi

i consider gcc3 the root of all evil, and as a matter of fact that #define trick apparently does not work with gcc3. strdup cannot duplicate the string as there is no source address in memory - best guess so far.

in order to remove this bug, I'll revert the commit d60c8afdbfbec89245fd9eb8459ebaea72d92dbe but leave the notificationsescalated macrofix in place.

#15 Updated by dnsmichi over 3 years ago

ok. taking current git master from 23-09-2010 18:00 dbe4749b6a5fdd670a55ef44deb7d56d9b390fcd

gcc version 3.4.6

installed like this, with some ssl configure hacks: https://dev.icinga.org/projects/icinga-core/wiki/Setup_Solaris_VM

40b98f218bcda9eb3f5ca1ecc772f5b6e23e4b3c in mfriedrich/solaris

compiled as user, installed via sudo into /usr/local/icinga

run as daemon, root: fine

-bash-3.00# /usr/local/icinga/bin/icinga /usr/local/icinga/etc/icinga.cfg

run via init-script

-bash-3.00# /etc/init.d/icinga start
-n Running configuration check...
Segmentation Fault - core dumped
CONFIG ERROR! Start aborted. See /usr/local/icinga/var/icinga.chk for details.

but run with -d directly on the shell, it does not segfault.

-bash-3.00# truss -f /etc/init.d/icinga start

6540:   stat64("/usr/local/lib/libc.so.1", 0x08047470)  Err#2 ENOENT
6540:   stat64("/usr/local/ssl/lib/libc.so.1", 0x08047470) Err#2 ENOENT
6540:   mmap(0x00010000, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xD0C60000
6540:   munmap(0xD0F40000, 32768)                       = 0
6540:   getcontext(0x08047A40)
6540:   getrlimit(RLIMIT_STACK, 0x08047A38)             = 0
6540:   getpid()                                        = 6540 [6539]
6540:   lwp_private(0, 1, 0xD0C62A00)                   = 0x000001C3
6540:   setustack(0xD0C62A60)
6540:   sigfillset(0xD0C74DD0)                          = 0
6540:   mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xD0F40000
6540:   sysconfig(_CONFIG_SEM_VALUE_MAX)                = 2147483647
6540:   sysconfig(_CONFIG_STACK_PROT)                   = 7
6540:   sysi86(SI86FPSTART, 0xD0C75740, 0x0000133F, 0x00001F80) = 0x00000001
6540:   brk(0x080E7DC8)                                 = 0
6540:   brk(0x080E9DC8)                                 = 0
6540:   ioctl(1, TCGETA, 0x08047014)                    Err#25 ENOTTY
6540:   fstat64(1, 0x08047040)                          = 0
6540:   brk(0x080E9DC8)                                 = 0
6540:   brk(0x080EDDC8)                                 = 0
6540:   fstat64(1, 0x08046F80)                          = 0
6540:   umask(022)                                      = 022
6540:   open("/usr/local/icinga/etc/icinga.cfg", O_RDONLY) = 3
6540:   fxstat(2, 3, 0x08047BC0)                        = 0
6540:   mmap(0x00000000, 46182, PROT_READ, MAP_PRIVATE, 3, 0) = 0xD0B20000
6540:   open("/usr/local/icinga/etc/resource.cfg", O_RDONLY) = 4
6540:   fxstat(2, 4, 0x08047B90)                        = 0
6540:   mmap(0x00000000, 1304, PROT_READ, MAP_PRIVATE, 4, 0) = 0xD0B00000
6540:       Incurred fault #6, FLTBOUNDS  %pc = 0xD0B94E98
6540:         siginfo: SIGSEGV SEGV_MAPERR addr=0x080EE000
6540:       Received signal #11, SIGSEGV [default]
6540:         siginfo: SIGSEGV SEGV_MAPERR addr=0x080EE000
6539:   lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
6539:   waitid(P_PID, 6540, 0x08047B70, WEXITED|WTRAPPED|WNOWAIT) (sleeping...)
6539:   waitid(P_PID, 6540, 0x08047B70, WEXITED|WTRAPPED|WNOWAIT) = 0
6539:   ioctl(0, TIOCGPGRP, 0x08047BE8)                 = 0
6539:   ioctl(0, TCGETS, 0x08075ED0)                    = 0
6539:   waitid(P_PID, 6540, 0x08047B70, WEXITED|WTRAPPED) = 0
Segmentation Fault - core dumped
6539:   write(2, " S e g m e n t a t i o n".., 33)      = 33
CONFIG ERROR! Start aborted. See /usr/local/icinga/var/icinga.chk for details.
6539:   write(1, " C O N F I G   E R R O R".., 79)      = 79
6539:   _exit(1)

-bash-3.00# truss -f /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg

Icinga 1.0.3
6543:   write(1, "\n I c i n g a   1 . 0 .".., 14)      = 14
Copyright (c) 2009-2010 Icinga Development Team (http://www.icinga.org)
6543:   write(1, " C o p y r i g h t   ( c".., 72)      = 72
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
6543:   write(1, " C o p y r i g h t   ( c".., 75)      = 75
Copyright (c) 1999-2009 Ethan Galstad
6543:   write(1, " C o p y r i g h t   ( c".., 38)      = 38
Last Modified: 08-18-2010
6543:   write(1, " L a s t   M o d i f i e".., 26)      = 26
License: GPL

6543:   write(1, " L i c e n s e :   G P L".., 14)      = 14
6543:   brk(0x080E9DC8)                                 = 0
6543:   brk(0x080EBDC8)                                 = 0
6543:   umask(022)                                      = 022
Reading configuration data...
6543:   write(1, " R e a d i n g   c o n f".., 30)      = 30
6543:   open("/usr/local/icinga/etc/icinga.cfg", O_RDONLY) = 3
6543:   fxstat(2, 3, 0x08047BC0)                        = 0
6543:   mmap(0x00000000, 46182, PROT_READ, MAP_PRIVATE, 3, 0) = 0xD0B20000
6543:   brk(0x080EBDC8)                                 = 0
6543:   brk(0x080EDDC8)                                 = 0
6543:   open("/usr/local/icinga/etc/resource.cfg", O_RDONLY) = 4
6543:   fxstat(2, 4, 0x08047B90)                        = 0
6543:   mmap(0x00000000, 1304, PROT_READ, MAP_PRIVATE, 4, 0) = 0xD0B00000
6543:   munmap(0xD0B00000, 1304)                        = 0
6543:   close(4)                                        = 0
6543:   openat(-3041965, "/tmp", O_RDONLY|O_NDELAY|O_LARGEFILE) = 4
6543:   fcntl(4, F_SETFD, 0x00000001)                   = 0
6543:   fstat64(4, 0x08047BB0)                          = 0
6543:   close(4)                                        = 0

next hack - run the init script's config check by hand with its output redirected, which is where the segfault happens.

-bash-3.00# /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg > /usr/local/icinga/var/icinga.chk 2>&1
Segmentation Fault (core dumped)

-bash-3.00# /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg > /dev/null 2>&1
Segmentation Fault (core dumped)

took the old init-script from 1.0.1 - same dump. so it has something to do with the > ... redirection somehow. an open pipe while opening files? prohibited by some mechanism like selinux?

the segfault clearly shows an access violation around mmap, which points to the shared.c directive introduced after 1.0.1

#16 Updated by dnsmichi over 3 years ago

regarding memory allocation.

i've now done a reversed quicksort, i.e. a bisection: got the commit sha1s of 1.0.2 and 1.0.1 and kept stepping half way between them, running gdb each time on the checked-out branch (iirc i was at test25 by then)

$ git checkout -b test1 <sha1>
$ make distclean && ./configure --with-command-group=icinga --with-httpd-conf=/usr/local/apache/conf --enable-event-broker --with-icinga-user=icinga --with-icinga-group=icinga --prefix=/usr/local/icinga --with-gd-lib=/usr/local/lib --with-gd-inc=/usr/local/include && make icinga
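
The halving procedure described above can be sketched as a binary search over an ordered commit list. This is only an illustration of the logic, not a tool used in this issue: first_bad_commit and crashes_from are hypothetical names, and the sketch assumes every commit after the first crashing one crashes too.

```c
#include <assert.h>
#include <stddef.h>

/* Caller-supplied test: non-zero if the build at this index crashes. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Commits are ordered oldest to newest. Index `good` is known to work,
 * index `bad` is known to crash, and good < bad. Returns the index of
 * the first crashing commit. */
size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;    /* crash already present at mid */
        else
            good = mid;   /* still fine at mid */
    }
    return bad;
}

/* Stand-in predicate for demonstration only: everything from the index
 * stored in ctx onwards "crashes". A real test would check out, build
 * and run the binary. */
int crashes_from(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

Each step halves the remaining range, so the number of builds needed grows only with the logarithm of the number of commits between the two releases.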

point is, that

WORKS OK
https://git.icinga.org/?p=icinga-core.git;a=commit;h=59eeccbcf53f2062f107fc52bf5228f034c4894a

SEGFAULT
https://git.icinga.org/?p=icinga-core.git;a=commit;h=bea4a961cfdacc1eefe7791849649e42408580d4

this is when the eventprofiler steps in.

running through what it does.

icinga.c

profiler_init(); is called, even if event_profiling is disabled.

profiler.c

within profiler_init() several profiler_add() calls.

profiler_add() allocates memory like this

profiler = realloc(profiler,(sizeof(profiler_item) * (++profiler_item_count)));

afterwards, nothing special happens.

realloc

http://opensolaris.org/jive/message.jspa?messageID=89269

profiler_item is int, int, double, char*
profiler_item_count is incremented each call, if a higher event number is triggered.

ok, so it just re-allocates more memory.
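
One detail of the quoted line: it assigns the result of realloc() straight back to profiler, so a failed call would overwrite the only pointer to the old block with NULL. A defensive variant is sketched below. It is not the fix applied in this issue. The struct field names are hypothetical (the note above only gives the types) and profiler_grow_sketch is an invented name.

```c
#include <assert.h>
#include <stdlib.h>

/* Field names are hypothetical; only the types come from the note. */
typedef struct {
    int    state;
    int    counter;
    double elapsed;
    char  *name;
} profiler_item;

static profiler_item *profiler = NULL;
static int profiler_item_count = 0;

/* Grows the array by one slot. Returns the new slot index, or -1 if
 * realloc() failed, in which case the old array is left untouched. */
int profiler_grow_sketch(void)
{
    size_t new_count;
    profiler_item *tmp;

    assert(profiler_item_count >= 0);
    new_count = (size_t)profiler_item_count + 1;

    tmp = realloc(profiler, new_count * sizeof(profiler_item));
    if (tmp == NULL)
        return -1;            /* profiler still points at the old block */
    profiler = tmp;

    /* realloc() leaves the new slot uninitialised, so clear it. */
    profiler[profiler_item_count] = (profiler_item){0, 0, 0.0, NULL};
    return profiler_item_count++;
}
```

Keeping the result in a temporary first means a failed realloc() can be reported without losing the existing entries.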

what if it allocates too much for the current process?
what if solaris links not to libc and uses another realloc implementation?
what if solaris realloc copies every time, even when remapping is not really needed?
what if this copying enforces memory addresses not to be allocatable afterwards?

ok, man pages.

To quote the realloc(3) manpage...
realloc() changes the size of the memory block pointed to by ptr to
size bytes. The contents will be unchanged to the minimum of the old
and new sizes; newly allocated memory will be uninitialized. If ptr is
NULL, the call is equivalent to malloc(size); if size is equal to zero,
the call is equivalent to free(ptr). Unless ptr is NULL, it must have
been returned by an earlier call to malloc(), calloc() or realloc().
If the area pointed to was moved, a free(ptr) is done.

========

what can be resolved

=> comment out the profiler_init(); call in icinga.c - everything works fine (tested on x86 and sparc).
=> make this call optional -

  • at compile time
  • from config
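
A rough sketch of how the two options could combine, as an illustration rather than the committed change. All names are hypothetical: USE_EVENTPROFILER stands in for a flag set at configure time, event_profiling_enabled for a value read from the config file, and profiler_init_stub for the real init function.

```c
#include <assert.h>

/* Normally set (or not) by configure; defined here so the sketch
 * compiles with the profiler path included. */
#define USE_EVENTPROFILER 1

/* Hypothetical stand-in for the parsed config option. */
int event_profiling_enabled = 0;

/* Records whether the stub ran, for demonstration only. */
int profiler_initialized = 0;

void profiler_init_stub(void)
{
    profiler_initialized = 1;   /* placeholder for the real setup */
}

/* Meant to be called after the main config file has been parsed.
 * Returns 1 if the profiler was initialised, 0 if it was skipped. */
int maybe_init_profiler(void)
{
#ifdef USE_EVENTPROFILER
    if (event_profiling_enabled) {
        profiler_init_stub();
        return 1;
    }
#endif
    return 0;
}
```

Without the define, the call is compiled out entirely. With it, the allocation only happens when the config asks for profiling, so a default setup never touches the profiler code.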

#17 Updated by dnsmichi over 3 years ago

  • Category set to Other
  • Status changed from New to Assigned
  • Target version set to 1.2 (Stable)

http://www.totalviewtech.com/support/documentation/tips/realloc_issue.html

still need to debug whether dangling pointers can occur.

#18 Updated by dnsmichi over 3 years ago

  • % Done changed from 0 to 90

tests for longer runs needed, til monday :)

#19 Updated by dnsmichi over 3 years ago

  • Status changed from Assigned to Resolved
  • % Done changed from 90 to 100

runs fine on x86 and sparc. x86 gdb session over the weekend did not throw anything special.

re-open if you encounter any other error.
