Bug #436

Use execv to execute active check commands

Added by dnsmichi almost 4 years ago. Updated almost 4 years ago.

Status:ResolvedStart date:05/16/2010
Priority:NormalDue date:
Assignee:dnsmichi% Done:

100%

Category:Active Checks
Target version:1.0.3
Icinga Version: OS Version:

Description

http://tracker.nagios.org/view.php?id=86

      Currently Nagios uses popen and system to run active check commands, with shell intepretation. Suggest we use execv (or similar) instead so that there is no shell expansion required.

This means that 1 less process (sh) is required to execute an active check, which should give a performance improvement.

When running the active check, see if there are any shell metacharacters. If there are, then fallback to the shell invocation. Otherwise use the new execv method.

http://article.gmane.org/gmane.network.nagios.devel/6971

From: Hiren Patel <hir3npatel <at> gmail.com>
Subject: tracker id 86 - software freedom day event
Newsgroups: gmane.network.nagios.devel
Date: 2009-09-19 21:31:52 GMT (34 weeks, 19 hours and 6 minutes ago)

the Johannesburg South Africa software freedom day event had a session 
where users could work on bugs/features on any project of their choice, 
I chose to work on tracker id 86 in the event.

attached are diffs for the lists review, only very basic testing has 
been done. any further testing or comments and concerns appreciated.

one area I'm not too sure of, is if waitpid can be interrupted by a 
signal and needs to be handled differently to what it currently is.

I'd like to take this opportunity on software freedom day to thank the 
nagios project for its work, and the free software community at large.

--- nagios/base/checks.c    2009-08-11 19:29:52.000000000 +0200
+++ checks.c    2009-09-19 23:12:46.000000000 +0200
@@ -338,7 +338,7 @@
     int fork_error=FALSE;
     int wait_result=0;
     host *temp_host=NULL;
-    FILE *fp=NULL;
+    FILE *fp=NULL, *chldfp;
     int pclose_result=0;
     mode_t new_umask=077;
     mode_t old_umask;
@@ -346,6 +346,9 @@
     double old_latency=0.0;
     dbuf checkresult_dbuf;
     int dbuf_chunk=1024;
+    int pipefds[2], chldstatus, i;
+    char *chldargs[MAXCHLDARGS];
+    char *s , *p;
 #ifdef USE_EVENT_BROKER
     int neb_result=OK;
 #endif
@@ -771,22 +774,102 @@

             /* run the plugin check command */
-            fp=popen(processed_command,"r");
-            if(fp==NULL)
-                _exit(STATE_UNKNOWN);
+            if (!has_shell_metachars(processed_command)){
+                if (pipe(pipefds) == -1){
+                    logit(NSLOG_RUNTIME_WARNING,TRUE,"error creating pipe: %s\n",strerror(errno));
+                    _exit(STATE_UNKNOWN);
+                }

-            /* initialize buffer */
-            strcpy(output_buffer,"");
+                pid = fork();
+                if (pid == -1){
+                    logit(NSLOG_RUNTIME_WARNING,TRUE,"fork error\n");
+                    _exit(STATE_UNKNOWN);
+                }

-            /* get all lines of plugin output - escape newlines */
-            while(fgets(output_buffer,sizeof(output_buffer)-1,fp)){
-                temp_buffer=escape_newlines(output_buffer);
-                dbuf_strcat(&checkresult_dbuf,temp_buffer);
-                my_free(temp_buffer);
+                if (pid == 0){
+                    close(pipefds[0]);
+
+                    if (dup2(pipefds[1],STDOUT_FILENO) == -1){
+                        logit(NSLOG_RUNTIME_WARNING,TRUE,"dup2 error\n");
+                        _exit(EXIT_FAILURE);
+                    }
+
+                    if (dup2(pipefds[1],STDERR_FILENO) == -1){
+                        logit(NSLOG_RUNTIME_WARNING,TRUE,"dup2 error\n");
+                        _exit(EXIT_FAILURE);
+                    }
+                    close(pipefds[0]);
+
+                    s = strchr(processed_command,' ');
+                    if (s){
+                        *s = '\0';
+                        p = s+1;
+
+                        chldargs[0] = processed_command;
+                        for(i=1;i<MAXCHLDARGS-2;i++){
+                            s = strchr(p,' ');
+                            chldargs[i] = p;
+
+                            if (s){
+                                *s = '\0';
+                                p = s+1;
+                            }
+
+                            if (!s)
+                                break;
+                        }
+
+                        if (i >= MAXCHLDARGS-2){
+                            logit(NSLOG_RUNTIME_WARNING,TRUE,"overlimit args for command %s\n",chldargs[0]);
+                            _exit(EXIT_FAILURE);
+                        }
+                        else
+                            chldargs[++i] = '\0';
+                    }
+                    else{
+                        chldargs[0] = processed_command;
+                        chldargs[1] = '\0';
+                    }
+
+                    log_debug_info(DEBUGL_CHECKS,0,"running process %s via execv\n",processed_command);
+                    execv(chldargs[0],chldargs);
+                    _exit(EXIT_FAILURE);
                 }

-            /* close the process */
-            pclose_result=pclose(fp);
+                close(pipefds[1]);
+
+                chldfp = fdopen(pipefds[0],"r");
+                if (chldfp == NULL){
+                    logit(NSLOG_RUNTIME_WARNING,TRUE,"fdopen error\n");
+                    _exit(EXIT_FAILURE);
+                }
+
+                while(fgets(output_buffer,sizeof(output_buffer)-1,chldfp)){
+                    temp_buffer=escape_newlines(output_buffer);
+                    dbuf_strcat(&checkresult_dbuf,temp_buffer);
+                    my_free(temp_buffer);
+                }
+
+                fclose(chldfp);
+                waitpid(pid,&chldstatus,0);
+                pclose_result=WEXITSTATUS(chldstatus);
+            }
+            else{
+                log_debug_info(DEBUGL_CHECKS,0,"running process %s via popen\n",processed_command);
+                fp=popen(processed_command,"r");
+                if(fp==NULL)
+                    _exit(STATE_UNKNOWN);
+    
+                strcpy(output_buffer,"");
+    
+                while(fgets(output_buffer,sizeof(output_buffer)-1,fp)){
+                    temp_buffer=escape_newlines(output_buffer);
+                    dbuf_strcat(&checkresult_dbuf,temp_buffer);
+                    my_free(temp_buffer);
+                    }
+    
+                pclose_result=pclose(fp);
+            }

             /* reset the alarm */
             alarm(0);

--- nagios/include/nagios.h    2008-12-14 16:52:23.000000000 +0200
+++ nagios.h    2009-09-19 18:18:08.000000000 +0200
@@ -44,6 +44,7 @@
    command file. EG 10/19/07
 */
 #define MAX_PLUGIN_OUTPUT_LENGTH                8192    /* max length of plugin output (including perf data) */
+#define MAXCHLDARGS    20

@@ -807,6 +808,7 @@

 char *get_program_version(void);
 char *get_program_modification_date(void);
+int has_shell_metachars(const char *);

 mmapfile *mmap_fopen(char *);                /* open a file read-only via mmap() */
 int mmap_fclose(mmapfile *);

--- utils.c    2009-08-11 19:29:52.000000000 +0200
+++ /tmp/utils.c    2009-09-19 13:31:19.000000000 +0200
@@ -4589,7 +4589,12 @@
     return (char *)PROGRAM_MODIFICATION_DATE;
         }

-
+int has_shell_metachars(const char *s){
+    if (strpbrk(s,"!$^&*()~[]\\|{};<>?'\""))
+        return 1;
+    else
+        return 0;
+}

 /******************************************************************/
 /*********************** CLEANUP FUNCTIONS ************************/

nagios_no_popen.patch Magnifier (5.28 KB) dnsmichi, 05/31/2010 08:34 pm

nagios_no_popen_r2.patch Magnifier (5.8 KB) dnsmichi, 06/02/2010 04:21 pm


Related issues

Related to Core - Bug #617: check_mk failure in 1.0.2 Resolved 07/15/2010

Associated revisions

Revision ba8364e2
Added by dnsmichi almost 4 years ago

modify execv to execvp, accepting 4096 cmd args, for both host and service checks with adapted error handling

the previous version only allowed 20-1 chld args, and was limited on the
error output. it is now a combined diff from Hiren, Christoph and Matthieu,
having all features applied and several bugs fixed (pipe returns -1 on error)

Furthermore, it's also applied for host checks.

execvp searches in path too, e.g. if plugin path is not set correctly but
is in PATH.

The run check procedure still falls back on popen execution, if the plugin
command contains shell meta characters. (Can be compared to Perl's exec vs
system)

kudos to Hiren Patel, Christoph Maser and Matthieu Kermagoret

refs #436

Revision 67420d5e
Added by dnsmichi almost 4 years ago

execvp searches in PATH too like popen, and returns if an error occurs, outputting the error string

revamped the previous attempt a bit. there might be discussions about
allowing PATH or not PATH env variables. But by completely breaking
compatibility this won't be nice.

the current attempt only checks if the first argument is set and then
calling execvp - this looks in $PATH if there is no trailing slash set
on the executable.
In case of an error, execvp returns -1 and this is where the error is
put into syslog, exiting with state unknown. (a further adaption would be
setting the output of the plugin like popen behaves)

if you consider not using the PATH for that, you can simply change execvp to
execv and recompile the code. The checks on executable/there are still valid,
except PATH lookups.

refs #436
refs #617

Revision 9d265f32
Added by dnsmichi over 3 years ago

modify execv to execvp, accepting 4096 cmd args, for both host and service checks with adapted error handling

the previous version only allowed 20-1 chld args, and was limited on the
error output. it is now a combined diff from Hiren, Christoph and Matthieu,
having all features applied and several bugs fixed (pipe returns -1 on error)

Furthermore, it's also applied for host checks.

execvp searches in path too, e.g. if plugin path is not set correctly but
is in PATH.

The run check procedure still falls back on popen execution, if the plugin
command contains shell meta characters. (Can be compared to Perl's exec vs
system)

kudos to Hiren Patel, Christoph Maser and Matthieu Kermagoret

refs #436

Revision 74160140
Added by dnsmichi over 3 years ago

execvp searches in PATH too like popen, and returns if an error occurs, outputting the error string

revamped the previous attempt a bit. there might be discussions about
allowing PATH or not PATH env variables. But by completely breaking
compatibility this won't be nice.

the current attempt only checks if the first argument is set and then
calling execvp - this looks in $PATH if there is no trailing slash set
on the executable.
In case of an error, execvp returns -1 and this is where the error is
put into syslog, exiting with state unknown. (a further adaption would be
setting the output of the plugin like popen behaves)

if you consider not using the PATH for that, you can simply change execvp to
execv and recompile the code. The checks on executable/there are still valid,
except PATH lookups.

refs #436
refs #617

History

#1 Updated by dnsmichi almost 4 years ago

please check on this patch too.

-------- Original Message --------
Subject:     [Nagios-devel] Do not launch a shell for each check
Date:     Mon, 31 May 2010 16:28:53 +0200
From:     Matthieu Kermagoret <mkermagoret@merethis.com>
Reply-To:     Nagios Developers List <nagios-devel@lists.sourceforge.net>
To:     Nagios Developers List <nagios-devel@lists.sourceforge.net>

Hi list,

I'd like to propose a performance patch for Nagios that reduces the
number of Nagios' descendant processes.

The action of launching a single check involves multiple processes :

  - child 1 (not executed if child_processes_fork_twice is disabled)
    - launch child 2
    - waits for child 2 termination

  - child 2
    - reset signal handling
    - launch a shell (connected with a pipe via *popen*)
    - read plugin output from the pipe
    - write plugin output to a file

  - shell
    - executed as /sh -c command/
    - variable expansion
    - globbing
    - every shell stuff you have on the command line
    - launch the check itself

  - check itself
    - the obvious

In this event flow, we found that the shell execution wasn't required
in most cases. So, by avoiding /sh/ we could save fork/exec time and
reduce the number of concurrent processes at the same time. And indeed
shell's magic would still be available using /sh -c/ as the check
command with no performance penalty compared to the current situation.

What do you think about it ?

Best regards,

-- 
Matthieu KERMAGORET | Développeur

mkermagoret@merethis.com

MERETHIS est éditeur du logiciel Centreon.

#2 Updated by dnsmichi almost 4 years ago

-------- Original Message --------
Subject: Re: [Nagios-devel] Do not launch a shell for each check
Date: Tue, 1 Jun 2010 15:49:50 +0200
From: Matthieu Kermagoret <mkermagoret@merethis.com>
Reply-To: Nagios Developers List <nagios-devel@lists.sourceforge.net>
To: Nagios Developers List <nagios-devel@lists.sourceforge.net>

On Tue, Jun 1, 2010 at 11:21 AM, Andreas Ericsson <ae@op5.se> wrote:
> On 05/31/2010 04:28 PM, Matthieu Kermagoret wrote:
>> I'd like to propose a performance patch for Nagios that reduces the
>> number of Nagios' descendant processes.
>>
>
> Thanks. The idea behind the patch is good. The patch itself is not so
> stellar though.
>

Thanks for your feedback. I rewrote it using Andreas' advice and Hiren
Patel's patch (who indeed already proposed a patch doing the same
thing here nine months ago :
http://article.gmane.org/gmane.network.nagios.devel/6971).

The patch I propose, handle simple commands with shell quoting (simple
and double quote). Every command containing any of these characters
(escaped or not) will be handled by the shell --> !$^&*()~[]|{};<>?`
<--

So any feedback on this new proposal ?

-- 
Matthieu KERMAGORET | Développeur

mkermagoret@merethis.com

MERETHIS est éditeur du logiciel Centreon.

#3 Updated by dnsmichi almost 4 years ago

  • Assignee changed from magellanic to dnsmichi

checking on the latest patch and rework it. could be sth with #547

#4 Updated by Meier almost 4 years ago

dnsmichi wrote:

checking on the latest patch and rework it. could be sth with #547

i doubt that please see comment over at #547

#5 Updated by Meier almost 4 years ago

I wonder if we really need 3 execution modes (epn, execv, popen) is/was shell syntax ever officially documented supported? Maybe we can drop the popen all together.
Also the same patches should be applied to run_async_host_check_3x

+C

#6 Updated by dnsmichi almost 4 years ago

it seems that MAXCHLDARGS set to 20 is too low.

include/icinga.h
#define MAXCHLDARGS    20

base/checks.c
+                                if (i >= MAXCHLDARGS-2){
+                                    logit(NSLOG_RUNTIME_WARNING,TRUE,"overlimit args for command %s\n",chldargs[0]);
+                                    _exit(EXIT_FAILURE);
+                                }

-------- Original Message --------
Subject:     [icinga-users] "overlimit args for command" error with icinga core 1.0.2
Date:     Wed, 7 Jul 2010 16:53:56 +0000 (GMT)
From:     tontonitch-pro yahoo.fr
Reply-To:     icinga-users@lists.sourceforge.net
To:     icinga-users@lists.sourceforge.net

Hi,

I've just upgraded my Icinga server (core) to the last stable release, 1.0.2, and  I'm facing a problem with checks using the following command:

$USER1$/check_nrpe -H $HOSTADDRESS$ -p $USER3$ -t 60 -c CheckMem -a MaxWarn=$ARG1$ MaxCrit=$ARG2$ type=physical MaxWarn=$ARG3$ MaxCrit=$ARG4$ type=virtual MaxWarn=$ARG5$ MaxCrit=$ARG6$ type=paged

In the log file, I can see for each check attempt of that kind of checks the following entries:

[1278520196] overlimit args for command /usr/local/icinga/libexec/check_nrpe
[1278520198] SERVICE ALERT: myhost;Memory usage;WARNING;HARD;3;(null)

In full debug mode (-1), I cannot see anything on that problem.

Apparently, it's related to the changes done for the Bug #436
https://dev.icinga.org/issues/436

Is there a way to make this check working as previously?

Regards,

Yannick

I consider the reworked patch, (r2) which is attached here, combines the command parsing logic a bit better than the current one.

at least regarding the command arg max

#define MAX_CMD_ARGS 4096

#7 Updated by dnsmichi almost 4 years ago

  • Category changed from Other to Active Checks

#8 Updated by dnsmichi almost 4 years ago

  • Status changed from New to Resolved
  • Target version set to 1.0.3
  • % Done changed from 0 to 100

i consider that being resolved for the time being. dropping popen needs a different method of handling shell meta characters.

Also available in: Atom PDF