Feature #1363

Oracle:cleanup_table_by_instance_time "no data" errors in syslog

Added by Tommi about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Start date: 03/28/2011
Priority: Normal
Due date:
Assignee: Tommi
% Done: 100%
Category: RDBMS
Target version: Icinga 1.x - 1.4

Description

Found tons of "errors" in my syslog, but I suspect there is no actual error, just no data to clean. In my opinion there is no reason to pump this into syslog:
Mar 28 21:30:59 host ido2db: ERROR: MSG 'ORA-01403: no data found '
Mar 28 21:30:59 host ido2db: ERROR: QUERY 'BEGIN clean_table_by_instance_time(:X1, :X2, :X3, :X4); END;'
Catching this exception should be sufficient.

Associated revisions

Revision bd2d05b1
Added by Tommi about 3 years ago

recreate_icinga13_objects.sql: change exception handling in PL/SQL functions refs #1363

refs #1363

Revision 874e2ad1
Added by Tommi almost 3 years ago

includes all patches and features in upgrade/*.sql
refs #1354,#1355,#1358,#1363,#1401,#1173

History

#1 Updated by dnsmichi about 3 years ago

well, i figured recently that those errors were not resolved by just returning others in the exception condition of the procedure.
but i'd like to know how to catch such an exception - because if done in the ido2db code, you'd have to create a dictionary of error codes deciding which to report and which to ignore. i'd rather not do that.

#2 Updated by Tommi about 3 years ago

exception
   when no_data_found then null;

will do the job. I haven't had these messages all day.

Anyway, i can't see any advantage in using a SP over conventional coding in this particular case, because it will only move the literals from OCI into the SP's execute immediate string. The problem is the literals with a lot of different table names and timestamps, which can't be avoided. Also, they are executed too often.
In my opinion we can safely put the cleanup back into the code, but start it only once a day (at most once an hour), or offer a crontab job doing all the cleanup, to be scheduled by the user.

#3 Updated by Tommi about 3 years ago

usually i ignore the following return codes:

$noerr=array(0,1403,24347);//No Error,No Data,Null in Aggregate

But with newer Oracle versions or custom error codes there will be more of them when using these features.
I would suggest a central ido_oci_execute function covering execution and error checking/logging.
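A minimal sketch of the error filtering such a central wrapper could do, based on the return-code list above. The function name `ido_oci_error_is_ignorable` and the hard-coded list are assumptions for illustration, not actual ido2db code:

```c
#include <stddef.h>

/* Oracle return codes treated as "expected", per the list above:
   0 = no error, 1403 = ORA-01403 no data found,
   24347 = ORA-24347 null column in aggregate function */
static const int ido_oci_ignored_codes[] = { 0, 1403, 24347 };

/* Return 1 if the given Oracle error code should be silently ignored
   instead of being logged to syslog, 0 otherwise. */
int ido_oci_error_is_ignorable(int ora_code) {
    size_t i;
    size_t n = sizeof(ido_oci_ignored_codes) / sizeof(ido_oci_ignored_codes[0]);
    for (i = 0; i < n; i++) {
        if (ido_oci_ignored_codes[i] == ora_code)
            return 1;
    }
    return 0;
}
```

A central execute function would call this check after every OCI statement execution and only log codes that are not in the list.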

#4 Updated by dnsmichi about 3 years ago

Tommi wrote:

[...]
will do the job. I haven't had these messages all day.

Anyway, i can't see any advantage in using a SP over conventional coding in this particular case, because it will only move the literals from OCI into the SP's execute immediate string. The problem is the literals with a lot of different table names and timestamps, which can't be avoided. Also, they are executed too often.

the different table names can't be avoided as it depends on ~60 tables to be cleaned (some at startup, some regularly by time). as written in the other issue, putting those strings straight up into a query is not an option.

i thought of putting that into a procedure, allowing the DBA to tweak and fix that him/herself.

In my opinion we can safely put the cleanup back into the code, but start it only once a day (at most once an hour), or offer a crontab job doing all the cleanup, to be scheduled by the user.

the cleaning interval has been reduced to 1 hour by default in my branch, but users can still set it to another value through the config. after all, you are adjusting that for the rdbms you are using, and it's not determined which one will be cleaned. at least on non-optimized systems an automated cleanup should be sufficient. letting 4k services with a check interval of 5 minutes run against an oracle db will cause ~4 million rows / day. that's when you want regular housekeeping.
furthermore, housekeeping runs as its own thread, so the connection and such are independent of the rest.

on the logging/executing - if it's really necessary to define some error codes to be ignored, define them globally within the header files and use them throughout the code.

#5 Updated by Tommi about 3 years ago

dnsmichi wrote:

Tommi wrote:

[...]
will do the job. I haven't had these messages all day.

Anyway, i can't see any advantage in using a SP over conventional coding in this particular case, because it will only move the literals from OCI into the SP's execute immediate string. The problem is the literals with a lot of different table names and timestamps, which can't be avoided. Also, they are executed too often.

the different table names can't be avoided as it depends on ~60 tables to be cleaned (some at startup, some regularly by time).

i agree with this.

as written in the other issue, putting those strings straight up into a query is not an option.

this i wouldn't say. I suggest implementing the same functionality as the procedure within the db*.c code, with the SQL generated via (a)sprintf and executed directly from OCI. In the end your procedure will do exactly the same. We can stay with the procedure, but i would point out that there is no advantage to using a SP in this way.

i thought of putting that into a procedure, allowing the DBA to tweak and fix that him/herself.

with "execute immediate" a DBA won't be able to improve anything concerning the conditions you mentioned above

In my opinion we can safely put the cleanup back into the code, but start it only once a day (at most once an hour), or offer a crontab job doing all the cleanup, to be scheduled by the user.

the cleaning interval has been reduced to 1 hour by default in my branch, but users can still set it to another value through the config. after all, you are adjusting that for the rdbms you are using, and it's not determined which one will be cleaned. at least on non-optimized systems an automated cleanup should be sufficient. letting 4k services with a check interval of 5 minutes run against an oracle db will cause ~4 million rows / day. that's when you want regular housekeeping.

Yes, of course i will have cleaning, but not checking 60 tables with x million rows every minute. If i have 100 million rows in a table, it's better to check and delete 4 million rows once a day instead of scanning 100 million rows once per minute only to find 4k to delete.

furthermore, housekeeping runs as its own thread, so the connection and such are independent of the rest.

fine. We should at least modify the connect function to apply a module name to the session, identifying which job is currently doing the queries. It will help to assign bottlenecks to a particular sub-function.

on the logging/executing - if it's really necessary to define some error codes to be ignored, define them globally within the header files and use them throughout the code.

Do you like logfiles with thousands of lines about an unimportant issue, hiding the lines which are really important to care about?

#6 Updated by dnsmichi about 3 years ago

Tommi wrote:

this i wouldn't say. I suggest implementing the same functionality as the procedure within the db*.c code, with the SQL generated via (a)sprintf and executed directly from OCI. In the end your procedure will do exactly the same. We can stay with the procedure, but i would point out that there is no advantage to using a SP in this way.

let's say you do it right away like you proposed, on the oci side of life. then i do my work on the code (most likely covering mysql and postgresql) and change a global variable. should i then recheck everything for oracle, because an exception was made just for oracle?
idoutils supports 3 rdbms, it was hard to get there, and everything is being done to keep it that way. we often ran into things where just mysql worked and the rest did not. the same happened when only oracle worked and libdbi did not. i don't want to see that happen over here. there are rules and conditions on the environment (idoutils, especially ido2db and handling the data from the socket into the db) which need to be met in the first place. if they are met, an exception for functionality and/or performance reasons can likewise be made.

i don't see any benefit here, only that the solaris asprintf needs a review on the encoding. this might be a bug with higher priority too - did you check everything else as well? there are quite a few asprintfs all around the code, also affecting other data, not only the table names themselves.

with "execute immediate" a DBA won't be able to improve anything concerning the conditions you mentioned above

my dba's told me that execute immediate would be the best option for those conditions (explained above why), and my research encouraged me to do it that way. what you can improve now: you can fix the no_data exception straight away. you don't need to compile or upgrade anything. that's mostly the point of putting that onto the rdbms itself.

Yes, of course i will have cleaning, but not checking 60 tables with x million rows every minute. If i have 100 million rows in a table, it's better to check and delete 4 million rows once a day instead of scanning 100 million rows once per minute only to find 4k to delete.

it's not 60. it's just those configured in ido2db.cfg via max_\*_age, only the history-related ones. when we had 4 million rows, we disabled writing servicechecks anyway, since it's a status-only rdbms right now. and it will be replaced by postgresql soon either way.
check my ido branch and you will see that the trimming interval is set to 3600 by default. (idoutils has provided that to the user since the very beginning, so it has been possible for >1 year now to set that yourself :) i'm only working on that (including the threads) in a separate branch. it's work in progress, and doesn't fit into my maintainer work on core and classicui.

fine. We should at least modify the connect function to apply a module name to the session, identifying which job is currently doing the queries. It will help to assign bottlenecks to a particular sub-function.

the conninfo table gets the check-in inserted, differing in the name, if that would help in any way. that agent name is also kept within the idi struct.
the threading is one of my next todos, so i'd rather not touch that now if i were you.

Do you like logfiles with thousands of lines about an unimportant issue, hiding the lines which are really important to care about?

no. i normally grep them. and no_data can be dangerous. so in order to allow each sysadmin to configure that as he wants, i'll go one step further and propose a comma-separated cfg option for error codes to be ignored by oci.

#7 Updated by Tommi about 3 years ago

dnsmichi wrote:

Tommi wrote:

this i wouldn't say. I suggest implementing the same functionality as the procedure within the db*.c code, with the SQL generated via (a)sprintf and executed directly from OCI. In the end your procedure will do exactly the same. We can stay with the procedure, but i would point out that there is no advantage to using a SP in this way.

let's say you do it right away like you proposed, on the oci side of life. then i do my work on the code (most likely covering mysql and postgresql) and change a global variable. should i then recheck everything for oracle, because an exception was made just for oracle?
idoutils supports 3 rdbms, it was hard to get there, and everything is being done to keep it that way. we often ran into things where just mysql worked and the rest did not. the same happened when only oracle worked and libdbi did not. i don't want to see that happen over here. there are rules and conditions on the environment (idoutils, especially ido2db and handling the data from the socket into the db) which need to be met in the first place. if they are met, an exception for functionality and/or performance reasons can likewise be made.

see my notes in #1362. I suggest a common application layer and a database-specific data layer. All database-specific issues like exceptions and error codes should be handled within the data layer. A specific function can raise a specific error, and some errors we can expect in a particular use case (like no data during a delete, or a duplicate key when doing "upserts"). We should handle expected errors within the code (SQL or OCI functions), so there is no need to disturb the user. Maybe a duplicate key is not acceptable in another case; then we have to raise the error.
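The idea of classifying errors by the operation that produced them could be sketched like this. The enum, function name, and the exact mapping are illustrative assumptions; only the Oracle codes themselves (ORA-01403 no data found, ORA-00001 unique constraint violated) are real:

```c
/* Sketch of data-layer error classification: whether an Oracle error
   code is "expected" depends on the kind of operation being executed. */
typedef enum { OP_DELETE, OP_UPSERT, OP_OTHER } db_op_t;

/* Return 1 if the error is expected for this operation and can be
   swallowed by the data layer, 0 if it should be raised/logged. */
int db_error_is_expected(db_op_t op, int ora_code) {
    switch (op) {
    case OP_DELETE:
        return ora_code == 1403;  /* ORA-01403: no data found while deleting */
    case OP_UPSERT:
        return ora_code == 1;     /* ORA-00001: unique constraint violated */
    default:
        return 0;                 /* anything else is a real error here */
    }
}
```

The same duplicate-key code would then still be raised for a plain insert (`OP_OTHER`), matching the point that an error can be expected in one use case and real in another.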

i don't see any benefit here, only that the solaris asprintf needs a review on the encoding. this might be a bug with higher priority too - did you check everything else as well? there are quite a few asprintfs all around the code, also affecting other data, not only the table names themselves.

The problem with the extra character in the table name I have on a linux box, and it is not related to this case. You know better than me how to handle strings in C.

with "execute immediate" a DBA won't be able to improve anything concerning the conditions you mentioned above

my dba's told me that execute immediate would be the best option for those conditions (explained above why), and my research encouraged me to do it that way. what you can improve now: you can fix the no_data exception straight away. you don't need to compile or upgrade anything. that's mostly the point of putting that onto the rdbms itself.

Yes, it is the only way to get it to work - in SQL. As i stated before, i can live with it; i am only pointing out that there is no special benefit (saving Oracle resources) over the "mysql" version.

the conninfo table gets the check-in inserted, differing in the name, if that would help in any way. that agent name is also kept within the idi struct.
the threading is one of my next todos, so i'd rather not touch that now if i were you.

This is only a suggestion to make the Oracle DBAs of this world happy. They will only see entries in v$session, not those in conninfo.

no. i normally grep them. and no_data can be dangerous. so in order to allow each sysadmin to configure that as he wants, i'll go one step further and propose a comma-separated cfg option for error codes to be ignored by oci.

see my notes above about expected errors versus real errors.

#8 Updated by Tommi about 3 years ago

  • Category set to RDBMS
  • Status changed from New to Feedback
  • Assignee set to Tommi
  • % Done changed from 0 to 100

ignore the NO_DATA_FOUND exception (and only that one) in the functions
Commit bd2d05b1f61c3f7f20e50ac9e5e20dde391509a8

#9 Updated by Tommi almost 3 years ago

  • Status changed from Feedback to Resolved
  • Target version set to 1.4

included in V1.4 upgrade scripts
