libtblog.pm.in 43 KB
Newer Older
Kevin Atkinson's avatar
Kevin Atkinson committed
1 2 3 4
#!/usr/bin/perl -w

#
# EMULAB-COPYRIGHT
5
# Copyright (c) 2005, 2006 University of Utah and the Flux Group.
Kevin Atkinson's avatar
Kevin Atkinson committed
6 7 8
# All rights reserved.
#

9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
=head1 NAME

libtblog - Logging library for testbed

=head1 BASIC USAGE

See the REFERENCE section for detailed descriptions of any of the
functions mentioned here.

=head2 Quick Start

Every perl script that can possible report a testbed related error
should use libtblog.  This should be done in the same place that
the other testbed libraries are included:

    use lib "@prefix@/lib";
    ...
    use libtblog;
    ...

This will activate the logging subsystem.  All output to STDOUT
and STDERR will, by default, be captured and turned into log
messages, in addition to being printed.  Handlers are also installed
for die/warn.  To turn this feature off use tblog_stop_capture(...).

If the script may be used at a top level than tblog_set_info(...)
should be used.

Than all output should use one of the tb* functions calls as follows:

    tberror("message")   # to report an error
    tbwarn("message")    # to report a warning
    tbnotice("message")  # to report important information that not
                         # really a warning.
    tbinfo("message")    # for normal log messages

    tbdie("message")     # to report an error than die using perl "die"

The tb* log functions (except for tbinfo) will automatically prefix
the message with "***" and other appropriate formatting so don't
include any.  However, multiline messages are okay.

Kevin Atkinson's avatar
Kevin Atkinson committed
51
Normal perl C<die> can and should still be used when reporting an
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
internal error such as calling a function incorrectly.  The message
will still be logged but it will be flagged as an internal error.

For basic usage that it.

=head2 Associating a Cause With An Error

One of the goals of this system is to be able to determine the cause
of an error.  The current causes are:

    temp - resources temp unavailable
    user - user error
    internal - software bug, should not happen
    software - software problem, like a bad image
    hardware - hardware problem
67
    canceled - canceled
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
    unknown - default

The cause for an error can be set directly when calling tberror by passing
in the "cause" parameter.  For example:

    tberror({cause => 'temp'}, "message")

Or, a default cause can be set for a script using
"tblog_set_default_cause", for example:

    tblog_set_default_cause ('temp')

The cause is only relevant for errors.  For that reason
tblog_set_default_cause will only set the default cause for errors.

When using the normal perl C<die> or C<warn> the cause is always set
to "internal", since they may come from perl itself.

Kevin Atkinson's avatar
Kevin Atkinson committed
86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
=head2 Primary vs Secondary Error

A primary error is an error which directly identifies the cause of the
problem such as "not enough nodes free".  A secondary error is the
result of a primary error such as "assign failed".  Tblog uses this
information to help determine when it found the root cause.  By
default an error is neither primary or secondary.  To flag an error
as either set the "type" parameter to "primary" or "secondary"
respectfully.  For example:

  tberror({type => 'primary'}, "message");
  rberror({type => 'secondary'}, "message");

Note, multiple parameters can be specified, so that the "type"
parameter can be combined with the "cause" parameter.  For example:

  tberror({cause => 'temp', type => 'primary'}, "message");

104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219
=head2 Finding the Cause of the Error

Use tblog_find_error.

=head1 ADVANCE USAGE

=head2 Understanding How tblog_find_error Works

When something goes wrong tblog_find_error will reconstruct a trace of
what script called what.  Of all the scripts which reported errors, it
will then use the last one at the deepest level.

For example consider the following:

    e (top)
       * (a)
       e (b)
          e (1)
       * (c)
       e (d)
          E (2)
       * (e)

In this example "top" called script "a", "b", "c", "d", "e".  When it
called "b" something went wrong but it continued anyway, it than
called "d" which failed.  However script "d" also called "2" which
also failed.  Thus libtblog will report the errors coming from "2",
the fact that "e" was called after "d" is irrelevant, since it didn't
report any errors.

However, this strategy doesn't always work.  The rest of this section
will deal with the cases when it doesn't and how to help it.

=head2 Extra, and Summary Log Messages

In the case when an error message provides I<extra> information, that
the user should know about, on what just happened with a script at a
lower level, the "type" parameter should be used to set the message
type to "extra":

    tberror {type => 'extra'}, ...

A good example of using the "extra" type to good effect is in
assign_wrapper:

    fatal({type=>'extra', cause=>'user'},
          "Experiment can not be run on an empty testbed. ".
          "Please fix the experiment.");

Where C<fatal> here just calls tberror and then exists.

In the case when the error message provides a useful I<summary> of the
errors at the lower level, and should replace them, the message type
should be set to "summary":

    tberror {type => 'summary'}, ...

=head2 Using Sublevels

In some cases a script does more than one abstract thing such that
when one thing fails it will report an error than latter on report
that X failed.  Since both errors are coming from the same script
both will be reported, but in reality only the error
from the first thing should be reported.  To correct this situation
the "sublevel" parameter should be used.  This parameter tweaks the
call trace so that some errors are at a deeper level than others, even
though they are coming form the same script.  The default sublevel is
0.  Negative sublevels can be used.

For example, consider the following code:

    $res = do_something()

    tbdie "do_something failed!" unless $res;

    sub do_something() {
        ...
        tberror "foo not responding!";
        return 0;
       ...
    }

Without using sublevel the reported errors will be:

    foo not responding!
    do_something failed!

However, the second error is not needed.  To fix this, the sublevel
parameter should be added to one of the tb* calls.  This can either be
accomplished by setting the sublevel to 1 on the "foo not
responding!" error, or -1 on the "do_something failed!" error.  Since
do_something may have many errors it is often easier to set the
sublevel on the "do_something failed!" error:

    ...
    tbdie {sublevel => -1}, do_something failed!" unless $res;
    ...

=head2 Sub Processes

In some cases, such as when forking children to run in parallel,
simply using the sublevel parameter is not enough.  In this case
a new sub-process should be started with tblog_sub_process.

=head1 OTHER SCRIPTS

Currently libtblog is only usable from perl scripts running on boss.
To get errors coming form non-perl programs, or scripts running on ops,
into the system the output should be parsed for errors.  For examples
of this see F<assign_wrapper2> and F<parse-ns>.

=head1 REFERENCE

=over

=cut
Kevin Atkinson's avatar
Kevin Atkinson committed
220 221 222 223 224 225

package libtblog;
use Exporter;

@ISA = "Exporter";
@EXPORT = qw (tblog tberror tberr tbwarn tbwarning tbnotice tbinfo tbdebug 
226
	      tbdie tbreport tblog_set_info tblog_set_default_cause
227
	      tblog_sub_process tblog_find_error tblog_email_error
228
	      tblog_start_capture tblog_stop_capture
229 230
	      tblog_new_process tblog_new_child_process
	      tblog_init_process tblog_exit
231
	      tblog_session tblog_lookup_error tblog_format_error
232 233
	      tblog_set_attempt tblog_inc_attempt tblog_get_attempt
	      tblog_set_cleanup tblog_get_cleanup
234
	      copy_hash
Kevin Atkinson's avatar
Kevin Atkinson committed
235
	      TBLOG_EMERG TBLOG_ALERT TBLOG_CRIT TBLOG_ERR 
236 237 238
	      TBLOG_WARNING TBLOG_NOTICE TBLOG_INFO TBLOG_DEBUG
	      SEV_DEBUG SEV_NOTICE SEV_WARNING SEV_SECONDARY
	      SEV_ERROR SEV_ADDITIONAL SEV_IMMEDIATE);
239
@EXPORT_OK = qw (dblog *SOUT *SERR);
Kevin Atkinson's avatar
Kevin Atkinson committed
240 241

# After package decl.
242
# DO NOT USE "use English" in this module
Kevin Atkinson's avatar
Kevin Atkinson committed
243 244 245
use POSIX qw(isatty setsid);
use File::Basename;
use IO::Handle;
246
use Text::Wrap;
247
use Text::Tabs;
248
use Carp;
Kevin Atkinson's avatar
Kevin Atkinson committed
249 250 251

use strict;

Kevin Atkinson's avatar
Kevin Atkinson committed
252 253
#use Data::Dumper;

Kevin Atkinson's avatar
Kevin Atkinson committed
254 255 256 257 258
#
# Testbed Support libraries
#
use lib "@prefix@/lib";
use libtestbed;
259 260 261
use libdb qw(NewTBDBHandle DBQueryN DBQueryWarnN DBQueryFatalN
	     DBQuoteSpecial $DBQUERY_MAXTRIES DBWarn DBFatal);
use libtblog_simple;
Kevin Atkinson's avatar
Kevin Atkinson committed
262

263 264
my $REAL_SCRIPTNAME = $SCRIPTNAME;
undef $SCRIPTNAME; # signal to use $ENV{TBLOG_SCRIPTNAME}
Kevin Atkinson's avatar
Kevin Atkinson committed
265

266
my $DB;
267
my $TBLOG_PID;
Kevin Atkinson's avatar
Kevin Atkinson committed
268

269
my $REVISION_STR = '$Revision: 2.24 $';
Kevin Atkinson's avatar
Kevin Atkinson committed
270

Kevin Atkinson's avatar
Kevin Atkinson committed
271 272 273 274 275 276
#
# Internal Utility Functions
#

sub check_env_def ( $ )
{
277
  croak "Environment variable \"$_[0]\" not defined" unless defined $_[0];
Kevin Atkinson's avatar
Kevin Atkinson committed
278 279 280 281 282
}

sub check_env_num ( $ )
{
    check_env_def $_[0];
283
    croak "Environment variable \"$_[0]\" not a positive integer" 
Kevin Atkinson's avatar
Kevin Atkinson committed
284 285 286 287 288 289 290
	unless $ENV{$_[0]} =~ /^[0-9]+$/;
}

sub check_env ()
{
    check_env_num 'TBLOG_LEVEL';
    check_env_num 'TBLOG_SESSION';
291
    check_env_num 'TBLOG_EXPTIDX';
Kevin Atkinson's avatar
Kevin Atkinson committed
292
    check_env_num 'TBLOG_INVOCATION';
293
    check_env_num 'TBLOG_PARENT_INVOCATION';
Kevin Atkinson's avatar
Kevin Atkinson committed
294
    check_env_num 'TBLOG_UID';
295 296
    check_env_num 'TBLOG_ATTEMPT';
    check_env_num 'TBLOG_CLEANUP';
297 298 299
    check_env_num 'TBLOG_SCRIPTNUM';
    check_env_def 'TBLOG_SCRIPTNAME';
    check_env_def 'TBLOG_BASE_SCRIPTNAME';
Kevin Atkinson's avatar
Kevin Atkinson committed
300 301
}

302 303 304
my %CAUSE_MAP = (# Don't notify testbed-ops
		 temp => 'temp', # resources temp unavailable
		 user => 'user', # user error
305
		 canceled => 'canceled', # canceled
306
		 # Notify testbed-ops
307
		 internal => 'internal', # software bug, should not happen
308 309 310 311 312 313 314 315
		 software => 'software', # software problem, like a bad image
		 hardware => 'hardware', # hardware problem
		 unknown => '');

sub normalize_cause ( $ ) {
    my $cause = $CAUSE_MAP{$_[0]};
    croak "Unknown cause \"$cause\"" unless defined $cause;
    return $cause;
Kevin Atkinson's avatar
Kevin Atkinson committed
316 317
}

318 319 320 321 322 323 324 325 326 327 328 329
sub indent ( $$ ) {
    my ($text, $prefix) = @_;
    $text =~ s/\n$//;
    my $res = '';
    foreach (split /\n/, $text) {
	$res .= $prefix;
	$res .= $_;
	$res .= "\n";
    }
    return $res;
}

330 331 332 333 334 335 336 337 338
sub add_prefix ( $$ ) {
    my ($prefix, $mesg) = @_;
    if ($mesg =~ /\n./) {
	return "$prefix:\n$mesg";
    } else {
	return "$prefix: $mesg";
    }
}

339 340 341 342 343 344 345 346 347 348 349
#
#
#


#
# Standard DBQuery functions from dblog but use private database handle
#
sub DBQuery ( $ )      {return DBQueryN($DB, $_[0]);}
sub DBQueryFatal ( $ ) {return DBQueryFatalN($DB, $_[0]);}
sub DBQueryWarn ( $ )  {return DBQueryWarnN($DB, $_[0]);}
Kevin Atkinson's avatar
Kevin Atkinson committed
350 351 352 353 354 355 356

#
# Like DBQueryFatal but also fail if the query didn't return any results
#
sub DBQuerySingleFatal ( $ )
{
    my ($query) = @_;
357
    my $query_result = DBQueryFatalN($DB, $query);
Kevin Atkinson's avatar
Kevin Atkinson committed
358 359 360 361 362 363
    DBFatal("DB Query \"$query\" didn't return any results") 
	unless $query_result->numrows > 0;
    my @row = $query_result->fetchrow_array();
    return $row[0];
}

364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384
#
# Convert the script name to a number
#
sub script_name_to_num( $ ) {
    my ($scriptname) = @_;
    my $scriptnum;

    my $query_result = DBQueryFatal
	sprintf("select script from scripts where script_name=%s",
		DBQuoteSpecial $scriptname);
    if ($query_result->num_rows > 0) {
	$scriptnum = ($query_result->fetchrow_array())[0];
    } else {
	DBQueryFatal 
	    sprintf("insert into scripts (script_name) values (%s)",
		    DBQuoteSpecial $scriptname);
	$scriptnum = DBQuerySingleFatal 'select LAST_INSERT_ID()';
    }
    return $scriptnum;
}

Kevin Atkinson's avatar
Kevin Atkinson committed
385 386 387 388
#
# Forward Decals
#

389
sub dblog ( $$@ );
Kevin Atkinson's avatar
Kevin Atkinson committed
390
sub tblog ( $@ );
391 392 393
sub tblog_new_process(@);
sub tblog_init_process(@);
sub informative_scriptname();
Kevin Atkinson's avatar
Kevin Atkinson committed
394

395 396 397

#
# tblog_init
Kevin Atkinson's avatar
Kevin Atkinson committed
398
#
399
# Called automatically when a script starts.
Kevin Atkinson's avatar
Kevin Atkinson committed
400
#
401 402
# Will: Get the priority mapping (string -> int) from the database and
# than call tblog_new_process
Kevin Atkinson's avatar
Kevin Atkinson committed
403 404
#
sub tblog_init() {
405 406 407 408 409 410 411 412 413
    # Connect to database

    $DB = NewTBDBHandle();

    # Reset default cause

    $ENV{TBLOG_CAUSE} = '';

    # ...
Kevin Atkinson's avatar
Kevin Atkinson committed
414

415 416 417 418 419 420
    tblog_new_process(if_defined($main::FAKE_SCRIPTNAME,
				 $REAL_SCRIPTNAME),
		      @ARGV);
};

#
421 422 423
# tblog_new_process CMD, ARGV
#
# Enter a new (possible fake) process, calls tblog_init_process
424 425 426 427 428 429 430 431 432 433
#
# If used to start a new fake process it is advised to make a local
# copy of %ENV using perl "local".  See tblog_sub_process for an
# explanation.
#
sub tblog_new_process(@) {
    delete $ENV{TBLOG_BASE_SCRIPTNAME};
    tblog_init_process(@_);
}

434 435 436 437 438 439 440 441 442
#
# tblog_new_child_process
#
# Enter a new child process, which is called after fork.
#
sub tblog_new_child_process() {
    tblog_init_process(undef);
}

443
#
444 445 446
# tblog_init_process CMD, ARGV
#
# Init a new process
447 448 449 450 451 452 453 454
#
# Will: (1) Get the unique ID for the script name,  (2) Creating an
# "entring" log message in the database, (3) get the session id and
# set up the environmental variables if they are not already set,
# (4) Get the invocation id, and (5) increment the level
#
# NOTE: Everything is currently stored in the %ENV hash.
#
455
sub tblog_init_process(@) {
456
    my ($script, @argv) = @_;
457
    local $DBQUERY_MAXTRIES = 3;
458

459 460 461
    # Set TBLOG_PID so that we can detect when we are a child. 
    $TBLOG_PID = $$;

462
    if (defined $script) {
463

464 465 466 467 468 469 470 471
	# Get script name
	
	$ENV{TBLOG_SCRIPTNAME} = $script;
	$ENV{TBLOG_BASE_SCRIPTNAME} = $script unless defined $ENV{TBLOG_BASE_SCRIPTNAME};
	
	# Get script number
	
	$ENV{TBLOG_SCRIPTNUM} = script_name_to_num($ENV{TBLOG_SCRIPTNAME});
472

473
	# Reset the child field
474

475
	delete $ENV{TBLOG_CHILD};
476

477 478 479 480 481 482 483 484 485
    } else {

	# We are a child process after a fork

	$ENV{TBLOG_CHILD} = 1;

	@argv = ('...');
    }
	
Kevin Atkinson's avatar
Kevin Atkinson committed
486 487 488 489
    # ...

    if (defined $ENV{'TBLOG_SESSION'}) {
	check_env();
490 491
	$ENV{TBLOG_LEVEL}++;
	$ENV{TBLOG_PARENT_INVOCATION} = $ENV{TBLOG_INVOCATION};
492 493 494
	my $id = dblog
	    ($NOTICE, {type => 'entering'},
	     'Entering "', join(' ', informative_scriptname(), @argv), '"')
495 496
	  or die;
	$ENV{TBLOG_INVOCATION} = $id;
Kevin Atkinson's avatar
Kevin Atkinson committed
497 498 499 500
	DBQueryFatal("update log set invocation=$id where seq=$id");
    } else {
	$ENV{TBLOG_SESSION} = 0;
	$ENV{TBLOG_INVOCATION} = 0;
501
	$ENV{TBLOG_PARENT_INVOCATION} = 0;
Kevin Atkinson's avatar
Kevin Atkinson committed
502
	$ENV{TBLOG_LEVEL} = 0;
503
	$ENV{TBLOG_EXPTIDX} = 0;
Kevin Atkinson's avatar
Kevin Atkinson committed
504
	$ENV{TBLOG_UID} = 0;
505 506
	$ENV{TBLOG_ATTEMPT} = 0;
	$ENV{TBLOG_CLEANUP} = 0;
507 508 509
	my $id = dblog
	    ($NOTICE, {type => 'entering'},
	     'Entering "', join(' ', informative_scriptname(), @ARGV), '"')
510
	  or die;
Kevin Atkinson's avatar
Kevin Atkinson committed
511 512 513 514 515
	# set SESSION in database
	$ENV{TBLOG_SESSION} = $id;
	$ENV{TBLOG_INVOCATION} = $id;
	DBQueryFatal("update log set session=$id,invocation=$id where seq=$id");
    }
516 517 518

}

519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537

=item tblog_sub_process NAME, ARGV

Began a sub process.  It is advised to make a local copy of %ENV using
perl "local".  This can be done with:

    local %ENV = %ENV

however due to a bug in perl 5.005_03 if "-T" is used than the above will
taint the path, instead use:

    my $old_env = \%ENV
    local %ENV;
    copy_hash %ENV, $old_env

See perlsub(1) for more info on "local"

=cut

538 539 540 541 542 543
sub tblog_sub_process($@) {
    my $name = shift;
    tblog_init_process("$ENV{TBLOG_BASE_SCRIPTNAME}-$name",
		       @_);
}

544 545 546 547 548 549 550

=item copy_hash %DEST, \%SRC

Utility function, see tblog_sub_process

=cut

551 552 553 554 555
sub copy_hash(\%$) {
    my ($new, $old) = @_;
    foreach (keys %$old) {
	$new->{$_} = $old->{$_};
    }
Kevin Atkinson's avatar
Kevin Atkinson committed
556 557
}

558 559 560 561 562 563 564
=item tblog_set_info PID, EID, UID

Sets info in the database which can't be derived automatically with
init.  Needs to be called at least once during a session.

=cut

565
sub tblog_set_info ( $$$ )
Kevin Atkinson's avatar
Kevin Atkinson committed
566 567 568
{
    check_env();
    my ($pid, $eid, $uid) = @_;
569
    local $DBQUERY_MAXTRIES = 3;
570
    $ENV{'TBLOG_EXPTIDX'} = 
Kevin Atkinson's avatar
Kevin Atkinson committed
571 572 573
	DBQuerySingleFatal("select idx from experiments where pid='$pid' and eid='$eid'");
    $ENV{'TBLOG_UID'} = $uid;
    DBQueryFatal
574
	sprintf('replace into session_info (session, exptidx, uid) values(%d,%d,%d)',
575
		$ENV{TBLOG_SESSION}, $ENV{TBLOG_EXPTIDX}, $ENV{TBLOG_UID});
Kevin Atkinson's avatar
Kevin Atkinson committed
576 577
}

578 579 580 581 582 583
=item tblog_set_default_cause CAUSE

Set the default cause.

=cut

584 585 586 587 588 589
sub tblog_set_default_cause ( $ )
{
    check_env();
    $ENV{TBLOG_CAUSE} = $_[0];
}

590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649
=item tblog_set_attempt NUM

Set the attempt number to NUM

=cut

sub tblog_set_attempt ( $ )
{
    check_env();
    $ENV{TBLOG_ATTEMPT} = $_[0];
}

=item tblog_inc_attempt

Increment the attempt number

=cut

sub tblog_inc_attempt ()
{
    check_env();
    $ENV{TBLOG_ATTEMPT}++;
}

=item tblog_get_attempt

Get the attempt number

=cut

sub tblog_get_attempt ()
{
    check_env();
    return $ENV{TBLOG_ATTEMPT};
}

=item tblog_set_cleanup BOOL

Set the cleanup bit to BOOL.

=cut

sub tblog_set_cleanup ( $ )
{
    check_env();
    $ENV{TBLOG_CLEANUP} = $_[0];
}

=item tblog_get_cleanup

Get the value of the cleanup bit.

=cut

sub tblog_get_cleanup ()
{
    check_env();
    return $ENV{TBLOG_CLEANUP};
}

650 651 652 653 654 655
=item tblog_exit

Exits a script or sub-process.  Generally called automatically when a
script exists but may be called explistly when ending a sub-process.

=cut
656

Kevin Atkinson's avatar
Kevin Atkinson committed
657 658 659
sub tblog_exit() {
    return unless defined $ENV{'TBLOG_SESSION'};
    check_env();
660 661
    dblog($INFO, {type=>'exiting'}, "Leaving \"", informative_scriptname(), 
	  " ...\"");
Kevin Atkinson's avatar
Kevin Atkinson committed
662 663 664
}

#
665 666 667
# informative_scriptname()
#
sub informative_scriptname() {
668
    my $name;
669
    if ($ENV{TBLOG_BASE_SCRIPTNAME} eq $REAL_SCRIPTNAME) {
670
	$name = $ENV{TBLOG_SCRIPTNAME};
671
    } else {
672 673 674 675
	$name = "$ENV{TBLOG_SCRIPTNAME} (but really $REAL_SCRIPTNAME)";
    }
    if ($ENV{TBLOG_CHILD}) {
	$name = "child of $name";
676
    }
677
    return $name;
678 679 680
}


681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721
=item tblog PRIORITY, MESG, ...

=item tblog PRIORITY, {PARM=>VALUE,...}, MESG, ...

The main log function.  Logs a message to the database and print
the message to STDERR with an approate prefix depending on the
severity of the error.  If more than one string is given for the
message than they will concatenated.  If the env. var. TBLOG_OFF
is set to a true value than nothing will be written to the
database, but the message will still be written to STDERR.

Useful parms: sublevel, cause, type

=item tberror [{PARM=>VALUE,...},] MESG, ...

=item tberr [{PARM=>VALUE,...},] MESG, ...

=item tbwarn [{PARM=>VALUE,...},] MESG, ...

=item tbwarning [{PARM=>VALUE,...},] MESG, ...

=item tbnotice [{PARM=>VALUE,...},] MESG, ...

=item tbinfo [{PARM=>VALUE,...},] MESG, ...

=item tbdebug [{PARM=>VALUE,...},] MESG, ...

Usefull alias functions.  Will call tblog with the appropriate priority.

=item tbdie [{PARM=>VALUE,...},] MESG, ...

Log the message to the database as an error and than die.  An
optional set of paramaters may be specified as the first argument.
not exactly like die as the message must be specified.

=cut

#
# NOTE: tblog (and friends) defined in libtblog_simple
#

722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747
#
# new_seq_num
#
my $sent_cur_log_seq_error_mail = 0;
sub new_seq_num (;$) {
    my ($failure_action) = @_;
    $failure_action = sub {DBFatal("DB Query failed")} unless defined $failure_action;
    my $result;
    $result = DBQuery("UPDATE emulab_indicies SET idx=LAST_INSERT_ID(idx+1) WHERE name = 'cur_log_seq'");
    if (!$result) {
	&$failure_action; 
	return;
    } elsif ($result->affectedrows <= 0) {
	my $subject = "Row \"cur_log_seq\" does not exist in emulab_indicies.";
	my $message = "$subject\n";
	$message .= "Please see \"database-migrate.txt\".";
	SENDMAIL(TB_OPSEMAIL, "DBError - $subject", $message)
	    unless $sent_cur_log_seq_error_mail;
	$sent_cur_log_seq_error_mail = 1;
	die "$subject\n";
	return;
    }
    my $seq = $result->insertid();
    return $seq;
}

748 749
#
# dblog(priority, {parm=>value,...}, mesg, ...)
Kevin Atkinson's avatar
Kevin Atkinson committed
750 751
#   Internal function.  Logs a message to the database.  Doesn't print
#   anything. Will not die, instead return 0 on error, with the error
752
#   message in $@.  Otherwise will return the seq number.
753
#   Valid parms: sublevel, cause, type
Kevin Atkinson's avatar
Kevin Atkinson committed
754
#
755 756
use vars '$in_dblog';
$in_dblog = 0; # Used to avoid an infinite recursion when
Kevin Atkinson's avatar
Kevin Atkinson committed
757 758 759
	       # DBQueryFatal fails as a log entry is made to
	       # record the failure, which will than likely cause
	       # another failure and so on
760
sub dblog_real ( $$@ ) {
761
    my ($priority, $parms, @mesg) = @_;
Kevin Atkinson's avatar
Kevin Atkinson committed
762
    my $mesg = join('',@mesg);
763
    my $seq;
764
    #print SERR "===$priority $parms @mesg\n";
Kevin Atkinson's avatar
Kevin Atkinson committed
765 766 767 768
    return if $ENV{TBLOG_OFF} || $in_dblog;
    $in_dblog = 1;
    eval {
	check_env();
769

Kevin Atkinson's avatar
Kevin Atkinson committed
770 771 772 773 774 775 776 777 778
	if ($TBLOG_PID != $$) {
	    # Do it here to avoid any possiblty of infinite recursion 
	    # since we are about to set $in_dblog to 0. ...
	    $TBLOG_PID = $$;
	    # $in_dblog is reset locally since tblog_new_child_process 
	    # will call dblog in order to log entering the child process
	    local $in_dblog = 0;
	    tblog_new_child_process();
	}
779

780 781 782 783 784 785
	my $cause;
	$cause = normalize_cause($parms->{cause}) 
	  if defined $parms->{cause};
	$cause = $priority <= $WARNING ? $ENV{TBLOG_CAUSE} : '' 
	  unless defined $cause;

786 787
	my $failure_action;
	local $DBQUERY_MAXTRIES;
788
	if ($priority <= $NOTICE) {
789 790 791 792
	    $DBQUERY_MAXTRIES = 3;
	    $failure_action = sub {
		DBFatal("Could not log entry to DB, tblog_find_error may report incorrect results");
	    };
793
	} else {
794 795
	    $DBQUERY_MAXTRIES = 1;
	    $failure_action = sub {
Kevin Atkinson's avatar
Kevin Atkinson committed
796
		DBFatal("Could not log entry to DB", 1);
797
	    };
Kevin Atkinson's avatar
Kevin Atkinson committed
798
	}
799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818
	
	$seq = new_seq_num($failure_action);
	my $result = DBQuery
	    (sprintf('insert into log (seq,stamp,session,attempt,cleanup,parent,invocation,script,level,sublevel,priority,inferred,cause,type,mesg) '. 
		     'VALUES (%d,UNIX_TIMESTAMP(now()),%d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%s,%s,%s)',
		     $seq,
		     $ENV{TBLOG_SESSION}, 
		     $ENV{TBLOG_ATTEMPT},
		     $ENV{TBLOG_CLEANUP},
		     $ENV{TBLOG_PARENT_INVOCATION}, 
		     $ENV{TBLOG_INVOCATION},
		     $ENV{TBLOG_SCRIPTNUM}, 
		     $ENV{TBLOG_LEVEL},
		     if_defined($parms->{sublevel}, 0),
		     $priority,
		     if_defined($parms->{inferred}, 0),
		     DBQuoteSpecial $cause,
		     DBQuoteSpecial if_defined($parms->{type}, 'normal'),
		     DBQuoteSpecial $mesg));
	&$failure_action unless $result;
Kevin Atkinson's avatar
Kevin Atkinson committed
819 820
    };
    $in_dblog = 0;
821
    # Print a warning on failure but don't log the results to the database
822
    # as that is likely to fail also
823 824
    print SERR format_message(scriptname(), $WARNING, 
			      add_prefix("dblog failed", $@)) if $@;
Kevin Atkinson's avatar
Kevin Atkinson committed
825
    return 0 if $@;
826 827 828 829 830 831
    if ($parms->{error}) {
	tbreport($parms->{severity},
		 {seq => $seq, script => $parms->{real_script}},
		 @{$parms->{error}});
    }
    return $seq;
Kevin Atkinson's avatar
Kevin Atkinson committed
832
}
833 834 835
{
  local $^W = 0;
  *dblog = \&dblog_real;
Kevin Atkinson's avatar
Kevin Atkinson committed
836 837
}

838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925
use constant CONTEXT_MAP => {
    'assign_type_precheck' =>
	['report_context', 'vc0',      'i0',  'i1', 'i2', 'vc1'],
	#                  vtype, requested, slots,  max, round
    'assign_mapping_precheck' =>
	['report_context', 'vc0', 'vc1', 'vc2',      'i0',  'i1'],
	#                  vnode, class,  type, requested, count
    'assign_fixed_node' =>
	['report_context', 'vc0', 'vc1', 'vc2'],
	#                  class, vnode, pnode
    'over_disk_quota' =>
	['report_context',   'vc0'],
	#                  control
    'update_aborted' =>
	['report_context',  'vc0'],
	#                  result
    'set_experiment_state_failed' =>
	['report_context', 'vc0'],
	#                  state
    'archive_op_failed' =>
	['report_context',     'vc0', 'vc1', 'vc2'],
	#                  operation,  type,   dir
    'modify_firewall_not_allowed' =>
	['report_context',     'vc0', 'vc1'],
	#                  operation, state
    'os_node_reset_failed' =>
	['report_context', 'vc0'],
	#                   type
    'assign_wrapper_failed' =>
	['report_context',   'i0'],
	#                  status
    'invalidate_bootblock_failed' =>
	['report_context', 'vc0'],
	#                   node
    'run_command_failed' =>
	['report_context',   'vc0', 'vc1'],
	#                  command,  node
    'node_lacks_linkdelay_support' =>
	['report_context', 'vc0', 'vc1',  'vc2'],
	#                   node,   lan, osname
    'invalid_os' =>
	['report_context', 'vc0',  'vc1', 'vc2'],
	#                   type, osname, pname
    'copy_ns_file_failed' =>
	['report_context', 'vc0', 'vc1'],
	#                    src,  dest
    'bad_data' =>
	['report_context', 'vc0', 'vc1'],
	#                  field,  data
    'bogus_ns_file' =>
	['report_context', 'vc0'],
	#                   path
    'disallowed_directory' =>
	['report_context', 'vc0'],
	#                   path
    'node_boot_failed' =>
	['report_context', 'vc0', 'vc1', 'vc2'],
	#                   node,  type,  osid
    'node_load_failed' =>
	['report_context', 'vc0', 'vc1', 'vc2'],
	#                   node,  type,  osid
    'file_not_found' =>
	['report_context', 'vc0', 'vc1', 'vc2'],
	#                   type,  path,  node
    'invalid_variable' =>
	['report_context', 'vc0', 'vc1'],
	#                   type,   var
    'create_vlan_failed' =>
	['report_context', 'vc0'],
	#                   vlan
    'get_port_status_failed' =>
	['report_context', 'vc0'],
	#                   port
    'device_not_in_stack' =>
	['report_context',  'vc0'],
	#                  device
    'invalid_switch_stack' =>
	['report_context', 'vc0'],
	#                  stack

    'assign_violation' => [
	'report_assign_violation', 
	'unassigned', 'pnode_load', 'no_connect', 'link_users',
	'bandwidth', 'desires', 'vclass', 'delay',
	'trivial_mix', 'subnodes', 'max_types', 'endpoints',
    ],
};

926 927 928 929 930 931
sub tbreport( $@ ) {
    my ($severity) = shift;
    my $parms = {};
    $parms = shift if ref $_[0] eq 'HASH';
    my ($error_type, @context) = @_;
    my $seq = $parms->{seq};
932

933 934 935
    eval {
	local $DBQUERY_MAXTRIES = 3;
	check_env();
936

937 938 939 940 941 942
	my $script_num = $ENV{TBLOG_SCRIPTNUM};
	$script_num = script_name_to_num($parms->{script})
	    if defined $parms->{script};

	$seq = new_seq_num() unless defined $seq;
	
943 944 945
	my $session    = $ENV{TBLOG_SESSION};
	my $invocation = $ENV{TBLOG_INVOCATION};
	my $attempt    = $ENV{TBLOG_ATTEMPT};
946 947 948

	croak("error_type must be _0-9A-Za-z") unless $error_type =~ /^\w+$/;

949
	my $sql = sprintf("insert into report_error (seq, stamp, session, invocation, attempt, severity, script, error_type) values(%d, UNIX_TIMESTAMP(now()), %d, %d, %d, %d, %d, %s)",
Kevin Atkinson's avatar
Kevin Atkinson committed
950
			  $seq, $session, $invocation, $attempt, $severity, $script_num, DBQuoteSpecial($error_type));
951

952 953 954 955 956 957 958 959 960 961 962 963 964 965
	DBQueryFatal($sql);

	if (@context > 0) {
	    croak("$error_type has no associated entry in CONTEXT_MAP")
		unless defined(CONTEXT_MAP->{$error_type}); 

	    my ($table, @columns) = @{CONTEXT_MAP->{$error_type}};

	    my $sql = sprintf("insert into %s (seq, " . join(', ', @columns) .
			      ") values(%d" . ", %s" x @columns . ")",
			      $table, $seq, map(DBQuoteSpecial($_), @context));

	    DBQueryFatal($sql);
	}
966 967 968 969 970 971 972 973 974 975 976
    };
    # Print a warning on failure but don't log the results to the database
    # as that is likely to fail also
    print SERR format_message(scriptname(), $WARNING, 
			      add_prefix("tbreport failed", $@)) if $@;
    return 0 if $@;
    return $seq;
}

=item tblog_find_error
    
977 978
Attempts to find the relevant error.

979 980
Will act in a way that is safe to use in an END block, that is (1)
never die, (2) don't modify the exit code. 
981

982 983
Will, also, print the results to and some additional info to STDERR
for diagnosis purposed.
984

985 986
The results will be stored the "errors" table and also returned in a
hash with the following..
987

988 989 990 991 992 993 994 995
    mesg: text of the error
    cause: 
    confidence:
    script:
    session: 
    exptidx:
    err: the text of any internal errors in tblog_find_error,
      otherwise undefined
996 997 998

To retrieve the results from the database:

999
    SELECT ... WHERE session = <Session ID> FROM errors
1000 1001 1002 1003 1004 1005 1006 1007

The relevant errors are also flagged using "relevant" flag:

    SELECT ... WHERE session = <Session ID>
                 AND relevant != 0 form log

=cut

Kevin Atkinson's avatar
Kevin Atkinson committed
1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033
sub tblog_determine_single_error ( ;$$ );
sub tblog_store_error ($);
sub tblog_dump_error ($);

sub tblog_find_error(;$)
{
    my ($options) = (@_);
    $options = {} unless defined $options;

    my $saved_exitcode = $?;
    my $res;

    eval {
       $res = tblog_determine_single_error();

       my @to_skip;
       my $r = $res;
       while ($r->{cause} eq 'canceled') {
	   push @to_skip, @{$r->{related}};
	   my $to_skip = 'seq NOT IN ('.join(',',@to_skip).')';
	   $r = tblog_determine_single_error(1, $to_skip);
       }
       $res->{other_error} = $r unless $r eq $res;
       my $other_error = $res->{other_error};

       tblog_store_error($res);
Kevin Atkinson's avatar
Kevin Atkinson committed
1034

Kevin Atkinson's avatar
Kevin Atkinson committed
1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055
       tblog_dump_error($res) unless $options->{quiet};
    };

    if ($@) {
       $res->{err} = $@;
       eval {SENDMAIL(TB_OPSEMAIL, "tblog_find_error failed",
		      "Experiment: $ENV{TBLOG_EXPTIDX}\n".
		      "User: $ENV{TBLOG_UID}\n".
		      "Session: $ENV{TBLOG_SESSION}\n".
		      "Script: $ENV{TBLOG_SCRIPTNAME}\n".
		      "\n".
		      "$res->{err}\n")};

       eval {tblog $WARNING, add_prefix("tblog_find_error failed", $res->{err})};
    }

    $? = $saved_exitcode;
    return $res;
}

sub tblog_determine_single_error ( ;$$ ) {
Kevin Atkinson's avatar
Kevin Atkinson committed
1056 1057 1058 1059 1060 1061 1062 1063

    my ($rank, $filter) = @_;
    $rank = 0   unless defined $rank;
    if (defined $filter && $filter) {
	$filter = "AND ($filter)";
    } else {
	$filter = '';
    }
Kevin Atkinson's avatar
Kevin Atkinson committed
1064
    
1065 1066 1067 1068
    my $session = 0;
    my $error = '';
    my $script = '';
    my $cause = '';
1069
    my $cause_desc;
1070
    my $confidence = 0.0;
1071

1072
    local $DBQUERY_MAXTRIES = 3;
Kevin Atkinson's avatar
Kevin Atkinson committed
1073

Kevin Atkinson's avatar
Kevin Atkinson committed
1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094
    check_env();
    $session = $ENV{TBLOG_SESSION};

    my ($tblog_revision) = $REVISION_STR =~ /Revision: ([^ \$]+)/ 
	or die "bad REVISON string";

    #
    # Build the Tree
    #
    # Tree data structure:
    # invocation = {invocation => INT, parent => INT, 
    #           [{seq => int ...} || {seq => int, child => invox}]}
    #    
    
    my $root = {invocation => 0, log => []};
    my %lookup = (0 => $root);
    my @log;
    
    my $query_result = DBQueryFatal "select seq,parent,invocation,sublevel,priority,mesg,cause,script_name,type,inferred from log natural join scripts where session = $session and priority <= $NOTICE and attempt <= 1 and not cleanup $filter order by seq";
    
    for (my $i = 0; $i < $query_result->num_rows; $i++) {
Kevin Atkinson's avatar
Kevin Atkinson committed
1095
	
Kevin Atkinson's avatar
Kevin Atkinson committed
1096 1097
	my ($seq, $parent, $invocation, $sublevel, $priority, $mesg, $cause, $script, $type, $inferred) 
	    = $query_result->fetchrow;
Kevin Atkinson's avatar
Kevin Atkinson committed
1098
	
Kevin Atkinson's avatar
Kevin Atkinson committed
1099
	if (not exists $lookup{$invocation}) {
Kevin Atkinson's avatar
Kevin Atkinson committed
1100
	    
Kevin Atkinson's avatar
Kevin Atkinson committed
1101 1102 1103 1104 1105 1106 1107 1108
	    my $p = $lookup{$parent};
	    die "Parent Doesn't Exists!" unless defined $p;
	    $lookup{$invocation} = {invocation => $invocation, 
				    parent => $parent, 
				    script=>$script, 
				    log => []};
	    push @{$p->{log}}, {seq => $invocation, 
				child => $lookup{$invocation}};
Kevin Atkinson's avatar
Kevin Atkinson committed
1109 1110
	    
	}
Kevin Atkinson's avatar
Kevin Atkinson committed
1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121
	
	push @{$lookup{$invocation}{log}}, {seq => $seq, 
					    invocation=>$invocation, 
					    sublevel=> $sublevel,
					    priority => $priority,
					    type => $type,
					    cause => $cause,
					    inferred => $inferred,
					    mesg => $mesg};
	
    }
1122

Kevin Atkinson's avatar
Kevin Atkinson committed
1123 1124
    my $handle_sublevels;
    $handle_sublevels = sub {
1125

Kevin Atkinson's avatar
Kevin Atkinson committed
1126
	my ($tree) = @_;
1127

Kevin Atkinson's avatar
Kevin Atkinson committed
1128 1129
	my $log = $tree->{log};
	return unless defined $log;
1130

Kevin Atkinson's avatar
Kevin Atkinson committed
1131 1132 1133 1134 1135 1136
	# normalize sublevels
	my $min_sublevel = 200;
	foreach (@$log) {
	    $min_sublevel = $_->{sublevel} 
	    if exists $_->{sublevel} && $_->{sublevel} < $min_sublevel;
	}
1137

Kevin Atkinson's avatar
Kevin Atkinson committed
1138
	for (my $i = 0; $i < @$log; $i++) {
1139

Kevin Atkinson's avatar
Kevin Atkinson committed
1140
	    local $_ = $log->[$i];
1141

Kevin Atkinson's avatar
Kevin Atkinson committed
1142
	    if (exists $_->{sublevel} && $_->{sublevel} > $min_sublevel) {
1143

Kevin Atkinson's avatar
Kevin Atkinson committed
1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156
		my @sublog = ($_);
		my $j = $i + 1;
		while ($j < @$log && 
		       exists $log->[$j]->{sublevel} &&
		       $log->[$j]->{sublevel} > $min_sublevel) {
		    push @sublog, $log->[$j];
		    $j++;
		}
		my $repl = {
		    child => {log => [@sublog]}
		};
		splice(@$log, $i, $j - $i, $repl);
		$handle_sublevels->($repl->{child})
1157 1158 1159 1160 1161 1162

		} elsif (exists $_->{child}) {

		    $handle_sublevels->($_->{child});

		}
Kevin Atkinson's avatar
Kevin Atkinson committed
1163
	}
1164

Kevin Atkinson's avatar
Kevin Atkinson committed
1165
    };
1166

Kevin Atkinson's avatar
Kevin Atkinson committed
1167
    $handle_sublevels->($root);
1168 1169


Kevin Atkinson's avatar
Kevin Atkinson committed
1170 1171 1172 1173 1174 1175 1176
    #
    # Now find the relevant errors.
    #
    
    my @related_errors;
    my $find_relevant;
    $find_relevant = sub {
1177

Kevin Atkinson's avatar
Kevin Atkinson committed
1178
	my ($tree) = @_;
1179

Kevin Atkinson's avatar
Kevin Atkinson committed
1180 1181
	my @posib_errs;
	my @extra_info;
1182

Kevin Atkinson's avatar
Kevin Atkinson committed
1183
	foreach (reverse @{$tree->{log}}) {
1184

Kevin Atkinson's avatar
Kevin Atkinson committed
1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195
	    if (exists $_->{child}) {
		
		my @errs = $find_relevant->($_->{child});
		if (@errs) {
		    return (reverse(@extra_info),@errs);
		} 
		
	    } elsif ($_->{priority} <= $ERR) {
		
		push @posib_errs, $_;
		push @related_errors, $_;
Kevin Atkinson's avatar
Kevin Atkinson committed
1196

Kevin Atkinson's avatar
Kevin Atkinson committed
1197 1198 1199 1200 1201 1202 1203 1204
		if ($_->{type} eq 'summary') {
		    last;
		} elsif ($_->{type} eq 'extra') {
		    push @extra_info, $_;
		}
		
	    }
	    
1205 1206
	}

Kevin Atkinson's avatar
Kevin Atkinson committed
1207 1208
	return reverse @posib_errs;
    };
Kevin Atkinson's avatar
Kevin Atkinson committed
1209

Kevin Atkinson's avatar
Kevin Atkinson committed
1210 1211 1212 1213 1214 1215 1216 1217
    my @relevant = $find_relevant->($root);
    #
    # Get the most relevent script
    #
    $script = '';
    if (@relevant) {
	$script = $lookup{$relevant[0]->{invocation}}->{script};
    }
Kevin Atkinson's avatar
Kevin Atkinson committed
1218

Kevin Atkinson's avatar
Kevin Atkinson committed
1219 1220 1221 1222 1223
    #
    # Figure out the cause;
    #
    my $type = '';
    my $inferred = -1;
Kevin Atkinson's avatar
Kevin Atkinson committed
1224

Kevin Atkinson's avatar
Kevin Atkinson committed
1225 1226 1227 1228 1229 1230 1231 1232 1233
    # Assumes that 'extra' info will come first
    
    foreach (@relevant) {
	if (!$cause && $_->{cause}) {
	    $cause = $_->{cause};
	    $type = $_->{type};
	} elsif ($_->{cause} && $_->{type} ne 'extra' &&
		 $_->{type} eq $type && $_->{cause} ne $cause) {
	    $cause = 'unknown';
Kevin Atkinson's avatar
Kevin Atkinson committed
1234
	}
Kevin Atkinson's avatar
Kevin Atkinson committed
1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247
	$inferred = $_->{inferred} if $inferred < $_->{inferred};
    }
    $cause = 'unknown' unless $cause;

    #
    # Determine need_more_info from error type
    #
    my $need_more_info = -1;
    foreach (@relevant) {
	if ($_->{type} eq 'summary' || $_->{type} eq 'primary') {
	    $need_more_info = 0;
	} elsif ($_->{type} eq 'secondary') {
	    $need_more_info = 1 if $need_more_info == -1;
Kevin Atkinson's avatar
Kevin Atkinson committed
1248
	}
Kevin Atkinson's avatar
Kevin Atkinson committed
1249
    }
1250

Kevin Atkinson's avatar
Kevin Atkinson committed
1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267
    #
    # From script determine confidence 
    #
    $confidence = 0.5;
    if ($script =~ /^assign/ && ($cause eq 'temp' || $cause eq 'user')) {
	$confidence = 0.9;
    } elsif ($script =~ /^parse/ && $cause eq 'user') {
	$confidence = 0.9;
    } elsif ($script =~ /^os_setup/ && $type eq 'summary') {
	$confidence = 0.9;
    } elsif ($inferred == 0) {
	$confidence = 0.7;
    } elsif ($inferred == 1) {
	$confidence = 0.6;
    } elsif ($cause ne 'unknown') {
	$confidence = 0.6;
    }
Kevin Atkinson's avatar
Kevin Atkinson committed
1268

Kevin Atkinson's avatar
Kevin Atkinson committed
1269

Kevin Atkinson's avatar
Kevin Atkinson committed
1270 1271 1272 1273 1274 1275 1276 1277 1278 1279
    # 
    # Finally print/store the relevant errors
    #
    
    local $Text::Wrap::columns = 72;
    
    my $prev;
    foreach (@relevant) {
	# avoid printing the exact same error twice
	next if (defined $prev && $prev->{mesg} eq $_->{mesg});
1280

Kevin Atkinson's avatar
Kevin Atkinson committed
1281
	$error .= "\n" if defined $prev;
Kevin Atkinson's avatar
Kevin Atkinson committed
1282

Kevin Atkinson's avatar
Kevin Atkinson committed
1283 1284 1285 1286 1287
	if ($_->{mesg} !~ /\n/) {
	    $error .= expand(wrap('','', "$_->{mesg}\n"));
	} else {
	    # if multiline don't reformat
	    $error .= "$_->{mesg}\n";
Kevin Atkinson's avatar
Kevin Atkinson committed
1288 1289
	}

Kevin Atkinson's avatar
Kevin Atkinson committed
1290 1291 1292 1293 1294 1295 1296 1297 1298
	$error .= "...\n" if $need_more_info == 1;

	$prev = $_;
    }
    if (length $error == 0) {
	if ($rank == 0) {
	    $error = "No clue as to what went wrong!\n";
	} else {
	    $error = "";
Kevin Atkinson's avatar
Kevin Atkinson committed
1299 1300
	}
    }
Kevin Atkinson's avatar
Kevin Atkinson committed
1301
    chop $error;
Kevin Atkinson's avatar
Kevin Atkinson committed
1302

Kevin Atkinson's avatar
Kevin Atkinson committed
1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314
    $cause_desc = DBQuerySingleFatal
	"select cause_desc from causes where cause = '$cause'";

    my $script_num =
      ($script ne ''
       ? (DBQuerySingleFatal
	  sprintf ("select script from scripts where script_name=%s",
		   DBQuoteSpecial $script))
       : 0);
    
    return {source=>'find_error', session=>$session, rank=>$rank,
	    exptidx=>$ENV{TBLOG_EXPTIDX},
1315
	    mesg=>$error, cause=>$cause, cause_desc=>$cause_desc,
Kevin Atkinson's avatar
Kevin Atkinson committed
1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375
	    confidence=>$confidence, inferred=>$inferred, 
	    need_more_info=>$need_more_info,
	    script=>$script, script_num=>$script_num,
	    relevant=>[map {$_->{seq}} @relevant],
	    related=>[map {$_->{seq}} @related_errors],
	    tblog_revision => $tblog_revision};
}

sub tblog_store_error ($)
{
    my ($d) = @_;

    croak "Data must be from tblog_find_error not tblog_lookup_error.\n" 
	unless $d->{source} eq 'find_error';

    DBQueryFatal
       sprintf("delete from errors where session = %d", $d->{session});

    DBQueryFatal
	sprintf("insert into errors ".
		"(stamp, exptidx, session, rank, cause, confidence, inferred, need_more_info, script, mesg, tblog_revision) ".
		"values ".
		"(UNIX_TIMESTAMP(now()), %d, %d, %d, %s, %f, %d, %s, %d, %s, %s)",
		$d->{exptidx},
		$d->{session}, $d->{rank}, DBQuoteSpecial($d->{cause}),
		$d->{confidence}, $d->{inferred}, 
		($d->{need_more_info} == -1 ? 'NULL' : $d->{need_more_info}),
		$d->{script_num},
		DBQuoteSpecial($d->{mesg}),
		DBQuoteSpecial($d->{tblog_revision}));
    
    DBQueryFatal
	sprintf("update log set relevant=1 where seq in (%s)",
		join(',', @{$d->{relevant}}))         if @{$d->{relevant}};
}

sub tblog_dump_error ($)
{
    my ($d) = @_;
    print SERR "**** Experimental information, please ignore ****\n";
    print SERR "Session ID = $d->{session}\n";
    print SERR "Likely Cause of the Problem:\n";
    print SERR indent($d->{mesg}, '  ');
    print SERR "Cause: $d->{cause}\n";
    print SERR "Confidence: $d->{confidence}\n";
    print SERR "Script: $d->{script}\n";
    if ($d->{cause} eq 'canceled') {
       my $other_error = $d->{other_error};
       print SERR "\n";
       if (length($other_error->{mesg}) == 0) {
	   print SERR "No other error found.\n";
       } else {
	   print SERR "Possible Error Before Cancellation:\n";
	   print SERR indent($other_error->{mesg}, '  ');
	   print SERR "Cause: $other_error->{cause}\n";
	   print SERR "Confidence: $other_error->{confidence}\n";
	   print SERR "Script: $other_error->{script}\n";
       }
    }
    print SERR "**** End experimental information ****\n";
1376 1377
}

1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397
=item tblog_lookup_error [SESSION]

Attempts to retrive the error for SESSION from the database.  Returns
undef if it could't find anything.

=cut

sub tblog_lookup_error ( ;$ ) {

    my ($session) = @_;
    $session = $ENV{TBLOG_SESSION} unless defined $session;

    local $DBQUERY_MAXTRIES = 3;

    my $saved_exitcode = $?;

    my $query_result = DBQueryFatal
	("select session, exptidx, mesg, e.cause, cause_desc, confidence, script_name as script".
	 "  from errors as e, scripts as s, causes as c".
	 "  where e.script = s.script and e.cause = c.cause ".
Kevin Atkinson's avatar
Kevin Atkinson committed
1398
	 "    and session = $session and rank = 0");
1399 1400 1401 1402 1403 1404 1405 1406 1407 1408

    $? = $saved_exitcode;

    if ($query_result->numrows > 0) {
	return $query_result->fetchrow_hashref;
    } else {
	return undef;
    }
}    

Kevin Atkinson's avatar
Kevin Atkinson committed
1409
=item tblog_email_error DATA, TO, WHAT, EIDPID, FROM, HEADERS, TBOBS_HEADERS, PREFIX, @FILES
1410 1411 1412 1413 1414 1415 1416 1417 1418

Email the user and possible testbed-ops the error.

DATA is the object returned form tblog_find_error.  It is OK if it is undefined.

WHAT is something like "Swap In Failure", "Swap Out Failure", etc.

=cut

Kevin Atkinson's avatar
Kevin Atkinson committed
1419
sub tblog_email_error ( $$$$$$$$@ ) {
Kevin Atkinson's avatar
Kevin Atkinson committed
1420 1421 1422 1423 1424

    my ($d, $to, $what, $pideid, $from, $headers, $tbops_headers, $prefix, @files) = @_;

    carp "TBOBS_HEADERS must be a complete header!  Expect SENDMAIL to fail."  
	unless $tbops_headers =~ /:/;
1425

1426
    my $threshold = 0.55;
1427 1428

    unless ($d->{confidence} > $threshold 
1429 1430
	    && ($d->{cause} eq 'temp' || $d->{cause} eq 'user' 
		|| $d->{cause} eq 'canceled'))
1431 1432
    {
	$headers .= "\n" if defined $headers && length($headers) > 0;
Kevin Atkinson's avatar
Kevin Atkinson committed
1433
	$headers .= "$tbops_headers";
1434
	$headers .= "\nX-NetBed-Cc: testbed-ops";
1435
    } else {
1436
	$from = "Testbed Ops <@TBOPSEMAIL@>";
1437
	$headers .= "\n" if defined $headers && length($headers) > 0;
1438 1439
	$headers .= "Bcc: @TBERRORSEMAIL@";
	$headers .= "\nX-NetBed-Cc: testbed-errors";
1440 1441
    }

Kevin Atkinson's avatar
Kevin Atkinson committed
1442 1443
    my $which = $pideid ? ": $pideid" : "";
    my $subject = "$what$which";
1444 1445

    if ($d->{confidence} > $threshold && $d->{cause} ne 'unknown') {
Kevin Atkinson's avatar
Kevin Atkinson committed
1446
	$subject = "$what: $d->{cause_desc}$which";
1447 1448 1449 1450
    }
    
    my $body;

Kevin Atkinson's avatar
Kevin Atkinson committed
1451 1452
    $body .= "$prefix\n\n" if $prefix;

1453 1454 1455 1456 1457 1458 1459 1460
    if ($d->{confidence} > $threshold) {

	$body .= $d->{mesg};
	$body .= "\n";

	if ($d->{cause} eq 'temp') {
	    $body .= "\n";
	    $body .= "Please take a look at this Knowledge Base entry for hints on what to do:\n\n";
1461
	    $body .= "  @TBDOCBASE@/kb-show.php3?xref_tag=no_resources\n";
1462 1463 1464 1465 1466 1467 1468
	}

	$headers .= "\nX-NetBed-Cause: $d->{cause}";

    } else {

	$body = 
Kevin Atkinson's avatar
Kevin Atkinson committed
1469
	    ("Please look at the log below to see what happened.");
1470 1471 1472
    }

    SENDMAIL($to, $subject, $body, $from, $headers, @files);
Kevin Atkinson's avatar
Kevin Atkinson committed
1473 1474
}

1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504
=item tblog_format_error DATA

Format the information in DATA in a format suitable for printing.

DATA is the object returned form tblog_find_error.  It is OK if it is
undefined.

=cut

sub tblog_format_error( $ ) 
{
    my ($d) = @_;
    
    unless (defined $d) {
	$d = {mesg => "No clue as to what went wrong!",
	      cause => 'unknown', cause_desc => 'Cause Unknown',
	      confidence => 0}
    }
	
    my $mesg = '';
    $mesg .= "ERROR:: $d->{cause_desc}\n";
    $mesg .= "\n";
    $mesg .= "$d->{mesg}\n";
    $mesg .= "\n";
    $mesg .= "Cause: $d->{cause}\n";
    $mesg .= "Confidence: $d->{confidence}\n";

    return $mesg;
}

Kevin Atkinson's avatar
Kevin Atkinson committed
1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515
#
# Perl Tie Methods, see perltie(1)
#

sub TIEHANDLE {
    my ($classname, $glob) = @_;
    bless \$glob, $classname;
}

sub PRINT {
    my $this = shift;
1516 1517
    if (IO::Handle::opened($$this)) { # $$this->opened doesn't work 
                                      # with perl 5.005
Kevin Atkinson's avatar
Kevin Atkinson committed
1518 1519
	print {$$this} @_;
    } else {
1520
	carp "print() on unopened filehandle";
Kevin Atkinson's avatar
Kevin Atkinson committed
1521
    }
Kevin Atkinson's avatar
Kevin Atkinson committed
1522
    local $_ = join '', @_; # NOTE: This doesn't take into account "$,"
Kevin Atkinson's avatar
Kevin Atkinson committed
1523
			    # or output_field_separator
Kevin Atkinson's avatar
Kevin Atkinson committed
1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538
    s/\n$//;
    if (/warning:/i) {
      dblog $WARNING, {inferred=>2}, $_;
    } elsif (/\*\*\*/) {
      dblog $ERR, {inferred=>2}, $_;
    } else {
      dblog $INFO, {inferred=>3}, $_;
    }
}

sub PRINTF {
    my ($this,$format) = (shift, shift);
    &PRINT($this, sprintf($format, @_));
}

1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552
sub FILENO {
    my $this = shift;
    fileno($$this);
}

sub OPEN {
    my $this = shift;
    my ($caller_package) = caller;
    # The open must be executed in the callers package becuase of
    # things like "open FH, '>&LOG'" will fail if open is executed in
    # the libtblog package since "LOG" is in the callers package.
    # Since package is a compile time directive, the only way to do
    # this is to do an eval on a string.
    eval("package $caller_package;".
Kevin Atkinson's avatar
Kevin Atkinson committed
1553 1554 1555
	 '@_ == 1 ? open($$this, $_[0]) : 
          @_ == 2 ? open($$this, $_[0], $_[1]) :
          open($$this, shift, shift, @_)');
1556 1557 1558 1559 1560 1561 1562 1563
}

sub CLOSE {
    # Don't do anything: If we really close $$this than we will lose
    # the fact that SOUT and STDOUT have the same underlying fileno
    # (ditto for SERR and STDERR)
}

1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575
=item tblog_start_capture [LIST]

Capture all output to STDOUT and STDERR and turn them into log
messages.  Use SOUT and SERR to print to the real STDOUT and STDERR
respectfully.  Does NOT capture output of subprocesses.  Will also
install handlers for "die" and "warn".

If LIST is present than only capture the parts present in list, can be
any one of "stdout", "stderr", "die", or "warn".

=cut

Kevin Atkinson's avatar
Kevin Atkinson committed
1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589
#
# Implementation node: tie is used to catch prints to STDOUT and
#  STDERR as that seams to be the only sane way to do it.  "print" is
#  a special function in perl and can not be overridden normally.
#  Using "*print = &myprint" or even "*IO::Handle::print = &myprint"
#  will only catch the calls to print without a file handle.  Although
#  it may be possible to catch the other type of call to print I don't
#  know how.
#
sub tblog_start_capture( ;@ ) {
    my (@what) = @_;
    @what = qw(stdout stderr die warn) unless @what;
    foreach (@what) {
	if ($_ eq 'stdout') {
1590
	    tie *STDOUT, 'libtblog', \*SOUT;
Kevin Atkinson's avatar
Kevin Atkinson committed
1591
	} elsif ($_ eq 'stderr') {
1592
	    tie *STDERR, 'libtblog', \*SERR;
Kevin Atkinson's avatar
Kevin Atkinson committed
1593 1594
	} elsif ($_ eq 'die') {
	    $SIG{__DIE__} = sub {
1595 1596
		return unless defined $^S; # In Parser
		return if $^S;             # In Eval Block
1597
		tblog_stop_capture();
Kevin Atkinson's avatar
Kevin Atkinson committed
1598 1599
		local $_ = $_[0];
		s/\n$//;
1600 1601
		dblog($ERR, {inferred=>1, cause=>'internal'}, $_);
		die format_message(scriptname(), $ERR, $_);
Kevin Atkinson's avatar
Kevin Atkinson committed
1602 1603 1604
	    };
	} elsif ($_ eq 'warn') {
	    $SIG{__WARN__} = sub {
1605
		warn $_[0] unless defined $^S; # In Parser
Kevin Atkinson's avatar
Kevin Atkinson committed
1606 1607
		local $_ = $_[0];
		s/\n$//;
1608
		dblog($WARNING, {inferred=>1, cause=>'internal'}, $_);
Kevin Atkinson's avatar
Kevin Atkinson committed
1609 1610 1611
		print SERR "$_\n";
	    };
	} else {
1612
	    carp "Unknown flag in tblog_start_capture: $_";
Kevin Atkinson's avatar
Kevin Atkinson committed
1613 1614 1615 1616
	}
    }
}

1617 1618 1619 1620 1621 1622 1623 1624 1625 1626
=item tblog_stop_capture [LIST]

Stop capture of STDOUT and STDERR, and remove handles for die and
warn.

If LIST is present than only stop capture of the parts present in
list, can be any one of "stdout", "stderr", "die", or "warn".

=cut

Kevin Atkinson's avatar
Kevin Atkinson committed
1627 1628 1629 1630 1631 1632
sub tblog_stop_capture( ;@ ) {
    my (@what) = @_;
    @what = qw(stdout stderr die warn) unless @what;
    foreach (@what) {
	if    ($_ eq 'stdout') {untie *STDOUT}
	elsif ($_ eq 'stderr') {untie *STDERR}
1633 1634
	elsif ($_ eq 'die')    {$SIG{__DIE__}  = ''}  
	elsif ($_ eq 'warn')   {$SIG{__WARN__} = ''}
Kevin Atkinson's avatar
Kevin Atkinson committed
1635
	else 
1636
	  {carp "Unknown flag in tblog_stop_capture: $_"}
Kevin Atkinson's avatar
Kevin Atkinson committed
1637 1638 1639
    }
}

1640 1641 1642 1643 1644 1645
=item tblog_session

Returns the current session or undefined if it has