This appendix lists error messages you might encounter while using IRIX NetWorker BusinesSuite Module for Oracle and suggests how to resolve the problems they describe.
The Enterprise Backup Utility stores information and error messages in the log file specified by the log parameter in the Enterprise Backup Utility command script. You should review the output in this log file after each backup job.
Enterprise Backup Utility errors always begin with the prefix OBK. If the target database is Oracle7 release 7.3.2 or higher, you may use the Oracle error utility oerr to obtain information about error messages and courses of action. For example, to find information on a specific Enterprise Backup Utility error, use the following command, replacing error_number with the actual number of the error:
% oerr obk error_number
If the target database is an Oracle7 release prior to 7.3.2, you may use the berr.sh script, included with the Enterprise Backup Utility, to obtain information about error messages and courses of action. For example, to find information on a specific Enterprise Backup Utility error, enter the following command, again replacing error_number with the actual number of the error:
% berr.sh obk error_number
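The choice between the two lookup tools can be sketched as a small shell helper. This is an illustration only: the function name obk_lookup_cmd is hypothetical, it assumes the Oracle7 release is supplied as a dotted string such as 7.3.2, and it prints the command to run rather than running it.

```sh
# Print the error-lookup command suited to the target Oracle7 release.
# Usage: obk_lookup_cmd <oracle_release> <error_number>
# Assumption: a major release above 7 is also treated as "7.3.2 or higher".
obk_lookup_cmd() {
    release=$1
    errnum=$2
    # Compare the release against 7.3.2 field by field.
    if printf '%s\n' "$release" | awk -F. '
            { major = $1 + 0; minor = $2 + 0; patch = $3 + 0 }
            END { exit !(major > 7 || (major == 7 && (minor > 3 || (minor == 3 && patch >= 2)))) }'
    then
        echo "oerr obk $errnum"       # release 7.3.2 or higher
    else
        echo "berr.sh obk $errnum"    # release prior to 7.3.2
    fi
}
```

For example, obk_lookup_cmd 7.2.3 error_number prints the berr.sh form of the command.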
The Enterprise Backup Utility also writes messages to the file given by the trace specifier. This file is useful for debugging, in combination with the file specified by the NSR_DEBUG_FILE environment variable.
Caution: Using the trace specifier in an Enterprise Backup Utility command script will degrade performance.
NetWorker error messages are displayed in the NetWorker Administrator window. The display lists the messages encountered during the past 24 hours. The messages are also written to the /nsr/messages/daemon.log file.
NetWorker error messages appear in this format:
day hh:mm:ss daemon_or_program_name: message
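Because every line in this format carries the daemon or program name in the third field, the log can be filtered with awk. The following is a minimal sketch, assuming the name is a single word followed directly by a colon; the function names are hypothetical, and messages whose name includes a path prefix would need a looser match.

```sh
# Print only the log lines written by one daemon or program.
# Usage: nw_messages_from <program_name> < /nsr/messages/daemon.log
nw_messages_from() {
    # Field 3 is "daemon_or_program_name:" (the name plus a trailing colon).
    awk -v prog="$1" '$3 == (prog ":")'
}

# Count messages per daemon or program, sorted by name.
# Usage: nw_message_counts < /nsr/messages/daemon.log
nw_message_counts() {
    awk 'NF >= 3 && $3 ~ /:$/ { name = $3; sub(/:$/, "", name); n[name]++ }
         END { for (p in n) print p, n[p] }' | sort
}
```

Grouping the log by program name this way mirrors the alphabetical organization of the message lists in this appendix.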
This section lists error messages encountered during a NetWorker backup. The messages are organized alphabetically by NetWorker daemon name and program name to make them easier to match to the NetWorker message displayed.
The flag file signifying the end of the first part of index compression is already in use by another instance of the nsrck program, or by the nsrindexd daemon. Since disaster will ensue if two processes access the same index at the same time, nsrck will refuse to act on the named file.
Informative message that the files associated with the named client are being inspected.
Displayed as the program finishes, provided some form of checking was accomplished.
more space needed to compress clientname index, size required
The nsrck program cannot find enough disk space to hold the temporary file db.CMP. The operator should free some disk space on any local filesystem and retry the command. Use the df command to see how much free space is available on any filesystem.
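The free-space check can be scripted. The sketch below assumes a df that accepts the POSIX -P and -k options, which may not hold on every system; the function name has_free_kb is hypothetical, and the size to pass is the one reported in the nsrck message, converted to kilobytes.

```sh
# Succeed if the filesystem holding <path> has at least <kbytes> KB free.
# Usage: has_free_kb <kbytes> <path>
has_free_kb() {
    need=$1
    path=$2
    # With -P, df prints one header line and one data line per filesystem;
    # field 4 is the available space, in kilobytes because of -k.
    avail=$(df -P -k "$path" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$need" ]
}
```

If the check fails for one local filesystem, try another, since nsrck can use free space on any local filesystem.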
After a reboot, if index compression completed its first copy, the compression is rolled forward.
WARNING no valid savetimes - cross-check not performed for clientname
During a cross-check, no save sets were found for this client. Since this situation can occur during disaster recovery, nsrck avoids deleting the entire contents of the client index and instead does nothing.
/path/nsrexecd: Can't make pipe
/path/nsrexecd: Can't fork
fork: No more processes
The specified client-side resource has been exceeded. There are too many other services running on the client while savegrp is running. Inspect the client and determine why it has run out of resources. The client may need to be rebooted. You should also consider rescheduling any jobs automatically started on the client (as with cron(1M)) that run while savegrp is running.
/path/nsrexecd: Couldn't look up address for your host
/path/nsrexecd: Host address mismatch for server
The nsrexecd daemon on the client managed to look up the server in the client's host table, but the address listed there did not match the address of the server. Every interface of the server must have a unique name listed in the host table (possibly with non-unique aliases or CNAMEs), and each unique name must be listed as a valid server to nsrexecd.
/path/nsrexecd: Host server cannot request command execution
/path/nsrexecd: Your host cannot request command execution
The server is not listed in nsrexecd's list of valid servers on the specified client. The list of valid servers is either on the nsrexecd command line (with one or more -s server options to nsrexecd), or in a file (with the -f file option to nsrexecd). It may also be the case that the server is not listed in one or more of /etc/hosts, NIS, or DNS, on the client, in which case nsrexecd cannot validate the server until the client's host naming configuration is fixed.
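When the client resolves names through /etc/hosts, a quick check that the server is listed might look like the sketch below. It covers only a hosts-format file, not NIS or DNS, and the function name host_in_table is hypothetical.

```sh
# Succeed if <name> appears as a hostname or alias in a hosts-format file.
# Usage: host_in_table <name> [hosts_file]
host_in_table() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }   # exit status 0 only if the name was seen
    ' "${2:-/etc/hosts}"
}
```

Run it on the client with the server's name exactly as NetWorker uses it; short and fully qualified names are matched separately.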
/path/nsrexecd: Invalid authenticator
/path/nsrexecd: Invalid command
These two messages should never occur in a savegroup completion message. They mean that savegrp did not follow its protocol correctly.
These similar messages are generated by nsrexecd and rshd, respectively. In either case, the server does not have permission to execute commands on the client. In the case of the first message, make sure that the server is listed as a valid server on the client (see “/path/nsrexecd: Host server cannot request command execution” for details). In the case of the second message, which does not mention nsrexecd, make sure that servername is listed in the client's /.rhosts file (or, if you have set the remote user attribute for this client, the .rhosts file in the home directory for that user on the client).
This message is generated when the remote user attribute for the client is not set to a valid login on the client. Verify that the remote user attribute for the client is set to the correct login name. You may see this message even when running nsrexecd if nsrexecd has not been started (or was killed) on the client.
The client does not seem to support the TCP/IP protocol stack, or has not used a privileged port for setting up its connection. The latter could occur if you use nsrexecd but did not start it as root on the specified client. The nsrexecd daemon must run as root on each client.
Informative message that will eventually follow the “waiting for lock...” message.
This message indicates that another program is accessing the same file that is required by the nsrindexd daemon. The daemon waits for the advisory lock to be cleared.
Another copy of nsrmmdbd(1M) is currently running and has exclusive access to the media database. Only one nsrmmdbd process should be running on a given machine at a time. This can happen if the previous nsrmmdbd was not properly killed off. Use nsr_shutdown(1M) or ps(1) and kill(1) to identify and kill off all the NetWorker daemons before restarting nsrd(1M) again.
An internal error. Check the permissions on the /nsr/tmp and /nsr/mm directories.
Printed when the daemon is dumping its records to a temporary file when the database is being backed up. The service is unavailable while the database is dumping.
Printed when the nsrmmdbd daemon is reloading its database. The service is unavailable while the data is being reloaded.
Printed each time the daemon is restarted. Upon startup, the daemon sanity checks its records before providing its service.
Printed after any of the previous messages are printed to indicate that the service is once again available.
This message, generated by the save command on client, means that save is not setuid root. Make sure that the save command on the client is owned by root and has its setuid bit set. If save is on an NFS mounted filesystem, make sure the filesystem was not mounted on that client using the “-nosuid” option.
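The ownership and setuid check can be automated by inspecting ls -l output. The following is a sketch under the assumption that ls -l prints the mode string first and the owner name third, as on most UNIX systems; both function names are hypothetical.

```sh
# Read one "ls -l" line on stdin; succeed if it shows a setuid file
# owned by root.
ls_line_is_setuid_root() {
    # Character 4 of the mode string is "s" or "S" when the setuid bit
    # is set; field 3 is the owner.
    awk '{ ok = ((substr($1, 4, 1) ~ /^[sS]$/) && ($3 == "root")) }
         END { exit !ok }'
}

# Succeed if the named file is owned by root and has its setuid bit set.
# Usage: is_setuid_root <file>
is_setuid_root() {
    ls -ld "$1" 2>/dev/null | ls_line_is_setuid_root
}
```

Pass the full path of the save command on the client. A missing file is reported as a failure.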
This message means that while backing up the specified save set, save was unable to enter the named directory. This may mean that save is not setuid root on the specified client, or that the directory is actually an NFS mount point for which root is not allowed access. Check the permissions for save on the specified client (using ls(1)) and make sure that save is owned by root and that the setuid bit is set.
RPC error, details... Cannot open save session with 'server'
The save command generates this message if it is unable to back up data to the NetWorker server. There are several possible details. The most likely causes are: (1) resources are exceeded on the server, so nsrd cannot accept new save sessions; (2) nsrd died after savegrp started (this is unlikely, since you cannot normally receive a savegrp completion message after nsrd dies, but you can see it when using the -p option); (3) numerous network errors are occurring and save cannot open a session to save its data (check this by running netstat -s to see how many network errors are occurring; you may need to run it several times, a few minutes apart, to see the change in errors). The save command cannot tell which of these three causes is the real one. If you see these errors frequently and the cause appears to be a server resource problem, consider increasing the value of the “client retries” attribute of the group resource that has these problems. This does not decrease resource utilization, but it makes savegrp more robust (the trade-off is that increasing client retries increases the load on the server even more).
This error occurs when the named client has more than one name, for example, a short name, client, and a fully-qualified domain name, client.xxx.com. When the client attempts to connect back to the NetWorker server to start a save, that client is calling itself by the name client, which matches the client resource name, but when the server looks up the client's network address, it is getting back the name client.xxx.com. If this is, in fact, correct, add the name client.xxx.com to the client's aliases attribute, and re-run the save.
This message can occur if you have a directory tree that is very deep, or directory names that are very long. This message can also occur if there are bad blocks in the specified filesystem, or if the filesystem is corrupt. NetWorker limits the full pathname to 1024 characters, which is the system-imposed maximum on most systems. To save such directories, you need to rename or move the directories so that the full pathname is shorter than 1024 characters. If the filesystem appears to be corrupted (for example, a very long pathname that looks like it has a loop in the name), perform a filesystem check on the specified client.
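To locate the offending pathnames before renaming or moving directories, a sketch such as the following can be run on the client. The function name long_paths is hypothetical, and the sketch assumes that pathnames contain no newline characters.

```sh
# List pathnames under <dir> whose full length is <limit> characters or more.
# Usage: long_paths <dir> [limit]      (limit defaults to 1024)
long_paths() {
    find "$1" -print | awk -v limit="${2:-1024}" 'length($0) >= limit + 0'
}
```

Pass the top of the save set as the directory, so that the lengths reported match the full pathnames that save sees.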
/path/savefs: Command not found
/path/save: Not found
/path/save: Command not found
/path/savefs: Not found
The save or savefs command could not be found in the specified path. If you are using nsrexecd, this probably means that the save or savefs command is not in the same directory in which nsrexecd is installed (or that save or savefs was removed). If you are using rshd for remote execution, then you need to set the executable path attribute in the Client resource for this client to be the directory in which the NetWorker executables are installed on the client.
This informational message accompanies several other save or asm messages listed. This message means that savefs has detected the failed save command and has marked the save set as failed.
save: unknown host name: server
savefs: unknown host name: server
The host table on the specified client (either /etc/hosts, NIS, or DNS, depending on that client's configuration) does not include the server's name. You need to add the server's hostname to the specified client's host table. Note that if you use DNS but the server's Client resource name (that is, the client resource for the server itself) is not fully qualified (it looks like “server”, not “server.domain”) and the server is in a different domain from the client, you will need to add the name server to the domain table for the domain containing the client. If you use NIS, this error means that either the NIS hosts map does not contain the server, the /etc/hosts file does not list the server, or the NIS master for the specified client is otherwise misconfigured (the server is a secondary server and there is no yppush(1M) from the primary; run ypwhich -m on the client to find out which NIS server is providing master translation).
The specified client is not listed in the host table on the server (similar to “Warning: `client' is not in the hosts table!”). Depending on your host configuration, this means the client is not listed in one (or more) of /etc/hosts, NIS, or the Domain Name Service. If you use fully qualified domain names, you may need to make a new client resource for this client, using that fully qualified domain name (that is, name the client resource mars.acme.com, not mars).
This message is generated by a save or savefs command run on the specified client to save that client's filesystems. The client's hostname is not listed in the host table on the client (either /etc/hosts, NIS, or DNS, depending on that client's configuration). This almost always results in a failed save. Fix the client's host table and re-run the save command.
This warning message is generated when save notices that the file's modification time changed while the file was being backed up. NetWorker does not attempt to lock files before saving them, since this would make backups run extremely slowly. You may wish to back up files that generate this message manually, to ensure that a consistent copy is saved. NetWorker does not attempt this automatically, to avoid trying forever on the same file.
This informational message is often generated when NetWorker backs up the message log files. It may also occur for other files. For files that you expect to grow while savegrp is running, you can use a directive specifying that the logasm (see uasm(1M)) should be used to back up the file. Also see the nsr(4) and nsr_directive(4) reference pages.
The backup of the named filesystem was begun, but the connection to the NetWorker server closed part way through. This typically means that the server machine rebooted, or one or more save agents were killed by the system administrator or by the system itself (for example, due to overwriting the binary or a disk error in swap space). Restart the save at a later time.
This informational message occurs only when you stop a running savegrp. The session for this save set may not disappear immediately, especially if the program's attempt to kill the save session fails. The save set will be retried if and when you restart the savegrp.
This message is caused when the client's hostname and IP address are not correctly listed in one or more of /etc/hosts, NIS, or DNS on the server. You must either change the appropriate host table (depending on which ones are in use on your server) to list the client's name as it is known to NetWorker as that client's primary name, or add the name listed at the end of the error message to the aliases attribute of the client's Client resource(s).
This message generally means that there are bad blocks on the disk(s) containing the specified file or directory. Immediately run a filesystem check on the named client filesystem and check your client's system error log. If there are bad blocks, repair them if possible, or move the filesystem to a different disk.
asm: cannot stat /path: Stale NFS file handle
asm: cannot stat /path: Missing file or filesystem
These informational messages (or variants of them for other operating systems) mean that when save attempted to test the named directory to determine if it was a different filesystem from the one currently being saved, the filesystem was, in fact, NFS mounted, but the mount point was bad. While this message does not affect the saved data, it does mean you have a network or NFS problem between the specified client and one or more of its fileservers. You may need to remount filesystems on the client, or perhaps reboot it, to correct the problem.
This message generally accompanies another message reporting a specific problem while saving a file or directory on the named save set. The backup will attempt to continue and attempt to save other data, and generally, the backup will not be listed in the failed save sets section of the completion mail if any files on the save set are saved successfully, even if it only saves the top directory of the save set.
This message means that while backing up the specified save set, an attempt to determine the current directory's name failed. This occurs on clients (generally running older versions of the NetWorker ClientPak) on which the getwd(3) library call is broken. You may want to contact Legato Tech Support to find out if there is a patch available for your client platform to work around this vendor-specific bug, or contact your operating system vendor to see if a more recent version of the operating system addresses this problem.
This message is generated when a backed-up file had one or more hard links that were not found. The message is followed by a list of one or more filenames which were backed up minus some links. The message means that the files were either created (with multiple hard links) while the backup was occurring, so some of the links were missed due to the order of filesystem tree walking, or the file (or some links) were removed while the backup was occurring. Only those links that were found can be recovered; additional links will have been lost. You can do another incremental backup of the affected filesystem if a consistent state for the affected file is essential.
This message generally accompanies one or more other more-specific messages for the save set. The specified path within the current save set was not saved successfully. The backup will continue trying to back up other files and directories on the save set.
This error can be caused by several possible conditions (for example, out of memory, buggy networking software in the operating system, an external ASM unexpectedly exiting, a lost network connection). If it was due to a lost network connection, then the NetWorker server most likely exited (due to nsr_shutdown). After restarting the server, rerun the group. If the error is due to an ASM exiting unexpectedly (in this case, the message should be accompanied by a message describing which ASM exited unexpectedly), you may have found a bad block on the disk, or perhaps a bug. Check if the client ran out of memory (there may be console messages), and verify that there are no bad blocks on the save set's disk. If there were network errors, there may also have been messages logged by other programs on the system console (client or server), or to system log files.
connect to address AA.BB.CC.DD: message
Trying AA.BB.CC.DD ...
These informational messages are displayed only when the -v option is used. They mean that the connection to the client failed on the address specified in the first line of the message. If the client has more than one IP address, savegrp has attempted the address listed in the second line. The subsequent lines of the completion mail show whether this second address succeeded. You may want to check and change your network routing tables to avoid getting these messages.
This means the client machine is up, but it is not accepting new network connections for nsrexecd (or rshd). This could mean the client was in the process of booting when the savegrp attempted to connect, or that the client had exceeded some resource limit, and was not accepting any new connections. You should attempt to log into the client and verify that it is accepting remote connections.
This usually means the client has crashed or is hung. Make sure the client has rebooted, and that nsrexecd is running on it (if you are using nsrexecd).
This message is only delivered by itself. It occurs when the named group has already been started or restarted (for example, after a reboot, or when requested via the Group Control Window of nwadmin(1M)), either automatically by nsrd(1M) or manually, from the command line. You can use ps(1) to find out the process ID of a running savegrp. The existence of a running group is determined by looking for a file named /nsr/tmp/sg.groupname which, if existing and locked, means a savegrp is running.
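A quick existence check for the flag file might look like the sketch below. It tests only whether the file exists, not whether it is locked, so a positive result is a hint rather than proof that savegrp is running; the function name savegrp_flag_exists is hypothetical.

```sh
# Report whether the flag file for a savegrp group exists.
# Usage: savegrp_flag_exists <groupname> [tmp_dir]   (tmp_dir defaults to /nsr/tmp)
savegrp_flag_exists() {
    flag="${2:-/nsr/tmp}/sg.$1"
    if [ -e "$flag" ]; then
        echo "$flag exists; a savegrp for group $1 may be running"
    else
        echo "$flag not found"
        return 1
    fi
}
```

Follow a positive result with ps(1) to confirm that a savegrp process is actually running.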
has been inactive for n minutes since time.
client: saveset is being abandoned by savegrp.
A backup of the specified save set started, but after n minutes of no activity, savegrp gave up on the save set. Generally, this means that the client is hung waiting for an NFS partition. Unfortunately, NetWorker (or any other program) has no way of reliably telling if an NFS partition will hang until after it tries to access the partition. When the partition comes back on line, the save will complete, despite the fact that savegrp abandoned it. You should check the client, however, since you sometimes need to reboot the client to unhang NFS partitions. Non-UNIX clients also hang for other reasons, most notably bugs in the operating system implementation of their network protocols.
The NetWorker server cannot make TCP/IP connections to the client. This generally means the network itself is not configured correctly; most commonly, one or more gateways or routers are down, or the network routes were not set up correctly. Verify that the server can connect to the client. If the server cannot connect to the client, check your routers, gateways, or routing tables and reconfigure them if needed.
This informational message is added by savegrp to any save set that is saved at the level “full” instead of the level found in the client's schedule. Due to timing problems, you can occasionally see this message when the clocks on the client and server are out of sync, or when savegrp starts before midnight and ends after midnight.
No 'NSR client' resource for client clienthostname
savefs: cannot retrieve client resources
This pair of messages occurs if the client's hostname changed (in /etc/hosts, NIS, or DNS). You may also have deleted the client's “Client” resource while savegrp was running. In the former case, you will need to add the client's new name to the aliases attribute of the client (this is a hidden attribute) using nsradmin(1M) (selecting the “Hidden” display option) or nwadmin(1M) (selecting the “Details View” option for the Client window). In the latter case, no additional action is required if this deletion was intentional (the next run of savegrp will not attempt to save the client). If it was accidental, and you did not want to delete the client, you should add the client again into the appropriate group(s). The next time savegrp runs, it will back up the client, just as if the client had been down the previous day.
The save set completed, but returned no status output. The most common reasons are that the client crashed or lost its network connection (for example, a router between the client and server crashed) while the client was being backed up. Another is that the disk on which the client status was being logged filled up (use df /nsr/tmp to see if this was the case). To determine if the save set was saved, use mminfo(1M). For example, run mminfo -v -c clientname -t '1 day ago' and look at the flags column for the completion status. An a flag means the save set aborted. Use a more distant time (the -t option) to look further back in time.
An explicit save set was named in the Client resource for the specified client, and that save set does not exist (or is not currently mounted) on the client. Make sure you spelled the save set name correctly (and that it is capitalized correctly), and log into the client and verify that the save set is mounted.
One of these informational messages is prepended to a save set's output if savegrp was unable to back up the data on the first try and if the client retries attribute for the group has a value greater than zero. In this case, the specified number of retries was performed before the backup of the save set succeeded or was finally marked as failed.
/path: This data set is in use and cannot be accessed at this time
This message is generated by save sets on PC clients running DOS or NetWare. The NetWorker client software on these systems cannot back up files open for writing, due to the interface provided by the operating system. This message actually comes from the Novell TSA and is not changeable.
printer: unknown printer
/path/savegrp: printing bootstrap information failed (reproduced below)
/path/savegrp: printing bootstrap information failed
These messages, or similar messages, accompany the bootstrap information when savegrp was unable to print the bootstrap on the printer. You need to either specify a different printer in the printer attribute for the group, or configure your print server to recognize the printer (by default, your system's default printer is used). The bootstrap information is listed as part of the savegrp completion mail. Print out this information immediately, in case your server has a disaster and loses a disk, and fix the printer name used by savegrp.
After the specified save set completed, savegrp was unable to read the log file of the output status from the save set. This generally means that someone, or an automated non-NetWorker administrative program or script, removed the log file. This message can also occur if the filesystem on which the client logs are stored has run out of space (use df /nsr/tmp to determine if this is the case). Verify that no scripts remove files from /nsr/tmp (which is where savegrp stores the save set log files).
This informational message is displayed only when the -v flag has been used for verbose information. This message means that nsrexecd is not running on the client, and that savegrp is attempting to use the rshd service instead, for backward compatibility with older versions of savegrp.
This error message normally accompanies another, more specific, error message. It is generated when the attempt to run the specified command (usually save or savefs with several command-line parameters) failed on the specified save set. The previous line of error output should include the more specific error message (look for that message elsewhere in this section). Generally, the problem is a bad host table configuration, or various permissions denied errors (server not specified when starting nsrexecd, or missing permissions in .rhosts if not using nsrexecd). If not, log into the NetWorker server as root and run the command
Saving server index because server is not in an active group
This informational message, generated by savegrp, means that savegrp has noticed that the NetWorker server is not listed in any automatically started, enabled group. Since all of the indexes are stored on the server, savegrp is saving the server's index and bootstrap information in case a disaster occurs. You should add the server to a group with autostart enabled, or enable one of the groups of which the server is already a member.
The NetWorker server has run out of socket descriptors. This means that you have exceeded the socket resource limit on your server. To avoid such future messages, you should determine what other network services are running while savegrp is running, and consider re-scheduling either savegrp or the other service(s). You can also reduce the parallelism in the nsr_service(4) resource, to reduce the resource utilization.
This section lists error messages returned by the NetWorker recover program during an on-demand restore using the Enterprise Backup Utility. The messages are organized alphabetically to make them easier to match to the NetWorker message displayed.
An informative message that explicitly states which NetWorker client's index is being browsed for interactive recovers that resolve to another machine.
This message indicates that some problem was encountered connecting to the NetWorker server on the named machine.
This message indicates that the client invoking the recover command is not in the server's client list. See nsr_service(4) for details.
Message from server : other clones exist for failed save set
The recover command will be automatically re-submitted to the server, if any files remain to be recovered, because the request failed on a save set that had multiple clones. The server automatically picks a different clone on each attempt.
An informative message that lets you know that the given pathname is mounted from a network file server and that the recovery will use the index for the named file server. If the machine is not a NetWorker client, then the -c option may be necessary.
The filename cannot be recovered because you are not root or in the group operator, and you don't have read permission for the file.
The filename cannot be recovered because you are not root or in the group operator, the file has an Access Control List (ACL), and you are not the owner of the file.
An informative message that lets you know which NetWorker server was selected for client's index.
During a backup or restore, NetWorker attempts to record error messages generated by the BusinesSuite Module for Oracle library to the dmo.messages file. NetWorker opens this file in one of the following locations (default locations, listed in order):
The alternate messages directory created during installation, /nsr/applogs.
The /tmp directory.
Standard error (stderr), if dmo.messages cannot be opened or created.
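The search order above can be sketched as a helper that prints the first dmo.messages file it finds. The function name find_dmo_messages is hypothetical, and the sketch assumes that /nsr/applogs and /tmp are the only directories to check when no arguments are given.

```sh
# Print the first existing dmo.messages file among the candidate directories.
# Usage: find_dmo_messages [dir ...]   (defaults: /nsr/applogs, then /tmp)
find_dmo_messages() {
    [ "$#" -gt 0 ] || set -- /nsr/applogs /tmp
    for dir in "$@"; do
        if [ -f "$dir/dmo.messages" ]; then
            echo "$dir/dmo.messages"
            return 0
        fi
    done
    # Neither directory holds the file, so messages went to stderr instead.
    echo "dmo.messages not found; messages may have gone to stderr" >&2
    return 1
}
```

Directories can be passed explicitly to search a site-specific location first.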
See Appendix A, “Environment Variables” for descriptions of NetWorker BusinesSuite Module for Oracle variables and values you can assign to them in the nsrdmo script.
NetWorker BusinesSuite Module for Oracle error messages appear in the following format, where pid is the process ID and function is an internally-defined function name:
(pid) OBK-sbt: <function>: <error message>
In addition, BusinesSuite Module for Oracle writes extended debugging information in the file specified by the NSR_DEBUG_FILE environment variable.
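Module error lines in the (pid) OBK-sbt format can be split into their parts with sed. This is a minimal sketch, assuming the angle brackets in the format description are placeholders rather than literal characters and that the function name contains no colon; the name parse_obk_sbt is hypothetical.

```sh
# Rewrite module error lines as "pid|function|message"; other lines are dropped.
# Usage: parse_obk_sbt < dmo.messages
parse_obk_sbt() {
    sed -n 's/^(\([0-9][0-9]*\)) OBK-sbt: \([^:]*\): \(.*\)$/\1|\2|\3/p'
}
```

The pipe-separated output is convenient for sorting by process ID or grouping by function name with standard tools such as sort and cut.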