Appendix B. Error Messages

This appendix lists error messages you might encounter while using IRIX NetWorker BusinesSuite Module for Oracle and suggests ways to resolve the problems they describe.

Enterprise Backup Utility Messages

The Enterprise Backup Utility stores information and error messages in the log file specified by the log parameter in the Enterprise Backup Utility command script. You should review the output in this log file after each backup job.

Enterprise Backup Utility errors always begin with the prefix OBK. If the target database is Oracle7 release 7.3.2 or higher, you may use the Oracle error utility oerr to obtain information about error messages and courses of action. For example, to find information on a specific Enterprise Backup Utility error, use the following command, replacing error_number with the actual number of the error:

% oerr obk error_number 

If the target database is an Oracle7 release prior to 7.3.2, you may use the berr.sh script, included with the Enterprise Backup Utility, to obtain information about error messages and courses of action. For example, to find information on a specific Enterprise Backup Utility error, enter the following command, again replacing error_number with the actual number of the error:

% berr.sh obk error_number 
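The choice between the two lookup tools can be sketched in a few lines of Python. This is only an illustration of the rule described above: the function name error_lookup_command and the error number 3001 are hypothetical, and the sketch assumes the release is given as a dotted string such as 7.3.2.

```python
def error_lookup_command(release, error_number):
    """Return the lookup command for an OBK error as an argument list."""
    # Compare releases as integer tuples so that "7.3.10" sorts after "7.3.2".
    version = tuple(int(part) for part in release.split("."))
    # Per the text: oerr for Oracle7 release 7.3.2 or higher, berr.sh otherwise.
    tool = "oerr" if version >= (7, 3, 2) else "berr.sh"
    return [tool, "obk", str(error_number)]


# 3001 is a made-up error number used only for illustration.
print(" ".join(error_lookup_command("7.3.2", 3001)))
```

The returned list can be passed to a process launcher or joined into a command line, as the last line does.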

The Enterprise Backup Utility also writes messages to the file given by the trace specifier. This file is useful for debugging, in combination with the file specified by the NSR_DEBUG_FILE environment variable.


Caution: Using the trace specifier in an Enterprise Backup Utility command script will degrade performance.


NetWorker Error Messages

NetWorker error messages are displayed in the NetWorker Administrator window. The display lists the messages encountered during the past 24 hours. The messages are also written to the /nsr/messages/daemon.log file.

NetWorker error messages appear in this format:

day hh:mm:ss daemon_or_program_name: message 
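When scanning /nsr/messages/daemon.log for messages from one daemon, it can help to split each line into its parts. The following Python sketch assumes that the day field is a single token without spaces and that the daemon or program name contains no colon; neither detail is confirmed by this appendix, and the sample line is invented.

```python
import re

# day hh:mm:ss daemon_or_program_name: message
LINE_RE = re.compile(
    r"^(?P<day>\S+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<source>[^:\s]+):\s*(?P<message>.*)$"
)


def parse_networker_line(line):
    """Split one log line into day, time, source, and message, or return None."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical log line, not taken from a real daemon.log.
sample = "Tue 02:15:07 nsrck: checking index for mars"
print(parse_networker_line(sample))
```

The source field can then be matched against the daemon and program headings used in the sections that follow.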

Errors Generated While Saving Data

This section lists error messages encountered during a NetWorker backup. The messages are organized alphabetically by NetWorker daemon name and program name to make them easier to match to the NetWorker message displayed.

nsrck

cannot lock flag file for clientname : reason

The flag file signifying the end of the first part of index compression is already in use by another instance of the nsrck program, or by the nsrindexd daemon. Since disaster will ensue if two processes access the same index at the same time, nsrck will refuse to act on the named file.


checking index for clientname

Informative message that the files associated with the named client are being inspected.


completed checking count clients

Displayed as the program finishes, provided some form of checking was accomplished.


compressing index for clientname

Displayed when the -x or -C option has taken effect.


cross-checking index for clientname

Displayed when the -X option is in effect.


more space needed to compress clientname index, size required

The nsrck program cannot find enough disk space to hold the temporary file db.CMP. The operator should free some disk space on any local filesystem and retry the command. Use the df command to see how much free space is available on any filesystem.
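As an alternative to reading df output by eye, the free-space check can be scripted. This Python sketch uses the standard shutil.disk_usage call; the function name has_room is hypothetical, and the sketch assumes the size reported in the message has already been converted to bytes, since this appendix does not state its unit.

```python
import shutil


def has_room(directory, required_bytes):
    """Return True if the filesystem holding directory has required_bytes free."""
    # disk_usage reports the space available to the calling user.
    return shutil.disk_usage(directory).free >= required_bytes


# Check the current directory's filesystem for a trivially small amount.
print(has_room(".", 0))
```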


rolling forward index compression for clientname

After a reboot, if index compression completed its first copy, the compression is rolled forward.


WARNING no valid savetimes - cross-check not performed for clientname

During a cross-check, no save sets were found for this client. Since this situation can occur during disaster recovery, nsrck avoids deleting the entire contents of the client index and instead does nothing.


nsrexecd

/path/nsrexecd: Can't make pipe
/path/nsrexecd: Can't fork
fork: No more processes

The specified client-side resource has been exceeded. There are too many other services running on the client while savegrp is running. Inspect the client and determine why it has run out of resources. The client may need to be rebooted. You should also consider rescheduling any jobs automatically started on the client (as with cron(1M)) that run while savegrp is running.


/path/nsrexecd: Couldn't look up address for your host
/path/nsrexecd: Host address mismatch for server

The nsrexecd daemon on the client managed to look up the server in the client's host table, but the address listed there did not match the address of the server. Every interface of the server must have a unique name listed in the host table (possibly with non-unique aliases or CNAMEs), and each unique name must be listed as a valid server to nsrexecd.


/path/nsrexecd: Host server cannot request command execution
/path/nsrexecd: Your host cannot request command execution

The server is not listed in nsrexecd's list of valid servers on the specified client. The list of valid servers is either on the nsrexecd command line (with one or more -s server options to nsrexecd), or in a file (with the -f file option to nsrexecd). It may also be the case that the server is not listed in one or more of /etc/hosts, NIS, or DNS, on the client, in which case nsrexecd cannot validate the server until the client's host naming configuration is fixed.


/path/nsrexecd: Invalid authenticator
/path/nsrexecd: Invalid command

These two messages should never occur in a savegroup completion message. They mean that savegrp did not follow its protocol correctly.


/path/nsrexecd: Permission denied
Permission denied

These similar messages are generated by nsrexecd and rshd, respectively. In either case, the server does not have permission to execute commands on the client. In the case of the first message, make sure that the server is listed as a valid server on the client (see “/path/nsrexecd: Host server cannot request command execution” for details). In the case of the second message, which does not mention nsrexecd, make sure that servername is listed in the client's /.rhosts file (or, if you have set the remote user attribute for this client, the .rhosts file in the home directory for that user on the client).


Login incorrect

This message is generated when the remote user attribute for the client is not set to a valid login on the client. Verify that the remote user attribute for the client is set to the correct login name. You may see this message even if you use nsrexecd, when nsrexecd has not been started (or was killed) on the client.


socket: protocol failure in circuit setup.

The client does not seem to support the TCP/IP protocol stack, or has not used a privileged port for setting up its connection. The latter could occur if you use nsrexecd but did not start it as root on the specified client. The nsrexecd daemon must run as root on each client.


nsrindexd

lock on filename acquired.

Informative message that will eventually follow the “waiting for lock...” message.


waiting for lock on filename.

This message indicates that another program is accessing the same file that is required by the nsrindexd daemon. The daemon waits for the advisory lock to be cleared.


nsrmmdbd

A copy of this process is already running!

Another copy of nsrmmdbd(1M) is currently running and has exclusive access to the media database. Only one nsrmmdbd process should be running on a given machine at a time. This can happen if the previous nsrmmdbd was not properly killed off. Use nsr_shutdown(1M) or ps(1) and kill(1) to identify and kill off all the NetWorker daemons before restarting nsrd(1M).


Cannot open lock file

An internal error. Check the permissions on the /nsr/tmp and /nsr/mm directories.


media db is saving its data, this may take a while

Printed when the daemon is dumping its records to a temporary file while the database is being backed up. The service is unavailable while the database is dumping.


media db is recovering, this may take a while

Printed when the nsrmmdbd daemon is reloading its database. The service is unavailable while the data is being reloaded.


media db is cross checking the save sets

Printed each time the daemon is restarted. Upon startup, the daemon sanity checks its records before providing its service.


media db is open for business

Printed after any of the previous messages are printed to indicate that the service is once again available.

 


save—savefs

Access violation from client - insecure port N

This message, generated by the save command on the client, means that save is not setuid root. Make sure that the save command on the client is owned by root and has its setuid bit set. If save is on an NFS mounted filesystem, make sure the filesystem was not mounted on that client using the “-nosuid” option.


asm: chdir failed /path: Permission denied

This message means that while backing up the specified save set, save was unable to enter the named directory. This may mean that save is not setuid root on the specified client, or that the directory is actually an NFS mount point for which root is not allowed access. Check the permissions for save on the specified client (using ls(1)) and make sure that save is owned by root and that the setuid bit is set.


RPC error, details...
Cannot open save session with 'server'

The save command generates this message if it is unable to back up data to the NetWorker server. There are several possible details. The most likely causes are:

  1. Resources are exceeded on the server, so nsrd cannot accept new save sessions.

  2. The nsrd daemon died after savegrp started. This is unlikely, since you cannot normally receive a savegrp completion message after nsrd dies, but you can see it when using the -p option.

  3. Numerous network errors are occurring, and save cannot open a session to save its data. To check, run netstat -s and see how many network errors are occurring; you may need to do this several times, a few minutes apart, to see the change in errors.

The save command cannot tell which of these three causes is the real cause. If you see these errors frequently, and the cause appears to be a server resource problem, consider increasing the value of the “client retries” attribute of the group resource having these problems. This does not decrease resource utilization, but it makes savegrp more robust. The trade-off is that increasing client retries increases the load on the server even more.
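Comparing netstat -s output taken a few minutes apart is easier if only the changed counters are shown. The following Python sketch assumes each counter appears on its own line as a number followed by a description, which is a common netstat -s layout but is not confirmed here for any particular system. The function names and the sample snapshots are hypothetical.

```python
import re

COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(netstat_text):
    """Map each 'count description' line to {description: count}."""
    counters = {}
    for line in netstat_text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            # Identical descriptions under different protocols would collide;
            # this sketch keeps only the last one seen.
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_changes(before, after):
    """Return only the counters whose value changed between two snapshots."""
    changes = {}
    for name, value in after.items():
        delta = value - before.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes


# Invented snapshots standing in for two runs of netstat -s.
first = parse_counters("tcp:\n\t12 bad checksums\n\t3 retransmit timeouts\n")
second = parse_counters("tcp:\n\t12 bad checksums\n\t9 retransmit timeouts\n")
print(counter_changes(first, second))
```

A counter that keeps climbing between snapshots is a better indicator of an ongoing network problem than a large but static total.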


save: client.xxx.com is not on client's access list

This error occurs when the named client has more than one name, for example, a short name, client, and a fully-qualified domain name, client.xxx.com. When the client attempts to connect back to the NetWorker server to start a save, that client is calling itself by the name client, which matches the client resource name, but when the server looks up the client's network address, it is getting back the name client.xxx.com. If this is, in fact, correct, add the name client.xxx.com to the client's aliases attribute, and re-run the save.


save: path length of n too long, directory not saved

This message can occur if you have a directory tree that is very deep, or directory names that are very long. This message can also occur if there are bad blocks in the specified filesystem, or if the filesystem is corrupt. NetWorker limits the full pathname to 1024 characters, which is the system-imposed maximum on most systems. To save such directories, you need to rename or move the directories so that the full pathname is shorter than 1024 characters. If the filesystem appears to be corrupted (for example, a very long pathname that looks like it has a loop in the name), perform a filesystem check on the specified client.
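To find the directories that trigger this message before the next backup, you can walk the filesystem and compare each full pathname against the 1024-character limit. The Python sketch below is one way to do that; the function names are hypothetical, and it counts characters as Python sees them, which may differ from the byte count on systems with multibyte filenames.

```python
import os

PATH_LIMIT = 1024  # limit stated in the text above


def is_too_long(path, limit=PATH_LIMIT):
    """Return True if the full pathname is not shorter than the limit."""
    return len(path) >= limit


def overlong_directories(root, limit=PATH_LIMIT):
    """Yield each directory under root whose absolute pathname is too long."""
    for dirpath, _dirnames, _filenames in os.walk(os.path.abspath(root)):
        if is_too_long(dirpath, limit):
            yield dirpath


# Report offending directories below the current directory, if any.
for directory in overlong_directories("."):
    print(directory)
```

Directories that os.walk cannot read are skipped silently, so run the check as a user with access to the whole tree.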


/path/savefs: Command not found
/path/save: Not found
/path/save: Command not found
/path/savefs: Not found

The save or savefs command could not be found in the specified path. If you are using nsrexecd, this probably means that the save or savefs command is not in the same directory in which nsrexecd is installed (or that save or savefs was removed). If you are using rshd for remote execution, then you need to set the executable path attribute in the Client resource for this client to be the directory in which the NetWorker executables are installed on the client.


savefs: error starting save of filesystem

This informational message accompanies several of the other save or asm messages listed in this section. This message means that savefs has detected the failed save command and has marked the save set as failed.


save: unknown host name: server
savefs: unknown host name: server

The host table on the specified client (either /etc/hosts, NIS, or DNS, depending on that client's configuration) does not include the server's name. You need to add the server's hostname to the specified client's host table. Note that if you use DNS but the server's Client resource name (that is, the client resource for the server itself) is not fully qualified (it looks like “server”, not “server.domain”) and the server is in a different domain from the client, you will need to add the name server to the domain table for the domain containing the client. If you use NIS, this error means that either the NIS hosts map does not contain the server, the /etc/hosts file does not list the server, or the NIS master for the specified client is otherwise misconfigured (the server is a secondary server and there is no yppush(1M) from the primary; run ypwhich -m on the client to find out which NIS server is providing master translation).


unknown host

The specified client is not listed in the host table on the server (similar to “Warning: `client' is not in the hosts table!”). Depending on your host configuration, this means the client is not listed in one (or more) of /etc/hosts, NIS, or the Domain Name Service. If you use fully qualified domain names, you may need to make a new client resource for this client, using that fully qualified domain name (that is, name the client resource mars.acme.com, not mars).


Warning: 'client' is not in the hosts table!

This message is generated by a save or savefs command run on the specified client to save that client's filesystems. The client's hostname is not listed in the host table on the client (either /etc/hosts, NIS, or DNS, depending on that client's configuration). This almost always results in a failed save. Fix the client's host table and re-run the save command.


Warning - file 'path' changed during save

This warning message is generated when save notices that the file's modification time changed while the file was being backed up. NetWorker does not attempt to lock files before saving them, since this would make backups run extremely slowly. You may wish to manually back up files that generate this message, to ensure that a consistent copy is saved. NetWorker does not attempt this automatically, to avoid trying forever on the same file.


save: path file size changed!

This informational message is often generated when NetWorker backs up the message log files. It may also occur for other files. For files that you expect to grow while savegrp is running, you can use a directive specifying that the logasm (see uasm(1M)) should be used to back up the file. Also see the nsr(4) and nsr_directive(4) reference pages.


save: network error, server may be down

The backup of the named filesystem was begun, but the connection to the NetWorker server closed part way through. This typically means that the server machine rebooted, or one or more save agents were killed by the system administrator or by the system itself (for example, due to overwriting the binary or a disk error in swap space). Restart the save at a later time.


savegrp

Aborted

This informational message occurs only when you stop a running savegrp. The session for this save set may not disappear immediately, especially if the program's attempt to kill the save session fails. The save set will be retried if and when you restart the savegrp.


Access violation - unknown host: client

This message is caused when the client's hostname and IP address are not correctly listed in one or more of /etc/hosts, NIS, or DNS on the server. You must either change the appropriate host table (depending on which ones are in use on your server) to list the client's name as it is known to NetWorker, as that client's primary name, or add the name listed at the end of the error message to the aliases attribute of the client's Client resource(s).


asm: cannot open /path: I/O error

This message generally means that there are bad blocks on the disk(s) containing the specified file or directory. Immediately run a filesystem check on the named client filesystem and check your client's system error log. If there are bad blocks, repair them if possible, or move the filesystem to a different disk.


asm: cannot stat /path: Stale NFS file handle
asm: cannot stat /path: Missing file or filesystem

These informational messages (or variants of them for other operating systems) mean that when save attempted to test the named directory to determine if it was a different filesystem from the one currently being saved, the filesystem was, in fact, NFS mounted, but the mount point was bad. While this message does not affect the saved data, it does mean you have a network or NFS problem between the specified client and one or more of its fileservers. You may need to remount filesystems on the client, or perhaps reboot it, to correct the problem.


asm: external ASM 'asm2' exited with code 1

This message generally accompanies another message reporting a specific problem while saving a file or directory on the named save set. The backup attempts to continue and save other data. Generally, the backup is not listed in the failed save sets section of the completion mail if any files on the save set are saved successfully, even if only the top directory of the save set is saved.


asm: getwd failed

This message means that while backing up the specified save set, an attempt to determine the current directory's name failed. This occurs on clients (generally running older versions of the NetWorker ClientPak) on which the getwd(3) library call is broken. You may want to contact Legato Tech Support to find out if there is a patch available for your client platform to work around this vendor-specific bug, or contact your operating system vendor to see if a more recent version of the operating system addresses this problem.


asm: missing hard links not found

This message is generated when a backed-up file had one or more hard links that were not found. The message is followed by a list of one or more filenames which were backed up minus some links. The message means that the files were either created (with multiple hard links) while the backup was occurring, so some of the links were missed due to the order of filesystem tree walking, or the file (or some links) were removed while the backup was occurring. Only those links that were found can be recovered; additional links will have been lost. You can do another incremental backup of the affected filesystem if a consistent state for the affected file is essential.


asm: /path was not successfully saved

This message generally accompanies one or more other more-specific messages for the save set. The specified path within the current save set was not saved successfully. The backup will continue trying to back up other files and directories on the save set.


asm: xdr_op failed for /path

This error can be caused by several possible conditions (for example, out of memory, buggy networking software in the operating system, an external ASM unexpectedly exiting, a lost network connection). If it was due to a lost network connection, then the NetWorker server most likely exited (due to nsr_shutdown). After restarting the server, rerun the group. If the error is due to an ASM exiting unexpectedly (in this case, the message should be accompanied by a message describing which ASM exited unexpectedly), you may have found a bad block on the disk, or perhaps a bug. Check if the client ran out of memory (there may be console messages), and verify that there are no bad blocks on the save set's disk. If there were network errors, there may also have been messages logged by other programs on the system console (client or server), or to system log files.


connect to address AA.BB.CC.DD: message
Trying AA.BB.CC.DD...

These informational messages are displayed only when the -v option is used. They mean that the connection to the client failed on the address specified in the first line of the message. If the client has more than one IP address, savegrp has attempted the address listed in the second line. Subsequent lines of the completion mail show whether this second address succeeded. You may want to check and change your network routing tables to avoid getting these messages.


Connection refused

This means the client machine is up, but it is not accepting new network connections for nsrexecd (or rshd). This could mean the client was in the process of booting when the savegrp attempted to connect, or that the client had exceeded some resource limit, and was not accepting any new connections. You should attempt to log into the client and verify that it is accepting remote connections.


Connection timed out

This usually means the client has crashed or is hung. Make sure the client has rebooted, and that nsrexecd is running on it (if you are using nsrexecd).


group groupname aborted, savegrp is already running

This message is only delivered by itself. It occurs when the named group has already been started or restarted (for example, after a reboot, or when requested via the Group Control Window of nwadmin(1M)), either automatically by nsrd(1M) or manually, from the command line. You can use ps(1) to find out the process ID of a running savegrp. The existence of a running group is determined by looking for a file named /nsr/tmp/sg.groupname which, if existing and locked, means a savegrp is running.


has been inactive for n minutes since time.
client: saveset is being abandoned by savegrp.

A backup of the specified save set started, but after n minutes of no activity, savegrp gave up on the save set. Generally, this means that the client is hung waiting for an NFS partition. Unfortunately, NetWorker (or any other program) has no way of reliably telling if an NFS partition will hang until after it tries to access the partition. When the partition comes back on line, the save will complete, despite the fact that savegrp abandoned it. You should check the client, however, since you sometimes need to reboot the client to unhang NFS partitions. Non-UNIX clients also hang for other reasons, most notably bugs in the operating system implementation of their network protocols.


Host is unreachable

The NetWorker server cannot make TCP/IP connections to the client. This generally means the network itself is not configured correctly; most commonly, one or more gateways or routers are down, or the network routes were not set up correctly. Verify that the server can connect to the client. If the server cannot connect to the client, check your routers, gateways, or routing tables and reconfigure them if needed.


no cycles found in media db; doing full save

This informational message is added by savegrp to any save set that is saved at the level “full” instead of the level found in the client's schedule. Due to timing problems, you can occasionally see this message when the clocks on the client and server are out of sync, or when savegrp starts before midnight and ends after midnight.


No 'NSR client' resource for client clienthostname
savefs: cannot retrieve client resources

This pair of messages occurs if the client's hostname changed (in /etc/hosts, NIS, or DNS). You may also have deleted the client's “Client” resource while savegrp was running. In the former case, you will need to add the client's new name to the aliases attribute of the client (this is a hidden attribute) using nsradmin(1M) (selecting the “Hidden” display option) or nwadmin(1M) (selecting the “Details View” option for the Client window). In the latter case, no additional action is required if this deletion was intentional (the next run of savegrp will not attempt to save the client). If it was accidental, and you did not want to delete the client, you should add the client again into the appropriate group(s). The next time savegrp runs, it will back up the client, just as if the client had been down the previous day.


no output

The save set completed, but returned no status output. The most common reasons are that the client crashed or lost its network connection (for example, a router between the client and server crashed) while the client was being backed up. Another is that the disk on which the client status was being logged filled up (use df /nsr/tmp to see if this was the case). To determine if the save set was saved, use mminfo(1M). For example, run mminfo -v -c clientname -t '1 day ago' and look at the flags column for the completion status. An a flag means it aborted. Use a more distant time (the -t option) to look further back in time.


filesystem: No such file or directory

An explicit save set was named in the Client resource for the specified client, and that save set does not exist (or is not currently mounted) on the client. Make sure you spelled the save set name correctly (and that it is capitalized correctly), and log into the client and verify that the save set is mounted.


n retries attempted
1 retry attempted

One of these informational messages is prepended to a save set's output if savegrp was unable to back up the data on the first try and if the client retries attribute for the group has a value greater than zero. In this case, the specified number of retries was performed before the backup of the save set succeeded or was finally marked as failed.


/path: This data set is in use and cannot be accessed at this time

This message is generated by save sets on PC clients running DOS or NetWare. The NetWorker client software on these systems cannot back up files open for writing, due to the interface provided by the operating system. This message actually comes from the Novell TSA and is not changeable.


printer: unknown printer
/path/savegrp: printing bootstrap information failed (reproduced below)
/path/savegrp: printing bootstrap information failed

These messages, or similar messages, accompany the bootstrap information when savegrp was unable to print the bootstrap on the printer. You need to either specify a different printer in the printer attribute for the group, or configure your print server to recognize the printer (by default, your system's default printer is used). The bootstrap information is listed as part of the savegrp completion mail. Print out this information immediately, in case your server has a disaster and loses a disk, and fix the printer name used by savegrp.


reading log file failed

After the specified save set completed, savegrp was unable to read the log file of the output status from the save set. This generally means that someone, or an automated non-NetWorker administrative program or script, removed the log file. This message can also occur if the filesystem on which the client logs are stored has run out of space (use df /nsr/tmp to determine if this is the case). Verify that no scripts remove files from /nsr/tmp (which is where savegrp stores the save set log files).


RPC exec on client is unavailable. Trying RSH.

This informational message is displayed only when the -v flag has been used for verbose information. This message means that nsrexecd is not running on the client, and that savegrp is attempting to use the rshd service instead, for backward compatibility with older versions of savegrp.


savegrp: client rcmd(3) problem for command 'command'

This error message normally accompanies another, more specific, error message. It is generated when the attempt to run the specified command (usually save or savefs with several command-line parameters) failed on the specified save set. The previous line of error output should include the more specific error message (look for that message elsewhere in this section). Generally, the problem is a bad host table configuration, or various permissions denied errors (server not specified when starting nsrexecd, or missing permissions in .rhosts if not using nsrexecd). If not, log into the NetWorker server as root and run the following command, substituting the appropriate client and group for clientname and groupname:

savegrp -p -v -c clientname groupname

This verbose output should include the additional information needed to fix the problem.


Saving server index because server is not in an active group

This informational message, generated by savegrp, means that savegrp has noticed that the NetWorker server is not listed in any automatically started, enabled group. Since all of the indexes are stored on the server, savegrp is saving the server's index and bootstrap information in case a disaster occurs. You should add the server to a group with autostart enabled, or enable one of the groups of which the server is already a member.


socket: All ports in use

The NetWorker server has run out of socket descriptors. This means that you have exceeded the socket resource limit on your server. To avoid such future messages, you should determine what other network services are running while savegrp is running, and consider re-scheduling either savegrp or the other service(s). You can also reduce the parallelism in the nsr_service(4) resource, to reduce the resource utilization.


Error Messages Generated While Recovering Data

This section lists error messages returned by the NetWorker recover program during an on-demand restore using the Enterprise Backup Utility. The messages are organized alphabetically to make them easier to match to the NetWorker message displayed.

Browsing machine's on-line file index

An informative message that explicitly states which NetWorker client's index is being browsed for interactive recovers that resolve to another machine.


Cannot open recover session with server

This message indicates that some problem was encountered connecting to the NetWorker server on the named machine.


error, name is not on client list

This message indicates that the client invoking the recover command is not in the server's client list. See nsr_service(4) for details.


Message from server : other clones exist for failed save set

The recover command will be automatically re-submitted to the server, if any files remain to be recovered, because the request failed on a save set that had multiple clones. The server automatically picks a different clone on each attempt.


Path name is within machine:export-point

An informative message that lets you know that the given pathname is mounted from a network file server and that the recovery will use the index for the named file server. If the machine is not a NetWorker client, then the -c option may be necessary.


/path: Permission denied

The filename cannot be recovered because you are not root or in the group operator, and you don't have read permission for the file.


/path: Permission denied (has acl)

The filename cannot be recovered because you are not root or in the group operator, the file has an Access Control List (ACL), and you are not the owner of the file.


Using server as server for client

An informative message that lets you know which NetWorker server was selected for client's index.


NetWorker BusinesSuite Module for Oracle Messages

During a backup or restore, NetWorker attempts to record error messages generated by the BusinesSuite Module for Oracle library to the dmo.messages file. NetWorker opens this file in one of the following locations (default locations, listed in order):

  1. The alternate messages directory created during installation, /nsr/applogs.

  2. The /tmp directory.

  3. Standard error (stderr), if dmo.messages cannot be opened or created.
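The search order above can be sketched in Python for anyone who needs to locate the file from a script. The function name find_dmo_messages is hypothetical, and the sketch only checks which of the two default directories already holds a dmo.messages file; it does not reproduce how the module itself opens or creates the file.

```python
import os

# Default locations from the list above, in order.
CANDIDATE_DIRS = ["/nsr/applogs", "/tmp"]


def find_dmo_messages(dirs=CANDIDATE_DIRS, filename="dmo.messages"):
    """Return the first existing dmo.messages path, or None if none is found."""
    for directory in dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    # No file in either directory: messages may have gone to stderr instead.
    return None


print(find_dmo_messages())
```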

See Appendix A, “Environment Variables” for descriptions of NetWorker BusinesSuite Module for Oracle variables and values you can assign to them in the nsrdmo script.

NetWorker BusinesSuite Module for Oracle error messages appear in the following format, where pid is the process ID and function is an internally-defined function name:

(pid) OBK-sbt: <function>: <error message>
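A message in this format can be split into its fields with a short Python sketch such as the one below. It assumes the angle brackets in the format line are placeholders rather than literal characters and that the function name contains no colon. The sample message and the name example_function are invented for illustration.

```python
import re

# (pid) OBK-sbt: <function>: <error message>
MODULE_RE = re.compile(
    r"^\((?P<pid>\d+)\)\s+OBK-sbt:\s*(?P<function>[^:]+):\s*(?P<message>.*)$"
)


def parse_module_message(line):
    """Return pid, function, and message from one line, or None if no match."""
    match = MODULE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["function"] = fields["function"].strip()
    return fields


# Hypothetical message, not taken from a real dmo.messages file.
sample = "(1234) OBK-sbt: example_function: could not open save session"
print(parse_module_message(sample))
```

Grouping parsed messages by pid makes it easier to follow a single backup or restore session when several run at once.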

In addition, BusinesSuite Module for Oracle writes extended debugging information in the file specified by the NSR_DEBUG_FILE environment variable.