This chapter provides examples and suggestions for you to consider while you are designing your NetWorker environment. It also offers background information to help you understand the logic behind the NetWorker backup schedule and index policy features.
This chapter explains
determining backup schedules
determining browse and retention policies (index policies)
determining jukebox policies
|Note: Practical instructions for setting backup schedules, and browse and retention policies are in Chapter 6, “Configuring a NetWorker Server.” Instructions for setting up NetWorker for jukeboxes are in Chapter 11, “Using NetWorker with Jukeboxes.”|
The NetWorker server backs up each client system across a network according to a backup schedule. Schedules are created in the Schedules window and assigned to individual clients in the Clients window.
Schedules can be simple or sophisticated, depending on the needs of your environment. All clients can share the same schedule, or each client can have its own unique schedule. This section discusses some of the considerations to keep in mind while determining which schedule best fits your situation and explains the default schedules provided with NetWorker.
When you design backup schedules, consider these questions:
How long do you want to keep the backed-up data?
How many versions of the data do you want to maintain?
How much data do you have to back up?
How many backup volumes do you want to use?
How much time do you have to complete the network-wide backups?
Do you want to be able to use just a few backup volumes to recover from an entire disk crash?
The typical capacity of an 8 mm tape cartridge is about 5 GB, and the maximum transfer rate is around 400 KB per second. Systems generally cannot sustain that transfer rate, but even if they could, backing up 10 GB of data would take more than 6 hours. For example, Silicon Graphics systems can maintain a rate of 350 KB/second on an 8 mm cartridge tape and 1.0 MB/second on a DLT. To back up a network with a lot of data, you can use more than one schedule to stagger the full backups over several days.
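The transfer-time arithmetic above can be checked with a short calculation. This is only a sketch: the function name is illustrative, and it assumes decimal units (1 GB = 1,000,000 KB) and a rate that is sustained for the whole backup, which real systems rarely achieve.

```python
def backup_hours(data_gb: float, rate_kb_per_s: float) -> float:
    """Hours needed to write data_gb at a sustained rate.

    Assumes decimal units: 1 GB = 1,000,000 KB.
    """
    seconds = data_gb * 1_000_000 / rate_kb_per_s
    return seconds / 3600


# Rates quoted in the text for 8 mm and DLT drives.
for label, rate in [
    ("8 mm at 400 KB/s", 400),
    ("8 mm at 350 KB/s", 350),
    ("DLT at 1.0 MB/s", 1000),
]:
    print(f"10 GB on {label}: {backup_hours(10, rate):.1f} hours")
```

At 400 KB/second the result is just under 7 hours, which is why large sites stagger their full backups.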
Think about how many backup volumes you want to keep—this number depends on how often the data changes, and how long you want to keep the online backups.
If you run only incremental backups every night, you need more backup volumes to fully recover from a disk crash. If a site has 10 GB of data, and 5% of all the data is modified each day, that means 500 MB of data needs to be backed up every day. At 400 KB per second, 500 MB takes about 21 minutes to back up, and fills up about one tenth of an 8 mm cartridge tape. If you are maintaining the backups for three months, you will have about 12 tapes of backups to keep on your shelves (or in the autochanger).
You also need to decide on a policy for recovering files. For example, if the users expect to be able to recover any version of a lost file for at least three months, you must maintain all the backup volumes for the three-month period. On the other hand, if the users expect to be able to recover only the latest version of a lost file, you can use level 1–9 backups to decrease the quantity of backup volumes you need to maintain.
The following sections explain
NetWorker backup levels
backup time requirements
staggering the backup schedules
balancing convenience and security
using save sets
NetWorker preconfigured backup schedules
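A backup schedule specifies the level of backup that NetWorker performs for a client on each day of a weekly or monthly period. NetWorker supports four backup levels:
full, which backs up all files, whether or not they have changed
level 1 through 9, which backs up files that have changed since the most recent backup at a lower level (a full backup counts as lower than any numbered level)
incremental, which backs up files that have changed since the most recent backup of any level
skip, which skips the scheduled backup for that day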
If you do not need to maintain every version of a backed-up file online, you can use a backup scheme that includes occasional full backups followed by level 1–9 and incremental backups during the cycle. Different backup levels allow you to trade off the number of backup volumes and amount of time required to complete a backup versus the number of backup volumes and amount of time it takes to recover from a disk crash.
Figure 5-1 diagrams how backup levels work.
Assume you use a new backup volume for each day's backup. On day 1, a full backup is run. On day 2, the incremental backs up everything that has changed since the full backup. On day 3, the incremental backs up everything that has changed since day 2. On day 4, the incremental backs up everything that has changed since day 3. At this point, you have four backup volumes. To recover from a disk crash, you need all four of them: the one with the full backup (day 1), and all the volumes with incremental backups.
On day 5, the level 8 backs up everything that has changed since the full backup. You no longer need the data on the backup volumes from day 2, 3, or 4. To do a full recovery, all you need is the full backup volume and the level 8 backup volume. If you had to recover from a complete loss of a disk, you have reduced the number of backup volumes you need to only two.
On day 9, the level 7 backs up everything that has changed since the full backup. You still need only two backup volumes to recover a disk: the full backup, and the level 7.
Level 1–9 backups help you maintain control of your pool of backup volumes. Planning your backup strategy carefully should allow you to recover everything on a disk with a maximum of four backup volumes, assuming that each day's backup can fit on one volume.
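The dependency rules in this walkthrough can be sketched as a small function that lists which days' volumes a full recovery needs. This is an illustration, not NetWorker code: the names are hypothetical, incremental is modeled as a pseudo-level above 9, and days 6 through 8 are assumed to be incrementals because the text does not state them.

```python
FULL = 0
INCR = 10  # assumption: model "incremental" as a pseudo-level above 9


def volumes_needed(levels):
    """Return the 1-based days whose backups are needed for a full recovery.

    levels[i] is the level run on day i + 1: FULL, 1 through 9, or INCR.
    """
    needed = []
    i = len(levels) - 1
    while i >= 0:
        needed.append(i + 1)
        level = levels[i]
        if level == FULL:
            break
        if level == INCR:
            # An incremental depends on the backup immediately before it.
            i -= 1
        else:
            # A level N backup depends on the most recent lower-level backup.
            j = i - 1
            while j >= 0 and levels[j] >= level:
                j -= 1
            i = j
    return sorted(needed)


# Days 1-9 of the walkthrough; days 6-8 are assumed to be incrementals.
schedule = [FULL, INCR, INCR, INCR, 8, INCR, INCR, INCR, 7]

print(volumes_needed(schedule[:4]))  # [1, 2, 3, 4]
print(volumes_needed(schedule[:5]))  # [1, 5]
print(volumes_needed(schedule))      # [1, 9]
```

Each level backup shortens the chain back to the full backup, which is what keeps the number of volumes needed for a recovery small.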
The rest of this section compares various backup levels and combinations of levels:
full backups versus incremental backups
combining full, incremental, and level backups
If your site has a small number of files, you may choose to perform a full backup every day, or perhaps once a week. This schedule is simple to set up and execute, and it makes recovering from a disk crash easy—you simply need the last full backup volume.
Issues to consider are listed below:
Full backups take more time to execute than do incremental backups.
If the full backup does not fit on a single piece of media, someone has to monitor the backup and change the media (unless you have a jukebox).
Full backups cause the online indexes to grow more rapidly than do incremental or level backups.
You might decide to schedule a full backup at the beginning of the period and then schedule incremental backups the rest of the period. This schedule minimizes the amount of time that the backups take, minimizes the size of the backups, and causes the NetWorker indexes to grow at a slower rate. However, if you need to recover from a disk crash, you may need all the tapes used during the schedule, because the most current version of your files may be scattered across several different tapes. Although NetWorker asks for each tape that it needs for the recovery by name, loading and unloading them can be time-consuming (unless you have a jukebox, or all the incremental backups fit on one tape).
You can use level 1 through level 9 backups to moderate between the two extremes described in the preceding section. Level 1 through level 9 backups allow you to set up a schedule for each client, balancing the need for small, fast backups that do not take up too much index space and the need to recover quickly and easily from a disk crash.
A level backup serves as a checkpoint in your schedule because it collects into a single backup session all the files that have changed over many days, or even weeks. Without a level backup, these files would be spread across tapes from many different backup sessions. As a result, a level backup can simplify and speed file recovery.
To illustrate the effect of level 1 to level 9 backups, consider two examples. In the first example, a full backup takes place on the first day, followed by a level 9, level 8, level 7, and so on down to a level 1 backup over time. Figure 5-2 diagrams a full backup followed by level 9 to level 1.
The advantage of this schedule is that to recover from a disk crash, you only need two tapes: the one with the full backup, and the one with the last level backup. The disadvantage is that each day, there are more changed files to back up, so the backups take longer to complete.
Figure 5-3 diagrams a backup schedule that also starts out with a full backup, but the level backups that follow are in reverse order: starting with a level 1 on the first day following the full backup and continuing up to a level 9 backup. Each day, the backup backs up only the files that have changed on that day.
The advantage of this schedule is that each day's backup is small and completes in a short time. The disadvantage is that recovering from a disk crash requires the full backup tape and all of the level backup tapes up until the day of the disk crash.
Neither of these backup schedules is practical; they simply illustrate how level backups work. The real power of level backups comes into play when you combine multiple levels along with fulls and incrementals.
Sites with even a few gigabytes of files to back up often choose a monthly schedule based on full, incremental, and level backups. The example described in this section performs a full backup on the first day of each month, a level 5 backup on the 10th and 20th of the month, and incremental backups on all other days.
This monthly backup schedule minimizes the size of daily backups and also makes it relatively easy to recover in the event of a disk crash. This schedule offers several advantages. First, the level 5 backups simplify recovery. For example, if a disaster strikes on the 24th of the month, all the files needed to recover an entire client system are located on tapes from just five backup sessions:
incrementals from the 21st, 22nd, and 23rd
level 5 backup from the 20th
full backup at the beginning of the month
Second, the incremental backups are relatively small and quick to execute, even for large network environments, and several days of incrementals can fit onto a single tape. This situation further simplifies recovery and also avoids the need to have someone change tapes each day.
Figure 5-4 diagrams level 5 and incremental backups after a full backup.
The amount of time you have to complete a backup on any given day also influences the schedule that you decide to use. Because of flextime and around-the-world operations, many networks must be up and running for users from early in the morning until very late at night. Although NetWorker is able to back up live filesystems, most administrators want 100% of their network and system capacity ready for users during work hours.
How many files can NetWorker back up in, for example, a four-hour backup window? If your backup server is able to drive a single 8 mm tape drive at an average of 400 KB/second (its maximum speed is 500 KB/second and some time is invariably lost loading the tape or rewinding), you can back up a maximum of 5.76 GB in four hours. If you have more than this amount of data to back up, then full backups must be limited to weekends and holidays, when users are not affected.
A Silicon Graphics system with a 2 GB system disk and an 8 GB logical volume requires at least seven and a half hours for a full backup on an EXABYTE™ 8500 tape drive. For a Silicon Graphics DLT, 10 GB of data takes about 2.8 hours to back up uncompressed onto one tape.
To reduce the amount of time that backups take, follow these recommendations:
Select a backup server with enough CPU power, memory, and bus bandwidth so that the backup server is not the bottleneck. See the section “Guidelines for Choosing a Configuration” in Chapter 12 for more information on choosing a hardware configuration.
Leave the NetWorker parallelism feature turned on. This feature causes multiple client systems to send their files to the backup server in parallel. This keeps a stream of files ready for the tape drive, so that it does not start and stop.
Experiment with compressing files on the client systems to reduce the size of the data that has to be written to tape. Using the compressasm directive can reduce the space consumed on a backup volume by as much as 50% (actual savings may vary). If you use compressasm on all the files that are being backed up, a full backup of 8 GB will probably fit onto a single 5 GB backup volume. Compression may speed your backup as long as the client systems are still able to supply files to the backup server fast enough to keep the tape drive streaming.
Take advantage of the ability to skip over specified files during the backup. For example, you could choose to skip over core files and .o files. The NetWorker skip directive provides an easy way to specify that such files be skipped. (See the section “Using Directives” in Chapter 6 for more information.)
Make liberal use of incremental save levels. These are very efficient, since they take minimal backup media space and run very quickly.
Add a second backup device to your backup server. For unattended backups, a NetWorker server with two backup devices is worth more than twice as much as a NetWorker server with only one backup device. Often the NetWorker server with two backup devices is more productive than two NetWorker servers with only one device each. With IRIX NetWorker plus the TurboPak option, NetWorker can simultaneously back up to more than one device.
Using a jukebox and NetWorker Support for Jukebox option software is the most efficient way to complete unattended backups.
Unless you have a jukebox and a NetWorker Support for Jukebox option, you also have to schedule backups based on someone being available to load and unload tapes. Many administrators find that an incremental backup of their network fits onto a single 4 mm, 8 mm, or DLT tape, but they must schedule multitape full and level backups for specific nights or weekends when an operator is on duty to load additional tapes. If an operator is not available over a holiday weekend, you can set an override in the schedule to skip the backup on that day. You may also want to override the schedule just before a holiday with a full backup for added peace of mind.
The rest of this section gives details on strategies for optimizing backup time requirements:
using the compression directive compressasm
staggering the backup schedules
balancing convenience and security
using NetWorker preconfigured backup schedules
Using the compressasm directive involves significant CPU usage. If the client system has CPU capacity to spare, and performance monitors such as gr_osview do not show the CPU being overused during backups, compressasm can compress the data before it goes to the server.
compressasm typically achieves a 2:1 compression ratio (your results may vary), which reduces network traffic accordingly. There is no harm in using compressasm in conjunction with a compressing tape drive, but the drive will probably not achieve much additional compression on data that is already compressed.
If you are deciding between compressasm and a compressing drive solely to increase the amount of data on a tape, use the compressing drive, because the hardware on the drive can compress faster than NetWorker and it places no load on the CPU due to the compression.
Follow these guidelines for compressing data during backup:
Use compressasm to minimize network bandwidth if you have available CPU power.
Use compressing drives, such as the DLT 2700, to get more data on a tape.
Any generic compressing algorithm typically achieves 2:1 compression and tape drives are no exception. Sometimes you get more, sometimes less.
Compressing already compressed data has no effect, and may even expand the data.
Do not use compressasm if you have a compressing drive and no networked clients.
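The guidelines above reduce to a simple decision, sketched here as a hypothetical helper. The function and parameter names are illustrative and are not part of NetWorker; whether a system has CPU headroom is something you judge from your own performance monitoring.

```python
def use_compressasm(cpu_headroom: bool,
                    networked_clients: bool,
                    compressing_drive: bool) -> bool:
    """Apply the compression guidelines from the text.

    All three inputs are judgments you make about your own site.
    """
    if not cpu_headroom:
        # compressasm costs significant CPU time.
        return False
    if compressing_drive and not networked_clients:
        # Nothing crosses the network, and the drive compresses for free.
        return False
    # Either network traffic or tape space benefits from compressasm.
    return True


print(use_compressasm(cpu_headroom=True, networked_clients=True,
                      compressing_drive=True))   # True
print(use_compressasm(cpu_headroom=True, networked_clients=False,
                      compressing_drive=True))   # False
```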
Networks with a large number of files can take a very long time to back up completely, and require a lot of loading and unloading of tapes. There may not even be time in a night or an entire weekend to complete a full backup of all the systems across a very large network.
An easy way to handle this problem is to stagger the clients' backup schedules. Rather than have every client system perform a full backup on Monday and incrementals the rest of the week, for example, you can schedule some clients to perform a full backup on Tuesday and others on Wednesday. In NetWorker, you can assign a separate backup schedule to each filesystem. Each filesystem, in essence, is treated as if it were a separate client.
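One way to plan staggered full backups is to assign each client's full backup to the earliest night that still has room in the backup window. The sketch below uses a first-fit-decreasing heuristic. The client names and sizes are hypothetical, and the 5.76 GB nightly capacity assumes a four-hour window at an average of 400 KB/second.

```python
def stagger_fulls(client_sizes_gb, nightly_capacity_gb):
    """Group clients into nights so each night's full backups fit the window.

    Returns a list of nights; each night is a list of client names.
    """
    nights = []  # each entry: {"clients": [...], "used": gigabytes}
    # Place the largest clients first; break ties by name for stable output.
    ordered = sorted(client_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        if size > nightly_capacity_gb:
            raise ValueError(f"{name} does not fit in one night's window")
        for night in nights:
            if night["used"] + size <= nightly_capacity_gb:
                night["clients"].append(name)
                night["used"] += size
                break
        else:
            # No existing night has room, so start a new one.
            nights.append({"clients": [name], "used": size})
    return [night["clients"] for night in nights]


# Hypothetical clients and full-backup sizes in GB.
clients = {"mars": 4.0, "venus": 3.0, "pluto": 2.5, "io": 1.5}
print(stagger_fulls(clients, nightly_capacity_gb=5.76))
# [['mars', 'io'], ['venus', 'pluto']]
```

A client that is too large for any single night raises an error here; in practice, such a client is a candidate for separately scheduled filesystems.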
You can leave the same backup volume mounted in the server's backup device throughout a week or month, and, when it becomes full, replace it with a new labeled backup volume. NetWorker tracks all the backups, no matter what day of the week or month it is, or what part of the backup schedule cycle is in effect. The same backup volume might contain full, level 1–9, or incremental backups; to NetWorker, it makes no difference. For you, the benefits are fewer backup volumes to manage and the ability to recover from a disk crash with a minimum number of backup volumes.
Some sites prefer to segregate the full backups from the level 1–9 and incremental ones. The full backups protect the network from a catastrophic disk loss, and you want to guarantee their integrity. There is always a very small risk that if you leave the backup volume with the full backup sitting in the backup device, something could happen to it.
If a backup volume with incremental backups is ruined, users might lose a day of work. In the worst possible case, if the backup volume with the full backup is destroyed, users could lose all the work done since the last full backup. Therefore, some administrators prefer to remove the backup volume used for a full backup and put it in a safe place, and mount another backup volume for the following level 1–9 and incremental backups. The trade-off is that you may need a few more backup volumes to recover from a disk crash: the one with the last full backup, and the other volumes that contain the most recent level 1–9 and incremental backups.
Save sets are groups of files, usually contained in a single filesystem, that have been backed up by NetWorker. Save sets are created each time a backup is started. Generating a save set creates one or more entries in both indexes.
By default, NetWorker backs up all of a client's files whenever that client is scheduled to be backed up. The save sets feature allows you to exclude certain files from backup, or to back up some of a client's files on one schedule and other files on a different schedule. This technique is particularly useful for clients with a great deal of data to back up.
NetWorker has several windows relating to save sets:
create save sets for a client in the Clients window (see “Adding a New Client” and “Scheduling Large Client Filesystems” in Chapter 6)
set browse and retention policies for save sets in the Policies window (see “Determining Browse and Retention Policies (Index Policies)” later in this chapter)
view save sets in the Instances window (see “Viewing Save Sets to Determine Resource Usage” in Chapter 8)
recover and clone save sets in the Save Set Recover and Save Set Clone windows (see Chapter 9, “Recovering and Cloning Save Sets”)
For your convenience, NetWorker is shipped with five preconfigured backup schedules: Default, Full Every Friday, Full on First Friday of Month, Full on First of Month, and Quarterly. If these schedules fit your backup requirements, you can use them “out of the box.” Alternatively, you can delete them and create new ones to accommodate site-specific needs.
This section explains the logic behind each schedule. After understanding how they work, you may want to use them as examples to set up your own schedules.
|Note: You cannot change the name of an existing schedule. For example, if you want to change the schedule “Full Every Friday” to “Full Every Monday,” you must delete the “Full Every Friday” schedule and create a “Full Every Monday” schedule. You cannot change the existing schedule to complete full backups on Mondays instead of Fridays, and then edit its name.|
The most efficient way to protect systems from file loss while keeping control over the number of backup volumes is to follow full backups with level 1–9 and incremental backups.
NetWorker provides a preconfigured backup schedule named “Default,” which you are not allowed to delete. It is a weekly schedule, and completes a full backup every Sunday, followed by incremental backups all other days of the week. This schedule is useful for a small to medium-sized network where the scheduled backups fit onto one backup volume or where autochangers are used.
This schedule is convenient if you want to premount the backup volume Friday night before you go home for the weekend. On Monday mornings, check your messages from NetWorker to make sure the backup completed. If you want to separate the full backups from the incrementals, remove the backup volume with the full backup and mount another one for the incremental backups. Figure 5-5 shows the default schedule.
Each time you use the Schedules window to create a new weekly backup schedule, this preconfigured schedule appears in the calendar as your starting point.
This weekly schedule completes a full backup every Friday, followed by incremental backups the other days of the week.
This schedule is identical to the Default schedule, except that instead of completing a full backup on Sundays, the full backup takes place on Fridays. Depending upon how much data changes on the network, the daily incremental backups might all fit onto one backup volume. In that case, if you had to recover from a disk crash, you would need only two backup volumes: the one with the last full backup, and the one with the incremental backups.
This monthly schedule completes a full backup on the first Friday of the month (not the first calendar day of the month). Incremental backups take place on all the other days. The advantage of this schedule is that you complete a full backup only once a month. Figure 5-6 shows this schedule.
If you use this schedule, store the backup volume with the full backup in a safe place, and use other backup volumes for the incremental backups. It would also be a good idea to change backup volumes every few days for the incremental backups. If you allow all the incremental backups to be stored on one backup volume, and it is destroyed near the end of the month, you are at risk of not being able to fully recover from a disk crash.
Whenever you create a monthly schedule for a full backup on a day of the week instead of a calendar day (Friday, in this example), you must set the overrides in each month. (Notice the “f*” in the first Friday of each month.) This is necessary because the first occurrence of a given day of the week can fall on any calendar day from 1 to 7.
|Note: When the year ends, you must add the overrides for the next year. In other words, the overrides do not carry over from one year to the next. Preconfigured schedules, however, maintain the overrides for years into the future.|
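If you need to work out the override dates for a coming year, the first Friday of each month can be computed rather than read off a calendar. This sketch uses only the Python standard library; the function name is illustrative, and entering the overrides in the Schedules window remains a manual step.

```python
import datetime

FRIDAY = 4  # datetime convention: Monday is 0, Sunday is 6


def first_weekday_of_month(year, month, weekday):
    """Return the date of the first given weekday in a month."""
    first = datetime.date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


# Override dates for one example year.
overrides = [first_weekday_of_month(2024, m, FRIDAY) for m in range(1, 13)]
for day in overrides:
    print(day.isoformat())
```

Every date produced falls on a calendar day from 1 to 7, which is why a fixed day-of-month setting cannot express this schedule.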
This monthly schedule completes a full backup on the first calendar day of the month, which is how many sites prefer to begin each month. On the other days of the month, an incremental backup takes place. This schedule has the same advantages and disadvantages as the “Full on First Friday of Month” schedule, but it is easier to create because you do not have to set any overrides manually. Figure 5-7 shows this schedule.
Whenever you create a monthly schedule, this schedule is your starting point.
The quarterly schedule completes a full backup on the first day of the quarter. A level 5 backup takes place on the first day of the other months in the quarter. Every seven days, a level 7 backup takes place. The other days of the month, an incremental backup takes place. Figure 5-8 shows this schedule.
This schedule is convenient because a full backup takes place only once a quarter. On the first day of the month, a level 5 backs up everything that has changed since the first day of the quarter. Every seven days, the level 7 backup protects all the data that has changed since the first day of the month. The daily changes are protected by incremental backups.
If you use this schedule, segregate the backup volume with the full backup and store it in a safe place. The monthly level 5 backups should also be segregated onto their own backup volumes. On the other days of the week, leave one backup volume in the server, so that the level 7 and incremental backups are stored on it. However, if a week's worth of backups is on one backup volume, and it is destroyed the same day the disk crashed, you could not recover the changes that took place that week. Therefore, it would be better to change backup volumes every day, putting each day's backup on its own volume. If a daily incremental backup is destroyed and you need to recover from a disk crash, you can recover all but one day's work.
When you create a quarterly schedule like this one, use the Month period to set the level backups, then set each quarterly full backup on the calendar with an override.
To recover from a disk crash, you would need the backup volume with the full backup, the latest level 5, the latest level 7, and the incremental backups for the week.
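The quarterly schedule can be summarized as a function from a date to a backup level. This is a sketch under two assumptions that the text does not state: quarters begin in January, April, July, and October, and "every seven days" is counted from the first of the month (days 8, 15, 22, and 29).

```python
import datetime

QUARTER_START_MONTHS = (1, 4, 7, 10)  # assumption: calendar quarters


def quarterly_level(day: datetime.date) -> str:
    """Return the backup level the quarterly schedule runs on a given date."""
    if day.day == 1:
        return "full" if day.month in QUARTER_START_MONTHS else "5"
    if (day.day - 1) % 7 == 0:
        # Assumption: level 7 backups fall on days 8, 15, 22, and 29.
        return "7"
    return "incr"


for d in (datetime.date(2024, 1, 1), datetime.date(2024, 2, 1),
          datetime.date(2024, 2, 8), datetime.date(2024, 2, 9)):
    print(d.isoformat(), quarterly_level(d))
```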
When NetWorker starts a backup, it creates entries for the saved files in the online indexes. NetWorker maintains two types of indexes: a file index and a media index. The file index stores information about files backed up by NetWorker, and the media index maps the saved files to the backup volumes. NetWorker maintains one file index per client and one media index per NetWorker server. NetWorker uses the indexes as databases to locate the files that are marked for recovery.
Each entry in the file index typically includes this information for a backed-up file: filename, number of blocks, access permissions, number of links, owner, group, size, last modified time, and backup time. The file index changes with each backup, as entries for the newly backed up files are inserted. As long as an entry for a file remains in the file index and the backup volume is not damaged, you can recover the file using the NetWorker Recover window.
The media index maps each file to the backup volume or volumes where it is stored. NetWorker uses the media index to tell you or the jukebox which backup volume to mount during a recover. The media index is usually much smaller than the file index because each volume contains many saved files. The size of an index is proportional to the number of entries it contains. NetWorker determines which volume to mount for recovering a file by mapping the saved files to their backup volumes.
In the NetWorker Policies, Indexes, and Volume Management windows, you create policies for automatic index management, monitor the contents of the indexes, select entries for removal, and mark backup volumes as recyclable. NetWorker uses browse and retention policies to manage and reduce the size of the online indexes:
The browse policy determines how long entries for your files remain in the online file index and thus browsable in the Recover window. Entries older than the browse policy are automatically removed from the online file index.
The retention policy determines how long entries are retained in the media index and thus are recoverable. Entries older than the retention policy are marked as recyclable in the media index.
NetWorker performs four actions on both kinds of indexes: inserting entries, browsing, removing entries, and reclaiming space.
Inserting entries in the file index occurs during a backup. If the index has no free space, NetWorker acquires more space from the filesystem to hold the new entries.
Browsing is looking through the index for information about your saved files or the contents of your backup volumes; thus, it neither increases nor decreases the size of an index.
The file index is browsed when you use the NetWorker Recover window to locate a file, or when you use the Indexes window to browse the save sets that contain the files you see in the Recover window.
The media index is browsed when you use the Volume Management window to view the save sets on the backup volume.
Removing entries frees space in the index. NetWorker uses the free space to insert new entries. The browse and retention policies determine when entries should be removed from the index. You can also remove entries manually by selecting “Remove oldest cycle” in the Indexes window or by selecting “Remove” from the Volume pull-down menu in the Volume Management window.
Reclaiming space returns empty space to the filesystem. The empty space is created when entries are removed from the index; you reclaim it by using the Reclaim space button in the Indexes window.
|Note: “Monitoring and Managing Index Disk Space Usage” in Chapter 8 gives detailed instructions for removing index entries and reclaiming space.|
This section explains NetWorker browse and retention policies and the trade-off between providing faster, easier recovery for users and conserving disk space. Topics include:
how browse policies work
reclaiming disk space
recovering files removed from the index
how media retention policies work
NetWorker preconfigured browse and media retention policies
One popular NetWorker feature is its index, which allows a user to browse the many versions of a file that have been backed up over time and to choose which one to recover. However, each version of a file that NetWorker tracks takes up space in the client's online index (about 200 bytes per version). Since disk space is limited, you need to establish a policy of how far back in time you keep information about backed-up files in the indexes.
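To get a rough sense of how the browse period drives index size, you can multiply the number of file versions kept online by the approximate 200 bytes per version. The sketch below is a back-of-the-envelope estimate: the site figures are hypothetical, and it assumes each incremental backs up the same fraction of the client's files.

```python
BYTES_PER_ENTRY = 200  # approximate figure quoted in the text


def file_index_bytes(total_files, daily_change_fraction,
                     fulls_online, incrementals_online):
    """Estimate a client's file index size from the backups kept browsable."""
    changed_per_day = round(total_files * daily_change_fraction)
    entries = (fulls_online * total_files
               + incrementals_online * changed_per_day)
    return entries * BYTES_PER_ENTRY


# Hypothetical client: 100,000 files, 5% changing daily, weekly fulls,
# and four weeks of backups kept browsable (4 fulls, 24 incrementals).
size = file_index_bytes(100_000, 0.05, fulls_online=4, incrementals_online=24)
print(f"{size / 1_000_000:.0f} MB")  # 104 MB
```

In this example the full backups account for most of the index, so halving the browse period roughly halves the index size.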
The browse policy that you select specifies how long the entries for your files remain in the file indexes. A browse policy can be any number of days, weeks, months, or years. NetWorker automatically deletes entries older than the browse policy time and frees up disk space. The browse policy you select, like the backup schedule, can be different for each client.
To recover a complete directory or filesystem, you often need to recover some files from incremental and level backups as well as from a full. The incremental backup is dependent on the level backups and, in turn, on the full. NetWorker does not delete the entries from any backups on which other backups depend. As a result, you may find that entries are deleted later than you might expect.
In Figure 5-9, the browse policy is set to one week, which happens to equal one complete backup cycle.
NetWorker does not remove the first full backup from the online file index until all the incremental and level 5 backups that depend on it have expired. As a result, the full backup actually stays in the online index for a period of time equal to the browse policy plus one full backup cycle.
In this example, the first full backup is not removed from the online index after exactly one week, because unexpired incrementals and a level 5 backup depend on it. Each incremental backup is removed from the online index one week after it was completed. The level 5 backup is removed when the last incremental that depends on it is removed, and the full backup is removed at that same time.
The rule to remember is that a full backup actually remains in the online index for a period of time equal to the browse policy plus one complete backup cycle. A backup cycle is measured from one full backup to the next full backup. Also note that the browse policy is set for an entire client (or filesystem, if the filesystems are separately scheduled). Consequently, whatever policy you have for keeping full backups online and browsable in the file index you must also use for all incremental and level backups. With NetWorker you manage backup cycles (the period from one full backup to the next); you do not independently manage different levels of backups.
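The rule of thumb above can be written as a one-line date calculation. This sketch only restates the rule; it does not model how NetWorker tracks the dependencies between individual backups, and the function name is illustrative.

```python
import datetime


def full_backup_removal_date(full_date, browse_days, cycle_days):
    """Approximate date a full backup's entries leave the online file index.

    Rule of thumb from the text: browse policy plus one complete backup cycle.
    """
    return full_date + datetime.timedelta(days=browse_days + cycle_days)


# One-week browse policy and a one-week backup cycle.
removal = full_backup_removal_date(datetime.date(2024, 1, 1),
                                   browse_days=7, cycle_days=7)
print(removal.isoformat())  # 2024-01-15
```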
Recovery is considerably easier if the file information is still in the NetWorker online index. That is why you want to set a browse policy that is long enough to cover most recovery requests.
NetWorker automatically reclaims disk space that is freed up when entries are deleted from the online file indexes. However, the space is not returned immediately to your system. Because reclaiming this space requires time, processing power, and swap space, having this process constantly occurring on your backup server would be inefficient. Instead, NetWorker first reuses newly freed space to store information about new files that are backed up. When less than 50% of file index space for a client is being used by files that have not reached the end of their browse period, NetWorker automatically invokes a process that returns the space to your system.
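The 50% threshold can be pictured as a simple test. This Python sketch only illustrates the decision described above. It does not show how NetWorker implements it, and the function and parameter names are assumptions.

```python
def should_reclaim(live_bytes, index_bytes, threshold=0.5):
    """Return True when unexpired entries use less than `threshold` of the
    space that a client's file index occupies (hypothetical helper)."""
    if index_bytes <= 0:
        return False  # nothing to reclaim from an empty index
    return live_bytes / index_bytes < threshold

# A 100 MB index in which only 40 MB holds unexpired entries qualifies.
print(should_reclaim(40, 100))  # prints True
print(should_reclaim(50, 100))  # prints False (exactly 50% is not "less than")
```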
You can also reclaim disk space at any time by using the Reclaim space button in the Indexes window. See “Monitoring and Managing Index Disk Space Usage” in Chapter 8 for instructions on space-saving operations.
You can recover files whose entries have been removed from the online index because they have passed the browse policy period, as long as the files are still stored on a backup volume. The recover process is not as convenient as when the entries are still in the online index, however.
If you do not want to rebuild the index, and the save sets you need are still in the media index, and you know which save set contains the file you want, you can use the save set recover feature to recover the entire save set or selected directories and files. The save set recover feature is most useful for recovering from full backups, and is limited to root and users belonging to the group operator. See “Recovering Files Removed From the Index” in Chapter 8 for instructions.
|Tip: Recovery is considerably easier if the file information is still in NetWorker's online index. Set a browse policy that is long enough to cover most recovery requests.|
Your need to conserve disk space may lead you to establish a short browse period. The NetWorker media retention policy complements the browse policy by letting you specify a longer period of time during which files can still be recovered, although with more difficulty. The retention policy is also used by NetWorker to automatically recycle backup volumes.
As mentioned earlier in this chapter, NetWorker maintains a file index for each client system and a much smaller media index that tracks which save sets are stored on each backup volume. When NetWorker removes entries that are older than the specified browse time from a file index, it leaves the corresponding save set information in the media index. The retention policy controls how long this information is kept and, as a result, how long a backup volume is kept before it can be overwritten with new backups.
As with the backup schedule and browse policy, you set the retention policy for each NetWorker client. Different clients can have different policies. The retention period can be any number of days, weeks, months, or years, as long as the retention period is equal to or longer than the browse policy.
A NetWorker backup volume can contain save sets for many different clients over many days. As the retention period is reached for each save set, information about that save set is removed from the media index. When the retention period for every save set on a backup volume is reached, NetWorker marks the volume “recyclable.” This volume can then be reused for backups. At the time that the volume is actually reused, the old files are overwritten and can no longer be recovered.
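The recycling rule amounts to an "all save sets expired" test. The Python sketch below illustrates that rule under the assumption that each save set is represented only by the date its retention period ends. The function name and data are hypothetical.

```python
from datetime import date

def is_recyclable(save_set_expirations, today):
    """A volume is recyclable once every save set on it has reached
    the end of its retention period (hypothetical helper)."""
    return all(expiry <= today for expiry in save_set_expirations)

# Hypothetical volume holding three save sets from different clients.
volume = [date(1997, 3, 1), date(1997, 3, 8), date(1997, 4, 1)]
print(is_recyclable(volume, today=date(1997, 3, 15)))  # prints False
print(is_recyclable(volume, today=date(1997, 4, 1)))   # prints True
```

A single unexpired save set keeps the whole volume from being reused, which is why some tape is often unavailable even though most of its contents have expired.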
NetWorker browse and retention policies combine to give you a hierarchy of recovery capability while keeping the disk space needed for the online indexes to a minimum. Recovering a file is quick and easy using the Recover window up until the browse policy time is reached and the file information is removed from the file index. After that, you can use the more tedious save set recover process described earlier to recover your files until the retention policy time is reached and the backup volume is recycled.
NetWorker is shipped with five preconfigured browse and retention policies: Week, Month, Quarter, Year, and Decade. Use these policies to choose the length of time to retain the entries in both the file index and media index. Remember, the retention policy you select affects the size of the media index and controls the length of time NetWorker tracks the backup volumes and the data on each volume.
The browse policy affects the size of the file index and the length of time that NetWorker retains entries for every file that is backed up and visible in the Recover window. You must always choose a retention policy that is greater than or equal to the browse policy.
For example, if you choose Quarter for the retention policy for a client, and Month as the browse policy, the client can browse all the file entries for backed-up files dating back to a month. Each month the oldest entries for the client's files are automatically removed from the server's file index. However, the backup tapes that contain the data for the files are still tracked by NetWorker in the media index.
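The requirement that the retention policy be at least as long as the browse policy can be checked by comparing the positions of the two policies in the preconfigured list. This Python sketch is only an illustration; the function name is an assumption, and NetWorker performs its own validation.

```python
# The five preconfigured policies, ordered from shortest to longest.
POLICIES = ["Week", "Month", "Quarter", "Year", "Decade"]

def is_valid_pair(browse, retention):
    """Return True when the retention policy is greater than or equal to
    the browse policy (hypothetical helper)."""
    return POLICIES.index(retention) >= POLICIES.index(browse)

print(is_valid_pair("Month", "Quarter"))  # prints True
print(is_valid_pair("Year", "Month"))     # prints False
```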
The Week policy maintains the file index entries or the media index entries for one week. If you use Week as a browse policy, users can view and mark for recovery only files that go back in time for a week. As a browse policy, it is useful when you have a limited amount of disk space and users do not expect to be able to recover versions of their data that are older than a week.
As a retention policy, Week means that your backup volumes turn over quickly, and NetWorker recycles through the tapes at a faster rate. Use this policy if you schedule weekly full backups, and need to keep backup data for only one backup cycle plus a week.
As a browse policy, Month allows users to view and recover versions of files dating back at least a month. The Recover window displays versions for files backed up for one full month plus a number of weeks. As a retention policy, Month means that NetWorker maintains and tracks the backup volumes for one month plus one full backup cycle.
Use the Quarter policy if you need to keep backed-up data longer than a month. With Quarter as the browse policy, the client can view and recover files for at least three months into the past. As a retention policy, Quarter tracks the backup volumes for at least three months plus one full backup cycle.
If you need to keep backed-up data online for several months, use the Year policy. For example, if your company requires ready access to information going back for at least three quarters, Year is a good browse and retention policy. Realize, however, that NetWorker requires more disk space to maintain all the information online.
The Decade policy retains the entries in the server's indexes for ten years. It is useful for organizations that are required to keep records for very long periods.
Your NetWorker server requires lots of disk space for the online indexes if you choose Decade for your browse policy. Depending upon how much data you are backing up, ten years of file index entries could take up many gigabytes of disk space. In this case, it would make more sense to use Decade as the retention policy and use Quarter or Year as the browse policy. NetWorker can then track the backup volumes and the data on each one. You would always be able to retrieve data from an old backup volume if you needed to do so. NetWorker would still require disk space to maintain the media index, but it would be a much smaller amount of space using the Quarter or Year browse policies.
|Note: The limit for each client's file index is 2 GB.|
The NetWorker Autochanger Software Module automates your backup and recover activity. The capacity of the jukebox, the backup schedule you select, and the browse and retention policies you use determine whether you can walk away from backups for a week, a month, or even longer.
This section discusses
jukebox capacity for one backup cycle
jukebox capacity for more than one backup cycle
choosing a jukebox
A jukebox is most useful if it has at least enough capacity to complete one entire backup cycle without intervention. Such a jukebox allows backups to run while you are out ill, on vacation, or busy with a user emergency. It also helps minimize the time that you spend on backup (particularly if the backup server and jukebox are located some distance away). At the end of the cycle, you can move the used backup volumes off site and load fresh tapes into the jukebox.
A jukebox with the capacity for one entire backup cycle also speeds file recovery. If a user accidentally deletes a file, there is at least one version (more if the user has recently edited the file) in the jukebox. With NetWorker, the user can quickly identify the lost file and initiate the recovery. The jukebox loads the needed tape and NetWorker completes the recovery without your help. Depending on the speed of the jukebox, the file should be recovered very quickly.
To design a schedule that fits the capacity of your jukebox, start with your ideal schedule and then consider these suggestions to reduce the size of your complete backup cycle:
Use more incremental backups and fewer level 1–9 backups.
Back up systems with less critical files less often—perhaps only once a week.
Use NetWorker directives to skip files during the backup, for example, core files. See “Using Directives” in Chapter 6 for information.
Shorten the length of the backup cycle.
Although your jukebox may only have enough capacity for one backup cycle, you can still set the browse and retention policies for a longer period. If a user tries to recover a file stored on a volume that is not in the jukebox, NetWorker prompts you to load that volume. You can use the Location field in the Volumes window to keep track of volumes. Users can refer to this information when deciding which version of a file to recover and choose the one stored on a tape that is located in the jukebox.
If the jukebox has just enough capacity for a single backup cycle, you must reload tapes at the end of each cycle. If it has more capacity, you can set the schedule and the browse and retention policies so that the jukebox runs unattended for a long period of time. To continue backups virtually indefinitely, the jukebox automatically recycles tapes that contain save sets that have passed their browse and retention times.
Suppose you have established a backup schedule for your network of systems that takes one week to complete (for example, you schedule a full backup once a week) and consumes a total of 12 GB of tape during the week. Assume that you are using a 50 GB EXB-10i jukebox. Each of these combinations of browse and retention times allows the jukebox to operate without intervention for an extended period:
browse policy = 1 week, retention policy = 1 week
browse policy = 1 week, retention policy = 2 weeks
browse policy = 2 weeks, retention policy = 2 weeks
Each of these sets of policies has its advantages. With a browse and retention policy of just one week, your online indexes are kept small. With a browse and retention policy of two weeks, your indexes are larger but users have more versions to select from when they need to recover a file. A browse policy of one week and a retention policy of two weeks keeps your indexes small and does allow you to recover older files, although with a great deal more effort than if those files were still browsable in the index.
It might seem that a browse policy of four weeks would also work, because 4 * 12 GB = 48 GB fits in the 50 GB jukebox. It does not, for two reasons. First, a full backup actually remains in the online index for a period of time equal to the browse policy plus one complete backup cycle. Thus with a browse policy of four weeks, essentially five weeks of backups (about 60 GB) would need to fit into the jukebox.
Second, since NetWorker cannot recycle a tape until all the save sets on that tape have expired, there is often some amount of “unavailable” tape in the jukebox.
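The capacity check for unattended operation can be sketched as follows. This Python example is a rough estimate under stated assumptions: it applies the "policy period plus one backup cycle" rule to the retention period (which is never shorter than the browse period), and it ignores the extra "unavailable" tape mentioned above. The function and parameter names are hypothetical.

```python
def fits_in_jukebox(retention_weeks, gb_per_cycle, capacity_gb, cycle_weeks=1):
    """Estimate whether the jukebox can hold every backup that must be kept.

    Backups are kept for the retention period plus one complete backup cycle.
    """
    cycles_kept = retention_weeks / cycle_weeks + 1
    return cycles_kept * gb_per_cycle <= capacity_gb

# The 12 GB weekly cycle and 50 GB jukebox used in this section.
print(fits_in_jukebox(1, 12, 50))  # prints True  (24 GB needed)
print(fits_in_jukebox(2, 12, 50))  # prints True  (36 GB needed)
print(fits_in_jukebox(4, 12, 50))  # prints False (60 GB needed)
```

Because some tape is always waiting for its last save set to expire, treat a result that only just fits as too tight.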
Now suppose that one year later the number of files that you have has grown so that the one week backup cycle needs 18 GB of tape capacity. A browse policy of one week and a retention policy of one week still allows the jukebox to run unattended on an ongoing basis.
If you want to keep files online in the jukebox longer, you can use the methods listed earlier to reduce the size of the backup cycle. Alternatively, you can stretch out the backup cycle. For example, you can perform full backups every other week rather than every week. This change should not greatly increase the size of a backup cycle, and it gives you more versions of files online in the jukebox.
In the ideal situation you first design the best backup schedule and set of policies for your environment and then determine the jukebox size that you need to purchase.
Assume that you have a network of systems with a total of 25 GB of files to back up and that you have selected a schedule that includes a full backup at the beginning of each month, a level 5 on the 10th and 20th of each month, and incrementals on all other days. Table 5-1 illustrates that one complete backup cycle will be about 64 GB in size.
Table 5-1
Full backup: 1 time/month
Level 5 backup: 2 times/month
Incremental backup: 27 times/month
To determine the size of jukebox that you need, start by estimating the size of a complete backup cycle. Now assume that you have decided on a browse policy of two months for all the client systems and a retention policy of six months. These policies let your users quickly recover any file, and any version of a file, that they had during the past two months. With some effort you can recover files that they had at any time during the past six months. Thus you will need six monthly backup cycles times 64 GB per cycle, which yields 384 GB of capacity.
In practice you will need a little extra jukebox capacity, because there will be a number of “unavailable” volumes as NetWorker must wait to recycle a tape until after all the save sets on that tape have expired.
Finally, remember to plan for growth in the number of your files. While sites differ in the rate at which their files grow, a rule of thumb is that you should purchase a jukebox with about 50% more capacity than your current requirement.
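The sizing arithmetic in this section can be collected into one calculation. The following Python sketch uses the figures from the example (a 64 GB monthly cycle, six months of retention, and the 50% rule of thumb for growth). The function name is hypothetical, and the result does not include the extra allowance for "unavailable" volumes.

```python
def jukebox_capacity_gb(cycle_gb, cycles_retained, growth=0.5):
    """Return (current requirement, requirement with growth allowance) in GB."""
    current = cycle_gb * cycles_retained
    return current, current * (1 + growth)

current, with_growth = jukebox_capacity_gb(cycle_gb=64, cycles_retained=6)
print(current)      # prints 384
print(with_growth)  # prints 576.0
```

For this example, a jukebox of roughly 576 GB covers the current 384 GB requirement plus 50% growth.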