You are here:
Geodatabases and ArcSDE
>
Administering ArcSDE geodatabases
>
Maintaining a geodatabase
This topic contains the following information:
Numerous occurrences could lead to data loss: data or database elements could be accidentally deleted; data could become corrupted by the addition of bad data; hardware, such as a disk or server, could fail; or disasters, such as flooding, could destroy your server and storage media.
Since much time, effort, and money are usually invested in an organization's data, it is unlikely that the loss of it would be a trivial thing. For this reason, it is critical that you have a tested recovery plan in place for your geodatabase. Notice the plan should be tested before it is implemented—you can back up all the data you want, but if you can't recover it, it is useless.
Backup and recovery strategy needs vary in accordance with your specific situation. For ArcSDE Personal and Workgroup geodatabases, only simple backup and recovery are performed. A simple backup is a full backup. Since ArcSDE Personal and Workgroup geodatabases are comparatively small and accessed by fewer users than ArcSDE Enterprise geodatabases, it doesn't take as long to create full backup files, and they can be done more frequently. To learn more about this type of recovery model, see
Simple backup and recovery.
For ArcSDE Enterprise geodatabases, the type of backups you use, where the backups should be stored, when the backups should be performed, and when and how restoration can be done can be affected by the following, interrelated factors:
- How often the data changes
The more frequently the geodatabase and its contents are edited, the more frequently backups should be performed.
- How important the data is to the organization
Is the data mission critical? If so, recovery time and currency of the recovered data are important. Is retention of the data of legal importance? If so, you should consider storing backups off-site.
- How much time is acceptable for recovery
Certain data might be needed right away, whereas the need for other data isn't as pressing. If there isn't much data in the database, compare how much time it would take to perform a database recovery versus manually reentering data.
- How much downtime can be tolerated
This affects whether or not you can take the database offline to perform backups or recover the data. If the data must be available 24 hours a day, be sure to schedule backups to occur during off-peak hours.
- How big the database is
This affects storage space and location as well as the amount of time it takes to back up and recover the database.
- What the system resources are with which you have to work
Is there ample storage space—both virtual and physical—for backups? Could you possibly set up a mirror or copy of your database? Is the network able to handle a backup or restore procedure taking place while users are still connected to the database? Would it make sense to have off-site consultants provide your database backup, storage, and recovery management?
- The type of database management system (DBMS) you're using
Each type of DBMS has different options for backup and recovery. For the basic concepts behind these options, see "Types of backups" below.
Most DBMSs have their own administrative utility to perform backup and recovery management, but there are also many third-party software products available.
Tip
- It is very important that you read the backup and recovery documentation available for your DBMS. They are
DB2—Data Recovery and High Availability Guide and Reference
DB2 Information Center
Informix—Backup and Restore Guide
Online version
Oracle—Backup and Recovery Guide
Oracle online documentation list
SQL Server—Consult the SQL Server online documentation installed with SQL Server.
Top of page
Types of backups
The steps taken to back up and recover data vary by DBMS, as do the names used for the types of backup strategies available. Basically, though, backups include complete copies of the database and copies of change logs. For enterprise ArcSDE geodatabases, you will normally use a combination of these.
Full backups are copies of the entire database. They can take a good deal of time, server resources, and storage space to accomplish depending on database size. For this reason, it is usually impractical to do full backups more than once a day. Full backups are commonly scheduled to occur during off-peak hours (usually at night) to minimize resource contention. This can be automated through scripting, so you don't need to be present to perform the backup.
If your database changes very little, you may only need to create a full backup every few days or once a week. For most sites, though, it is best to perform a full backup at least once a day.
Recovery of your database after corruption or loss usually starts with the restoration of the full backup copy, so it is important that you have a full backup.
Changes will most likely occur in your database between full backups. All DBMSs have logs that keep track of database changes and you can use these, along with your last full database backup, to restore the database. Making a backup of these logs takes less time and has less of an impact on server processes because they only contain the changes that have occurred in the database since the last full backup or, for some DBMSs, since the last log file or differential backup. Therefore, you can do log backups throughout the day. However, doing a recovery with a number of different files takes more time than just restoring a full copy of the database. Your DBMS documentation can provide guidelines on how to size your log files during database setup to optimize their use.
Obviously, backup files should be stored in a different location than your production database. At the very least, they should be backed up to a different drive. To be safer, backup files should be stored on a separate server or other media such as tape.
Once you have decided upon your backup strategy, the commands necessary to create a backup may often be scripted.
ESRI recommends using a script for backup purposes whenever possible. Executing the backup commands from a Unix shell script or a Windows batch file will help ensure that the database is being backed up consistently and that all intended steps are being followed. Be sure to update the script whenever the physical structure of the database changes.
Top of page
The following is some DBMS-specific information on backup strategies.
SQL Server backups
When you perform a backup, you back up data, changes, or transaction logs. The difference between the three depends on how much you are willing to lose in the event of system failure.
Database, file group, and file backups back up the entire entity. They are the foundation for any backup and restore strategies. Differential backups back up only changes made to the data since the last full database, file group, or file backup. Transaction log backups back up only the transaction log.
Differential backups record changes made to a database, file group, or file since the last full database backup. Changes are identified through the differential changed map, which represents all changed extents in a database. If an extent's value is 1 within the map, that extent is backed up. During the next full backup, any changed values within the map are set back to 0.
Transaction log backups back up the transaction log and control the log's size. Transaction log backups are only useful in full or bulk-logged recovery models.
During a transaction log backup, the entire log is first backed up. All committed or rolled back transactions that precede the oldest active transaction (MinLSN) and reside in a previous virtual log are truncated from the transaction log. This controls the size and growth of the transaction log. In simple recovery model, in which the transaction log cannot be used for recovery, backup is used to truncate the log.
ArcSDE single spatial databases can be backed up and recovered as any other SQL Server database. If you have deployed a multidatabase model (one sde database holds metadata for all spatial databases) the process becomes more complex. In this configuration, all spatial databases depend on the sde database, so you must back up all databases as one.
You can automate your backups using the SQL Server Enterprise Manager or Management Studio. Backups can be automated through SQL Server jobs or the SQL Server maintenance wizard.
Top of page
Oracle backups
The types of backups you can perform for an Oracle database include the following:
-
Hot backup
Backing up an Oracle database while the database instance is running is called a hot backup. If you plan to take a hot backup, you must operate your database in ARCHIVELOG mode.
Enter the ALTER TABLESPACE … BEGIN BACKUP command prior to the backup of each tablespace; this tells Oracle that a hot backup is being taken. If this command is not issued, the hot backup will appear to succeed but it may be useless for restoring the database. To finish the hot backup, for each tablespace, enter the ALTER TABLESPACE … END BACKUP command.
Changes to data are recorded and held in the rollback segment until they are no longer needed by any outstanding transaction. Taking a hot backup prevents the release of rollback segment data until the ALTER TABLESPACE … END BACKUP command is issued. Therefore, the rollback segment must be large enough to accommodate changes made during the hot backup. If the rollback segment runs out of space, a transaction will fail with an ORA-1555 error:
ORA-1555: snapshot too old (rollback segment too small)
While the hot backup will succeed despite this error, changes made to the database may need to be reentered.
You may avoid this error by taking a hot backup during times of low database activity or by ensuring that your rollback segments are large enough to accommodate data changes made during backup.
You do not need to shut down the ArcSDE server process (giomgr) prior to making a hot backup.
For details on hot backup, refer to the
Oracle Backup and Recovery Guide for your Oracle release.
Cold backup
Creating a backup of an Oracle database while the database instance is shut down is called a cold backup. Managing a cold backup is simpler than a hot backup and less prone to error.
If you run the database in NOARCHIVELOG mode, a cold backup is your only option. Running the database in ARCHIVELOG mode will enable you to use a cold backup to recover a database to the latest committed transaction.
Database export
You can use Oracle's export utility to supplement a full backup. If changes are made to a known set of data objects between full backups, you can export the objects.
However, the export utility should only be used on data objects that are not changing during the export and on all data objects that are closely related. For example, if you use export to back up a business table, you should also capture the related spatial index, feature, and delta tables in the same backup.
ESRI does not recommend that you use the export utility as your only backup method.
You can also back up the entire Oracle database with the Oracle export utility, then make cumulative and incremental backups.
For more information on the export utility, refer to the Oracle Utilities manual for your Oracle release.
You should make regular full backups of the Oracle database. A full backup should include the Oracle database, control files, datafiles, redo log files, and, if you have started an ArcSDE service, the giomgr.defs, dbinit.sde, and services.sde files.
If your Oracle database is operating under ARCHIVELOG mode, you have several variations that you may add to your backup strategy in addition to the periodic full backup. Databases operating under the NOARCHIVELOG mode are restricted to full backups with the possible addition of Oracle export files.
You should make at least three copies of the control files with each backup because of their importance in ensuring database consistency. Because control files are comparatively small, there is negligible cost in doing so.
The redo log files are essential for bringing the data files from an earlier state to a later state. Between any two points in time, the redo logs must be found in an unbroken sequence if database recovery is to succeed.
ESRI recommends that you maintain at least two copies of all archived redo logs for as far back in time as may be reasonably necessary for database recovery. The two copies should be stored on physically distinct media-separate disk drives, for example, or a disk drive and a tape drive.
If you intend to purge the archived redo log files from their location on disk, be sure that you have a second backup copy of each archived redo log file before purging.
This strategy of multiple backups of the archived redo log files helps guard against multiple media failure, which is not as rare as it might seem. Some tape drives, for example, fail to detect bit errors until you attempt to restore a file, when it may be too late.
You only need to make a single copy of each data file with each backup as long as you carefully maintain multiple copies of archived redo logs.
NOTE: The online redo log files are not necessary for backup. If the current online redo log file fails, committed information still exists in memory, which Oracle writes to the data files when a checkpoint is issued. Oracle issues a checkpoint automatically when you shut down a database instance with NORMAL, IMMEDIATE, or TRANSACTIONAL priority. Before shutting down a database with ABORT priority, you should force a checkpoint with the ALTER SYSTEM … CHECKPOINT command if at all possible.
Top of page
Informix backups
The following command can be used to back up an ArcSDE geodatabase in Informix:
dbexport sde –c
where sde is the name of the ArcSDE geodatabase.
For more information on different Informix database backup strategies, refer to the
IBM Informix Backup and Restore Guide.
Regardless of which backup and restore mode you are using, you should make regular, full backups of your Informix databases. A full backup should include the Informix database and the giomgr.defs, dbinit.sde, and services.sde files. You should also back up any dbtune files you have created and imported into the DBTUNE table.
Top of page
DB2 backups
You can make a backup of an entire database or of just certain tablespaces. Doing so creates a copy of the database or tablespace on separate media, such as tape or another server. Backups can be made while the database is active or inactive, depending on the type of recovery logs you use.
-
Online backup
Performing a backup of your DB2 database while the database instance is running is called an online backup. If you plan to make an online backup, you must use archive logs. To do this, you must enable the Roll-forward recovery option. After you have enabled the database for Roll-forward recovery, you must restart the database. You must also create at least one full offline backup (see the next section) after Roll-forward is enabled but before you perform an online backup.
Since the database is running, users can be connected to the database while the backup operation is being performed. Any changes made to the database during the backup operation are recorded in the logs.
If you are using an ArcSDE service, you do not need to shut down the ArcSDE server process (giomgr) prior to making an online backup.
For details on online backups, refer to the documentation for your DB2 release.
-
Offline backup
Making a backup of a DB2 database while the database instance is shut down is called an offline backup. Managing an offline backup is simpler than an online backup and less prone to error because with no users connected to the database, no changes can be made while the backup is taking place.
Before you take the database offline, you need to stop your ArcSDE service and any other service that accesses your database.
If you use circular logging (your database is nonrecoverable), an offline backup is your only option. You can use an offline or online backup if you use archive logging.
You should make regular full backups of the DB2 database. A full backup should include the DB2 database and, if you have started an ArcSDE service, the giomgr.defs, dbinit.sde, and services.sde files.
-
Incremental backups
As the size of the database grows, the amount of time it takes to complete a full database backup operation increases. To reduce the number of times you have to perform a full database backup, you can create one full backup image and several incremental backups. An incremental backup only contains pages that have been updated since the previous backup image was made plus all the metadata of the initial database.
DB2 uses two types of incremental backup images: cumulative and delta.
A cumulative backup image is a copy of all database data that has changed since the most recent, successful, full backup operation. A cumulative backup image contains a series of incremental backups taken over a period of time; therefore, the previous incremental backup image is contained in it.
A delta backup image is a copy of all database data that has changed since the last successful backup of any kind-full, cumulative, or delta. A delta backup image is also known as a differential, or noncumulative, backup image.
The BACKUP DATABASE command is used to create a backup image. It only affects the database partition on which it is executed. To execute BACKUP DATABASE, you must have either sysadm, sysctr, or sysmaint authorization in the database.
When you use the BACKUP DATABASE command, you can specify to which directory the backup image will be saved. This is called the target location. Possible target locations include directories, devices, or other servers. If you do not specify a target location in the command, the backup image is saved in the directory from which the command was issued.
The database for which the backup operation is to be performed can be local or on a remote server. The backup image will be saved on the database's server unless you are using a third-party storage management application.
When the BACKUP DATABASE command begins executing, a connection to the database is made that is specifically for the backup operation. If you were already connected to the database, that connection will be dropped before the backup operation connection is created. Once the backup is completed, the backup operation connection is disconnected.
If the backup operation is successful, you will receive a message to that effect that includes a timestamp for the backup image. This timestamp uniquely identifies each backup image.
To back up your ArcSDE DB2 database, you must copy the following files to an offline location:
- Data files
- giomgr.defs, dbinit.sde, and services.sde files
- Archive recovery log files
The database recovery log files are important for bringing the data files from an earlier state to a later state. Between any two points in time, if you want to use the recommended Roll-forward recovery option, the recovery logs must be found in an unbroken sequence for database recovery is to succeed.
ESRI recommends that you maintain at least two copies of all archive recovery logs for as far back in time as may be reasonably necessary for database recovery. The two copies should be stored on physically distinct media-separate disk drives, for example, or a disk drive and a tape drive. You could use log mirroring to maintain separate copies of the recovery logs.
If you intend to purge the archive recovery log files from their location on disk, be sure that you have a second backup copy of each archive recovery log file before purging.
This strategy of multiple backups of the archive recovery log files helps guard against multiple media failure, which is not as rare as it might seem. Some tape drives, for example, fail to detect bit errors until you attempt to restore a file, when it may be too late.
You only need to make a single copy of each data file with each backup as long as you carefully maintain multiple copies of archive recovery logs.
You can use the Configure Automatic Maintenance wizard in the DB2 Control Center to set the type of maintenance performed, such as backups, and when to run them. Using the wizard, specify your maintenance goals and at what times maintenance can occur. DB2 uses this information to determine if and when maintenance activities are required and runs them during the next time period in which maintenance is allowed.
In addition to setting the type of maintenance and times, you can set up notification email to send error or trouble messages to recipients you specify.
NOTE: You can perform manual backup operations even when your database is configured to use automatic maintenance.
You can also use the Backup wizard in the DB2 Control Center to create backups of database objects, partitions, or the entire database.
Top of page
Recovery models
No matter what DBMS you use, you should create a recovery plan and document the following:
- All DBMS and ArcSDE installations, their patch levels, operating system (OS) service packs, server names, database names, collations (SQL Server)
- What databases are backed up, their recovery models, and their backup schedules for each DBMS instance (include explanations for the recovery model and backup schedule).
- Who performs the backups and who can serve as a point of contact for backup and restore questions.
- Where the backups will be stored and how long the backup media will be retained
- Creation of a backup schedule
This information will help you more quickly restore data and should be part of your overall backup and recovery plan.
Each DBMS offers different strategies or models for you to use to recover your data in the event of corruption or media failure. Some of these strategies are described in the DBMS-specific sections below. For complete instructions on database recovery, consult the documentation provided with your DBMS software.
Top of page
SQL Server
For SQL Server, the recovery model is a database setting, changed either through the Enterprise Manager, Management Studio, or the ALTER DATABASE statement. There are three recovery models for SQL Server: full, bulk-logged, and simple. Each is summarized below.
NOTE: For ArcSDE Workgroup and Personal geodatabases stored in SQL Server Express, you will not change the recovery model.
-
Full recovery model
Full recovery allows point-in-time (or point-of-failure) recovery, provided you have the transaction log backups at that point in time. Point-in-time recovery signifies the ability to recover a database to a specific time, restoring all committed transactions and rolling back all incomplete transactions.
Recovery to a point of failure or point in time involves
- Backing up the currently active transaction log
- Restoring the most recent full database backup without recovery
- Restoring the most recent differential database backup without recovery
- Restoring in sequence any transaction log backups without recovery
- Restoring the last transaction log backup with recovery
-
Bulk-logged recovery model
Databases using the bulk-logged recovery model minimally log bulk operations to the Microsoft SQL Server transaction log. These operations include CREATE INDEX, SELECT … INTO, writetext, updatetext, and BULK INSERT. The transaction log does not record sufficient information to recover these changes if media failure occurs after a bulk operation. You can recover a database to the point of failure, but your data may not be consistent if it was changed by a bulk operation. The process of restoration is the same as in full database recovery.
-
Simple recovery model
The simple recovery model does not use the transaction log for recovery. If you use the simple recovery model, you can only restore full database backups. The advantage to using the simple recovery model is that there is less transaction log management. This topic is covered more fully in Simple backup and recovery.
Top of page
Oracle
Oracle manages changes to its contents and structure in such a manner as to guarantee recovery of the database to the last committed transaction after any single point of failure.
To the last committed transaction means that once control is returned to the user after executing a COMMIT statement, Oracle guarantees the committed data has been written to disk in some form and is recoverable.
Single point of failure means that any single file or process can fail without losing the contents of any committed transaction. If a data file is lost or corrupted, the contents of the redo logs guarantee that the data is recoverable. If a control file breaks, the other control files ensure that the information remains safe. A process may be killed, but committed data is never lost.
To recover a database after any failure, Oracle takes these steps:
- Reads the init.ora file to determine the names and locations of the control files
- Reads the control files to verify their consistency with each other and to determine the physical file structure of the database
- Opens each data file mentioned in the control file to determine whether that data file is current and reflects the latest committed change or is in need of recovery
- Opens each redo log file in sequence and applies the information found there to each data file, as necessary, to bring each data file to the state where it contains all its committed transactions
If the database has lost a control file, the database is recovered by replacing the lost control file with a copy of a current control file.
If the database has lost one or more data files, the database is recovered by first replacing the lost data file or data files with backup copies, then using the redo logs (online or archived) to make the restored copies current. If the backup copies are restored to different locations from the original files they are intended to replace, you must use the ALTER DATABASE RENAME FILE command to tell the Oracle instance where the restored files are to be found.
If the database has lost the current online redo log, the database instance will halt when it attempts to commit more transactions. No data will have been lost, but the latest transaction will not be committed and may need to be reentered when the database comes back up. However, the current online redo log file will have to be replaced and a backup of the database should immediately be performed.
If the database has lost any archived redo logs, the database instance will continue to function, because it will have no knowledge of the loss. However, the ability to recover the database in the event of a second media failure or file loss may be compromised. A fresh backup should be taken if the archived redo logs are lost.
Top of page
Informix
Informix can use two types of recovery systems: the ON-Archive system and the ontape utility. (You can also use the ON-Bar utility to perform backup and restore operations. Please see the recommended Informix documentation,
IBM Informix Backup and Restore Guide, for differences between these backup and recovery systems and their usage.)
Recovery of an Informix Dynamic Server re-creates data, particularly database spaces, from an archive and backed-up logical log files.
An archive is a copy of either all or some portion of the data that Informix manages. More precisely, an archive is a copy of one or more dbspaces (database spaces) and any supporting data that you might need to restore them.
You create an archive of Informix Dynamic Server data on tape or disk that, ideally, you store in a safe location that is separate from your computer facility.
A logical log backup is a copy on tape or disk of logical log files that you have made full and eligible for backup. The logical logs files store a record of Informix Dynamic Server activity that occurs between archives.
You must restore data in two operations. The first operation is a physical restore and the second, which must follow the first, is a logical restore.
For the recovery of an Informix database, refer to the
Archive and Backup for Informix Dynamic Server 2000 and
Informix Backup and Restore Guide.
The following command can be used to recover your ArcSDE geodatabase:
dbimport sde –c –d sdedbs –l buffered
where sde is the name of the ArcSDE geodatabase and sdedbs is the name of the sbspace being used.
Once the Informix database has been recovered, if necessary, restore the ArcSDE installation from the ArcSDE media and the dbtune, giomgr.defs, dbinit.sde, and services.sde files from your backup tapes.
Top of page
DB2
The types of recovery you can use with geodatabases in DB2 are version recovery, which uses circular logging, roll-forward recovery, which uses archive logging, or High availability disaster recovery (HADR).
- Version recovery and circular logging
Circular logging uses a ring of log files to record changes to the database. When the last log in the ring is filled, the first file is reused. Since transaction information can be overwritten, when you use circular logging, rollforward recovery is not possible; however, recovery is generally not needed during data loads. Circular logging is the default logging model.
- Roll-forward recovery and archive logging
If you want to use roll-forward recovery, the recommended recovery method for use with ArcSDE geodatabases, use archive logging. Archive logging does not overwrite log files; rather, it creates additional logs to record all transactions since the last backup. Note that you will need more disk space to use archive logging since logs are not reused as in circular logging.
- HADR
High-availabiltiy disaster recovery (HADR) is a database replication feature available in DB2 that protects your data by replicating data changes from a source (also called primary) database to a target database (also called a standby database). If the primary database goes down, you can switch to the standby database while you recover the primary database.
You can specify one of three synchronization modes for HADR: synchronous, near synchronous, or asynchronous.
In synchronous mode, a change made to the primary database will not be committed to the primary database until the associated logs have been written to the transaction logs on the standby database. This keeps the two databases synchronized because no change goes into the primary database that isn't saved in a log file (and is, therefore, recoverable) on the standby database. This mode has the greatest impact on the primary database's performance.
For near synchronous mode, a change made to the primary database will not be committed to the primary database until the associated logs have been written to memory on the standby database. This is almost as effective as synchronous mode and has less of an impact on the primary database. However, if the standby database fails after the data was committed to the primary, the transferred data could be lost and the two databases could be rendered inconsistent.
Database transaction commits on the primary database are not affected by the communication of log records to the standby database. In this mode, there is more of a chance for data between the two databases to become inconsistent.
There are other replication options that can be used with DB2, including SQL replication and Q replication. Consult your DB2 documentation for more information.
To restore your geodatabase, you can use the RECOVER DATABASE command or the Restore Data wizard from the DB2 Control Center. The geodatabase can be restored to an existing database or to a new database. The following are notes on recovering to a new database:
- If you are recovering to a new database, you must be attached to the instance.
- If you recover to a new database on another server and your source and destination servers are not using the default code page, you need to create your database on the destination machine with the correct code page before you perform the restore. If you do not do this and your production code page is different from that of the source, the restore utility will assume the default code page. This will result in an SQL2548N error.
- When you recover to a new database, the recovery history file from the backup image will become the recovery history file of the new database.
When you perform a full database recovery of a recoverable database, there can be no other connections to the database other than the connection that is made when the database is recovered. When a full database backup is restored, DB2 checks to be sure that all the tablespaces referenced in the backup image are present in the database to which the data is being restored. (This could be an existing database or a new, empty database.) If it finds that a tablespace is missing or inaccessible, the recovery operation will fail.
To avoid this, you can perform a redirected restore operation. This allows you to specify new tablespaces to which you can restore table spaces from the backup image that were missing from the target database. For details on how to perform a redirected restore, consult your DB2 documentation.
If you are only restoring individual tablespaces of a recoverable database, the database can remain online as long as users aren't connected to the tablespaces you are trying to restore.
Tip
- You can automate archiving and retrieval of log files with a user exit program. To use this, you must set the logarchmeth1 configuration parameter to USEREXIT.
For details on how to develop and use a user exit program, consult your DB2 documentation.
- If your database is partitioned, you must run the RECOVER DATABASE command from the catalog partition. When you recover a partitioned database to a specific point in time, all the partitions listed in the db2nodes.cfg file are affected. When you recover to the end of the logs, you can specify which partitions will be affected by the recovery. If you do not designate specific partitions, all partitions in the db2nodes.cfg file will be affected.
Top of page