Community:NDR/Journaling/RestoreProcedure

From NSDLWiki

Restoring an NSDL Repository

The restore procedure is used when a repository server has been offline or has fallen out of sync with the repository cluster because of equipment or service failures. It assumes that the server OS and the necessary software are installed. It also assumes that the OUT-OF-DATE server will start out as a follower; to make a follower a leader, see Community:NDR/Journaling/LeaderProcedure.

0. The first step depends on whether the OUT-OF-DATE system is being reloaded from scratch, or was a follower that has fallen out of sync after being offline for a time.

  • A scratch reload will require moving files from the current leader (or an up-to-date follower) to the OUT-OF-DATE system. This is "Starting a system from scratch" below.
  • A follower that has fallen behind for any reason may be brought up to date by processing the journal files that it missed. This is "Bringing an existing system back up to date" below. I believe this is the simpler process. (TC)


The first step is to load current data onto the OUT-OF-DATE system:

Starting a system from scratch

  • 1. Rsync files from the up-to-date machine (LEADER or FOLLOWER) to the OUT-OF-DATE machine. These snapshot files are moved from [NDR_BACKUP] on the source to [NDR_BACKUP] on the destination:
    • foxml files
    • database dump files (fedora22, ri, proai, proai-public)
    • proai (2) cache files
  • 2. Restore files and db on OUT-OF-DATE machine:
    • Run the /root/bin/restore-ndr.sh script.
    • Note (11/29): the script needs updating to fix a command-line db password issue. When it fails, it needs to be restarted from the failure point.
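
As a rough illustration of step 1, the sketch below only prints an rsync command for review; it does not run anything. The host name leader.example.org and the function name snapshot_rsync_cmd are hypothetical, and the sketch assumes the whole [NDR_BACKUP] directory is transferred in one pass, which the procedure does not state.

```shell
# Sketch only: prints the rsync command instead of running it.
# Assumption: all snapshot files live directly under [NDR_BACKUP].
NDR_BACKUP=/usr/local/ndr-backup

snapshot_rsync_cmd() {
    # $1 = up-to-date host to pull from (hypothetical name in the example)
    # -a keeps permissions and timestamps, -v is verbose, -z compresses in transit
    printf 'rsync -avz %s:%s/ %s/\n' "$1" "$NDR_BACKUP" "$NDR_BACKUP"
}

# Example: review the printed command, then run it by hand
# on the OUT-OF-DATE machine.
# snapshot_rsync_cmd leader.example.org
```

Printing the command first makes it easy to check the direction of the copy before any data moves.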

Bringing an existing system back up to date

  • 1. Copy the missing journal files from an up-to-date system (source directory?).
    • Copy to the [NDR_CONTENT]/journals/incomingFiles directory of the OUT-OF-DATE system.
  • 2. Start fedora on the OUT-OF-DATE system.
    • Make sure it is configured as a follower and let it process the journal files. This should bring it up-to-date as of the last journal file processed.
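
The copy in step 1 might be sketched as follows, assuming the journal files have already been fetched into a local staging directory and that their names list in chronological order. Both are assumptions, as is the function name copy_journals_after; the procedure does not say how the missing files are identified.

```shell
# copy_journals_after SRC_DIR DEST_DIR LAST_NAME
# Copies every regular file that is listed after LAST_NAME in SRC_DIR.
# Assumption: file names list in chronological order.
# Returns non-zero (and copies nothing) if LAST_NAME is not in SRC_DIR.
copy_journals_after() {
    src=$1
    dest=$2
    last=$3
    found=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue              # also covers an empty directory
        name=$(basename "$f")
        if [ "$found" -eq 1 ]; then
            cp -p "$f" "$dest/$name" && echo "copied $name"
        elif [ "$name" = "$last" ]; then
            found=1                          # start copying from the next file
        fi
    done
    [ "$found" -eq 1 ]
}

# Example (staging path and file name are hypothetical):
# copy_journals_after /tmp/journal-staging \
#     /usr/local/ndr-content/journals/incomingFiles lastProcessedFileName
```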


Restart system as active follower

3. Prepare OUT-OF-DATE machine for receiving files

  • Copy [FEDORA_HOME]/server/config/fedora.fcfg.follower to fedora.fcfg.
  • Check that [NDR_CONTENT]/journals/incomingFiles is empty or offline.
  • Start the RMI receiver.
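
The first two checks in step 3 could be scripted roughly as below. The function names are hypothetical, and starting the RMI receiver is left out because the procedure does not give the command for it.

```shell
# use_follower_config CONFIG_DIR
# Overwrites fedora.fcfg with the follower variant kept beside it.
use_follower_config() {
    # $1 = config directory, normally [FEDORA_HOME]/server/config
    cp -p "$1/fedora.fcfg.follower" "$1/fedora.fcfg"
}

# dir_is_empty DIR
# Succeeds only when DIR exists and has no entries (hidden files count).
dir_is_empty() {
    [ -d "$1" ] || return 2
    [ -z "$(ls -A "$1")" ]
}

# Example, using the paths defined at the end of this page:
# use_follower_config /usr/local/fedora/server/config
# dir_is_empty /usr/local/ndr-content/journals/incomingFiles \
#     || echo "incomingFiles is not empty"
```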

4. Prepare current LEADER for the new follower

  • Add the OUT-OF-DATE server as a follower in [FEDORA_HOME]/server/config/fedora.fcfg.
  • Restart (bounce) fedora so that the new configuration takes effect.

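5. Copy the relevant journal files to the offline (OUT-OF-DATE) server.

  • Compute a hash of the most recent journal file on the offline server.
  • Find the file with the same hash in the LEADER's archiveFiles; this identifies the most recent file the offline server already has.
  • Copy the later files from LEADER:[NDR_CONTENT]/journals/archiveFiles to OUT-OF-DATE:[NDR_CONTENT]/journals/incomingFiles.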
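
The hash lookup in step 5 might look like the sketch below. The procedure does not name a hash tool, so the use of sha256sum (GNU coreutils) is an assumption, and the function names file_hash and find_by_hash are hypothetical.

```shell
# file_hash FILE: print only the SHA-256 digest of FILE.
# Assumption: any hash tool available on both machines would do.
file_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# find_by_hash HASH DIR
# Prints the name of the first file in DIR whose digest equals HASH.
# Returns non-zero if no file matches.
find_by_hash() {
    want=$1
    dir=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if [ "$(file_hash "$f")" = "$want" ]; then
            basename "$f"
            return 0
        fi
    done
    return 1
}

# Example: on the offline server, hash a journal file (path is hypothetical),
# then on the LEADER look the digest up in archiveFiles.
# h=$(file_hash /path/to/last/journal/file)
# find_by_hash "$h" /usr/local/ndr-content/journals/archiveFiles
```

Comparing by content rather than by file name guards against a file that was renamed or only partly written on the offline server.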

6. Start fedora on OUT-OF-DATE machine

  • Verify that this new follower is munching on the journal files:
    • see logs in [NDR_CONTENT]/journals/journal.recovery.log*
    • see that files are moving out of [NDR_CONTENT]/journals/incomingFiles
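
One way to check that files are moving out of incomingFiles is to count the entries twice with a pause in between. This is a minimal sketch; the function names and the 60-second interval in the example are assumptions, not part of the procedure.

```shell
# count_files DIR: print the number of entries in DIR.
count_files() {
    ls -A "$1" | wc -l | tr -d ' '
}

# queue_shrank DIR SECONDS
# Samples the entry count twice, SECONDS apart, and succeeds
# only if the second count is smaller than the first.
queue_shrank() {
    before=$(count_files "$1")
    sleep "$2"
    after=$(count_files "$1")
    echo "before=$before after=$after"
    [ "$after" -lt "$before" ]
}

# Example:
# queue_shrank /usr/local/ndr-content/journals/incomingFiles 60 \
#     || echo "no progress yet; check journal.recovery.log*"
```

A failed check does not prove the follower is stuck, since a large journal file may take longer than the interval to process.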


[FEDORA_HOME] = /usr/local/fedora
[NDR_CONTENT] = /usr/local/ndr-content
[NDR_BACKUP] = /usr/local/ndr-backup
