BackUps

Setting Up Back Ups of Super Computer Data

The /compute folder on the BYU supercomputer is not backed up, so you will need to back up your data regularly. This tutorial explains how to set up an automatic backup that regularly copies your supercomputer /compute folder to your Box cloud account.

Explanation of Variables Found Below

  • localusername is your username on the local machine
  • fslusername is your username on the supercomputer
  • CAEDMusername is your CAEDM username
  • computebkp is the name of a folder created on your Box account
  • testfile is the name of a small file on the supercomputer used for testing

Set Up Rclone to Work With Box

Rclone is a program that allows you to move files to cloud storage through the command line. You need to do two things: 1) create a folder on Box to contain the backup files and 2) configure Rclone to access your Box account without a password. The latter is done by creating a unique token and placing this token in your supercomputer account.

  1. Create a Folder on your Box Account
    1. Navigate to box.byu.edu in a web browser.
    2. Supply your BYU ID and password and authenticate through DUO.
    3. Create a folder on Box by clicking on "New." You can name it anything you desire. The examples below use the name computebkp.
    4. Don't close the browser. Stay logged into Box.
  2. Configure Rclone
    1. Install Rclone on your local machine if it is not already installed.
      1. This can be done through YAST.
      2. Select Software Management.
      3. Search for rclone.
    2. Run the following command in the command line of your local machine
    rclone authorize box
    1. Click on the big button "Grant Access to Box" that appears in your browser.
      1. If you logged into box.byu.edu in a browser before running the command rclone authorize box, a new tab should appear in your browser with a big blue button.
      2. Once you click "Grant Access to Box," the webpage should change to show "Success!" and instruct you to return to Rclone.
    2. Return to the window where you ran rclone authorize box on your local machine and copy the access token for later.
      1. The token appears between the symbols ---> and <----. It should look something like {"access_token":"...."}, where "...." stands for a long string containing several fields. Copy the whole thing, including the braces.
      2. Paste this token into a temporary file so that you have it for later.
    3. Create the Rclone configuration file
      1. Log into the supercomputer.
      2. Create a file called rclone.conf and place it at ~/.config/rclone/rclone.conf.
      3. The file should be the following.
      [boxRaw]
      type = box
      token = PASTE_TOKEN_HERE
      
      [box]
      type = chunker
      remote = boxRaw:
      chunk_size = 30G
      hash_type = sha1
      
      1. Replace "PASTE_TOKEN_HERE" with the token obtained from running rclone authorize box on your local machine. (Remember that it should be something like {"access_token":"...."} ).
      2. Make sure to save this file to ~/.config/rclone/rclone.conf.
  3. Check that Rclone Is Set Up Properly on the Supercomputer
    1. Load the rclone module on the supercomputer using the command module load rclone.
    2. Select a small file to copy. For this example the file is named testfile, and the folder created on Box is computebkp.
    3. Run the command
    rclone copy testfile box:computebkp
    1. Check to see if testfile appears on Box in the folder computebkp. If it doesn't, review each step of "2. Configure Rclone."
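
If the test copy fails, a quick check of the configuration file can narrow down the cause. The sketch below is only an illustration and is not part of the procedure above: the helper name check_rclone_conf is hypothetical, and it confirms nothing more than that both remote sections exist and that the placeholder token was replaced.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an Rclone config file before testing the copy.
# check_rclone_conf is a hypothetical helper name, not an Rclone command.

check_rclone_conf() {
    local conf="$1"
    [ -f "$conf" ] || { echo "missing: $conf"; return 1; }
    # Both remotes from the tutorial must be defined.
    grep -q '^[[:space:]]*\[boxRaw\]' "$conf" || { echo "no [boxRaw] section"; return 1; }
    grep -q '^[[:space:]]*\[box\]' "$conf" || { echo "no [box] section"; return 1; }
    # The placeholder must have been replaced with the real token.
    if grep -q 'PASTE_TOKEN_HERE' "$conf"; then
        echo "token placeholder not replaced"
        return 1
    fi
    echo "config looks complete"
}

# Run the check against the default location only when that file exists.
if [ -f "$HOME/.config/rclone/rclone.conf" ]; then
    check_rclone_conf "$HOME/.config/rclone/rclone.conf" || true
fi
```

Once the file passes, rclone listremotes should print boxRaw: and box:, and rclone lsf box:computebkp should list testfile after the copy.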

Set Up Restic on the Super Computer

Restic is a program that automates many of the tasks needed to create backups. The purpose is to create a series of snapshots of your files so that you can go back in time and review different versions. It can be used with Rclone but serves a different purpose. Rclone is the utility that copies files to cloud storage. Restic is the utility that creates the snapshots and labels them. It uses Rclone to make the copies but adds the additional information to each copy to identify each snapshot and stores all the information in a repository. You will set up the repository on Box.

  1. Load the Restic Module on the Supercomputer
    1. Log into the supercomputer
    2. Load the Restic and Rclone modules using the following command.
    module load restic rclone
  2. Create a Restic Password File
    1. Create a file and place it at ~/.restic-password.
    2. Place a random, unique, secure password in the file.
      1. Do not forget this password. You will not be able to retrieve backups if you forget it.
      2. Make the password at least 16 characters long.
      3. These websites can help you generate strong passwords.
        1. passwordsgenerator.net
        2. strongpasswordgenerator.com
      4. Do not use a password that you use elsewhere. This password file could be compromised if someone hacks the supercomputer. The supercomputer administrators can also view it at any time.
    3. Change the permissions on the password file so that only you as owner can view it or write to it by running the following command.
    chmod 600 ~/.restic-password
  3. Initialize the Repository on Box
    1. Run the following command
    restic -p ~/.restic-password -r rclone:box:computebkp init
    1. computebkp is the name of the folder you created on Box
    2. This command will take several seconds to execute.
    3. If everything works properly, you should see "created restic repository..." and a note reminding you to not lose your password.
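
For step 2 above, the password file can also be created locally rather than with a website. The following is a minimal sketch, assuming the standard head and base64 utilities are available; it draws 24 random bytes, which encode to 32 characters, and refuses to overwrite an existing file. The two exported variables are ones Restic reads in place of the -r and -p options.

```shell
#!/usr/bin/env bash
# Sketch: create ~/.restic-password locally instead of using a website.
# Assumes head and base64 (coreutils) are available, as on most Linux systems.

pw_file="${RESTIC_PASSWORD_FILE:-$HOME/.restic-password}"

if [ -e "$pw_file" ]; then
    # Never overwrite an existing password: a repository that was
    # initialized with it could no longer be opened.
    echo "keeping existing $pw_file"
else
    # umask 077 in a subshell so the file is created readable by the owner only.
    ( umask 077; head -c 24 /dev/urandom | base64 > "$pw_file" )
    echo "wrote new password to $pw_file"
fi
chmod 600 "$pw_file"

# Restic reads these variables in place of the -r and -p options.
export RESTIC_REPOSITORY="rclone:box:computebkp"
export RESTIC_PASSWORD_FILE="$pw_file"
```

With both variables exported in the current shell, restic init and later commands can be typed without repeating -p and -r. Keep a copy of the password somewhere safe outside the supercomputer.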

Creating a Backup

Once Restic and Rclone are set up as described above, you can create a backup of your compute folder.

CAUTION: Creating a backup can take a long time. The first time you run Restic it could take 1-2 days depending on the amount of data you have. Subsequent backups should not take as long because only the files that are new or changed since the previous snapshot are backed up.

The following command will create a backup of your compute folder on the supercomputer.

restic -p ~/.restic-password -r rclone:box:computebkp backup ~/compute --tag first_backup
  1. The flag --tag assigns this snapshot the tag of first_backup.
  2. Tagging is for convenience in searching snapshots later.
  3. The --tag flag can be omitted.
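
Because every Restic command repeats the same password and repository options, a small wrapper can save typing. This is a sketch under stated assumptions: the names restic_box and DRY_RUN are hypothetical, chosen for this example, and the dry-run branch only prints the command so it can be inspected before anything is sent to Box.

```shell
#!/usr/bin/env bash
# Sketch: a wrapper so the repository and password options are typed only once.
# restic_box and DRY_RUN are hypothetical names chosen for this example.

restic_box() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "restic -p $HOME/.restic-password -r rclone:box:computebkp $*"
    else
        restic -p "$HOME/.restic-password" -r rclone:box:computebkp "$@"
    fi
}

# Preview the commands without contacting Box.
DRY_RUN=1 restic_box backup "$HOME/compute" --tag first_backup
DRY_RUN=1 restic_box snapshots --tag first_backup
```

Dropping DRY_RUN=1 runs the real commands. The snapshots command lists the snapshots stored in the repository, and --tag limits the list to snapshots carrying that tag.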

Automate the Backup Process

Once you have Rclone and Restic set up, you can automate the process to regularly back up your data. This is done by: 1) creating a bash script with the Restic backup command, and 2) telling the supercomputer to regularly run this script. The latter is done using cron.

Cron is a utility available on most Linux systems. You create a cron table (crontab) listing each script path and the schedule on which to run it, and cron runs the scripts as indicated in the table. Cron can schedule any job; it is not limited to backups.

  1. Create the Backup Script
    1. Download the file backup_compute.sh .
      1. This file is a bash script that contains the Restic command to back up the compute folder on the supercomputer to Box.
      2. Variables in this script define the folder to back up and the Restic repository on Box so that you can easily adapt it for other purposes. It is currently set up to work with the folder names used in the steps above.
    2. Log into the supercomputer.
    3. Save backup_compute.sh in the bin folder in your home directory on the supercomputer. You may need to create this folder using mkdir ~/bin.
    4. Change the mode of the file so that it is an executable using the following command.
    chmod 755 ~/bin/backup_compute.sh
  2. Create the Entry in the Cron Table
    1. Log into the supercomputer.
    2. Execute the following command which will open a text editor.
    crontab -e
    1. The file that opens may be blank or may have content.
    2. The editor will likely be VI.
    1. Place the following line in the file and save the file as normal.
    0 2 * * 6 ~/bin/backup_compute.sh
    1. This command tells Cron to run the script backup_compute.sh at 2:00 AM every Saturday.
    2. You can change the frequency of the backups by changing the first five fields in this line (minute, hour, day of month, month, and day of week).
    3. Search Google for crontab examples to learn more about the syntax for crontab.
    4. https://crontab.guru/ is helpful to learn about how to set the syntax of the table.
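
The contents of backup_compute.sh are not reproduced in this tutorial. To give a rough idea of what such a script could contain, here is a sketch that is NOT the downloadable file: every variable name, the log file location, and the RESTIC_BIN override are assumptions made for illustration. It loads the modules when the module command is available, because cron starts jobs with a minimal environment, and it records the start time and exit status in a log.

```shell
#!/bin/bash
# Sketch of a backup script in the spirit of backup_compute.sh.
# This is NOT the file from the tutorial; all variable names are assumptions.

SOURCE_DIR="${SOURCE_DIR:-$HOME/compute}"                # folder to back up
REPO="${REPO:-rclone:box:computebkp}"                    # Restic repository on Box
PASSWORD_FILE="${PASSWORD_FILE:-$HOME/.restic-password}" # Restic password file
LOG_FILE="${LOG_FILE:-$HOME/backup_compute.log}"         # hypothetical log location
RESTIC_BIN="${RESTIC_BIN:-restic}"                       # override for testing

load_modules() {
    # Cron starts with a minimal environment, so load the modules only
    # when the module command is available in this shell.
    if command -v module >/dev/null 2>&1; then
        module load restic rclone
    fi
}

run_backup() {
    echo "$(date '+%F %T') starting backup of $SOURCE_DIR" >> "$LOG_FILE"
    "$RESTIC_BIN" -p "$PASSWORD_FILE" -r "$REPO" backup "$SOURCE_DIR" >> "$LOG_FILE" 2>&1
    local rc=$?
    echo "$(date '+%F %T') finished with exit status $rc" >> "$LOG_FILE"
    return "$rc"
}

load_modules
if command -v "$RESTIC_BIN" >/dev/null 2>&1; then
    run_backup
else
    # Cron has no terminal, so report the problem in the log.
    echo "$(date '+%F %T') $RESTIC_BIN not found; nothing backed up" >> "$LOG_FILE"
fi
```

If the module command turns out to be unavailable under cron on your system, the log will show that Restic was not found, which is easier to diagnose than a silent failure.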

If you have done everything correctly, your compute folder should now be regularly backed up to Box. Check your backups regularly to ensure they are running.
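
One simple check is to confirm that the cron entry is actually installed. The sketch below assumes the entry mentions backup_compute.sh by name; has_backup_entry is a hypothetical helper that reads a crontab listing on standard input and ignores commented-out lines.

```shell
#!/usr/bin/env bash
# Sketch: confirm the cron entry exists.
# has_backup_entry is a hypothetical helper, not a cron command.

has_backup_entry() {
    # Drop comment lines, then look for the script name.
    grep -v '^[[:space:]]*#' | grep -q 'backup_compute\.sh'
}

if crontab -l 2>/dev/null | has_backup_entry; then
    echo "cron entry found"
else
    echo "cron entry missing: run crontab -e and add it"
fi
```

To confirm that the backups themselves are arriving, running restic -p ~/.restic-password -r rclone:box:computebkp snapshots should show a new snapshot after each scheduled run.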

Other Helpful Tutorials

Rclone
Restic [Includes a discussion of how to retrieve backups.]
Backing Up Your Data