Basic Command-line AWS Glacier Workflow

By Paul Heinlein | Sep 23, 2016

Glacier is Amazon’s AWS cold-storage service. Its data-center analog is archival tape storage, and it is about as slow as tape. Retrieval times are measured in hours (if not days). Glacier is a disaster-recovery tool, not live storage.

Unlike most AWS offerings, Glacier cannot be usefully controlled from the web console. It must be accessed with command-line tools or custom-built programs. Here’s a quick overview of Glacier operations using the AWS command line interface.

Note: You’ll see unexplained references to SNSTopic in some of the JSON snippets throughout this post. They refer to the AWS Simple Notification Service, a push-notification service that will alert you to AWS events that interest you. I left them there for my own reference. You can safely ignore them, though you may find it worth your time to learn how to set up notifications.

Create the Vault

aws glacier create-vault --account-id - --vault-name sandbox-02

If you log into your AWS web console, you should be able to see your new vault within a minute or two.

From the command line, you can retrieve a description of your vault.

[~]$ aws glacier describe-vault --account-id - --vault-name sandbox-02
{
    "SizeInBytes": 1036288,
    "VaultARN": "arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
    "LastInventoryDate": "2016-09-14T12:27:07.315Z",
    "NumberOfArchives": 1,
    "CreationDate": "2016-08-03T21:56:26.616Z",
    "VaultName": "sandbox-02"
}
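
If you only want a single field from that description, the CLI's global --query and --output options can trim the response for you. The following is a minimal sketch; the function name vault_archive_count is my own invention, not something the CLI provides.

```shell
#!/bin/sh
# Hypothetical helper: print only the NumberOfArchives field for a vault.
# --query takes a JMESPath expression; --output text drops the JSON quoting.
vault_archive_count() {
  aws glacier describe-vault \
    --account-id - \
    --vault-name "$1" \
    --query NumberOfArchives \
    --output text
}
```

Calling vault_archive_count sandbox-02 should print just the number. Keep in mind that the figure reflects the vault as of LastInventoryDate, not the present moment.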

Upload Archive Files

Here’s a script that uploads to Glacier all the tar archives in a given directory. The output file and the archive descriptions include a timestamp.

#!/bin/sh
# variables for vault name, timestamp, and output file
VAULT="sandbox-02"
NOW=$(date +%s)
IDFILE="archive-ids-${NOW}.json"

# make sure we can write the output file
touch "$IDFILE" || exit 1

# upload all tar files in forbackup directory, writing
# results to the output file
#
# the archive-description string is the filename prefixed
# with the timestamp. this information may be of great
# help when/if we later retrieve the file.
for F in /home/myproject/forbackup/*.tar; do
  # skip the literal pattern if no tar files matched
  [ -e "$F" ] || continue
  echo "# $F" >> "$IDFILE"
  aws glacier upload-archive \
    --vault-name "$VAULT" \
    --account-id - \
    --archive-description "${NOW}/$F" \
    --body "$F" >> "$IDFILE"
done

The output file will contain a JSON stanza for each uploaded file:

# /home/myproject/forbackup/allimages.tar
{
  "archiveId": "Uto28rqS24V9TD6YFVkny5bCUoRr4DOJIHzpOan-4uzy-EwEfRW2QkuuvtMw4pJxuP-dXbfCfATKOlmOgDMVCKVLRIh-eBD8Zq9TcBbq2ovrCb4y2Mccd3xwPQD1udWLhUp0cxeFiw", 
  "checksum": "70cde3046ff600c49e3de101df06bdba70a2acb31753cb33097c408b9baa9023", 
  "location": "/112233445566/vaults/sandbox-02/archives/Uto28rqS24V9TD6YFVkny5bCUoRr4DOJIHzpOan-4uzy-EwEfRW2QkuuvtMw4pJxuP-dXbfCfATKOlmOgDMVCKVLRIh-eBD8Zq9TcBbq2ovrCb4y2Mccd3xwPQD1udWLhUp0cxeFiw"
}
# /home/myproject/forbackup/sqldump.tar
{
  "archiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g", 
  "checksum": "49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333", 
  "location": "/112233445566/vaults/sandbox-02/archives/AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g"
}

It’s worth noting that the JSON standard does not allow comments. My script adds a #-prefixed comment for each file, which means that the output file is not, strictly speaking, valid JSON. (This hack would be unnecessary if Amazon deigned to include the ArchiveDescription string in the JSON.)
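
Because the comments make the file awkward to feed to a JSON parser, a little awk can pull out what matters. This is a rough sketch that assumes the CLI's default pretty-printed output, with one key per line as shown above. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "path<TAB>archiveId" for each stanza in an
# ID file written by the upload script.
list_archive_ids() {
  awk '
    # remember the filename from the comment line
    /^# / { path = substr($0, 3) }
    /"archiveId":/ {
      id = $0
      sub(/^[^:]*:[ \t]*"/, "", id)   # drop everything up to the opening quote
      sub(/".*$/, "", id)             # drop the closing quote and what follows
      printf "%s\t%s\n", path, id
    }
  ' "$1"
}

# Hypothetical helper: emit the same file without the comment lines,
# leaving a stream of plain JSON objects.
strip_comments() {
  grep -v '^#' "$1"
}
```

The first function gives you a quick filename-to-ID lookup table. The second produces a stream of concatenated objects rather than a single JSON document, which some tools accept and others do not.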

Get Inventory

aws glacier initiate-job \
  --account-id - \
  --vault-name sandbox-02 \
  --job-parameters '{ "Type": "inventory-retrieval" }'

An inventory-retrieval job will take several hours. I’d suggest submitting the job very early or very late in your workday. You can verify the job is in progress by submitting a list-jobs request.

[~]$ aws glacier list-jobs --account-id - --vault-name sandbox-02
{
  "JobList": [
    {
      "InventoryRetrievalParameters": {
        "Format": "JSON"
      },
      "VaultARN": "arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
      "SNSTopic": "arn:aws:sns:us-west-2:112233445566:glacier-sandbox",
      "Completed": false,
      "JobId": "j6ig7qCeJ4Ortc-D83EgHsNxm3RriaAkyEFma3_dx_TV_xix5_APExmpGrDLT7EU07Wxc_5BQfwllggqsgH_JfLusxIV",
      "Action": "InventoryRetrieval",
      "CreationDate": "2016-09-15T15:42:07.927Z",
      "StatusCode": "InProgress"
    }
  ]
}
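
Rather than rerunning list-jobs by hand, you could poll a single job with describe-job until its StatusCode changes. The sketch below is one way to do that. The function name, the return codes, and the 15-minute default interval are my own assumptions, so adjust them to taste.

```shell
#!/bin/sh
# Hypothetical helper: poll a Glacier job until it leaves the InProgress state.
# Usage: wait_for_job VAULT JOB_ID [SECONDS_BETWEEN_POLLS]
# Returns 0 if the job succeeded, 2 if it failed, 1 if the CLI call errored.
wait_for_job() {
  vault=$1
  job_id=$2
  interval=${3:-900}   # default of 15 minutes is an arbitrary choice

  while :; do
    status=$(aws glacier describe-job \
      --account-id - \
      --vault-name "$vault" \
      --job-id "$job_id" \
      --query StatusCode \
      --output text) || return 1

    case "$status" in
      Succeeded) return 0 ;;
      Failed)    return 2 ;;
      *)         sleep "$interval" ;;
    esac
  done
}
```

Given how slow Glacier is, an SNS notification is probably the better long-term answer, but a polling loop is easy to drop into a script.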

Once the job is complete, you can request its output. Use the JobId from the inventory-retrieval job.

aws glacier get-job-output \
  --account-id - \
  --vault-name sandbox-02 \
  --job-id "j6ig7qCeJ4Ortc-D83EgHsNxm3RriaAkyEFma3_dx_TV_xix5_APExmpGrDLT7EU07Wxc_5BQfwllggqsgH_JfLusxIV" \
  glacier-jobs-out

The resulting file (here, glacier-jobs-out) will list the archives found within the inventory-retrieval range:

{
  "VaultARN":"arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
  "InventoryDate":"2016-08-04T07:56:34Z",
  "ArchiveList": [
    {
      "ArchiveId":"Uto28rqS24V9TD6YFVkny5bCUoRr4DOJIHzpOan-4uzy-EwEfRW2QkuuvtMw4pJxuP-dXbfCfATKOlmOgDMVCKVLRIh-eBD8Zq9TcBbq2ovrCb4y2Mccd3xwPQD1udWLhUp0cxeFiw",
      "ArchiveDescription":"1470261757//home/myproject/forbackup/allimages.tar",
      "CreationDate":"2016-08-03T22:02:37Z",
      "Size":44120068,
      "SHA256TreeHash":"70cde3046ff600c49e3de101df06bdba70a2acb31753cb33097c408b9baa9023"
    },
    {
      "ArchiveId":"AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
      "ArchiveDescription":"1470261757//home/myproject/forbackup/sqldump.tar",
      "CreationDate":"2016-08-03T22:02:58Z",
      "Size":1003520,
      "SHA256TreeHash":"49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333"
    }
  ]
}
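
Each ArchiveDescription above packs the upload timestamp and the original path into one string, separated by the first slash. Here is a small sketch for splitting them apart again with plain shell parameter expansion. The function names are hypothetical, and the approach assumes descriptions were written by the upload script in this post.

```shell
#!/bin/sh
# Hypothetical helpers for descriptions shaped like "TIMESTAMP/PATH",
# the format the upload script writes.

# Everything before the first slash is the epoch timestamp.
description_timestamp() {
  printf '%s\n' "${1%%/*}"
}

# Everything after the first slash is the original path.
description_path() {
  printf '%s\n' "${1#*/}"
}
```

Since the script passes absolute paths, the recovered path keeps its leading slash, which explains the double slash you see in the inventory.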

Retrieve an Archive

Using the correct ArchiveId value from the inventory-retrieval data, you need to build a JSON archive-retrieval request:

{
  "Type": "archive-retrieval",
  "ArchiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
  "Description": "Retrieve SQL dump for audit team",
  "SNSTopic":"arn:aws:sns:us-west-2:112233445566:glacier-sandbox"
}

Save that JSON to a file (here, archive-retrieval.json), then reference it in your job request:

aws glacier initiate-job \
  --account-id - \
  --vault-name sandbox-02 \
  --job-parameters file://archive-retrieval.json

You’ll receive a location and job ID:

{
    "location": "/112233445566/vaults/sandbox-02/jobs/xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n",
    "jobId": "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n"
}
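
To carry the job ID into later commands without copying it by hand, you can ask the CLI to print only the jobId field. This is a sketch under the assumption that your request lives in a local file; start_retrieval is a name I made up.

```shell
#!/bin/sh
# Hypothetical helper: start a job and print only its job ID.
# Usage: start_retrieval VAULT REQUEST_FILE
start_retrieval() {
  aws glacier initiate-job \
    --account-id - \
    --vault-name "$1" \
    --job-parameters "file://$2" \
    --query jobId \
    --output text
}
```

Something like JOB_ID=$(start_retrieval sandbox-02 archive-retrieval.json) then leaves the ID in a variable for the get-job-output step.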

Asking AWS to list Glacier jobs is instructive:

[~]$ aws glacier list-jobs --account-id - --vault-name sandbox-02
{
  "JobList": [
    {
      "VaultARN": "arn:aws:glacier:us-west-2:112233445566:vaults/sandbox-02",
      "RetrievalByteRange": "0-1003519",
      "SNSTopic": "arn:aws:sns:us-west-2:112233445566:glacier-sandbox",
      "Completed": false,
      "SHA256TreeHash": "49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333",
      "JobId": "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n",
      "ArchiveId": "AveGlBWdJIDk8-THelSpu8FFo34KUmg8pVOQFvMxEQzM8MXMC6A4V7XcX3E3_qf7II3nYNuUpsgAhbSNYzbUUDKEmKv6VRwJvQZdP9m33ZpCGhsrMXnAgn05ng2xDvHHGFSRUjFf-g",
      "JobDescription": "Retrieve SQL dump for audit team",
      "ArchiveSizeInBytes": 1003520,
      "Action": "ArchiveRetrieval",
      "ArchiveSHA256TreeHash": "49b20365823966a8209e16625fe5c0cfee1a4299be01c9cfe3efbe7431908333",
      "CreationDate": "2016-09-22T17:16:29.191Z",
      "StatusCode": "InProgress"
    }
  ]
}

Retrieve Your Bits

Once you’re notified the job is complete, you can retrieve the file:

aws glacier get-job-output \
  --account-id - \
  --vault-name sandbox-02 \
  --job-id "xGvIJyQPC9weheMNwIf4s2z8Zct1lYGvjzdxz84VwhD-OaGtCRPwLCAGdr5c_m3qadoOkMGo-FYaLJ5psLKhhcFDjC1n" \
  sqldump.tar

I named the output file sqldump.tar, the same as the original filename, but you can specify any filename you want.
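
It is worth confirming that the bits you got back match what you uploaded. For archives of 1 MiB or less, the SHA-256 tree hash Glacier reports is the same as a plain SHA-256 digest of the file, so a quick check is possible without computing a full tree hash. The sketch below covers only that small-file case. The function name is hypothetical, and it assumes the GNU sha256sum utility is installed.

```shell
#!/bin/sh
# Hypothetical helper: compare a downloaded file against the tree hash
# Glacier reported. Only valid for files of at most 1 MiB, where the
# tree hash equals the plain SHA-256 digest.
# Returns 0 on match, 1 on mismatch or missing file, 2 if the file is too large.
verify_small_archive() {
  file=$1
  expected=$2
  [ -f "$file" ] || return 1
  size=$(wc -c < "$file" | tr -d ' ')
  [ "$size" -le 1048576 ] || return 2
  # assumes GNU coreutils; on other systems a different tool may be needed
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}
```

For the 1003520-byte sqldump.tar, you would pass the SHA256TreeHash value from the inventory as the second argument. Larger archives such as allimages.tar need a real tree-hash calculation, which this sketch does not attempt.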