Publishing a Static Website on CloudFront

by Jean-Michel Lacroix on September 23, 2010

My first CloudFront article covers basic S3 and CloudFront setup, along with the DefaultRootObject, which is the key to hosting a website on Amazon's CDN.

This second post focuses on getting your website onto CloudFront, with maintenance scripts that handle publishing and invalidation. Most of the scripts are generic, but the Rakefile targets a Jekyll-generated static site.

The Rakefile

First of all, here's my simple Rakefile. Don't try to use it yet: it relies on scripts and a configuration file that come later in this post.

task :default => :server

desc 'Start server with --auto'
task :server do
  jekyll('--server --auto')
end

desc 'Build site with Jekyll'
task :build do
  jekyll('--no-future')
end

desc 'Build and deploy'
task :publish => :build do
  bucket = 'MYBUCKET'
  puts "Publishing site to bucket #{bucket}"
  sh 'ruby aws_cf_sync.rb _site/ ' + bucket
end

def jekyll(opts = '')
  sh 'rm -rf _site/*'
  sh 'jekyll ' + opts
end

There are only three tasks available:

server: run the Jekyll server on localhost
build: generate the website in Jekyll's _site folder
publish: sync the _site folder to an S3 bucket and invalidate its content

The publish task is the only one I'll talk about, since the other two are really basic. Before continuing, make sure you have replaced "MYBUCKET" with the name of your own S3 bucket.

Directory synchronization

I've written a script that wraps the s3cmd "sync" command with an invalidation tool to help with updates:

local   = ARGV[0]
s3_dest = ARGV[1]

if local == nil || s3_dest == nil
  puts "syntax aws_cf_sync.rb local_source s3_dest"
  exit
end

# s3cmd configuration file, expected in the current directory
config = "#{Dir.pwd}/s3.config"
if !File.exist?(config)
  puts "please set up your s3.config file"
  exit
end

# Invalidation script, expected in the current directory
invalidate = "#{Dir.pwd}/aws_cf_invalidate.rb"
if !File.exist?(invalidate)
  puts "please download the aws_cf_invalidate.rb script"
  exit
end

s3_dest   = s3_dest.split('/')
s3_bucket = s3_dest.shift
s3_path   = s3_dest.join('/')

s3_path += '/' unless s3_dest.length == 0

%x[ $(which s3cmd) -c #{config} sync #{local} s3://#{s3_bucket}/#{s3_path} --acl-public ]

# List the files under the local source directory and map each one to its S3 key
files = %x[ cd #{local} && find . -type f ].split("\n").map do |f|
  s3_path + f[2..-1]
end

%x[ ruby #{invalidate} #{files.join(' ')} ]
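
The destination handling and the file list in aws_cf_sync.rb can also be written in plain Ruby, without shelling out to find. What follows is only a sketch, not a drop-in replacement: split_destination, s3_keys and invalidate_command are hypothetical helper names, and Shellwords is used on the assumption that file names may contain spaces, a case the script above does not handle.

```ruby
require 'find'
require 'shellwords'

# Split "bucket/some/prefix" into the bucket name and a key prefix that
# ends with "/" (the prefix is empty when only a bucket is given).
def split_destination(dest)
  bucket, *rest = dest.split('/')
  prefix = rest.empty? ? '' : rest.join('/') + '/'
  [bucket, prefix]
end

# Walk local_dir and return the S3 key of every regular file,
# i.e. the prefix followed by the path relative to local_dir.
def s3_keys(local_dir, prefix)
  root = File.expand_path(local_dir)
  Find.find(root)
      .select { |path| File.file?(path) }
      .map    { |path| prefix + path[(root.length + 1)..-1] }
      .sort
end

# Build the command line for the invalidation script, escaping each
# argument so that names containing spaces survive the shell.
def invalidate_command(script, keys)
  Shellwords.join(['ruby', script, *keys])
end
```

For example, split_destination('mybucket/blog') returns the bucket 'mybucket' and the prefix 'blog/', which matches what the script computes for s3_bucket and s3_path.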

To run aws_cf_sync.rb, s3cmd has to be installed. On OS X, you can install it easily with Homebrew:

$ brew install s3cmd

Now, create an s3.config file (God I hate these) with the s3cmd --configure command, or copy the following configuration. Don't forget to set your own AWS credentials (the access_key and secret_key lines).

[default]
access_key = S3_ACCESS_KEY
secret_key = S3_SECRET_KEY
acl_public = False
bucket_location = US
cloudfront_host = cloudfront.amazonaws.com
cloudfront_resource = /2008-06-30/distribution
default_mime_type = binary/octet-stream
delete_removed = False
dry_run = False
encoding = UTF-8
encrypt = False
force = False
get_continue = False
gpg_command = None
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase = 
guess_mime_type = True
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
human_readable_sizes = False
list_md5 = False
preserve_attrs = True
progress_meter = True
proxy_host = 
proxy_port = 0 
recursive = False
recv_chunk = 4096
send_chunk = 4096
simpledb_host = sdb.amazonaws.com
skip_existing = False
urlencoding_mode = normal
use_https = False
verbosity = WARNING

You can test your s3cmd configuration by typing this command to list all your buckets:

$ s3cmd -c s3.config ls

Cache invalidation

The sync script invalidates the cache of the published objects by calling this script:

require 'rubygems'
require 'hmac-sha1'
require 'net/https'
require 'base64'

s3_access='S3_ACCESS_KEY'
s3_secret='S3_SECRET_KEY'
cf_distribution='CLOUDFRONT_DISTRIBUTION_ID'

if ARGV.length < 1
  puts "usage: aws_cf_invalidate.rb file1.html dir1/file2.jpg ..."
  exit
end

paths = '<Path>/' + ARGV.join('</Path><Path>/') + '</Path>'

date = Time.now.utc
date = date.strftime("%a, %d %b %Y %H:%M:%S %Z")
digest = HMAC::SHA1.new(s3_secret)
digest << date

uri = URI.parse('https://cloudfront.amazonaws.com/2010-08-01/distribution/' + cf_distribution + '/invalidation')

req = Net::HTTP::Post.new(uri.path)
req.initialize_http_header({
  'x-amz-date' => date,
  'Content-Type' => 'text/xml',
  'Authorization' => "AWS %s:%s" % [s3_access, Base64.encode64(digest.digest)]
})

req.body = "<InvalidationBatch>" + paths + "<CallerReference>ref_#{Time.now.utc.to_i}</CallerReference></InvalidationBatch>"

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
# Verify the server certificate, since the request is signed with your AWS credentials
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
res = http.request(req)

puts res.code
puts res.body

Note that you have to set your CloudFront distribution ID and your AWS credentials in the previous script too. Sorry for that, but I didn't feel like refactoring around this painful s3cmd config file.
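
For reference, the signing and body-building steps of the invalidation script can be isolated into small functions. This is a minimal sketch, not the code I actually run: it assumes OpenSSL from Ruby's standard library in place of the hmac-sha1 gem, and cloudfront_signature, xml_escape and invalidation_body are hypothetical names. It also escapes XML special characters in the paths, which the script above does not do.

```ruby
require 'openssl'

# Base64-encoded HMAC-SHA1 of the date string, keyed with the AWS secret.
# pack('m0') encodes to Base64 without adding any line breaks.
def cloudfront_signature(secret_key, date)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret_key, date)
  [digest].pack('m0')
end

# Escape the characters that are special inside XML text nodes.
def xml_escape(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Build the InvalidationBatch document for a list of object keys.
# Keys are given without a leading slash, as in the script above.
def invalidation_body(keys, caller_reference)
  paths = keys.map { |key| "<Path>/#{xml_escape(key)}</Path>" }.join
  '<InvalidationBatch>' + paths +
    "<CallerReference>#{xml_escape(caller_reference)}</CallerReference>" \
    '</InvalidationBatch>'
end
```

The signature would go in the Authorization header as "AWS access_key:signature", and the body would be assigned to req.body, exactly as in the script.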

Summary and publishing

In your Jekyll working directory, you should now have these files:

Rakefile: rake tasks to easily manage your site
aws_cf_sync.rb: script that syncs your local files with your S3 bucket
aws_cf_invalidate.rb: script that invalidates the cache of the updated files
s3.config: the s3cmd configuration, including your AWS credentials

Before publishing, I suggest you add these exclusions to your _config.yml file so that the scripts and credentials are not copied into _site:

exclude: [ Rakefile, aws_cf_sync.rb,
           aws_cf_invalidate.rb, s3.config ]

If you've been a good reader and followed every instruction, all you have to do to publish your website on your S3 bucket and invalidate the CloudFront cache is:

$ rake publish

You can now fire up your browser and refresh frantically until you see your changes.