Publishing a Static Website on CloudFront
by Jean-Michel Lacroix
My first CloudFront article covers basic S3 and CloudFront setup, along with the DefaultRootObject, which is the key to hosting a website on Amazon's CDN.
This second post focuses on getting your website onto CloudFront, providing maintenance scripts to handle publishing and invalidation. Most of the scripts are generic, but the Rakefile targets a Jekyll-generated static site.
The Rakefile
First of all, here's my simple Rakefile. Don't try to use it yet; there's a lot of stuff missing.
task :default => :server

desc 'Start server with --auto'
task :server do
  jekyll('--server --auto')
end

desc 'Build site with Jekyll'
task :build do
  jekyll('--no-future')
end

desc 'Build and deploy'
task :publish => :build do
  bucket = 'MYBUCKET'
  puts "Publishing site to bucket #{bucket}"
  sh 'ruby aws_cf_sync.rb _site/ ' + bucket
end

def jekyll(opts = '')
  sh 'rm -rf _site/*'
  sh 'jekyll ' + opts
end
There are only three commands available:
- server: run the Jekyll server on localhost
- build: generate the website in Jekyll's _site folder
- publish: sync the _site folder to an S3 bucket and invalidate its content
The publish task is the only one I'll talk about, since the other two are really basic. Before continuing, make sure you have replaced "MYBUCKET" with your own S3 bucket.
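If you would rather not hard-code the bucket name in the Rakefile, one option is to read it from the environment. The sketch below is only an illustration: the S3_BUCKET variable and the publish_command helper are hypothetical names, not part of the original Rakefile.

```ruby
require 'shellwords'

# Hypothetical helper: builds the command line the publish task runs.
# The bucket comes from the S3_BUCKET environment variable (an assumed name),
# so the Rakefile itself never has to be edited.
def publish_command(env = ENV, site_dir = '_site/')
  bucket = env['S3_BUCKET'].to_s
  raise ArgumentError, 'set S3_BUCKET to your bucket name' if bucket.empty?

  # Shellwords.join escapes each argument for the shell.
  Shellwords.join(['ruby', 'aws_cf_sync.rb', site_dir, bucket])
end
```

Inside the publish task, calling sh publish_command would then take the place of the hard-coded line.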
Directory synchronization
I've written a script that wraps the s3cmd "sync" command with an invalidation tool to help with updates:
local   = ARGV[0]
s3_dest = ARGV[1]

if local.nil? || s3_dest.nil?
  puts "syntax aws_cf_sync.rb local_source s3_dest"
  exit 1
end

config = "#{Dir.pwd}/s3.config"
unless File.exist?(config)
  puts "please set up your s3.config file"
  exit 1
end

invalidate = "#{Dir.pwd}/aws_cf_invalidate.rb"
unless File.exist?(invalidate)
  puts "please download the aws_cf_invalidate.rb script"
  exit 1
end

# Split "bucket/optional/path" into the bucket name and a key prefix
s3_dest   = s3_dest.split('/')
s3_bucket = s3_dest.shift
s3_path   = s3_dest.join('/')
s3_path  += '/' unless s3_dest.length == 0

# Upload new and changed files, making them publicly readable
%x[ $(which s3cmd) -c #{config} sync #{local} s3://#{s3_bucket}/#{s3_path} --acl-public ]

# List every file under the local source and build the matching object keys
files = %x[ cd #{local} && find . -type f ].split("\n").map do |f|
  s3_path + f[2, f.length]
end

%x[ ruby #{invalidate} #{files.join(' ')} ]
To run this script, s3cmd has to be installed. On OS X, you can do so easily with Homebrew:
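The two bits of string handling in that script are easier to follow in isolation. Here is a minimal sketch of them as standalone functions; split_s3_dest and invalidation_keys are names I made up for illustration, and the sync script itself does not define them.

```ruby
# Splits "bucket/some/prefix" into the bucket name and a key prefix that
# ends with "/" (or is empty when no prefix is given).
def split_s3_dest(dest)
  bucket, *rest = dest.split('/')
  prefix = rest.empty? ? '' : rest.join('/') + '/'
  [bucket, prefix]
end

# Turns `find . -type f` output ("./index.html\n./css/site.css") into the
# object keys to invalidate, relative to the bucket root.
def invalidation_keys(find_output, prefix)
  find_output.split("\n").map { |f| prefix + f.sub(%r{\A\./}, '') }
end
```

For example, split_s3_dest('mybucket/blog') gives the bucket 'mybucket' and the prefix 'blog/', which is then prepended to every file found under the local folder.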
$ brew install s3cmd
Now, create an s3.config file (God I hate these) with the s3cmd --configure command or copy the following configuration. Don't forget to set your own AWS credentials (the access_key and secret_key entries).
[default]
access_key = S3_ACCESS_KEY
secret_key = S3_SECRET_KEY
acl_public = False
bucket_location = US
cloudfront_host = cloudfront.amazonaws.com
cloudfront_resource = /2008-06-30/distribution
default_mime_type = binary/octet-stream
delete_removed = False
dry_run = False
encoding = UTF-8
encrypt = False
force = False
get_continue = False
gpg_command = None
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = True
host_base = s3.amazonaws.com
host_bucket = %(bucket)s.s3.amazonaws.com
human_readable_sizes = False
list_md5 = False
preserve_attrs = True
progress_meter = True
proxy_host =
proxy_port = 0
recursive = False
recv_chunk = 4096
send_chunk = 4096
simpledb_host = sdb.amazonaws.com
skip_existing = False
urlencoding_mode = normal
use_https = False
verbosity = WARNING
You can test your s3cmd configuration by typing this command to list all your buckets:
$ s3cmd -c s3.config ls
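If the listing fails, the usual culprit is a credential that was never filled in. As a rough sanity check, and assuming the config uses only plain "key = value" lines as in the file above, something like the following could flag leftover placeholders. Both function names are hypothetical, and this is not a full parser for the s3cmd format.

```ruby
# Parses the simple "key = value" lines of an s3cmd config file into a Hash.
# Section headers such as [default] and blank lines are skipped.
def parse_s3_config(text)
  text.each_line.with_object({}) do |line, settings|
    key, value = line.strip.split(/\s*=\s*/, 2)
    next if value.nil? || key.start_with?('[')
    settings[key] = value
  end
end

# Returns the credential keys that are missing or still set to the
# placeholder values used in this article.
def unset_credentials(settings)
  placeholders = { 'access_key' => 'S3_ACCESS_KEY', 'secret_key' => 'S3_SECRET_KEY' }
  placeholders.select { |key, dummy| settings[key].to_s.empty? || settings[key] == dummy }.keys
end
```

Running unset_credentials(parse_s3_config(File.read('s3.config'))) from the Jekyll directory should return an empty array once both keys are set.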
Cache invalidation
The sync script invalidates the cache of the published objects by calling this script:
require 'rubygems'
require 'hmac-sha1'
require 'net/https'
require 'base64'

s3_access = 'S3_ACCESS_KEY'
s3_secret = 'S3_SECRET_KEY'
cf_distribution = 'CLOUDFRONT_DISTRIBUTION_ID'

if ARGV.length < 1
  puts "usage: aws_cf_invalidate.rb file1.html dir1/file2.jpg ..."
  exit 1
end

# One <Path> element per file, each with a leading slash
paths = '<Path>/' + ARGV.join('</Path><Path>/') + '</Path>'

# The request is signed with an HMAC-SHA1 of the date header
date = Time.now.utc
date = date.strftime("%a, %d %b %Y %H:%M:%S %Z")
digest = HMAC::SHA1.new(s3_secret)
digest << date

uri = URI.parse('https://cloudfront.amazonaws.com/2010-08-01/distribution/' + cf_distribution + '/invalidation')

req = Net::HTTP::Post.new(uri.path)
req.initialize_http_header({
  'x-amz-date' => date,
  'Content-Type' => 'text/xml',
  # strip removes the trailing newline that encode64 appends
  'Authorization' => "AWS %s:%s" % [s3_access, Base64.encode64(digest.digest).strip]
})
req.body = "<InvalidationBatch>" + paths + "<CallerReference>ref_#{Time.now.utc.to_i}</CallerReference></InvalidationBatch>"

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
# Verify the server certificate instead of skipping the check
http.verify_mode = OpenSSL::SSL::VERIFY_PEER

res = http.request(req)
puts res.code
puts res.body
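The two moving parts of that request are the XML body and the signature. The sketch below pulls them out into small functions so they can be tried without hitting the API. It is not the script's own code: the function names are invented for illustration, it swaps the hmac-sha1 gem for Ruby's bundled OpenSSL library, and it adds XML escaping that the script does not do.

```ruby
require 'openssl'

# Escapes the characters that are special in XML text nodes.
def xml_escape(text)
  text.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
end

# Builds the InvalidationBatch document for the given object keys.
# Each key gets a leading "/", as in the invalidation script.
def invalidation_body(keys, caller_reference)
  paths = keys.map { |key| "<Path>/#{xml_escape(key)}</Path>" }.join
  "<InvalidationBatch>#{paths}" \
    "<CallerReference>#{xml_escape(caller_reference)}</CallerReference>" \
    '</InvalidationBatch>'
end

# Signs the request date with HMAC-SHA1 and Base64-encodes the result.
# pack('m0') encodes without the trailing newline that Base64.encode64 adds.
def sign_date(secret, date)
  [OpenSSL::HMAC.digest('sha1', secret, date)].pack('m0')
end
```

The value returned by sign_date is what goes after the access key in the Authorization header, and the caller reference only needs to be unique per request, which is why the script uses a timestamp.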
Note that you have to set your CloudFront distribution ID and your AWS credentials in the previous script too. Sorry for that, but I didn't feel like refactoring around this painful s3cmd config file.
Summary and publishing
In your Jekyll working directory, you should now have these files:
- Rakefile: rake tasks to easily manage your site
- aws_cf_sync.rb: script that syncs your local files with your S3 bucket
- aws_cf_invalidate.rb: script that invalidates the cache of the updated files
- s3.config: the s3cmd configuration, including your AWS credentials
Before publishing, I suggest you add these exclusions in your _config.yml file:
exclude: [ Rakefile, aws_cf_sync.rb,
aws_cf_invalidate.rb, s3.config ]
If you've been a good reader and followed every instruction, all you have to do to publish your website on your S3 bucket and invalidate the CloudFront cache is:
$ rake publish
You can now fire up your browser and refresh frantically until you see your changes.