How to Cache with CloudFront

Krisztina Szilvasi, Istvan Szukacs·16-Jan-2023

Caching is a common technique used to improve the performance and scalability of web applications. By storing frequently accessed data in temporary storage locations, applications can retrieve data more quickly and reduce the load on their servers. AWS CloudFront, a content delivery network (CDN) service, offers a range of features for caching and delivering content to users. In this article, we will explore the benefits of caching with CloudFront, the major traits of cache behaviors and policies, and go through how to implement different caching durations for index pages and the rest of the pages.

Benefits of Caching

Caching has several benefits for a web application. In addition to browser caching, CloudFront stores frequently accessed data in edge locations, which are closer to the users, so that data can be retrieved more quickly. Because CloudFront can serve more requests from its edge caches, it forwards fewer viewer requests to the origin to obtain the latest objects. This reduces the load on the origin servers and improves the performance of the application, leading to a better user experience and potentially higher search engine rankings. Caching can also reduce bandwidth costs, as the content does not need to travel across the whole network for every request. Overall, caching in CloudFront improves the speed, scalability, and cost-effectiveness of a web application.

Cache Behaviors

An AWS CloudFront cache behavior determines how CloudFront handles requests for objects. It specifies which HTTP methods CloudFront accepts and forwards to the origin. For example, for a typical static site, only the GET, HEAD, and OPTIONS methods are needed. CloudFront also lets developers choose which HTTP methods are cached in edge locations. Furthermore, path patterns can be used to apply different caching times to different types of objects. For example, videos and images can often be cached for longer periods than text. Shorter caching times are useful for index.html files, because they reference CSS or JS files whose asset IDs change after every update.

A CloudFront distribution can have one or more cache behaviors. It must have a default cache behavior, and ordered cache behaviors can be specified in addition. CloudFront evaluates cache behaviors from top to bottom in order of precedence and applies the first one whose path pattern matches the request. The topmost cache behavior has precedence 0. The default cache behavior is always evaluated last, so it has the highest precedence value.
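As a rough illustration of why the order matters, the fragment below sketches two ordered cache behaviors inside a distribution. The path patterns, the origin ID, and the two cache policies named thumbnails and images are hypothetical and only serve the example. The more specific pattern is listed first; if the two blocks were swapped, "/images/*" would match thumbnail requests before the thumbnail rule was ever reached.

```hcl
# Fragment of an aws_cloudfront_distribution resource. Other required
# arguments (origin, restrictions, viewer_certificate, default behavior)
# are left out of this sketch.

# Listed first, so it gets precedence 0 and is evaluated first.
ordered_cache_behavior {
  path_pattern           = "/images/thumbnails/*" # hypothetical pattern
  allowed_methods        = ["GET", "HEAD"]        # methods forwarded to the origin
  cached_methods         = ["GET", "HEAD"]        # methods whose responses are cached
  target_origin_id       = "example-origin"       # hypothetical origin ID
  viewer_protocol_policy = "redirect-to-https"
  cache_policy_id        = aws_cloudfront_cache_policy.thumbnails.id # hypothetical policy
}

# Listed second, so it gets precedence 1. It only sees requests that the
# behavior above did not match.
ordered_cache_behavior {
  path_pattern           = "/images/*"            # hypothetical pattern
  allowed_methods        = ["GET", "HEAD"]
  cached_methods         = ["GET", "HEAD"]
  target_origin_id       = "example-origin"       # hypothetical origin ID
  viewer_protocol_policy = "redirect-to-https"
  cache_policy_id        = aws_cloudfront_cache_policy.images.id # hypothetical policy
}
```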

Cache Policy

When attached to a cache behavior, a cache policy indicates the values included in the cache key, which CloudFront uses to locate objects in its cache. These values can include HTTP headers, cookies, and URL query strings. The cache policy also sets the default, minimum, and maximum TTL values for how long objects should stay in the CloudFront cache. Essentially, the TTL determines caching in CloudFront edge locations, not browser caching. So if the default TTL is set to 3600 seconds, it does not necessarily mean that the browser will also cache objects for the same amount of time.
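To make the cache key idea more concrete, here is a minimal sketch of a cache policy that adds one query string to the cache key. The policy name, the TTL values, and the lang query string are assumptions made up for this example and are not used on our site. With such a policy, /page?lang=en and /page?lang=hu would be cached as two separate objects, while other query strings would be ignored.

```hcl
# Hypothetical policy for illustration only.
resource "aws_cloudfront_cache_policy" "example-with-query-string" {
  name        = "example-with-query-string"
  comment     = "Cache key includes the lang query string"
  default_ttl = 3600  # 1 hour, used when the origin sends no caching headers
  max_ttl     = 86400 # 1 day
  min_ttl     = 0

  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config {
      cookie_behavior = "none" # cookies are not part of the cache key
    }
    headers_config {
      header_behavior = "none" # headers are not part of the cache key
    }
    query_strings_config {
      query_string_behavior = "whitelist" # only the listed query strings count
      query_strings {
        items = ["lang"] # hypothetical query string name
      }
    }
  }
}
```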

Response Header Policy

When attached to a cache behavior, a response headers policy controls the HTTP headers included in CloudFront's responses to requests that match the cache behavior. CloudFront adds or removes headers based on the configuration of the response headers policy. Custom headers can be used to set how long the browser should cache a response. The max-age directive of the Cache-Control header specifies how many seconds a response remains fresh. It's important to note that the elapsed time is calculated from the time the response was generated on the origin server, rather than the time it was received.

Cache-Control: max-age=604800

To put it very simply: the TTL in the cache policy specifies how long an object stays in the CloudFront cache, and the Cache-Control header set in the response headers policy indicates how long an object is cached by the browser. However, it is a bit more complicated than that: the relationship between the max-age and the TTLs plays an important role in both browser and CloudFront caching. For all the details, see this page of the AWS documentation.

Use Case

For our website www.datadeft.eu, we wanted to have the content cached in the browser and in CloudFront for a week, except for the index pages, where the target is only 60 seconds. So we are going to create two cache policies and two response headers policies. Then we are going to attach the 7-day policies to the default cache behavior of the CloudFront distribution and the 60-second policies to an ordered cache behavior. We are going to complete this task using Terraform in this example, but naturally, the same can be done using the AWS Console or the AWS CLI. We already discussed the benefits of using Terraform in our previous article.

This cache policy determines that by default all objects are stored in the CloudFront cache for 7 days (minimum 1 day and maximum 1 month). This is going to be the default cache policy, attached to the default cache behavior.

resource "aws_cloudfront_cache_policy" "caching-7-days" {
  name        = "caching-7-days"
  comment     = "Caching objects for 7 days"
  default_ttl = 604800  # 1 week
  max_ttl     = 2630000 # 1 month
  min_ttl     = 86400   # 1 day
  parameters_in_cache_key_and_forwarded_to_origin {
    enable_accept_encoding_brotli = true
    enable_accept_encoding_gzip   = true
    cookies_config {
      cookie_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "none"
    }
  }
}

This response headers policy indicates that the browser stores all responses for 7 days. It is going to be the default for our site, attached to the default cache behavior.

resource "aws_cloudfront_response_headers_policy" "max-age-7-days" {
  name    = "max-age-7-days"
  comment = "Max-Age=604800"
  security_headers_config {
  }
  custom_headers_config {
    items {
      header   = "Cache-Control"
      override = true
      value    = "Max-Age=604800"
    }
  }
}

In our case, min_ttl is 86400, default_ttl and max-age are both 604800, and max_ttl is 2630000, so the following holds: minimum TTL < max-age < maximum TTL. Since this condition is met, both CloudFront and the browser cache objects for 604800 seconds. Other rules apply depending on the relationship between the TTLs and the max-age. For example, if the minimum TTL were higher than the max-age, CloudFront would cache objects for the minimum TTL. For all the details, see this page of the AWS documentation.
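The rule described above can be sketched as a simple clamp of the max-age into the range between the minimum and maximum TTL. The snippet below models it with plain Terraform expressions and creates no AWS resources. It is a simplification that assumes a minimum TTL above zero and a response that carries only a max-age directive; other directives such as s-maxage or no-cache, and the Expires header, are not covered. The local names are made up for this example.

```hcl
# Sketch only: models how long CloudFront keeps an object, given the
# TTLs of the cache policy and the max-age it sees on the response.
locals {
  min_ttl = 86400   # 1 day, as in caching-7-days
  max_ttl = 2630000 # about 1 month, as in caching-7-days
  max_age = 604800  # value of the Cache-Control max-age directive

  # Raise max-age to at least min_ttl, then cap it at max_ttl.
  effective_ttl = min(max(local.max_age, local.min_ttl), local.max_ttl)
}

output "effective_ttl" {
  value = local.effective_ttl # 604800 with the values above
}
```

Changing max_age to 60 in this sketch gives 86400, which matches the case where the minimum TTL is higher than the max-age.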

The following cache policy and response headers policy contain the values that determine the caching for the index pages. These are going to be attached to the ordered cache behavior.

resource "aws_cloudfront_cache_policy" "caching-60-seconds" {
  name        = "caching-60-seconds"
  comment     = "Caching objects for 60 seconds"
  default_ttl = 60
  max_ttl     = 60
  min_ttl     = 60
  parameters_in_cache_key_and_forwarded_to_origin {
    enable_accept_encoding_brotli = true
    enable_accept_encoding_gzip   = true
    cookies_config {
      cookie_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "none"
    }
  }
}

resource "aws_cloudfront_response_headers_policy" "max-age-60-seconds" {
  name    = "max-age-60-seconds"
  comment = "Max-Age=60"
  security_headers_config {
  }
  custom_headers_config {
    items {
      header   = "Cache-Control"
      override = true
      value    = "Max-Age=60"
    }
  }
}

The following code snippet shows how the cache behaviors are attached to the CloudFront distribution, and how the policies are attached to the cache behaviors. The ordered cache behavior uses the cache policy and response headers policy that specify 1 minute of caching, so both the browser and CloudFront cache the matching objects for 60 seconds. path_pattern = "*/" indicates that this rule applies to index pages only. As this cache behavior has precedence 0, meaning that no other behavior precedes it, it is evaluated first. The default cache behavior uses the cache policy and response headers policy that specify 7 days of caching, for both the browser and CloudFront. As it does not have a path_pattern attribute, it applies to all objects that are not matched by a previous cache behavior.

# Terraform resource names may only contain letters, digits, underscores,
# and dashes, so the dots of the domain name are replaced with dashes.
resource "aws_cloudfront_distribution" "www-datadeft-eu" {
  ...
  ordered_cache_behavior { # precedence 0
    viewer_protocol_policy     = "allow-all"
    path_pattern               = "*/" # all index pages
    allowed_methods            = ["GET", "HEAD"]
    cached_methods             = ["GET", "HEAD"]
    cache_policy_id            = "68482be9-d235-4bf5-bf0a-17775cb433b1" # caching-60-seconds
    response_headers_policy_id = "53f59fb9-952b-43e4-a688-47bdf56a5388" # max-age-60-seconds
    compress                   = true
    target_origin_id           = "S3-${var.bucket-name}"
  }
  default_cache_behavior { # precedence 1
    viewer_protocol_policy     = "redirect-to-https"
    allowed_methods            = ["GET", "HEAD", "OPTIONS"]
    cached_methods             = ["GET", "HEAD", "OPTIONS"]
    cache_policy_id            = "fed3ebc7-63ac-40f8-8914-7fa9d5e285b2" # caching-7-days
    response_headers_policy_id = "2ee3b1a3-67d4-4bf2-80a1-222cd0d680d0" # max-age-7-days
    compress                   = true
    target_origin_id           = "S3-${var.bucket-name}"
  }
  ...
}
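The policy IDs above are copied in by hand. As an alternative sketch, assuming the policies and the distribution are managed in the same Terraform configuration, the IDs could be taken from the policy resources defined earlier, so that Terraform resolves them and orders the creation of the resources itself. The fragment below shows only the ordered cache behavior; the default cache behavior could be changed the same way.

```hcl
# Fragment of the aws_cloudfront_distribution resource. It assumes that the
# caching-60-seconds and max-age-60-seconds policies shown earlier live in
# the same Terraform configuration as the distribution.
ordered_cache_behavior {
  viewer_protocol_policy     = "allow-all"
  path_pattern               = "*/" # all index pages
  allowed_methods            = ["GET", "HEAD"]
  cached_methods             = ["GET", "HEAD"]
  cache_policy_id            = aws_cloudfront_cache_policy.caching-60-seconds.id
  response_headers_policy_id = aws_cloudfront_response_headers_policy.max-age-60-seconds.id
  compress                   = true
  target_origin_id           = "S3-${var.bucket-name}"
}
```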

After applying these configurations via Terraform, let us check whether it really works. First, I will try the index page of our blog section.

curl -I https://www.datadeft.eu/blog/
HTTP/2 200
content-type: text/html
content-length: 13168
date: Mon, 09 Jan 2023 13:54:45 GMT
last-modified: Thu, 15 Dec 2022 11:56:07 GMT
etag: "80d81d7df39aa90071ae60a286e56b69"
server: AmazonS3
vary: Accept-Encoding
x-cache: Hit from cloudfront
via: 1.1 bef2aa0a3399e7cf217d61d0ac883834.cloudfront.net (CloudFront)
x-amz-cf-pop: BUD50-C1
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: 6ge9_DFm-SO4w2UximqlWCywj-wvFivRaIrFegRdzJh6jDOHL4RpHw==
age: 9
cache-control: Max-Age=60

It works as expected. Now let's try another page.

curl -I https://www.datadeft.eu/services/data-engineering/
HTTP/2 302
x-amz-error-code: Found
x-amz-error-message: Resource Found
location: /services/data-engineering/
date: Mon, 09 Jan 2023 13:54:05 GMT
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 8d1d469965b7983f5b93251c439f9c4c.cloudfront.net (CloudFront)
x-amz-cf-pop: BUD50-C1
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: 1pPqUBomfauHCwMOigkBudUI4rP-fMEtR4k94oi_QUBZrE6TOWt8-w==
cache-control: Max-Age=604800

In conclusion, using caching with AWS CloudFront can greatly improve the performance, scalability, and cost-effectiveness of a web application. By configuring cache behaviors, policies, and headers, developers can control how CloudFront delivers content to users and optimize the user experience. Whether you are looking to cache static or dynamic content, or target specific viewer groups, CloudFront provides a range of features to suit your needs. By implementing these features effectively, you can enhance the performance of your site and potentially improve your search engine rankings.