Cloudflare Cookie Caching Confusion

Recently I discovered a surprising behavior of Cloudflare that affects sites that use HTTP cookies to hold user access tokens:

CloudFlare’s non-Enterprise service tiers do not respect the “Vary: Cookie” HTTP header.

This impacts sites that:

  • Are relying on CloudFlare’s “Cache Everything” option
  • Expose both “logged in” and “not logged in” versions of the same URL, where the page content differs for logged-in and anonymous visitors
  • Use cookies to differentiate between logged-in and anonymous visitors

Under these conditions, a logged-in user who visits a URL that is also visible to anonymous traffic will sometimes be served the “not logged in” version of a page.

This is confusing because the user will not see their normal posting controls, and they may be asked to log in again unnecessarily.

This problem won’t affect a site where logged-in users always see different URLs, for example a banking or webmail service where anonymous visitors will never see the same pages as logged-in users.

The impact happens on sites like discussion forums where anonymous visitors have read-only access to the same pages where logged-in users hang out.

Why this happens

For URLs that are visible (in different forms) to logged-in and anonymous visitors, it is important for your web server to return a Vary: Cookie HTTP header, to tell caches like CloudFlare that a different version of the page will be returned depending on what cookies the browser sends.

Of course, pages visible only to logged-in users should also send a Cache-Control: private header, to prevent their content from being exposed to anyone other than the user who requested them.
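As a rough illustration of the two header rules above, here is a minimal sketch of how an origin server might choose its caching headers. The function name, its two flags, and the max-age value are my own assumptions for the example, not taken from any particular framework.

```python
def cache_headers(logged_in: bool, shared_url: bool) -> dict[str, str]:
    """Pick caching headers for one response (hypothetical helper).

    logged_in  -- the request carried a valid session cookie
    shared_url -- the URL is also visible, in another form, to anonymous visitors
    """
    headers = {}
    if shared_url:
        # The body depends on the cookies sent, so tell caches about it.
        headers["Vary"] = "Cookie"
    if logged_in:
        # Personalized content must not be stored by a shared cache.
        headers["Cache-Control"] = "private"
    else:
        # Illustrative lifetime only; pick whatever suits the site.
        headers["Cache-Control"] = "public, max-age=300"
    return headers
```

For an anonymous hit on a forum thread this yields both Vary: Cookie and a public Cache-Control header; for a logged-in hit on the same thread it yields Vary: Cookie and Cache-Control: private.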

CloudFlare is usually very good at obeying the letter of the relevant web standards, but in this case, it will ignore Vary: Cookie. This means that anonymous hits will poison the cache for logged-in users. CloudFlare will serve up the anonymous version of the page even though the browser is sending a valid logged-in cookie.

Mitigations

Cache-Control: public/private alone cannot fix the problem, because anonymous and logged-in hits share the exact same cache key. CloudFlare properly declines to cache responses marked Cache-Control: private, but as soon as it sees a Cache-Control: public response, it stores that response and uses it to fulfill subsequent hits on the same key, whatever cookies they carry.
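The poisoning described above can be reproduced with a toy model. This is only a sketch of the behavior, not CloudFlare's actual implementation: the ToyCache class, the origin function, the URL, and the cookie value are all invented for illustration.

```python
class ToyCache:
    """Toy shared cache that either honors or ignores Vary: Cookie."""

    def __init__(self, honor_vary: bool):
        self.honor_vary = honor_vary
        self.store = {}                  # cache key -> cached body
        self.varies_on_cookie = set()    # URLs whose responses said Vary: Cookie

    def _key(self, url, cookie):
        # The cookie joins the cache key only when Vary: Cookie is honored.
        if self.honor_vary and url in self.varies_on_cookie:
            return (url, cookie)
        return (url, None)

    def fetch(self, url, cookie, origin):
        key = self._key(url, cookie)
        if key in self.store:
            return self.store[key]       # cache hit
        body, headers = origin(url, cookie)
        if headers.get("Vary") == "Cookie":
            self.varies_on_cookie.add(url)
        # Responses marked private are never stored.
        if headers.get("Cache-Control", "").startswith("public"):
            self.store[self._key(url, cookie)] = body
        return body


def origin(url, cookie):
    """Pretend forum: logged-in users get posting controls."""
    if cookie == "session=alice":
        return "page with posting controls", {"Vary": "Cookie", "Cache-Control": "private"}
    return "read-only page", {"Vary": "Cookie", "Cache-Control": "public"}


ignoring = ToyCache(honor_vary=False)
ignoring.fetch("/thread/1", None, origin)                    # anonymous hit fills the cache
print(ignoring.fetch("/thread/1", "session=alice", origin))  # read-only page

honoring = ToyCache(honor_vary=True)
honoring.fetch("/thread/1", None, origin)
print(honoring.fetch("/thread/1", "session=alice", origin))  # page with posting controls
```

Note that the logged-in response is marked private in both runs. That header keeps the logged-in page out of the cache, but it does nothing to stop the cached anonymous page from being served to the logged-in user.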

Proper fixes include:

  • Use a Page Rule to disable caching on the affected URLs. Of course, this means you do not benefit from the cache anymore.
  • Find a way to ensure logged-in users always view different URLs than anonymous traffic.
  • Upgrade to CloudFlare’s Enterprise service tier.
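For the second option, one possible approach (a sketch only) is to give logged-in users their own path prefix, so that their pages never share a cache key with the anonymous versions. The "/members" prefix and the path_for helper below are hypothetical names chosen for the example.

```python
MEMBER_PREFIX = "/members"  # hypothetical prefix, not from any real site


def path_for(path: str, logged_in: bool) -> str:
    """Return the path a visitor should be linked to, given their login state."""
    has_prefix = path == MEMBER_PREFIX or path.startswith(MEMBER_PREFIX + "/")
    if logged_in and not has_prefix:
        return MEMBER_PREFIX + path
    if not logged_in and has_prefix:
        # Strip the prefix; an empty remainder means the site root.
        return path[len(MEMBER_PREFIX):] or "/"
    return path
```

Because the cache may answer a request for the anonymous path before the origin ever sees it, a redirect at the origin is not enough on its own. The links rendered for logged-in users would have to use the prefixed path from the start.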

Cost Disease

From the article (not my own words):

“If some government program found a way to give poor people good health insurance for a few hundred dollars a year, college tuition for about a thousand, and housing for only two-thirds what it costs now, that would be the greatest anti-poverty advance in history. That program is called ‘having things be as efficient as they were a few decades ago’.”

Read more at Philip Greenspun’s blog:
https://blogs.harvard.edu/philg/2017/03/04/another-economic-sourpuss/

The original source is here:
https://slatestarcodex.com/2017/02/09/considerations-on-cost-disease/

Crude Linear Models Almost Always Outperform Human Judgment

Fascinating article – from 1979 – on how even a crudely constructed linear model is almost always superior to human expert judgment on classification tasks:

Robyn M. Dawes, “The Robust Beauty of Improper Linear Models in Decision Making”
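To make "crudely constructed" concrete, here is a minimal sketch of one kind of improper model, a unit-weighted one: standardize each predictor, give it a weight of +1 or -1 according to the direction a person believes it points, and add everything up. The applicant data and variable names are invented for illustration and do not come from the paper.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a column (assumes the values are not all identical)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def unit_weight_scores(columns, signs):
    """Sum of standardized predictors, each weighted +1 or -1.

    columns -- one list of values per predictor, all the same length
    signs   -- +1 if a higher value should raise the score, -1 if it should lower it
    """
    standardized = [zscores(col) for col in columns]
    n = len(columns[0])
    return [
        sum(sign * z[i] for sign, z in zip(signs, standardized))
        for i in range(n)
    ]


# Invented data for four applicants.
test_score = [50, 60, 70, 80]
gpa = [2.0, 3.0, 3.0, 4.0]
absences = [10, 5, 6, 1]   # more absences should lower the score

scores = unit_weight_scores([test_score, gpa, absences], signs=[+1, +1, -1])
best = scores.index(max(scores))   # index of the top-ranked applicant
```

No weights are fitted to outcomes here. The only human input is the choice of predictors and the sign given to each one.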

Is there still any role for people in decision making?

But people are important. The statistical model may integrate the information in an optimal manner, but it is always the individual (judge, clinician, subjects) who chooses variables. Moreover, it is the human judge who knows the directional relationship between the predictor variables and the criterion of interest, or who can code the variables in such a way that they have clear directional relationships.

Now consider how good machine learning and AI algorithms have become at these parts of the task! Soon it might become optimal to delegate the whole process to machines.