Notes on Kubernetes setup with Terraform on Amazon EKS

Terraform

  • Terraform may not garbage-collect AWS load balancers and security groups that are created as side effects of EKS operation (e.g., creating a Service of type LoadBalancer causes EKS to create an actual ELB for you). During terraform destroy, these may have to be deleted manually to unblock a full cleanup.
  • It IS possible to give Terraform control of manifests and Helm resources inside the cluster. This requires a hack: run heptio-authenticator-aws from an external script to fetch an authentication token, then feed that token to the “kubernetes” provider. Eventually Terraform might gain support for exec-based authentication, which would make this smoother.
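A minimal sketch of such an external script, assuming Terraform's external data source, which passes its query map to the program as JSON on stdin and expects a flat JSON object of string values on stdout. The query key cluster_name and the output key token are hypothetical names. The parsing also assumes heptio-authenticator-aws prints either an ExecCredential JSON document (token under status.token) or a bare token string, depending on version; check what your version actually emits.

```python
#!/usr/bin/env python3
"""Hypothetical helper for Terraform's "external" data source.

Reads {"cluster_name": "..."} on stdin, runs heptio-authenticator-aws,
and writes {"token": "..."} to stdout.
"""
import json
import subprocess
import sys


def extract_token(authenticator_output: str) -> str:
    """Return the bearer token from heptio-authenticator-aws output.

    Assumption: some versions print an ExecCredential JSON document with the
    token under status.token; others may print the bare token string.
    """
    text = authenticator_output.strip()
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON: treat the whole output as the token
    if isinstance(doc, dict) and "status" in doc:
        return doc["status"]["token"]
    return text


def main() -> None:
    query = json.load(sys.stdin)      # Terraform sends the "query" map as JSON
    cluster = query["cluster_name"]   # hypothetical query key
    result = subprocess.run(
        ["heptio-authenticator-aws", "token", "-i", cluster],
        check=True, capture_output=True, text=True,
    )
    # Terraform expects a flat JSON object whose values are all strings.
    json.dump({"token": extract_token(result.stdout)}, sys.stdout)
```

Terraform would run this through an external data source (program = ["python3", "eks_token.py"], with a hypothetical filename), and the kubernetes provider's token argument would read the token key from the data source's result. Because Terraform executes the file as a program, finish it with the usual if __name__ == "__main__": main() guard. The token is short-lived, so it only stays valid for a single plan/apply run.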

Kubernetes on EKS

  • Authentication is handled by kubectl calling the heptio-authenticator-aws command, which queries AWS for a token. Generally you are expected to create a kubectl config file for access to each new cluster. Implicitly this creates dependencies on the kubectl config (~/.kube/config) and on the AWS credentials that heptio-authenticator-aws reads (~/.aws/credentials).
  • Ingress support is extremely limited. EKS can implement Services of type LoadBalancer as ELBs, but does nothing with Ingress resources. You have to install something extra, like alb-ingress-controller, to get Ingress working. This breaks many off-the-shelf Helm charts that expect Ingress to work out of the box.
  • Helm charts seem like a busted early version of Dockerfiles.
  • I have not figured out HTTPS yet. alb-ingress-controller does allow AWS-managed certificates for SSL termination, but again, this breaks off-the-shelf Helm charts.
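For the per-cluster kubectl config file described above, here is a sketch that generates one. It assumes the exec-based credential plugin format that the early EKS documentation used with heptio-authenticator-aws; the user name "aws" and the placeholder endpoint and CA values are hypothetical. The output is JSON, which kubectl accepts because YAML is a superset of JSON.

```python
import json


def eks_kubeconfig(cluster_name: str, endpoint: str, ca_data: str) -> dict:
    """Build a kubeconfig whose user authenticates via heptio-authenticator-aws.

    endpoint and ca_data (the base64 cluster CA certificate) are values you
    would copy from the EKS console or `aws eks describe-cluster`.
    """
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {
                "server": endpoint,
                "certificate-authority-data": ca_data,
            },
        }],
        "users": [{
            "name": "aws",  # hypothetical user name
            "user": {
                "exec": {
                    # kubectl runs this command to obtain a fresh token per request.
                    "apiVersion": "client.authentication.k8s.io/v1alpha1",
                    "command": "heptio-authenticator-aws",
                    "args": ["token", "-i", cluster_name],
                },
            },
        }],
        "contexts": [{
            "name": cluster_name,
            "context": {"cluster": cluster_name, "user": "aws"},
        }],
        "current-context": cluster_name,
    }


if __name__ == "__main__":
    # Placeholder values; substitute your cluster's real endpoint and CA data.
    cfg = eks_kubeconfig("my-cluster", "https://EXAMPLE.eks.amazonaws.com", "BASE64-CA-DATA")
    print(json.dumps(cfg, indent=2))
```

Writing one file per cluster and selecting it with the KUBECONFIG environment variable keeps clusters from clobbering each other in ~/.kube/config.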

Halyard

  • Halyard’s Docker image is busted due to a typo in the URL for heptio-authenticator-aws. The fix is in my pull request.
  • Running Halyard/Spinnaker on EKS seems to be a bleeding-edge configuration with some issues. For example, kubectl proxy doesn’t seem to handle authentication or work out the correct URL paths for the Spinnaker Deck GUI front-end.

Minimum Viable Community Management Toolkit

The tools you will need to protect an open online community from abuse:

  • Abuse Detection
    • Manual & automated monitoring
    • Automated tools must be able to adapt to new forms of abuse as they emerge
    • User-visible “Report abusive content” feature
      • Fills gaps in internal content monitoring
      • Helps teach you what your specific community considers “abusive.”
    • Review queue for internal team
      • Collect & prioritize abuse reports
      • Suggest appropriate resolutions based on history of user’s past behavior
      • Must be as quick/easy as possible, since it will be scanned frequently
      • (maybe) Provide feedback to users when their reports are resolved. Be careful: this can lead to frustration if the user disagrees with your judgment.
    • Muting tools
      • Time-limited and permanent mute (read-only mode)
      • Provides “cooling-off” time without permanent harm
    • Banning tools
      • Time-limited and permanent bans
      • (maybe) “Shadowbans” to slow down adversary response
    • Reputation Database
      • Prevent abusers from returning under different names/accounts
      • Track: IP addresses, email addresses, VPNs, social network accounts, browser/device fingerprints
    • Anti-Fraud Firewall
      • Close off channels that abusers use to target the community
      • e.g., anonymizing proxies/VPNs, throwaway email providers, datacenter IP ranges, country-level blocks, rate limits
    • Identity verification to guard posting privileges
      • e.g., social network login or SMS phone line
      • Note: do not rely on Google or Facebook OAuth alone to authenticate identity. They are bad at this.

    Nice-to-have improvements:

    • Honeypot / “Lightning Rod”
      • Divert troublemakers to a well-confined area
    • Pro-active detection & response
      • Look for signs of incoming abuse before it happens
      • Deflect in a positive direction, or pre-emptively mute
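The Reputation Database item above can be sketched as a toy in-memory store. This is a hypothetical design, not a specific product: each account is recorded with namespaced identifiers (IP address, email, device fingerprint), and a new signup is flagged when it shares any identifier with a banned account.

```python
from __future__ import annotations

from collections import defaultdict


class ReputationDB:
    """Toy reputation store linking accounts through shared identifiers."""

    def __init__(self) -> None:
        # identifier (e.g. "ip:203.0.113.7") -> accounts seen with it
        self._accounts_by_ident: dict[str, set[str]] = defaultdict(set)
        self._banned: set[str] = set()

    def record(self, account: str, identifiers: list[str]) -> None:
        """Remember every identifier observed for an account."""
        for ident in identifiers:
            self._accounts_by_ident[ident].add(account)

    def ban(self, account: str) -> None:
        self._banned.add(account)

    def linked_banned_accounts(self, identifiers: list[str]) -> set[str]:
        """Return banned accounts sharing any identifier with a new signup."""
        hits: set[str] = set()
        for ident in identifiers:
            # .get() avoids inserting empty entries into the defaultdict
            hits |= self._accounts_by_ident.get(ident, set()) & self._banned
        return hits
```

Because identifiers like IP addresses are often shared (NAT, mobile carriers), a match is probably best used to route the signup into the review queue rather than to ban automatically.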

New Domain Parking / Set-Up Notes

Steps to “park” a new domain with email and HTTP service. Total cost is ~$12/year, assuming you already have a web server set up.

Domain Registration and DNS

  • Register domain with Amazon Route53 ($12/year for .com)
    • Delete the public “Hosted Zone” ($6/year) that Route53 creates automatically at registration, since CloudFlare will be used for hosting DNS
    • No Route53 Hosted Zone is necessary, unless you want to run a VPC with its own private view of the domain, in which case there needs to be a private Hosted Zone.
  • Create CloudFlare free-tier account for DNS hosting
    • In Route53, change the domain’s name servers to the ones CloudFlare assigns
    • CloudFlare settings that you might want to adjust:
      • Crypto/SSL Policy: see below
      • Always Use HTTPS: On (unless you need fine-grained control over HTTP→HTTPS redirection)

HTTPS

Assume you have a web server that will respond to HTTP requests on the new domain.

  • Option 1: Direct Connection (CloudFlare ingress and SSL termination, but no SSL to the origin)
    • Use a single-host A/CNAME record in CloudFlare
    • CloudFlare will handle SSL termination, but must be used in “Flexible” crypto mode which reverts to HTTP when talking to the origin server.
  • Option 2a: Proper SSL Setup with AWS load balancer (~$240/year) and its built-in certificate
    • Create an EC2 load balancer with a certificate appropriate for the domain
    • Use a CNAME record in CloudFlare pointing to the load balancer’s dualstack.my-lb-1234566-... DNS name
    • Now you can enable CloudFlare’s “Full” crypto mode
  • Option 2b: Proper SSL Setup with Let’s Encrypt (free)
    • TBD – needs some kind of containerized HTTP server that updates the certificate automatically

Email Forwarding

It is important to be able to receive email addressed to [email protected], for example to respond to verification emails for future domain transfers or SSL certificate issuance.

Email forwarding can be set up for free using Mailgun:

  • Create Mailgun free-tier account on the top-level domain
  • Add the DNS records that Mailgun requires at CloudFlare (the domainkey TXT record and the MX records)
  • In Mailgun’s “Routes” panel, create a rule that matches incoming email to [email protected] and forwards it as necessary
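If you would rather script the Routes step than click through the panel, here is a sketch using Mailgun's v3 routes endpoint with only the standard library. The API key and both email addresses are hypothetical placeholders. The function only builds the request; passing it to urllib.request.urlopen would actually create the route.

```python
import base64
import urllib.parse
import urllib.request

MAILGUN_ROUTES_URL = "https://api.mailgun.net/v3/routes"


def build_forward_route(api_key: str, recipient: str, forward_to: str) -> urllib.request.Request:
    """Build (but do not send) a Mailgun "create route" request."""
    form = urllib.parse.urlencode([
        ("priority", "0"),
        ("description", f"Forward {recipient}"),
        # Route filter: match mail addressed to this recipient.
        ("expression", f'match_recipient("{recipient}")'),
        # Actions: forward the message, then stop evaluating other routes.
        ("action", f'forward("{forward_to}")'),
        ("action", "stop()"),
    ]).encode()
    # Mailgun uses HTTP Basic auth with the literal user name "api".
    auth = base64.b64encode(f"api:{api_key}".encode()).decode()
    return urllib.request.Request(
        MAILGUN_ROUTES_URL,
        data=form,
        method="POST",
        headers={"Authorization": f"Basic {auth}"},
    )
```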

Email Reception

If you actually want to receive (not just forward) incoming email, either use Gmail on the domain, or the following (nearly-free) AWS system:

  • In Amazon SES, add and verify the domain
    • This will require adding a few more records at CloudFlare, including MX records
  • Set up an SES rule to accept incoming email and store messages in S3
  • Use a small script to poll S3 for new messages and deliver them via procmail
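Here is a minimal sketch of such a polling script. It assumes the SES rule stores each raw message as one S3 object under a known prefix, and that s3 is a boto3 client (boto3.client("s3")); only list_objects_v2, get_object and delete_object are used. Pagination is ignored, so one call handles at most the first 1000 objects.

```python
import subprocess
from typing import Callable


def deliver_new_messages(s3, bucket: str, prefix: str,
                         deliver: Callable[[bytes], None]) -> int:
    """Deliver each stored message under prefix, then delete it from S3.

    Returns the number of messages delivered. If deliver() raises, the
    current object is left in the bucket so it can be retried next run.
    """
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    count = 0
    for obj in listing.get("Contents", []):
        key = obj["Key"]
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        deliver(raw)                              # deliver first...
        s3.delete_object(Bucket=bucket, Key=key)  # ...then remove from S3
        count += 1
    return count


def procmail_deliver(raw: bytes) -> None:
    """Pipe one raw email message into procmail; raise if it fails."""
    subprocess.run(["procmail"], input=raw, check=True)
```

A cron job could then call deliver_new_messages(boto3.client("s3"), "my-mail-bucket", "inbox/", procmail_deliver), where the bucket and prefix names are hypothetical. Passing the client and delivery function in as parameters also makes the loop easy to test without AWS.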

Cloudflare Cookie Caching Confusion

Recently I discovered a surprising behavior of Cloudflare that affects sites that use HTTP cookies to hold user access tokens:

CloudFlare’s non-Enterprise service tiers do not respect the “Vary: Cookie” HTTP header.

This impacts sites that:

  • Are relying on CloudFlare’s “Cache Everything” option
  • Expose both “logged in” and “not logged in” versions of the same URL, where the page content differs for logged-in and anonymous visitors
  • Use cookies to differentiate between logged-in and anonymous visitors

Under these conditions, a logged-in user who visits a URL that is also visible to anonymous traffic will sometimes be served the “not logged in” version of a page.

This is confusing because the user will not see their normal posting controls, and they may be asked to log in again unnecessarily.

This problem won’t affect a site where logged-in users always see different URLs, for example a banking or webmail service where anonymous visitors will never see the same pages as logged-in users.

The problem does affect sites like discussion forums, where anonymous visitors have read-only access to the same pages where logged-in users hang out.

Why this happens

For URLs that are visible (in different forms) to logged-in and anonymous visitors, it is important for your web server to return a Vary: Cookie HTTP header, to tell caches like CloudFlare that a different version of the page will be returned depending on what cookies the browser sends.

Of course, pages visible only to logged-in users should also send a Cache-Control: private header, to prevent their content from being exposed to anyone other than the user who requested them.
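A sketch of the header logic described above, as a plain function independent of any web framework. The parameter names and the max-age value for anonymous pages are illustrative assumptions.

```python
from __future__ import annotations


def cache_headers(page_is_shared: bool, logged_in: bool) -> dict[str, str]:
    """Choose caching headers for one response (illustrative, not a framework API).

    page_is_shared: the URL is also served, in another form, to anonymous visitors.
    logged_in:      this request carried a valid session cookie.
    """
    headers: dict[str, str] = {}
    if page_is_shared:
        # The body depends on the Cookie header, so ask caches to key on it too.
        headers["Vary"] = "Cookie"
    if logged_in:
        # Personalized content should not be stored by shared caches at all.
        headers["Cache-Control"] = "private"
    else:
        # Anonymous views may be cached; max-age=300 is an arbitrary example.
        headers["Cache-Control"] = "public, max-age=300"
    return headers
```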

CloudFlare is usually very good at obeying the letter of the relevant web standards, but in this case, it will ignore Vary: Cookie. This means that anonymous hits will poison the cache for logged-in users. CloudFlare will serve up the anonymous version of the page even though the browser is sending a valid logged-in cookie.

Mitigations

One cannot rely on Cache-Control: public/private alone to fix the problem, because both anonymous and logged-in hits have the exact same cache key. CloudFlare properly declines to cache responses marked Cache-Control: private, but as soon as it sees a Cache-Control: public response, it will use that response to fulfill subsequent hits on the same key.
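The sequence of events can be reproduced with a toy model. This is not CloudFlare's actual implementation; it only encodes the behavior described here: the cache key is the URL alone, Vary: Cookie is ignored, private responses are never stored, and public responses are. The forum origin and cookie value are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional


class UrlKeyedCache:
    """Toy shared cache that keys only on the URL, i.e. it ignores Vary: Cookie."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def fetch(self, url: str, cookie: Optional[str],
              origin: Callable[[str, Optional[str]], tuple[str, str]]) -> str:
        if url in self.store:
            return self.store[url]          # hit: the Cookie header is never consulted
        body, cache_control = origin(url, cookie)
        if "private" not in cache_control:  # private responses are not stored
            self.store[url] = body
        return body


def forum_origin(url: str, cookie: Optional[str]) -> tuple[str, str]:
    """Hypothetical forum page that differs for logged-in visitors."""
    if cookie == "session=valid":
        return f"{url}: logged-in view with posting controls", "private"
    return f"{url}: anonymous read-only view", "public"


cache = UrlKeyedCache()
print(cache.fetch("/thread/1", "session=valid", forum_origin))  # logged-in view, not stored
print(cache.fetch("/thread/1", None, forum_origin))             # anonymous view, stored
print(cache.fetch("/thread/1", "session=valid", forum_origin))  # anonymous view from cache
```

The third request is the poisoned case: a valid session cookie still gets the anonymous page, because one public response was enough to fill the only cache slot for that URL.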

Proper fixes include:

  • Use a Page Rule to disable caching on the affected URLs. Of course, this means you do not benefit from the cache anymore.
  • Find a way to ensure logged-in users always view different URLs than anonymous traffic.
  • Upgrade to CloudFlare’s Enterprise service tier.

Cost Disease

From the article (not my own words):

“If some government program found a way to give poor people good health insurance for a few hundred dollars a year, college tuition for about a thousand, and housing for only two-thirds what it costs now, that would be the greatest anti-poverty advance in history. That program is called ‘having things be as efficient as they were a few decades ago’.”

Read more at Philip Greenspun’s blog:
https://blogs.harvard.edu/philg/2017/03/04/another-economic-sourpuss/

The original source is here:
https://slatestarcodex.com/2017/02/09/considerations-on-cost-disease/

Crude Linear Models Almost Always Outperform Human Judgment

Fascinating article – from 1979 – on how even a crudely constructed linear model is almost always superior to human expert judgment on classification tasks:

Robyn M. Dawes, “The Robust Beauty of Improper Linear Models in Decision Making”

Is there still any role for people in decision making?

But people are important. The statistical model may integrate the information in an optimal manner, but it is always the individual (judge, clinician, subjects) who chooses variables. Moreover, it is the human judge who knows the directional relationship between the predictor variables and the criterion of interest, or who can code the variables in such a way that they have clear directional relationships.

Now consider how good machine learning and AI algorithms have become at these parts of the task! Soon it might become optimal to delegate the whole process to machines.
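The division of labor Dawes describes can be sketched with one of the improper models the paper discusses: equal (unit) weights on standardized predictors. A human chooses the predictors and the sign of each one's relationship to the criterion, and the model does the integration without fitting any weights to outcome data. The applicant data below is hypothetical, and the code assumes no predictor is constant (a zero standard deviation would divide by zero).

```python
from __future__ import annotations

import statistics


def unit_weighted_scores(rows: list[list[float]], signs: list[int]) -> list[float]:
    """Score cases with an "improper" unit-weighted linear model.

    rows:  one list of predictor values per case (predictors chosen by a human).
    signs: +1 or -1 per predictor, the human-supplied direction of its
           relationship with the criterion.
    Each predictor is converted to z-scores, multiplied by its sign, and
    summed with equal weight.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        sum(s * (x - m) / sd for x, m, sd, s in zip(row, means, sds, signs))
        for row in rows
    ]


# Hypothetical applicants: [GPA, test score]; both predictors point the same way.
applicants = [[3.9, 160], [3.1, 150], [3.5, 155]]
print(unit_weighted_scores(applicants, signs=[+1, +1]))
```

The only judgment calls left to people are which columns go into rows and which entries of signs are negative, which are exactly the parts of the task the quoted passage reserves for humans.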