Add Envkey to a Docker app in Kubernetes, without rebuilding the image

I’m a big fan of using Envkey to manage parameters and credentials across different services like AWS, GCP, Docker, and Kubernetes.

One minor drawback of Envkey is that the normal methods of integrating it require you to modify either your app code, to invoke the language-appropriate envkey library, or your Dockerfile, to download and run the envkey-source binary.

This makes it difficult to use community-maintained Docker images. If you want to use Envkey, you have to fork your own version of the Dockerfile, modify it to inject Envkey, then rebuild and host the modified image yourself.

On Kubernetes, however, I discovered a neat method to integrate Envkey with any unmodified Docker image. Kubernetes provides an initContainers feature that runs one or more extra containers to completion before your pod’s main containers start. We can specify an initContainer that downloads Envkey and saves the environment variables in a place where the main container can pick them up.

The Deployment specification with Envkey looks like this:

spec:
    # create a volume to share between
    # envkey and the main container
    volumes:
      - name: envkey
        emptyDir: {}
    initContainers:
      # here is the Envkey container
      - name: envkey
        image: appropriate/curl
        command:
          - "sh"
          - "-c"
          # use curl to download, install, and run envkey
          - "curl -s \
          https://raw.githubusercontent.com/envkey/envkey-source/master/install.sh \
          | /bin/sh && envkey-source > /envkey/export-env && chmod -R 777 /envkey"
          # at this point, the variables will be in a file called
          # /envkey/export-env
          # chmod/chown may be necessary depending on the user
          # account under which the main container runs.
        volumeMounts:
          - name: envkey
            mountPath: "/envkey"
        env:
          # use a Kubernetes secret to hold the ENVKEY itself
          - name: ENVKEY
            valueFrom:
              secretKeyRef:
                key: ENVKEY
                name: my-envkey
    containers:
      - name: my-main-container
        image: my-docker-image
        volumeMounts:
          - name: envkey
            mountPath: "/envkey"
        command:
          - "sh"
          - "-c"
          # note: you have to override the standard container
          # command here, to load the envkey variables first.
          # Check the Dockerfile for the appropriate CMD.
          - ". /envkey/export-env && \
          my-docker-image-entrypoint"

A couple of notes:

  • We have to load the Envkey variables into the main container’s process environment before invoking the standard entry point. This means we have to peek at the Dockerfile that was used to build the container so that we can duplicate the same command. (There does not seem to be a way to override just part of the entry point.)
  • This also requires that /bin/sh is available in the Docker container. Most public images do have this.
  • The environment variables won’t be updated while the container is running. This shouldn’t be an issue if you are following an immutable-infrastructure philosophy.
  • You need a way to carry the ENVKEY itself into the init container’s environment. A Kubernetes secret is a great way to do this.
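
For reference, here is a minimal sketch of such a secret, matching the my-envkey name and ENVKEY key used in the secretKeyRef above (the token value is a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: my-envkey
type: Opaque
stringData:
  # Placeholder; substitute the real ENVKEY token for your app environment.
  ENVKEY: "your-envkey-token-here"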

Amazon EKS Ingress Guide

This post explains how to set up Ingress for Kubernetes on Amazon EKS and make your Kubernetes services available to the internet.

What is Ingress?

Services you deploy in your Kubernetes cluster are not, by default, visible to anyone outside the cluster.

An Ingress is a special type of Kubernetes object that exposes one or more Services to the internet. It’s an abstraction that covers load balancing, HTTP routing, and SSL termination.

While many Kubernetes resources are “write once, run anywhere” on any cloud platform, Ingress behaves quite differently on different platforms. This is because each platform has its own way of hooking up Kubernetes services to the outside internet.

Most Kubernetes platforms implement some basic Ingress functionality out of the box. You can also add a third-party ingress controller to augment or replace the platform’s basic features.

In Kubernetes, an ingress controller is not the same as an Ingress resource. An ingress controller runs on your cluster, waits for you to create Ingress resources, and manages controller-specific handling for HTTP requests based on those Ingress specs.

When the controller notices an Ingress that is marked with certain special annotations, it comes to life and creates new resources to implement the ingress flow. This might include Kubernetes pods containing reverse proxies, or an external load balancer. Most controllers provide some configuration parameters in the form of controller-specific annotations you can apply to Ingress resources.

Currently, Amazon EKS ships with only a very basic ingress system. For practical purposes, you will almost certainly want to install a more powerful ingress controller.

Here are some possible choices:

Option #1: Kubernetes Service with type: LoadBalancer

This is the “native” ingress option supported by EKS. It does not actually use the Ingress resource at all. Just create a Kubernetes Service and set its type to “LoadBalancer”, and then EKS will deploy an ELB to receive traffic on your behalf.
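
A minimal sketch of such a Service (the name, label selector, and ports are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  # "LoadBalancer" tells EKS to provision an ELB for this Service
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80          # ELB-facing port
      targetPort: 8080  # container port that receives the traffic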

This approach has a few drawbacks:

  • Each Service spawns its own ELB, incurring extra cost.
  • You cannot link more than one Service to a DNS hostname, because ELBs offer no ability to route to different targets based on HTTP Host headers or request paths.
  • Most recent Kubernetes applications and Helm charts have already switched to the newer Ingress system and do not configure themselves with externally-visible Services.
  • You miss the flexibility of modern ingress controllers like Nginx, including features like automatic TLS certificate management and OAuth security mix-ins.

Option #2: alb-ingress-controller

This is a third-party project (Helm chart here) that spawns ALBs to correspond to specially-marked Ingress resources in Kubernetes. It tries to automatically manage ALB target groups and routing rules to match the specs it sees on each Ingress resource.

Advantages:

  • Works with the new Ingress resources rather than raw Services.
  • Multiple Services can share one Ingress using host-based or path-based routing, if you set up the Ingress specs manually (note that this is different from the one-Ingress-per-Service model that you will find in most Kubernetes documentation and public Helm charts).
  • Includes support for lots of tweakable options on ALBs and target groups, like security groups and health checks.

But there are some drawbacks too:

  • Creates a new ALB for each Ingress resource. So if you follow the common pattern where each Kubernetes Service has its own dedicated Ingress, then you will end up with multiple ALBs instead of one shared ALB.
  • Doesn’t always maintain the links between target groups and worker nodes properly. Sometimes it fails to get a target group into a “healthy” state on start-up, or drops live nodes from an active target group for no apparent reason. (Anecdotally, I have found the ALB ingress controller to be more reliable when its target type is set to “instance” rather than “pod”).

A brief note about health check settings on ALB target groups: by default, ALBs want to see a “200 OK” response on HTTP requests to “/” before enabling a target group. If you are just setting up your cluster, you might not yet have a service set up to respond to “/” requests. This will prevent any target group from registering as “healthy”, even if you have working endpoints on other paths. As a temporary fix, you could configure the ALB to accept “404 Not Found” as a healthy response.
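
With alb-ingress-controller, that temporary fix can be expressed as annotations on the Ingress resource. A sketch, using the controller’s documented annotation names (values are illustrative):

metadata:
  annotations:
    # path the ALB probes for target health
    alb.ingress.kubernetes.io/healthcheck-path: "/"
    # temporarily accept 404 as healthy until a service answers on "/"
    alb.ingress.kubernetes.io/success-codes: "200,404"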

Option #3: nginx-ingress-controller with ELB support

Recent versions of the standard Nginx ingress controller (Helm chart here) now have the ability to create AWS ELBs to accept traffic. I have not tried this approach because it doesn’t offer as much flexibility as alb-ingress-controller.

Note, however, that you will find many Kubernetes guides on the web that assume you are using the Nginx ingress controller, because it is platform-neutral and includes nice flexibility for routing and manipulating the traffic passing through it.

A working compromise: alb-ingress-controller + Nginx

For a practical Kubernetes cluster on Amazon, I recommend a combination of two ingress controllers: alb-ingress-controller to serve as the first hop, plus nginx-ingress-controller for final routing.

The advantage of this configuration is that you can use one ALB for the whole cluster, and still benefit from the standardized and flexible Nginx-based configuration for individual Services.

With a single ALB, you minimize the ongoing cost, plus have the ability to run multiple services on a single DNS CNAME.

To set this up, deploy both ingress controllers into Kubernetes. The standard Helm charts work fine. Then manually create a single ALB Ingress resource that deploys one ALB for the whole cluster, with all the AWS-specific settings like health checks and ACM-managed TLS certificates. This main Ingress will have only one route, which forwards all traffic to the nginx-ingress-controller Service.

Here is an example of what the Kubernetes manifest for this singleton ALB Ingress might look like:
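
(A sketch, not a drop-in manifest: the Service name nginx-ingress-controller, the certificate ARN, and the catch-all routing rule are assumptions to adapt to your cluster.)

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: main-alb-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    # placeholder ARN for the ACM-managed TLS certificate
    alb.ingress.kubernetes.io/certificate-arn: "arn:aws:acm:us-east-1:111111111111:certificate/placeholder"
spec:
  rules:
    - http:
        paths:
          # a single catch-all route: forward all traffic to Nginx
          - path: /*
            backend:
              serviceName: nginx-ingress-controller
              servicePort: 80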

The Nginx ingress controller, when used in this mode, does not create any AWS resources and is not visible to the internet. It just sits there functioning as a reverse proxy, passing requests from the ALB inward to other services in the cluster that register with it.

All Nginx configuration comes from Kubernetes Ingress resources. You can feel free to set up multiple independent services, each with its own Ingress resource, and the Nginx controller will cleverly merge their routing rules into a single Nginx configuration for its reverse-proxy pods.
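
For instance, a per-service Ingress aimed at the Nginx controller might look like this sketch (hostname and Service name are hypothetical):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app
  annotations:
    # handled by the Nginx controller, not the ALB controller
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app
              servicePort: 80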

So far I have only noticed a couple of minor issues with this approach:

  • alb-ingress-controller sometimes drops healthy targets out of a target group, as mentioned above.
  • Because there is an extra layer of proxying along the request path, you have to be careful about which HTTP headers to pass inward and outward, and how to correctly track client IP addresses and HTTP vs. HTTPS in X-Forwarded-* headers.

Important note on IAM Roles & Garbage Collection

All of the above options involve pods within the Kubernetes cluster creating AWS resources like ELBs, ALBs, and Target Groups. This has two important implications:

First, you need to give the Kubernetes workers permission to manage ELBs/ALBs, for example by attaching the necessary policies to the IAM role used by the worker machines.

Second, beware that these resources often don’t get cleaned up automatically. You will have to do some manual garbage collection from time to time. ELBs/ALBs are of particular concern because you pay every hour they are running, even if they are not receiving traffic.

Conclusion

I hope you have found this information helpful! Feel free to contact me at dmaas@maasdigital.com.

Notes on Kubernetes setup with Terraform on Amazon EKS

Terraform

  • Terraform may not garbage-collect AWS load balancers and security groups that are created as side effects of EKS operation. (e.g. creating a Service with type LoadBalancer means EKS will go and create an actual ELB for you). During terraform destroy, these might have to be deleted manually to unblock a full cleanup.
  • It IS possible to give Terraform control of manifests and Helm resources inside the cluster. This requires a hack: fetch an authentication token by running heptio-authenticator-aws as an external script, then feed that token to the “kubernetes” provider. Eventually Terraform might gain support for exec-based authentication, and then this will be smoother.

Kubernetes on EKS

  • Authentication is done by a command, heptio-authenticator-aws (which queries AWS for a token), called from kubectl. Generally you are expected to create a kubectl config file for access to each new cluster; a sketch follows this list. Implicitly this creates dependencies on kubectl (~/.kube/config) and heptio (~/.aws/credentials).
  • Ingress support is extremely limited. EKS can create Services with type LoadBalancer as ELBs, but does not do anything with Ingress resources. You have to install something extra, like alb-ingress-controller in order to get Ingress working. This breaks many off-the-shelf Helm charts that expect Ingress to work normally.
  • Helm charts seem like a busted early version of Dockerfiles.
  • I have not figured out HTTPS yet. alb-ingress-controller does allow AWS managed certificates for SSL termination. Again this breaks off-the-shelf Helm charts.
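
For reference, a sketch of the kubectl config entry mentioned above, in the exec-style format from the EKS getting-started documentation (cluster name is a placeholder):

users:
  - name: my-eks-cluster
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1alpha1
        command: heptio-authenticator-aws
        args:
          - "token"
          - "-i"
          - "my-eks-cluster"   # placeholder cluster name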

Halyard

  • Halyard’s Docker image is busted due to a typo in the URL for heptio-authenticator-aws. Fix in my pull request.
  • Running Halyard/Spinnaker on EKS seems to be a bleeding-edge configuration with some issues. kubectl proxy doesn’t seem to perform authentication or figure out URL paths for the Spinnaker Deck GUI front-end.

Minimum Viable Community Management Toolkit

The tools you will need to protect an open online community from abuse:

  • Abuse Detection
    • Manual & automated monitoring
    • Automated tools must be able to adapt to new forms of abuse as they emerge
    • User-visible “Report abusive content” feature
      • Fills gaps in internal content monitoring
      • Helps teach you what your specific community considers “abusive.”
    • Review queue for internal team
      • Collect & prioritize abuse reports
      • Suggest appropriate resolutions based on history of user’s past behavior
      • Must be as quick/easy as possible, since it will be scanned frequently
      • (maybe) Provide feedback to users when reports are resolved. Be careful, this can lead to frustration if user disagrees with your judgment.
  • Muting tools
    • Time-limited and permanent mute (read-only mode)
    • Provides “cooling-off” time without permanent harm
  • Banning tools
    • Time-limited and permanent bans
    • (maybe) “Shadowbans” to slow down adversary response
  • Reputation Database
    • Prevent abusers from returning under different names/accounts
    • Track: IP addresses, email addresses, VPNs, social network accounts, browser/device fingerprints
  • Anti-Fraud Firewall
    • Close off channels that abusers use to target the community
    • Anonymizing Proxies/VPNs, Throw-away email providers, Datacenters, Country-level blocks, rate limits
  • Identity verification to guard posting privileges
    • e.g., social network login or SMS phone line
    • Note: do not rely on Google or Facebook OAuth alone to authenticate identity. They are bad at this.

Nice-to-have improvements:

  • Honeypot / “Lightning Rod”
    • Divert troublemakers to a well-confined area
  • Pro-active detection & response
    • Look for signs of incoming abuse before it happens
    • Deflect in a positive direction, or pre-emptively mute

New Domain Parking / Set-Up Notes

Steps to “park” a new domain with email and HTTP service. Total cost is ~$12/year assuming you already have a web server set up.

Domain Registration and DNS

  • Register domain with Amazon Route53 ($12/year for .com)
    • Delete the public “Hosted Zone” ($6/year) since CloudFlare will be used for hosting DNS
    • No Route53 Hosted Zone is necessary, unless you want to run a VPC with its own private view of the domain, in which case there needs to be a private Hosted Zone.
  • Create CloudFlare free-tier account for DNS hosting
    • Change Amazon Route53 DNS server settings over to CloudFlare
    • CloudFlare settings that you might want to adjust:
      • Crypto/SSL Policy: see below
      • Always Use HTTPS: On (unless you need fine-grained control over HTTP→HTTPS redirection)

HTTPS

Assume you have a web server that will respond to HTTP requests on the new domain.

  • Option 1: Direct Connection (CloudFlare ingress and SSL termination, but no SSL to the origin)
    • Use a single-host A/CNAME record in CloudFlare
    • CloudFlare will handle SSL termination, but must be used in “Flexible” crypto mode which reverts to HTTP when talking to the origin server.
  • Option 2a: Proper SSL Setup with AWS load balancer (~$240/year) and its built-in certificate
    • Create an EC2 load balancer with a certificate appropriate for the domain
    • Use a CNAME record in CloudFlare pointing to the load balancer’s dualstack.my-lb-1234566-... DNS name
    • Now you can enable CloudFlare’s “Full” crypto mode
  • Option 2b: Proper SSL Setup with Let’s Encrypt (free)
    • TBD – needs some kind of containerized HTTP server that updates the certificate automatically

Email Forwarding

It is important to be able to receive email addressed to the new domain, for example to respond to verification emails for future domain transfers or SSL certificate issuance.

Email forwarding can be set up for free using Mailgun:

  • Create Mailgun free-tier account on the top-level domain
  • Add the necessary DNS records for Mailgun at CloudFlare (domainkey and MX servers)
  • In Mailgun’s “Routes” panel, create a rule that matches incoming email on the domain and forwards it as necessary
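
As a sketch, a catch-all forwarding route looks like this in Mailgun’s route syntax (both the domain and the destination address are placeholders):

Expression: match_recipient(".*@example.com")
Actions:    forward("your-real-inbox@example.net")
            stop()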

Email Reception

If you actually want to receive (not just forward) incoming email, either use Gmail on the domain, or the following (nearly-free) AWS system:

  • In Amazon SES, add and verify the domain
    • This will require adding a few more records at CloudFlare, including MX records
  • Set up an SES rule to accept incoming email and store messages in S3
  • Use a script like this one to poll S3 for new messages and deliver them via procmail

Cloudflare Cookie Caching Confusion

Recently I discovered a surprising behavior of Cloudflare that affects sites that use HTTP cookies to hold user access tokens:

CloudFlare’s non-Enterprise service tiers do not respect the “Vary: Cookie” HTTP header.

This impacts sites that:

  • Are relying on CloudFlare’s “Cache Everything” option
  • Expose both “logged in” and “not logged in” versions of the same URL, where the page content differs for logged-in and anonymous visitors
  • Use cookies to differentiate between logged-in and anonymous visitors

Under these conditions, a logged-in user who visits a URL that is also visible to anonymous traffic will sometimes be served the “not logged in” version of a page.

This is confusing because the user will not see their normal posting controls, and they may be asked to log in again unnecessarily.

This problem won’t affect a site where logged-in users always see different URLs, for example a banking or webmail service where anonymous visitors will never see the same pages as logged-in users.

The impact happens on sites like discussion forums where anonymous visitors have read-only access to the same pages where logged-in users hang out.

Why this happens

For URLs that are visible (in different forms) to logged-in and anonymous visitors, it is important for your web server to return a Vary: Cookie HTTP header, to tell caches like CloudFlare that a different version of the page will be returned depending on what cookies the browser sends.

Of course, pages visible only to logged-in users should also send a Cache-Control: private header, to prevent their content from being exposed to anyone other than the user who requested them.
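
Concretely, the intended header combinations look like this (the max-age value is illustrative):

On a page served in both logged-in and anonymous forms:
    Cache-Control: public, max-age=300
    Vary: Cookie

On a page visible only to logged-in users:
    Cache-Control: private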

CloudFlare is usually very good at obeying the letter of the relevant web standards, but in this case, it will ignore Vary: Cookie. This means that anonymous hits will poison the cache for logged-in users. CloudFlare will serve up the anonymous version of the page even though the browser is sending a valid logged-in cookie.

Mitigations

One cannot rely on Cache-Control: public/private alone to fix the problem, because both anonymous and logged-in hits will have the exact same cache key. CloudFlare properly declines to cache responses marked Cache-Control: private, but as soon as it sees a Cache-Control: public response, that response will be cached and used to fulfill subsequent hits on the same key.

Proper fixes include:

  • Use a Page Rule to disable caching on the affected URLs. Of course, this means you do not benefit from the cache anymore.
  • Find a way to ensure logged-in users always view different URLs than anonymous traffic.
  • Upgrade to CloudFlare’s Enterprise service tier.

Cost Disease

From the article (not my own words):

“If some government program found a way to give poor people good health insurance for a few hundred dollars a year, college tuition for about a thousand, and housing for only two-thirds what it costs now, that would be the greatest anti-poverty advance in history. That program is called ‘having things be as efficient as they were a few decades ago’.”

Read more at Philip Greenspun’s blog:
https://blogs.harvard.edu/philg/2017/03/04/another-economic-sourpuss/

The original source is here:
https://slatestarcodex.com/2017/02/09/considerations-on-cost-disease/

Crude Linear Models Almost Always Outperform Human Judgment

Fascinating article – from 1979 – on how even a crudely-constructed linear model is almost always superior to human expert judgment on classification tasks:

Robyn M. Dawes, “The Robust Beauty of Improper Linear Models in Decision Making”
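
To make “improper” concrete: a proper linear model fits its weights to the data by regression, while Dawes’ improper models standardize each predictor and add them with unit weights, leaving only the choice of predictors and the sign of each one to human judgment. Roughly:

    proper:    y_hat = b0 + b1*x1 + ... + bk*xk    (weights bi fit by regression)
    improper:  y_hat = s1*z1 + ... + sk*zk         (zi standardized predictors; si = +1 or -1, chosen by the judge)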

Is there still any role for people in decision-making?

From the paper: “But people are important. The statistical model may integrate the information in an optimal manner, but it is always the individual (judge, clinician, subjects) who chooses variables. Moreover, it is the human judge who knows the directional relationship between the predictor variables and the criterion of interest, or who can code the variables in such a way that they have clear directional relationships.”

Now consider how good machine learning and AI algorithms have become at these parts of the task! Soon it might become optimal to delegate the whole process to machines.