PostgreSQL Lock Contention Diagnosis

Useful Postgres commands for diagnosing queries that are slow due to lock contention or deadlocks.

Find long-running queries:

select pid, now() - query_start as duration, left(query, 100) from pg_stat_activity where state != 'idle' order by query_start asc;

(To stop a query, use select pg_terminate_backend(pid) from pg_stat_activity where ...)

Find locks that are blocking queries:

select relation::regclass, * from pg_locks where not granted;   
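
On PostgreSQL 9.6 and later, pg_blocking_pids() offers a quicker way to see who is blocking whom. A minimal sketch:

select pid, pg_blocking_pids(pid) as blocked_by, left(query, 100)
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;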

Find queries that are responsible for held locks:

SELECT
  COALESCE(blockingl.relation::regclass::text,blockingl.locktype) as locked_item,
  now() - blockeda.query_start AS waiting_duration, blockeda.pid AS blocked_pid,
  blockedl.mode as blocked_mode,
  blockinga.pid AS blocking_pid, left(blockinga.query,100) as blocking_query,
  blockingl.mode as blocking_mode
FROM pg_catalog.pg_locks blockedl
JOIN pg_stat_activity blockeda ON blockedl.pid = blockeda.pid
JOIN pg_catalog.pg_locks blockingl ON(
  ( (blockingl.transactionid=blockedl.transactionid) OR
  (blockingl.relation=blockedl.relation AND blockingl.locktype=blockedl.locktype)
  ) AND blockedl.pid != blockingl.pid)
JOIN pg_stat_activity blockinga ON blockingl.pid = blockinga.pid
  AND blockinga.datid = blockeda.datid
WHERE NOT blockedl.granted
AND blockinga.datname = current_database();

Information Disclosure Vulnerability in the Google Cloud Speech-to-Text API

Abstract

A vulnerability was discovered in the Google Cloud Speech-to-Text API that allowed a user to read transcription results belonging to a different, unrelated user. This vulnerability could be exploited with a simple “curl” command.

Background

Google’s Cloud Speech-to-Text API offers several different modes of operation, including an asynchronous mode designed to handle long-form transcription jobs. A typical use for this mode is transcribing an audio podcast to enable searching it for keywords.

To perform transcription in this mode, the user first authenticates to Google Cloud and submits the transcription job using the REST endpoint “speech:longrunningrecognize”. This returns an opaque ID for the job, in the form of a short string.
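
For illustration, a submission request might look roughly like the following (the bucket path and audio file are placeholders):

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data '{
       "config": { "languageCode": "en-US" },
       "audio": { "uri": "gs://my-bucket/podcast-episode.flac" }
     }' \
     "https://speech.googleapis.com/v1/speech:longrunningrecognize"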

After a job has been submitted, the user can check its status by sending the job ID to the “operations/get” endpoint, which returns metadata about the job. After the job finishes, this same endpoint also returns the complete transcription result, including full text, confidence values, timestamps, etc.

The Speech-to-Text API also offers an “operations/list” endpoint, which returns a list of ongoing and completed transcription job IDs.

Vulnerability

While testing the Speech-to-Text API, I discovered that querying the “operations/list” endpoint with no parameters would return transcription job IDs belonging to jobs I did not own. Furthermore, querying any of these IDs using the “operations/get” endpoint would return the results of these jobs, including full-text transcriptions.

Below is an example “curl” session demonstrating the issue. Job IDs have been randomized and transcription data redacted.

Query the “operations/list” endpoint:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
"https://speech.googleapis.com/v1/operations"

Returns:

{
  "operations": [
    {
      "name": "5201534545415957057"
    },
    {
      "name": "2479283670382910446"
    },
    ...
  ]
}

None of these operation IDs correspond to jobs submitted by the user making this request, or to any other user in the same Google Cloud customer account.

Query the “operations/get” endpoint for one of these IDs:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     "https://speech.googleapis.com/v1/operations/5201534545415957057"

Returns:

{
  "name": "5201534545415957057",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2019-01-01T00:00:00.000000Z",
    "lastUpdateTime": "2019-01-01T00:00:00.000000Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "*** FULL TEXT HERE ***",
            "confidence": 0.96096134,
          }
        ]
      }
    ]
  }
}

The portion marked “FULL TEXT HERE” represents full-text transcription data that did not belong to the user making this request.

I did not save these transcription results because I was not sure who owned this data, and it seemed potentially sensitive. For example, one transcription I glanced at was a discussion of financial matters for a business.

Analysis

It was not clear whether these unrecognized transcription results were some kind of Google test data, or actual jobs submitted by other customers.

One possible explanation is that these jobs were Google-provided sample data, perhaps belonging to a not-yet-released tutorial to introduce developers to the Speech-to-Text API.

However, I feared that they might have been actual transcription results belonging to an unrelated Google Cloud customer. Given the potential sensitivity of this issue, I thought it would be prudent to report it as a possible security vulnerability.

Disclosure & Resolution Timeline

On April 19, 2019, I disclosed my findings to Google via the security bug reporting tool.

The issue was assigned the following tracker URL: https://issuetracker.google.com/issues/130907659

On April 24, 2019, Google’s Security Team triaged this report and forwarded it to the VRP.

On October 12, 2019, Google reported that all bugs associated with this report had been fixed.

On January 2, 2020, following another round of testing which still appeared to show unintended information disclosure, Google reported that some fixes had been inadvertently reverted, and were now re-applied.

OpsDomain.com

Pricing at cloud providers like AWS, Azure, and Google is incredibly opaque. It’s hard to dig through their web UIs to figure out exactly how much you’ll be paying for CPUs, memory, GPUs, and other services.

I’m starting to build a price-comparison service at https://opsdomain.com

The idea is to show apples-to-apples comparisons across cloud providers, with a simple interface that shows only a single row of prices. No gigantic spreadsheet grids!

Check out early results like the Cloud GPU Pricing Comparison and the AWS EC2 pricing sheet.

Feel free to reach out to me on LinkedIn or Twitter if there’s a specific analysis you would like to see!

MongoDB Scaling Tips

Quick list of things to check for optimizing MongoDB performance:

  • Use NVMe SSD storage (on AWS, this means i2/i3 instances)
  • Use WiredTiger storage engine
  • Use at least some OS swap space (it can be smaller than RAM; even a few GB seems to help prevent OOM kills)
  • Check for warning messages in the mongod startup log
  • Don’t run slow analytics queries on the production database (set up streaming ETL to a separate database)
  • Don’t run backups on the master of a replica set
  • Check working set size vs RAM and disk
  • Batch reads/writes where possible
  • Optimize write concern settings
  • Try to avoid linear growth of working set size (e.g. by offloading historical data to cold storage)
  • Check query plans for mis-optimized queries (see the example after this list)
  • Add parallelism as necessary (collection/database/service)
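
As a concrete example of the query-plan check above, here is a quick sketch using the mongo shell (database, collection, and field names are hypothetical). Look for COLLSCAN stages in the output, which indicate a full collection scan due to a missing index:

mongo "mongodb://localhost:27017/mydb" --eval \
  'printjson(db.events.find({ userId: 123 }).explain("executionStats"))'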

Add Envkey to a Docker app in Kubernetes, without rebuilding the image

I’m a big fan of using Envkey to manage parameters and credentials across different services like AWS, GCP, Docker, and Kubernetes.

One minor drawback of Envkey is that the normal methods of integrating it require you to modify either your app code (to invoke the language-appropriate envkey library) or your Dockerfile (to download and run the envkey-source binary).

This makes it difficult to use community-maintained Docker images. If you want to use Envkey, you have to fork your own version of the Dockerfile, modify it to inject Envkey, then rebuild and host the modified image yourself.

On Kubernetes, however, I discovered a neat method to integrate Envkey with any unmodified Docker image. Kubernetes provides an initContainers feature that runs one or more containers to completion inside your pod before its main containers start. We can specify an initContainer that downloads Envkey and saves the environment variables in a place where the main container can pick them up.

The Deployment specification with Envkey looks like this:

spec:
    # create a volume to share between
    # envkey and the main container
    volumes:
      - name: envkey
        emptyDir: {}
    initContainers:
      # here is the Envkey container
      - name: envkey
        image: appropriate/curl
        command:
          - "sh"
          - "-c"
          # use curl to download, install, and run envkey
          - "curl -s \
          https://raw.githubusercontent.com/envkey/envkey-source/master/install.sh \
          | /bin/sh && envkey-source > /envkey/export-env && chmod -R 777 /envkey"
          # at this point, the variables will be in a file called
          # /envkey/export-env
          # chmod/chown may be necessary depending on the user
          # account under which the main container runs.
        volumeMounts:
          - name: envkey
            mountPath: "/envkey"
        env:
          # use a Kubernetes secret to hold the ENVKEY itself
          - name: ENVKEY
            valueFrom:
              secretKeyRef:
                key: ENVKEY
                name: my-envkey
    containers:
      - name: my-main-container
        image: my-docker-image
        volumeMounts:
          - name: envkey
            mountPath: "/envkey"
        command:
          - "sh"
          - "-c"
          # note: you have to override the standard container
          # command here, to load the envkey variables first.
          # Check the Dockerfile for the appropriate CMD.
          - ". /envkey/export-env && \
          my-docker-image-entrypoint"

A few notes:

  • We have to load the Envkey variables into the main container’s process environment before invoking the standard entry point. This means we have to peek at the Dockerfile that was used to build the container so that we can duplicate the same command. (There does not seem to be a way to override just part of the entry point.)
  • This also requires that /bin/sh is available in the Docker container. Most public images do have this.
  • The environment variables won’t be updated while the container is running. This shouldn’t be an issue if you are following an immutable-infrastructure philosophy.
  • You need a way to carry the ENVKEY itself into the init container’s environment. A Kubernetes secret is a great way to do this.
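
For that last point, here is a sketch of creating the secret with kubectl (the secret name matches the my-envkey reference in the manifest above; the key value is a placeholder):

kubectl create secret generic my-envkey \
  --from-literal=ENVKEY='PASTE-YOUR-ENVKEY-HERE'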

Amazon EKS Ingress Guide

This post explains how to set up Ingress for Kubernetes on Amazon EKS and make your Kubernetes services available to the internet.

What is Ingress?

Services you deploy in your Kubernetes cluster are not, by default, visible to anyone outside the cluster.

An Ingress is a special type of Kubernetes object that exposes one or more Services to the internet. It’s an abstraction that covers load balancing, HTTP routing, and SSL termination.

While many Kubernetes resources are “write once, run anywhere” on any cloud platform, Ingress behaves quite differently on different platforms. This is because each platform has its own way of hooking up Kubernetes services to the outside internet.

Most Kubernetes platforms implement some basic Ingress functionality out of the box. You can also add a third-party ingress controller to augment or replace the platform’s basic features.

In Kubernetes, an ingress controller is not the same as an Ingress resource. An ingress controller runs on your cluster, waits for you to create Ingress resources, and manages controller-specific handling for HTTP requests based on those Ingress specs.

When the controller notices an Ingress that is marked with certain special annotations, it comes to life and creates new resources to implement the ingress flow. This might include Kubernetes pods containing reverse proxies, or an external load balancer. Most controllers provide some configuration parameters in the form of controller-specific annotations you can apply to Ingress resources.

Currently, Amazon EKS ships with only a very basic ingress system. For practical purposes, you will almost certainly want to install a more powerful ingress controller.

Here are some possible choices:

Option #1: Kubernetes Service with type: LoadBalancer

This is the “native” ingress option supported by EKS. It does not actually use the Ingress resource at all. Just create a Kubernetes Service and set its type to “LoadBalancer”, and then EKS will deploy an ELB to receive traffic on your behalf.
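
As a sketch, a minimal Service of this type might look like the following (names and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-web-service
spec:
  type: LoadBalancer    # EKS provisions an ELB for this Service
  selector:
    app: my-web-app
  ports:
    - port: 80          # port exposed on the ELB
      targetPort: 8080  # port your pods listen on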

This approach has a few drawbacks:

  • Each Service spawns its own ELB, incurring extra cost.
  • You cannot link more than one Service to a DNS hostname, because ELBs offer no ability to route to different targets based on HTTP Host headers or request paths.
  • Most recent Kubernetes codebases have already switched to the newer Ingress system and do not configure themselves with externally-visible Services.
  • You miss the flexibility of modern ingress controllers like Nginx, including features like automatic TLS certificate management and OAuth security mix-ins.

Option #2: alb-ingress-controller

This is a third-party project (Helm chart here) that spawns ALBs to correspond to specially-marked Ingress resources in Kubernetes. It tries to automatically manage ALB target groups and routing rules to match the specs it sees on each Ingress resource.

Advantages:

  • Works with the new Ingress resources rather than raw Services.
  • Multiple Services can share one Ingress using host-based or path-based routing, if you set up the Ingress specs manually. (but note, this is different from the one-Ingress-per-service model that you will find in most Kubernetes documentation and public Helm charts).
  • Includes support for lots of tweakable options on ALBs and target groups, like security groups and health checks.

But there are some drawbacks too:

  • Creates a new ALB for each Ingress resource. So if you follow the common pattern where each Kubernetes Service has its own dedicated Ingress, then you will end up with multiple ALBs instead of one shared ALB.
  • Doesn’t always maintain the links between target groups and worker nodes properly. Sometimes it fails to get a target group into a “healthy” state on start-up, or drops live nodes from an active target group for no apparent reason. (Anecdotally, I have found the ALB ingress controller to be more reliable when its target type is set to “instance” rather than “pod”).

A brief note about health check settings on ALB target groups: by default, ALBs want to see a “200 OK” response on HTTP requests to “/” before enabling a target group. If you are just setting up your cluster, you might not yet have a service set up to respond to “/” requests. This will prevent any target group from registering as “healthy”, even if you have working endpoints on other paths. As a temporary fix, you could configure the ALB to accept “404 Not Found” as a healthy response.
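
With alb-ingress-controller, for example, this can be expressed as annotations on the Ingress (a sketch; adjust the path and codes once you have a real health endpoint):

alb.ingress.kubernetes.io/healthcheck-path: "/"
alb.ingress.kubernetes.io/success-codes: "200,404"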

Option #3: nginx-ingress-controller with ELB support

Recent versions of the standard Nginx ingress controller (Helm chart here) now have the ability to create AWS ELBs to accept traffic. I have not tried this approach because it doesn’t offer as much flexibility as alb-ingress-controller.

Note, however, that you will find many Kubernetes guides on the web that assume you are using the Nginx ingress controller, because it is platform-neutral and includes nice flexibility for routing and manipulating the traffic passing through it.

A working compromise: alb-ingress-controller + Nginx

For a practical Kubernetes cluster on Amazon, I recommend a combination of two ingress controllers: alb-ingress-controller to serve as the first hop, plus nginx-ingress-controller for final routing.

The advantage of this configuration is that you can use one ALB for the whole cluster, and still benefit from the standardized and flexible Nginx-based configuration for individual Services.

With a single ALB, you minimize the ongoing cost, plus have the ability to run multiple services on a single DNS CNAME.

To set this up, deploy both ingress controllers into Kubernetes. The standard Helm charts work fine. Then manually create a single ALB Ingress resource that deploys one ALB for the whole cluster, with all the AWS-specific settings like health checks and ACM-managed TLS certificates. This main Ingress will have only one route, which forwards all traffic to the nginx-ingress-controller Service.

Here is an example of what the Kubernetes manifest for this singleton ALB Ingress might look like:
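
(This is a sketch rather than a drop-in manifest: the resource name, certificate ARN, and backend service name/port are placeholders, and the backend must point at the Service created for your nginx-ingress-controller deployment.)

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cluster-alb
  annotations:
    # handled by alb-ingress-controller, not Nginx
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    # ACM-managed certificate for TLS termination at the ALB (placeholder ARN)
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/REPLACE-ME
spec:
  rules:
    - http:
        paths:
          # forward all traffic to the Nginx ingress controller
          - path: /*
            backend:
              serviceName: nginx-ingress-controller
              servicePort: 80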

The Nginx ingress controller, when used in this mode, does not create any AWS resources and is not visible to the internet. It just sits there functioning as a reverse proxy, passing requests from the ALB inward to other services in the cluster that register with it.

All Nginx configuration comes from Kubernetes Ingress resources. You are free to set up multiple independent services, each with its own Ingress resource, and the Nginx controller will cleverly merge their routing rules into a single Nginx configuration for its reverse-proxy pods.

So far I have only noticed a couple of minor issues with this approach:

  • alb-ingress-controller sometimes drops healthy targets out of a target group, as mentioned above.
  • Because there is an extra layer of proxying along the request path, you have to be careful about which HTTP headers to pass inward and outward, and how to correctly track client IP addresses and HTTP vs. HTTPS in X-Forwarded-* headers.

Important note on IAM Roles & Garbage Collection

All of the above options involve pods within the Kubernetes cluster creating AWS resources like ELBs, ALBs, and Target Groups. This has two important implications:

First, you need to give permission for Kubernetes workers to manage ELBs/ALBs, for example by adding them to the IAM role used by the worker machines.
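
For example, a sketch using the AWS CLI (the role and policy names are placeholders, and the policy itself must grant the ELB/ALB and target-group permissions the controller needs):

aws iam attach-role-policy \
  --role-name my-eks-worker-node-role \
  --policy-arn arn:aws:iam::123456789012:policy/ALBIngressControllerIAMPolicy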

Second, beware that these resources often don’t get cleaned up automatically. You will have to do some manual garbage collection from time to time. ELBs/ALBs are of particular concern because you pay every hour they are running, even if they are not receiving traffic.

Conclusion

I hope you have found this information helpful! Feel free to contact me at dmaas@maasdigital.com.

Notes on Kubernetes setup with Terraform on Amazon EKS

Terraform

  • Terraform may not garbage-collect AWS load balancers and security groups that are created as side effects of EKS operation. (e.g. creating a Service with type LoadBalancer means EKS will go and create an actual ELB for you). During terraform destroy, these might have to be deleted manually to unblock a full cleanup.
  • It IS possible to give Terraform control of manifests and Helm resources inside the cluster. This requires a hack that fetches an authentication token via heptio-authenticator-aws as an external script, which you can then feed to the “kubernetes” provider. Eventually Terraform might gain support for exec-based authentication, and then this will be smoother.

Kubernetes on EKS

  • Authentication is done by a command heptio-authenticator-aws (which queries AWS for a token), called from kubectl. Generally you are expected to create a kubectl config file for access to each new cluster (see the sketch after this list). Implicitly this creates dependencies on kubectl (~/.kube/config) and heptio (~/.aws/credentials).
  • Ingress support is extremely limited. EKS can create Services with type LoadBalancer as ELBs, but does not do anything with Ingress resources. You have to install something extra, like alb-ingress-controller in order to get Ingress working. This breaks many off-the-shelf Helm charts that expect Ingress to work normally.
  • Helm charts seem like a busted early version of Dockerfiles.
  • I have not figured out HTTPS yet. alb-ingress-controller does allow AWS managed certificates for SSL termination. Again this breaks off-the-shelf Helm charts.
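
Regarding authentication, here is a sketch of the relevant "users" entry in the kubectl config file (the cluster name is a placeholder):

users:
- name: my-eks-cluster
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: heptio-authenticator-aws
      args:
        - "token"
        - "-i"
        - "my-eks-cluster"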

Halyard

  • Halyard’s Docker image is busted due to a typo in the URL for heptio-authenticator-aws. Fix in my pull request.
  • Running Halyard/Spinnaker on EKS seems to be a bleeding-edge configuration with some issues. kubectl proxy doesn’t seem to perform authentication or figure out URL paths for the Spinnaker Deck GUI front-end.

Minimum Viable Community Management Toolkit

The tools you will need to protect an open online community from abuse:

  • Abuse Detection
    • Manual & automated monitoring
    • Automated tools must be able to adapt to new forms of abuse as they emerge
    • User-visible “Report abusive content” feature
      • Fills gaps in internal content monitoring
      • Helps teach you what your specific community considers “abusive.”
    • Review queue for internal team
      • Collect & prioritize abuse reports
      • Suggest appropriate resolutions based on history of user’s past behavior
      • Must be as quick/easy as possible, since it will be scanned frequently
      • (maybe) Provide feedback to users when reports are resolved. Be careful: this can lead to frustration if the user disagrees with your judgment.
  • Muting tools
    • Time-limited and permanent mute (read-only mode)
    • Provides “cooling-off” time without permanent harm
  • Banning tools
    • Time-limited and permanent bans
    • (maybe) “Shadowbans” to slow down adversary response
  • Reputation Database
    • Prevent abusers from returning under different names/accounts
    • Track: IP addresses, email addresses, VPNs, social network accounts, browser/device fingerprints
  • Anti-Fraud Firewall
    • Close off channels that abusers use to target the community
    • Anonymizing proxies/VPNs, throw-away email providers, datacenters, country-level blocks, rate limits
  • Identity verification to guard posting privileges
    • e.g., social network login or SMS phone line
    • Note: do not rely on Google or Facebook OAuth alone to authenticate identity. They are bad at this.

Nice-to-have improvements:

  • Honeypot / “Lightning Rod”
    • Divert troublemakers to a well-confined area
  • Pro-active detection & response
    • Look for signs of incoming abuse before it happens
    • Deflect in a positive direction, or pre-emptively mute

New Domain Parking / Set-Up Notes

Steps to “park” a new domain with email and HTTP service. Total cost is ~$12/year assuming you already have a web server set up.

Domain Registration and DNS

  • Register domain with Amazon Route53 ($12/year for .com)
    • Delete the public “Hosted Zone” ($6/year) since CloudFlare will be used for hosting DNS
    • No Route53 Hosted Zone is necessary, unless you want to run a VPC with its own private view of the domain, in which case there needs to be a private Hosted Zone.
  • Create CloudFlare free-tier account for DNS hosting
    • Change Amazon Route53 DNS server settings over to CloudFlare
    • CloudFlare settings that you might want to adjust:
      • Crypto/SSL Policy: see below
      • Always Use HTTPS: On (unless you need fine-grained control over HTTP→HTTPS redirection)

HTTPS

This assumes you already have a web server that will respond to HTTP requests on the new domain.

  • Option 1: Direct Connection (CloudFlare ingress and SSL termination, but no SSL to the origin)
    • Use a single-host A/CNAME record in CloudFlare
    • CloudFlare will handle SSL termination, but must be used in “Flexible” crypto mode which reverts to HTTP when talking to the origin server.
  • Option 2a: Proper SSL Setup with AWS load balancer (~$240/year) and its built-in certificate
    • Create an EC2 load balancer with a certificate appropriate for the domain
    • Use a CNAME record in CloudFlare pointing to the load balancer’s dualstack.my-lb-1234566-... DNS name
    • Now you can enable CloudFlare’s “Full” crypto mode
  • Option 2b: Proper SSL Setup with Let’s Encrypt (free)
    • TBD – needs some kind of containerized HTTP server that updates the certificate automatically

Email Forwarding

It is important to be able to receive email addressed to the new domain (e.g. admin@yourdomain.com), for example to respond to verification emails for future domain transfers or SSL certificate issuance.

Email forwarding can be set up for free using Mailgun:

  • Create Mailgun free-tier account on the top-level domain
  • Add the necessary DNS records for Mailgun at CloudFlare (domainkey and MX servers)
  • In Mailgun’s “Routes” panel, create a rule that matches incoming email to addresses at the domain and forwards it as necessary

Email Reception

If you actually want to receive (not just forward) incoming email, either use Gmail on the domain, or the following (nearly-free) AWS system:

  • In Amazon SES, add and verify the domain
    • This will require adding a few more records at CloudFlare, including MX records
  • Set up an SES rule to accept incoming email and store messages in S3
  • Use a script like this one to poll S3 for new messages and deliver them via procmail
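
In the spirit of that script, here is a rough sketch of the polling step (bucket name and local paths are placeholders):

#!/bin/sh
# poll S3 for new SES messages and hand each one to procmail
INBOX=/var/spool/ses-mail
mkdir -p "$INBOX/new" "$INBOX/done"

# sync only fetches objects that have not been downloaded yet
aws s3 sync "s3://my-incoming-mail-bucket/inbox/" "$INBOX/new/"

for f in "$INBOX"/new/*; do
  [ -f "$f" ] || continue
  if procmail < "$f"; then
    # delete from S3 and archive locally so the message is not delivered twice
    aws s3 rm "s3://my-incoming-mail-bucket/inbox/$(basename "$f")"
    mv "$f" "$INBOX/done/"
  fi
done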