Cloud Center of Excellence &#8211; FireMon.com

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

The Tragedy of Security Dies on the Crucible of DevOps

FireMon — Fri, 14 Aug 2020 13:57:50 +0000

The Tragedy of Security Dies on the Crucible of DevOps

Security ain’t what it used to be. Or perhaps it’s always been this way and it merely seems different due to the slow degradation of my youthful idealism.

Security is bifurcated. Down one path we strive to keep our organizations secure. To stop bad actors, protect data and assets, and defend users from threats and even their own mistakes. Down the other path lies compliance. Ensuring the organization meets regulatory, contractual, and other standards. Across both paths we manage risks; the risks of breaches and downtime, or the risks of regulatory fines.

The tragedy of security is a reflection of The tragedy of the commons. From Wikipedia:

The tragedy of the commons is a situation in a shared-resource system where individual users, acting independently according to their own self-interest, behave contrary to the common good of all users by depleting or spoiling the shared resource through their collective action.

Most of security compliance is designed to reduce security risk. At least on paper. But over time we’ve seen standard after standard, regulation after regulation, decouple risk from compliance. The very nature compliance standards prevents organizations from tailoring security measures to organizational risks. I’m not saying this is always true, but it is largely true, especially as you scale into larger organizations. There is no evidence that requiring password resets every 90 days for users with MFA or other conditional restrictions now in common use improves security. The person who invented password complexity requirements literally regrets it and says they don’t work. There is neither evidence nor a sound scientific basis for requiring non-default encryption of all storage volumes in a cloud provider.

Security is a shared, finite resource. We only have so much money, so many security professionals, and so much time non-security staff can dedicate to security at the expense of their other goals. The more we pull towards compliance the less of this pool we have for security. The less compliance aligns with security, the fewer efforts reduce security risks.

This is the tragedy of security. We decoupled risk from compliance, making lack of compliance the risk itself. The greater real risks are decoupled from compliance, the more rigid the compliance regime, the fewer security resources are available for defense, and the less support for security efforts from other teams.

A great example came up in a conversation with Chris Farris. To paraphrase,

Developers do care about security. It’s compliance that is an irritation. Help them secure their application and they’ll support it. Tell them to take a 4 hour outage on a weekend to re-build their database with encryption because a compliance wonk demands it and you’re just pissing them off.

I’m not saying all compliance is dumb rules, but some compliance rules are dumb, and the misapplication of good rules decoupled from risks is really dumb.

DevOps becomes the crucible for the future of security because in the world of cloud and DevOps individual application teams become responsible for the entire stack, including much of security. New servers, networks, and firewalls are a mere git commit and some API calls away. This also places a larger burden on those teams to manage their own security and compliance. We still have centralized security and shared security resources, but when it comes to the tip of the spear we absolutely rely more on the DevOps teams. We can’t create a big DMZ for the cloud. The lines between internal and external networks can no longer be categorized into cookie-cutter zones. Many applications build on native cloud services don’t even have a network anymore and rely on IAM rules and resource policies written in JSON pushed by application teams through infrastructure as code.

Thus a fair few of my recent cloud and DevOps-focused compliance projects are more about doing good security first, and then figuring out how to write a report that makes it looks like it meets the letter of compliance even though it doesn’t, even when it’s more secure.

There are two potential solutions.

The first is to revise security standards to better fit cloud and DevOps and reduce the number of dumb or inapplicable rules. Some of this work is happening, but I’ve come to believe this requires a generational shift we don’t have time to wait for. Let’s not give up, but we don’t have to wait.

The other is to use automation to remove as much of the security and compliance burden from individuals as possible, while still allowing them the freedom and control to build fast. I’m not suggesting the overall pool of security becomes smaller, but that we use automation and other technologies to reduce the individual requirement to burn cycles on the less valuable things.

It’s about time and focus of limited resources. And really, what’s more valuable… a good penetration test or a compliance audit? Now look at how much you spend on penetration testing and how much you pay for an audit.

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

Advanced Techniques for Defending AWS ExternalID and Cross-Account AssumeRole Access

FireMon — Tue, 14 Jul 2020 13:54:08 +0000

Advanced Techniques for Defending AWS ExternalIDs and Cross-Account AssumeRole Access

Last month Kesten Broughton at Praetorian Security released some great research on third party cloud security products using Amazon’s preferred cross-account connection technique – AWS IAM Assume Role Vulnerabilities Found in Many Top Vendors. The opening paragraph is a solid overview of the research:

In this first blog in our series on cross-account-trust we will present the results from 90 vendors showing that 37% had not implemented the ExternalId correctly to protect against confused-deputy attacks. A further 15% of vendors implemented the AWS account integration in the UI correctly, but the ExternalId parameter was not properly validated on the backend, making those sites also vulnerable. We will finish by discussing the new attack surfaces exposed by the AWS cross-account-assume-role trust. We conclude that vendors and clients should critically examine whether role trust is the best trust mechanism for their multi-tenant SaaS solution.

My first response when I read the research was, “why would anybody make a such a bad decision”, but the reality is that the entire concept of a “cloud security expert” is relatively new and while AWS is decent about talking about the confused deputy problem, they don’t cover some of the practical implication issues you don’t really think about until your product contacts the customer. Cross account connections using AssumeRole have a straightforward engineering solution, but without proper threat modeling it is extremely easy to make the mistakes documented in Kesten’s research.

Here at DisruptOps we made some smart decisions early on based on our initial threat modeling that kept us safe. However, we did pick up some tidbits in the research and are adding even more hardening. Plus we have a skunkworks project to wipe out a large portion of the problem anyway and still enable automation at scale without requiring direct write access into customer environments.

I do, however, have one major point of disagreement with the post. I’ll take auto-rotated credentials any day over the recommendation to use static credentials and vaulting. We still have to use them for other cloud providers and are always looking at ways to NOT have static credentials ANYWHERE in our environment.

The Problem

A range of different application types need to connect directly to cloud APIs these days. It could be as simple as accessing an S3 bucket or as complex as a fully automated Cloud Detection and Response platform (I mean, you know, as a random example). These requests require some kind of credentials and those are either static (like a username and password, or an IAM access key and secret key) or dynamic. Dynamic credentials include a token or other ephemeral attribute that is time-limited.

Static application credentials that allow direct access to your cloud management plane are… bad. We can solve this for users with MFA but that doesn’t help us for automated applications like our not-so-theoretical Cloud Detection and Response platform. A while ago Amazon Web Services tackled this problem with the concept of an IAM Role. A role in AWS is essentially a container for permissions which has two policies: what the container can do, and who (or what) can assume the role. Roles are then session based, so when an authorized entity assumes the role they have a set of credentials (access key, secret key, and session token) they can use for the length of the session (from 1 to 24 hours).

Roles in AWS can be assumed through external SAML connection (for users), internal “trusted” connections (from other AWS accounts), or AWS services (like an EC2 instance or Lambda function). Thus you can do cool things like run code in an instance but that instance never stores static credentials. This really removes a lot of headaches.

Allowing connections from an account you don’t control is a bit different; especially if that account is a platform that serves multiple customers. Platforms like that (okay, us) need access to hundreds or thousands of other AWS accounts. Imagine if an attacker could trick the platform into doing something in the wrong account, such as performing a configuration assessment on an account the current user doesn’t own. This is a short version of the confused deputy problem – the deputy is trusted by multiple accounts and the user tricks the deputy into giving them access to an account the current user should never touch.

The most practical exploit for this is if the platform allows a user to enter an account ID for an account they don’t control, add it to their profile, and then abuse the trust given to the platform.

AWS includes a mechanism to defend against this called the AWS ExternalID. This is an arbitrary shared secret the platform provider and the customer exchange out of band. This ID is an attribute passed on during any request to assume the role in the customer account, and the role trust policy in that account has a conditional requirement that checks that the shared secret is correct. When implemented properly this means someone can’t just add an account they don’t control to the deputy since the attacker can’t establish or know the shared secret in the customer account.

Unless…

You see where this is headed. Praetorian identified a large number of providers who use default AWS ExternalIds, or allow customers to set a non-unique ExternalId value for their whole account. They also found products that didn’t validate that the customer even enforces the External ID in their role trust policy. Praetorian found platforms that were actively exploitable, allowing confused deputy attacks due to poor security with cross account roles.

Hardening Cross Account Connections

This was all part of our threat modeling at DisruptOps so we are in good shape, although we did pick up another idea or two to make things better. Let’s walk through each issue identified by Praetorian and look at the best options for security hardening:

AWS ExternalID uses default settings: We use a random ExternalID.
AWS ExternalID is shared across multiple customer accounts:We use a random ExternalID on a per-account basis, not a per-customer basis.
AWS ExternalID is enumerable or guessable: We use fully random/long AWS ExternalIDs. I’m not worried about our PRNG choice thanks to API rate limiting (for you crypto nerds).
Customers can set their own ExternalID and may repeat or use weak ExternalIDs: We do not support customer’s setting their own ExternalID, although we have definitely had this request. This is where I think some other providers have gotten in trouble. Customers want the capability so they can automate product provisioning themselves; but to do this safely we would need to validate the ExternalID meets length and randomness requirements. With the right validation it can probably be done safely.
The platform allows the same AWS account to be provisioned multiple times: We don’t allow this.
The IAM Role is not restricted to a single entity in the provider account, but to any role in the account: We haven’t discussed this in this post, but you can either grant access from any role in the account to the “worker” role in the customer account, or grant access to a single resource or role. We limit our access to the specific roles we use for cross-account access.
The IAM Role name is static across all customer accounts: The risk here is essentially that the username (role name) is guessable. I don’t consider this a risk at all unless there are other very fundamental mistakes in place. One example is that if an attacker knows the role name and they gain access to the customer account they could potentially add themselves to the role trust policy and then use those privileges (I teach something like this in my incident response training classes). This is a potential risk but one we rate pretty low. If an attacker has that level of access they can likely take over any role in the account.
The provider does not validate that the ExternalID was set as a condition of access: We provide customers a CloudFormation Template for provisioning that ensures this is set properly. We will be adding support to validate that it stays there, especially as we continue to add other (non-CloudFormation) provisioning options to meet customer requirements.

Kesten seems to prefer the static credential model and use of good vaulting on the provider side for secure the connection. Personally I disagree and I think the AWS model is more secure out of the gates, assuming you follow the basic precautions.

We currently take two additional precautions beyond the recommendations by Praetorian:

We mask the ExternalID in CloudFormation: We set the ExternalID as a parameter in our CloudFormation templates with the NoEcho option set. This masks it in the console, command line tools, or API. This reduces the risk of exposure to someone with permissions to run CloudFormation that does not have IAM permissions.
We restrict access to the CloudFormation template to only the target account: This reduces the risk that someone could sweep in and grab the ExternalID during the provisioning process.

Even if the ExternalID is exposed it shouldn’t matter – your platform should ensure that a target account is only registered once, and only with a random ExternalID. Even if an attacker knows the Role Name and the ExternalID they can’t do anything with it since there isn’t anyplace in the platform to enter that information, and AWS themselves enforce that the cross account connection originates from the trusted account. That isn’t something you can spoof.

Hopefully seeing how we handle this gives you some ideas on what to do in your own tooling. I recommend the account registration and random ExternalID requirements even if you use AWS Organizations since adding a condition to restrict access from only your own organization can still open you up to attack from a lower-security account to a higher-security account.

And stay tuned for that new skunkworks project! We should be able to talk about it in a few months and it’s a real game changer for these kinds of problems.

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

The Overly Complex Way CloudTrail and CloudWatch Events Work Together

FireMon — Mon, 17 Feb 2020 17:24:58 +0000

The Overly-Complex Way CloudTrail and CloudWatch Events Work Together

One of the most vexing issues in my cloud journey has been understanding how CloudTrail and CloudWatch Events work together. For some reason it took me years (and a lot of testing) to wrap my head around how the connection really works; and especially how it works with the concept of multi-regions and AWS Organization trails. Then, once I figured it all out, I assumed everyone already knew, but recent conversations have made clear this confusion is pretty common. So here is my best attempt to simplify things.

First, the problem we are trying to solve: build CloudWatch Rules based on CloudTrail Events and use them to send notifications or trigger Lambda functions.

That’s it — I want to send a notification for something simple like the API call to open up a new security group rule (AuthorizeSecurityGroupIngress, in case you were wondering).

To make this work you need the following in place:

CloudTrail enabled in the region where the API call is made.
CloudTrail streaming to CloudWatch.
A CloudWatch Rule in the region of the API call which looks for that specific API call, or all CloudTrail API calls.

Now for the confusion:

When you create either a multi-region CloudTrail or an Organization trail, behind the scenes AWS is actually setting up trails in every single region (and every account, in the case of an Org trail). They are all separate trails, but each is configured to send its results to a shared S3 bucket, and you can only manage each one in its home account and region.
However CloudWatch events for API calls are only created in the region of the API call.
So if you create a multi-region trail the data is all collected centrally, but the events only appear locally. A CloudWatch Rule in the region of the home trail will only trigger for API calls made in that (home) region. So if you build an alarm for security group changes, it will only work in the home region — not in the other regions — even though CloudTrail is turned on.
The CloudWatch Log Group/Stream will appear in the primary region, not other regions, but each Event are created in the region which triggered the event.
If you want to collect all events for API calls you need to use an undocumented event definition (which I have pasted below).
If you read Amazon’s documentation… they never spell any of this out clearly. At least not that I have been able to find. In fact, I was once on a support call where I figured it out, and the AWS rep kept mumbling, “I don’t think that’s how it works” as my events started streaming in. He was having a rough night.

This was really non-intuitive to me for some reason. I had assumed that if you centralized the trail then you could centralize the CloudWatch Rule to trigger off API call events. Unfortunately that was totally incorrect, and even when you centralize the trail, you still need to create the Rule in every region you care about. Even if you use Event Bus to collect events from multiple accounts, you still need to create a CloudWatch Rule in every region of every account to send the event onto the Bus, and then you need to build Rules in every region of the Event Bus to trigger whatever notification/action you want.

Here is how I recommend approaching this if you want the near-real-time alerting capabilities or auto-remediation/actions supported by CloudWatch Rules:

Turn on a multi-region trail. You only need to do this once, and an Organization trail is sufficient.
This creates all the regional trails you need. It looks like one central trail, but is really a collection of regional trails sending their data to a central receiver.
Option 1: Create a CloudWatch Rule in every region you want near-real-time alerting for. CloudFormation and Terraform are your friends here.
Option 2: Centralize all your events. Within each region create a Rule to send all CloudTrail events to a Lambda function or SNS topic, which then forwards them to your destination. We use this technique ourselves; we send using a custom API endpoint, but you can stream to Kinesis or nearly anything.

To kickstart your journey here are two code samples.

The secret filter pattern for your CloudWatch Rule to collect all events from CloudTrail:

{ "detail-type": [ "AWS API Call via CloudTrail" ] }

And here is sample Lambda code to forward events, in this case, to Kinesis:

import json import boto3 def lambda_handler(event, context): kinesis = boto3.client('kinesis', region_name='us-west-2') data = json.dumps(event) print(data) response = kinesis.put_record( StreamName='cloudsec_prod_stream_alert_kinesis', Data=data, PartitionKey='test-client-id' ) print(response) return { 'statusCode': 200, 'body': json.dumps('Record added') }

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

Check out the first part of this post from Netflix for some examples.

Breaking Attacker Kill Chains in AWS: IAM Roles

FireMon — Fri, 02 Aug 2019 17:16:34 +0000

Over the past year I’ve seen a huge uptick in interest for concrete advice on handling security incidents inside the cloud, with cloud native techniques. As organizations move their production workloads to the cloud, it doesn’t take long for the security professionals to realize that the fundamentals, while conceptually similar, are quite different in practice. One of those core concepts is that of the kill chain, a term first coined by Lockheed Martin to describe the attacker’s process. Break any link and you break the attack, so this maps well to combining defense in depth with the active components of incident response.

In cloud deployments we have four major categories of attack, each with different kill chains:

Attacks on the cloud platform itself. Ignoring a fundamental compromise of the cloud provider (outside of the cloud customer’s hands) these attacks typically focus on misconfigurations of cloud services. If you leave an S3 bucket public, fail to put an authorizer on an API Gateway, or expose your credentials for AWS on GitHub, it falls in this category.
Attacks on customer-deployed resources and applications in the cloud. These traditional attacks are no different than those run against your data center. Common examples include SQL injection in a web application, and vulnerable servers with the wrong ports open to the Internet. These tend to be a bit more constrained than they would be against a data center, assuming you use accounts/subscriptions/projects and VPCs or virtual networks to limit blast radius.
Attacks against your cloud administrators and developers. Next time you run a penetration test make sure you let the attackers try and phish your developers and admins. This is one of the best ways to pivot into cloud since it’s often a lot easier for an attacker to gain access to a developer system than to break the cloud application itself. We’ll cover this in the future, but let’s just start with saying “MFA is my friend”.
Blended attacks. This is the category we are going to focus on today. In these attacks the threat actor breaks into something deployed in the cloud and then uses that to pivot into the cloud management plane. (Some would consider attacks against developers to be blended, but I like to break them out separately).

As a rule of thumb I always start by assuming any successful attack at any level can escalate or pivot into a blended attack, at which point your management plane security and incident response become your best defenses.

Today I’m going to focus on one of the most common blended attack processes and outline a mix of detective and preventative controls to help break the kill chain. Before I get into the details, do not take this post of an over-simplification of a complex problem. Managing what I’m about to discuss at scale is incredibly difficult even when you know what you are doing.

Over the next few weeks we will be rolling out our first Ops specifically designed for these issues to our early access customers and we should have them in production relatively soon after that.

The Blended AWS Attack: Extracting IAM Role Credentials

In a blended attack the threat actor breaks into something more traditional and then uses that to pivot into the cloud management plane. There are three primary ways this occurs. In each case the attacker extracts either static stored credentials or ephemeral IAM role credentials, which we will explain in a minute.

Direct compromise of an instance or a container. For example, if you leave port 22 open and the attacker can hack in or otherwise gain shell access.
Server Side Request Forgery (SSRF). The attacker takes advantage of (typically) a web server/service vulnerability and can execute commands without gaining shell access.
Compromise of a Lambda function. Although you can’t gain a shell on a Lambda, they are still subject to code execution vulnerabilities and even arbitrary code execution if they hold an application flaw. The exact implications can resemble SSRF.

In each case the attacker’s goal is to gain credentials to the AWS management plane and then leverage existing privileges or escalate privileges. We will discuss privilege escalation in a future post, and for now will focus on what those credentials are and how you can prevent their abuse.

Most people understand static credentials; which in AWS are an Access Key and a Secret Key. They are like a username and password but are used for AWS API calls. The current version uses a cryptographic process known as Signature 4 for HTTP request signing when you make those API calls. You can and should treat them just like a username and password — and you should never store them within cloud resources such as instances and Lambdas.

IAM roles are tricker when you first get started in AWS, they are both awesome and scary. An IAM Role in AWS is effectively a container for permissions that you use for a session. IAM Roles are great because they aren’t credentials per se… when you assume the role AWS provides a set of credentials for a time limited session. Roles are an “inside AWS only” thing. You can assign them to resources within AWS (like an instance or a Lambda function) and that resource can now make API calls without static, stored credentials! We use roles for federated identity connections, instances, Lambda functions, and every other service within AWS. Access keys are really only when you create a user in an AWS account, we use roles for everything else.

Roles have four permission types associated with them:

What the role can do within AWS. These are the straight up permission policies you attach to the role.
Who or what can use the role (the trust policy). Creating a role doesn’t mean anything or anyone can use it, this policy restricts access to, for example, AWS instances or a specific Lambda function.
A permission boundary to limit the scope of the role. This is a bit more complex and not relevant to our discussion today so we will cover it later.
When you assume a role for a session, you can also specify a subset of your existing permissions to use for that session. This is a cool feature for least privilege, but also not totally relevant for our discussion today.

It’s probably easier to explain how this works by walking through it. Suppose I have an application that needs to access an S3 bucket or a Dynamo Database. I create an IAM role for the instance and set the trust policy so the EC2 service can use the role. Then I launch an instance and assign the role. AWS runs the instance and has the instance assume the role. Assuming the role opens up a session and assigns an access key, a secret key, and a session token. AWS then rotates those credentials every 1-6 hours and the instance can now make those API calls authorized by the permissions policies.

While the credentials aren’t in the instance, the credentials are still accessible to the instance. Any code running inside needs to know the credentials to make the actual API calls to access S3 and Dynamo so something known as the metadata service provides them on demand The metadata service is a special thing in AWS for instances and containers that holds all the information about how it is configured. It’s pretty darn important for a server to be able to get its IP address, for example.

This is where the attack comes in.

The metadata service is simply a url you can access that returns the requested information. curl 169.254.169.254/latest/meta-data/ will provide all the basic information, and you can use the path curl 169.254.169.254/latest/meta-data/iam-security-credentials/ will provide the access, secret, and token. (In the case of a Lambda-based attack this all looks different and you use SDK code instead of curl, but the same principles apply).

The attacker can then copy those credentials and use them someplace else where they embed them in tools instead of having to load and run code on the compromised server. Also, being URL based, it opens the metadata service up to a wider range of SSRF attacks since you don’t need full arbitrary code execution. The credentials will expire at some point, but depending on the attack they might just come back and get a new set when they see the current ones stop working.

Smart attackers these days will use the credentials in an AWS account they control since Amazon has some tooling to detect credentials extracted and used outside their known address ranges.

Breaking the IAM Role Extraction Kill Chain

Let’s map out the kill chain. The attacker needs to do the following:

Discover and exploit a vulnerability in an instance, container, or Lambda that allows them to access the role credentials. This is pretty much always a mistake on the customer side… such as failing to patch, opening up the wrong ports, or deploying vulnerable code.
Extract the current role credentials.
Successfully run allowed API calls in an environment under their control.
Do something bad within the allowed IAM role’s permission policy scope. I mean, probably bad, it isn’t like most attackers patch your code for you.

The following techniques can break different links in the chain and include a mix of detective and preventative controls. Don’t feel bad if this looks overwhelming… very, VERY few of the organizations I work with implement these comprehensively, especially at scale.

6 techniques to help break different links in the attack chain

Vulnerability Management

Complexity: Moderate
Effectiveness: Low
Scalability: Hard
Type: Detective and Preventative

No surprise, your start should be wiping out all the initial vulnerabilities and misconfigurations the attacker can use to pivot and grab credentials. I only rated the complexity as moderate since there is nothing new or cloud-specific about this. But I also rate the effectiveness as low since it isn’t like comprehensive vulnerability management has prevented the legions of breaches over the past decades. Simple in concept, incredibly complex at scale.

Least Privilege IAM Permissions Policies with Resource Restrictions

Complexity: Moderate
Effectiveness: High
Scalability: Moderate to Hard
Type: Preventative

IAM policies in AWS are default deny and include explicit allow and deny statements. For example, you can write a policy that only allows access to read an S3 bucket. They also include resource restrictions, so where the allow statement authorizes the role to call the read API, the resource restriction only allows the role to read specific buckets or objects. You should always ALWAYS start your defenses here. When I perform assessments I find, on nearly every single project, IAM policies allowing too many privileges (the API calls) and too few resource restrictions. Yes, that service might need access to a Dynamo database, but does it need access to every table? This particular control is not too bad to implement at small scale but the bigger you are, and the more humans are making these policy decisions, the harder it is to be consistent at scale. It’s also important to add explicit deny statements in case someone adds a new policy with new permissions to the role. Permissions are cumulative, but any deny statements override allow statements.

Use IP, VPC, or other Request Origin Conditional Restrictions in the Permissions Policies

Complexity: High
Effectiveness: Moderate to High
Scalability: Hard
Type: Preventative

IAM policies support conditional statements that support a range of options including the IP address or source VPC. If you know a particular role should only ever make API calls from a specific resource in your application stack, you can lock the authorization to that exact IP address or subnet. If the attacker steals the credentials and tries to run them someplace else the API calls will fail. This is a precision guided sledgehammer – easy in concept and hard in execution since you might find other complexities interfere with proper implementation. For example, API calls to AWS services either hit the Internet directly, run through an NAT Gateway, or route internally with a Service Endpoint (which we will discuss in a moment). The IP address detected will depend on the route of the API call to the Internet. These are all manageable and detectable (and automatable) but you’ll want to read up first to make sure you understand the permutations.

Unless you run your Lambda function on a VPC this won’t be an option to protect a compromised function.

Use Service Endpoints with Policies + Resource Policies

Complexity: Moderate
Effectiveness: Moderate to High
Scalability: Moderate
Type: Preventative

In AWS a service endpoint is like a tap on the network that takes traffic that would normally go over the Internet to an AWS service and re-routes it internally. They were originally created to allow fully private subnets in AWS, those without any way of reaching the Internet, to still access certain AWS services. Endpoints support policies you can use to restrict access and actions in ways that are very similar to IAM policies. In this case you add restrictions to the endpoint policy to only allow access to specific resources behind that endpoint (S3 is the most common example). You specify which buckets are allowed, and no other resource in that subnet has access to that service. Think of this is a back stop to the IAM policy- with a service endpoint using a restrictive policy even if someone accidentally (or deliberately) allows the role broader access than it should have, it still can’t access anything not allowed in the service endpoint policy. This means we now have three layers of policies, all of which need to allow access to the resource:

The IAM permission policy that allows the role access to the resource.
The service endpoint policy that allows access to the resources when requests come through the endpoint, regardless of the permissions of the role used.
The bucket or resource policy (this depends on the kind of resource) which can restrict access from only the approved IP addresses.

Unless you run your Lambda function on a VPC this, again, won’t be an option to protect a compromised function.

Add Metadata Proxies with HTTP User Agent Filters (Metadata Service Protection)

Complexity: High
Effectiveness: Moderate
Scalability: Hard
Type: Preventative

All of these controls assume an attacker can steal the role credentials, but what if we had a way to reduce their ability to get those credentials even if they compromise the authorized instance or container? (This technique won’t work for Lambda functions). An emerging option is to restrict access to the metadata service in the first place. Although there have been some attempts to do this with IPTables that could also break required functionality for the code you have running on the instance. In November of 2018 AWS and Netflix worked together and started adding user data for API calls made from AWS SDKs to the HTTP headers. This is a defense against SSRF since most SSRF attacks rely on tricking an application to make HTTP requests on behalf of the attacker, but those requests usually come from a command line tool like curl or another process and will lack the user data header that comes from the AWS SDKs. To make this work you need to insert a proxy for those requests. There are some open source options for instances and containers, including proxies that run on the instance instead of requiring you to route traffic to a virtual appliance or squid proxy.

This technique will not work if the attacker compromises the host instance and runs a shell, since they can disable the Roxy or hijack the approved process.

You can read all about it in the details of this Netflix post.

Duplicate Role Usage Protection

Complexity: High
Effectiveness: High
Scalability: High
Type: Detective

This is another one that comes from the team at Netflix. They released a how-to for an excellent technique to detect when an IAM role is being used in an unauthorized location, even within AWS. I highly recommend you read the linked post, but the short version is they combine CloudTrail logs with some other tooling to keep a table of which instances are using which roles from which IP addresses. They then monitor other API calls to find when a role is being re-used from a new IP address at the same time it is in use from an approved one. Using this method you don’t need to know all your IP addresses in use throughout the organization, you dynamically build a table of what’s in use and detect when that role is being used someplace else at the same time. This is extremely scalable since you can run the logic centrally if you centralize CloudTrail, which is a common best practice anyway.

Summary

This is yet another beast of a post, and we don’t expect everyone can implement every one of these options in all deployments. To simplify, let’s walk through the IAM Role abuse kill chain:

Discover and exploit a vulnerability in an instance, container, or Lambda that allows them to access the role credentials. This is pretty much always a mistake on the customer side… such as failing to patch, opening up the wrong ports, or deploying vulnerable code.
- Vulnerability management (including tools like SASST and DAST for applications) and assessing your cloud configuration (using tools like DisruptOps or open source tools like Prowler and CloudMapper) are your fist defense.
Extract the current role credentials.
- Metadata Service Protection and vulnerability management
Successfully run allowed API calls in an environment under their control.
- Duplicate Role Usage Detection, Use IP, VPC, or other Request Origin Conditional Restrictions in the Permissions Policies, Use Service Endpoints with Policies + Resource Policies
Do something bad within the allowed IAM role’s permission policy scope. I mean, probably bad, it isn’t like most attackers patch your code for you.
- Least Privilege IAM Permissions Policies with Resource Restrictions

Hopefully this gives you a better picture of how to reduce the success of these kinds of attacks.

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

Something You Probably Should Include When Building Your Next Threat Models

FireMon — Mon, 12 Nov 2018 14:08:38 +0000

We are working on our threat models here at DisruptOps, so I decided to refresh my knowledge of different approaches. One thing that quickly stood out is that nearly none of the threat modeling documentation or tools I’ve seen covers the CI/CD pipeline.

This. Is. A. Problem. Include your pipeline in your threat models.

Over the past few years I’ve performed a few dozen cloud security assessments directly, on top of a variety of other advisory work. I consistently include development/deployment pipelines within scope, and they are often where I see some of the larger security issues. Your super-secure cloud environment is a bit of a house of cards if someone can modify fundamental infrastructure by changing a template somewhere or by compromising stored keys.

Most threat models start with a data flow diagram or architecture, which you can use to walk through your app modeling threats (I like STRIDE myself, which comes from Microsoft). This covers application functionality but not the pipeline.

When using your threat model *du jour* for your pipeline, treat your developers/admins as users, and also treat the pipeline itself as a user if it has stored credentials (considering how it connects to your environment).

For example, can someone spoof an API call into Jenkins to trigger a change in your production application? (Probably, the way Jenkins is often configured). How about repudiation of job changes vs. code changes? Privilege escalation in the pipeline?

Pipelines don’t tend to withstand security scrutiny very well, so we will address them in future posts on security fundamentals. It probably won’t shock you to learn I tend to recommend guardrails on both pipeline deployments and any connected infrastructure to reduce risk. For example your CI server should never be publicly exposed or exposable. You could use reinforcing guardrails to ensure it is on a network segment without an Internet Gateway and that your security groups don’t allow public access (better yet, also ensure CI servers are always behind elastic load balancers).

The days of an application’s attack surface being limited to its components are long over. In the cloud most of our applications are fed by pipelines, which have tremendous potential to be a soft underbelly, thanks to deep access and stored credentials.

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

The 4 Phases to Automating Cloud Management

FireMon — Tue, 30 Oct 2018 14:05:50 +0000

A Security Pro’s Cloud Automation Journey

Catch me at a conference and the odds are you will overhear my saying “cloud security starts with architecture and ends with automation.” I quickly follow with how important it is to adopt a cloud native mindset, even when you’re bogged down with the realities of an ugly lift and shift before the data center contract ends and you turn the lights off. While that’s a nice quip, it doesn’t really capture anything about how I went from a meat and potatoes (firewall and patch management) kind of security pro to an architecture and automation cloud native. Rather than preaching from the mount, I find it more useful to describe my personal journey and my technical realizations along the way. If you’re a security pro, or someone trying to up-skill a security pro for cloud, odds are you will end up on a very similar path.

Phase 1: Automating Configurations

For me it all started about nine years ago, when I was asked to build the first training program for the Cloud Security Alliance. Early on I realized we needed repeatable labs, which could run anyplace in the world, with both students and instructors possessing skills ranging from “developer” to “paper-pushing auditor”. In those days Amazon Web Services hadn’t really rolled out IAM and VPCs were private networks only. And concepts like Infrastructure as Code were just becoming feasible.

So there I was, trying to figure out how to build a hands-on application stack lab in the cloud for thousands of students. Consistently, *and* be able to update as AWS advanced their technology. At the time making your own AMIs was still a chore, but then I learned the wonders of `cloud-init`. A simple script I could host in an S3 bucket, with two small lines students could paste into the User Data field of their instances, which would configure the instances exactly as needed on launch. And when software updates broke things, I only needed to update that script at the published URL, and every new instance would use the new configuration — magic! While this wouldn’t help with patching anything running, it enabled me to maintain a good first-run experience, far more easily than updating and publishing new AMIs. And, in an act of utter reputational recklessnesses, you can still see the later version of one here on S3.

My first step was `cloud-init`. It isn’t something I use any more, but it was eye-opening that I could script an entire server and have it all run, using copy and paste and a single hosted file.

Phase 2: Automating Workflows

But the next step was far more impactful. After a couple years running hands-on trainings and building my own workloads, I started to play with the idea of Software Defined Security. Sitting in front of me was a cornucopia of cloud APIs, all whispering “call me” in my ears. I started looking for examples and found… nothing. Even Security Monkey wasn’t publicly released yet.

I had a class coming up for the Black Hat security conference, and I decided to use it as an excuse to learn Ruby and the AWS APIs (via the Ruby SDK). I ended up writing three demonstrations:

An incident response application that would quarantine an instance, analyze all its metadata, lock it down using AWS IAM, image all the storage, and launch a forensics analysis server ready to analyze the attached snapshots. This did in 3 seconds what used to take me 30 minutes.
A small app that would connect to AWS and Chef and identify all instances not running Chef (‘unmanaged’ servers). A process that could take weeks in a traditional datacenter.
Another app that would open security groups to a Qualys scanner, trigger a scan, and close the security group when it was done.

I hadn’t coded Ruby before so all three took about two months of part-time work to get up and running. They were pretty simple, but I learned some valuable lessons.

Managing credentials was critical, and also made it harder to share the code and get others to configure their environments correctly. Pulling from configuration files was… annoying. Especially for things like which security group in which region to use as the quarantine group.
Ruby on my local system worked fine, but then I would blow out service limits and had to insert delay timers when I ran the code in an instance in AWS. API service limits are not your friends.
All of these were really static. As slick as they were for demos, it still came down to manually running code from a desktop or instance. That hasn’t aged well.

I packaged these up as “SecuritySquirrel”, and you can find the 2014 versions on GitHub. Believe it or not, those aren’t even the originals I used for a couple years before posting.

Phase 3: Automating the Cloud Itself

When AWS released Rules for CloudWatch I slammed together enough Python code in about 2 hours the following Saturday morning to reverse any security group change within 10-15 seconds — including filters to scope defense based on tags, the VPC, or who requested the change. You can download the code and instructions, and unlike my Ruby code, this still works pretty well for 3-year-old cloud code.

Since that first demonstration I’ve built a library of event-driven automations running in Lambda, some of which you can download. In that package my favorite is `identify\_internet\_facing\_servers.py`, which, for demo purposes, I linked to trigger when I click an IoT version of an Amazon Dash button. That’s right, I carry around an actual physical Easy button in my pocket. It finds any instances with port 22 open to the Internet, and with a double-click of the button I can revoke the rules, getting a text message on my phone when everything is all safe and cozy.

My key lesson here was unexpected. It wasn’t that these event-driven automations replaced my host-based workflows, it’s that they served a different purpose. I realized I had moved on from building workflows to help me do things more quickly, to building guardrails to keep things safe in the background. Both have incredible value.

Phase 4: Automating Everything

My most recent work has been on using Jenkins and Infrastructure as Code (mostly CloudFormation) to enhance security. This combination enables me to automate security into the infrastructure and applications themselves and to rely less on external tools.

For example, I released a simple credentials scanner to run in Jenkins and find any stored access keys before even starting the build. Why wait and try to suss them out later? I then wrote some other test harnesses to allow me to run basically any assessment tool I want in Jenkins, and fail builds when they fail any security test, like a network scan (pro tip: Jenkins will fail a build if you send it any exit code other than 0 from a script).

Coming back around, we now run the training class using CloudFormation templates to build out all the elements of the application stack so students can focus on adding security. We went from building consistent training servers to consistent training environments, with custom AMIs we can update in minutes… globally… with very little effort, and all software pre-installed and ready for final configuration.

My cloud journey started around nine years ago and my automation journey nearly at the same time. I started by building things and then trying to automate pieces, but now start with the assumption of automation. My earliest work was about operations, but these days it is nearly all focused on security. On melting away the operational overhead and allowing the security parts of my mind to focus on what they are best at. Along the way I’ve also learned that not all automation is created equal; that there are places for guardrails, workflows, cross-platform orchestration, infrastructure as code, and automating pipelines. All of these now deliver nearly unimaginable security benefits, but we are still very much in the early days, when you can lose a week just reversing a poorly documented API.

If you are in security, time to get your code game on. If you are a developer or ops, time to get your security game on. Because the biggest lesson of all is that the days of security as an umbrella are over, and the days of security in the fabric are here.

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company

Consolidating Config Guardrails with Aggregators

FireMon — Mon, 22 Oct 2018 14:02:01 +0000

In Quick and Dirty: Building an S3 guardrail with Config we highlighted one of the big problems with Config: you need to set it up in each region of each account. Your best bet to make that manageable is to use infrastructure as code tools like CloudFormation to replicate your settings across environments. We have a lot more to say on scaling out baseline security and operations settings, but for this post I want to highlight how to aggregate Config into a unified dashboard.

Earlier this year AWS came out with Config Aggregators. This allows you to centralize Config data and rules into a single view. You still need to configure Config separately in each account and region, but aggregators can provide a unified view of your resources and rule compliance. If you haven’t worked much with Config remember that it is a change management tool, tracking configuration state over time, and Rules is merely one feature to find compliant and non-compliant resources. With an aggregator you also get to view the full configuration state of the monitored resources over time.

Set up your first aggregator

Setting up an aggregator is easy. First, pick which account and region you want to use as your dashboard. You probably don’t want to set up your enterprise-wide aggregator in a developer’s playground account.

Then its a simple as going to Config -> Aggregated view -> Add aggregator:

You have two options:

Add individual account IDs to add accounts one by one.
Add my organization to add all accounts in your organization.

For each, you need to understand the next steps to make it work. Ideally you enable this for your entire organization and all regions (and check the box to add future regions).

For individual accounts added you need to log into Config in each of those accounts and authorize the connection. Here’s an AWS-provided screenshot of what that looks like:

For aggregation to work in an Organization, you basically need to turn on all the AWS Organizations features and ensure you authorize (and create if needed) the new IAM role to manage the data aggregation:

Assuming everything works this will now replicate all your data from the other accounts and regions into a single dashboard view. The local accounts still need to have Config configured and will still have access to their own data, but this does allow you to keep an eye on everything centrally.

Practically speaking, if you have more than a handful of accounts you should implement this with automation. Either infrastructure as code or programmatic automation (like our Ops) can wire all of this together over API. Even if you use automation tools like us we still recommend Config for its change management capabilities, but you may or may not want to aggregate everything depending on how you manage your environment operationally.

Config aggregation is a great enhancement and relatively simple to set up — the trick is to pick your strategy and ensure all your IAM is set up properly, and then remember to authorize each request on both sides. If you are worried about wiring in your entire organization, keep in mind this only pulls data when Config itself is set up in each account. We’d like to give you definitive strategic advice but this one really does depend a lot on how you use (or plan to use) Config and either pattern is completely viable.

Get 9x
BETTER

Book your demo now

Sign Up Now

Customers

Partner Directory
Partner Portal
Technology Partners

Partners

Company