AWS Security Mindmap

Amazon Web Services (AWS) has a great offering for their cloud services. It goes without saying that while you run your workload in the cloud, you want to ensure that it must be secured. To benefit their customers, AWS has built plenty of security tools in-house and also they comply to a myriad of industry standards such as PCI-DSS, HIPPA, FedRAMP/FISMA, just to name a few.

Due to the long list of security services of AWS, it can be sometimes overwhelming to identify what should one use for their use case. To solve that puzzle, AWS has come up with a Security White Paper. While this paper provides very details intended for the audience, to remember it for a longer duration – mind map is to the rescue.

I have come up with a mindmap of AWS Security Best practices. I am sure this may not be the first one in the AWS community but this serves my purpose, so keeping it here.

AWS Security Mind Map PDF



CISO Mindmap – Business Enablement

While doing some research about CISO function, noticed a very good MindMap created by Rafeeq Rehman.

While what he has come up with is mindmap, I will try to deconstruct this mindmap to elaborate more about the various functions performed by CISO.

Let’s begin:

  1. Business Enablement
  2. Security Operations
  3. Selling Infosec (internally)
  4. Compliance and Audit
  5. Security Architecture
  6. Project Delivery lifecycle
  7. Risk Management
  8. Governance
  9. Identity Management
  10. Budget
  11. HR and Legal
So why I numbered them and in the order?
I believe Business Enablement is the most important function of a CISO. If (s)he doesn’t know the business where (s)he operates, it will be a very difficult job to continue his duties as CISO. Consider a person coming from a technology background with no knowledge of Retail Business. If that person is hired as a CISO because (s)he knows the technology, that may not be a good deal. The only reason to become a successful CISO, one must know which business he is involved in. To understand the security function, he must understand the business climate.

If this retail business has a requirement of storing credit card information into their systems, CISO’s job is to make sure appropriate PCI-DSS controls are in place so the data doesn’t get into the wrong hands. While at the same time, making sure that PCI-DSS is not coming into the way of enabling the business to accept credit cards transactions. Yes, security is a requirement but not at the cost of not doing business.

That’s why I rate business enablement as a very important function as a CISO.

What are some of the way CISO can enable business to adopt technology and still not come in their way?

  • Cloud Computing
  • Mobile technologies
  • Internet of things
  • Artificial Intelligence
  • Data Analytics
  • Crypto currencies / Blockchain
  • Mergers and Acquisitions
We will review each of these items in details in the following blog posts.

TPM (Trusted Platform Module)

TPM or Trusted Platform Module as referred by TCG (Trusted Computing Group)  is a microcontroller used in Laptop and now also on servers to ensure the integrity of the platform. TPM can securely store artifacts used to authenticate the platform. These artifacts can include passwords, certificates, or encryption keys. A TPM can also be used to store platform measurements that help ensure that the platform remains trustworthy. Authentication (ensuring that the platform can prove that it is what it claims to be) and attestation (a process helping to prove that a platform is trustworthy and has not been breached) are necessary steps to ensure safer computing in all environments.

Above image depicts the overall function of TPM module. Standard use case I have seen is ensuring secure boot process of servers. Secure boot will validate the code run at each step in the process, and stop the boot if the code is incorrect. The first step is to measure each piece of code before it is run. In this context, a measurement is effectively a SHA-1 hash of the code, taken before it is executed. The hash is stored in a platform configuration register (PCR) in the TPM.

 TPM 1.2 only support SHA-1 algorithm 

Each TPM has at least 24 PCRs. The TCG Generic Server Specification, v1.0, March 2005, defines the PCR assignments for boot-time integrity measurements. The table below shows a typical PCR configuration. The context indicates if the values are determined based on the node hardware (firmware) or the software provisioned onto the node. Some values are influenced by firmware versions, disk sizes, and other low-level information.

Therefore, it is important to have good practices in place around configuration management to ensure that each system deployed is configured exactly as desired.

Register What is measured Context
PCR-00 Core Root of Trust Measurement (CRTM), BIOS code, Host platform extensions Hardware
PCR-01 Host platform configuration Hardware
PCR-02 Option ROM code Hardware
PCR-03 Option ROM configuration and data Hardware
PCR-04 Initial Program Loader (IPL) code. For example, master boot record. Software
PCR-05 IPL code configuration and data Software
PCR-06 State transition and wake events Software
PCR-07 Host platform manufacturer control Software
PCR-08 Platform specific, often kernel, kernel extensions, and drivers Software
PCR-09 Platform specific, often Initramfs Software
PCR-10 to PCR-23 Platform specific Software

So there are very good use case of TPM to ensure secure boot and integrity of hardware – who all are using TPM? There are many institutions who runs their private clouds have been seen using TPM chipset on their servers while many public clouds do not support TPM – why? that’s mystery!

Bare Metal – A dreary (but essential) part of Cloud

Recently I got a chance to attend Open Compute Summit 2016 in San Jose, CA. It was full of industry peers from web scale companies such as Facebook, Google, Microsoft along with many financial institutions like Goldman Sachs, Bloomberg, Fidelity, etc. Overall theme of this summit was to embrace the openness in hardware and embrace commodity hardware. 
From historical point of view, OCP was a project initiated by Facebook few years ago where they opened many of the hardware components – motherboard, power supply, chassis, rack, later switch etc. as they needed things at scale and doing it using branded servers (pre-cut for enterprise by HP, Dell, IBM) wasn’t going to cut for them – thus they created (designed) their own gears. More details here.  
Below is one of the OCP certified server (courtesy: It features very minimalistic feature and a stripped down version of typical Rack Mount server.
Coming back to this year’s summit, considering this was my first year at OCP summit, I had certain expectations and while being there I can say one thing for sure – “Bare Metal does look interesting again”. Why I say that? If it was only about Bare Metal, it certainly a boring thing but when you combine that bare metal with API and particularly if you are operating at a scale (doesn’t have to be at Facebook scale), it’s fun time. Let’s take a look.
Keynote started by Facebook’s Jason Taylor with journey over last year or so and where the community stands now. But fun begun when (another Jason) Jason Waxman from Intel talking about their involvement and how the server and storage (think NVMe) industry is growing and what they see coming in future – including Xeon D and Yosemite.

A good talk was given by Peter Winzer of Bell Labs. I knew UNIX and C born out of Bell Labs but it was fascinating to hear about the history and future of Bell Labs with innovations going in Fiber Optics and capacity of Fiber – with 100G is no brainer but 1Tbps is in the horizon. 

 Microsoft Azure’s CTO Mark Russinovich started discussing about how open Microsoft is – which to be honest other than their .NET framework being open, I had no idea that they have been contributing back to Open Source community – well, it’s a good thing!  In past Microsoft has contributed their server design specs – Open Cloud Server (OCS) and Switch Abstraction Interface (SAI). OCS is the same server and data center design that powers their Azure Hyper-Scale Cloud (~ 1M servers). Using SAI and available APIs help network infrastructure providers integrate software with hardware platforms that are continually and rapidly evolving at cloud speed and scale. For this year, they have been working on a network switch and proposed a new innovation for OCP inclusion called Software Open Networking in the Cloud (SONiC). More details here. 

There were many interesting technologies showcased in Expo but one struck my mind was Storage Archival Solution. This basic configuration can hold 26,112 disks (7.8 PB) with expandable modules spanning pair of datacenter row gives total capacity of up to 181 petabytes (HUGE!!).  Is AWS Glacier running this underneath? Some details here.
For a coder at heart, it was good demonstration by companies such as Microsoft and Intel showing some love for OpenBMC to manage the bare metal. Firmware update seems to be common pain across industries but innovative approach taken by Intel and Microsoft using Capsule – which bring API and Envelop via UEFI – try to make it easier than it seems. 
Overall, it was a good exposure to newer generation of hardware technologies and by accepting contributions from multiple companies, OCP is moving towards standardization on hardware. With standardization and API integration, it will make fun to play with Bare Metal.
Do you still think Bare Metal is dreary?

This article originally appeared on LinkedIn under the title Bare Metal – A dreary (but essential) part of Cloud

Amazon Web Services (AWS) Risk and Compliance

This is a summary of AWS’s Risk and Compliance White Paper
AWS publishes SOC1 report – formerly known as Statement on Auditing Standards (SAS) 70, Service Organization report, widely recognized auditing standard developed by AICPA (American Institute of Certified Public Accountants). 
SOC 1 audit is an in-depth audit of design and operating effectiveness of AWS’s defined control objectives and control activities. 
Type II – refers that each of the controls described in reports are not only evaluated for adequacy of design, but are also tested for operating effectiveness by the external auditor. 
With ISO 27001 certification AWS is complying with a broad, comprehensive security standard and follows best practices in maintaining a secure environment. 
With PCI Data Security Standards (PCI DSS), AWS is complying with set of controls important to companies that handle credit card information. 
With AWS’s compliance with FISMA standards, AWS complies with wide range of specific control requirements by US government agencies. 
Risk Management:
AWS management has developed a strategic business plan which includes risk identification and the implementation of controls to mitigate and manage risks. Based on my understanding, AWS management re-evaluate those plans at least twice a year. 
Also, AWS compliance team have adopted various Information Security and Compliance framework – including but not limited to COBIT, ISO 27001/27002, AICPA Trust Service Principles, NIST 800-53 and PCI DSS v3.1. 
Additionally, AWS regularly scan all their Internet facing services for possible vulnerabilities and notified parties involved in remediation. External Pen Test (VA test) are also performed by reputed independent companies and repots are shared with AWS management. 
FedRAMP: AWS is Federal Risk and Authorization Management Program (FedRAMPsm) compliant Cloud Service Provider. 
FIPS 140-2: The Federal Information Processing Standard (FIPS) Publication 140-2 is a US government security standard that specifies the security requirements for cryptographic modules protecting sensitive information. AWS is operating their GovCloud (US) with FIPS 140-2 validated hardware. 
To allow US government agencies to comply with FISMA (Federal Information Security Management Act), AWS infrastructure has been evaluated by independent assessors for a variety of government systems as part of their system owner’s approval process.
Many agencies have successfully achieved security authorization for systems hosted in AWS in accordance with Risk Management Framework (RMF) process defined in NIST 800-37 and DoD Information Assurance Certification and Accreditation Process (DIACAP).
Leveraging secure AWS environment to process, maintain and store protected health information, AWS is enabling entities to work in AWS cloud who need to comply with US Health Insurance Portability and Accountability Act (HIPPA). 
ISO 9001:
AWS has achieved ISO 9001 certification to directly support customers who develop, migrate and operate their quality-controlled IT systems in AWS cloud. This allows customers to utilize AWS’s compliance report as evidence of their ISO 9001 programs for industry specific quality programs such as ISO/TS 16949 in auto sector, ISO 13485 in medical devices, GxP in life science, AS9100 in aerospace industry. 
ISO 27001:
AWS has achieved ISO 27001 certification of their Information Security Management Systems (ISMS) covering AWS infrastructure, data centers, and multiple cloud services. 
AWS GovCloud (US) supports US International Traffic in Arms Regulations (ITAR) compliance. Companies subject to ITAR export regulations must control unintended exports by restricting access to protected data to US persons and restricting physical location of that data to US. AWS GovCloud provides such facilities and comply to the required compliance requirements. 
PCI DSS Level 1:
AWS is level 1 compliant under PCI DSS (Payment Card Industry Data Security Standards). Based on February 2013 guidelines by PCI Security Standards Council, AWS incorporated those guidelines in AWS PCI Compliance Package for customers. AWS PCI Compliance package include AWS PCI Attestation of Compliance (AoC), which shows that AWS has been successfully validated against standard applicable to a Level 1 Service Provider under PCI DSS Version 3.1.
AWS publishes Service Organization Controls 1 (SOC 1), Type II report. Audit of this report is done in accordance with AICPA: AT 801 (formerly SSAE 16) and International Standards for Assurance Engagements No. 3402 (ISAE 3402). 
This dual report intended to meet a broad range of financial auditing requirement of US and international bodies. 
In addition to SOC 1, AWS also publishes SOC 2, Type II report – that expands the evaluation of controls to the criteria set forth by the AICPA Trust Service Principles. These principle defines leading practice controls relevant to security, availability, processing integrity, confidentiality, and privacy applicable to service organization such as AWS. 
SOC 3 report is publicly-available summary of AWS SOC 2 report. The report includes the external auditor’s opinion of the operation of controls based on (AICPA’s Security Trust Principle included in SOC 2 report), the assertion from AWS management regarding effectiveness of controls, and overview of AWS infrastructure and Services.

Cloud Principles

This post explains some of the cloud pricinples to be utilized when working with Amazon Web Services. Though the references here are for AWS services, pricinples can be used across multiple clouds.

– Design for failure and nothing will fail:
  • What happens if a node in your system fails? How do you recognize that failure? How do I replace that node? What kind of scenarios do I have to plan for?
  • What are my single points of failure? If a load balancer is sitting in front of an array of application servers, what if that load balancer fails?
  • If there are master and slaves in your architecture, what if the master node fails? How does the failover occur and how is a new slave instantiated and brought into sync with the master?
  • What happens to my application if the dependent services changes its interface?
  • What if downstream service times out or returns an exception?
  • What if the cache keys grow beyond memory limit of an instance?
Best practice:
  1. Failover gracefully using Elastic IPs: Elastic IP is a static IP that is dynamically re-mappable. You can quickly remap and failover to another set of servers so that your traffic is routed to the new servers. It works great when you want to upgrade from old to new versions or in case of hardware failures
  2. Utilize multiple Availability Zones: Availability Zones are conceptually like logical datacenters. By deploying your architecture to multiple availability zones, you can ensure highly availability. Utilize Amazon RDS Multi-AZ [21] deployment functionality to automatically replicate database updates across multiple Availability Zones.
  3. Maintain an Amazon Machine Image so that you can restore and clone environments very easily in a different Availability Zone; Maintain multiple Database slaves across Availability Zones and setup hot replication.
  4. Utilize Amazon CloudWatch (or various real-time open source monitoring tools) to get more visibility and take appropriate actions in case of hardware failure or performance degradation. Setup an Auto scaling group to maintain a fixed fleet size so that it replaces unhealthy Amazon EC2 instances by new ones.
  5. Utilize Amazon EBS and set up cron jobs so that incremental snapshots are automatically uploaded to Amazon S3 and data is persisted independent of your instances. 
  6. Utilize Amazon RDS and set the retention period for backups, so that it can perform automated backups.
– Decouple your components:
the more loosely coupled the components of the system, the bigger and better it scales.
  • Which business component or feature could be isolated from current monolithic application and can run standalone separately?
  • And then how can I add more instances of that component without breaking my current system and at the same time serve more users?
  • How much effort will it take to encapsulate the component so that it can interact with other components asynchronously?
Best prCTICES:
  1. Use Amazon SQS to isolate components 
  2. Use Amazon SQS as buffers between components
  3. Design every component such that it expose a service interface and is responsible for its own scalability in all appropriate dimensions and interacts with other components asynchronously
  4. Bundle the logical construct of a component into an Amazon Machine Image so that it can be deployed more often 
  5. Make your applications as stateless as possible. Store session state outside of component (in Amazon SimpleDB, if appropriate)
– Implement elasticity
  1. Proactive Cyclic Scaling: Periodic scaling that occurs at fixed interval (daily, weekly, monthly, quarterly)
  2. Proactive Event-based Scaling: Scaling just when you are expecting a big surge of traffic requests due to a scheduled business event (new product launch, marketing campaigns) 
  3. Auto-scaling based on demand. By using a monitoring service, your system can send triggers to take appropriate actions so that it scales up or down based on metrics (utilization of the servers or network i/o, for instance)
Automate Your Infrastructure
  • Create a library of “recipes” – small frequently-used scripts (for installation and configuration)
  • Manage the configuration and deployment process using agents bundled inside an AMI 
  • Bootstrap your instances
Bootstrap Your Instances
  1. Recreate the (Dev, staging, Production) environment with few clicks and minimal effort
  2. More control over your abstract cloud-based resources
  3. Reduce human-induced deployment errors
  4. Create a Self Healing and Self-discoverable environment which is more resilient to hardware failure
Best Practices:
  1. Define Auto-scaling groups for different clusters using the Amazon Auto-scaling feature in Amazon EC2.
  2. Monitor your system metrics (CPU, Memory, Disk I/O, Network I/O) using Amazon CloudWatch and take appropriate actions (launching new AMIs dynamically using the Auto-scaling service) or send notifications.
  3. Store and retrieve machine configuration information dynamically: Utilize Amazon DynamoDB to fetch config data during boot-time of an instance (eg. database connection strings). SimpleDB may also be used to store information about an instance such as its IP address, machine name and role.
  4. Design a build process such that it dumps the latest builds to a bucket in Amazon S3; download the latest version of an application from during system startup.
  5. Invest in building resource management tools (Automated scripts, pre-configured images) or Use smart open source configuration management tools like Chef, Puppet, CFEngine or Genome.
  6. Bundle Just Enough Operating System (JeOS22) and your software dependencies into an Amazon Machine Image so that it is easier to manage and maintain. Pass configuration files or parameters at launch time and retrieve user data23 and instance metadata after launch.
  7. Reduce bundling and launch time by booting from Amazon EBS volumes24 and attaching multiple Amazon EBS volumes to an instance. Create snapshots of common volumes and share snapshots25 among accounts wherever appropriate.
  8. Application components should not assume health or location of hardware it is running on. For example, dynamically attach the IP address of a new node to the cluster. Automatically failover and start a new clone in case of a failure.
– Think Parallel: The cloud makes parallelization effortless.
Best Practices:
  1. Multi-thread your Amazon S3 requests  
  2. Multi-thread your Amazon SimpleDB GET and BATCHPUT requests
  3. Create a JobFlow using the Amazon Elastic MapReduce Service for each of your daily batch processes (indexing, log analysis etc.) which will compute the job in parallel and save time.
  4. Use the Elastic Load Balancing service and spread your load across multiple web app servers dynamically
– Keep Dynamic Data close to Compute and Static Data close to End User:
Best Practices:
  1. Ship your data drives to Amazon using the Import/Export service. It may be cheaper and faster to move large amounts of data using the sneakernet28 than to upload using the Internet.
  2. Utilize the same Availability Zone to launch a cluster of machines
  3. Create a distribution of your Amazon S3 bucket and let Amazon CloudFront caches content in that bucket across all the 14 edge locations around the world

XEN security bug may be forcing AWS, RackSpace to reboot their servers

It was reported by XEN security team about a bug where a buggy or malicious HVM gues can crash the host or read data relating to other guests or the hypervisor itself. This certainly cause a bigger security risk in public cloud environment – such as Amazon Web Services or RackSpace Public Cloud where they use XEN has a choice of hypervisor for guest VM.

Probably that’s the reason it was reported last week of AWS reboot across regions.