AWS Security Mindmap

Amazon Web Services (AWS) has a broad offering of cloud services. It goes without saying that when you run your workload in the cloud, you want to ensure it is secured. To benefit their customers, AWS has built plenty of security tools in-house, and they also comply with a myriad of industry standards such as PCI-DSS, HIPAA, and FedRAMP/FISMA, just to name a few.

Given AWS's long list of security services, it can sometimes be overwhelming to identify which ones to use for your use case. To help solve that puzzle, AWS has published a Security Whitepaper. While the paper provides plenty of detail for its intended audience, a mind map is a better way to remember it over the long run.

I have come up with a mind map of AWS security best practices. I am sure it is not the first one in the AWS community, but it serves my purpose, so I am keeping it here.

AWS Security Mind Map PDF

OWASP Proactive Controls

The OWASP Top 10 Proactive Controls 2016 is published by OWASP (the Open Web Application Security Project). It is a list of security techniques that should be part of every SDLC (Software Development Life Cycle).

1)  Verify for Security Early and Often:

This is the most important aspect of any secure software development life cycle. Applications must be tested and verified for security at the beginning of the project and throughout its lifecycle – any issue discovered early can be fixed early and won't block the entire project.

2)  Parameterize Queries:

SQL injection is one of the most dangerous vulnerabilities for web applications. It allows an attacker to change the structure of a web application's SQL statement in a way that can steal data, modify data, or potentially facilitate native OS command injection. Using parameterized queries prevents SQL injection.

Here is an example of a SQL injection flaw.
This is unsafe Java code that allows an attacker to inject code into the query that will be executed by the database.

 String query = "SELECT acct_balance FROM acct_data WHERE customer_name = "
    + request.getParameter("custName");

 try {
 	// The user-supplied "custName" value is concatenated directly into the SQL text.
 	Statement statement = connection.createStatement();
 	ResultSet results = statement.executeQuery( query );
 } catch (SQLException e) {
 	// handle the error without leaking details to the caller
 }

Because the “custName” parameter is simply appended to the query, an attacker can inject any SQL code they want.

So what can we do to avoid it? Using prepared statements with variable binding (i.e., parameterized queries) prevents this issue.

The following code example uses a PreparedStatement, Java’s implementation of a parameterized query, to execute the same database query.

 String custname = request.getParameter("custName"); // This should REALLY be validated too
 // perform input validation to detect attacks
 String query = "SELECT acct_balance FROM acct_data WHERE customer_name = ? ";
 
 PreparedStatement pstmt = connection.prepareStatement( query );
 pstmt.setString( 1, custname); 
 ResultSet results = pstmt.executeQuery( );

3) Encode Data

Encoding helps protect against many types of attacks – particularly XSS (cross-site scripting). Encoding translates special characters into an equivalent form that is safe for the target interpreter.
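
To make this concrete, here is a minimal hand-rolled sketch in Java of HTML output encoding, for illustration only; in a real application a vetted library (such as the OWASP Java Encoder) is the better choice.

 // Minimal sketch of HTML output encoding: translate characters that are
 // significant to an HTML parser into entity equivalents so user-supplied
 // data is rendered as text rather than interpreted as markup.
 public final class HtmlEncoder {

     public static String encodeForHtml(String input) {
         if (input == null) {
             return "";
         }
         StringBuilder sb = new StringBuilder(input.length());
         for (char c : input.toCharArray()) {
             switch (c) {
                 case '&':  sb.append("&amp;");  break;
                 case '<':  sb.append("&lt;");   break;
                 case '>':  sb.append("&gt;");   break;
                 case '"':  sb.append("&quot;"); break;
                 case '\'': sb.append("&#x27;"); break;
                 default:   sb.append(c);
             }
         }
         return sb.toString();
     }

     public static void main(String[] args) {
         // "<script>" is rendered harmlessly as text once encoded.
         System.out.println(encodeForHtml("<script>alert('xss')</script>"));
     }
 }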

4) Validate All Inputs

Input validation reduces or minimizes the amount of malformed data entering the system. It should not, however, be used as the primary method of preventing XSS or SQL injection.
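
As a small illustrative sketch (the field name and whitelist pattern below are assumptions, not from any standard), whitelist input validation in Java might look like this:

 import java.util.regex.Pattern;

 // Minimal whitelist validation sketch: accept only the characters and length
 // the field legitimately needs, and reject everything else up front.
 public final class InputValidator {

     // Assumed rule for an account-name field: 1-32 letters, digits, dot or dash.
     private static final Pattern ACCOUNT_NAME = Pattern.compile("^[A-Za-z0-9.-]{1,32}$");

     public static String requireValidAccountName(String input) {
         if (input == null || !ACCOUNT_NAME.matcher(input).matches()) {
             throw new IllegalArgumentException("Invalid account name");
         }
         return input;
     }

     public static void main(String[] args) {
         System.out.println(requireValidAccountName("alice-01")); // accepted
         // requireValidAccountName("alice'; DROP TABLE acct_data;--") would be rejected
     }
 }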

5) Implement Identity and Authentication Controls

Use standard methods for authentication, identity management, and session management. Ideally, follow appropriate guidelines for user IDs, password strength controls, securing the password recovery mechanism, and storing and transmitting passwords. Additionally, it is vital to ensure that all authentication failures, password failures, and account lockouts are logged and reviewed.

Another option could be using authentication protocols that require no passwords – such as OAuth, OpenID, SAML, FIDO, etc.

For session management, consider a variety of factors: session ID properties such as name fingerprinting, ID length, ID entropy, and ID content; use the built-in, language-specific (and up-to-date) session management implementation; use secure cookies as much as possible; follow best practices for the session ID lifecycle; and apply controls for session expiration and possible session hijacking.
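
As a rough sketch of the secure-cookie piece, assuming a Servlet 3.0+ container (the cookie name, path, and lifetime here are placeholder choices, not a recommendation from any standard):

 import javax.servlet.http.Cookie;
 import javax.servlet.http.HttpServletResponse;

 // Sketch of hardening a session cookie with the Servlet API (3.0+):
 // Secure keeps it off plain HTTP, HttpOnly keeps it away from scripts,
 // and a tight path plus short lifetime limit exposure if it does leak.
 public final class SessionCookieHelper {

     public static void addSessionCookie(HttpServletResponse response, String sessionId) {
         Cookie cookie = new Cookie("SESSIONID", sessionId); // generic name avoids fingerprinting defaults
         cookie.setSecure(true);      // only sent over HTTPS
         cookie.setHttpOnly(true);    // not readable from JavaScript
         cookie.setPath("/app");      // scope to the application path (assumed)
         cookie.setMaxAge(15 * 60);   // 15-minute lifetime (assumed policy)
         response.addCookie(cookie);
     }
 }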

6) Implement Appropriate Access Controls

Deny access by default. Utilize role-based, discretionary, or mandatory access controls where applicable. By using access control, we intentionally create one more layer of security, known as authorization. Authorization is the process of deciding whether a request to access a particular resource should be granted or denied. By creating an access control policy, we ensure that it meets the stated security requirements.
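
A minimal deny-by-default sketch (the roles and permissions below are made up for illustration):

 import java.util.Map;
 import java.util.Set;

 // Deny-by-default sketch: a request is only allowed when the caller's role
 // is explicitly granted the permission; anything not listed is refused.
 public final class DenyByDefaultAuthorizer {

     // Assumed role-to-permission mapping, for illustration only (Java 9+ Map.of).
     private static final Map<String, Set<String>> GRANTS = Map.of(
             "teller",  Set.of("account:read"),
             "manager", Set.of("account:read", "account:update")
     );

     public static boolean isAllowed(String role, String permission) {
         return GRANTS.getOrDefault(role, Set.of()).contains(permission); // default: deny
     }

     public static void main(String[] args) {
         System.out.println(isAllowed("teller", "account:update"));  // false - denied by default
         System.out.println(isAllowed("manager", "account:update")); // true  - explicitly granted
     }
 }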

7) Protect Data

Encrypt your data in transit, at rest, and in use. Make sure to use strong, well-vetted encryption methods and libraries.
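
As an illustrative sketch of encrypting data at rest with the standard Java crypto API (AES-256 in GCM mode), with key management deliberately left out; in practice the key would come from a KMS or HSM rather than being generated in place:

 import java.nio.charset.StandardCharsets;
 import java.security.SecureRandom;
 import javax.crypto.Cipher;
 import javax.crypto.KeyGenerator;
 import javax.crypto.SecretKey;
 import javax.crypto.spec.GCMParameterSpec;

 // Sketch of authenticated encryption with AES-256/GCM via javax.crypto.
 public final class DataEncryptionExample {

     public static void main(String[] args) throws Exception {
         KeyGenerator keyGen = KeyGenerator.getInstance("AES");
         keyGen.init(256);                       // requires a JRE with 256-bit AES enabled
         SecretKey key = keyGen.generateKey();

         byte[] iv = new byte[12];               // 96-bit nonce, must be unique per encryption
         new SecureRandom().nextBytes(iv);

         Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
         cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
         byte[] ciphertext = cipher.doFinal("4111-1111-1111-1111".getBytes(StandardCharsets.UTF_8));

         cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
         String recovered = new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
         System.out.println(recovered);
     }
 }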

8) Implement Logging and Intrusion Detection

Log analysis and intrusion detection go hand in hand. There are two ways of doing intrusion detection: network-based and log-based. For this control, we need to design our logging strategy so that we can detect intrusions across systems, networks, applications, devices, and users.
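
A small sketch of what consistent, parse-friendly security event logging could look like (the field names here are just an assumed convention, not from any standard):

 import java.time.Instant;
 import java.util.logging.Logger;

 // Sketch of emitting structured, consistent security events so a log-based
 // intrusion detection pipeline can correlate them across systems and applications.
 public final class SecurityEventLogger {

     private static final Logger LOG = Logger.getLogger("security");

     public static void logFailedLogin(String username, String sourceIp) {
         // One event per line with stable key=value fields keeps downstream parsing simple.
         LOG.warning(String.format("event=auth_failure user=%s src_ip=%s ts=%s",
                 username, sourceIp, Instant.now()));
     }

     public static void main(String[] args) {
         logFailedLogin("alice", "203.0.113.7");
     }
 }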

9) Leverage Security Frameworks and Libraries

Leverage security frameworks and libraries as much as possible for your application language domain.

10) Error and Exception Handling

Error messages give an attacker great insight into the inner workings of your code. Thus, an important aspect of secure application development is preventing errors and exceptions from leaking any information.
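
A hedged sketch of the idea: log the full exception internally with a correlation id, and return only a generic message (plus the id) to the caller so no stack traces, SQL text, or internal paths leak out.

 import java.sql.SQLException;
 import java.util.UUID;
 import java.util.logging.Level;
 import java.util.logging.Logger;

 // Sketch of safe error handling: detailed logging stays server-side,
 // the user sees only a generic message and a reference id.
 public final class SafeErrorHandling {

     private static final Logger LOG = Logger.getLogger(SafeErrorHandling.class.getName());

     public static String lookupBalance(String accountId) {
         try {
             return queryBalance(accountId);
         } catch (SQLException e) {
             String correlationId = UUID.randomUUID().toString();
             LOG.log(Level.SEVERE, "Balance lookup failed, correlationId=" + correlationId, e);
             return "An internal error occurred. Reference: " + correlationId;
         }
     }

     // Placeholder for the real data access call (assumed for this sketch).
     private static String queryBalance(String accountId) throws SQLException {
         throw new SQLException("simulated database failure");
     }

     public static void main(String[] args) {
         System.out.println(lookupBalance("ACCT-42"));
     }
 }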

CIS Critical Security Controls

In the earlier post, we discussed CIS Security Benchmarks and how they can be useful to public and private organizations. In this post, we will explore the CIS Critical Security Controls.

The CIS Critical Security Controls, also known as the CIS Controls, are a concise, prioritized set of cyber practices created to stop today’s most pervasive and dangerous cyber attacks. They are developed, refined, and validated by a community of leading experts around the world. Though it is widely considered that by applying the top 5 CIS Controls an organization can reduce its cyberattack risk by roughly 85 percent, we will review all 20 controls here for clarity's sake.

  1. CSC # 1: Inventory of Authorized and Unauthorized Devices
  2. CSC # 2: Inventory of Authorized and Unauthorized Software
  3. CSC # 3: Secure Configurations for Hardware and Software
  4. CSC # 4: Continuous Vulnerability Assessment and Remediation
  5. CSC # 5: Controlled Use of Administrative Privileges
  6. CSC # 6: Maintenance, Monitoring, and Analysis of Audit Logs
  7. CSC # 7: Email and Web Browser Protections
  8. CSC # 8: Malware Defenses
  9. CSC # 9: Limitation and Control of Network Ports
  10. CSC # 10: Data Recovery Capability
  11. CSC # 11: Secure Configurations for Network Devices
  12. CSC # 12: Boundary Defense
  13. CSC # 13: Data Protection
  14. CSC # 14: Controlled Access Based on the Need to Know
  15. CSC # 15: Wireless Access Control
  16. CSC # 16: Account Monitoring and Control
  17. CSC # 17: Security Skills Assessment and Appropriate Training to Fill Gaps
  18. CSC # 18: Application Software Security
  19. CSC # 19: Incident Response and Management
  20. CSC # 20: Penetration Tests and Red Team Exercises

Each of these controls has its own sub-controls, each with its own threshold metrics (Low Risk, Medium Risk, or High Risk). For example, the first control states that we should have an inventory of authorized and unauthorized devices. Its first sub-control requires us to deploy an “automated” asset inventory discovery tool, and the associated metric is how many “unauthorized” devices are present in our network at a given time. If that number is between 0–1%, it's considered Low Risk; between 1–4% is Medium Risk; anything above 4% is High Risk – and appropriate actions should be taken to mitigate such risks!
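
A quick sketch of that metric arithmetic, using the thresholds described above (the device counts are made-up inputs):

 // Sketch of the CSC #1 metric described above: percentage of unauthorized
 // devices on the network mapped to a risk tier (thresholds as stated in the text).
 public final class UnauthorizedDeviceMetric {

     public static String riskTier(int unauthorizedDevices, int totalDevices) {
         double pct = 100.0 * unauthorizedDevices / totalDevices;
         if (pct <= 1.0) return "Low Risk";
         if (pct <= 4.0) return "Medium Risk";
         return "High Risk";
     }

     public static void main(String[] args) {
         System.out.println(riskTier(3, 500));   // 0.6% -> Low Risk
         System.out.println(riskTier(12, 500));  // 2.4% -> Medium Risk
         System.out.println(riskTier(30, 500));  // 6.0% -> High Risk
     }
 }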

CIS Security Benchmarks

In the earlier post we talked about CIS (the Center for Internet Security), and now we will take a deep dive into one of the areas CIS focuses on: Security Benchmarks.

CIS Security Benchmarks are consensus-based best practices derived from industry, and they are completely vendor agnostic – so there is no need to worry if you are working with one vendor today and decide to move to another next week.

They cover a lot of ground for managing security in private or public organizations, but mainly they cover:

  • Secure configuration benchmarks
    • These are the recommended technical settings for operating systems, middleware, software applications, and network devices. They also include cloud-related benchmarks, such as the AWS Foundations Benchmark, which covers how to secure your AWS components – best practices for IAM, CloudTrail, CloudWatch, etc.
  • Automated configuration assessment tools and content
    • CIS's Configuration Assessment Tool (CIS-CAT) is a tool for analyzing and monitoring the security status of information systems and the effectiveness of internal security controls and processes. It reports a target system's conformance with the recommended settings in the Security Benchmarks.
  • Security metrics
    • CIS has identified a set of security metrics to watch, along with guidance on collecting data for those metrics, interpreting the results, and presenting them effectively to stakeholders. Per CIS, there are twenty metrics to choose from, distributed across business functions such as Incident Management, Vulnerability Management, Patch Management, Configuration Management, Change Management, Application Security, and Financial metrics.
  • Security software product certifications

(ref: https://benchmarks.cisecurity.org/)

CISO Mindmap – Business Enablement

While doing some research on the CISO function, I noticed a very good mind map created by Rafeeq Rehman.

While what he has come up with is a mind map, I will try to deconstruct it to elaborate on the various functions performed by a CISO.

Let’s begin:

  1. Business Enablement
  2. Security Operations
  3. Selling Infosec (internally)
  4. Compliance and Audit
  5. Security Architecture
  6. Project Delivery lifecycle
  7. Risk Management
  8. Governance
  9. Identity Management
  10. Budget
  11. HR and Legal
So why did I number them, and why in this order?
I believe business enablement is the most important function of a CISO. If they don't know the business in which they operate, it will be very difficult to carry out the duties of a CISO. Consider a person coming from a technology background with no knowledge of the retail business: if that person is hired as CISO just because they know the technology, that may not be a good deal. To become a successful CISO, one must know the business one is involved in; to run the security function well, one must understand the business climate.

If this retail business has a requirement to store credit card information in its systems, the CISO's job is to make sure appropriate PCI-DSS controls are in place so the data doesn't get into the wrong hands – while at the same time making sure that PCI-DSS doesn't get in the way of enabling the business to accept credit card transactions. Yes, security is a requirement, but not at the cost of not doing business.

That’s why I rate business enablement as a very important function of a CISO.

What are some of the ways a CISO can enable the business to adopt technology and still not get in its way?

  • Cloud Computing
  • Mobile technologies
  • Internet of things
  • Artificial Intelligence
  • Data Analytics
  • Crypto currencies / Blockchain
  • Mergers and Acquisitions
We will review each of these items in detail in the following blog posts.

CIS: Center for Internet Security

CIS:
The Center for Internet Security (CIS) is an organization dedicated to enhancing the Cybersecurity readiness and response among public and private sector entities. Utilizing its strong industry and government partnerships, CIS combats evolving Cybersecurity challenges on a global scale and helps organizations adopt key best practices to achieve immediate and effective defenses against cyber attacks. CIS is home to the Multi-State Information Sharing and Analysis Center (MS-ISAC), CIS Security Benchmarks, and CIS Critical Security Controls.
CIS's mission is to:
  • Identify, develop, validate, promote, and sustain best practices in cybersecurity;
  • Deliver world-class security solutions to prevent and rapidly respond to cyber incidents; and
  • Build and lead communities to enable an environment of trust in cyberspace.

CIS lives by its published values:

  • Operate with Integrity
  • Commit to Excellence
  • Embody Collaboration
  • Focus on our Partners
  • Support our Employees
  • Promote Teamwork
  • Remain Agile

There are two CIS resources that we will take a deep dive into:

  • Secure Configuration Guides (aka “Benchmarks”)
  • “Top 20” Critical Security Controls (CSC)
Benchmarks vs. Critical Security Controls:
  • Benchmarks are technology-specific checklists that provide prescriptive guidance for secure configuration
  • CSCs are security program level activities:
    • Inventory your items
    • Securely configure them
    • Patch them
    • Reduce privileges
    • Train the humans
    • Monitor the access
CIS Benchmarks:
  • 140 benchmarks available here
  • AWS CIS Foundations Benchmark here
 (Ref: https://www.cisecurity.org/about/)

TPM (Trusted Platform Module)

TPM, or Trusted Platform Module, as defined by the TCG (Trusted Computing Group), is a microcontroller used in laptops and now also in servers to ensure the integrity of the platform. A TPM can securely store artifacts used to authenticate the platform, such as passwords, certificates, or encryption keys. It can also store platform measurements that help ensure the platform remains trustworthy. Authentication (ensuring that the platform can prove that it is what it claims to be) and attestation (a process helping to prove that a platform is trustworthy and has not been breached) are necessary steps to ensure safer computing in all environments.

source: http://www.trustedcomputinggroup.org
The image above depicts the overall function of the TPM module. The standard use case I have seen is ensuring a secure boot process for servers. Secure boot validates the code run at each step of the process and stops the boot if the code is incorrect. The first step is to measure each piece of code before it runs; in this context, a measurement is effectively a SHA-1 hash of the code, taken before it is executed. The hash is stored in a platform configuration register (PCR) in the TPM.

TPM 1.2 only supports the SHA-1 algorithm.
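
To make the measurement step concrete, here is a conceptual sketch (not vendor TPM code) of how a PCR extend works: the new PCR value is the SHA-1 hash of the old PCR value concatenated with the new measurement, so the register accumulates an order-dependent record of everything measured so far.

 import java.nio.charset.StandardCharsets;
 import java.security.MessageDigest;

 // Conceptual sketch of a TPM 1.2 PCR "extend": newPCR = SHA-1(oldPCR || measurement),
 // where the measurement is itself the SHA-1 hash of the code about to run.
 // Any change earlier in the boot sequence alters every later PCR value.
 public final class PcrExtendSketch {

     public static byte[] sha1(byte[] data) throws Exception {
         return MessageDigest.getInstance("SHA-1").digest(data);
     }

     public static byte[] extend(byte[] pcr, byte[] measurement) throws Exception {
         byte[] combined = new byte[pcr.length + measurement.length];
         System.arraycopy(pcr, 0, combined, 0, pcr.length);
         System.arraycopy(measurement, 0, combined, pcr.length, measurement.length);
         return sha1(combined);
     }

     public static void main(String[] args) throws Exception {
         byte[] pcr = new byte[20]; // PCRs start zeroed at power-on
         pcr = extend(pcr, sha1("bootloader-stage-1".getBytes(StandardCharsets.UTF_8)));
         pcr = extend(pcr, sha1("kernel-image".getBytes(StandardCharsets.UTF_8)));
         System.out.printf("final PCR value: %040x%n", new java.math.BigInteger(1, pcr));
     }
 }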

Each TPM has at least 24 PCRs. The TCG Generic Server Specification, v1.0, March 2005, defines the PCR assignments for boot-time integrity measurements. The table below shows a typical PCR configuration. The context indicates if the values are determined based on the node hardware (firmware) or the software provisioned onto the node. Some values are influenced by firmware versions, disk sizes, and other low-level information.

Therefore, it is important to have good practices in place around configuration management to ensure that each system deployed is configured exactly as desired.

Register | What is measured | Context
PCR-00 | Core Root of Trust Measurement (CRTM), BIOS code, host platform extensions | Hardware
PCR-01 | Host platform configuration | Hardware
PCR-02 | Option ROM code | Hardware
PCR-03 | Option ROM configuration and data | Hardware
PCR-04 | Initial Program Loader (IPL) code, e.g. the master boot record | Software
PCR-05 | IPL code configuration and data | Software
PCR-06 | State transition and wake events | Software
PCR-07 | Host platform manufacturer control | Software
PCR-08 | Platform specific, often kernel, kernel extensions, and drivers | Software
PCR-09 | Platform specific, often initramfs | Software
PCR-10 to PCR-23 | Platform specific | Software

So there are very good use cases for TPM to ensure secure boot and hardware integrity – but who is actually using it? Many institutions that run their own private clouds have been seen using TPM chips on their servers, while many public clouds do not support TPM. Why? That remains a mystery!

Hadoop Stack

In this post, I am exploring the Hadoop stack and its ecosystem.

Hadoop:

Apache Hadoop is an open-source framework for distributed storage (HDFS) and distributed processing (MapReduce, and YARN in Hadoop 2.x) of large data sets across clusters of commodity hardware. The projects below are part of the ecosystem built around that core.

Oozie:

Oozie is a server-based workflow engine specialized in running workflow jobs with actions. It is typically used for managing Apache Hadoop MapReduce and Pig jobs. In Oozie there are workflow jobs and coordinator jobs: workflow jobs are Directed Acyclic Graphs (DAGs) of actions, while coordinator jobs are recurrent Oozie workflow jobs triggered by time (or frequency) and data availability.

Due to Oozie’s integration with the rest of the Hadoop stack, it supports several types of Hadoop jobs out of the box.

From a product point of view, Oozie is a Java web application that runs in a Java servlet container. An Oozie workflow is a collection of actions (Hadoop MapReduce jobs, Pig jobs, etc.) arranged in a control-dependency DAG (Directed Acyclic Graph). Here, a control dependency from one action to another means that the second action can't run until the first action has completed.

These workflow definitions are written in hPDL (an XML Process Definition Language). Oozie workflow actions start their jobs in remote systems (such as Pig or Hadoop). Once a job completes, the remote system calls back Oozie to notify it of the action's completion, and Oozie proceeds to the next action in the workflow.

credit: https://oozie.apache.org/docs/4.2.0/DG_Overview.html

From Stack Overflow: DAG (Directed Acyclic Graph)

Graph = a structure consisting of nodes that are connected to each other with edges.
Directed = the connections between nodes (edges) have a direction: A -> B is not the same as B -> A.
Acyclic = “non-circular”: moving from node to node by following the edges, you will never encounter the same node a second time.

A good example of a directed acyclic graph is a tree. Note, however, that not all directed acyclic graphs are trees 🙂

Bare Metal – A dreary (but essential) part of Cloud

Recently I got a chance to attend the Open Compute Summit 2016 in San Jose, CA. It was full of industry peers from web-scale companies such as Facebook, Google, and Microsoft, along with many financial institutions like Goldman Sachs, Bloomberg, Fidelity, etc. The overall theme of the summit was embracing openness in hardware and commodity hardware.
From a historical point of view, OCP was a project initiated by Facebook a few years ago in which they opened up many of their hardware components – motherboard, power supply, chassis, rack, and later the switch – because they needed things at scale, and doing it with branded servers (pre-packaged for the enterprise by HP, Dell, IBM) wasn't going to cut it for them; thus they designed their own gear. More details here.
Below is one of the OCP-certified servers (courtesy: http://www.wiwynn.com). It is very minimalistic – a stripped-down version of a typical rack-mount server.
Coming back to this year's summit: considering this was my first year at an OCP summit, I had certain expectations, and after being there I can say one thing for sure – "Bare metal does look interesting again." Why do I say that? If it were only about bare metal, it would certainly be boring; but when you combine bare metal with APIs, and particularly if you are operating at scale (it doesn't have to be Facebook scale), it's fun. Let's take a look.
The keynote was started by Facebook's Jason Taylor, covering the journey over the last year or so and where the community stands now. But the fun began when (another Jason) Jason Waxman from Intel talked about Intel's involvement, how the server and storage (think NVMe) industry is growing, and what they see coming in the future – including Xeon D and Yosemite.

A good talk was given by Peter Winzer of Bell Labs. I knew UNIX and C were born out of Bell Labs, but it was fascinating to hear about the history and future of Bell Labs, with innovations in fiber optics and fiber capacity – 100G is a no-brainer and 1 Tbps is on the horizon.

Microsoft Azure’s CTO Mark Russinovich started by discussing how open Microsoft is – which, to be honest, other than the .NET framework being open, I had no idea they had been contributing back to the open source community – well, it's a good thing! In the past, Microsoft has contributed their server design specs: Open Cloud Server (OCS) and the Switch Abstraction Interface (SAI). OCS is the same server and data center design that powers their Azure hyper-scale cloud (~1M servers). SAI and its APIs help network infrastructure providers integrate software with hardware platforms that are continually and rapidly evolving at cloud speed and scale. This year they have been working on a network switch and proposed a new innovation for OCP inclusion called Software for Open Networking in the Cloud (SONiC). More details here.

There were many interesting technologies showcased in the Expo, but the one that struck my mind was a storage archival solution. The basic configuration can hold 26,112 disks (7.8 PB), and with expandable modules spanning a pair of datacenter rows, total capacity goes up to 181 petabytes (HUGE!!). Is AWS Glacier running this underneath? Some details here.
For a coder at heart, it was a good demonstration by companies such as Microsoft and Intel showing some love for OpenBMC to manage bare metal. Firmware updates seem to be a common pain across the industry, but the innovative approach taken by Intel and Microsoft using Capsule – which brings an API and an envelope format via UEFI – tries to make it easier than it seems.
Overall, it was good exposure to a newer generation of hardware technologies, and by accepting contributions from multiple companies, OCP is moving towards standardization of hardware. With standardization and API integration, it will be fun to play with bare metal.
Do you still think bare metal is dreary?

This article originally appeared on LinkedIn under the title Bare Metal – A dreary (but essential) part of Cloud

Log Management

What are available options for Log Management?


There are logs everywhere – systems, applications, users, devices, thermostats, refrigerators, microwaves – you name it. As your deployment grows, your complexity increases, and when you need to analyze a situation or an outage, logs are your lifesaver.
There are tons of tools available – open source, pay-per-use, and a few others. Let's take a look at some of them here:



What are the different tools/frameworks available to store and analyze these logs – in real time if possible, or after the fact?


Splunk:


Splunk is powerful log analysis software that can run in your enterprise data center or in the cloud.

1. Splunk Enterprise: Search, monitor and analyze any machine data for powerful new insights.

2. Splunk Cloud: This provides Splunk Enterprise and all its features as SaaS in the cloud.


3. Splunk Light: A miniature version of Splunk Enterprise – log search and analysis for small IT environments.


4. Hunk: Hunk provides the power to rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop without the need to move or replicate data.


Apache Flume: 

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application. 


Flume deploys as one or more agents, each contained within its own instance of a JVM (Java Virtual Machine). An agent has three components – sources, sinks, and channels – and must have at least one of each in order to run. Sources collect incoming data as events, sinks write events out, and channels provide a queue connecting the sources and sinks. Flume allows Hadoop users to ingest high-volume streaming data directly into HDFS for storage.


credit: flume.apache.org





Apache Kafka:

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. Kafka is fast, scalable, durable, and distributed by design. It started as a LinkedIn project, was later open-sourced, and is now a top-level Apache project. Many companies have deployed Kafka in their infrastructure.

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

  • Kafka maintains feeds of messages in categories called topics.
  • We’ll call processes that publish messages to a Kafka topic producers.
  • We’ll call processes that subscribe to topics and process the feed of published messages consumers.
  • Kafka is run as a cluster comprised of one or more servers each of which is called a broker.

So, at a high level, producers send messages over the network to the Kafka cluster which in turn serves them up to consumers like this:

credit: kafka.apache.org


Kafka has a good ecosystem surrounding the main product. With a wide range of choices, it might be a good “free” backbone for a log management pipeline. For large system deployments, Kafka can act as a broker with multiple publishers – for example syslog-ng (with an agent running on each system) or Fluentd (with fluentd agents running on the nodes and a Kafka plugin) – to handle log collection. With the log4j appender, applications that already use the log4j framework can publish to Kafka almost seamlessly. Once logs are ingested via these subsystems, searching them can still be cumbersome; with Kafka, there are alternatives where you dump the data into HDFS, run a Hive query against it, and voilà, you get your analysis. A sketch of shipping log lines to Kafka follows below.
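
As a rough sketch (the broker address and topic name are placeholders), an application could ship a log line to Kafka with the standard Java client (org.apache.kafka:kafka-clients) like this:

 import java.util.Properties;
 import org.apache.kafka.clients.producer.KafkaProducer;
 import org.apache.kafka.clients.producer.Producer;
 import org.apache.kafka.clients.producer.ProducerRecord;

 // Rough sketch of an application shipping log lines to a Kafka topic with the
 // standard Java client; broker address and topic name are assumptions.
 public class LogProducerSketch {

     public static void main(String[] args) {
         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092"); // assumed broker
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

         try (Producer<String, String> producer = new KafkaProducer<>(props)) {
             String logLine = "2016-05-01T12:00:00Z app=checkout level=ERROR msg=\"payment timeout\"";
             // Key by host (or app) so related log lines land in the same partition.
             producer.send(new ProducerRecord<>("app-logs", "web-01", logLine));
         }
     }
 }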

Still, there is some work to be done in terms of how easily someone can retrieve the data, for example via a Kibana-style dashboard.

ELK:


When we are talking about logs, how can we not mention the ELK stack? When I was introduced to the ELK stack, it was presented as an open-source alternative to Splunk. I agree it has the feature set to compete against the core Splunk product, and with the right sizing (think small or medium) you may not need Splunk at all – the ELK stack might be good enough. Though in recent usage, we have found some scalability issues once we reached a few hundred gigs of logs per day.


One feature of the ELK stack I do like is that it's all-in-one: I have my log aggregator, search indexer, and dashboard within one suite of applications.


With so many choices, it becomes difficult to rely on one or the other. If you have enough money to spend, Splunk might be the right choice; but if you can throw a developer at the problem, either the ELK stack or Kafka – depending on the scale at which you are growing – might serve you better.