TPM (Trusted Platform Module)

TPM or Trusted Platform Module as referred by TCG (Trusted Computing Group)  is a microcontroller used in Laptop and now also on servers to ensure the integrity of the platform. TPM can securely store artifacts used to authenticate the platform. These artifacts can include passwords, certificates, or encryption keys. A TPM can also be used to store platform measurements that help ensure that the platform remains trustworthy. Authentication (ensuring that the platform can prove that it is what it claims to be) and attestation (a process helping to prove that a platform is trustworthy and has not been breached) are necessary steps to ensure safer computing in all environments.

Above image depicts the overall function of TPM module. Standard use case I have seen is ensuring secure boot process of servers. Secure boot will validate the code run at each step in the process, and stop the boot if the code is incorrect. The first step is to measure each piece of code before it is run. In this context, a measurement is effectively a SHA-1 hash of the code, taken before it is executed. The hash is stored in a platform configuration register (PCR) in the TPM.

 TPM 1.2 only support SHA-1 algorithm 

Each TPM has at least 24 PCRs. The TCG Generic Server Specification, v1.0, March 2005, defines the PCR assignments for boot-time integrity measurements. The table below shows a typical PCR configuration. The context indicates if the values are determined based on the node hardware (firmware) or the software provisioned onto the node. Some values are influenced by firmware versions, disk sizes, and other low-level information.

Therefore, it is important to have good practices in place around configuration management to ensure that each system deployed is configured exactly as desired.

Register What is measured Context
PCR-00 Core Root of Trust Measurement (CRTM), BIOS code, Host platform extensions Hardware
PCR-01 Host platform configuration Hardware
PCR-02 Option ROM code Hardware
PCR-03 Option ROM configuration and data Hardware
PCR-04 Initial Program Loader (IPL) code. For example, master boot record. Software
PCR-05 IPL code configuration and data Software
PCR-06 State transition and wake events Software
PCR-07 Host platform manufacturer control Software
PCR-08 Platform specific, often kernel, kernel extensions, and drivers Software
PCR-09 Platform specific, often Initramfs Software
PCR-10 to PCR-23 Platform specific Software

So there are very good use case of TPM to ensure secure boot and integrity of hardware – who all are using TPM? There are many institutions who runs their private clouds have been seen using TPM chipset on their servers while many public clouds do not support TPM – why? that’s mystery!

Hadoop Stack

In this post, I am exploring Hadoop stack and it’s ecosystem.



Oozie is a server-based workflow engine specialized in running workflow jobs with actions.  It is typically used for managing Apache Hadoop Map/Reduce and Pig Jobs. In Oozie, there are workflow jobs and Coordinator jobs. Typically workflow jobs are Directed Acyclical Graph (DAG) of actions while coordinator jobs are recurrent Ozzie workflow jobs which are triggered by time (or frequency) and based on data availability.

Due to Oozie’s integration with rest of the Hadoop stack, it is easy to support several types of Hadoop jobs out of the box.

From a product point of view, it’s a Java Web Application that runs on Java Servlet container. In Oozie, a workflow is a collection of actions (Hadoop Map/Reduce jobs, Pig jobs) arranged in control dependency DAG (Direct Acyclic Graph)… Here control dependency dictates that from one action to another action – but second action can’t run until the first action is completed.

These workflow definitions are written in hPDL (Process Definition Language). Oozie workflow actions start their jobs in remote systems (like Pig, Hadoop etc.). Once completed, remote systems callback Oozie to notify the action completion and then Oozie proceeds to the next actoin in workflow.


From Stackoverflow: DAG (Direct Acyclic Graph)

Graph = structure consisting of nodes, that are connected to each other with edges.
Directed  = The connections between nodes (edges) have a direction: A –> B is not the same as B -> A.
Acyclic = “non-circular” = moving from node to node by following the edges, you will never encounter the same node for the second time.

A good example of a directed acyclic graph is a tree. Note, however, not all directed acyclic graphs are trees 🙂