Thursday, September 13, 2007

Public Key Infrastructure (PKI) based security - Espresso!

Why (the hell) Am I Doing This?
I was not new to PKI, but I never cared to know the “what really happens inside” part of it. The best thing about PKI libraries is that they hide the rocket science from the programmer, so I was totally happy with such libraries. Ipso facto, I ended up taking the shortcut of either following the steps provided or asking one of those FAQs.
Ever since I started working on MIDLets, data/application security has become an irritating reality that I must deal with. Unlike other software development, MIDLet development requires application signing and other security measures at almost every step. After getting involved a little in the MIDLet development community, I had the privilege of answering some of those FAQs. Then I thought maybe it would be helpful if I put together a blog post explaining some of the practical aspects of PKI security.

Disclaimer
The scope of PKI is way beyond this article. This blog has been written for readers with some security background. The main intention here is to present the big picture and highlight some key points rather than drilling down to specifics. As a result of my effort to keep it simple, you may find it a little “wordy” if you are already familiar with PKI concepts. If you have suggestions or comments, please post them here or send them my way.


The "Threat"

Everyone who transmits data is exposed to some kind of data leak threat. It can be as simple as someone listening to your secret conversation or as complex as cell phone frequency tapping. The Internet, being the most widely used data medium, has become the most attack-prone zone.

The “History”
Way back in time, the only possible way to secure data-in-transit was to use a dedicated line. Leased lines are isolated, closed circuits and are totally secure. This is very similar to connecting two machines back-to-back, more like media-level security. This method works pretty well but, as you can imagine, it is very expensive and practically impossible to scale. The IP world, where all packets share the same media, demands a different kind of security.

The “Requirements”
Any system that can accomplish the following qualifies as an Internet security solution:

1) Must be able to make sure that the data comes from, and goes to, the right party
2) Only the intended recipient must be able to read the data
3) Must be able to prevent (or at least detect) data alteration in transit

The “New security”
PKI was introduced as an inexpensive substitute for the leased-line approach to security. PKI offers content-level security: it is independent of the media and of the data itself. It brings in new concepts like “digital trust” and “authentication” (explained later). Among the hardest attacks to stop are packet sniffing and man-in-the-middle attacks. In both cases the attacker monitors or intercepts network packets and examines their contents. This is very similar to someone standing outside your house with a powerful microphone, listening to all your private conversations. As you can imagine, the easiest way to protect yourself from this attack is to speak in a language only you and your listener can understand. In other words, encrypt your conversation. “Encryption” is the backbone of PKI security. In the digital world, data is assumed safe while it is in a process’s memory. Digital security starts when the data leaves memory for a medium such as a network or storage. PKI uses strong encryption mechanisms so that it is practically impossible for the attacker to decrypt the data-in-transit.

Summary so far
PKI protects data-in-transit by encrypting it.




The “Challenge”
Encryption can range from something as simple as adding a constant number to every byte of the data to highly complex mathematical operations. No matter what method you use, there are 3 major components in encryption.
1) The encryption algorithm
2) The encryption key
3) The data itself
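
To make the three components concrete, here is a minimal Python sketch of the "add a constant to every byte" scheme mentioned above. The function names are my own, chosen for illustration, and the scheme is trivially breakable; it only shows where the algorithm, the key and the data fit.

```python
def shift_encrypt(data: bytes, key: int) -> bytes:
    """Add the key to every byte, wrapping around at 256."""
    return bytes((b + key) % 256 for b in data)


def shift_decrypt(data: bytes, key: int) -> bytes:
    """Subtract the same key to undo the encryption."""
    return bytes((b - key) % 256 for b in data)


key = 7                          # 2) the encryption key
message = b"meet me at noon"     # 3) the data itself

# 1) the encryption algorithm is shift_encrypt / shift_decrypt
ciphertext = shift_encrypt(message, key)
recovered = shift_decrypt(ciphertext, key)
```

With only 256 possible keys, an attacker can simply try them all, which is why both the algorithm and the key size matter.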

Generally, the stronger the encryption algorithm and the bigger the encryption key, the better the security. The downside is that both demand more processing power. So the challenge is to use the right algorithm with an optimal key size. Decades ago mathematicians (human beings) developed several strong encryption algorithms - the good news is we still use them!!

"The best encryption is the one which can never be decrypted"

-me



The “Encrypted communication”
The following is the essence of PKI based communication. When A wants to send data to B,
1) A encrypts the data with a secret key
2) A “somehow” shares the secret key with B
3) B uses the shared secret key to decrypt the data


The “somehow” is the next challenge. How do you securely share the secret key? If we can share the key securely, why don’t we use the same mechanism to share the data itself? Well, it doesn’t work that way! One way to do it is to use another medium (say, the telephone), but that breaks the continuity of the data flow. We need a sharing method which works more “smoothly”. PKI has a solution for this problem as well. Before proceeding further I should introduce an important piece of PKI terminology.

The “Key Pairs”

Like I said, the PKI internals are real mathematical rocket science. Fortunately, understanding them is not a requirement here. Now pay attention. The simplest way to look at it is this. There are two numbers, two large... too large to memorize. Those numbers are mathematically related in a single unique way. It is relatively easy to generate such a pair, but practically impossible to find the partner number when you have only one of them, because the computation is extremely heavy. In the PKI world, these numbers are known as a key pair. One of them is called the private key and the other the public key. For this discussion it doesn't really matter which is which, as long as one is kept secret and the other is published.

Sharing the Secret
Utilizing this unique mathematical relation between the keys in the key pair, mathematicians developed special encryption algorithms which can do one-way encryption. Meaning: if you encrypt the data using one of these keys, you absolutely require the other key of the pair to decrypt it.
The core of the "secret" in PKI is the isolation of the encryption key from the decryption key.

Typically, the private key is stored in a secret location and the public key is published to the world. Anyone who wishes to send PKI encrypted data to a target uses the target’s public key to encrypt the data. Now the data can be decrypted only with the private key of the target. (Even the encrypting party cannot decrypt it.) Given that it is practically impossible to calculate the paired key, the data is assumed to be secure. This type of "one way" encryption is known as "asymmetric encryption". The most popular algorithm used for this is RSA. (DSA, often mentioned alongside it, is an asymmetric algorithm too, but it is used only for signing, not for encrypting.)
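
For the curious, the key-pair idea can be tried out with a toy RSA example in Python. This is only a sketch: the primes are tiny textbook values that I picked for illustration (real keys use primes hundreds of digits long), and real RSA also adds padding, which is left out here.

```python
# Toy key pair built from two tiny primes; illustrative values only.
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120, easy to compute only if you know p and q

e = 17                     # public exponent, published to the world
d = pow(e, -1, phi)        # private exponent, kept secret (needs Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(value: int, key: tuple) -> int:
    """Raise the value to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                                      # data must be a number below n
ciphertext = apply_key(message, public_key)       # anyone can do this
recovered = apply_key(ciphertext, private_key)    # only the key owner can do this
```

Finding d from (e, n) requires knowing p and q, that is, factoring n. With n = 3233 that takes a moment; with a real key size it is the "computationally extremely heavy" part.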

If you read up on PKI basics on the Internet you will definitely come across two people, "Alice" and "Bob" (A and B). They've been tortured almost every day, ever since people started to learn PKI concepts :)

Summary so far
PKI protects data by encrypting it. PKI protects the encryption keys by using asymmetric encryption algorithms where the encryption key itself doesn’t have to be shared at all.


The "Trust"
So far I've been trying to introduce how the data is protected by encrypting it and how the encrypted data is shared. Now I should move to the next concept - the trust. The idea is simple: when you have a secret, you will share it only with someone whom you really trust. When it comes to PKI, it is still simple, but not very straightforward. The reason is that you are communicating with someone you have probably never known before. So you need a third party whom you always trust, who can introduce the second party to you. There are two kinds of such third parties.
1) An authorized party (like passport office)
2) A friend whom you trust in real life

Both methods are popular in the PKI world. The first method is known as the "X509 standard" and the second one is the "web of trust" (used by Pretty Good Privacy - PGP). In X509, there is a trusted root CA (Certificate Authority) (like Verisign, RSA, Thawte etc) whom everyone trusts. In PGP, you pick your own trusted parties.

Trusting the Trust
Back to the basics: everything has to be modeled mathematically to run on computers. How do you mathematically trust? PKI has the answer. Say you trust your friend (in real life). He gives you his public key on a disk or so and you save it. When you receive data that he encrypted with his private key, you can “verify” it by trying to decrypt the data with his public key. If the decryption is successful, it assures you that the data came from your friend and you can trust it. In the same way, your friend can distribute his public key to everyone, so that everyone can verify the legitimacy of his data using his public key. The reverse works as well. This is the basis of PKI’s mathematical trust.

The “Certificate”
To make it more appealing, public keys are distributed as “certificates”. Besides the subject’s public key, a certificate contains the following important fields.
1) The issuer’s name, which tells you whose public key verifies the signature
2) Hash (SHA/MD5) of the certificate contents, including the subject’s public key, encrypted with the issuer’s private key (the signature)
3) Dates of issue and expiry
4) Algorithms used for hashing and encryption (for example MD5/RSA)


Think of it as a set of key-value pairs. Please note that you can issue a certificate for yourself (self-signed) or have a second party (like a CA) issue one for you. There is a syntax in which certificate data must be formatted, and X509 and PGP have different syntaxes. The certificate syntax itself is typically written and available in ASN.1 format.
If you receive a certificate from someone, the first thing you would do is verify it. The verification process typically involves the following checks.

1) Whether the certificate is syntactically correct
2) Whether the date of issue and expiry are currently valid
3) Whether you trust the party who issued the certificate
4) Whether the signature is valid


1 and 2 are straightforward. But how do you know whether you trust the issuer? Well, you store all trusted issuers. In other words, you have a “certificate store”. If the issuer of the incoming certificate is in your trusted certificate store (that is, you already hold the issuer’s public key), you trust it.
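
Checks 2 and 3 can be sketched in a few lines of Python. The ToyCertificate class, its field names and the example issuer names are all made up for illustration; a real certificate has many more fields, and check 4 (the signature) is deliberately left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ToyCertificate:
    """Hypothetical, heavily simplified stand-in for a real certificate."""
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def is_currently_valid(cert: ToyCertificate, now: datetime) -> bool:
    """Check 2: 'now' must fall between the issue and expiry dates."""
    return cert.not_before <= now <= cert.not_after


def is_issuer_trusted(cert: ToyCertificate, certificate_store: set) -> bool:
    """Check 3: the issuer must already be in our local store."""
    return cert.issuer in certificate_store


certificate_store = {"Example Root CA"}   # issuers we decided to trust
cert = ToyCertificate(
    subject="www.example.com",
    issuer="Example Root CA",
    not_before=datetime(2007, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2008, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2007, 9, 13, tzinfo=timezone.utc)
accepted = is_currently_valid(cert, now) and is_issuer_trusted(cert, certificate_store)
```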

Side note: once you fill in all the required fields in the right ASN.1 syntax, you get a block of binary data that is not human readable. This is typically converted to a Base64 (PEM) format, which is printable text, although you still can’t make sense of the contents.
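
The Base64 (PEM) wrapping itself is simple enough to sketch with Python's standard base64 module. The helper names to_pem and from_pem are my own, and the input bytes below are a placeholder, not a real certificate.

```python
import base64


def to_pem(binary: bytes, label: str = "CERTIFICATE") -> str:
    """Base64-encode the binary block, 64 characters per line, between BEGIN/END markers."""
    b64 = base64.b64encode(binary).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return f"-----BEGIN {label}-----\n" + "\n".join(lines) + f"\n-----END {label}-----\n"


def from_pem(pem_text: str) -> bytes:
    """Drop the marker lines and decode the Base64 body back to binary."""
    body = [ln for ln in pem_text.strip().splitlines() if not ln.startswith("-----")]
    return base64.b64decode("".join(body))


placeholder = bytes(range(100))   # stand-in for the binary ASN.1 data
pem = to_pem(placeholder)
round_trip = from_pem(pem)
```

Printing pem gives the familiar block of letters and digits that can be pasted into an email or a text file.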

There are a bunch of standard formats (PKCSxx) for storing and transferring certificates and private keys. Sometimes these are confusing, but hey... “my way or the highway”! As a developer you sometimes need to deal with this, as some software accepts one format but not another. OpenSSL is one of the most popular tools for generating certificates and converting between formats. Take a look at http://www.rsa.com/rsalabs/node.asp?id=2124 for further PKCSxx details.

Digital signature and signature verification
You must have come across this term hundreds of times. A digital signature is the PKI way to ensure the integrity of the data. It allows the receiver of the data to check whether the contents were altered after they were signed. The process works as follows.
1) Sender calculates the hash value of the data
2) Sender encrypts the hash value with his private key
3) Sender sends the original data along with the encrypted hash to the receiver
4) Sender can also include his public key
5) Recipient authenticates (explained below) sender’s public key
6) Recipient uses sender’s public key to decrypt the hash
7) Recipient uses the same algorithm and calculates the data’s hash value

If the hash value from (7) matches the one decrypted in (6), the message is OK. If any of these steps fails, the data will not be trusted.
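
The sign and verify steps can be sketched in Python with hashlib and a toy RSA key. Assumptions to note: the key is built from two well-known Mersenne primes purely for convenience, the hash is reduced modulo n so it fits the small key, and no padding is used. Real signature schemes differ on all three points, and step 5 (authenticating the sender's public key) is skipped.

```python
import hashlib

# Toy RSA key pair; illustrative only, far too weak for real use.
p, q = 2**127 - 1, 2**89 - 1
n = p * q
e = 65537                            # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)


def digest(data: bytes) -> int:
    """Steps 1 and 7: hash the data (reduced mod n to fit the toy key)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes) -> int:
    """Step 2: 'encrypt' the hash with the private key."""
    return pow(digest(data), d, n)


def verify(data: bytes, signature: int) -> bool:
    """Steps 6 and 7: decrypt the hash with the public key and compare."""
    return pow(signature, e, n) == digest(data)


document = b"Pay Bob 10 dollars"
signature = sign(document)
```

Changing even one byte of the document changes its hash, so verify() on the altered data returns False.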

The “Chain of trust”
PKI is based on simple concepts, but very powerful features have been derived from them. The chain of trust is one of them. It is the simple “transitive property” of trust: if A trusts B and B trusts C, then A trusts C. In the PKI world, it works like this. The root CA generates a self-signed certificate for itself. Then the root CA issues a certificate to an intermediate CA, and the intermediate CA uses that certificate to issue more certificates, and this chain can go on. Note that you always trust the root CA, not the intermediate one. Also note that the intermediate CA does NOT have a self-signed certificate. In other words, within a chain ONLY the root CA has a self-signed certificate. When you receive a chain of certificates, you verify that the chain winds up all the way to a root CA that you trust. This way the root CA can have distributor chains and keep the trust unchanged.
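
Walking a chain up to a trusted root can be sketched as below. Each toy certificate is reduced to a "subject is issued by issuer" entry with made-up names, and the signature check on every link is omitted, so this shows only the shape of the walk.

```python
# Hypothetical chain: subject -> issuer. A self-signed certificate names itself.
issued_by = {
    "www.example.com": "Intermediate CA",
    "Intermediate CA": "Root CA",
    "Root CA": "Root CA",
}
trusted_roots = {"Root CA"}


def chain_is_trusted(subject: str, issued_by: dict, trusted_roots: set,
                     max_depth: int = 10) -> bool:
    """Follow issuer links until a self-signed certificate is reached."""
    current = subject
    for _ in range(max_depth):
        issuer = issued_by.get(current)
        if issuer is None:
            return False                      # broken chain: issuer unknown
        if issuer == current:
            return current in trusted_roots   # self-signed: must be a root we trust
        current = issuer
    return False                              # chain too long or circular


site_ok = chain_is_trusted("www.example.com", issued_by, trusted_roots)
```

Note that the intermediate CA never has to be in trusted_roots; it is trusted only because the chain ends at a root that is.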

The PKI overhead and the workaround
Asymmetric encryption is a complicated math procedure and it consumes lots of CPU cycles. As the key size increases it becomes even heavier. PKI keys (public/private) are typically 1024 bits or longer, so it is not efficient to use them for every encryption. The solution is to encrypt the data with a much shorter symmetric session key (typically 128 or 256 bits) and to exchange this key one time using the PKI keys.

Some Protocols Built Over PKI
Several different protocols have been built on top of PKI. I would like to give an overview of two of them, TLS and S/MIME. I picked those because they are two important but different ways of using PKI for different purposes. TLS provides connection-based security. It is used with HTTP (as HTTPS), SMTP, LDAP, etc. S/MIME is connectionless. It is used with MIME, which is a data format (not a network protocol).

Typical TLS protocol flow is like this:
1) A and B are connected over standard unencrypted network connection
2) A sends StartTLS request to B over existing connection
3) A and B exchange certificates (typically B sends first)
4) A and B verify the certificates against their local certificate stores
5) A and B exchange the session key and agree on the encryption algorithm
6) A and B use the session key for further communication

As you can see, after step 6 every bit of data that passes through the connection is encrypted. From there on it is called a “TLS connection”. Usually there is no going back from TLS to unencrypted mode.
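
As a rough illustration, here is how the client side (A) of such a flow looks with Python's standard ssl module. The helper names are my own. In this sketch the TLS handshake starts right after connecting rather than through a StartTLS request, and only the server's certificate is verified; steps 3 to 5 all happen inside wrap_socket().

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a context that checks certificates against the system's trusted store."""
    return ssl.create_default_context()


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Step 1: plain TCP connection. Steps 3-5: handshake inside wrap_socket()."""
    raw_sock = socket.create_connection((host, port), timeout=10)
    context = make_client_context()
    # After this call returns, everything sent on the socket is encrypted (step 6).
    return context.wrap_socket(raw_sock, server_hostname=host)


context = make_client_context()
```

Calling open_tls_connection() with a host name of your choice needs network access; if the server's certificate does not chain to a trusted root, the handshake fails with an ssl.SSLError instead of returning a connection.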

Unlike TLS, S/MIME is a connectionless, data-based protocol. This is useful when you are communicating with someone whom you already know. It works as follows.
1) A generates a short key for encryption
2) A encrypts the data with that key
3) A encrypts the key with B’s public key
4) A packs the results of steps 2 and 3 together to form an encrypted message
5) A signs the encrypted message with A’s private key to form a signed/encrypted message
6) A sends the message over unencrypted connection
7) B verifies the message signature against its key store
8) B uses its private key to decrypt the encryption key
9) B uses the decrypted key from step(8) to decrypt the message body
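
The nine steps above can be sketched end to end in Python. Everything here is a toy stand-in and not the real S/MIME format: the key pairs come from small well-known Mersenne primes, the "short key" cipher is a SHA-256 based XOR keystream that I made up for the example, and the function names are hypothetical.

```python
import hashlib
import secrets


def make_keypair(p: int, q: int, e: int = 65537):
    """Toy RSA key pair: returns (public_key, private_key)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)


def rsa(value: int, key) -> int:
    exponent, modulus = key
    return pow(value, exponent, modulus)


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream (same call decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def digest(data: bytes, modulus: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def send(plaintext: bytes, sender_private, recipient_public):
    session_key = secrets.token_bytes(16)                          # step 1
    body = keystream_xor(plaintext, session_key)                   # step 2
    wrapped = rsa(int.from_bytes(session_key, "big"), recipient_public)   # step 3
    envelope = wrapped.to_bytes(32, "big") + body                  # step 4
    signature = rsa(digest(envelope, sender_private[1]), sender_private)  # step 5
    return envelope, signature                                     # step 6


def receive(envelope: bytes, signature: int, sender_public, recipient_private) -> bytes:
    if rsa(signature, sender_public) != digest(envelope, sender_public[1]):  # step 7
        raise ValueError("signature check failed")
    wrapped = int.from_bytes(envelope[:32], "big")
    session_key = rsa(wrapped, recipient_private).to_bytes(16, "big")       # step 8
    return keystream_xor(envelope[32:], session_key)               # step 9


a_public, a_private = make_keypair(2**127 - 1, 2**89 - 1)
b_public, b_private = make_keypair(2**107 - 1, 2**61 - 1)

envelope, signature = send(b"Hello Bob, this is Alice", a_private, b_public)
received = receive(envelope, signature, a_public, b_private)
```

Note how the expensive public/private key operations touch only the 16-byte session key and the hash, never the message body itself.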


For further reading:

http://luca.ntop.org/Teaching/Appunti/asn1.html
http://www.x5.net/faqs/crypto/q166.html
http://people.cs.uchicago.edu/~cbarnard/pgptalk/index.html
http://technet.microsoft.com/en-us/library/aa995740.aspx