I've just been looking at adding an HMAC to PHP mcrypt encryption.

Is this simply a matter of running hash_hmac over the encrypted data with the encryption key and appending the result to the encrypted data? Then, on decryption, you split off the HMAC, run hash_hmac over the rest of the data with the same key, and check that the result matches the HMAC.

I'm confused because this SO question, "When authenticating ciphertexts, what should be HMACed?", says:

you have to include in the HMAC input everything that impacts the decryption process, i.e. not only the encryption result per se, but also the IV which was used for that encryption, and, if the overall protocol supports algorithm agility, you should also input the specification of the encryption algorithm (otherwise, an attacker could alter the header of your message to replace the tag which says "AES-256" with the tag which says "AES-128" and you would unknowingly decrypt with the wrong algorithm).

Is this so? If this is true, why isn't using hash_hmac on just the encrypted data enough?

1 Answer

Short Answer: Yes

Long Answer:

HMAC stands for hash-based message authentication code. You should HMAC anything that you want to authenticate, or in other words, anything that you want to protect against modification.

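Although the construction defined in RFC 2104 is more complicated, it may help to think of HMAC as a salted hash. This is only a mental model, not something to implement yourself; in real code use a proper HMAC function such as hash_hmac.

e.g. hmac(message, key) = hash(message + key)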

  1. You can only recreate the same hmac with an identical message and key.
  2. You can't recreate the same hmac if the key is identical but the message differs.
  3. You can't recreate the same hmac if the message is identical but the key differs.
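The three properties above can be checked directly with PHP's built-in `hash_hmac`. This is a minimal sketch; the key and message values are made up for illustration, and SHA-256 is an assumed choice of hash.

```php
<?php
// Hypothetical key and message, for illustration only.
$key = 'example-hmac-key';
$message = 'ciphertext bytes would go here';

$tag = hash_hmac('sha256', $message, $key);

// 1. Same message and same key reproduce the same tag.
$sameInputs = hash_equals($tag, hash_hmac('sha256', $message, $key));

// 2. Same key, different message: the tag no longer matches.
$changedMessage = hash_equals($tag, hash_hmac('sha256', $message . '!', $key));

// 3. Same message, different key: the tag no longer matches.
$changedKey = hash_equals($tag, hash_hmac('sha256', $message, 'another-key'));

var_dump($sameInputs, $changedMessage, $changedKey);
// bool(true), bool(false), bool(false)
```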

An attacker who doesn't have the HMAC key cannot modify any part of the HMAC message without invalidating the existing HMAC. What should go into the HMAC message, and what should serve as the HMAC key, depends on your data format and on how you use that data. But assuming you are using the HMAC to authenticate the decryption, you should always include in the HMAC message everything that the decryption depends on. The symmetric key is often used as the HMAC key, although using a separate key for the HMAC (for example, one derived from the same master secret) is the more commonly recommended practice.
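As a rough sketch of the append-and-verify flow, the following appends a raw SHA-256 tag to the data and checks it before the data is used. The helper names `append_hmac` and `verify_and_strip_hmac`, the `data || tag` layout, and the key value are assumptions made for this example, not part of any library.

```php
<?php
// Hypothetical helpers; the names and the "data || 32-byte tag" layout are assumptions.
const TAG_LENGTH = 32; // raw SHA-256 output size in bytes

function append_hmac(string $data, string $macKey): string
{
    // The fourth argument (true) requests raw binary output instead of hex.
    return $data . hash_hmac('sha256', $data, $macKey, true);
}

function verify_and_strip_hmac(string $blob, string $macKey): ?string
{
    $dataLength = strlen($blob) - TAG_LENGTH;
    if ($dataLength < 0) {
        return null; // too short to contain a tag
    }
    $data = substr($blob, 0, $dataLength);
    $tag = substr($blob, $dataLength);
    $expected = hash_hmac('sha256', $data, $macKey, true);
    // hash_equals compares in constant time, so timing does not reveal where the tags differ.
    return hash_equals($expected, $tag) ? $data : null;
}

$macKey = 'example-hmac-key'; // illustrative value only
$blob = append_hmac('pretend this is IV + ciphertext', $macKey);

$ok = verify_and_strip_hmac($blob, $macKey);       // the original data

$blob[0] = chr(ord($blob[0]) ^ 1);                 // flip one bit of the stored data
$tampered = verify_and_strip_hmac($blob, $macKey); // null: reject before decrypting
```

Returning `null` on failure lets the caller refuse to decrypt at all, rather than decrypting first and checking afterwards.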

In your quote, the poster says the IV and the algorithm should also be hashed. Consider a file/database format consisting of

ALGORITHM + IV + CIPHERTEXT + HMAC

If you only HMAC the ciphertext, an attacker can modify the algorithm or IV (corrupting the file) without affecting the validity of the HMAC. This is bad because you end up with a corrupted encrypted file that still carries a valid HMAC. Decryption proceeds as normal because your software thinks everything is OK. The result is a wrong decryption (garbled or, in CBC mode with a changed IV, altered only in the first plaintext block), and your software is broken because it returned the wrong output without raising any error.

This can be classed as a 'security risk' if your application goes on to act on that erroneous data because it assumes it is correct. It is not a security risk in the sense of making the underlying encryption weaker or easier to crack. HMAC and symmetric encryption are two different technologies doing different things. The point of using an HMAC is that you can trust that the data returned by the decryption layer has not been tampered with.
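To make the difference concrete, here is a small sketch that tampers with the IV and compares a tag over the ciphertext only with a tag over ALGORITHM + IV + CIPHERTEXT. It assumes AES-256-CBC through PHP's OpenSSL extension rather than mcrypt (mcrypt was removed from PHP core in 7.2), and the keys, plaintext and field layout are illustrative assumptions.

```php
<?php
// Assumptions: OpenSSL extension, AES-256-CBC, separate random keys for encryption and HMAC.
$encKey = random_bytes(32);
$macKey = random_bytes(32);
$algorithm = 'aes-256-cbc';
$iv = random_bytes(16);
$ciphertext = openssl_encrypt('pay 100 to alice', $algorithm, $encKey, OPENSSL_RAW_DATA, $iv);

// Weak: the tag covers the ciphertext only.
$weakTag = hash_hmac('sha256', $ciphertext, $macKey, true);
// Better: the tag covers everything decryption depends on.
// (A real format should use fixed-length or length-prefixed fields so the
// concatenation is unambiguous.)
$fullTag = hash_hmac('sha256', $algorithm . $iv . $ciphertext, $macKey, true);

// The attacker flips one bit in the stored IV and leaves the ciphertext alone.
$tamperedIv = $iv;
$tamperedIv[4] = chr(ord($tamperedIv[4]) ^ 0x01);

// The ciphertext-only tag still verifies, because the ciphertext did not change.
$weakStillValid = hash_equals($weakTag, hash_hmac('sha256', $ciphertext, $macKey, true));
// The full tag fails, because the IV is part of the HMAC input.
$fullStillValid = hash_equals(
    $fullTag,
    hash_hmac('sha256', $algorithm . $tamperedIv . $ciphertext, $macKey, true)
);

// Decryption with the tampered IV raises no error; in CBC the flipped IV bit
// flips the same bit in the first plaintext block ('1' becomes '0').
$wrongPlaintext = openssl_decrypt($ciphertext, $algorithm, $encKey, OPENSSL_RAW_DATA, $tamperedIv);

var_dump($weakStillValid, $fullStillValid, $wrongPlaintext);
// bool(true), bool(false), string(16) "pay 000 to alice"
```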

In the above example, ALGORITHM is a dynamic piece of data which I used to explain "algorithm agility" in the OP's quote. It defines which encryption algorithm was used. The point is that it is dynamic, so it needs to be read from somewhere rather than hardcoded. That makes it a dependency of the decryption, so it should be included in the HMAC message. However, if you always use one static algorithm, then it should be assumed by (hardcoded into) your decryption code, and there is no need to store it at all. There is no need to include static data in the HMAC message, because an attacker cannot change it and so it has no effect on the decryption.

An example of a file format which uses a static algorithm is the open source AES-256 Crypt File Format. The algorithm never changes, so it is always assumed. It actually uses two HMACs for speed reasons: one to authenticate the IV and keys, and a second to authenticate the encrypted data part.

10 Comments

It can still be checked whether the decryption was successful if the encryption library checks all bytes of the padding. If there is even one error in the padding, it can throw an exception or something like that. An HMAC prevents you from doing the decryption at all if the data is corrupted.
Depending on the padding used, authentication by checking for valid padding is nearly always less reliable than using an HMAC. A bad decryption might still produce valid padding by accident.
“This is bad because you can end up with a corrupted encrypted file with a valid HMAC.” This doesn't pose any security risk though, does it?
How you use your encryption/decryption layer is what is important here. Generally, assumptions are bad in software. If your application code makes the assumption that the decryption was successful and acts on the erroneous data, it could be harmless or it could be very bad. Whether it is a "security risk" depends on your definition, but it certainly would be classed as a problem which exists at the encryption/decryption layer. But no, it is not a security risk which makes it easier to crack the underlying encryption if that's what you're asking.
No problem. I have tried to incorporate everything we discussed here. Thanks