Accidental Complexity in OpenSSL HMAC functions
- by Hassan Syed
SSL Documentation Analysis
This question pertains to the usage of the HMAC routines in OpenSSL.
The OpenSSL documentation is a tad on the weak side in certain areas, so I am asking here. Profiling has revealed that using the one-shot function
unsigned char *HMAC(const EVP_MD *evp_md, const void *key,
int key_len, const unsigned char *d, int n,
unsigned char *md, unsigned int *md_len);
from the man page costs 40% of my library's runtime, which is spent creating and tearing down HMAC_CTXs behind the scenes.
There are also two additional functions to create and destroy an HMAC_CTX explicitly:
HMAC_CTX_init() initialises a HMAC_CTX
before first use. It must be called.
HMAC_CTX_cleanup() erases the key and
other data from the HMAC_CTX and
releases any associated resources. It
must be called when an HMAC_CTX is no
longer required.
In the man page, the descriptions of these two functions are preceded by:
The following functions may be used if
the message is not completely stored
in memory
My data fits entirely in memory, so I choose the HMAC function -- the one whose signature is shown above.
The context, as described by the man page, is used through the following two functions:
HMAC_Update() can be called repeatedly
with chunks of the message to be
authenticated (len bytes at data).
HMAC_Final() places the message
authentication code in md, which must
have space for the hash function
output.
The Scope of the Application
My application generates an authenticated (via an HMAC, which is also used as a nonce), CBC-BF encrypted protocol buffer string. The code will be interfaced with various web servers and frameworks: Windows / Linux as OSes; nginx, Apache and IIS as web servers; and Python / .NET and C++ web-server filters.
The description above should clarify that the library needs to be thread safe, and potentially have resumable processing state -- i.e., lightweight threads sharing an OS thread (which might leave thread-local memory out of the picture).
The Question
How do I get rid of the 40% overhead on each invocation in a (1) thread-safe and (2) resumable way? (2) is optional, since I have all of the source data present in one go and can make sure a digest is created in place without relinquishing control of the thread mid-digest. So,
(1) can probably be done using thread-local memory, but how do I reuse the CTXs? Does the HMAC_Final() call make the CTX reusable?
(2) optional: in this case I would have to create a pool of CTX's.
(3) How does the HMAC function do this? Does it create a CTX in the scope of the function call and destroy it afterwards?
Pseudocode and commentary would be useful.
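For (2), the pool I am picturing would look something like the sketch below. It deliberately contains no OpenSSL calls: the slots are plain void pointers that would hold pre-keyed HMAC_CTX pointers in the real library. The type ctx_pool, the capacity POOL_SIZE and the function names are all hypothetical, and a C11 compiler with <stdatomic.h> is assumed.

```c
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE 8 /* hypothetical capacity */

typedef struct {
    void *slots[POOL_SIZE];        /* would hold pre-keyed HMAC_CTX pointers */
    atomic_flag in_use[POOL_SIZE]; /* set while a slot is checked out */
} ctx_pool;

/* Mark every slot as free and empty. The caller fills in `slots`
 * afterwards, before the pool is shared between threads. */
void ctx_pool_init(ctx_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i] = NULL;
        atomic_flag_clear(&p->in_use[i]);
    }
}

/* Claim the first free slot without blocking.
 * Returns its index, or -1 if every slot is currently checked out. */
int ctx_pool_acquire(ctx_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        /* test_and_set returns the previous value: false means it was free. */
        if (!atomic_flag_test_and_set(&p->in_use[i]))
            return i;
    }
    return -1;
}

/* Hand a slot back so that another thread can claim it. */
void ctx_pool_release(ctx_pool *p, int i)
{
    if (i >= 0 && i < POOL_SIZE)
        atomic_flag_clear(&p->in_use[i]);
}
```

A caller would acquire an index, use the context stored in slots[index] for one complete digest, and then release the index. When acquire returns -1, falling back to the one-shot HMAC() call seems like a reasonable choice, though that is a design guess on my part.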