Are PyArg_ParseTuple() "s" format specifiers useful in Python 3.x C API?

Posted by Craig McQueen on Stack Overflow
Published on 2010-03-11T04:40:39Z


I'm trying to write a Python C extension that processes byte strings, and I have something basically working for Python 2.x and Python 3.x.

For the Python 2.x code, near the start of my function, I currently have a line:

    if (!PyArg_ParseTuple(args, "s#:in_bytes", &src_ptr, &src_len))
    ...

I notice that the s# format specifier accepts both Unicode strings and byte strings. I really just want it to accept byte strings and reject Unicode. For Python 2.x, this might be "good enough"--the standard hashlib seems to do the same, accepting Unicode as well as byte strings. However, Python 3.x is meant to clean up the Unicode/byte string mess and not let the two be interchangeable.

So, I'm surprised to find that in Python 3.x, the s format specifiers for PyArg_ParseTuple() still seem to accept Unicode and provide a "default encoded string version" of the Unicode. This seems to go against the principles of Python 3.x, making the s format specifiers unusable in practice. Is my analysis correct, or am I missing something?

Looking at the implementation of hashlib for Python 3.x (e.g. see md5module.c, function MD5_update()), I see that it avoids the s format specifiers. Instead it takes a generic object (O specifier) and then does various explicit type checks using the GET_BUFFER_VIEW_OR_ERROUT() macro. Is this what we have to do?

© Stack Overflow or respective owner
