Are PyArg_ParseTuple() "s" format specifiers useful in the Python 3.x C API?
Posted by Craig McQueen on Stack Overflow
Published on 2010-03-11T04:40:39Z
Tags: python, python-3.x
I'm trying to write a Python C extension that processes byte strings, and I have something basically working for Python 2.x and Python 3.x.
For the Python 2.x code, near the start of my function, I currently have a line:
    if (!PyArg_ParseTuple(args, "s#:in_bytes", &src_ptr, &src_len))
        ...
I notice that the "s#" format specifier accepts both Unicode strings and byte strings. I really just want it to accept byte strings and reject Unicode. For Python 2.x, this might be "good enough": the standard hashlib module seems to do the same, accepting Unicode as well as byte strings. However, Python 3.x is meant to clean up the Unicode/byte string mess and not let the two be interchangeable.
So, I'm surprised to find that in Python 3.x, the "s" format specifiers for PyArg_ParseTuple() still seem to accept Unicode and provide a "default encoded string version" of the Unicode. This seems to go against the principles of Python 3.x, and it makes the "s" format specifiers unusable in practice for functions that should accept only byte strings. Is my analysis correct, or am I missing something?
Looking at the Python 3.x implementation of hashlib (for example, the function MD5_update() in md5module.c and its use of the GET_BUFFER_VIEW_OR_ERROUT() macro), I see that it avoids the "s" format specifiers. Instead, it takes a generic object (the "O" specifier) and then performs explicit type checks using the GET_BUFFER_VIEW_OR_ERROUT() macro. Is this what we have to do?