Python file iterator over a binary file with newer idiom.
- by drewk
In Python, for a binary file, I can write this:
buf_size = 1024 * 64  # this is an important size...
with open(file, "rb") as f:
    while True:
        data = f.read(buf_size)
        if not data:
            break
        # deal with the data....
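On Python 3.8 or later, the same loop can be written more compactly with an assignment expression (the "walrus" operator). This is a minimal sketch, not the only way: count_chunks is a hypothetical helper, and io.BytesIO stands in for a real file opened with "rb":

```python
import io

BUF_SIZE = 1024 * 64  # 64 KiB per read, as in the loop above


def count_chunks(f, buf_size=BUF_SIZE):
    """Read f in fixed-size chunks and return how many chunks were read."""
    n = 0
    # read() returns b"" at EOF, which is falsy and ends the loop (Python 3.8+).
    while data := f.read(buf_size):
        n += 1  # deal with the data here
    return n


# 200 KiB of zeros -> chunks of 64, 64, 64 and 8 KiB
sample = io.BytesIO(b"\x00" * (200 * 1024))
print(count_chunks(sample))  # prints 4
```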
With a text file that I want to read line-by-line, I can write this:
with open(file, "r") as f:
    for line in f:
        # deal with each line....
        pass
Which is roughly equivalent to:
with open(file, "r") as f:
    for line in iter(f.readline, ""):
        # deal with each line....
        pass
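A quick check that the two forms produce the same lines, using io.StringIO as a stand-in for a text file opened with "r" (an assumption made only so the snippet runs on its own):

```python
import io

TEXT = "first\nsecond\nthird"

# Plain iteration over a text file object yields lines, newline included.
lines_a = list(io.StringIO(TEXT))

# The two-argument iter() calls readline() until it returns the sentinel "".
lines_b = list(iter(io.StringIO(TEXT).readline, ""))

print(lines_a == lines_b)  # True
print(lines_a)  # ['first\n', 'second\n', 'third']
```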
This idiom is documented in PEP 234 but I have failed to locate a similar idiom for binary files.
I have tried this:
>>> i = 0
>>> with open('dups.txt', 'rb') as f:
...     for chunk in iter(f.read, b''):
...         i += 1
...
>>> i
1    # 30 MB file; i == 1 means it was read in one go...
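The count is 1 because read() with no size argument reads everything up to EOF in a single call; the next call returns the empty sentinel and the iterator stops. In Python 3 the sentinel for a binary file has to be b'', because a binary read never returns the str '', so with '' the iterator would keep yielding b'' forever. A small sketch reproducing the one-chunk result, with io.BytesIO standing in for dups.txt (an assumption for illustration):

```python
import io

f = io.BytesIO(b"x" * 100_000)

# f.read is passed uncalled; iter() calls it with no arguments each time.
# read() with no size reads to EOF, so the first call returns everything
# and the second call returns b"", which matches the sentinel.
chunks = list(iter(f.read, b""))

print(len(chunks))     # 1
print(len(chunks[0]))  # 100000
```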
I tried iter(f.read(buf_size), '') but that fails: f.read(buf_size) is evaluated immediately, so iter() receives the bytes it returned rather than a callable and raises a TypeError (it is not actually a syntax error).
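The two-argument form of iter() needs something it can call repeatedly with no arguments, so the buffer size has to be bound to read without calling it. functools.partial does exactly that (a lambda: f.read(buf_size) would also work). A sketch, again with io.BytesIO standing in for the real file:

```python
import functools
import io

buf_size = 1024 * 64
f = io.BytesIO(b"\x00" * (200 * 1024))

# partial(f.read, buf_size) is a zero-argument callable equivalent to
# calling f.read(buf_size); iter() calls it until it returns b"" at EOF.
sizes = [len(chunk) for chunk in iter(functools.partial(f.read, buf_size), b"")]

print(sizes)  # [65536, 65536, 65536, 8192]
```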
I know I could write a function, but is there a way, using the default for chunk in file: idiom, to read in chunks of a given buffer size rather than line by line?
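Iterating a file object directly always splits on newlines, even in binary mode (a binary file yields bytes lines ending in b'\n'), so for chunk in file: has no buffer-size option. Wrapping the partial idiom in a small generator gives a loop that reads almost like the line idiom. This is a sketch, not a standard API: read_chunks is a hypothetical name, and io.BytesIO stands in for a file opened with open(path, "rb"):

```python
import functools
import io
from typing import BinaryIO, Iterator


def read_chunks(f: BinaryIO, buf_size: int = 1024 * 64) -> Iterator[bytes]:
    """Yield successive chunks of at most buf_size bytes until EOF."""
    yield from iter(functools.partial(f.read, buf_size), b"")


data = b"line one\nline two\nline three\n"

# Iterating a binary file object directly still yields lines.
by_line = list(io.BytesIO(data))

# The generator yields fixed-size chunks instead.
by_chunk = list(read_chunks(io.BytesIO(data), buf_size=10))

print(by_line)   # [b'line one\n', b'line two\n', b'line three\n']
print(by_chunk)  # [b'line one\nl', b'ine two\nli', b'ne three\n']
```

Usage then mirrors the line idiom: for chunk in read_chunks(f): ...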
Thanks for putting up with the Python newbie trying to write his first non-trivial and idiomatic Python script.