Files ¶
8 file processing tips in Python ¶
-
https://www.pythonmorsels.com/creating-and-writing-file-python/
-
https://www.pythonmorsels.com/reading-binary-files-in-python/
-
https://www.pythonmorsels.com/unicode-character-encodings-in-python
This week I’d like to share a handful of quick tips, all related to processing files in Python.
-
When writing files <https://www.pythonmorsels.com/creating-and-writing-file-python/> (and ideally when reading them too), use a with block to automatically close your file when you’re done working with it.
-
When working with very large text files, process the file line-by-line by looping over it (thanks to the way file buffering works, this keeps only the current line plus a small read buffer, typically about 8KB, in memory at a time).
-
You can process large binary files chunk-by-chunk to avoid reading them into memory all at once.
-
If your text files might not be in UTF-8, be sure to specify the encoding your files use when opening them.
-
When working with untrusted files that might have extremely long lines, instead of looping line-by-line, call the readline method with a maximum size. Ignore this advice if you know the untrusted file is small (due to file upload limits, for example).
-
When manipulating file paths, use pathlib.Path objects. In fact, I tend to prefer pathlib pretty much anytime I work with files in Python.
-
If you ever need to read from a file twice, you may want to use the seek method.
-
If you need to ensure you don’t overwrite a file or you want to append to the end of a file, take a look at Python’s file modes.
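A few of the tips above can be sketched in one short script: chunk-by-chunk binary reads, the "x" and "a" file modes, seek for a second pass, and pathlib for paths. This is only a minimal sketch; the function names, the demo.txt file name, and the chunk sizes are assumptions made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_in_chunks(path, chunk_size=64 * 1024):
    """Hash a binary file without reading it into memory all at once."""
    digest = hashlib.sha256()
    with open(path, mode="rb") as binary_file:
        # read(chunk_size) returns b"" at end of file, which ends the loop.
        while chunk := binary_file.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def create_without_overwriting(path, text):
    """Write a new file; mode "x" raises FileExistsError if it already exists."""
    with open(path, mode="x", encoding="utf-8") as new_file:
        new_file.write(text)


def count_lines_then_read_first(path):
    """Read a file twice through one file object by seeking back to the start."""
    with open(path, encoding="utf-8") as text_file:
        line_count = sum(1 for _ in text_file)
        text_file.seek(0)  # rewind to the beginning for the second pass
        first_line = text_file.readline()
    return line_count, first_line


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.txt"  # hypothetical file name
    create_without_overwriting(demo_path, "first\nsecond\n")
    try:
        create_without_overwriting(demo_path, "oops")
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True

    # Mode "a" appends to the end of the file instead of truncating it.
    with open(demo_path, mode="a", encoding="utf-8") as log_file:
        log_file.write("third\n")

    line_info = count_lines_then_read_first(demo_path)
    digest = sha256_in_chunks(demo_path, chunk_size=4)
    expected_digest = hashlib.sha256(demo_path.read_bytes()).hexdigest()
```

The tiny chunk_size of 4 is only there to force several loop iterations on a small file; for real files a much larger chunk is more sensible.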
Call the readline method with a maximum size instead ¶
For untrusted data, we could do something like this:
max_len = 2**16
with open(filename) as my_file:
    # Request one character more than the limit so an over-long line can be detected.
    while line := my_file.readline(max_len + 1):
        if len(line) > max_len:
            raise ValueError("Line too long")
        print("Processing", line)
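If this check is needed in more than one place, the same readline pattern could be wrapped in a generator. The sketch below assumes a hypothetical helper named bounded_lines and uses an in-memory io.StringIO in place of a real untrusted file.

```python
import io


def bounded_lines(text_file, max_len=2**16):
    """Yield lines from text_file, raising ValueError for any line longer than max_len."""
    # Ask for one character more than the limit so an over-long line is detectable.
    while line := text_file.readline(max_len + 1):
        if len(line) > max_len:
            raise ValueError("Line too long")
        yield line


# Stand-in for an untrusted file: one short line, then one 20-character line.
sample = io.StringIO("short\n" + "x" * 20 + "\n")
accepted = []
try:
    for line in bounded_lines(sample, max_len=10):
        accepted.append(line)
    too_long = False
except ValueError:
    too_long = True
```

Note that the limit counts the trailing newline too, just as in the loop above.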