Python pyfilesystem2 (fs) by https://fediverse.org/willmcgugan ¶
Filesystem Abstraction for Python ¶
Work with files and directories in archives, memory, the cloud etc. as easily as your local drive.
Write code now, decide later where the data will be stored; unit test without writing real files; upload files to the cloud without learning a new API; sandbox your file writing code; etc.
Create a folder, move a file ¶
The first example from Trey’s post, creates a folder then moves a file into it. Here it is
from pathlib import Path
Path('src/__pypackages__').mkdir(parents=True, exist_ok=True)
Path('.editorconfig').rename('src/.editorconfig')
The code above is straightforward, and hides the gory platform details which is a major benefit of pathlib over os.path.
The PyFilesystem version also does this, and the code is remarkably similar
from fs import open_fs
with open_fs('.') as cwd:
cwd.makedirs('src/__pypackages', recreate=True)
cwd.move('.editorconfig', 'src/.editorconfig')
Create a directory if it doesn’t already exist, write a blank file ¶
This next example from Trey’s post, creates a directory then creates an empty file if it doesn’t already exist
from pathlib import Path
def make_editorconfig(dir_path):
"""Create .editorconfig file in given directory and return filepath."""
path = Path(dir_path, '.editorconfig')
if not path.exists():
path.parent.mkdir(exist_ok=True, parent=True)
path.touch()
return path
This function is tricky to compare, as it does things you might not consider doing in a project with PyFilesystem, but if I was to translate it literally, it would be something like the following:
def make_editorconfig(dir_path):
"""Create .editorconfig file in given directory and return filename."""
with open_fs(dir_path, create=True) as fs:
fs.touch(".editorconfig")
return fs.getsyspath(".editorconfig")
The reason that you wouldn’t write this code with PyFilesystem, is that you rarely need to pass around paths .
You typically pass around FS objects which represent a subdirectory .
It’s perhaps not the best example to demonstrate this, but the PyFilesystem code would likely be more like the following
def make_editorconfig(directory_fs):
directory_fs.create(".editorconfig")
with open_fs("foo", create=True) as directory_fs:
make_editorconfig(directory_fs)
Rather than a str or a Path object, the function excepts an FS object .
An advantage of this is that file / directory operations are sandboxed under that directory unlike the Pathlib version, which has access to the entire filesystem.
For a trivial example, this won’t matter. But if you have more complex code, it can prevent you from unintentionally deleting or overwriting files if there is a bug.
Counting files by extension ¶
Next up, we have a short script which counts the Python files in a subdirectory using pathlib:
from pathlib import Path
extension = '.py'
count = 0
for filename in Path.cwd().rglob(f'*{extension}'):
count += 1
print(f"{count} Python files found")
Nice and simple. PyFilesystem has glob functionality (although no rglob yet). The code looks quite similar:
from fs import open_fs
extension = '.py'
with open_fs('.') as fs:
count = fs.glob(f"**/*{extension}").count().files
print(f"{count} Python files found")
There’s no for loop in the code above, because there is built in file counting functionality , but otherwise it is much the same.
I think Trey was using this example to compare performance. I haven’t actually compared performance of PyFilesystem’s globbing versus os.path or pathlib. That could be the subject for another post.
Write a file to the terminal if it exists ¶
The next example is a simple one for both pathlib and PyFilesystem. Here’s the pathlib version:
from pathlib import Path
import sys
directory = Path(sys.argv[1])
ignore_path = directory / '.gitignore'
if ignore_path.is_file():
print(ignore_path.read_text(), end='')
And here’s the PyFIlesystem equivalent:
import sys
from fs import open_fs
with open_fs(sys.argv[1]) as fs:
if fs.isfile(".gitignore"):
print(fs.readtext('.gitignore'), end='')
Note that there’s no equivalent of directory / ‘.gitignore’.
You don’t need to join paths in PyFilesystem as often, but when you do, you don’t need to worry about platform details.
All paths in PyFilesystem are a sort of idealized path with a common format .
Finding duplicates ¶
Trey offered a fully working script to find duplicates in a subdirectory with and without pathlib.
Coincidentally I’d recently added a similar example to PyFilesystem.
Here is Trey’s pathlib version:
from collections import defaultdict
from hashlib import md5
from pathlib import Path
def find_files(filepath):
for path in Path(filepath).rglob('*'):
if path.is_file():
yield path
file_hashes = defaultdict(list)
for path in find_files(Path.cwd()):
file_hash = md5(path.read_bytes()).hexdigest()
file_hashes[file_hash].append(path)
for paths in file_hashes.values():
if len(paths) > 1:
print("Duplicate files found:")
print(*paths, sep='\n')
And here we have equivalent functionality with PyFilesystem:
from collections import defaultdict
from hashlib import md5
from fs import open_fs
file_hashes = defaultdict(list)
with open_fs('.') as fs:
for path in fs.walk.files():
file_hash = md5(fs.readbytes(path)).hexdigest()
file_hashes[file_hash].append(path)
for paths in file_hashes.values():
if len(paths) > 1:
print("Duplicate files found:")
print(*paths, sep='\n')
The PyFilesystem version compares quite favourable here (in terms of lines of code at least). Mostly because there was already an iterator of paths method built in.
Conclusion ¶
First off, I would like to emphasise that I’m not suggesting you never use pathlib. It is better than the alternatives in the standard library .
Pathlib also has the advantage that it is actually in the standard library, whereas PyFilesystem is a pip install fs away.
I would say that I think PyFilesystem results in cleaner code for the most part , which could just be down to the fact that I’ve been working with PyFilesystem for a lot longer and it ‘fits my brain’ better.
I’ll let you be the judge.
Also note that as the primary author of PyFilesystem, there is obviously a bucket-load of bias here.
There is one area where I think PyFilesystem is a clear winner. The PyFilesystem code above would work virtually unaltered with files in an archive, in memory, on a ftp server, S3 etc. or any of the supported filesystems.
I’d like to apologise to Trey Hunner if I misrepresented anything he said in his post!