Python pyfilesystem2 (fs) by https://fediverse.org/willmcgugan

Filesystem Abstraction for Python

Work with files and directories in archives, in memory, in the cloud, etc. as easily as on your local drive.

Write code now, decide later where the data will be stored; unit test without writing real files; upload files to the cloud without learning a new API; sandbox your file writing code; etc.

Create a folder, move a file

The first example from Trey’s post creates a folder, then moves a file into it. Here it is:

from pathlib import Path

Path('src/__pypackages__').mkdir(parents=True, exist_ok=True)
Path('.editorconfig').rename('src/.editorconfig')

The code above is straightforward and hides the gory platform details, which is a major benefit of pathlib over os.path.

The PyFilesystem version also does this, and the code is remarkably similar:

from fs import open_fs

with open_fs('.') as cwd:
    cwd.makedirs('src/__pypackages__', recreate=True)
    cwd.move('.editorconfig', 'src/.editorconfig')

Create a directory if it doesn’t already exist, write a blank file

This next example from Trey’s post creates a directory, then creates an empty file if it doesn’t already exist:

from pathlib import Path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filepath."""
    path = Path(dir_path, '.editorconfig')
    if not path.exists():
        path.parent.mkdir(exist_ok=True, parents=True)
        path.touch()
    return path

This function is tricky to compare, as it does things you might not consider doing in a project with PyFilesystem, but if I were to translate it literally, it would be something like the following:

from fs import open_fs


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return its system path."""
    with open_fs(dir_path, create=True) as fs:
        fs.touch(".editorconfig")
        # Look up the OS path while the filesystem is still open
        return fs.getsyspath(".editorconfig")

The reason you wouldn’t write this code with PyFilesystem is that you rarely need to pass around paths.

Instead, you typically pass around FS objects that represent a subdirectory.

It’s perhaps not the best example to demonstrate this, but the PyFilesystem code would more likely look like the following:

from fs import open_fs

def make_editorconfig(directory_fs):
    """Create an empty .editorconfig file if one doesn't already exist."""
    directory_fs.create(".editorconfig")

with open_fs("foo", create=True) as directory_fs:
    make_editorconfig(directory_fs)

Rather than a str or a Path object, the function accepts an FS object.

An advantage of this is that file and directory operations are sandboxed under that directory, unlike the pathlib version, which has access to the entire filesystem.

For a trivial example, this won’t matter. But if you have more complex code, it can prevent you from unintentionally deleting or overwriting files if there is a bug.

Counting files by extension

Next up, we have a short script which uses pathlib to count the Python files in the current directory and its subdirectories:

from pathlib import Path


extension = '.py'
count = 0
for filename in Path.cwd().rglob(f'*{extension}'):
    count += 1
print(f"{count} Python files found")

Nice and simple. PyFilesystem has glob functionality (although no rglob yet). The code looks quite similar:

from fs import open_fs

extension = '.py'

with open_fs('.') as fs:
    count = fs.glob(f"**/*{extension}").count().files
print(f"{count} Python files found")

There’s no for loop in the code above because there is built-in file-counting functionality, but otherwise it is much the same.

I think Trey was using this example to compare performance. I haven’t actually compared the performance of PyFilesystem’s globbing against os.path or pathlib; that could be the subject of another post.

Write a file to the terminal if it exists

The next example is a simple one for both pathlib and PyFilesystem. Here’s the pathlib version:

from pathlib import Path
import sys


directory = Path(sys.argv[1])
ignore_path = directory / '.gitignore'
if ignore_path.is_file():
    print(ignore_path.read_text(), end='')

And here’s the PyFilesystem equivalent:

import sys
from fs import open_fs


with open_fs(sys.argv[1]) as fs:
    if fs.isfile(".gitignore"):
        print(fs.readtext('.gitignore'), end='')

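Note that the PyFilesystem version has no equivalent of directory / '.gitignore': paths are relative to the FS object, so '.gitignore' on its own is enough.

You don’t need to join paths in PyFilesystem as often, but when you do, you don’t need to worry about platform details.

All paths in PyFilesystem are a sort of idealized path with a common format, using forward slashes as separators whatever the platform.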

Finding duplicates

Trey offered a fully working script to find duplicates in a subdirectory with and without pathlib.

Coincidentally I’d recently added a similar example to PyFilesystem.

Here is Trey’s pathlib version:

from collections import defaultdict
from hashlib import md5
from pathlib import Path


def find_files(filepath):
    for path in Path(filepath).rglob('*'):
        if path.is_file():
            yield path


file_hashes = defaultdict(list)
for path in find_files(Path.cwd()):
    file_hash = md5(path.read_bytes()).hexdigest()
    file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

And here we have equivalent functionality with PyFilesystem:

from collections import defaultdict
from hashlib import md5
from fs import open_fs

file_hashes = defaultdict(list)
with open_fs('.') as fs:
    for path in fs.walk.files():
        file_hash = md5(fs.readbytes(path)).hexdigest()
        file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

The PyFilesystem version compares quite favourably here (in terms of lines of code, at least), mostly because there was already a built-in method for iterating over file paths: fs.walk.files().

Conclusion

First off, I would like to emphasise that I’m not suggesting you never use pathlib. It is better than the alternatives in the standard library.

Pathlib also has the advantage that it is actually in the standard library, whereas PyFilesystem is a pip install fs away.

I would say that PyFilesystem results in cleaner code for the most part, though that could just be down to the fact that I’ve been working with PyFilesystem for a lot longer and it ‘fits my brain’ better.

I’ll let you be the judge.

Also note that, as the primary author of PyFilesystem, I obviously bring a bucket-load of bias here.

There is one area where I think PyFilesystem is a clear winner. The PyFilesystem code above would work virtually unaltered with files in an archive, in memory, on an FTP server, on S3, etc., or with any of the other supported filesystems.

I’d like to apologise to Trey Hunner if I misrepresented anything he said in his post!