willmcgugan

Filesystem Abstraction for Python

Work with files and directories in archives, memory, the cloud etc. as easily as your local drive.

Write code now, decide later where the data will be stored; unit test without writing real files; upload files to the cloud without learning a new API; sandbox your file writing code; etc.

Create a folder, move a file

The first example from Trey’s post creates a folder, then moves a file into it. Here it is:

from pathlib import Path

Path('src/__pypackages__').mkdir(parents=True, exist_ok=True)
Path('.editorconfig').rename('src/.editorconfig')

The code above is straightforward and hides the gory platform details, which is a major benefit of pathlib over os.path.
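To make the "platform details" point concrete, here is a small sketch using pathlib's pure path classes, which can be instantiated on any operating system. The path components are just the ones from the example above; nothing touches the disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same path components, joined under each platform's rules.
parts = ("src", "__pypackages__", ".editorconfig")

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # src/__pypackages__/.editorconfig
print(windows_path)  # src\__pypackages__\.editorconfig
```

With os.path you would be choosing the separator (or the right join function) yourself; a plain Path picks the flavour that matches the machine the code runs on.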

The PyFilesystem version also hides the platform details, and the code is remarkably similar:

from fs import open_fs

with open_fs('.') as cwd:
    cwd.makedirs('src/__pypackages__', recreate=True)
    cwd.move('.editorconfig', 'src/.editorconfig')

Create a directory if it doesn’t already exist, write a blank file

This next example from Trey’s post creates a directory, then creates an empty file if it doesn’t already exist:

from pathlib import Path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filepath."""
    path = Path(dir_path, '.editorconfig')
    if not path.exists():
        path.parent.mkdir(exist_ok=True, parents=True)
        path.touch()
    return path

          

This function is tricky to compare, as it does things you might not consider doing in a project with PyFilesystem. If I were to translate it literally, it would be something like the following:

from fs import open_fs


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filename."""
    with open_fs(dir_path, create=True) as dir_fs:
        dir_fs.touch(".editorconfig")
        # Resolve the system path while the filesystem is still open.
        return dir_fs.getsyspath(".editorconfig")

          

The reason you wouldn’t write this code with PyFilesystem is that you rarely need to pass around paths.

You typically pass around FS objects, each of which represents a subdirectory.

It’s perhaps not the best example to demonstrate this, but the PyFilesystem code would likely be more like the following:

from fs import open_fs


def make_editorconfig(directory_fs):
    """Create an empty .editorconfig file in the given filesystem."""
    directory_fs.create(".editorconfig")


with open_fs("foo", create=True) as directory_fs:
    make_editorconfig(directory_fs)

Rather than a str or a Path object, the function accepts an FS object.

An advantage of this is that file and directory operations are sandboxed under that directory, unlike the pathlib version, which has access to the entire filesystem.

For a trivial example, this won’t matter. But if you have more complex code, it can prevent you from unintentionally deleting or overwriting files if there is a bug.
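To illustrate the kind of bug I mean, here is a minimal sketch of the pathlib side only. The helper write_note and the file names are hypothetical, made up for this illustration; the point is simply that a relative name containing ".." is happily joined onto the directory and ends up outside it.

```python
import tempfile
from pathlib import Path


def write_note(directory, name):
    """Write a small file called `name` under `directory` (hypothetical helper)."""
    target = Path(directory, name)
    target.write_text("hello")
    return target.resolve()


with tempfile.TemporaryDirectory() as tmp:
    sandbox = Path(tmp, "sandbox")
    sandbox.mkdir()

    # A buggy name containing ".." lands outside `sandbox`.
    written = write_note(sandbox, "../escaped.txt")
    escaped = sandbox.resolve() not in written.parents
    print(escaped)  # True: the file was created next to sandbox, not inside it
```

Nothing in the function signature stops this from happening. Passing an FS object opened on the directory is meant to rule out this class of mistake, since paths are interpreted relative to that filesystem's root.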

Counting files by extension

Next up, we have a short script which counts the Python files in a subdirectory using pathlib:

from pathlib import Path


extension = '.py'
count = 0
for filename in Path.cwd().rglob(f'*{extension}'):
    count += 1
print(f"{count} Python files found")

          

Nice and simple. PyFilesystem has glob functionality (although no rglob yet). The code looks quite similar:

from fs import open_fs

extension = '.py'

with open_fs('.') as fs:
    count = fs.glob(f"**/*{extension}").count().files
print(f"{count} Python files found")

There’s no for loop in the code above because file counting is built in, but otherwise it is much the same.
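On the pathlib side, rglob is documented as glob with "**/" added in front of the pattern, which is why a "**/*.py" style pattern plays the same role as rglob('*.py'). The following sketch checks that equivalence on a throwaway directory tree; the file names are made up for the illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    for name in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
        (root / name).touch()

    # Collect matches as sorted, root-relative POSIX-style strings.
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
    prefixed = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))

print(recursive == prefixed)  # True
print(len(recursive))         # 3
```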

I think Trey was using this example to compare performance. I haven’t actually compared performance of PyFilesystem’s globbing versus os.path or pathlib. That could be the subject for another post.

Write a file to the terminal if it exists

The next example is a simple one for both pathlib and PyFilesystem. Here’s the pathlib version:

from pathlib import Path
import sys


directory = Path(sys.argv[1])
ignore_path = directory / '.gitignore'
if ignore_path.is_file():
    print(ignore_path.read_text(), end='')

          

And here’s the PyFilesystem equivalent:

import sys
from fs import open_fs

with open_fs(sys.argv[1]) as fs:
    if fs.isfile(".gitignore"):
        print(fs.readtext('.gitignore'), end='')

Note that there’s no equivalent of directory / ‘.gitignore’.

You don’t need to join paths in PyFilesystem as often, but when you do, you don’t need to worry about platform details.

All paths in PyFilesystem are a sort of idealized path with a common format.

Finding duplicates

Trey offered a fully working script to find duplicates in a subdirectory with and without pathlib.

Coincidentally I’d recently added a similar example to PyFilesystem.

Here is Trey’s pathlib version:

from collections import defaultdict
from hashlib import md5
from pathlib import Path


def find_files(filepath):
    for path in Path(filepath).rglob('*'):
        if path.is_file():
            yield path


file_hashes = defaultdict(list)
for path in find_files(Path.cwd()):
    file_hash = md5(path.read_bytes()).hexdigest()
    file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

          

And here we have equivalent functionality with PyFilesystem:

from collections import defaultdict
from hashlib import md5
from fs import open_fs

file_hashes = defaultdict(list)
with open_fs('.') as fs:
    for path in fs.walk.files():
        file_hash = md5(fs.readbytes(path)).hexdigest()
        file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

          

The PyFilesystem version compares quite favourably here (in terms of lines of code at least), mostly because a method for iterating over file paths is already built in.

Conclusion

First off, I would like to emphasise that I’m not suggesting you never use pathlib. It is better than the alternatives in the standard library.

Pathlib also has the advantage that it is actually in the standard library, whereas PyFilesystem is a pip install fs away.

I would say that PyFilesystem results in cleaner code for the most part, which could just be down to the fact that I’ve been working with PyFilesystem for a lot longer and it ‘fits my brain’ better.

I’ll let you be the judge.

Also note that as the primary author of PyFilesystem, there is obviously a bucket-load of bias here.

There is one area where I think PyFilesystem is a clear winner. The PyFilesystem code above would work virtually unaltered with files in an archive, in memory, on an FTP server, on S3, or on any of the other supported filesystems.

I’d like to apologise to Trey Hunner if I misrepresented anything he said in his post!