Django 3.1 pathlib

Python 3.6 (2016-12-23, PEP-0519) Adding a file system path protocol

This PEP proposes a protocol for classes which represent a file system path to be able to provide a str or bytes representation.

Changes to Python’s standard library are also proposed to utilize this protocol where appropriate to facilitate the use of path objects where historically only str and/or bytes file system paths are accepted.

The goal is to facilitate the migration of users towards rich path objects while providing an easy way to work with code expecting str or bytes.
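
As a minimal sketch of the protocol, assuming a hypothetical ProjectDir class that is not part of any library: an object opts in by defining __fspath__, and os.fspath() (added in Python 3.6) returns the underlying str.

```python
import os
from pathlib import PurePosixPath


class ProjectDir:
    """Hypothetical class, used only to illustrate the protocol."""

    def __init__(self, location):
        self._location = location

    def __fspath__(self):
        # Return the str representation that os functions should use.
        return self._location


project = ProjectDir("projects/demo")

# os.fspath() calls __fspath__ on objects that implement the protocol
# and passes plain str or bytes values through unchanged.
as_text = os.fspath(project)
passthrough = os.fspath("already/a/str")
from_pathlib = os.fspath(PurePosixPath("projects") / "demo")

# Standard library functions such as os.path.basename accept the object directly.
base_name = os.path.basename(project)
```

pathlib's own classes implement the same method, which is why they can be passed to open() and the os.path functions.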

Python 3.4 (2014-03-17, PEP-0428) The pathlib module – object-oriented filesystem paths

This PEP proposes the inclusion of a third-party module, pathlib [1], in the standard library.

The inclusion is proposed under the provisional label, as described in PEP 411.

Therefore, API changes can be done, either as part of the PEP process, or after acceptance in the standard library (and until the provisional label is removed).

The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.
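
A short sketch of that class hierarchy as it exists in the standard library today; the example paths are made up for illustration.

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Pure paths only manipulate path strings; they never touch the filesystem,
# so either flavour can be used on any operating system.
posix = PurePosixPath("/usr/local/lib/python3/site.py")
windows = PureWindowsPath(r"C:\Python\Lib\site.py")

posix_summary = (posix.name, posix.suffix, str(posix.parent))
windows_summary = (windows.drive, windows.name, windows.parent.name)

# Concrete paths (Path) add I/O methods on top of the pure API.
concrete_is_pure = issubclass(Path, PurePath)
```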

https://code.djangoproject.com/ticket/29983 Replace os.path with pathlib.Path in project template and docs

import os
from pathlib import Path

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve(strict=True).parents[1]

https://adamj.eu/tech/2020/03/16/use-pathlib-in-your-django-project/ Use Pathlib in Your Django Settings File

Introduction

Django’s default settings file has always included a BASE_DIR pseudo-setting. I call it a “pseudo-setting” since it’s not read by Django itself.

But it’s useful for configuring path-based settings, it is mentioned in the documentation, and some third party packages use it.

(One that I maintain, the Scout APM Python integration, uses it.)

Django has, up until now, defined it as

import os

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

This changes in version 3.1, which as I write is still months in the future. Thanks to a contribution by Jon Dufresne and Curtis Maloney, it’s instead defined using pathlib:

from pathlib import Path

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve(strict=True).parent.parent

Note this is in the new project template only. If you upgrade an older project to Django 3.1, your settings file won’t be changed.
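
To check that the two definitions point at the same directory, here is a sketch that builds a throwaway project layout in a temporary directory. The mysite/settings.py layout is an assumption for illustration, and settings_file stands in for __file__.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temporary root first so symlinked temp locations
    # (common on macOS) do not make the two results differ.
    root = Path(tmp).resolve()
    settings_file = root / "mysite" / "settings.py"  # hypothetical layout
    settings_file.parent.mkdir()
    settings_file.touch()

    # Pre-3.1 template: a str built with os.path.
    old_base_dir = os.path.dirname(os.path.dirname(os.path.abspath(settings_file)))

    # 3.1 template: a Path object; .parents[1] is the same as .parent.parent.
    new_base_dir = settings_file.resolve(strict=True).parent.parent
    same_via_parents = settings_file.resolve(strict=True).parents[1]

    # strict=True raises FileNotFoundError when the file does not exist.
    try:
        (root / "missing.py").resolve(strict=True)
        strict_raised = False
    except FileNotFoundError:
        strict_raised = True

    results_match = Path(old_base_dir) == new_base_dir == same_via_parents == root
```

One difference to keep in mind: os.path.abspath() does not follow symbolic links, while Path.resolve() does, so the two can disagree when the settings file is reached through a symlink.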

Python 3.4 (2014-03-17)

pathlib was added to Python’s standard library in Python 3.4, thanks to PEP 428.

All file-path using functions across Python were then enhanced to support pathlib.Path objects (or anything with a __fspath__ method) in Python 3.6, thanks to PEP 519.

pathlib is great! It has an easier API than os.path.join(), allows method chaining, and handles path normalization automatically.

See how you can define a subdirectory using BASE_DIR / 'subdir'.

If you want to read more, see Trey Hunner’s articles Why you should be using pathlib and No really, pathlib is great.

https://calmcode.io

Introduction

The goal of this series of videos is to demonstrate how to deal with files, paths and folders from python programmatically. We’ll mainly discuss the python pathlib module.

https://rednafi.github.io

Listing Specific Types of Files in a Directory

Let’s say you want to recursively visit nested directories and list .py files in a directory called src. The directory looks like this:

src/
├── stuff
│   ├── __init__.py
│   └── submodule.py
├── .stuffconfig
├── somefiles.tar.gz
└── module.py

Usually, the glob module is used to resolve this kind of situation:

from glob import glob

top_level_py_files = glob("src/*.py")
all_py_files = glob("src/**/*.py", recursive=True)

print(top_level_py_files)
# ['src/module.py']
print(all_py_files)
# ['src/module.py', 'src/stuff/__init__.py', 'src/stuff/submodule.py']

The above approach works perfectly.

However, if you don’t want to use another module just for a single job, pathlib has embedded glob and rglob methods.

You can entirely ignore glob and achieve the same result in the following way:

from pathlib import Path

top_level_py_files = Path("src").glob("*.py")
all_py_files = Path("src").rglob("*.py")

print(list(top_level_py_files))
print(list(all_py_files))
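
The snippet above shows no output, so here is a self-contained sketch that recreates the src tree in a temporary directory and runs both methods on it. The temporary location and the sorting step are additions for reproducibility, not part of the original example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "stuff").mkdir(parents=True)
    for name in ("module.py", "stuff/__init__.py", "stuff/submodule.py",
                 ".stuffconfig", "somefiles.tar.gz"):
        (src / name).touch()

    # glob() and rglob() return generators in arbitrary order, so sort
    # the results and convert them to POSIX-style strings for display.
    top_level = sorted(p.relative_to(src).as_posix() for p in src.glob("*.py"))
    recursive = sorted(p.relative_to(src).as_posix() for p in src.rglob("*.py"))

print(top_level)  # ['module.py']
print(recursive)  # ['module.py', 'stuff/__init__.py', 'stuff/submodule.py']
```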

https://pbpython.com

Introduction

It is difficult to write a python script that does not have some interaction with the file system.

The activity could be as simple as reading a data file into a pandas DataFrame or as complex as parsing thousands of files in a deeply nested directory structure. Python’s standard library has several helpful functions for these tasks - including the pathlib module.

The pathlib module was first included in python 3.4 and has been enhanced in each of the subsequent releases.

Pathlib is an object oriented interface to the filesystem and provides a more intuitive method to interact with the filesystem in a platform agnostic and pythonic manner.

I recently had a small project where I decided to use pathlib combined with pandas to sort and manage thousands of files in a nested directory structure.

Once it all clicked, I really appreciated the capabilities that pathlib provided and will definitely use it in projects going forward.

That project is the inspiration for this post.

Getting Started with Pathlib

The pathlib library is included in all versions of python >= 3.4.

I recommend using the latest version of python in order to get access to all the latest updates. For this article, I will use python 3.6.

One of the useful features of the pathlib module is that it is more intuitive to build up paths without using os.path.join.

For example, when I start small projects, I create in and out directories as subdirectories under the current working directory (using os.getcwd()).

I use those directories to store the working input and output files.

Here’s what that code would look like:

import os

in_dir = os.path.join(os.getcwd(), "in")
out_dir = os.path.join(os.getcwd(), "out")
in_file = os.path.join(in_dir, "input.xlsx")
out_file = os.path.join(out_dir, "output.xlsx")

This works but it is a little clunky. For instance, if I wanted to define just the input and output files without defining the directories, it looks like this:

import os

in_file = os.path.join(os.path.join(os.getcwd(), "in"), "input.xlsx")
out_file = os.path.join(os.path.join(os.getcwd(), "out"), "output.xlsx")

Hmmm. That’s not complex but it is certainly not pretty.

Let’s see what it looks like if we use the pathlib module.

from pathlib import Path

in_file_1 = Path.cwd() / "in" / "input.xlsx"
out_file_1 = Path.cwd() / "out" / "output.xlsx"

Interesting. In my opinion this is much easier to mentally parse.

It’s a similar thought process to the os.path method of joining the current working directory (using Path.cwd()) with the various subdirectories and file locations.

It is much easier to follow because of the clever overloading of the / operator to build up a path in a more natural manner than chaining many os.path.join calls together.

Additionally, if you don’t like the syntax above, you can chain multiple parts together using joinpath:

in_file_2 = Path.cwd().joinpath("in").joinpath("input.xlsx")
out_file_2 = Path.cwd().joinpath("out").joinpath("output.xlsx")

This is a little clunkier in my opinion but still much better than the os.path.join madness above.
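
A small sketch of a less clunky spelling: joinpath accepts several segments in one call, so the chaining can be avoided. The names in_file_3 and in_file_4 are made up to continue the numbering above.

```python
from pathlib import Path

# joinpath() takes any number of segments, so the chained calls collapse into one.
in_file_3 = Path.cwd().joinpath("in", "input.xlsx")

# The Path constructor accepts several segments too.
in_file_4 = Path(Path.cwd(), "in", "input.xlsx")

# All spellings produce equal Path objects.
assert in_file_3 == in_file_4 == Path.cwd() / "in" / "input.xlsx"
```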

https://realpython.com/python-pathlib/ Python 3’s pathlib Module: Taming the File System

Introduction

Have you struggled with file path handling in Python?

In Python 3.4 and above, the struggle is now over! You no longer need to scratch your head over code like:

>>> path.rsplit('\\', maxsplit=1)[0]

Or cringe at the verbosity of:

>>> os.path.isfile(os.path.join(os.path.expanduser('~'), 'realpython.txt'))
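
For comparison, a sketch of how the two snippets above might look with pathlib. The value of path is a hypothetical Windows path chosen for illustration, since the original does not define it.

```python
from pathlib import Path, PureWindowsPath

# Hypothetical Windows path, used only for illustration.
path = r"C:\Users\reader\realpython\file.txt"

# String-based approach from above: strip the last component by hand.
parent_by_split = path.rsplit("\\", maxsplit=1)[0]

# pathlib equivalent: PureWindowsPath parses Windows paths on any OS.
parent_by_pathlib = PureWindowsPath(path).parent

# pathlib equivalent of the os.path.isfile(os.path.join(...)) call.
has_file = (Path.home() / "realpython.txt").is_file()
```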

Counting Files

There are a few different ways to list many files.

The simplest is the .iterdir() method, which iterates over all files in the given directory.

The following example combines .iterdir() with the collections.Counter class to count how many files there are of each filetype in the current directory:

>>> import collections
>>> import pathlib
>>> collections.Counter(p.suffix for p in pathlib.Path.cwd().iterdir())
Counter({'.md': 2, '.txt': 4, '.pdf': 2, '.py': 1})
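
Note that .iterdir() yields subdirectories as well as files. The following sketch, run against a temporary directory with made-up file names, filters with .is_file() so that only regular files are counted.

```python
import collections
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("a.md", "b.md", "notes.txt", "script.py", "README"):
        (folder / name).touch()
    (folder / "archive.d").mkdir()  # a directory whose name has a suffix

    # iterdir() yields directories too, so filter with is_file()
    # to count regular files only.
    counts = collections.Counter(
        p.suffix for p in folder.iterdir() if p.is_file()
    )

print(counts[".md"], counts[".txt"], counts[".py"], counts[""])  # 2 1 1 1
```

Files without an extension, such as README, are counted under the empty string.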

https://learndjango.com/tutorials/whats-new-django-31

Django has switched from using os.path to the more modern and concise pathlib.

If you create a new project using the startproject command, the automatically generated settings.py file now defaults to pathlib.

Here is the Django 3.0 version:

# settings.py
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}

And here is the newer Django 3.1 version:

# settings.py
from pathlib import Path

BASE_DIR = Path(__file__).resolve(strict=True).parent.parent

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}

https://treyhunner.com/2019/01/no-really-pathlib-is-great/

Normalizing file paths shouldn’t be your concern

If you’re developing on Linux or Mac, it’s very easy to add bugs to your code that only affect Windows users.

Unless you’re careful to use os.path.join to build your paths up or os.path.normcase to convert forward slashes to backslashes as appropriate, you may be writing code that breaks on Windows.

This is a Windows bug waiting to happen (we’ll get mixed backslashes and forward slashes here):

import sys
import os.path
directory = '.' if not sys.argv[1:] else sys.argv[1]
new_file = os.path.join(directory, 'new_package/__init__.py')

This just works on all systems:

import sys
from pathlib import Path
directory = '.' if not sys.argv[1:] else sys.argv[1]
new_file = Path(directory, 'new_package/__init__.py')

It used to be the responsibility of you the Python programmer to carefully join and normalize your paths, just as it used to be your responsibility in Python 2 land to use unicode whenever it was more appropriate than bytes.

This is the case no more. The pathlib.Path class is careful to fix path separator issues before they even occur.
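
The difference can be reproduced on any operating system with a small sketch. It uses ntpath (the Windows implementation of os.path) and PureWindowsPath as stand-ins for running on Windows, and the "src" directory name is an assumption for illustration.

```python
import ntpath  # the Windows implementation of os.path, importable on any OS
from pathlib import PureWindowsPath

# os.path.join on Windows keeps the forward slash inside the second argument.
joined_with_os_path = ntpath.join("src", "new_package/__init__.py")

# PureWindowsPath normalizes every separator to a backslash.
joined_with_pathlib = str(PureWindowsPath("src", "new_package/__init__.py"))

print(joined_with_os_path)  # src\new_package/__init__.py
print(joined_with_pathlib)  # src\new_package\__init__.py
```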

NameError: name 'os' is not defined

If you’ve started a new Django 3.1+ project and are using older tutorials or guides, you are likely to come across the following error on your command line:

NameError: name 'os' is not defined

Starting with Django 3.1, the startproject command generates a settings.py file that imports pathlib rather than os on the top line.

The quick fix is to import os at the top of your settings.py file:

# settings.py
import os # new
from pathlib import Path

The better fix is to learn more about how pathlib works and update your BASE_DIR, DATABASES, STATICFILES_DIRS, and other path-based settings to use the newer, modern approach.
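
As a rough sketch of what that update might look like: the directory names below (static, templates) are common conventions assumed for illustration, and BASE_DIR is given a hypothetical fixed value so the snippet runs outside a project.

```python
from pathlib import Path

# Stand-in so the snippet runs outside a project; in a real settings.py
# this would be Path(__file__).resolve().parent.parent.
BASE_DIR = Path("/srv/mysite")  # hypothetical location

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}

# Older tutorials write os.path.join(BASE_DIR, "static") here.
STATICFILES_DIRS = [BASE_DIR / "static"]

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        # Older tutorials write os.path.join(BASE_DIR, "templates") here.
        "DIRS": [BASE_DIR / "templates"],
        "APP_DIRS": True,
    }
]
```

With every os.path.join() call replaced this way, the settings file no longer references os, so the NameError goes away without adding the import back.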

How To Use the pathlib Module to Manipulate Filesystem Paths in Python 3

Introduction

Python 3 includes the pathlib module for manipulating filesystem paths in a way that is independent of the operating system. pathlib is similar to the os.path module, but it offers a higher-level, and oftentimes more convenient, interface than os.path.

We can identify files on a computer with hierarchical paths.

For example, we might identify the file wave.txt on a computer with this path: /Users/sammy/ocean/wave.txt.

Operating systems represent paths slightly differently.

Windows might represent the path to the wave.txt file like C:\Users\sammy\ocean\wave.txt.

You might find the pathlib module useful if in your Python program you are creating or moving files on the filesystem, listing files on the filesystem that all match a given extension or pattern, or creating operating system appropriate file paths based on collections of raw strings. While you might be able to use other tools (like the os.path module) to accomplish many of these tasks, the pathlib module allows you to perform these operations with a high degree of readability and minimal amount of code.

In this tutorial, we’ll go over some of the ways to use the pathlib module to represent and manipulate filesystem paths.

Computing Relative Paths

We can use the Path.relative_to method to compute paths relative to one another.

The relative_to method is useful when, for example, you want to retrieve a portion of a long file path.

Consider the following code:

from pathlib import Path

shark = Path("ocean", "animals", "fish", "shark.txt")
below_ocean = shark.relative_to(Path("ocean"))
below_animals = shark.relative_to(Path("ocean", "animals"))
print(shark)
print(below_ocean)
print(below_animals)

If we run this, we’ll receive output like the following:

ocean/animals/fish/shark.txt
animals/fish/shark.txt
fish/shark.txt
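
One caveat worth sketching: relative_to raises ValueError when the argument is not a leading part of the path. The "desert" directory below is a made-up counterexample, and PurePosixPath is used so the snippet behaves the same on every platform.

```python
from pathlib import PurePosixPath

shark = PurePosixPath("ocean", "animals", "fish", "shark.txt")

# relative_to() raises ValueError when the argument is not a leading
# part of the path, so guard calls where that can happen.
try:
    shark.relative_to(PurePosixPath("desert"))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message is not None)  # True
```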