Finding directories with specific markers
A tutorial on how to find directories with specific markers (filename) and check them for some conditions using `sly.fs.dirs_with_marker()` function.
Introduction
First of all, let's talk about the use cases of this function. When working with import apps, there are often cases where we need a specific structure for input data to ensure the application functions correctly. Frequently, it's necessary to identify a specific file marker that serves as a reference point for determining the rest of the structure. However, user-provided data can have varying structures. For instance, the target directory might not be in the root directory but nested within several other directories, or the data may contain not just one but multiple suitable directories.
For example, we're trying to find a directory with a config.json
file in it. The directory structure might look like this:
So, in this case, we need to find two directories: nested_dir_3
and nested_dir_4
. It's not a problem to find them, but why implement the same logic every time? It's much easier to use a function that will do it for us everywhere we need it.
And it's just a part of the problem, we're trying to solve here. Same as finding directories, we usually need to check them for some conditions. Because the file we've found maybe incorrect and a user may just forget to delete it, or the directory may not contain other needed files. If we try to work with the data from this directory, we may have an error or, much worse, incorrect results. Of course, we can write the function that will check the directory for us, and pass the directories to it one by one. Well, while using sly.fs.dirs_with_marker()
we still need to have this function, if we need to check the directory for some conditions, but we can make this process more convenient and the code more readable and clear.
Function signature
Parameters
input_path
str
Path to the directory from which the search will be performed.
markers
Union[str, List[str]]
Filename or list of filenames, which will be searched.
check_function
Optional[Callable]
Function that will be used to check the directory for some conditions.
ignore_case
Optional[bool]
If True
, the search will be case-insensitive. Default is False
.
Data Example
We prepared a short Python script, that will unpack an archive (as an example of input data from a user) and find directories with config.json
files in them. Then it will check if they're valid. Conditions for checking are the following:
The directory must contain
config.json
file.The
config.json
file must have a keyvalid
, and its value must betrue
.The directory must contain two other subdirectories:
images
andanns
.
Example archive structure:
So, we have two directories with config.json
files in them, but only one of them is valid.
You can find the above demo archive in the data directory of the dirs-with-marker repo - here
Tutorial content
Everything you need to reproduce this tutorial is on GitHub: main.py.
Step 1. How to extract the archive and remove junk files
It's not a rare case, when the archive from a user contains some unnecessary files. For example, the archive may contain a __MACOSX
directory or .DS_Store
files, which are created by macOS. If we don't handle them, we may have an error while working with the data, so in most cases, it's much easier to delete them.
To extract the archive, we'll use the sly.fs.unpack_archive()
function:
ℹ️ If the archive was already extracted and you want to remove junk files from it, you can use the sly.fs.remove_junk_from_dir()
function:
Now we have a directory without junk files, and we can start searching for directories with markers.
Step 2. How to find directories with markers
As we already have a directory without junk files, we need to specify the markers we want to find. In our case, it's config.json
file. We can pass it as a string or as a list of strings if we need to find multiple markers.
ℹ️ If you're passing a list of markers it's important to mention, that they will be searched with the OR
, not AND
condition. It means that if you pass a list of markers, the function will return all directories that contain at least one of the markers. If you need a condition when the directory contains several specific markers, you can implement this check in the check_function
parameter, we will talk about it in next section.
Now we can use the sly.fs.dirs_with_marker()
function to iterate over all directories with markers:
Ok, we've found the directories, but how can we check them for some conditions? Let's talk about it in the next section.
Step 3. How to check directories for specific conditions
As we've already mentioned, we can pass a function to the check_function
parameter, which will be used to check the directory for specific conditions. This function must return True
if the directory is valid and False
otherwise. So, our conditions for checking were: config.json
file must have a key valid
, and its value must be true
, and the directory must contain two subdirectories: images
and anns
. Let's implement this check:
Just as a reminder, the check_function
must return the bool
value.
Example of the final code
So, we've implemented the check function, and now we can use it in the sly.fs.dirs_with_marker()
function. Here's the full code:
Let's have a look on what we've got here:
We've extracted the archive and removed junk files.
We've specified the marker we want to find.
We've iterated over directories with markers without checking them. In our test case it will print two directories, while the correct is only one.
We've defined the check function that will be used to check the directory meet our requirements.
We've iterated over directories that passed all checks. In our test case it will print only one directory, which is correct.
And now we can easily work with the data from the directory (or directories) we've found, knowing that it's valid.
Summary
In this tutorial, we've learned how to find directories with specific markers and check them for some conditions using sly.fs.dirs_with_marker()
function. We've also learned how to extract the archive and clean it of junk files using sly.fs.unpack_archive()
or sly.fs.remove_junk_from_dir()
functions.
We hope this tutorial was helpful for you and it will save you some time in the future, while working with import apps.
Last updated