UFile (and UFileList)

Urgap provides a standardized interface for interacting with different file storage backends and presents the file objects in a uniform way.

At the time of writing, the supported storage backends are:

  • Google Cloud Storage buckets

  • Azure Blob Storage

  • Samba (SMB) network drives

  • local file paths

  • HTTPS

  • GitHub

  • FTP and SFTP

Because the interaction with the storage backends is abstracted behind a common interface (see urgap.ufile.io), other backends can be added easily.
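The idea of a pluggable backend can be pictured roughly as follows. This is a minimal standard-library sketch, not the actual interface defined in urgap.ufile.io: the class names StorageBackend and LocalBackend, the download/upload signatures, and the scheme-keyed BACKENDS registry are all hypothetical and chosen only for illustration.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical backend interface: move one object between a storage location and a local path."""

    @abstractmethod
    def download(self, location: str, obj: str, target: Path) -> None:
        """Copy object `obj` stored at `location` to the local path `target`."""

    @abstractmethod
    def upload(self, source: Path, location: str, obj: str) -> None:
        """Store the local file `source` as object `obj` at `location`."""


class LocalBackend(StorageBackend):
    """Treats `location` as a local directory and `obj` as a file name inside it."""

    def download(self, location: str, obj: str, target: Path) -> None:
        shutil.copyfile(Path(location) / obj, target)

    def upload(self, source: Path, location: str, obj: str) -> None:
        Path(location).mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, Path(location) / obj)


# Hypothetical registry keyed by URI scheme: supporting another storage means adding one entry.
BACKENDS: dict[str, StorageBackend] = {"file": LocalBackend()}

# Round trip through the local backend inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "hello.txt"
    source.write_text("hi")
    backend = BACKENDS["file"]
    backend.upload(source, f"{tmp}/container", "hello.txt")
    backend.download(f"{tmp}/container", "hello.txt", Path(tmp) / "copy.txt")
    print((Path(tmp) / "copy.txt").read_text())  # hi
```

In this sketch, code that works with files only talks to the abstract StorageBackend, so a new storage (e.g. an SFTP server) would be one more subclass plus one registry entry.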

Urgap uses URIs as pointers to files, with one major difference from other systems: Urgap separates the location of a file from its identity.

Think of files like books in real life: each book has a unique ISBN that defines its identity, yet its location (for example, which bookstore currently holds a copy) is not unique.

Urgap uses the fragment of a URI (the part after #) to specify the identity of a file, and the last element of the path before the fragment as the container, e.g.:

uri: file://<any directory structure ...>/<container>#<object>
                                          ^^^^^^^^^^^ ^^^^^^^^
                                          |           fragment: identity of the file (object)
                                          last path element: container (location)
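To make the split concrete, here is a minimal sketch using only urllib.parse that separates a URI into its location, container and object identity. The helper split_ufile_uri and the UriParts tuple are hypothetical illustrations of the convention described above, not part of Urgap's API.

```python
from typing import NamedTuple
from urllib.parse import urlsplit, urlunsplit


class UriParts(NamedTuple):
    location: str   # everything before the fragment: where the file currently lives
    container: str  # last element of the path
    obj: str        # the fragment: the identity of the file


def split_ufile_uri(uri: str) -> UriParts:
    """Hypothetical helper: split a URI into location, container, and object identity."""
    parts = urlsplit(uri)
    if not parts.fragment:
        raise ValueError(f"URI has no fragment (object identity): {uri!r}")
    # Rebuild the URI without its fragment to get the pure location.
    location = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    container = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return UriParts(location, container, parts.fragment)


print(split_ufile_uri("https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md"))
print(split_ufile_uri("file:///tmp/data/bucket#README.md"))
```

Both calls yield the same obj ("README.md") but different locations and containers, which is the book analogy in code: one identity, several possible places to find it.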

Let's import urgap first and try a real-life example of working with UFiles.

[ ]:
import urgap

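We are going to use the README.md file from the main branch of numpy as an example. Its raw URL is https://raw.githubusercontent.com/numpy/numpy/refs/heads/main/README.md; in Urgap notation, with main as the container and README.md as the object, it becomes https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md.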

We can declare a UFile object pointing to this file as follows, using the https URI scheme and the fragment README.md as the file's identity:

[ ]:
uf = urgap.UFile(uri="https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md")

At this point the UFile is instantiated but not yet downloaded. To download it, call the .download() method or simply access the .path attribute of the UFile object.

[ ]:
uf.download()
print(uf.path)
uf.path.exists()

As you can see, the file now exists at a local scratch path on your machine.
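The "nothing is fetched until you ask for .path" behaviour is a lazy-download pattern. The sketch below shows one way to get it with functools.cached_property; it is an assumption-based illustration, not Urgap's implementation. The class LazyFile, its fetch callable and the fetch_count counter are hypothetical, and the fetch function simply writes text so the example runs offline.

```python
import tempfile
from functools import cached_property
from pathlib import Path
from typing import Callable


class LazyFile:
    """Hypothetical stand-in for a lazily downloaded file (not urgap's implementation)."""

    def __init__(self, name: str, fetch: Callable[[Path], None]) -> None:
        self.name = name
        self._fetch = fetch   # callable that writes the file's bytes to a given path
        self.fetch_count = 0  # how many times the backend was actually hit

    @cached_property
    def path(self) -> Path:
        # First access: create a scratch directory and fetch the file into it.
        # cached_property stores the result, so later accesses skip the fetch.
        target = Path(tempfile.mkdtemp(prefix="lazyfile-")) / self.name
        self._fetch(target)
        self.fetch_count += 1
        return target

    def download(self) -> Path:
        # An explicit download is just an eager access of .path.
        return self.path


f = LazyFile("README.md", fetch=lambda p: p.write_text("# demo\n"))
print(f.fetch_count)    # 0: nothing fetched at construction time
print(f.path.exists())  # True: first access triggers the fetch
f.download()
print(f.fetch_count)    # 1: the cached path is reused
```

Caching the path means repeated accesses, or a .download() followed by .path, hit the backend only once. The scratch directories created by mkdtemp are not cleaned up in this sketch.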

If you want to download or upload files to a backend that requires credentials, please refer to the UCredentialManager tutorial in this series.

To change the location of the file, use the .rebase() method with the upload=True flag.

You can rebase into any backend for which you have credentials.

For demonstration purposes, we simply rebase into a different location on the local file system.

Notice that you don't have to specify the fragment when rebasing; it is preserved by default.

[ ]:
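import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Move the file to a new local container; the fragment (README.md) is kept.
    uf.rebase(uri=f"file://{tmpdir}/new_directory", upload=True)
    print(uf.path)
    print(uf.path.exists())
# Note: tmpdir and the uploaded copy are deleted once the with-block exits.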
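At the level of the URI string, rebasing with a preserved fragment amounts to swapping the location and keeping the identity. The helper rebase_uri below is a hypothetical sketch of that string transformation only; it does not upload anything and is not how urgap's .rebase() is implemented.

```python
from urllib.parse import urlsplit


def rebase_uri(uri: str, new_location: str) -> str:
    """Hypothetical helper: swap the location of a URI, keep its fragment (identity)."""
    fragment = urlsplit(uri).fragment
    if not fragment:
        raise ValueError(f"URI has no fragment to preserve: {uri!r}")
    if urlsplit(new_location).fragment:
        raise ValueError("new_location must not carry its own fragment")
    return f"{new_location.rstrip('/')}#{fragment}"


old = "https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md"
print(rebase_uri(old, "file:///tmp/new_directory"))
# file:///tmp/new_directory#README.md
```

Rejecting a fragment on the new location keeps the rule unambiguous: the identity always comes from the original URI.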

To handle several UFiles at once, collect them in a list and wrap that list in a UFileList object.
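When the files share one location and differ only in their identity, the URI strings can be generated instead of typed out. The helper uris_for is a hypothetical plain-Python sketch; each string it returns has the same form as the uri arguments used in this tutorial, so it could be passed to urgap.UFile(uri=...).

```python
def uris_for(location: str, objects: list[str]) -> list[str]:
    """Hypothetical helper: one URI per object, all in the same location."""
    base = location.rstrip("/")
    return [f"{base}#{name}" for name in objects]


uris = uris_for(
    "https://raw.githubusercontent.com/numpy/numpy/refs/heads/main",
    ["README.md", "LICENSE.txt"],
)
for uri in uris:
    print(uri)
```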

A UFileList is required as input for a UNode run.

Initialize a UFileList like this:

[ ]:
ufl = urgap.UFileList(
    [
        urgap.UFile(uri="https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md"),
        urgap.UFile(uri="https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#LICENSE.txt"),
    ]
)
print(ufl)
[ ]:
for file in ufl:
    # Accessing .path downloads each file if it is not available locally yet.
    print(file.path.exists())