{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# UFile (and UFileList)\n", "\n", "Urgap provides a standardized interface for interacting with different file storages and presents file objects in a uniform way.\n", "\n", "At the time of writing, the supported file storages are:\n", "\n", "- Google storage bucket\n", "- Azure Blob Storage\n", "- Samba network drives\n", "- local file paths\n", "- HTTPS\n", "- GitHub\n", "- (S)FTP\n", "\n", "Because the interaction with file storages is abstracted behind an interface (see urgap.ufile.io), other storage backends can be added with ease." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Urgap uses URIs as pointers to files, with one major difference from other systems: Urgap separates the location of a file from its identity.\n", "\n", "Think of files like books in real life: each book has a unique ISBN that defines its identity, yet its location, e.g. which bookstore currently holds the book, is not unique.\n", "\n", "We use the fragment of a URI to specify the identity of a file, and the last element of the netloc as the container, e.g.\n", "\n", "```bash\n", " uri: file:///#\n", " ^---- (fragment)\n", " ^-(url, last element is container name)\n", "```" ] }, { "metadata": {}, "cell_type": "markdown", "source": "Let's import urgap first and walk through a real-life example of how to work with UFiles." }, { "cell_type": "code", "metadata": {}, "source": [ "import urgap" ], "outputs": [], "execution_count": null }, { "metadata": {}, "cell_type": "markdown", "source": [ "We are going to use the README.md file from the main branch of NumPy as an example file, which is located at https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md.\n", "\n", "We can declare a UFile object pointing to this file as follows, using the URI scheme `https` and the fragment `README.md` as the identity of the file:" ] }, { "cell_type": "code", "metadata": {}, "source": "uf = 
urgap.UFile(uri=\"https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md\")", "outputs": [], "execution_count": null }, { "metadata": {}, "cell_type": "markdown", "source": "The UFile is now instantiated, but the file has not been downloaded yet. To download it, call the .download() method or simply access .path on the UFile object." }, { "metadata": {}, "cell_type": "code", "source": [ "uf.download()\n", "print(uf.path)\n", "uf.path.exists()" ], "outputs": [], "execution_count": null }, { "metadata": {}, "cell_type": "markdown", "source": [ "As you can see, the file now exists on a local scratch path on your machine.\n", "\n", "If you want to download or upload files to a backend that requires credentials, please refer to the UCredentialManager tutorial in this series.\n", "\n", "To change the location of the file, use the rebase method with the upload=True flag.\n", "\n", "You can rebase into any backend for which you have credentials.\n", "\n", "For demonstration purposes, we simply rebase into a different location on the local file system.\n", "\n", "Notice that when rebasing you don't have to specify the fragment, as it is preserved by default." 
] }, { "metadata": {}, "cell_type": "code", "source": [ "import tempfile\n", "\n", "with tempfile.TemporaryDirectory() as tmpdir:\n", "    uf.rebase(uri=f\"file://{tmpdir}/new_directory\", upload=True)\n", "    print(uf.path)\n", "    print(uf.path.exists())" ], "outputs": [], "execution_count": null }, { "metadata": {}, "cell_type": "markdown", "source": [ "To handle multiple UFiles at once, collect them in a list and create a UFileList object.\n", "\n", "A UFileList is required as input for a UNode run.\n", "\n", "Initialize a UFileList like this:" ] }, { "metadata": {}, "cell_type": "code", "source": [ "ufl = urgap.UFileList(\n", "    [\n", "        urgap.UFile(uri=\"https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#README.md\"),\n", "        urgap.UFile(uri=\"https://raw.githubusercontent.com/numpy/numpy/refs/heads/main#LICENSE.txt\"),\n", "    ]\n", ")\n", "print(ufl)" ], "outputs": [], "execution_count": null }, { "metadata": {}, "cell_type": "code", "source": [ "# Check each file in the list (the loop variable, not the standalone uf from earlier)\n", "for file in ufl:\n", "    print(file.path.exists())" ], "outputs": [], "execution_count": null } ], "metadata": { "kernelspec": { "display_name": "u2_p310", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 2 }