UFile Class

class urgap.ufile.ufile.UFile(uri: str)

Urgap pipeline file interface.

__init__(uri: str) None

Create a new UFile instance.

Parameters:

uri – Full UUri to the file.

__repr__() str

Get string representation of the UFile.

Returns:

UUri as a string.

__eq__(other: object) bool

Test equality with another UFile by comparing ucfs values.

Parameters:

other – Object to compare.

Returns:

True if both are UFiles with the same ucfs, otherwise False.

__lt__(other: UFile) bool

Lexical comparison by ucfs for sorting.

Parameters:

other – UFile to compare.

Returns:

True if this ucfs is less than other’s.
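The two comparison methods above can be pictured with a small stand-in class. This is only a sketch: FileKey is a hypothetical name that is not part of Urgap, and it assumes the ucfs string has the object_name@hash form given for the ucfs property.

```python
from functools import total_ordering


@total_ordering
class FileKey:
    """Hypothetical stand-in that compares like a UFile: by its ucfs string."""

    def __init__(self, object_name: str, file_hash: str) -> None:
        self.object_name = object_name
        self.file_hash = file_hash

    @property
    def ucfs(self) -> str:
        # Format taken from the ucfs property description: object_name@hash.
        return f"{self.object_name}@{self.file_hash}"

    def __eq__(self, other: object) -> bool:
        # Objects of a different type are never considered equal.
        if not isinstance(other, FileKey):
            return False
        return self.ucfs == other.ucfs

    def __lt__(self, other: "FileKey") -> bool:
        # Plain lexical string comparison; this is what sorted() relies on.
        return self.ucfs < other.ucfs

    def __hash__(self) -> int:
        # Keep hashing consistent with equality.
        return hash(self.ucfs)


files = [FileKey("b.txt", "02"), FileKey("a.txt", "ff"), FileKey("a.txt", "01")]
ordered = [f.ucfs for f in sorted(files)]
```

Because ordering is lexical on the whole string, two files with the same object name are ordered by their hash.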

__deepcopy__(memo: dict) UFile

Create a deep copy of this UFile.

Parameters:

memo – The copy memo dict.

Returns:

A deep copy of the UFile, with private attributes set to None.
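A minimal sketch of this copy behaviour, assuming that "private attributes" means instance attributes whose names start with an underscore. CachedFile and its _io attribute are hypothetical and only illustrate the pattern.

```python
import copy


class CachedFile:
    """Hypothetical class whose deep copies drop cached private state."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self._io = object()  # stands in for a live backend handle

    def __deepcopy__(self, memo: dict) -> "CachedFile":
        clone = type(self).__new__(type(self))
        # Register the clone first so reference cycles resolve to it.
        memo[id(self)] = clone
        for name, value in self.__dict__.items():
            if name.startswith("_"):
                # Assumption: private attributes are reset, not copied.
                setattr(clone, name, None)
            else:
                setattr(clone, name, copy.deepcopy(value, memo))
        return clone


original = CachedFile("file:///tmp/data#a.txt")
duplicate = copy.deepcopy(original)
```

Resetting cached handles this way lets the copy rebuild them lazily instead of sharing a connection with the original.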

as_storage_base_uri() str

Get the storage base UUri, omitting query and fragment.

Returns:

The storage base UUri as a string.

as_uri(scheme: str | None = None, netloc: str | None = None, path: str | None = None, fragment: str | None = None, query: str | None = None) str

Get a string UUri representation of this file, optionally overriding UUri components.

Parameters:
  • scheme – Override the scheme.

  • netloc – Override the network location.

  • path – Override the path.

  • fragment – Override the fragment (object name).

  • query – Override the query string.

Returns:

The UUri as a string.
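The component overrides, and the storage base form that drops query and fragment, can be approximated with the standard library. This is a sketch under the assumption that a UUri splits like an ordinary URL; override_uri and storage_base are hypothetical helper names, not Urgap functions.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def override_uri(
    uri: str,
    scheme: Optional[str] = None,
    netloc: Optional[str] = None,
    path: Optional[str] = None,
    fragment: Optional[str] = None,
    query: Optional[str] = None,
) -> str:
    """Rebuild a URI, replacing only the components that were passed."""
    overrides = {
        "scheme": scheme,
        "netloc": netloc,
        "path": path,
        "query": query,
        "fragment": fragment,
    }
    # None means "keep the component parsed from the original URI".
    changed = {key: value for key, value in overrides.items() if value is not None}
    return urlunsplit(urlsplit(uri)._replace(**changed))


def storage_base(uri: str) -> str:
    """Drop query and fragment, keeping scheme, netloc and path."""
    return urlunsplit(urlsplit(uri)._replace(query="", fragment=""))


source = "https://example.org/data/run1?uftype=csv#sample.csv"
renamed = override_uri(source, fragment="other.csv")
base = storage_base(source)
```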

check_uri_scheme_exists() None

Check if the URI scheme is supported.

Returns:

True if the URI scheme is supported.

Raises:

ValueError – If the scheme is not in the supported list.

compress(compression_format: str) UFile

Compress this UFile into a new compressed file.

Parameters:

compression_format – The format to use: ‘zip’, ‘gz’, or ‘tar’.

Returns:

A new compressed UFile.

Raises:

NotImplementedError – If the format is unsupported.
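The format dispatch described above might look like the following standard-library sketch. compress_file is a hypothetical helper that works on a plain Path rather than a UFile, and the naming of the output archive (original name plus the format suffix) is an assumption.

```python
import gzip
import shutil
import tarfile
import tempfile
import zipfile
from pathlib import Path


def compress_file(source: Path, compression_format: str) -> Path:
    """Write a compressed copy of source next to it and return its path."""
    if compression_format == "zip":
        target = source.with_name(source.name + ".zip")
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.write(source, arcname=source.name)
    elif compression_format == "gz":
        target = source.with_name(source.name + ".gz")
        with source.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    elif compression_format == "tar":
        target = source.with_name(source.name + ".tar")
        with tarfile.open(target, "w") as archive:
            archive.add(source, arcname=source.name)
    else:
        # Mirrors the documented behaviour for unsupported formats.
        raise NotImplementedError(f"Unsupported format: {compression_format}")
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "notes.txt"
    source.write_text("hello")
    archive_path = compress_file(source, "gz")
    with gzip.open(archive_path, "rb") as handle:
        restored = handle.read()
    archive_name = archive_path.name
```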

create_container() None

Create the remote container (e.g., bucket or folder) for this file.

create_lineage_graph(use_umeta: bool = True) DiGraph

Create a directed graph representing the lineage for this UFile.

Parameters:

use_umeta – Use UMeta to reconstruct lineage if True.

Returns:

The directed graph of UFile lineage.

download() None

Download this file from remote storage to local scratch.

format_uri() None

Format the URI if it was constructed from a storage_base_uri and ucfs combination.

classmethod from_path_object(path_object: Path, number_of_parents: int = 1, query: str | None = None) UFile

Construct a UFile from a filesystem Path.

Parameters:
  • path_object – Path object for the file.

  • number_of_parents – How many parent directories to include in the UUri.

  • query – Optional query string for tags.

Returns:

A new UFile instance.

get_object() Path | None

Get a local object from remote storage if it exists.

Returns:

The path to the local object, or None if it does not exist remotely.

guess_uftype_from_suffix() str | None

Guess the Urgap file type for this file based on its extension.

property hash: str

The hash checksum for this file.

Returns:

The hash string using the algorithm specified in the configuration. Will be calculated if missing.

identify_lineage_root_files(use_umeta: bool = True) list

List root files in the lineage graph, optionally using UMeta.

Parameters:

use_umeta – If True, use UMeta interface, else reconstruct from tags.

Returns:

List of root object names.
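Finding lineage roots amounts to walking parent links until files without parents are reached. The sketch below uses a plain dictionary in place of the DiGraph so that it has no dependencies; find_lineage_roots and the example file names are hypothetical.

```python
from __future__ import annotations


def find_lineage_roots(parents_of: dict[str, list[str]], start: str) -> list[str]:
    """Return the ancestors of start that have no parents themselves."""
    roots: set[str] = set()
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another branch
        seen.add(node)
        parents = parents_of.get(node, [])
        if not parents:
            roots.add(node)  # nothing upstream: this is a root
        stack.extend(parents)
    return sorted(roots)


# Maps each object name to the object names of its direct parents.
lineage = {
    "report.pdf": ["merged.csv"],
    "merged.csv": ["raw_a.csv", "raw_b.csv"],
}
roots = find_lineage_roots(lineage, "report.pdf")
```

A file with no recorded parents is its own root in this sketch.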

init_io_class()

Initialize the IO backend for this file, based on the UUri scheme.

Returns:

The IO class instance for this file.

Raises:

ImportError – If the IO backend is not installed.

property io

IO property to access the Urgap IO backend for this file.

Returns:

The initialized IO instance.

property is_borg: bool

Whether this file is part of a multifile (borg) collection.

Returns:

True if part of a collection, False otherwise.

property is_part_of_collection: bool

Whether this file is part of a multifile collection.

Returns:

True if part of a collection, False otherwise.

list_container_items(pattern: str | None = None, limit: int = 1000, full_string: bool = False, start_date: str | None = None, end_date: str | None = None) list

List all objects in the remote container.

Parameters:
  • pattern – Regex pattern for filtering object names.

  • limit – Maximum number of files to request in one query.

  • full_string – Whether to return the list with full strings or just fragments.

  • start_date – ISO format datetime string to filter files modified after this date.

  • end_date – ISO format datetime string to filter files modified before this date.

Returns:

List of object names.

property object_name: str

The object name portion from the UUri.

Returns:

The object name.

property parents: list

List the object names of direct parent files for this UFile.

Returns:

List of parent object names.

property path: Path

Path to the local scratch copy of this UFile.

If not present locally or if hashes differ, downloads the remote file.

Returns:

Local file path.

property provenance: None | DiGraph

Provenance of UFile as a directed graph.

Returns:

The provenance as a NetworkX DiGraph, or None.

purge_local() None

Remove the local file and its tags from the scratch disk.

purge_local_file() None

Remove only the local file from the scratch disk.

purge_local_tags() None

Remove only the cached local tags for this UFile.

rebase(uri: str | None = None, upload: bool = False, **kwargs: ~P) None

Change this UFile’s UUri and (optionally) upload it to new storage.

If the provided URI has no query string, clear dynamic tags (e.g. md5, parent_*) but preserve structural tags like uftype so downstream logic (e.g. rerun/skip) keeps working.

Parameters:
  • uri – New UUri string.

  • upload – If True, upload the file after rebasing.

  • **kwargs – Passed to upload().

recalculate_hashes(force_local: bool = False) None

Recalculate the configured hash for this file.

Parameters:

force_local – If True, always use the local file.

relocate_fragment_to_path(steps: int = 1, upload: bool = False) None

Move the fragment component into the path or vice versa by relocating the # separator.

Parameters:
  • steps – Number of path segments to move. Positive values move fragment segments into the path (fragment → path); for example, steps=2 converts ‘path#a/b/c’ to ‘path/a/b#c’. Negative values move path segments into the fragment (path → fragment); for example, steps=-2 converts ‘path/a/b#c’ to ‘path#a/b/c’.

  • upload – If True, upload the file after relocating the fragment.

Raises:

ValueError – If steps is zero.
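The effect of steps on the position of the # separator can be reproduced with plain string handling. This is a sketch of the transformation only: relocate_fragment is a hypothetical function, it ignores query strings, and the bounds checks are assumptions that the description does not confirm.

```python
def relocate_fragment(uri: str, steps: int = 1) -> str:
    """Move segments across the '#' separator of a 'path#fragment' string."""
    if steps == 0:
        raise ValueError("steps must not be zero")
    base, _, fragment = uri.partition("#")
    path_segments = base.split("/")
    fragment_segments = fragment.split("/") if fragment else []
    if steps > 0:
        # fragment -> path: take leading fragment segments.
        if steps > len(fragment_segments):
            raise ValueError("not enough fragment segments to move")
        path_segments += fragment_segments[:steps]
        fragment_segments = fragment_segments[steps:]
    else:
        # path -> fragment: take trailing path segments, keep at least one.
        count = -steps
        if count >= len(path_segments):
            raise ValueError("not enough path segments to move")
        fragment_segments = path_segments[-count:] + fragment_segments
        path_segments = path_segments[:-count]
    return "/".join(path_segments) + "#" + "/".join(fragment_segments)


forward = relocate_fragment("path#a/b/c", steps=2)
backward = relocate_fragment("path/a/b#c", steps=-2)
```

The two calls are inverses of each other, matching the two examples given for steps.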

remote_object_exists() bool

Check if the remote object exists.

Returns:

True if the remote object exists, False otherwise.

remove_remote_object() None

Remove this object from remote storage.

property simple_name: str

Returns a simplified file name (stem, no extension).

Returns:

Simple name string.

simplify_name(source_object_names: set, prefix: str | None = None, suffix: str | None = None, storage_base_uri: str | None = None) UFile | None

Rename and optionally rebase this file for user-friendly output.

Parameters:
  • source_object_names – Set of valid source object names to match in parents.

  • prefix – Optional prefix for the new name.

  • suffix – Optional suffix for the new name.

  • storage_base_uri – If given, rebase the output file here.

Returns:

The renamed UFile, or None if no matching parent found.

property storage_base_uri: str

The storage base UUri, omitting query and fragment.

Returns:

The storage base UUri as a string.

property tags: dict

Get tags associated with this UFile.

Returns:

A dictionary of tags for this UFile, merged from remote and UUri query if present.

property ucfs: str

Unique content file string for this UFile.

Returns:

String in the format object_name@hash.

property uftype: str

The Urgap file type for this file.

Returns:

The uftype string, or ‘UNKNOWN’ if not defined.

uncompress(compression_format: str | None = None, recursive: bool = True) UFileList

Uncompress this UFile (auto-detecting format if needed).

Parameters:
  • compression_format – The format to uncompress. If None, will be auto-detected.

  • recursive – If True, recursively uncompress nested archives.

Returns:

UFileList containing all uncompressed files.

Raises:

NotImplementedError – If the format is unsupported.

upload(overwrite: bool = True, verify: bool = False, purge: bool = True, retries: int = 3) None

Upload this file to remote storage.

Parameters:
  • overwrite – If True, always upload, overwriting existing remote files.

  • verify – If True, check that the remote hash matches local; re-upload if needed.

  • purge – If True, remove the local copy of the file after a successful upload.

  • retries – Number of times to retry verification on failure.
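How verify and retries interact can be sketched as a loop that re-uploads until the remote hash matches. The function below is a hypothetical illustration that takes callables instead of a real backend; raising RuntimeError when the retries are exhausted is an assumption, since the behaviour in that case is not documented here.

```python
from typing import Callable


def upload_with_verification(
    send: Callable[[], None],
    local_hash: str,
    remote_hash: Callable[[], str],
    retries: int = 3,
) -> int:
    """Upload once, then re-upload while the remote hash differs.

    Returns the number of uploads that were performed.
    """
    send()
    uploads = 1
    for _ in range(retries):
        if remote_hash() == local_hash:
            return uploads
        send()  # hash mismatch: try the upload again
        uploads += 1
    if remote_hash() != local_hash:
        # Assumption: give up loudly once all retries are used.
        raise RuntimeError("remote hash still differs after retries")
    return uploads


# Fake remote store whose first upload is corrupted.
store = {"hash": "", "uploads": 0}


def flaky_send() -> None:
    store["uploads"] += 1
    store["hash"] = "abc" if store["uploads"] >= 2 else "corrupt"


uploads_used = upload_with_verification(flaky_send, "abc", lambda: store["hash"])
```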

UUri Class

class urgap.ufile.uuri.UUri(uri: str)

A normalized UUri parser and container for Urgap.

This class encapsulates the components of a UUri or UCFS string as attributes, parses them, and provides helpers to access the elements as a dict. It also extracts tags from queries and can handle different storage backends.

__init__(uri: str) None

Initialize a UUri instance from a full UUri string.

Parameters:

uri – A full UUri string to be parsed.

Raises:

ValueError – If the URI scheme is not supported or the URI is malformed.

property azure_directory_list: list | None

Get the directory_list.

property azure_object_directory_list: list | None

Get the object_directory_list.

property azure_object_file: str | None

Get the object_file.

property azure_share: str | None

Get the share.

check_fragment_integrity() None

Validate that query parameters are not incorrectly placed in the fragment.

Raises:

ValueError – If query content is found in the fragment string.
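The situation this check guards against is a query written after the # separator, where a standard URL parser treats it as part of the fragment. The sketch below assumes that a "?" inside the fragment is the signal; the actual detection rule may differ, and assert_no_query_in_fragment is a hypothetical name.

```python
from urllib.parse import urlsplit


def assert_no_query_in_fragment(uri: str) -> None:
    """Raise ValueError if the fragment appears to contain a query string."""
    fragment = urlsplit(uri).fragment
    if "?" in fragment:
        # Assumption: a '?' after '#' means the query was misplaced.
        raise ValueError(f"Query found in fragment: {fragment!r}")


well_formed = "https://example.org/data?uftype=csv#sample.csv"
misplaced = "https://example.org/data#sample.csv?uftype=csv"

assert_no_query_in_fragment(well_formed)  # passes silently
misplaced_fragment = urlsplit(misplaced).fragment
```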

property container_name: str

Get the container_name.

property file_remote_path: Path | None

Get the file_remote_path.

property file_remote_tag_path: Path | None

Get the file_remote_tag_path.

get_azure_directory_list() list

Get the directory_list.

get_azure_object_directory_list() list

Get the object_directory_list.

get_azure_object_file() str

Get the object_file.

get_azure_share() str

Get the share.

get_container_name() str

Get the container_name.

get_file_remote_path() Path

Get the file_remote_path.

get_file_remote_tag_path() Path

Get the file_remote_tag_path.

get_github_resource_name(resource: str = 'repo') str

Get the github resource name.

get_host() str | None

Get the host.

get_https_remote_path() str

Get the https_remote_path.

get_https_remote_tag_path() str

Get the https_remote_tag_path.

get_mylabdata_api_url() str

Get the api_url.

get_mylabdata_api_url_files() str

Get the api_url_files.

get_mylabdata_url() str

Return a mylabdata URL with the encoded fragment.

get_object_name() str

Get the object_name.

get_port() str | None

Get the port.

get_samba_share() str

Get the samba_share.

property github_resource_name: str | None

Get the github resource name.

property host: str | None

Get the host.

property https_remote_path: str | None

Get the https_remote_path.

property https_remote_tag_path: str | None

Get the https_remote_tag_path.

property mylabdata_api_url: str | None

Get the api_url.

property mylabdata_api_url_files: str | None

Get the api_url_files.

property mylabdata_url: str | None

Get the mylabdata URL.

property object_name: str

Get the object_name.

parse_query_tags() dict

Parse a query string into a dictionary, inferring types for each value.

Returns:

Dictionary of parsed query tags.
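Type inference for query values could work along the following lines. The inference order used here (booleans, then integers, then floats, then plain strings) is an assumption, and parse_tags and infer_value are hypothetical helpers rather than the Urgap implementation.

```python
from urllib.parse import parse_qsl


def infer_value(text: str):
    """Convert a query value to bool, int or float where it parses as one."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue  # not this type, try the next one
    return text


def parse_tags(query: str) -> dict:
    """Parse 'key=value&key=value' into a dictionary of typed values."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return {key: infer_value(value) for key, value in pairs}


tags = parse_tags("uftype=csv&rows=120&ratio=0.5&final=true")
```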

property password: str | None

Return the password for this UUri. If not already set, attempts to load it.

Returns:

Password associated with this UUri, or empty string if not found.

Return type:

str

property port: str | None

Get the port.

property samba_share: str | None

Get the samba_share.

property storage_uri: str

Get the storage_uri.

property user: str | None

Return the username for this UUri. If not already set, attempts to load it.

Returns:

Username associated with this UUri, or empty string if not found.

Return type:

str

UFile IO Classes

class urgap.ufile.io._base.UIOBase(**kwargs: ~P)

Base class for local file IO implementations in Urgap.

All UIO (Urgap Input/Output) classes should inherit from this to provide a consistent interface for working with local copies of files.

__init__(**kwargs: ~P) None

Create a new UIOBase instance and set the uuri attribute from kwargs.

Parameters:

kwargs – Must include “uuri”, which is the parsed UUri object.

add_storage_uri_to_container_items(container_items: list) list

Add the storage URI to the container items.

Most IO backends do not return Urgap-style URIs for the listed container items, so this method can be called to prepend the missing storage_uri# prefix.

download() None

Download the file from remote storage to local scratch disk.

This method must be implemented in a subclass for a specific storage backend. Raises NotImplementedError if not overridden.

local_object_exists() bool

Check whether the local scratch file already exists.

Returns:

True if the file exists on disk, False otherwise.

property scratch_path: Path

Get the full local file path for the scratch file.

The parent directory is created if it doesn’t exist.

Returns:

The Path object pointing to the file on the local scratch disk.

upload() None

Upload the file from local scratch disk to remote storage.

This method must be implemented in a subclass for a specific storage backend. Raises NotImplementedError if not overridden.
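The contract between the base class and a backend can be sketched with stand-in classes. Everything here is hypothetical: SketchIOBase and SketchCopyIO are not Urgap classes, uuri is a plain filesystem path string instead of a parsed UUri object, and the scratch_dir keyword is an invented way to pick the scratch location.

```python
import shutil
import tempfile
from pathlib import Path


class SketchIOBase:
    """Stand-in base class: shared helpers, transfer methods left abstract."""

    def __init__(self, **kwargs) -> None:
        self.uuri = kwargs["uuri"]
        self.scratch_dir = Path(kwargs["scratch_dir"])

    @property
    def scratch_path(self) -> Path:
        path = self.scratch_dir / Path(self.uuri).name
        # The parent directory is created on first access.
        path.parent.mkdir(parents=True, exist_ok=True)
        return path

    def local_object_exists(self) -> bool:
        return self.scratch_path.is_file()

    def download(self) -> None:
        raise NotImplementedError("download() must be overridden")

    def upload(self) -> None:
        raise NotImplementedError("upload() must be overridden")


class SketchCopyIO(SketchIOBase):
    """Backend that treats another local directory as the remote storage."""

    def download(self) -> None:
        shutil.copy2(self.uuri, self.scratch_path)

    def upload(self) -> None:
        shutil.copy2(self.scratch_path, self.uuri)


with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote" / "data.txt"
    remote.parent.mkdir()
    remote.write_text("payload")
    backend = SketchCopyIO(uuri=str(remote), scratch_dir=Path(tmp) / "scratch")
    existed_before = backend.local_object_exists()
    backend.download()
    downloaded = backend.scratch_path.read_text()
```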

class urgap.ufile.io.azure_blob.IOAzureBlobStorage(**kwargs: ~P)

UIO class interface for Azure Blob Storage.

Provides methods for uploading, downloading, and listing blobs, as well as fetching blob metadata.

__init__(**kwargs: ~P) None

Initialize the Azure Blob Storage IO class.

Parameters:

kwargs – Passed to UIOBase. Requires “uri” for connection setup.

download() None

Download the blob to the scratch path from remote storage.

Downloads only if the local file is missing or the hash does not match the remote.
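The skip condition described above reduces to a small predicate. This is a sketch with hypothetical helper names; SHA-256 is used only as an example algorithm, and the real backend may compare a different hash taken from the blob's tags.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, remote_hash: str) -> bool:
    """True if the local copy is missing or its hash differs from the remote."""
    if not local_path.is_file():
        return True
    return file_hash(local_path) != remote_hash


with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "data.bin"
    when_missing = needs_download(local, "irrelevant")
    local.write_bytes(b"payload")
    when_unchanged = needs_download(local, file_hash(local))
    when_changed = needs_download(local, hashlib.sha256(b"other").hexdigest())
```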

get_file_properties() dict | None

Get properties associated with the referenced file.

Returns:

Dictionary with properties of the file, or None if not found.

get_remote_tags() dict | None

Get remote tags (metadata) for the referenced blob.

Returns:

The dictionary of metadata tags, creation_time, last_modified if the object exists, otherwise None.

is_within_date_range(blob: object, start_date: str, end_date: str) bool

Check if the item’s last modified date is within the specified date range.

Parameters:
  • blob – The azure blob to check.

  • start_date – ISO format datetime string for the start date filter.

  • end_date – ISO format datetime string for the end date filter.

Returns:

True if the item’s last modified date is within the range, False otherwise.
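A date-range filter of this kind can be written with datetime.fromisoformat. The sketch below takes the last-modified timestamp directly instead of a blob object, treats both bounds as inclusive, and lets a bound of None mean "no limit"; all three choices are assumptions, and is_within_range is a hypothetical name.

```python
from datetime import datetime
from typing import Optional


def is_within_range(
    last_modified: datetime,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> bool:
    """Check a timestamp against optional ISO-format bounds (inclusive).

    The timestamp and the bounds must be all naive or all timezone-aware,
    otherwise Python refuses to compare them.
    """
    if start_date is not None and last_modified < datetime.fromisoformat(start_date):
        return False
    if end_date is not None and last_modified > datetime.fromisoformat(end_date):
        return False
    return True


modified = datetime(2024, 5, 1, 12, 0)
inside = is_within_range(modified, "2024-01-01", "2024-12-31")
too_old = is_within_range(modified, start_date="2024-06-01")
```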

list_container_items(pattern: str | None = None, limit: int | None = 1000, full_string: bool = False, start_date: str | None = None, end_date: str | None = None) list

List all objects in the Azure container, optionally filtering by regex pattern.

Parameters:
  • pattern – Regular expression pattern to filter blob names.

  • limit – Maximum number of files to request in one query.

  • full_string – Whether to return the list with full strings or just fragments.

  • start_date – ISO format datetime string to filter blobs modified after this date.

  • end_date – ISO format datetime string to filter blobs modified before this date.

Returns:

A list of blob names that match the pattern, or all blob names if pattern is None.

remote_object_exists() bool

Check if the blob exists in the container.

Returns:

True if the blob exists, otherwise False.

property remote_path: None

Azure blobs do not have a traditional remote path.

Returns:

None.

property remote_tag_path: None

Azure blobs do not have a separate remote tag path.

Returns:

None.

upload(tags: dict | None = None) None

Upload the scratch file to the remote blob, attaching provided tags as metadata.

Parameters:

tags – Dictionary of metadata tags to write to remote location. If too many, parent keys are removed.

class urgap.ufile.io.file.IOPython(**kwargs: ~P)

UIO class interface for regular Python file objects.

Provides basic file IO for local files on disk.

__init__(**kwargs: ~P) None

Create new UIO class for processing file scheme.

Parameters:

**kwargs – Requires ‘uuri’ to set up the object and path information.

create_container(exist_ok: bool = True) None

Create a new container (folder) at referenced remote location.

Parameters:

exist_ok – Whether it is okay if the directory already exists.

download() None

Download referenced remote object (copy from remote_path to scratch_path).

If the file does not exist, a debug message is logged.

get_object() Path

Get the referenced UUri as a local Path.

Returns:

Path object of the file.

get_remote_tags() dict | None

Get remote tags associated with referenced file.

Returns:

Dictionary with tags/metadata if present, else None.

list_container_items(pattern: str | None = None, full_string: bool = False, **_kwargs: ~P) list

Get all objects in a container (directory), optionally filtered by pattern.

Parameters:
  • pattern – Optional regex pattern for filtering file names.

  • full_string – Whether to return the list with full strings or just fragments.

Returns:

List of object names (relative paths) in the container matching the pattern.

remote_object_exists() bool

Verify referenced remote object exists.

Returns:

True if the file exists on disk, otherwise False.

remove_remote_object() None

Delete referenced remote location file and associated .tag file, if present.

upload(tags: dict | None = None) None

Upload local scratch file and associated tag to remote location.

Parameters:

tags – Optional dictionary with metadata to store as .tag file.

Raises:

OSError – If the file cannot be copied.

class urgap.ufile.io.google_storage.IOGoogleCloudStorage(**kwargs: ~P)

UIO class interface for Google Cloud Storage.

Provides interaction and file operations for Google Cloud Storage buckets and objects.

__init__(**kwargs: ~P) None

Create new UIO class for processing Google Cloud Storage.

Parameters:

**kwargs – Passed to UIOBase. Must contain UUri and relevant parsed attributes.

download() None

Download file to scratch path from remote location.

Checks local hash and remote hash (if tags present); downloads only if different or not present. Logs a message if remote does not exist.

get_remote_tags() dict | None

Get remote tags associated with referenced file.

Returns:

Dictionary of metadata tags if the blob exists, otherwise None.

list_container_items(pattern: str | None = None, full_string: bool = False) list

Get objects in the folder (‘container’), optionally filtered by a regex pattern.

Parameters:
  • pattern – Optional regex pattern for filtering blob names.

  • full_string – Whether to return the list with full strings or just fragments.

Returns:

List of blob names (strings) matching the pattern, or all if pattern is None.

remote_object_exists() bool

Check if object exists in the container.

Returns:

True if the blob exists, otherwise False.

property remote_path: str | None

Get remote file path.

Returns:

Always None for Google Cloud Storage, as the full UUri is managed by GCS.

property remote_tag_path: str | None

Get remote file tag path.

Returns:

Always None for Google Cloud Storage, as tags are stored as blob metadata.

upload(tags: dict | None = None) None

Upload scratch file to remote location with associated tags.

Parameters:

tags – Optional dictionary of metadata to set for the blob.

class urgap.ufile.io.https.IOHTTPS(**kwargs: ~P)

UIO class interface for HTTP/HTTPS file objects.

Handles interaction with files accessible via HTTP/S URLs, including download and tag retrieval.

__init__(**kwargs: ~P) None

Create new UIO class for processing https scheme.

Parameters:

**kwargs – Requires ‘uuri’ key to set respective attribute.

download() None

Download referenced remote object.

Writes the remote object to the local scratch path. If download fails, removes the partially downloaded file.

get_object() str

Get referenced URL.

Returns:

The remote URL as a string.

get_remote_tags() dict | None

Get remote tags associated with the referenced file.

Returns:

A dictionary containing remotely stored tags, or None if unavailable or decoding fails.

remote_object_exists() bool

Verify referenced remote object exists.

Returns:

True if the remote object exists, otherwise False.

upload(tags: dict | None = None) None

Upload method unsupported for https.

Parameters:

tags – Tags to write to remote location (ignored).

Raises:

NotImplementedError – Always raised, as HTTP/S does not support upload.