UFile Class
- class urgap.ufile.ufile.UFile(uri: str)
Urgap pipeline file interface.
- __init__(uri: str) None
Create a new UFile instance.
- Parameters:
uri – Full UUri to the file.
- __repr__() str
Get string representation of the UFile.
- Returns:
UUri as a string.
- __eq__(other: object) bool
Test equality with another UFile by ucfs.
- Parameters:
other – Object to compare.
- Returns:
True if both are UFiles with the same ucfs, otherwise False.
- __lt__(other: UFile) bool
Lexical comparison by ucfs for sorting.
- Parameters:
other – UFile to compare.
- Returns:
True if this ucfs is less than other’s.
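The comparison semantics described for `__eq__` and `__lt__` can be illustrated with a small stand-in class. This is a minimal sketch, not the UFile implementation: `FileRef` is a hypothetical name, and it stores the ucfs string directly instead of deriving it from an object name and hash.

```python
class FileRef:
    """Hypothetical stand-in for UFile; only the ucfs string matters here."""

    def __init__(self, ucfs: str) -> None:
        self.ucfs = ucfs

    def __eq__(self, other: object) -> bool:
        # Objects of any other type never compare equal.
        return isinstance(other, FileRef) and self.ucfs == other.ucfs

    def __lt__(self, other: "FileRef") -> bool:
        # Plain lexical string comparison; sorted() only needs __lt__.
        return self.ucfs < other.ucfs


refs = [FileRef("b.txt@22"), FileRef("a.txt@11"), FileRef("c.txt@33")]
ordered = [ref.ucfs for ref in sorted(refs)]
print(ordered)
```

Because ordering is lexical on the whole ucfs string, two files with the same object name sort by their hash suffix.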
- __deepcopy__(memo: dict) UFile
Create a deep copy of this UFile.
- Parameters:
memo – The copy memo dict.
- Returns:
A deep copy of the UFile, with private attributes set to None.
- as_storage_base_uri() str
Get the storage base UUri, omitting query and fragment.
- Returns:
The storage base UUri as a string.
- as_uri(scheme: str | None = None, netloc: str | None = None, path: str | None = None, fragment: str | None = None, query: str | None = None) str
Get a string UUri representation of this file, optionally overriding UUri components.
- Parameters:
scheme – Override the scheme.
netloc – Override the network location.
path – Override the path.
fragment – Override the fragment (object name).
query – Override the query string.
- Returns:
The UUri as a string.
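Component overriding of the kind `as_uri` describes can be sketched with the standard library. The helper name `override_uri` is hypothetical, and the sketch assumes a UUri follows the ordinary `scheme://netloc/path?query#fragment` layout; it does not call Urgap.

```python
from urllib.parse import urlsplit, urlunsplit


def override_uri(uri, scheme=None, netloc=None, path=None, fragment=None, query=None):
    """Rebuild a URI string, replacing only the components that are given."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme if scheme is None else scheme,
        parts.netloc if netloc is None else netloc,
        parts.path if path is None else path,
        parts.query if query is None else query,
        parts.fragment if fragment is None else fragment,
    ))


original = "https://example.org/data/run1?uftype=fastq#sample.fq"
renamed = override_uri(original, fragment="other.fq")
print(renamed)
```

Passing `None` keeps a component, while passing an empty string clears it, so `query=""` drops the tags from the result.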
- check_uri_scheme_exists() None
Check if the URI scheme is supported.
- Returns:
True if the URI scheme is supported.
- Raises:
ValueError – If the scheme is not in the supported list.
- compress(compression_format: str) UFile
Compress this UFile into a new compressed file.
- Parameters:
compression_format – The format to use: ‘zip’, ‘gz’, or ‘tar’.
- Returns:
A new compressed UFile.
- Raises:
NotImplementedError – If the format is unsupported.
- create_container() None
Create the remote container (e.g., bucket or folder) for this file.
- create_lineage_graph(use_umeta: bool = True) DiGraph
Create a directed graph representing the lineage for this UFile.
- Parameters:
use_umeta – Use UMeta to reconstruct lineage if True.
- Returns:
The directed graph of UFile lineage.
- download() None
Download this file from remote storage to local scratch.
- format_uri() None
Format the URI if a storage_base_uri and ucfs combination was used to construct the URI.
- classmethod from_path_object(path_object: Path, number_of_parents: int = 1, query: str | None = None) UFile
Construct a UFile from a filesystem Path.
- Parameters:
path_object – Path object for the file.
number_of_parents – How many parent directories to include in the UUri.
query – Optional query string for tags.
- Returns:
A new UFile instance.
- get_object() Path | None
Get a local object from remote storage if it exists.
- Returns:
The path to the local object, or None if it does not exist remotely.
- guess_uftype_from_suffix() str | None
Guess the Urgap file type for this file based on its extension.
- property hash: str
The hash checksum for this file.
- Returns:
The hash string using the algorithm specified in the configuration. Will be calculated if missing.
- identify_lineage_root_files(use_umeta: bool = True) list
List root files in the lineage graph, optionally using UMeta.
- Parameters:
use_umeta – If True, use UMeta interface, else reconstruct from tags.
- Returns:
List of root object names.
- init_io_class()
Initialize the IO backend for this file, based on the UUri scheme.
- Returns:
The IO class instance for this file.
- Raises:
ImportError – If the IO backend is not installed.
- property io
IO property to access the Urgap IO backend for this file.
- Returns:
The initialized IO instance.
- property is_borg: bool
Whether this file is part of a multifile (borg) collection.
- Returns:
True if part of a collection, False otherwise.
- property is_part_of_collection: bool
Whether this file is part of a multifile collection.
- Returns:
True if part of a collection, False otherwise.
- list_container_items(pattern: str | None = None, limit: int = 1000, full_string: bool = False, start_date: str | None = None, end_date: str | None = None) list
List all objects in the remote container.
- Parameters:
pattern – Regex pattern for filtering object names.
limit – Maximum number of files to request in one query.
full_string – Whether to return the list with full strings or just fragments.
start_date – ISO format datetime string to filter files modified after this date.
end_date – ISO format datetime string to filter files modified before this date.
- Returns:
List of object names.
- property object_name: str
The object name portion from the UUri.
- Returns:
The object name.
- property parents: list
List the object names of direct parent files for this UFile.
- Returns:
List of parent object names.
- property path: Path
Path to the local scratch copy of this UFile.
If not present locally or if hashes differ, downloads the remote file.
- Returns:
Local file path.
- property provenance: None | DiGraph
Provenance of UFile as a directed graph.
- Returns:
None or provenance representation as NX DiGraph.
- purge_local() None
Remove the local file and its tags from the scratch disk.
- purge_local_file() None
Remove only the local file from the scratch disk.
- purge_local_tags() None
Remove only the cached local tags for this UFile.
- rebase(uri: str | None = None, upload: bool = False, **kwargs: ~P) None
Change this UFile’s UUri and (optionally) upload it to new storage.
If the provided URI has no query string, clear dynamic tags (e.g. md5, parent_*) but preserve structural tags like uftype so downstream logic (e.g. rerun/skip) keeps working.
- Parameters:
uri – New UUri string.
upload – If True, upload the file after rebasing.
**kwargs – Passed to upload().
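The tag handling described for `rebase` amounts to filtering a tag dictionary. The sketch below assumes the dynamic tags are exactly `md5` plus anything starting with `parent_`, which is only what the examples in the description suggest; the real list may be longer. The function name `clear_dynamic_tags` is hypothetical.

```python
# Assumed classification, taken from the examples in the description only.
DYNAMIC_KEYS = {"md5"}
DYNAMIC_PREFIXES = ("parent_",)


def clear_dynamic_tags(tags: dict) -> dict:
    """Return a copy of tags without dynamic entries; structural ones stay."""
    return {
        key: value
        for key, value in tags.items()
        if key not in DYNAMIC_KEYS and not key.startswith(DYNAMIC_PREFIXES)
    }


before = {"md5": "abc123", "parent_0": "raw.fq@ff", "uftype": "fastq"}
after = clear_dynamic_tags(before)
print(after)
```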
- recalculate_hashes(force_local: bool = False) None
Recalculate configured file hash for this file.
- Parameters:
force_local – If True, always use the local file.
- relocate_fragment_to_path(steps: int = 1, upload: bool = False) None
Move the fragment component into the path or vice versa by relocating the # separator.
- Parameters:
steps – Number of path segments to move. Positive values move fragment segments into the path (fragment → path); for example, steps=2 converts ‘path#a/b/c’ to ‘path/a/b#c’. Negative values move path segments into the fragment (path → fragment); for example, steps=-2 converts ‘path/a/b#c’ to ‘path#a/b/c’.
upload – If True, upload the file after rebasing.
- Raises:
ValueError – If steps is zero.
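The effect of `steps` on the `#` separator can be reproduced with plain string handling. This is a sketch under assumptions: `relocate_fragment` is a hypothetical helper that works on simplified strings without a scheme or query, and it rejects moves that would empty either side, a rule that the documentation above does not state.

```python
def relocate_fragment(uri: str, steps: int = 1) -> str:
    """Move `steps` segments across the '#' separator of a simplified URI."""
    if steps == 0:
        raise ValueError("steps must be non-zero")
    base, _, fragment = uri.partition("#")
    path_parts = base.split("/")
    frag_parts = fragment.split("/") if fragment else []
    if steps > 0:
        # fragment -> path: take leading fragment segments.
        if steps >= len(frag_parts):
            raise ValueError("not enough fragment segments to move")
        path_parts += frag_parts[:steps]
        frag_parts = frag_parts[steps:]
    else:
        # path -> fragment: take trailing path segments.
        count = -steps
        if count >= len(path_parts):
            raise ValueError("not enough path segments to move")
        frag_parts = path_parts[-count:] + frag_parts
        path_parts = path_parts[:-count]
    return "/".join(path_parts) + "#" + "/".join(frag_parts)


forward = relocate_fragment("path#a/b/c", steps=2)
backward = relocate_fragment("path/a/b#c", steps=-2)
print(forward, backward)
```

The two calls are inverses of each other, matching the documented examples for `steps=2` and `steps=-2`.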
- remote_object_exists() bool
Check if the remote object exists.
- Returns:
True if the remote object exists, False otherwise.
- remove_remote_object() None
Remove this object from remote storage.
- property simple_name: str
A simplified file name (stem, no extension).
- Returns:
Simple name string.
- simplify_name(source_object_names: set, prefix: str | None = None, suffix: str | None = None, storage_base_uri: str | None = None) UFile | None
Rename and optionally rebase this file for user-friendly output.
- Parameters:
source_object_names – Set of valid source object names to match in parents.
prefix – Optional prefix for the new name.
suffix – Optional suffix for the new name.
storage_base_uri – If given, rebase the output file here.
- Returns:
The renamed UFile, or None if no matching parent found.
- property storage_base_uri: str
The storage base UUri, omitting query and fragment.
- Returns:
The storage base UUri as a string.
- property tags: dict
Get tags associated with this UFile.
- Returns:
A dictionary of tags for this UFile, merged from remote and UUri query if present.
- property ucfs: str
Unique content file string for this UFile.
- Returns:
String in the format object_name@hash.
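The `object_name@hash` format can be illustrated with `hashlib`. This is a sketch, not the UFile code: `make_ucfs` is a hypothetical helper, and MD5 is assumed as the default only because `md5` appears as a tag name elsewhere in this reference; the documented behaviour is that the algorithm comes from the configuration.

```python
import hashlib


def make_ucfs(object_name: str, content: bytes, algorithm: str = "md5") -> str:
    """Join an object name and a content digest as object_name@hash."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return f"{object_name}@{digest}"


ucfs = make_ucfs("sample.fq", b"ACGT\n")
name, _, digest = ucfs.rpartition("@")
print(name, len(digest))
```

Splitting with `rpartition("@")` recovers the two halves even when the object name itself contains an `@`.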
- property uftype: str
The Urgap file type for this file.
- Returns:
The uftype string, or ‘UNKNOWN’ if not defined.
- uncompress(compression_format: str | None = None, recursive: bool = True) UFileList
Uncompress this UFile (auto-detecting format if needed).
- Parameters:
compression_format – The format to uncompress. If None, will be auto-detected.
recursive – If True, recursively uncompress nested archives.
- Returns:
UFileList containing all uncompressed files.
- Raises:
NotImplementedError – If the format is unsupported.
- upload(overwrite: bool = True, verify: bool = False, purge: bool = True, retries: int = 3) None
Upload this file to remote storage.
- Parameters:
overwrite – If True, always upload, overwriting existing remote files.
verify – If True, check that the remote hash matches local; re-upload if needed.
purge – If True, remove the local copy of the file after a successful upload.
retries – Number of times to retry verification on failure.
UUri Class
- class urgap.ufile.uuri.UUri(uri: str)
A normalized UUri parser and container for Urgap.
This class encapsulates the components of a UUri or UCFS string as attributes, parses them, and provides helpers to access the elements as a dict. It also extracts tags from queries and can handle different storage backends.
- __init__(uri: str) None
Initialize a UUri instance from a full UUri string.
- Parameters:
uri – A full UUri string to be parsed.
- Raises:
ValueError – If the URI scheme is not supported or the URI is malformed.
- property azure_directory_list: list | None
Get the directory_list.
- property azure_object_directory_list: list | None
Get the object_directory_list.
- property azure_object_file: str | None
Get the object_file.
Get the share.
- check_fragment_integrity() None
Validate that query parameters are not incorrectly placed in the fragment.
- Raises:
ValueError – If query content is found in the fragment string.
- property container_name: str
Get the container_name.
- property file_remote_path: Path | None
Get the file_remote_path.
- property file_remote_tag_path: Path | None
Get the file_remote_tag_path.
- get_azure_directory_list() list
Get the directory_list.
- get_azure_object_directory_list() list
Get the object_directory_list.
- get_azure_object_file() str
Get the object_file.
Get the share.
- get_container_name() str
Get the container_name.
- get_file_remote_path() Path
Get the file_remote_path.
- get_file_remote_tag_path() Path
Get the file_remote_tag_path.
- get_github_resource_name(resource: str = 'repo') str
Get the github resource name.
- get_host() str | None
Get the host.
- get_https_remote_path() str
Get the https_remote_path.
- get_https_remote_tag_path() str
Get the https_remote_tag_path.
- get_mylabdata_api_url() str
Get the api_url.
- get_mylabdata_api_url_files() str
Get the api_url_files.
- get_mylabdata_url() str
Return a mylabdata URL with the encoded fragment.
- get_object_name() str
Get the object_name.
- get_port() str | None
Get the port.
Get the samba_share.
- property github_resource_name: str | None
Get the github resource name.
- property host: str | None
Get the host.
- property https_remote_path: str | None
Get the https_remote_path.
- property https_remote_tag_path: str | None
Get the https_remote_tag_path.
- property mylabdata_api_url: str | None
Get the api_url.
- property mylabdata_api_url_files: str | None
Get the api_url_files.
- property mylabdata_url: str | None
Get the mylabdata URL.
- property object_name: str
Get the object_name.
- parse_query_tags() dict
Parse a query string into a dictionary, inferring types for each value.
- Returns:
Dictionary of parsed query tags.
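Parsing a query string with type inference might look like the sketch below. The inference rules (booleans, then integers, then floats, otherwise the raw string) are an assumption, since the reference does not list them, and `parse_query` and `infer_type` are hypothetical names.

```python
from urllib.parse import parse_qsl


def infer_type(value: str):
    """Guess a Python type for one query value (assumed rules)."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_query(query: str) -> dict:
    """Turn 'a=1&b=x' into a dict with inferred value types."""
    return {key: infer_type(val) for key, val in parse_qsl(query, keep_blank_values=True)}


tags = parse_query("uftype=fastq&lane=3&ratio=0.5&paired=true")
print(tags)
```

Aggressive inference has a cost: a value that merely looks numeric, such as an identifier made only of digits, would lose its string form under these rules.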
- property password: str | None
Return the password for this UUri. If not already set, attempts to load it.
- Returns:
Password associated with this UUri, or empty string if not found.
- Return type:
str
- property port: str | None
Get the port.
Get the samba_share.
- property storage_uri: str
Get the storage_uri.
- property user: str | None
Return the username for this UUri. If not already set, attempts to load it.
- Returns:
Username associated with this UUri, or empty string if not found.
- Return type:
str
UFile IO Classes
- class urgap.ufile.io._base.UIOBase(**kwargs: ~P)
Base class for local file IO implementations in Urgap.
All UIO (Urgap Input/Output) classes should inherit from this to provide a consistent interface for working with local copies of files.
- __init__(**kwargs: ~P) None
Create a new UIOBase instance and set the uuri attribute from kwargs.
- Parameters:
kwargs – Must include “uuri”, which is the parsed UUri object.
- add_storage_uri_to_container_items(container_items: list) list
Add the storage URI to the container items.
Most IO backends do not add the Urgap-style URI to the listed container items, so this function can be called to add the missing storage_uri# prefix.
- download() None
Download the file from remote storage to local scratch disk.
This method must be implemented in a subclass for a specific storage backend. Raises NotImplementedError if not overridden.
- local_object_exists() bool
Check whether the local scratch file already exists.
- Returns:
True if the file exists on disk, False otherwise.
- property scratch_path: Path
Get the full local file path for the scratch file.
The parent directory is created if it doesn’t exist.
- Returns:
The Path object pointing to the file on the local scratch disk.
- upload() None
Upload the file from local scratch disk to remote storage.
This method must be implemented in a subclass for a specific storage backend. Raises NotImplementedError if not overridden.
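The base-class contract described above (subclasses override `download()` and `upload()`, otherwise `NotImplementedError` is raised) follows a common pattern. The classes below are hypothetical stand-ins that do not import Urgap, and the `remote` and `scratch` keyword arguments are inventions of this sketch; only the required `uuri` keyword is taken from the documentation.

```python
import shutil
import tempfile
from pathlib import Path


class SketchIOBase:
    """Hypothetical stand-in for UIOBase; not the real class."""

    def __init__(self, **kwargs) -> None:
        # Mirrors the documented requirement that "uuri" is supplied.
        self.uuri = kwargs["uuri"]

    def download(self) -> None:
        raise NotImplementedError("download() must be implemented by a backend")

    def upload(self) -> None:
        raise NotImplementedError("upload() must be implemented by a backend")


class SketchCopyIO(SketchIOBase):
    """Toy backend that treats a local file as the 'remote' object."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.remote = Path(kwargs["remote"])    # assumed extra argument
        self.scratch = Path(kwargs["scratch"])  # assumed extra argument

    def download(self) -> None:
        # Create the scratch directory on demand, then copy the file.
        self.scratch.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(self.remote, self.scratch)


with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote" / "a.txt"
    remote.parent.mkdir()
    remote.write_text("payload")
    backend = SketchCopyIO(
        uuri="file:///remote#a.txt",
        remote=remote,
        scratch=Path(tmp) / "scratch" / "a.txt",
    )
    backend.download()
    copied = backend.scratch.read_text()
print(copied)
```

Because `SketchCopyIO` does not override `upload()`, calling it still raises `NotImplementedError`, which is the behaviour the base class documents.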
- class urgap.ufile.io.azure_blob.IOAzureBlobStorage(**kwargs: ~P)
UIO class interface for Azure Blob Storage.
Provides methods for uploading, downloading, and listing blobs, as well as fetching blob metadata.
- __init__(**kwargs: ~P) None
Initialize the Azure Blob Storage IO class.
- Parameters:
kwargs – Passed to UIOBase. Requires “uri” for connection setup.
- download() None
Download the blob to the scratch path from remote storage.
Downloads only if the local file is missing or the hash does not match the remote.
- get_file_properties() dict | None
Get properties associated with the referenced file.
- Returns:
Dictionary with properties of the file, or None if not found.
- get_remote_tags() dict | None
Get remote tags (metadata) for the referenced blob.
- Returns:
The dictionary of metadata tags, creation_time, last_modified if the object exists, otherwise None.
- is_within_date_range(blob: object, start_date: str, end_date: str) bool
Check if the item’s last modified date is within the specified date range.
- Parameters:
blob – The azure blob to check.
start_date – ISO format datetime string for the start date filter.
end_date – ISO format datetime string for the end date filter.
- Returns:
True if the item’s last modified date is within the range, False otherwise.
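A date-range check over ISO format strings can be sketched with `datetime.fromisoformat`. This is not the Azure backend code: `within_range` is a hypothetical helper that takes the last-modified timestamp directly instead of a blob object, treats both bounds as inclusive, and lets either bound be `None`; all three are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def within_range(
    last_modified: datetime,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> bool:
    """Return True if last_modified lies between the two ISO bounds."""
    # Both sides must be timezone-aware (or both naive); mixing the two
    # raises TypeError on comparison.
    if start_date is not None and last_modified < datetime.fromisoformat(start_date):
        return False
    if end_date is not None and last_modified > datetime.fromisoformat(end_date):
        return False
    return True


modified = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
inside = within_range(modified, "2024-03-01T00:00:00+00:00", "2024-03-31T23:59:59+00:00")
outside = within_range(modified, "2024-04-01T00:00:00+00:00", None)
print(inside, outside)
```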
- list_container_items(pattern: str | None = None, limit: int | None = 1000, full_string: bool = False, start_date: str | None = None, end_date: str | None = None) list
List all objects in the Azure container, optionally filtering by regex pattern.
- Parameters:
pattern – Regular expression pattern to filter blob names.
limit – Maximum number of files to request in one query.
full_string – Whether to return the list with full strings or just fragments.
start_date – ISO format datetime string to filter blobs modified after this date.
end_date – ISO format datetime string to filter blobs modified before this date.
- Returns:
A list of blob names that match the pattern, or all blob names if pattern is None.
- remote_object_exists() bool
Check if the blob exists in the container.
- Returns:
True if the blob exists, otherwise False.
- property remote_path: None
Azure blobs do not have a traditional remote path.
- Returns:
None.
- property remote_tag_path: None
Azure blobs do not have a separate remote tag path.
- Returns:
None.
- upload(tags: dict | None = None) None
Upload the scratch file to the remote blob, attaching provided tags as metadata.
- Parameters:
tags – Dictionary of metadata tags to write to remote location. If too many, parent keys are removed.
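The note that parent keys are removed when there are too many tags can be pictured as a simple pruning step. The sketch below is an assumption throughout: `prune_tags` and its `max_tags` threshold are hypothetical, and parent keys are taken to be those starting with `parent_`, following the `parent_*` naming used in the `rebase` description.

```python
def prune_tags(tags: dict, max_tags: int) -> dict:
    """Drop parent_* entries only when the tag count exceeds max_tags."""
    if len(tags) <= max_tags:
        return dict(tags)
    return {key: value for key, value in tags.items() if not key.startswith("parent_")}


tags = {"uftype": "bam", "md5": "abc123", "parent_0": "a@1", "parent_1": "b@2"}
kept_all = prune_tags(tags, max_tags=10)
pruned = prune_tags(tags, max_tags=3)
print(pruned)
```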
- class urgap.ufile.io.file.IOPython(**kwargs: ~P)
UIO Class interface for regular Python file objects.
Provides basic file IO for local files on disk.
- __init__(**kwargs: ~P) None
Create new UIO class for processing file scheme.
- Parameters:
**kwargs – Requires ‘uuri’ to set up the object and path information.
- create_container(exist_ok: bool = True) None
Create a new container (folder) at referenced remote location.
- Parameters:
exist_ok – Whether it is okay if the directory already exists.
- download() None
Download referenced remote object (copy from remote_path to scratch_path).
If the file does not exist, a debug message is logged.
- get_object() Path
Get the referenced UUri as a local Path.
- Returns:
Path object of the file.
- get_remote_tags() dict | None
Get remote tags associated with referenced file.
- Returns:
Dictionary with tags/metadata if present, else None.
- list_container_items(pattern: str | None = None, full_string: bool = False, **_kwargs: ~P) list
Get all objects in a container (directory), optionally filtered by pattern.
- Parameters:
pattern – Optional regex pattern for filtering file names.
full_string – Whether to return the list with full strings or just fragments.
- Returns:
List of object names (relative paths) in the container matching the pattern.
- remote_object_exists() bool
Verify referenced remote object exists.
- Returns:
True if the file exists on disk, otherwise False.
- remove_remote_object() None
Delete referenced remote location file and associated .tag file, if present.
- upload(tags: dict | None = None) None
Upload local scratch file and associated tag to remote location.
- Parameters:
tags – Optional dictionary with metadata to store as .tag file.
- Raises:
OSError – If the file cannot be copied.
- class urgap.ufile.io.google_storage.IOGoogleCloudStorage(**kwargs: ~P)
UIO class interface for Google Cloud Storage.
Provides interaction and file operations for Google Cloud Storage buckets and objects.
- __init__(**kwargs: ~P) None
Create new UIO class for processing Google Cloud Storage.
- Parameters:
**kwargs – Passed to UIOBase. Must contain UUri and relevant parsed attributes.
- download() None
Download file to scratch path from remote location.
Checks local hash and remote hash (if tags present); downloads only if different or not present. Logs a message if remote does not exist.
- get_remote_tags() dict | None
Get remote tags associated with referenced file.
- Returns:
Dictionary of metadata tags if the blob exists, otherwise None.
- list_container_items(pattern: str | None = None, full_string: bool = False) list
Get objects in folder/’container’, optionally filtered by a regex pattern.
- Parameters:
pattern – Optional regex pattern for filtering blob names.
full_string – Whether to return the list with full strings or just fragments.
- Returns:
List of blob names (strings) matching the pattern, or all if pattern is None.
- remote_object_exists() bool
Check if object exists in the container.
- Returns:
True if the blob exists, otherwise False.
- property remote_path: str | None
Get remote file path.
- Returns:
Always None for Google Cloud Storage, as the full UUri is managed by GCS.
- property remote_tag_path: str | None
Get remote file tag path.
- Returns:
Always None for Google Cloud Storage, as tags are stored as blob metadata.
- upload(tags: dict | None = None) None
Upload scratch file to remote location with associated tags.
- Parameters:
tags – Optional dictionary of metadata to set for the blob.
- class urgap.ufile.io.https.IOHTTPS(**kwargs: ~P)
UIO Class interface for http/https file objects.
Handles interaction with files accessible via HTTP/S URLs, including download and tag retrieval.
- __init__(**kwargs: ~P) None
Create new UIO class for processing https scheme.
- Parameters:
**kwargs – Requires ‘uuri’ key to set respective attribute.
- download() None
Download referenced remote object.
Writes the remote object to the local scratch path. If download fails, removes the partially downloaded file.
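The cleanup behaviour described for a failed download can be sketched without any network access. `download_to` is a hypothetical helper and `fetch_chunks` stands in for whatever HTTP client streams the response body; re-raising the error after cleanup is an assumption, since the reference only says the partial file is removed.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def download_to(path: Path, fetch_chunks: Callable[[], Iterable[bytes]]) -> None:
    """Stream chunks into path; delete the partial file if anything fails."""
    try:
        with path.open("wb") as handle:
            for chunk in fetch_chunks():
                handle.write(chunk)
    except Exception:
        # The file is already closed here, so it can be removed safely.
        path.unlink(missing_ok=True)
        raise


def broken_stream():
    """Simulated transfer that fails after the first chunk."""
    yield b"partial"
    raise ConnectionError("connection dropped")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "file.bin"
    try:
        download_to(target, broken_stream)
    except ConnectionError:
        pass
    leftover = target.exists()
print(leftover)
```

Removing the partial file keeps a later existence check on the scratch path from mistaking a truncated download for a complete one.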
- get_object() str
Get referenced URL.
- Returns:
The remote URL as a string.
- get_remote_tags() dict | None
Get remote tags associated with the referenced file.
- Returns:
A dictionary containing remotely stored tags, or None if unavailable or decoding fails.
- remote_object_exists() bool
Verify referenced remote object exists.
- Returns:
True if the remote object exists, otherwise False.
- upload(tags: dict | None = None) None
Upload method unsupported for https.
- Parameters:
tags – Tags to write to remote location (ignored).
- Raises:
NotImplementedError – Always raised, as HTTP/S does not support upload.