API

SIP (Submission Information Package)

Module for Submission Information Package (SIP) handling.

class siptools_ng.sip.SIP(mets, files=None)

Class for Submission Information Package (SIP) handling.

Note

To create a SIP from an existing directory or from files, use SIP.from_directory() or SIP.from_files().

Constructor for SIP class.

Parameters:
  • mets (METS) – METS object representing the METS file of this SIP.

  • files (Optional[Iterable[File]]) – Files to be added into the SIP.

add_metadata(metadata)

Add an iterable of metadata to SIP.

The metadata is applied to all files of the SIP. Technically, the metadata is added to the root div of the default structural map.

If the metadata is imported metadata, an event that describes the import process is also created.

Parameters:

metadata (Iterable[Metadata]) – The iterable of metadata objects that is added.

property default_structural_map

Default structural map.

finalize(output_filepath, sign_key_filepath)

Build the SIP.

The SIP will be built to the given output filepath, packed as a tar file. The SIP will contain the METS object of this SIP object serialized as an XML file, a signature file that signs the serialized METS document, and the digital objects declared in the METS object of this SIP object.

Until the build is finished, the SIP appears at the output path with a ‘.tmp’ suffix.

Parameters:
  • output_filepath (Union[str, Path]) – Path where the SIP is built to.

  • sign_key_filepath (Union[str, Path]) – Path to the signature key file that is used to sign the SIP.

Return type:

None
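
The ‘.tmp’ behaviour described above resembles the common write-then-rename pattern. The following standard-library sketch illustrates that pattern only. The helper name build_tar_atomically is hypothetical, nothing is signed, and it should not be read as how siptools-ng builds a SIP internally.

```python
import os
import tarfile
from pathlib import Path


def build_tar_atomically(source_dir, output_filepath):
    """Pack source_dir into a tar file, visible as '<name>.tmp' until done.

    Hypothetical helper for illustration; not part of siptools-ng.
    """
    source_dir = Path(source_dir)
    output_filepath = Path(output_filepath)
    tmp_filepath = output_filepath.with_name(output_filepath.name + ".tmp")

    # Write the archive under the temporary name first.
    with tarfile.open(tmp_filepath, "w") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store each file relative to the package root.
                tar.add(path, arcname=path.relative_to(source_dir).as_posix())

    # Rename only after the archive has been written completely.
    os.replace(tmp_filepath, output_filepath)
    return output_filepath
```

With this pattern, a consumer watching the output directory can ignore names ending in ‘.tmp’ and only pick up archives that are complete.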

classmethod from_directory(directory_path, mets)

Generate a SIP object according to the contents of a directory.

All files found in the directory tree are detected, and technical metadata is generated for them. The structural map is generated according to the directory structure found in the given directory_path, and simple file references are generated.

Parameters:
  • directory_path (Union[Path, str]) – Path to a local directory.

  • mets (METS) – Initialized METS object. This METS object will be edited in place by this method to represent the files and the directory structure in the given directory_path.

Raises:

ValueError if the given directory_path does not exist or is not a directory.

Return type:

SIP

Returns:

SIP object initialized according to the directory structure in the given path.
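
To make the directory handling above more concrete, here is a minimal standard-library sketch of the two documented behaviours: rejecting a path that is not a directory with ValueError, and discovering every file in the tree. The function name list_digital_object_paths is hypothetical, and the sketch assumes that in-SIP paths are simply the paths relative to directory_path; the library itself may do this differently.

```python
from pathlib import Path


def list_digital_object_paths(directory_path):
    """Return the relative POSIX paths of all files under directory_path.

    Hypothetical helper for illustration; not part of siptools-ng.
    """
    directory_path = Path(directory_path)
    if not directory_path.is_dir():
        # Mirrors the documented ValueError for a missing or non-directory path.
        raise ValueError(
            f"'{directory_path}' does not exist or is not a directory."
        )

    # Walk the whole tree and keep regular files only.
    return sorted(
        path.relative_to(directory_path).as_posix()
        for path in directory_path.rglob("*")
        if path.is_file()
    )
```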

classmethod from_files(files, mets)

Generate a complete SIP object from a list of File instances.

Technical metadata is generated for the given files if it is missing, and a structural map is generated according to the directory structure defined in the files.

Parameters:
  • mets (METS) – Initialized METS object. The METS object will be populated with additional entries (structural map, agents, events).

  • files (Iterable[File]) – File instances. Technical metadata is automatically generated for those that don’t already have it.

Return type:

SIP

Returns:

SIP object initialized according to the given files.
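
As a rough illustration of deriving a structure "according to the directory structure defined in the files", the sketch below groups in-SIP paths into a nested dictionary. This is an assumption-laden stand-in: the real structural map consists of METS divs, and both build_directory_tree and the "__files__" key are hypothetical names chosen for this example.

```python
from pathlib import PurePosixPath


def build_directory_tree(digital_object_paths):
    """Group relative file paths into a nested dict of directories.

    Hypothetical helper for illustration; not part of siptools-ng.
    Files of a directory are listed under the "__files__" key.
    """
    tree = {}
    for raw_path in digital_object_paths:
        parts = PurePosixPath(raw_path).parts
        node = tree
        # Descend through (and create) one level per directory component.
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node.setdefault("__files__", []).append(parts[-1])
    return tree
```

Each nested dictionary plays the role that a div would play in a structural map: one node per directory, with the files of that directory attached to it.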

File

Module for handling digital objects in SIP.

class siptools_ng.file.File(path, digital_object_path, metadata=None, identifier=None)

Class for handling digital objects in SIPs.

A mets_builder.digital_object.DigitalObject is created for the given file and is available under the File.digital_object property. This can be used to enrich the underlying METS entry with additional metadata.

Constructor for File.

Parameters:
  • path (Union[str, Path]) – File path of the local source file for this digital object. Symbolic links in the path are resolved.

  • digital_object_path (Union[str, PurePath]) – File path of this digital object in the SIP, relative to the SIP root directory. Note that this can differ from the path in the local filesystem.

  • metadata (Optional[Iterable[Metadata]]) – Iterable of metadata objects that describe this file.

  • identifier (Optional[str]) – Identifier for the digital object. The identifier must be unique in the METS document. If None, the identifier is generated automatically.
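
The distinction between path and digital_object_path can be sketched with the standard library alone. The helper resolve_file_paths below is hypothetical, and the check that rejects absolute in-SIP paths is an assumption made for illustration, based only on the statement that digital_object_path is relative to the SIP root.

```python
from pathlib import Path, PurePosixPath


def resolve_file_paths(path, digital_object_path):
    """Return (resolved local source path, relative in-SIP path).

    Hypothetical helper for illustration; not part of siptools-ng.
    """
    # Local source file: symbolic links and '.'/'..' segments are resolved.
    source = Path(path).resolve()

    # Location inside the SIP: a pure path that never touches the filesystem.
    in_sip = PurePosixPath(digital_object_path)
    if in_sip.is_absolute():
        # Assumption: an in-SIP path must be relative to the SIP root.
        raise ValueError("digital_object_path must be relative to the SIP root.")
    return source, in_sip
```

For example, a local file /srv/scans/0001.tif could be placed in the SIP as data/images/scan.tif; the two paths are independent of each other.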

add_metadata(metadata=None)

Add metadata to file.

Parameters:

metadata (Optional[Iterable[Metadata]]) – Iterable of metadata objects that describe this file.

Return type:

None

generate_technical_metadata(file_format=None, file_format_version=None, checksum_algorithm=None, checksum=None, file_created_date=None, object_identifier_type=None, object_identifier=None, charset=None, original_name=None, csv_has_header=None, csv_delimiter=None, csv_record_separator=None, csv_quoting_character=None, format_registry_name=None, format_registry_key=None, creating_application=None, creating_application_version=None, scraper_result=None)

Generate technical metadata for the digital object.

Scrapes the file found in File.path, turning the scraped information into a mets_builder.metadata.TechnicalFileObjectMetadata object, and finally adds the metadata to this digital object.

The metadata is overridden or enriched with the user-given predefined values, whenever provided. It is possible, however, to provide no predefined values at all and use only scraped values.

A file-type-specific technical metadata object is also created and added to the digital object.

Parameters:
  • file_format (Optional[str]) – Overrides scraped file format of the object with a predefined value. Mimetype of the file, e.g. ‘image/tiff’. When set, file_format_version has to be set as well.

  • file_format_version (Optional[str]) – Overrides scraped file format version of the object with a predefined value. Version number of the file format, e.g. ‘1.2’. When set, file_format has to be set as well.

  • checksum_algorithm (Union[ChecksumAlgorithm, str, None]) – Overrides scraped checksum algorithm of the object with a predefined value. The specific algorithm used to construct the checksum for the digital object. If given as a string, the value is cast to mets_builder.metadata.ChecksumAlgorithm, which results in an error if it is not a valid checksum algorithm. The allowed values are listed in the ChecksumAlgorithm documentation. When set, checksum has to be set as well.

  • checksum (Optional[str]) – Overrides scraped checksum of the object with a predefined value. The output of the message digest algorithm. When set, checksum_algorithm has to be set as well.

  • file_created_date (Optional[str]) – Overrides scraped file created date of the object with a predefined value. The actual or approximate date and time the object was created. The time information must be expressed using either the ISO 8601 format or its extended version, ISO 8601-2.

  • object_identifier_type (Optional[str]) – Overrides generated object identifier type of the object with a predefined value. Standardized identifier types should be used when possible (e.g., an ISBN for books). When set, object_identifier has to be set as well.

  • object_identifier (Optional[str]) – Overrides generated object identifier of the object with a predefined value. File identifiers should be globally unique. When set, object_identifier_type has to be set as well.

  • charset (Union[Charset, str, None]) – Overrides scraped charset of the object with a predefined value. Character encoding of the file. If given as a string, the value is cast to mets_builder.metadata.Charset, which results in an error if it is not a valid charset. The allowed values are listed in the Charset documentation.

  • original_name (Optional[str]) – Overrides scraped original name of the object with a predefined value.

  • csv_has_header (Optional[bool]) – A boolean indicating whether this CSV file has a header row. If set to True, the first row of the file is used as header information. If set to False, the header metadata is set as “header1”, “header2”, etc. according to the number of fields in a row.

  • csv_delimiter (Optional[str]) – Overrides the scraped delimiter character(s) with a predefined value. The character or combination of characters that are used to separate fields in the CSV file.

  • csv_record_separator (Optional[str]) – Overrides the scraped record separator character(s) with a predefined value. The character or combination of characters that are used to separate records in the CSV file.

  • csv_quoting_character (Optional[str]) – Overrides the scraped quoting character with a predefined value. The character that is used to encapsulate values in the CSV file. Encapsulated values can include characters that are otherwise treated in a special way, such as the delimiter character.

  • format_registry_name (Optional[str]) – Enriches generated metadata with format registry name. Name identifying a format registry, if a format registry is used to give further information about the file format. When set, format_registry_key has to be set as well.

  • format_registry_key (Optional[str]) – Enriches generated metadata with format registry key. The unique key used to reference an entry for this file format in a format registry. When set, format_registry_name has to be set as well.

  • creating_application (Optional[str]) – Enriches generated metadata with creating application. Software that was used to create this file. When set, creating_application_version has to be set as well.

  • creating_application_version (Optional[str]) – Enriches generated metadata with creating application version. Version of the software that was used to create this file. When set, creating_application has to be set as well.

  • scraper_result (Optional[dict]) – Scraper result to use with this file. When not set, a new scraper result is computed.

Return type:

dict

Returns:

Dictionary containing scraper result data. This value can be passed to File.with_scraper_result() to generate the same metadata without having to scrape the file again.
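
Several of the parameters above come in pairs ("When set, ... has to be set as well"). The following sketch shows one way such a rule could be checked before calling generate_technical_metadata(). The function check_paired_arguments is hypothetical, and the choice of ValueError is an assumption; the exception the library actually raises is not stated here.

```python
# Pairs documented as "when one is set, the other has to be set as well".
PAIRED_ARGUMENTS = [
    ("file_format", "file_format_version"),
    ("checksum_algorithm", "checksum"),
    ("object_identifier_type", "object_identifier"),
    ("format_registry_name", "format_registry_key"),
    ("creating_application", "creating_application_version"),
]


def check_paired_arguments(**kwargs):
    """Raise ValueError if only one argument of a documented pair is given.

    Hypothetical helper for illustration; not part of siptools-ng.
    """
    for first, second in PAIRED_ARGUMENTS:
        first_given = kwargs.get(first) is not None
        second_given = kwargs.get(second) is not None
        if first_given != second_given:
            raise ValueError(
                f"'{first}' and '{second}' must be given together."
            )
```

Running such a check over the keyword arguments first keeps an incomplete pair from reaching the metadata generation step at all.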

property metadata: Iterable[Metadata]

Metadata of file.

Returns all metadata that has been added or generated.

property path: Path

Getter for path.

classmethod with_scraper_result(path, digital_object_path, scraper_result, identifier=None)

Constructor for File using previously scraped metadata.

Parameters:
  • path (Union[str, Path]) – File path of the local source file for this digital object. Symbolic links in the path are resolved.

  • digital_object_path (Union[str, Path]) – File path of this digital object in the SIP, relative to the SIP root directory. Note that this can differ from the path in the local filesystem.

  • identifier (Optional[str]) – Identifier for the digital object. The identifier must be unique in the METS document. If None, the identifier is generated automatically.

  • scraper_result (dict) – Previously scraped metadata as returned by File.generate_technical_metadata().

Return type:

None

exception siptools_ng.file.MetadataGenerationError

Error raised when there is an error in metadata generation.

Agent

Module for creating agent metadata.

siptools_ng.agent.get_file_scraper_agent()

Return agent metadata representing file-scraper.

Return type:

DigitalProvenanceAgentMetadata

siptools_ng.agent.get_siptools_ng_agent()

Return agent metadata representing dpres-siptools-ng itself.

Return type:

DigitalProvenanceAgentMetadata