API

Digital objects

Module for classes related to digital objects.

class mets_builder.digital_object.DigitalObject(path, metadata=None, streams=None, identifier=None, use=None)

Class representing a digital object (i.e. a file).

DigitalObject represents a file that is included in the METS document and in the SIP that the METS document describes. A DigitalObject instance should be created for each file in the METS document, and any administrative metadata that describes the file should be added to the corresponding DigitalObject instance. However, any descriptive metadata should be added to a div in a structural map (StructuralMapDiv in a StructuralMap).

Some files may include multiple streams, for example video files may include separate video and audio streams. A DigitalObject instance should be created for the file, whereas the streams should be added to the DigitalObject instance as DigitalObjectStream instances using the DigitalObject class constructor or using the DigitalObject.add_streams() method. Metadata objects describing a stream should be added to the individual stream objects, instead of adding them to the DigitalObject.

Constructor for DigitalObject.

Parameters:
  • path (Union[str, Path]) – File path of this digital object in the SIP, relative to the SIP root directory. Note that this can be different from the path in the local filesystem.

  • metadata (Optional[Iterable[Metadata]]) – Iterable of metadata objects that describe this digital object. Note that the metadata should be administrative metadata, and any descriptive metadata of a digital object should be added to a div in a structural map.

  • streams (Optional[Iterable[DigitalObjectStream]]) – Iterable of DigitalObjectStreams, representing the streams of this digital object.

  • identifier (Optional[str]) – Identifier for the digital object. The identifier must be unique in the METS document. If None, the identifier is generated automatically.

  • use (Optional[str]) – USE attribute of the file, which defines how the file is used. Recommended controlled vocabulary for the attribute: https://digitalpreservation.fi/resources/vocabulary
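Because path is given relative to the SIP root, it often has to be derived from a local filesystem path. The following is a minimal sketch using only the standard library. The helper name sip_relative_path and the staging directory layout are hypothetical and not part of mets_builder.

```python
from pathlib import PurePosixPath


def sip_relative_path(local_path: str, sip_root: str) -> str:
    """Return local_path expressed relative to sip_root.

    Raises ValueError if local_path is not located under sip_root.
    """
    return str(PurePosixPath(local_path).relative_to(PurePosixPath(sip_root)))


# Hypothetical local layout: only the relative part would be passed as `path`.
relative = sip_relative_path(
    "/data/staging/sip-001/data/video.mkv", "/data/staging/sip-001"
)
print(relative)
```

The resulting string is the kind of value the path parameter expects. The local staging prefix does not appear in it.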

add_streams(streams)

Add streams to this digital object.

Parameters:

streams – The iterable containing stream objects that are added to this digital object.

Return type:

None

property path: str

Getter for path.

class mets_builder.digital_object.DigitalObjectBase(metadata=None)

Base class for digital objects and streams of a digital object.

This class is abstract and should not be instantiated.

Constructor for DigitalObjectBase.

Parameters:

metadata (Optional[Iterable[Metadata]]) – Iterable of metadata objects that describe this object. Note that the metadata should be administrative metadata, and any descriptive metadata of a digital object should be added to a div in a structural map.

add_metadata(metadata)

Add administrative metadata to this object.

Any descriptive metadata should be added to a div in a structural map (StructuralMapDiv in a StructuralMap).

Parameters:

metadata (Iterable[Metadata]) – The iterable containing metadata objects that are added to this object.

Raises:

ValueError – If the given metadata is descriptive metadata.

Return type:

None
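The rule that add_metadata accepts administrative metadata and raises ValueError for descriptive metadata can be illustrated with stand-in classes. This is a sketch of the documented behaviour only, not the mets_builder implementation. SketchMetadata and SketchObject are hypothetical names, and checking every item before adding any is a choice made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchMetadata:
    """Stand-in for a metadata object; not the mets_builder Metadata class."""

    name: str
    is_descriptive: bool = False


@dataclass
class SketchObject:
    """Stand-in for a digital object or stream that holds metadata."""

    metadata: set = field(default_factory=set)

    def add_metadata(self, metadata):
        items = list(metadata)
        # Check every item first, so that a rejected call adds nothing.
        for item in items:
            if item.is_descriptive:
                raise ValueError(f"{item.name!r} is descriptive metadata")
        self.metadata.update(items)


obj = SketchObject()
obj.add_metadata([SketchMetadata("premis-object")])

try:
    obj.add_metadata([SketchMetadata("dublin-core", is_descriptive=True)])
except ValueError as error:
    print(error)
```

In the real library the descriptive metadata would instead be attached to a div in a structural map, as described above.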

class mets_builder.digital_object.DigitalObjectStream(metadata=None)

Class representing a stream in a digital object.

Some files may include multiple streams, for example video files may include separate video and audio streams. A DigitalObject instance should be created for the file, whereas the streams should be added to the DigitalObject instance as DigitalObjectStream instances using the DigitalObject class constructor or the DigitalObject.add_streams() method. Metadata objects describing a stream should be added to the individual stream objects, instead of adding them to the DigitalObject.

Constructor for DigitalObjectStream.

Parameters:

metadata (Optional[Iterable[Metadata]]) – Iterable of metadata objects that describe this stream. Note that the metadata should be administrative metadata, and any descriptive metadata of a stream should be added to a div in a structural map.

File references

Module for classes related to file references (METS fileSec).

class mets_builder.file_references.FileGroup(use=None, digital_objects=None)

Class representing fileGrp element in METS.

Can be used to group files together in METS file references and to describe the purpose of the files in the group. File references must have at least one file group.

Constructor for FileGroup.

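Parameters:
  • use – Optional value describing the purpose of the files in this group. Defaults to None.

  • digital_objects – Optional iterable of digital objects to include in this file group. Defaults to None.

add_digital_objects(digital_objects)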

Add digital objects to this file group.

Parameters:

digital_objects – The iterable containing digital objects to add to the file group.

Return type:

None

class mets_builder.file_references.FileReferences(file_groups=None)

Class representing fileSec element in METS.

The purpose of the fileSec element and this class is to link metadata to the digital objects they describe. This is achieved here using DigitalObject instances to represent individual files, adding relevant metadata objects to the corresponding DigitalObjects, and finally creating a FileReferences object out of the DigitalObjects.

In file references digital objects can be grouped to file groups using a FileGroup object. A file group can have a ‘use’ attribute that describes the purpose of the files in the group, and the files are grouped together in METS file references under fileGrp elements. There must be at least one file group in the file references.

If no special structure is needed for the file references, they can be generated automatically using the FileReferences.generate_file_references() class method.

Constructor for FileReferences.

Parameters:

file_groups (Optional[Iterable[FileGroup]]) – The file groups of these file references.

add_file_groups(file_groups)

Add file groups to these file references.

Parameters:

file_groups – The iterable containing FileGroup instances to add to this FileReferences instance.

Return type:

None

classmethod generate_file_references(digital_objects)

A shortcut method for generating simple file references.

Returns a FileReferences instance where the given digital objects have been grouped into a single file group.

Parameters:

digital_objects (Iterable[DigitalObject]) – The DigitalObject instances to include in the file references.

Metadata

mets_builder.metadata default imports.

class mets_builder.metadata.Charset(value)

Enum of allowed character encodings.

class mets_builder.metadata.ChecksumAlgorithm(value)

Enum for allowed checksum algorithms.

class mets_builder.metadata.ComparableMixin

Mixin that makes most classes comparable as-is. This means that class instances with identical data will be evaluated as identical: for example, a set will keep only one of several instances that hold the same data.

Some classes might contain non-hashable fields. In such a case, override _vars to return a copy of the metadata object’s variables in which all values are hashable. In most cases, this means converting lists to tuples.
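The idea can be sketched with a small stand-alone mixin. This is an illustrative re-implementation under the assumption that equality and hashing are derived from the dictionary returned by _vars. It is not the mets_builder source, and SketchComparableMixin and SketchCSVInfo are hypothetical names.

```python
class SketchComparableMixin:
    """Illustrative mixin: equality and hash are derived from _vars()."""

    def _vars(self):
        # Default: use the instance attributes as they are.
        return vars(self)

    def _key(self):
        # Attribute names are unique, so sorting only compares the names.
        return tuple(sorted(self._vars().items()))

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash((type(self).__name__, self._key()))


class SketchCSVInfo(SketchComparableMixin):
    def __init__(self, header, delimiter):
        self.header = list(header)  # a list is not hashable
        self.delimiter = delimiter

    def _vars(self):
        # Return a copy in which the list is replaced by a tuple.
        data = dict(vars(self))
        data["header"] = tuple(data["header"])
        return data


first = SketchCSVInfo(["name", "city"], ";")
second = SketchCSVInfo(["name", "city"], ";")
unique = {first, second}
print(len(unique))
```

Without the _vars override, hashing first would fail because the header list is not hashable.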

class mets_builder.metadata.DigitalProvenanceAgentMetadata(name, agent_type, version=None, note=None, agent_identifier_type=None, agent_identifier=None, **kwargs)

Class for creating digital provenance agent metadata.

The Agent entity aggregates information about attributes or characteristics of agents (persons, organizations, or software) associated with rights management and preservation events in the life of a data object. Agent information serves to identify an agent unambiguously from all other Agent entities.

Constructor for DigitalProvenanceAgentMetadata class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • name (str) – Name of the agent.

  • agent_type (Union[DigitalProvenanceAgentType, str]) – The type of this agent, given as DigitalProvenanceAgentType enum or string. If given as string, the value is cast to DigitalProvenanceAgentType and results in error if it is not a valid digital provenance agent type. The allowed values can be found from DigitalProvenanceAgentType documentation.

  • version (Optional[str]) – The version of the agent. Has no effect unless the agent type is ‘software’ or ‘hardware’.

  • note (Optional[str]) – Additional information about the agent.

  • agent_identifier_type (Optional[str]) – Type of agent identifier.

  • agent_identifier (Optional[str]) – The agent identifier value. If not given by the user, agent identifier is generated automatically.

classmethod get_mets_builder_agent()

Get agent metadata representing dpres-mets-builder.

Convenience method for creating an agent metadata object that represents this library itself, dpres-mets-builder.

Return type:

DigitalProvenanceAgentMetadata

class mets_builder.metadata.DigitalProvenanceEventMetadata(event_type, detail, outcome, outcome_detail, datetime=None, event_identifier_type=None, event_identifier=None, **kwargs)

Class for creating digital provenance event metadata.

The Event entity aggregates information about an action that involves one or more digital objects.

Constructor for DigitalProvenanceEventMetadata class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • event_type (str) – A categorization of the nature of the event.

  • detail (str) – Additional information about the event.

  • outcome (Union[EventOutcome, str]) – A categorization of the overall result of the event in terms of success, partial success, or failure. If given as string, the value is cast to EventOutcome and results in error if it is not a valid event outcome. The allowed values can be found from EventOutcome documentation.

  • outcome_detail (str) – A detailed description of the result or product of the event.

  • datetime (Optional[str]) –

    The single date and time, or date and time range, at or during which the event occurred.

    If set to None, the event date will be generated during serialization and will be set to the same date on all metadata objects that have it set to None.

  • event_identifier_type (Optional[str]) – Type of event identifier.

  • event_identifier (Optional[str]) – The event identifier value. If not given by the user, event identifier is generated automatically.

Link digital provenance agent metadata to this event.

Parameters:
  • agent_metadata (DigitalProvenanceAgentMetadata) – The agent that is associated with this event.

  • agent_role (str) – The role of the agent in relation to this event.

Return type:

None

Link technical object metadata to this event.

Parameters:
  • object_metadata (TechnicalObjectMetadata) – The object metadata that is associated with this event.

  • object_role (str) – The role of the object in relation to this event.

Return type:

None

property linked_metadata

Return linked metadata, i.e. the agents and objects linked to this event.

Returns:

The set of linked metadata.

property outcome

Getter for outcome.

class mets_builder.metadata.EventOutcome(value)

Enum for valid event outcomes.

ET_ALIA = '(:etal)'

Too numerous to list (et alia).

FAILURE = 'failure'

Unsuccessful outcome.

NONE = '(:none)'

Never had a value, never will.

NULL = '(:null)'

Explicitly and meaningfully empty.

SUCCESS = 'success'

Successful outcome.

TO_BE_ANNOUNCED = '(:tba)'

To be assigned or announced later.

UNACCESSIBLE = '(:unac)'

Temporarily inaccessible.

UNALLOWED = '(:unal)'

Unallowed, suppressed intentionally.

UNAPPLICABLE = '(:unap)'

Not applicable, makes no sense.

UNAVAILABLE = '(:unav)'

Value unavailable, possibly unknown.

UNKNOWN = '(:unkn)'

Known to be unknown (e.g., Anonymous, Inconnue).
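Parameters typed as Union[EventOutcome, str] accept either an enum member or its string value. The sketch below shows how such a cast behaves with a plain Python Enum. SketchOutcome is a stand-in that copies a few of the values listed above, and to_outcome is a hypothetical helper. Plain Enum lookup raises ValueError for an unknown value; the exact exception raised by mets_builder is not stated here.

```python
from enum import Enum


class SketchOutcome(Enum):
    """Stand-in with a few of the values listed above."""

    SUCCESS = "success"
    FAILURE = "failure"
    UNKNOWN = "(:unkn)"


def to_outcome(value):
    """Accept an enum member or its string value; ValueError otherwise."""
    return SketchOutcome(value)


print(to_outcome("success"))
print(to_outcome(SketchOutcome.UNKNOWN))

try:
    to_outcome("done")
except ValueError as error:
    print(error)
```

Passing the enum member avoids typos in the string value, while the string form is convenient when the outcome comes from external data.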

class mets_builder.metadata.ImportedMetadata(metadata_type, metadata_format, format_version, data_path=None, data_string=None, other_format=None, created=None, **kwargs)

Class for importing metadata files.

Note

ImportedMetadata.from_path() or ImportedMetadata.from_string() can be used to automatically detect the metadata type, format and format version and to construct the ImportedMetadata instance.

Constructor for ImportedMetadata class.

Parameters:
  • data_path (Union[str, Path, None]) – Path to the metadata file. Mutually exclusive with data_string.

  • metadata_type (Union[MetadataType, str]) – The type of metadata, given as MetadataType enum or string. If given as string, the value is cast to MetadataType and results in error if it is not a valid metadata type. The allowed values can be found from MetadataType enum documentation.

  • metadata_format (Union[MetadataFormat, str, None]) – The format of the metadata, given as MetadataFormat enum or a string. If given as string, it is cast to MetadataFormat and results in error if it is not a valid metadata format. The allowed values can be found in MetadataFormat enum documentation.

  • format_version (str) – The version number of the used metadata format, given as string.

  • other_format (Optional[str]) – Can be used to define the metadata format, if none of the allowed values in ‘metadata_format’ apply. If set, ‘other_format’ overrides any value given in ‘metadata_format’ with MetadataFormat.OTHER.

  • created (Union[datetime, str, None]) –

    The time the metadata record was created.

    If given as a datetime object, it is interpreted as the precise time of creation.

    If given as a string, it is interpreted as an approximate time the metadata record was created, and has to be given in the extended ISO 8601 format [ISO_8601-1, ISO_8601-2].

    If set to None, the time this object is created is used as the default value.

  • data_string (Optional[str]) – String containing metadata. Mutually exclusive with data_path.

classmethod from_path(path)

Create ImportedMetadata instance from an external XML file.

Metadata type, format and format version will be determined automatically by checking the XML schema in use.

Return type:

ImportedMetadata

classmethod from_string(string)

Create ImportedMetadata instance from an XML string.

Metadata type, format and format version will be determined automatically by checking the XML schema in use.

Return type:

ImportedMetadata

class mets_builder.metadata.Metadata(metadata_type, metadata_format, format_version, other_format=None, identifier=None, created=None)

Base class representing metadata elements in a METS document.

This class is abstract and cannot be instantiated.

Constructor for Metadata class.

Parameters:
  • metadata_type (Union[MetadataType, str]) – The type of metadata, given as MetadataType enum or string. If given as string, the value is cast to MetadataType and results in error if it is not a valid metadata type. The allowed values can be found from MetadataType enum documentation.

  • metadata_format (Union[MetadataFormat, str, None]) – The format of the metadata, given as MetadataFormat enum or a string. If given as string, it is cast to MetadataFormat and results in error if it is not a valid metadata format. The allowed values can be found in MetadataFormat enum documentation.

  • format_version (str) – The version number of the used metadata format, given as string.

  • other_format (Optional[str]) – Can be used to define the metadata format, if none of the allowed values in ‘metadata_format’ apply. If set, ‘other_format’ overrides any value given in ‘metadata_format’ with MetadataFormat.OTHER.

  • identifier (Optional[str]) – Identifier for the metadata element. The identifier must be unique in the METS document. If None, the identifier is generated during serialization.

  • created (Union[datetime, str, None]) –

    The time the metadata record was created.

    If given as a datetime object, it is interpreted as the precise time of creation.

    If given as a string, it is interpreted as an approximate time the metadata record was created, and has to be given in the extended ISO 8601 format [ISO_8601-1, ISO_8601-2].

    If set to None, the metadata creation date will be generated during serialization and will be set to the same date on all metadata objects that have it set to None.
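The two accepted forms of created can be produced with the standard library alone. A minimal sketch, assuming an arbitrary example timestamp: the datetime object would be passed when the precise creation time is known, and a string in extended ISO 8601 format when only an approximate time is available.

```python
from datetime import datetime, timezone

# Precise creation time: pass the datetime object itself.
precise = datetime(2024, 3, 15, 12, 30, 0, tzinfo=timezone.utc)

# Extended ISO 8601 strings use hyphens and colons as separators.
full_timestamp = precise.isoformat()
date_only = precise.date().isoformat()

print(full_timestamp)  # 2024-03-15T12:30:00+00:00
print(date_only)       # 2024-03-15
```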

property is_administrative: bool

Tells if this metadata is administrative metadata. All non-descriptive metadata is administrative metadata.

Returns:

True if this metadata is administrative metadata, otherwise False.

property is_descriptive: bool

Tells if this metadata is descriptive metadata.

Returns:

True if this metadata is descriptive metadata, otherwise False.

property linked_metadata

Gets set of linked metadata.

Returns:

The set of linked metadata.

to_xml_element_tree()

Serialize this metadata object to an intermediate XML representation using lxml.

Return type:

_Element

Returns:

The root element of the XML document

class mets_builder.metadata.MetadataFormat(value)

Enum for metadata formats.

DC = 'DC'

DC (Dublin Core), descriptive metadata format

DDI = 'DDI'

DDI (Data Documentation Initiative), descriptive metadata format

EAC_CPF = 'EAC-CPF'

EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), descriptive metadata format

EAD = 'EAD'

EAD (Encoded Archival Description), descriptive metadata format

LIDO = 'LIDO'

LIDO (Lightweight Information Describing Objects), descriptive metadata format

MARC = 'MARC'

MARC (Machine-Readable Cataloging), descriptive metadata format

MODS = 'MODS'

MODS (Metadata Object Description Schema), descriptive metadata format

NISOIMG = 'NISOIMG'

NISOIMG, technical metadata format

OTHER = 'OTHER'

Use if none of the other options apply to the metadata format.

PREMIS_AGENT = 'PREMIS:AGENT'

PREMIS:AGENT, digital provenance metadata format

PREMIS_EVENT = 'PREMIS:EVENT'

PREMIS:EVENT, digital provenance metadata format

PREMIS_OBJECT = 'PREMIS:OBJECT'

PREMIS:OBJECT, technical metadata format

VRA = 'VRA'

VRA (Visual Resources Association), descriptive metadata format

class mets_builder.metadata.MetadataType(value)

Enum for metadata types.

DESCRIPTIVE = 'descriptive'

Descriptive metadata

DIGITAL_PROVENANCE = 'digital provenance'

Digital provenance metadata

RIGHTS = 'rights'

Intellectual property rights metadata

SOURCE = 'source'

Source metadata

TECHNICAL = 'technical'

Technical metadata

class mets_builder.metadata.PREMISObjectType(value)

Enum for PREMIS object types.

BITSTREAM = 'bitstream'

Object representing a bitstream, non stand-alone data within a file

FILE = 'file'

Digital item representing a file

REPRESENTATION = 'representation'

Object representing a set of file objects forming one entity

class mets_builder.metadata.TechnicalAudioMetadata(codec_quality, data_rate_mode, audio_data_encoding, bits_per_sample, codec_creator_app, codec_creator_app_version, codec_name, data_rate, sampling_frequency, duration, num_channels, **kwargs)

Class for creating technical metadata for audio files.

Constructor for TechnicalAudioMetadata class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • codec_quality (Union[CodecQuality, str]) – Impact of the compression on quality e.g. ‘lossless’ or ‘lossy’. If given as string, the value is cast to CodecQuality and results in error if it is not a valid codec quality value. The allowed values can be found from CodecQuality documentation.

  • data_rate_mode (Union[DataRateMode, str]) – Indicator whether the data rate is fixed or variable. If given as string, the value is cast to DataRateMode and results in error if it is not a valid data rate mode. The allowed values can be found from DataRateMode documentation.

  • audio_data_encoding (str) – Structure for audio data. If the value is unavailable, ‘(:unav)’ can be used as the value.

  • bits_per_sample (str) – Number of bits per audio sample as a string, e.g. ‘16’, ‘20’, ‘24’, etc. If the value is unavailable, ‘0’ can be used as the value.

  • codec_creator_app (str) – Name of the creator of the compression application. If the value is unavailable, ‘(:unav)’ can be used as the value. If the audio is not compressed, ‘(:unap)’ can be used.

  • codec_creator_app_version (str) – Version of the compression application. If the value is unavailable, ‘(:unav)’ can be used as the value. If the audio is not compressed or the used software doesn’t have versioning, ‘(:unap)’ can be used.

  • codec_name (str) – Name and version (or subtype) of the compression algorithm used, e.g. Fraunhofer 1.0. If the value is unavailable, ‘(:unav)’ can be used as the value. If the audio is not compressed, ‘(:unap)’ can be used.

  • data_rate (str) – Data rate of the audio in an MP3 or other compressed file, expressed in kbps, e.g., ‘64’, ‘128’, ‘256’, etc. Should be an integer value represented as a string. Float values are rounded to integers automatically. If the value is unavailable, ‘0’ can be used as the value.

  • sampling_frequency (str) – Rate at which the audio was sampled, expressed in kHz, e.g., ‘22’, ‘44.1’, ‘48’, ‘96’, etc. If the value is unavailable, ‘0’ can be used as the value.

  • duration (str) – Elapsed time of the entire file, expressed using ISO 8601 syntax. If the value is unavailable, ‘(:unav)’ can be used as the value.

  • num_channels (str) – Number of audio channels as a string, e.g., ‘1’, ‘2’, ‘4’, ‘5’. If the value is unavailable, ‘(:unav)’ can be used as the value.
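Several of the parameters above are strings in specific units. The helpers below sketch one way to derive them from raw numbers: kHz for sampling_frequency, whole kbps for data_rate, and an ISO 8601 duration. The function names and input values are hypothetical, and the resulting dictionary covers only part of the constructor arguments.

```python
def khz_string(sample_rate_hz: int) -> str:
    """Convert a sample rate in Hz to a kHz string, e.g. 44100 to '44.1'."""
    return f"{sample_rate_hz / 1000:g}"


def kbps_string(bits_per_second: float) -> str:
    """Convert a data rate in bit/s to whole kbps, given as a string."""
    return str(round(bits_per_second / 1000))


def iso8601_duration(total_seconds: int) -> str:
    """Format whole seconds as an ISO 8601 duration such as 'PT0H3M25S'."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"PT{hours}H{minutes}M{seconds}S"


# Partial set of keyword arguments, built from example measurements.
audio_kwargs = {
    "sampling_frequency": khz_string(44100),
    "data_rate": kbps_string(127_600),
    "duration": iso8601_duration(205),
    "bits_per_sample": str(16),
    "num_channels": str(2),
}
print(audio_kwargs)
```

Rounding the data rate up front makes the stored value explicit instead of relying on the automatic rounding of float values mentioned above.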

property codec_quality

Getter for codec_quality.

property data_rate

Getter for data_rate.

property data_rate_mode

Getter for data_rate_mode.

class mets_builder.metadata.TechnicalBitstreamObjectMetadata(file_format, file_format_version, checksum_algorithm=None, checksum=None, object_identifier_type=None, object_identifier=None, charset=None, original_name=None, format_registry_name=None, format_registry_key=None, creating_application=None, creating_application_version=None, **kwargs)

Class for creating technical object metadata for a single bitstream contained within a file, such as an audio or video stream.

The Object entity aggregates information about a digital object held by a preservation repository and describes those characteristics relevant to preservation management.

Constructor for TechnicalBitstreamObjectMetadata.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • file_format (str) – Mimetype of the file, e.g. ‘image/tiff’.

  • file_format_version (str) –

    Version number of the file format, e.g. ‘1.2’.

    If given as ‘(:unap)’ (unapplicable), the value will be left out entirely from the serialized metadata.

  • checksum_algorithm (Union[ChecksumAlgorithm, str, None]) – The specific algorithm used to construct the checksum for the digital object. If given as string, the value is cast to ChecksumAlgorithm and results in error if it is not a valid checksum algorithm. The allowed values can be found from ChecksumAlgorithm documentation.

  • checksum (Optional[str]) – The output of the message digest algorithm.

  • file_created_date – The actual or approximate date and time the object was created. The time information must be expressed using either the ISO-8601 format, or its extended version ISO_8601-2.

  • object_identifier_type (Optional[str]) – Type of object identifier. Standardized identifier types should be used when possible (e.g., an ISBN for books). When set, object_identifier has to be set as well.

  • object_identifier (Optional[str]) – The object identifier value. If not given by the user, object identifier is generated automatically. File identifiers should be globally unique. When set, object_identifier_type has to be set as well.

  • charset (Union[Charset, str, None]) – Character encoding of the file. If given as string, the value is cast to Charset and results in error if it is not a valid charset. The allowed values can be found from Charset documentation.

  • original_name (Optional[str]) – Original name of the file.

  • format_registry_name (Optional[str]) – Name identifying a format registry, if a format registry is used to give further information about the file format. When set, format_registry_key has to be set as well.

  • format_registry_key (Optional[str]) – The unique key used to reference an entry for this file format in a format registry. When set, format_registry_name has to be set as well.

  • creating_application (Optional[str]) – Software that was used to create this file. When set, creating_application_version has to be set as well.

  • creating_application_version (Optional[str]) – Version of the software that was used to create this file. When set, creating_application has to be set as well.

property file_format: str

Getter for file_format.

property file_format_version: str

Getter for file_format_version.

class mets_builder.metadata.TechnicalCSVMetadata(filenames, header, charset, delimiter, record_separator, quoting_character, **kwargs)

Class for creating technical metadata for CSV files.

Constructor for TechnicalCSVMetadata class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • filenames (Iterable[str]) – Iterable of names of the files that the metadata describes.

  • header (Iterable[str]) – Header column names of the CSV file given as an iterable of strings.

  • charset (str) – Character set used in the CSV files, e.g. “UTF-8”

  • delimiter (str) – The character or combination of characters that are used to separate fields in the CSV file.

  • record_separator (str) – The character or combination of characters that are used to separate records in the CSV file.

  • quoting_character (Optional[str]) – The character that is used to encapsulate values in the CSV file. Encapsulated values can include characters that are otherwise treated in a special way, such as the delimiter character.
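The values for these parameters can be collected from the CSV content itself. The sketch below reads the header row with the standard csv module and returns a dictionary keyed by the parameter names above. The helper csv_metadata_kwargs, the sample data and the file name are hypothetical, and the delimiter, record separator and quoting character are assumed to be known in advance rather than detected.

```python
import csv
import io


def csv_metadata_kwargs(filenames, text, delimiter, record_separator,
                        quoting_character, charset="UTF-8"):
    """Collect CSV properties in a dict keyed by the parameter names."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quoting_character,
    )
    header = next(reader)  # first record holds the column names
    return {
        "filenames": list(filenames),
        "header": header,
        "charset": charset,
        "delimiter": delimiter,
        # Recorded as given; csv.reader recognises line endings by itself.
        "record_separator": record_separator,
        "quoting_character": quoting_character,
    }


sample = 'name;city\r\n"Doe; Jane";Oulu\r\n'
kwargs = csv_metadata_kwargs(["data/people.csv"], sample, ";", "\r\n", '"')
print(kwargs["header"])
```

The quoted value "Doe; Jane" in the sample shows why the quoting character matters: it contains the delimiter but still forms a single field.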

add_files(filenames)

Add files that this metadata describes.

Parameters:

filenames (Iterable[str]) – The names of the files that this metadata describes.

Return type:

None

property filenames: Iterable[str]

Getter for filenames.

class mets_builder.metadata.TechnicalFileObjectMetadata(file_format, file_format_version, checksum_algorithm, checksum, file_created_date, object_identifier_type=None, object_identifier=None, charset=None, original_name=None, format_registry_name=None, format_registry_key=None, creating_application=None, creating_application_version=None, **kwargs)

Class for creating technical object metadata for a single file.

The Object entity aggregates information about a digital object held by a preservation repository and describes those characteristics relevant to preservation management.

Constructor for TechnicalFileObjectMetadata class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • file_format (str) – Mimetype of the file, e.g. ‘image/tiff’.

  • file_format_version (str) –

    Version number of the file format, e.g. ‘1.2’.

    If given as ‘(:unap)’ (unapplicable), the value will be left out entirely from the serialized metadata.

  • checksum_algorithm (Union[ChecksumAlgorithm, str]) – The specific algorithm used to construct the checksum for the digital object. If given as string, the value is cast to ChecksumAlgorithm and results in error if it is not a valid checksum algorithm. The allowed values can be found from ChecksumAlgorithm documentation.

  • checksum (str) – The output of the message digest algorithm.

  • file_created_date (str) – The actual or approximate date and time the object was created. The time information must be expressed using either the ISO-8601 format, or its extended version ISO_8601-2.

  • object_identifier_type (Optional[str]) – Type of object identifier. Standardized identifier types should be used when possible (e.g., an ISBN for books). When set, object_identifier has to be set as well.

  • object_identifier (Optional[str]) – The object identifier value. If not given by the user, object identifier is generated automatically. File identifiers should be globally unique. When set, object_identifier_type has to be set as well.

  • charset (Union[Charset, str, None]) – Character encoding of the file. If given as string, the value is cast to Charset and results in error if it is not a valid charset. The allowed values can be found from Charset documentation.

  • original_name (Optional[str]) – Original name of the file.

  • format_registry_name (Optional[str]) – Name identifying a format registry, if a format registry is used to give further information about the file format. When set, format_registry_key has to be set as well.

  • format_registry_key (Optional[str]) – The unique key used to reference an entry for this file format in a format registry. When set, format_registry_name has to be set as well.

  • creating_application (Optional[str]) – Software that was used to create this file. When set, creating_application_version has to be set as well.

  • creating_application_version (Optional[str]) – Version of the software that was used to create this file. When set, creating_application has to be set as well.
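The checksum and file_created_date values can be computed with the standard library. A minimal sketch, with some assumptions: SHA-256 is used as the digest (whether and under which name it appears in ChecksumAlgorithm should be checked in that enum's documentation), and the last modification time serves only as an approximation of the creation time. The helper names are hypothetical.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def file_checksum(path, hashlib_name="sha256", chunk_size=65536):
    """Return the hex digest of the file, read in chunks."""
    digest = hashlib.new(hashlib_name)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def approximate_created_date(path):
    """Return the modification time as an ISO 8601 string in UTC.

    Assumption: the modification time approximates the creation time.
    """
    modified = os.stat(path).st_mtime
    stamp = datetime.fromtimestamp(modified, tz=timezone.utc)
    return stamp.isoformat(timespec="seconds")


# Demonstration with a temporary file containing the bytes b"abc".
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.txt")
    with open(sample, "wb") as handle:
        handle.write(b"abc")
    checksum = file_checksum(sample)
    created = approximate_created_date(sample)

print(checksum)
print(created)
```

Reading in chunks keeps memory use constant, which matters for large files such as video.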

property checksum: str

Getter for checksum.

property checksum_algorithm

Getter for checksum_algorithm.

property file_format: str

Getter for file_format.

property file_format_version: str

Getter for file_format_version.

class mets_builder.metadata.TechnicalImageMetadata(compression, colorspace, width, height, bps_value, bps_unit, samples_per_pixel, mimetype=None, byte_order=None, icc_profile_name=None, **kwargs)

Class for creating technical metadata for image files.

Constructor for TechnicalImageMetadata class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • compression (str) – Compression scheme, e.g. ‘jpeg’ or ‘zip’

  • colorspace (str) – Color space of the image, e.g. ‘rgb’

  • width (str) – Width of the image in pixels.

  • height (str) – Height of the image in pixels.

  • bps_value (str) – Bits per sample.

  • bps_unit (str) – Unit of the bps_value, e.g. ‘integer’

  • samples_per_pixel (str) – Samples per pixel.

  • mimetype (Optional[str]) – File mimetype, e.g. ‘image/tiff’.

  • byte_order (Optional[str]) – Byte order of the file, e.g. ‘little endian’

  • icc_profile_name (Optional[str]) – ICC profile name.

class mets_builder.metadata.TechnicalObjectMetadata(file_format, file_format_version, checksum_algorithm=None, checksum=None, file_created_date=None, object_identifier_type=None, object_identifier=None, charset=None, original_name=None, format_registry_name=None, format_registry_key=None, creating_application=None, creating_application_version=None, **kwargs)

Abstract class for representing technical object metadata. Do not instantiate this class directly; use either TechnicalFileObjectMetadata or TechnicalBitstreamObjectMetadata instead.

The Object entity aggregates information about a digital object held by a preservation repository and describes those characteristics relevant to preservation management.

Constructor for TechnicalObjectMetadata abstract class.

For advanced configuration, keyword arguments of the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • file_format (str) – Mimetype of the file, e.g. ‘image/tiff’.

  • file_format_version (str) –

    Version number of the file format, e.g. ‘1.2’.

    If given as ‘(:unap)’ (unapplicable), the value will be left out entirely from the serialized metadata.

  • checksum_algorithm (Union[ChecksumAlgorithm, str, None]) – The specific algorithm used to construct the checksum for the digital object. If given as string, the value is cast to ChecksumAlgorithm and results in error if it is not a valid checksum algorithm. The allowed values can be found from ChecksumAlgorithm documentation.

  • checksum (Optional[str]) – The output of the message digest algorithm.

  • file_created_date (Optional[str]) – The actual or approximate date and time the object was created. The time information must be expressed using either the ISO 8601 format or its extended version, ISO 8601-2.

  • object_identifier_type (Optional[str]) – Type of object identifier. Standardized identifier types should be used when possible (e.g., an ISBN for books). When set, object_identifier has to be set as well.

  • object_identifier (Optional[str]) – The object identifier value. If not given by the user, object identifier is generated automatically. File identifiers should be globally unique. When set, object_identifier_type has to be set as well.

  • charset (Union[Charset, str, None]) – Character encoding of the file. If given as string, the value is cast to Charset and results in error if it is not a valid charset. The allowed values can be found from Charset documentation.

  • original_name (Optional[str]) – Original name of the file.

  • format_registry_name (Optional[str]) – Name identifying a format registry, if a format registry is used to give further information about the file format. When set, format_registry_key has to be set as well.

  • format_registry_key (Optional[str]) – The unique key used to reference an entry for this file format in a format registry. When set, format_registry_name has to be set as well.

  • creating_application (Optional[str]) – Software that was used to create this file. When set, creating_application_version has to be set as well.

  • creating_application_version (Optional[str]) – Version of the software that was used to create this file. When set, creating_application has to be set as well.
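
Several of the optional parameters above come in pairs that have to be set together. The following is a minimal sketch of such a check, assuming nothing about how mets_builder implements it internally. The helper name check_paired_arguments is hypothetical and not part of the library, and the argument values are illustrative only.

```python
# Hypothetical helper, not part of mets_builder: mirrors the documented rule
# that certain optional arguments must be given together.
PAIRED_ARGUMENTS = [
    ("object_identifier_type", "object_identifier"),
    ("format_registry_name", "format_registry_key"),
    ("creating_application", "creating_application_version"),
]


def check_paired_arguments(**kwargs):
    """Raise ValueError if only one argument of a documented pair is set."""
    for first, second in PAIRED_ARGUMENTS:
        first_set = kwargs.get(first) is not None
        second_set = kwargs.get(second) is not None
        if first_set != second_set:
            raise ValueError(f"{first} and {second} must be set together")


# Both members of a pair are given, so the check passes silently.
check_paired_arguments(
    creating_application="ExampleApp",
    creating_application_version="1.0",
)
```

Running a check like this before constructing the metadata object surfaces an incomplete pair early, with a message that names both arguments.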

add_relationship(technical_object_metadata, relationship_type, relationship_subtype)

Add a relationship to another technical object metadata.

Parameters:
  • technical_object_metadata (TechnicalObjectMetadata) – The technical object metadata object that is linked to this technical object metadata.

  • relationship_type (str) – A high-level categorization of the nature of the relationship.

  • relationship_subtype (str) – A specific characterization of the nature of the relationship.

Return type:

None

property charset

Getter for charset.

property checksum: str

Getter for checksum.

property checksum_algorithm

Getter for checksum_algorithm.

property file_format: str

Getter for file_format.

property file_format_version: str

Getter for file_format_version.

class mets_builder.metadata.TechnicalVideoMetadata(duration, data_rate, bits_per_sample, color, codec_creator_app, codec_creator_app_version, codec_name, codec_quality, data_rate_mode, frame_rate, pixels_horizontal, pixels_vertical, par, dar, sampling, signal_format, sound, **kwargs)

Class for creating technical metadata for video files.

Constructor for TechnicalVideoMetadata class.

For advanced configurations, keyword arguments for the Metadata class can be given here as well. See the Metadata documentation for more information.

Parameters:
  • duration (str) –

    Elapsed time of the entire file, expressed using ISO 8601 syntax; see http://www.w3.org/TR/NOTE-datetime.

    A value “(:unav)” can be allowed as an unknown value if the information cannot be easily found out.

  • data_rate (str) –

    Data rate of the audio in an MPEG or other compressed file expressed in mbps, e.g., “8”, “12”, “15”, etc.

    A value “0” can be allowed as an unknown value if the information cannot be easily found out.

  • bits_per_sample (str) –

    The number of bits of sample depth, e.g., “8”, “24”, etc.

    A value “0” can be allowed as an unknown value if the information cannot be easily found out.

  • color (Union[Color, str]) –

    Presented color of the digital video file.

    If given as string, the value is cast to Color and results in error if it is not a valid color. The allowed values can be found from Color documentation.

  • codec_creator_app (str) –

    Name of the creator of the compression application e.g. “Adobe Premiere”

    Values “(:unav)” or “(:unap)” can be allowed as an unknown value if the information cannot be easily found out. Use “(:unap)” only for uncompressed video.

  • codec_creator_app_version (str) –

    Version of the compression application e.g. “6.0”

    Values “(:unav)” or “(:unap)” can be allowed as an unknown value if the information cannot be easily found out. Use “(:unap)” only for uncompressed video or for software that does not have versioning.

  • codec_name (str) –

    Name of the compression algorithm used e.g. “MPEG”

    Values “(:unav)” or “(:unap)” can be allowed as an unknown value if the information cannot be easily found out. Use “(:unap)” only for uncompressed video.

  • codec_quality (Union[CodecQuality, str]) –

    Impact of the compression on quality e.g. “lossless” or “lossy”.

    If given as string, the value is cast to CodecQuality and results in error if it is not a valid codec quality value. The allowed values can be found from CodecQuality documentation.

  • data_rate_mode (Union[DataRateMode, str]) –

    Mode of the data rate in a digital video file.

    If given as string, the value is cast to DataRateMode and results in error if it is not a valid data rate mode. The allowed values can be found from DataRateMode documentation.

  • frame_rate (str) –

    The rate of frames displayed in one second (or average rate of frames per second in the case of variable frame-rate). Present as a ratio of time base over frame duration, such as “30000/1001” or as a decimal, such as “29.970”.

    A value “0” can be allowed as an unknown value if the information cannot be easily found out.

  • pixels_horizontal (str) –

    The horizontal dimension of a frame in pixels.

    A value “0” can be allowed as an unknown value if the information cannot be easily found out.

  • pixels_vertical (str) –

    The vertical dimension of a frame in pixels.

    A value “0” can be allowed as an unknown value if the information cannot be easily found out.

  • par (str) –

    Pixel aspect ratio (present as a ratio or decimal).

    A value “0” can be allowed as an unknown value if the information cannot be easily found out.

  • dar (str) –

    Display aspect ratio (present as a ratio or decimal such as “4/3” or “16/9” or “1.33333”).

    Values “(:unav)” or “(:etal)” can be allowed as an unknown value if the information cannot be easily found out.

  • sampling (str) –

    The video sampling format used in a digital video file (in terms of luminance and chrominance), e.g., “4:2:0”, “4:2:2”, “2:4:4”, etc.

    Values “(:unav)” or “(:unap)” can be allowed as an unknown value if the information cannot be easily found out.

  • signal_format (str) –

    The signal format of a video source item e.g. “NTSC”, “PAL”, “SECAM”.

    Values “(:unav)” or “(:unap)” can be allowed as an unknown value if the information cannot be easily found out.

  • sound (Union[Sound, str]) –

    Indicator of the presence of sound in the video file. If the value “Yes” is selected, then the video file should also be associated with an instance of audioMD (audio metadata).

    If given as string, the value is cast to Sound and results in error if it is not a valid sound value. The allowed values can be found from Sound documentation.
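
The frame_rate, par and dar parameters accept either a ratio or a decimal string. As an illustration only, the standard-library fractions module can convert between the two forms. The helper ratio_to_decimal below is hypothetical and not part of mets_builder.

```python
from fractions import Fraction


def ratio_to_decimal(value, digits=3):
    """Convert a ratio string such as "30000/1001" to a decimal string.

    Hypothetical helper for illustration; decimal input is accepted too.
    """
    # Fraction parses both "30000/1001" and "29.970".
    return f"{float(Fraction(value)):.{digits}f}"


# NTSC-style frame rate given as a ratio of time base over frame duration.
frame_rate = ratio_to_decimal("30000/1001")
```

Either form is a string, which matches the str type documented for these parameters.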

property codec_quality

Getter for codec_quality.

property color

Getter for color.

property data_rate_mode

Getter for data_rate_mode.

property sound

Getter for sound.

METS

Module for METS class representing a METS document.

class mets_builder.mets.AgentRole(value)

Enum for METS agent roles.

ARCHIVIST = 'ARCHIVIST'

The person(s) or institution(s) responsible for the document/collection.

CREATOR = 'CREATOR'

The person(s) or institution(s) responsible for the METS document.

CUSTODIAN = 'CUSTODIAN'

The person(s) or institution(s) charged with the oversight of a document/collection.

DISSEMINATOR = 'DISSEMINATOR'

The person(s) or institution(s) responsible for dissemination functions.

EDITOR = 'EDITOR'

The person(s) or institution(s) that prepares the metadata for encoding.

IPOWNER = 'IPOWNER'

Intellectual Property Owner: The person(s) or institution holding copyright, trade or service marks or other intellectual property rights for the object.

OTHER = 'OTHER'

Use if none of the other options apply to the agent role.

PRESERVATION = 'PRESERVATION'

The person(s) or institution(s) responsible for preservation functions.

class mets_builder.mets.AgentType(value)

Enum for METS agent types.

INDIVIDUAL = 'INDIVIDUAL'

Use if an individual has served as the agent.

ORGANIZATION = 'ORGANIZATION'

Use if an institution, corporate body, association, non-profit enterprise, government, religious body, etc. has served as the agent.

OTHER = 'OTHER'

Use if none of the other options apply to the agent type.
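
Several parameters in this API accept either an enum member or a string that is cast to the enum. The sketch below uses a stand-in enum that copies the AgentType values listed above, so it does not import mets_builder. It shows how Python's Enum lookup by value behaves, which is presumably the mechanism behind the documented error. The names AgentTypeSketch and cast_agent_type are hypothetical.

```python
from enum import Enum


class AgentTypeSketch(Enum):
    """Stand-in copy of the documented AgentType values (illustration only)."""

    INDIVIDUAL = "INDIVIDUAL"
    ORGANIZATION = "ORGANIZATION"
    OTHER = "OTHER"


def cast_agent_type(value):
    """Return an enum member for a member or a valid string value."""
    # Enum lookup by value returns the member itself when given a member and
    # raises ValueError for a string that is not a valid value.
    return AgentTypeSketch(value)


organization = cast_agent_type("ORGANIZATION")
```

Note that the lookup is case-sensitive in this sketch: a lowercase string such as "organization" would raise ValueError.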

class mets_builder.mets.METS(mets_profile, contract_id, creator_name, creator_type, creator_other_type=None, package_id=None, content_id=None, label=None, create_date=None, last_mod_date=None, record_status=MetsRecordStatus.SUBMISSION, catalog_version='1.7.6', specification='1.7.6')

Class representing a METS document.

Constructor for METS class.

Parameters:
  • mets_profile (Union[MetsProfile, str]) – The METS profile for this METS document, given as MetsProfile enum or string. If given as string, the value is cast to MetsProfile and results in error if it is not a valid mets profile. The allowed values can be found from MetsProfile documentation.

  • contract_id (str) – Contract identifier of a DPS contract to which the package content belongs. Attribute value should be a UUID expressed as a string.

  • creator_name (str) – Name of the person or entity who created the information package.

  • creator_type (Union[AgentType, str]) –

    The type of creator, given as AgentType enum or string. If given as string, the value is cast to AgentType and results in error if it is not a valid agent type. The allowed values can be found from AgentType documentation.

    If none of the AgentType values apply, any other values should be given using the ‘creator_other_type’ attribute.

  • creator_other_type (Optional[str]) – Can be used to describe the creator type, if none of the pre-defined types in ‘creator_type’ attribute apply. If set, any value given for ‘creator_type’ is overridden with AgentType.OTHER.

  • package_id (Optional[str]) – Organization’s unique identifier for the information package (objid). Attribute value should be expressed in printable US-ASCII characters. If set to None, a UUID is generated as the default value.

  • content_id (Optional[str]) – Identifier for the content in the package. Attribute value should be expressed in printable US-ASCII characters.

  • label (Optional[str]) – Short description of the information package.

  • create_date (Optional[datetime]) – The package creation time with a resolution of one second. If not set, the moment when this METS object is created is used as default value.

  • last_mod_date (Optional[datetime]) – If the package has been modified since the initial creation, the modification time must be expressed with last_mod_date using the same resolution as create_date.

  • record_status (Union[MetsRecordStatus, str]) – The record status of the information package, given as MetsRecordStatus enum or string. If given as string, the value is cast to MetsRecordStatus and results in error if it is not a valid record status. The allowed values can be found from MetsRecordStatus documentation.

  • catalog_version (Optional[str]) – Version number of the schema catalog used when the data package is created. If “catalog_version” is not present, the “specification” attribute has to be used instead.

  • specification (Optional[str]) – Version number of packaging specification used in creation of data package. Mandatory only when the use of the “catalog_version” attribute is not possible.

Raises:

ValueError – if the given attributes are invalid

Returns:

METS object
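
The defaults described above (a generated UUID for package_id, the current time at one-second resolution for create_date) can be approximated with the standard library. This is a sketch of equivalent values, not the library's own code. Whether mets_builder uses UUID version 4 or UTC timestamps is an assumption made here for illustration.

```python
import uuid
from datetime import datetime, timezone

# Assumed equivalents of the documented defaults, not mets_builder internals.
package_id = str(uuid.uuid4())

# Drop microseconds to get the one-second resolution the documentation asks for.
create_date = datetime.now(timezone.utc).replace(microsecond=0)

# contract_id is documented as a UUID expressed as a string. The value below
# is a random placeholder; parsing with uuid.UUID raises ValueError for a
# malformed string, which makes it usable as a simple validity check.
contract_id = str(uuid.uuid4())
uuid.UUID(contract_id)
```

Values prepared this way can be passed as the package_id, create_date and contract_id arguments of the constructor documented above.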

add_agent(name, *, agent_role=None, other_role=None, agent_type=None, other_type=None)

Add an agent to the METS object.

Agents are a way to document the roles of different people and parties in the making of the information package. These agents will become agent elements in the metsHdr element in the final METS document.

Parameters:
  • name (str) – The name of the agent. For example, if agent type is set as “ORGANIZATION”, the name should be set as the name of the organization.

  • agent_role (Union[AgentRole, str, None]) –

    Specifies the function of the agent with respect to the METS record, given as AgentRole enum or string. If given as string, the value is cast to AgentRole and results in error if it is not a valid agent role. The allowed values can be found from AgentRole documentation.

    If none of the AgentRole values apply, other values should be given using the ‘other_role’ attribute.

  • other_role (Optional[str]) – Can be used to describe the agent role, if none of the pre-defined roles in ‘agent_role’ attribute apply. If set, any value given for ‘agent_role’ is overridden with AgentRole.OTHER.

  • agent_type (Union[AgentType, str, None]) –

    Specifies the type of agent, given as AgentType enum or string. If given as string, the value is cast to AgentType and results in error if it is not a valid agent type. The allowed values can be found from AgentType documentation.

    If none of the AgentType values apply, other values should be given using the ‘other_type’ attribute.

  • other_type (Optional[str]) – Can be used to describe the agent type, if none of the pre-defined types in ‘agent_type’ attribute apply. If set, any value given for ‘agent_type’ is overridden with AgentType.OTHER.

Raises:

ValueError – if the given attributes are invalid

Return type:

None

Returns:

None

add_file_references(file_references)

Add file references to this METS.

This will replace any previously added file references.

Parameters:

file_references (FileReferences) – FileReferences instance that is added to this METS.

Return type:

None

add_structural_maps(structural_maps)

Add structural maps to this METS.

Parameters:

structural_maps – The StructuralMap instances that are added to this METS.

Return type:

None

property content_id: str | None

Getter for content_id.

property contract_id: str

Getter for contract_id.

property digital_objects: Set[DigitalObject]

Get all digital objects that have been added to this METS via a structural map.

generate_file_references()

Generate file references for this METS.

If no special structure for file references is needed, they can be generated here automatically. The file references are generated out of the digital objects that have been added to this METS instance via structural maps.

This will replace any previously added file references.

Return type:

None

property metadata: Set[Metadata]

Get all metadata that have been added to this METS via digital objects.

property package_id: str

Getter for package_id.

to_xml()

Serialize this METS object into an XML-formatted bytestring.

Return type:

bytes

write(output_filepath)

Serialize METS object to XML and write to given file path.

Return type:

None

class mets_builder.mets.MetsProfile(value)

Enum for METS profiles.

CULTURAL_HERITAGE = 'http://digitalpreservation.fi/mets-profiles/cultural-heritage'

Profile for cultural heritage resources.

RESEARCH_DATA = 'http://digitalpreservation.fi/mets-profiles/research-data'

Profile for research data resources.

class mets_builder.mets.MetsRecordStatus(value)

Enum for METS record statuses.

DISSEMINATION = 'dissemination'

The information package is a DIP.

SUBMISSION = 'submission'

The information package is a new SIP. If the package identifier is the same as in some other information package ingested earlier belonging to the same contract, the package will be rejected.

UPDATE = 'update'

The SIP is an updated version of a previous SIP.

Structural map

Module for classes related to structural map (METS structMap).

class mets_builder.structural_map.StructuralMap(root_div, structural_map_type=None, label=None, pid=None, pid_type=None)

Class representing structMap element in METS.

The purpose of the structMap element and this class is to organize the digital objects into a structure, and additionally to link metadata to a group of files or the entire package, rather than just to individual digital objects.

Structural map provides a means for organizing the digital objects into a coherent hierarchical structure. Such a hierarchical structure can be presented to users to facilitate their comprehension and navigation of the digital content. It can further be applied to any purpose requiring an understanding of the structural relationship of the content files or parts of the content files. The organization may be specified to any level of granularity (intellectual and/or physical) that is desired. Since there can be multiple structural maps in a METS document, more than one organization can be applied to the digital content represented by the METS document. The hierarchical structure is achieved here by forming a tree of nested StructuralMapDiv objects, containing DigitalObjects.

In addition to providing a means for organizing content, the structural map provides a mechanism for linking content at any hierarchical level with relevant metadata. This means that by linking metadata to a StructuralMapDiv, the metadata applies to all DigitalObjects that are included in the div.

A structural map has to contain one and only one root div, which contains the whole structure as further nested divs, with metadata and digital objects linked to them.

Constructor for StructuralMap.

Parameters:
  • root_div (StructuralMapDiv) – StructuralMapDiv that is the root div of this structural map. The structural map has to have one and only one root div, but the root div can contain multiple nested divs.

  • structural_map_type (Optional[str]) – String that identifies the type of structure represented by the structural map. For example, a structural map that represents a purely logical or intellectual structure could be described with value ‘logical’ whereas a structural map that represents a purely physical structure could be described with value ‘physical’. However, the METS schema neither defines nor requires a common vocabulary for this attribute.

  • label (Optional[str]) – String that describes the structural map to viewers of the METS document. This would be useful primarily where more than one structural map is provided for a single object. A descriptive label value, in that case, could clarify to users the purpose of each of the available structural maps.

  • pid (Optional[str]) – Unique identifier of metadata, given as a string. Attribute value should be expressed in printable US-ASCII characters.

  • pid_type (Optional[str]) – Identifier system used in the ‘pid’ attribute, given as a string. Attribute is mandatory if the pid attribute is used. Attribute value should be expressed in printable US-ASCII characters.

property pid: str | None

Getter for pid.

property pid_type: str | None

Getter for pid_type.

class mets_builder.structural_map.StructuralMapDiv(div_type, order=None, label=None, orderlabel=None, metadata=None, divs=None, digital_objects=None)

Class representing a div element in structMap in METS.

The structural divisions of the hierarchical organization provided by a structural map are represented by a StructuralMapDiv division element, which can be nested to any depth. Each division element can represent either an intellectual (logical) division or a physical division.

Any number of DigitalObjects, metadata objects or more divisions can be added to a StructuralMapDiv. When metadata is added to a StructuralMapDiv, the metadata applies to all DigitalObjects contained in the StructuralMapDiv.

Constructor for StructuralMapDiv.

Parameters:
  • div_type (str) – A string that specifies the type of structural division that the division element represents. Possible values include: ‘chapter’, ‘article’, ‘page’, ‘track’, ‘segment’, ‘section’ etc. METS places no constraints on the possible values.

  • order (Optional[int]) – A representation of the division element’s order among its siblings (e.g., its absolute, numeric sequence), given as an integer value. For more on the distinction between ‘order’ and ‘orderlabel’, see the description of the ‘orderlabel’ attribute.

  • label (Optional[str]) – A string that is used, for example, to identify a div to an end user viewing the document. Thus a hierarchical arrangement of the div label values could provide a table of contents to the digital content represented by a METS document and facilitate the users’ navigation of the digital object. Note that a div label should be specific to its level in the structural map. In the case of a book with chapters, the book div label should have the book title and the chapter div labels should have the individual chapter titles, rather than having the chapter div labels combine both book title and chapter title. For more on the distinction between ‘label’ and ‘orderlabel’, see the description of the ‘orderlabel’ attribute.

  • orderlabel (Optional[str]) – A string representation of the element’s order among its siblings (e.g., ‘xii’), or of any non-integer native numbering system. It is presumed that this value will still be machine actionable (e.g., it would support ‘go to page ___’ function), and it should not be used as a replacement/substitute for the ‘label’ attribute. To understand the differences between ‘order’, ‘orderlabel’ and ‘label’, imagine a text with 10 Roman numbered pages followed by 10 Arabic numbered pages. Page iii would have an ‘order’ of 3, an ‘orderlabel’ of ‘iii’ and a label of ‘Page iii’, while page 3 would have an ‘order’ of 13, an ‘orderlabel’ of ‘3’ and a ‘label’ of ‘Page 3’.

  • metadata (Optional[Iterable[Metadata]]) – Metadata that applies to all digital objects under this div.

  • divs (Optional[Iterable[StructuralMapDiv]]) – Divisions into which this division is further divided.

  • digital_objects (Optional[Iterable[DigitalObject]]) – Digital objects that belong to this hierarchical division.
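
The difference between ‘order’, ‘orderlabel’ and ‘label’ described above can be made concrete with the same example of 10 Roman-numbered pages followed by 10 Arabic-numbered pages. The sketch below only builds plain dictionaries and does not call mets_builder. The helper to_roman is hypothetical, and the dictionary keys are chosen to match the documented keyword argument names.

```python
def to_roman(number):
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    result = ""
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result


# Ten Roman-numbered pages followed by ten Arabic-numbered pages.
orderlabels = [to_roman(n) for n in range(1, 11)] + [str(n) for n in range(1, 11)]

# 'order' is the absolute position, 'orderlabel' the native numbering and
# 'label' the human-readable title.
pages = [
    {"order": order, "orderlabel": orderlabel, "label": f"Page {orderlabel}"}
    for order, orderlabel in enumerate(orderlabels, start=1)
]
```

Here pages[2] describes page iii (order 3) and pages[12] describes page 3 (order 13), matching the description of ‘orderlabel’.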

add_digital_objects(digital_objects)

Add digital objects to this div.

Note

Note that it is much more performant to add multiple digital objects at once, rather than adding them one by one.

Parameters:

digital_objects (Iterable[DigitalObject]) – Iterable of DigitalObjects that are added to this div.

Raises:

ValueError – If any of the given DigitalObjects already exist in the div tree.

Return type:

None

add_divs(divs)

Add further divisions to this division.

Note

Note that it is much more performant to add multiple divs at once, rather than adding divs one by one.

Parameters:

divs (Iterable[StructuralMapDiv]) – An iterable of StructuralMapDivs that are added to this div.

Raises:

ValueError – If the given div already exists or contains a div that exists in the div tree, if the added div already has a parent div, or if any of the added divs contain digital objects that already exist in the div tree.

Return type:

None

add_metadata(metadata)

Add metadata to this div.

The metadata should apply to all digital objects under this div (as well as digital objects under the divs nested in this div)

Parameters:

metadata (Iterable[Metadata]) – The iterable containing metadata objects that are added to this div.

Return type:

None

bundle_metadata()

Bundle shared non-technical metadata to structural map div recursively.

If all child nodes share the same non-technical metadata, be it structural map div or digital object, delete that metadata from the child node and move it to the parent structural map div. The structural map div tree is traversed depth-first in post-order i.e. starting from the leaves and moving towards the root (self). This way the shared metadata can be propagated as close to the root (self) as possible.
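
The post-order bundling described above can be sketched on a toy tree. This is an illustration of the traversal only, not the library's implementation. The Node class and the bundle function are hypothetical, metadata is modelled as plain strings, and the distinction between technical and non-technical metadata is ignored (everything is treated as non-technical).

```python
class Node:
    """Toy stand-in for a div or digital object (not a mets_builder class)."""

    def __init__(self, name, metadata=(), children=()):
        self.name = name
        self.metadata = set(metadata)
        self.children = list(children)


def bundle(node):
    """Move metadata shared by all children up to `node`, leaves first."""
    if not node.children:
        return
    # Post-order: handle the subtrees before looking at this node.
    for child in node.children:
        bundle(child)
    # Metadata present on every child is shared.
    shared = set.intersection(*(child.metadata for child in node.children))
    for child in node.children:
        child.metadata -= shared
    node.metadata |= shared


files_a = [Node("a1", {"rights", "provenance"}), Node("a2", {"rights", "provenance"})]
files_b = [Node("b1", {"rights"}), Node("b2", {"rights", "source"})]
root = Node("root", children=[Node("A", children=files_a), Node("B", children=files_b)])
bundle(root)
```

After the call, "rights" has travelled all the way to the root because every leaf carried it. "provenance" stops at div A because div B does not share it, and "source" stays on b2 because its sibling b1 lacks it.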

property nested_digital_objects: Set[DigitalObject]

Get all digital objects in this div and its nested divs.

property root_div: StructuralMapDiv

Return the root div of this div. Returns self if the div has no parents.