Advanced usage

Generating technical metadata

Siptools-ng usually detects file formats correctly, but it might sometimes generate wrong technical metadata due to missing context. For example, a CSV file could be detected as plain text file. The detected file format can be verified as follows:

techmd = next(
    metadata for metadata in file.metadata
    if metadata.metadata_type.value == "technical"
    and metadata.metadata_format.value == "PREMIS:OBJECT"
)
techmd.file_format  # The detected mimetype of the file, for example "text/plain"

If we know that we are importing a CSV file, we can ensure that it is detected correctly by generating technical metadata manually using siptools_ng.file.File.generate_technical_metadata() method:

file.generate_technical_metadata(
    file_format="text/csv", csv_has_header=True
)

Note

Siptools-ng generates the technical metadata with file-scraper. File-scraper also provides a command-line interface that can be used study files without siptools-ng.

Enriching the SIP/files with additional metadata

Both siptools_ng.sip.SIP and siptools_ng.file.File accept metadata using the add_metadata method. This includes all metadata classes available in dpres-mets-builder under mets_builder.metadata.

For example, we might know that one of the files was uploaded into collection management system ArchiveStar. We can add an event for this:

from mets_builder.metadata import DigitalProvenanceEventMetadata, DigitalProvenanceAgentMetadata
from mets_builder.mets import AgentRole

file = File(...)

event = DigitalProvenanceEventMetadata(
    event_type="creation",
    datetime="2024-01-01",
    outcome="success",
    detail=(
        "The file was uploaded into the collection management system ArchiveStar"
    )
)
agent = DigitalProvenanceAgentMetadata(
    name="ArchiveStar",
    agent_type="software",
    version="1.2.0"
)
event.link_agent_metadata(agent, agent_role="executing program")

# Add the event into the file. Agent does not need to be added specifically,
# as it was linked to the event.
file.add_metadata([event])

Linking files to other files

For example, we might have two versions of the same file: the original non-supported file, and version that has been migrated to a supported file format:

source_file = File(
    path="example_files/movie.mov",
    digital_object_path="data/movie.mov"
)
source_file.generate_technical_metadata()
outcome_file = File(
    path="example_files/movie.mkv",
    digital_object_path="data/movie.mkv"
)
outcome_file.generate_technical_metadata()

The non-supported source file will not pass validation in DPS, so the validation of the non-supported file must be skipped:

source_file.digital_object.use = "fi-dpres-no-file-format-validation"

To link the files to each other we create a migration event, which is linked to technical metadata of the files:

event = mets_builder.metadata.DigitalProvenanceEventMetadata(
    event_type = "migration",
    detail = "Normalization of digital object.",
    outcome = "success",
    outcome_detail = ("Source file format has been normalized. Outcome "
                      "object has been created as a result."),
    datetime = "2024-08-14T15:22:00",
)

source_file_techmd = next(
    metadata for metadata in source_file.metadata
    if metadata.metadata_type.value == "technical"
    and metadata.metadata_format.value == "PREMIS:OBJECT"
)
event.link_object_metadata(
    source_file_techmd,
    object_role="source"
)
outcome_file_techmd = next(
    metadata for metadata in outcome_file.metadata
    if metadata.metadata_type.value == "technical"
    and metadata.metadata_format.value == "PREMIS:OBJECT"
)
event.link_object_metadata(
    outcome_file_techmd,
    object_role="outcome"
)

Finally, the event is added to the files:

source_file.add_metadata([event])
outcome_file.add_metadata([event])

Modifying and reading the underlying METS object

In the previous sections, siptools-ng has taken care of adding the requested entries into the underlying METS object.

However, if siptools-ng does not provide the necessary interface for adding certain entries into the METS (eg. custom structural maps), you can access the METS and add them manually. The mets_builder.mets.METS is available via SIP.mets.

For example, to add a structural map, you can do the following:

from mets_builder import StructuralMapDiv, StructuralMap

file1 = File(...)
file2 = File(...)

root_div = StructuralMapDiv(
    "custom_div",
    digital_objects=[
        file1.digital_object,
        file2.digital_object
    ],
)

# Add the custom div to a structural map
structural_map = StructuralMap(root_div=root_div)

# Add the custom structural map to METS and generate file references
mets.add_structural_maps([structural_map])

Warning

Avoid adding or removing files after you have created the SIP instance, as this can cause the state between siptools-ng and mets-builder to diverge.

You can also print the in-progress METS document or write it to a file:

sip = SIP.from_directory(...)

# Print the METS as a string
print(sip.mets.to_xml())

# Write the METS to a file
sip.mets.write("/home/alice/mets.xml")

For more information on the available METS classes, see dpres-mets-builder documentation.