Data model¶
The knowledge base consists of information that enable identification and classification of the errors raised by the validator software running a problematic file. In addition to identification and classification, that enable solution discovery, the database will eventually contain analyses and repairs for the problems as the encountered problems are studied.
1. Error¶
JSON Schema ID : http://digitalpreservation.fi/schemas/2025-12/error.schema.ld.json
Description of an error. An error may have many analyses and it may rise from many files. Output example from the validating software may and should present context for the error message and the error message is also most likely found in the output example. Validator, validator version and error message fields should define unique error objects. Objects without set error message are preliminary errors and they are undefined.
Index fields: [‘validator’, ‘validatorVersion’, ‘errorMessage’]
- filesarray of string
List of references to Files.
- analysesarray of string
List of references to Analyses
- validatorstring
Name of the validating software reporting the error.
- validatorVersionarray of string
List of versions of the validating software that produce the error message for the set of the referred files.
- errorMessagestring
The message reported and identified by the validating software. The message can be (should be) exact match but may contain regular expressions if the message contains varying data in the middle of the string, such as a file name or an offset. A certain amount of uncertainty inevitably hovers around when raisin-picking the error message from validator output. The output example field may be used to provide implicit reasoning for the picked string or pattern. Notes field may be used for explicit reasoning.
- type[‘general’, ‘exact’, ‘’]
The error message can (perhaps) be classified either general, exact or unknown. Exact error message may have exact repair solution, general error message needs to be investigated further, has probably many analyses and repairs, and should probably have a more specific error message. Error messages of unknown are not studied yet in such a way.
- outputExamplearray
Additional output from the validator to the error mesasge to give context for later handling and verification. For example the full output from the validating software.
- notesarray
Additional information on the error.
2. Analysis¶
JSON Schema ID : http://digitalpreservation.fi/schemas/2025-09/analysis.schema.ld.json
Description of the problem.
Index fields: [‘@id’, ‘softwareProlem’, ‘fixable’]
- @idstring
Analysis object identifier.
- repairsarray of string
List of reference to Repairs.
- analysisarray of string
The analysis of the error. String items in the array can be presented as paragraphs.
- softwareProblem[‘yes’, ‘no’, ‘’]
Classification whetever the cause of the problem is in the validating software, that is a software bug, or not. Bugs should most likely be handled by the developers.
- significantPropertiesarray of string
The significant properties that are especially taken into account in the analysis. Significant properties may provide a perspective to the problem. String items in the array can be presented as paragraphs.
- fixable[‘yes’, ‘no’, ‘’]
Should a fix be determined for the error based on this analysis?
3. Repair¶
JSON Schema ID : http://digitalpreservation.fi/schemas/2025-11/repair.schema.ld.json
A repair solution for an error based on an analysis.
Index fields: [‘@id’]
- @idstring
Identifier for the repair.
- repairstring
Detailed description of the repair
- headingstring
Heading or a very short description of the repair for an article.
- executionstring
Command execution example for the repair, if there is such.
- effectsstring
Description of how performing the repair affects data.
- justificationstring
Rationale for accepting the repair.
- filesarray of string
Knowledge base file object identifiers that refer to file that were normalised using this solution.
4. File¶
JSON Schema ID : http://digitalpreservation.fi/schemas/2025-09/file.schema.ld.json
Description of a file. File objects describe content from which a problem arises. Although the file may be also valid and have no error object referring to it.
Index fields: [‘checksum’]
- @idstring
Identifier for the file
- sourcestring
Identifier for the source classifying the file.
- descriptionstring
Description of the content.
- wellFormed[True, False, None, ‘virtual’]
Should the file validate as well-formed. ‘null’ value equals to undetermined. ‘virtual’ value refers to virtual or dummy files that will never have a location but connect an error to a file format.
- checksumobject
Checksums of the content in the locations. Two checksums are used to mitigate checksum collisions. The other is faster and the other is more reliable.
- locationarray of string
List of free form descriptions of the file locations to the source. Locations may be URIs (URL preferred), relative file paths or instructions on how to ask for the file.
5. Format¶
JSON Schema ID : http://digitalpreservation.fi/schemas/2026-01/source.schema.ld.json
Description of a file format. The file format classifies file objects and determines the valid form for the data.
Index fields: [‘fileFormat’, ‘version’]
- @idstring
Identifier for the source.
- fileFormatstring
MIME type of the file format.
- versionstring
Version of the file format.
- profilestring
Further classification of the data.
- descriptionstring
Free form description.
- specificationarray of objects
List of objects with location, name or other description of the specification as the value and name for the resource as a key.