
RO-Crate Quickstart

RO-Crate from a Practical Perspective

In practice, an RO-Crate is a compressed archive that holds a file structure together with a special metadata description file (ro-crate-metadata.json). Users of the ARP system generally do not work directly with this archive (except when using the import/export functionality) or with the metadata description file. Instead, they see the file structure stored in the archive, displayed by the AROMA software component, and the screens used for metadata annotation. To use these effectively, it is worth becoming familiar with a few important RO-Crate concepts.
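To make this concrete, the following minimal Python sketch (standard library only) packs a payload file and its metadata description into a single zip archive. It assumes the RO-Crate 1.1 layout, where the description lives in ro-crate-metadata.json at the top of the package. The file name, title, and CSV content are hypothetical placeholders, and this is not how ARP or AROMA build their crates.

```python
import io
import json
import zipfile

# Minimal RO-Crate 1.1 metadata: the metadata descriptor, the root dataset,
# and one file. All names and values here are illustrative placeholders.
metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example crate",
            "hasPart": [{"@id": "data.csv"}],
        },
        {"@id": "data.csv", "@type": "File", "name": "Example data"},
    ],
}


def build_crate_zip() -> bytes:
    """Pack the payload file and the metadata description into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", "id,value\n1,42\n")
        zf.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
    return buffer.getvalue()


archive = build_crate_zip()
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(sorted(zf.namelist()))  # ['data.csv', 'ro-crate-metadata.json']
```

A user of the annotation screens only ever sees the equivalent of `data.csv` and its folder structure; the JSON file carries the descriptions behind the scenes.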

Datasets

An RO-Crate data package can be thought of as a file structure. An RO-Crate dataset corresponds to the elements that give this structure its shape, namely the directories. Metadata provided for a dataset should be interpreted as applying to every file it contains and to all of its subdirectories.
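The interpretation above, where directory-level metadata also applies to the files and subdirectories inside, can be sketched in Python. The flattened graph, its property values, and the helpers `ancestor_ids` and `effective_metadata` are hypothetical. When the same property is set at several levels, the sketch lets the nearest level win; that is one possible convention, not a rule stated here.

```python
# Hypothetical flattened graph: directories are Dataset entities whose @id ends in "/".
entities = {
    "./": {
        "@type": "Dataset",
        "hasPart": [{"@id": "data/"}],
        "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
    },
    "data/": {"@type": "Dataset", "hasPart": [{"@id": "data/raw/"}], "keywords": "survey"},
    "data/raw/": {"@type": "Dataset", "hasPart": [{"@id": "data/raw/a.csv"}], "keywords": "raw survey"},
    "data/raw/a.csv": {"@type": "File", "encodingFormat": "text/csv"},
}


def ancestor_ids(file_id: str) -> list[str]:
    """Return directory ids from the root dataset down to the file's parent."""
    parts = file_id.split("/")[:-1]  # drop the file name itself
    ids = ["./"]
    for i in range(1, len(parts) + 1):
        ids.append("/".join(parts[:i]) + "/")
    return ids


def effective_metadata(file_id: str) -> dict:
    """Merge directory-level metadata into a file's view; nearer levels win."""
    merged: dict = {}
    for dir_id in ancestor_ids(file_id):
        props = entities.get(dir_id, {})
        # Structural keys describe the directory itself, so they are not passed down.
        merged.update({k: v for k, v in props.items() if k not in ("@type", "hasPart")})
    merged.update(entities[file_id])  # the file's own metadata has the last word
    return merged


print(effective_metadata("data/raw/a.csv"))
```

Here the CSV file inherits the license from the root dataset and the keywords from its closest directory, while keeping its own type and format.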

Root Dataset

The root dataset is the top-level element in the hierarchical file structure of the RO-Crate. Any statements made here (such as metadata) apply to the content of the RO-Crate as a whole. In the ARP repository system, the description of the root dataset corresponds to the description of a repository dataset: the Dataverse dataset description and the root-level RO-Crate metadata of the data package are the same.
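Tools usually locate the root dataset through the metadata descriptor entity, whose `about` property points to it. A minimal Python sketch, assuming the RO-Crate 1.1 layout; the dataset name is a placeholder, the helper `find_root_dataset` is hypothetical, and the synchronization with the Dataverse description is not shown.

```python
import json

# Hypothetical content of ro-crate-metadata.json (RO-Crate 1.1 layout).
crate_json = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example dataset"}
  ]
}
"""


def find_root_dataset(crate: dict) -> dict:
    """Follow the metadata descriptor's `about` link to the root dataset."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    descriptor = by_id["ro-crate-metadata.json"]
    return by_id[descriptor["about"]["@id"]]


root = find_root_dataset(json.loads(crate_json))
print(root["name"])  # Example dataset
```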

Data Entities

  • Datasets (directories)
  • Files
  • Objects identifiable by remote URIs
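The three kinds of data entities listed above can be told apart from their `@type` and `@id`. The following Python sketch assumes RO-Crate 1.1 conventions (`Dataset` for directories, `File` for files, absolute http(s) URIs for remote objects); the graph fragment and the helper `data_entity_kind` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical @graph fragment with one entity of each data-entity kind,
# plus one contextual entity for contrast.
graph = [
    {"@id": "images/", "@type": "Dataset", "name": "Scanned pages"},
    {"@id": "images/page1.tiff", "@type": "File", "encodingFormat": "image/tiff"},
    {"@id": "https://example.org/corpus.xml", "@type": "File", "name": "Remote corpus"},
    {"@id": "#berlin", "@type": "Place", "name": "Berlin"},
]


def data_entity_kind(entity: dict) -> Optional[str]:
    """Classify an entity as a directory, file, or remote object; None otherwise."""
    raw_type = entity["@type"]
    types = raw_type if isinstance(raw_type, list) else [raw_type]
    if "Dataset" not in types and "File" not in types:
        return None  # not a data entity (e.g. a contextual entity such as a Place)
    if urlparse(entity["@id"]).scheme in ("http", "https"):
        return "remote object"
    return "directory" if "Dataset" in types else "file"


for e in graph:
    print(e["@id"], "->", data_entity_kind(e))
```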

Contextual Entities

  • Entities that exist outside the digital world (e.g., people, places)
  • Descriptions that exist primarily as metadata (e.g., geographic coordinates)
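As an illustration, both kinds of contextual entities can be written as JSON-LD objects. This Python sketch uses the schema.org types `Person`, `Place`, and `GeoCoordinates`; the identifiers, name, and coordinates are placeholders. For real people, a persistent identifier such as an ORCID URI is preferable to a local `#` identifier.

```python
import json

# A person: an entity that exists outside the digital world.
person = {"@id": "#jane-doe", "@type": "Person", "name": "Jane Doe"}

# Coordinates: a description that exists only as metadata.
coordinates = {
    "@id": "#budapest-geo",
    "@type": "GeoCoordinates",
    "latitude": 47.4979,
    "longitude": 19.0402,
}

# A place that refers to its coordinates by @id instead of nesting a copy.
place = {
    "@id": "#budapest",
    "@type": "Place",
    "name": "Budapest",
    "geo": {"@id": coordinates["@id"]},
}

print(json.dumps([person, place, coordinates], indent=2))
```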

Metadata

Metadata are properties assigned to individual entities. For example, a dataset can have an author; the author, given in the appropriate format, is then a metadata item of that dataset.
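In the RO-Crate metadata file, such a property is stored on the entity itself, and when its value is another entity (such as a person), it is stored as a reference by `@id`. A minimal Python sketch, assuming a hypothetical flattened graph keyed by `@id` and a hypothetical helper `author_names`:

```python
# Hypothetical flattened graph: the dataset's `author` property points to a Person.
graph = {
    "./": {"@type": "Dataset", "name": "Poems", "author": [{"@id": "#jane-doe"}]},
    "#jane-doe": {"@type": "Person", "name": "Jane Doe"},
}


def author_names(entity_id: str) -> list[str]:
    """Resolve the `author` references of an entity to the authors' names."""
    refs = graph[entity_id].get("author", [])
    if isinstance(refs, dict):  # a single author may be given without a list
        refs = [refs]
    return [graph[ref["@id"]]["name"] for ref in refs]


print(author_names("./"))  # ['Jane Doe']
```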

Ontology

An information ontology is a conceptual map or dictionary that organizes concepts, each identified by an unambiguous URI, into a system through ontology relations. For example, such a system may include the concept of "author". If every use of this concept (e.g., in metadata descriptions) is marked as corresponding to the author concept of the given ontology, computational tools can recognize this property unambiguously and semantically connect the metadata with its interpretation. As a practical example, suppose a description refers to the authors of poems as "poets" and to the authors of short stories as "writers". If both the poet and the writer metadata are linked to the author concept of a specific ontology, a search for authors can find both poets and writers.
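The poet/writer example can be sketched in Python in the spirit of a JSON-LD `@context`, which maps local field names to ontology URIs. This is a simplified illustration, not a JSON-LD processor: the mapping to schema.org's `author` property, the records, and the helpers `expand` and `find_authors` are assumptions chosen for illustration.

```python
# Hypothetical local vocabulary: "poet" and "writer" are both mapped to the same
# ontology concept (schema.org's author property), much like a JSON-LD @context.
AUTHOR = "http://schema.org/author"
context = {"poet": AUTHOR, "writer": AUTHOR, "title": "http://schema.org/name"}

records = [
    {"title": "Autumn Sonnets", "poet": "A. Poet"},
    {"title": "Short Stories", "writer": "B. Writer"},
]


def expand(record: dict) -> dict:
    """Replace local field names with the ontology URIs they are mapped to."""
    return {context.get(key, key): value for key, value in record.items()}


def find_authors(items: list[dict]) -> list[str]:
    """Search on the ontology concept, so poets and writers are both found."""
    found = []
    for record in items:
        expanded = expand(record)
        if AUTHOR in expanded:
            found.append(expanded[AUTHOR])
    return found


print(find_authors(records))  # ['A. Poet', 'B. Writer']
```

A search on the local name "poet" alone would miss the short story; searching on the shared concept URI finds both.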

Metadata Schemas

A metadata schema describes a specific type of entity. It specifies which metadata can be assigned to the entity (e.g., a document may have an author, a creation date, etc.), the format of each field, which fields are mandatory, and which may have multiple values for the same entity. Metadata fields are disambiguated using ontologies.
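A metadata schema of this kind can be sketched as a small Python data structure plus a validator that checks mandatory fields, multiplicity, and value format, with each field tied to an ontology URI for disambiguation. The schema format, the `SchemaField` class, the `validate` function, and the chosen schema.org URIs are hypothetical and much simpler than real schema formats.

```python
from dataclasses import dataclass


@dataclass
class SchemaField:
    """One field of a hypothetical metadata schema."""
    concept: str          # ontology URI that disambiguates the field
    value_type: type      # expected format of each value
    required: bool = False
    multiple: bool = False


# Hypothetical schema for a "document" entity type.
DOCUMENT_SCHEMA = {
    "author": SchemaField("http://schema.org/author", str, required=True, multiple=True),
    "dateCreated": SchemaField("http://schema.org/dateCreated", str, required=True),
    "keywords": SchemaField("http://schema.org/keywords", str, multiple=True),
}


def validate(entity: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, field in schema.items():
        if name not in entity:
            if field.required:
                problems.append(f"missing required field '{name}'")
            continue
        value = entity[name]
        values = value if isinstance(value, list) else [value]
        if len(values) > 1 and not field.multiple:
            problems.append(f"'{name}' allows only one value")
        for v in values:
            if not isinstance(v, field.value_type):
                problems.append(f"'{name}' has wrong format: {v!r}")
    for name in entity:
        if name not in schema:
            problems.append(f"unknown field '{name}'")
    return problems


print(validate({"author": ["A", "B"], "dateCreated": "2024-01-01"}, DOCUMENT_SCHEMA))  # []
print(validate({"dateCreated": ["2024", "2025"]}, DOCUMENT_SCHEMA))
```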

Schema Registry

A schema registry is a repository that collects metadata schemas. Although it is not part of the RO-Crate vocabulary, it is worth mentioning as an external service. The ARP platform includes a schema registry based on CEDAR, from which, for example, schemas for files can be selected.
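To illustrate the role of a registry, here is a toy in-memory version in Python that stores schemas under unique identifiers and lets a caller select the ones that apply to a given entity type. It is not the CEDAR API or ARP's implementation; the class, its methods, the `appliesTo` key, and the example URIs are hypothetical.

```python
from typing import Dict, List


class SchemaRegistry:
    """Toy in-memory schema registry (hypothetical; not the CEDAR API)."""

    def __init__(self) -> None:
        self._schemas: Dict[str, dict] = {}

    def register(self, schema_id: str, schema: dict) -> None:
        """Store a schema under a unique identifier (e.g. a URI)."""
        if schema_id in self._schemas:
            raise ValueError(f"schema already registered: {schema_id}")
        self._schemas[schema_id] = schema

    def for_entity_type(self, entity_type: str) -> List[str]:
        """List identifiers of schemas that describe the given entity type."""
        return sorted(
            sid for sid, s in self._schemas.items() if s.get("appliesTo") == entity_type
        )

    def get(self, schema_id: str) -> dict:
        """Return the schema registered under `schema_id` (KeyError if absent)."""
        return self._schemas[schema_id]


registry = SchemaRegistry()
registry.register("https://example.org/schemas/image-file",
                  {"appliesTo": "File", "fields": ["encodingFormat", "author"]})
registry.register("https://example.org/schemas/survey-dataset",
                  {"appliesTo": "Dataset", "fields": ["author", "temporalCoverage"]})

# Selecting a schema for a file, as a user might in the annotation screens.
choices = registry.for_entity_type("File")
print(choices)  # ['https://example.org/schemas/image-file']
```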