Modeling Documents

date 2023-11-19 

Introduction

When asked how to model a document, you might think it is an InformationObject, and therefore an Individual. In fact it is, but for the most part it is modeled as a ClassOfInformationObject. This is explained below.

Discussion

The newspaper that you take from your mailbox in the morning, or evening, is clearly a PossibleIndividual. Having said that, it is equally clearly a member of a Class, simply because your newspaper has a lot of siblings in other mailboxes, and these siblings all have the same information content. It doesn't make much sense to define the information content for each individual newspaper, so instead we define the applicable ClassOfInformationObject.
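The idea of storing the content once, on the class, rather than on every copy can be sketched in a few lines. This is a minimal illustration, not an ISO 15926 implementation: the Python class, the copy identifiers, and the `content_of` helper are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassOfInformationObject:
    """Hypothetical stand-in: holds the information content once, for all member copies."""
    label: str
    content: str
    members: set[str] = field(default_factory=set)

    def add_member(self, copy_id: str) -> None:
        # Each physical copy (a PossibleIndividual) becomes a member of the class.
        self.members.add(copy_id)


# Hypothetical identifiers for three copies delivered to three mailboxes.
nyt = ClassOfInformationObject("NYT 2023-11-19", content="abcde12345")
for copy_id in ("copy-mailbox-1", "copy-mailbox-2", "copy-mailbox-3"):
    nyt.add_member(copy_id)


def content_of(copy_id: str, classes: list[ClassOfInformationObject]) -> str | None:
    """Find a copy's content through the class it belongs to; None if it has no class."""
    for cls in classes:
        if copy_id in cls.members:
            return cls.content
    return None


print(content_of("copy-mailbox-2", [nyt]))  # abcde12345
```

Adding a fourth mailbox only adds a member; the content itself is never duplicated.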

ISO 15926 has a complete set of interrelated entity types that together allow for a precise definition of such a document class.
Here is a model:

The information content is defined by any combination of instances of the entity types in the model, among them Language and RepresentationForm. For example, if there were a Korean edition of the New York Times, the Language would be KOREAN LANGUAGE and the RepresentationForm would be HANGUL.
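The combination of Language and RepresentationForm described above can be sketched as a value object whose identity depends on all of its parts. This sketch assumes only the two entity types named in the text plus the text itself; the full ISO 15926 model involves more. The `InformationContent` class and the "LATIN ALPHABET" label for the English edition are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationContent:
    """Hypothetical sketch: content as a combination of a Language,
    a RepresentationForm and the text itself."""
    language: str
    representation_form: str
    text: str


# Hypothetical editions; the text values are placeholders.
english = InformationContent("ENGLISH LANGUAGE", "LATIN ALPHABET", "...")
korean = InformationContent("KOREAN LANGUAGE", "HANGUL", "...")

# Frozen dataclasses compare field by field, so two editions that differ
# only in language and script are still different information content.
print(english == korean)  # False
```

Making the object frozen also makes it hashable, so each distinct combination can serve as a key when grouping document classes.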

The actual information content is shown only generically and needs some explanation. In ISO 15926-2, strings are modeled with EXPRESS_string, which has an attribute 'content'. That construct does not fit an RDF setting, so instead we use xsd:string. This actually isn't that much different, because the string is then written as "abcde12345"^^xsd:string: a literal whose value is a member of xsd:string, as defined in the W3C Recommendation "XML Schema Part 2: Datatypes Second Edition".
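To make the "abcde12345"^^xsd:string notation concrete, the sketch below serializes a Python string as an N-Triples literal with the full xsd:string datatype IRI. (In RDF 1.1 a literal written without a datatype is also an xsd:string.) The subject and predicate IRIs under example.org are hypothetical placeholders, not part of any ISO 15926 reference data library.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def string_literal(value: str) -> str:
    """Serialize a Python str as an N-Triples literal typed xsd:string."""
    # N-Triples requires backslash, double quote, LF and CR to be escaped;
    # the backslash must be escaped first so later escapes are not doubled.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^<{XSD_STRING}>'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement from two IRIs and a serialized object."""
    return f"<{subject}> <{predicate}> {obj} ."


# Hypothetical IRIs, for illustration only.
print(triple("http://example.org/NYT_2023-11-19_Content",
             "http://example.org/hasText",
             string_literal("abcde12345")))
```

In Turtle, with the xsd: prefix declared, the same literal can be written in the abbreviated form used in the text.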

The modeling of the information content uses classes such as Header, Paragraph, and Footer, with templates for the structure. This will be detailed in another blog post.

The ClassOfInformationObject is a subClassOf the class that represents the information content.

Finally, my copy of the NYT of Nov. 19th, 2023 is a member of NYT 2023-11-19.
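Together, the subClassOf statement and the membership statement let a reasoner conclude that my copy also carries the information content. The sketch below applies the RDFS entailment rule rdfs9 (x rdf:type C and C rdfs:subClassOf D imply x rdf:type D) over a transitive subClassOf chain. The ex: and dm: prefixes and the ex:NYT_2023-11-19_Content name are hypothetical. Note that the class-level typing of NYT 2023-11-19 as a ClassOfInformationObject is deliberately not inherited by the individual copy.

```python
from __future__ import annotations

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical triples with abbreviated IRIs.
triples: set[tuple[str, str, str]] = {
    ("ex:NYT_2023-11-19", RDF_TYPE, "dm:ClassOfInformationObject"),
    ("ex:NYT_2023-11-19", SUBCLASS_OF, "ex:NYT_2023-11-19_Content"),
    ("ex:MyNYT_2023-11-19", RDF_TYPE, "ex:NYT_2023-11-19"),
}


def superclasses(cls: str, graph: set[tuple[str, str, str]]) -> set[str]:
    """All classes reachable from cls via rdfs:subClassOf, including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def types_of(individual: str, graph: set[tuple[str, str, str]]) -> set[str]:
    """Direct rdf:type classes of an individual, expanded with rule rdfs9."""
    direct = {o for s, p, o in graph if s == individual and p == RDF_TYPE}
    result: set[str] = set()
    for cls in direct:
        result |= superclasses(cls, graph)
    return result


print(sorted(types_of("ex:MyNYT_2023-11-19", triples)))
# ['ex:NYT_2023-11-19', 'ex:NYT_2023-11-19_Content']
```

The copy is a member of the document class and, through subClassOf, of the information content class, but it is not itself a ClassOfInformationObject.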
