IESR Metadata Review February 2007

Ann Apps, 2007-02-09

Decisions on changes are given in italics at the top of each section.

Summary: Status of reviewed metadata

Metadata Changes in this Version

  • 1.1 Collection type vocabulary will become iesr:CollTypeList but retain the same terms
  • 1.4, 1.6 Encoding schemes UDC for subject and LCSH for spatial are no longer available
  • 1.7 itemType will have possible additional values ScholarlyText from Eprints Application Profile Type Vocabulary and Party from a new vocabulary iesr:ItemTypeList
  • 1.11, 2.9 seeAlso property added to Collection and Service, available to other users of the IESR Application Profile but not used by IESR
  • 2.1, 2.7 MXG added as a Service access method, with compliance levels included in the supportsStandard vocabulary
  • 2.3 New Service type / function list
  • 2.4 New property 'alternative' for Service, for OpenURL resolver only, to capture preferred link text
  • 2.6 Service 'output' dropped from IESR use
  • 4.1 Date modified may include a time, but IESR will use the date only
  • 4.2 New property 'status' added to administrative metadata

1 Collection Metadata

1.1 Collection Type

[Keep current vocabulary terms in an IESR vocabulary]

The DCMI Collection Description Type list has now changed to http://www.dublincore.org/groups/collections/colldesc-type/. It has values: CollectionDescription; AnalyticFindingAid; HierarchicFindingAid; IndexingFindingAid; UnitaryFindingAid. The NISO Metasearch Initiative list that the IESR Application Profile (AP) references is probably still current, but I suspect it is not stable (i.e. it will move to the DCMI one). Our current list has values: Catalogue; HierarchicFindingAid; Index. In the existing data several records are of type Catalogue and one is HierarchicFindingAid. Changing to the DCMI terms would confuse contributors and users who are not conversant with library/archive terminology.

Thus, I suggest we define an IESR vocabulary with the current terms. This will also provide later extensibility and flexibility.

There was a suggestion of adding Repository (or ScholarlyRepository) and Registry. But these are not really collection types. They could be viewed as just databases, ie. collection and catalogue respectively. In current usage the terms imply a system with certain functionality. This may be a composite service type that is made up of the various service interfaces, or functions, recorded in IESR. Capturing the composite system function is not currently possible in IESR and may be out of scope. Thus a repository or registry would be described as a collection or catalogue with: appropriate item types; various service functions; and further, finer-grained detail captured within the text of its title or description. For example, IESR itself is a collection of type Catalogue with item types Collection, Service and Party.

1.2 Collection Type

[Yes]

Add a comment in the AP about our conflation of catalogue / collection (from last review 1.6):

Note that theoretically a catalogue has records that are metadata (and therefore text). Thus, for example when describing a catalogue into an image collection, to indicate a Collection of type 'Catalogue' with 'itemType=Image' is strictly incorrect. In a perfect world the catalogue and the collection would be modelled as two separate collections with a 'describes' relation. However, to take a pragmatic approach, acknowledging that IESR is a practical application, such modelling would be too complicated for many users to understand. The information that they actually want to know is that the catalogue will provide them access to images. Therefore we decided to conflate the two collections and allow Collections with, eg. 'type=Catalogue' and 'itemType=Image'.

1.3 Collection Strength

[Don't include this property]

Inclusion of this property has been suggested, in particular in relation to vocabularies. It could use the NISO MI term subjectCompleteness, a repeatable term, whose value is a pair of subject (LCSH) and conspectus level (1-6). However it is not obvious how to capture a pair within the flat IESR structure. This would be yet another additional field to add to the work of contributors. Is it likely that discovery would be to this precision? I suggest this not be included at present (but it is recorded here to show it has been considered).

1.4 Subject

[Drop subject vocabulary UDC.]

We may have allowed too many subject vocabularies, but this is a decision difficult to reverse. However, UDC has not yet been used and I propose removing it.

1.5 Subject

[Yes]

Add a comment in AP about using subject terms from Use Cases 3.4a1, 4.1 and 5.1: granularity and using `words'.

1.6 Spatial

[Drop spatial vocabulary LCSH]

We may have allowed too many spatial vocabularies, but this is a decision difficult to reverse. However, LCSH has not yet been used, and is probably inappropriate, so I propose removing it.

1.7 Item Type

[Add 'ScholarlyText' from Eprints and 'Party' from an IESR vocabulary as possible values of itemType]

Some of the Use Scenarios imply additional item types are necessary. These are mainly bibliographic, except Use Scenario 1.5 which would expect music scores. Possibilities:

  • Allow the Eprints Application Profile Type Vocabulary as a scheme. This would give the bibliographic item genres (but not music, though I expect they'll need it also...). It doesn't include bibliographic citation as a type, but maybe `collection type = catalogue' and `item type = scholarly text' is equivalent. This list has not been formally endorsed, thus it may not be stable (and resource genre is a slippery slope!).
  • Allow just the single top-level term `scholarly text' from the Eprints Type Vocabulary. This seems a better option than the previous one to keep the list of possible item types at a manageable level for data entry.
  • MODS/MARC has a genre list at http://www.loc.gov/marc/sourcecode/genre/genrelist.html.
  • MESH includes bibliographic terms such as `journal article' but again not music scores.
  • Introduce a collection type of ScholarlyRepository and stick with DCMIType only as a scheme for itemType. This is no longer proposed - see 1.1

There is also a need for an Agent or Party type, e.g. IESR has records of this type. This suggests adding another optional encoding scheme for itemType, an IESR-defined one, with initially a single term 'Party'. Type 'Party' is already included in Agent records supplied by IESR in simple Dublin Core.
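The decisions in 1.1 and 1.7 amount to a small set of controlled vocabularies that a validator could check. The sketch below is illustrative only, not IESR or Editor code: `dcterms:DCMIType`, `iesr:CollTypeList` and `iesr:ItemTypeList` are taken from this review, while `eprints:Type` is a hypothetical token standing in for whatever the AP assigns to the Eprints Type Vocabulary, and `check_collection` is a hypothetical helper.

```python
# Illustrative sketch of the 1.1 and 1.7 vocabularies as lookup tables.
# "eprints:Type" is a hypothetical scheme token, not one defined by the AP.
from typing import Iterable, Tuple

# 1.1: current terms retained in an IESR vocabulary.
COLL_TYPE_LIST = frozenset({"Catalogue", "HierarchicFindingAid", "Index"})

# 1.7: permitted itemType encoding schemes and their terms.
ITEM_TYPE_SCHEMES = {
    "dcterms:DCMIType": frozenset({
        "Collection", "Dataset", "Event", "Image", "InteractiveResource",
        "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
        "StillImage", "Text",
    }),
    # Only the single top-level Eprints term is admitted.
    "eprints:Type": frozenset({"ScholarlyText"}),
    "iesr:ItemTypeList": frozenset({"Party"}),
}


def check_collection(coll_type: str, item_types: Iterable[Tuple[str, str]]) -> None:
    """Raise ValueError if the collection type or any (scheme, value) pair is not allowed."""
    if coll_type not in COLL_TYPE_LIST:
        raise ValueError(f"unknown collection type: {coll_type}")
    for scheme, value in item_types:
        allowed = ITEM_TYPE_SCHEMES.get(scheme)
        if allowed is None:
            raise ValueError(f"unknown itemType scheme: {scheme}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a term in {scheme}")
```

For example, IESR itself (type Catalogue with item types Collection, Service and Party) passes: `check_collection("Catalogue", [("dcterms:DCMIType", "Collection"), ("dcterms:DCMIType", "Service"), ("iesr:ItemTypeList", "Party")])`.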

1.8 Extent

[Defer until next Review. Await examples of real use]

The units or type of content of this field are not defined for machine reading. The value could be a number of items or a storage size. I suggest adding an encoding scheme to indicate either the number of items or the storage size in kilobytes, megabytes or gigabytes. If no encoding scheme is given, the value is free text, so all existing values remain valid. The Editor will need extending to implement this. (DCMI Collection Description were intending to consider this issue, but it appears to be no longer on their list.)
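A minimal sketch of how a consumer might interpret an extent value under such a scheme, assuming hypothetical scheme tokens (`iesr:ExtentItems`, `iesr:ExtentKB`, `iesr:ExtentMB`, `iesr:ExtentGB`); the review does not name them, and this proposal is deferred. Storage sizes are normalised to megabytes, and a value with no scheme is passed through as free text.

```python
# Sketch only: scheme tokens below are hypothetical placeholders.
from typing import Optional, Tuple, Union

ITEM_COUNT_SCHEME = "iesr:ExtentItems"
# Factor converting each storage unit to megabytes (1 MB = 1024 KB).
STORAGE_TO_MB = {"iesr:ExtentKB": 1 / 1024, "iesr:ExtentMB": 1.0, "iesr:ExtentGB": 1024.0}


def interpret_extent(value: str, scheme: Optional[str] = None) -> Tuple[str, Union[str, int, float]]:
    """Return (kind, normalised value) for an extent statement."""
    if scheme is None:
        # No encoding scheme: free text, so every existing value stays valid.
        return ("text", value)
    if scheme == ITEM_COUNT_SCHEME:
        return ("items", int(value))
    if scheme in STORAGE_TO_MB:
        return ("megabytes", float(value) * STORAGE_TO_MB[scheme])
    raise ValueError(f"unknown extent scheme: {scheme}")
```

Keeping the unscheme'd case as plain text is what makes the change backwards compatible with the existing records.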

1.9 Dates

[Wait for DC-Date proposals.] (Deferred from last review)

Outstanding issues: B.C.E., geological, approximate and questionable dates.
We should wait for the DC-Date Working Group to make proposals on these. Until then we use some ad hoc guidelines.

1.10 Education Level

[Add suggested comment to AP. No change to property values]

The only vocabulary we have is UK-MEG which may not be stable and is very UK-specific. A comment in the AP should indicate that this encoding scheme is optional outside the UK.

It has been suggested that ideally we should use URIs for the values rather than textual tokens (UK Education Level n), with scheme `dcterms:URI' rather than `iesr:UKEL'. This would require a change to the existing data. Our current usage is consistent with linked documentation about UKEL.

1.11 See Also

[Add property to AP, with comment that it is not used by IESR]

An additional property that could capture arbitrary further information and properties about a collection relevant for a particular domain or application. This field will not be used by IESR or available through the IESR Editor, although it may be imported in harvested data.

2 Service Metadata

2.1 Service Access Method (Protocol)

[Add MXG as a Service protocol]

Add to Access Method Vocabulary: MXG - NISO Metasearch XML Gateway (http://www.niso.org/standards/resources/RP-2006-02.pdf). This also needs compliance levels (Level 1, Level 2, Level 3), which can be included in supportsStandard (see 2.7).

2.2 Service Access Method (Protocol)

[No]

There was a request from a stakeholder for a `client/server' protocol. This would reflect that the locator URL is a jumping off point for an application. However, as far as a user, and certainly a machine user, is concerned, the locator URL is simply a web page. It has no functionality until a further action is selected. It appears that this application cannot be called directly without clicking on the web page, so it is not a machine interface.

2.3 Service Type

[Add suggested list]

The current list needs revision and expansion. Several items are really system (composite service) types (those ending in `registry'). Only one term is currently used in IESR: `Alert' for Zetoc. I suggest we revise this list. I also suggest we relabel it `service function' (as used by ISO 2146).

We could make use of the registry services functions list proposed for ISO 2146 (There doesn't appear to be a standard to reference yet, but they are listed in http://www.nla.gov.au/nla/staffpaper/2005/pearce1.html). This list is:
  • Common Services: Authenticate; Authorise; Pay
  • Metadata Services: Contribute; Save; Alert; Harvest
  • Discovery Services: Find; Locate; Request
  • Delivery Services: Resolve; Supply; Lend; Reserve
  • User Services: Register; Ask; Personalise; Monitor

There is also a much longer, finer-grained (in places) service genre list within the e-Framework, but many items are very e-learning specific (and the length would probably confuse contributors and users): http://www.e-framework.org/Services/Genres/ServiceGenreRegistry/tabid/655/Default.aspx

However, we may need some additions to the ISO 2146 list. Terminology is on our current list; e-Framework has `Map terms', the generalisation of which would be Map. Use Case step 6.1a1 (and its scenario) looks for a `format validation' service, the generalisation of which would be Validate (the Use Case could find `format' in the description). e-Framework has `Validate Courses', but that is too specific. Some others from e-Framework which seem significant: Annotate; Archive; Rate (which would include assess, classify and recommend); Translate.

This makes a suggested list (which would be annotated with provenance in documentation):

  • Alert
  • Annotate
  • Archive
  • Ask
  • Authenticate
  • Authorise
  • Contribute
  • Find
  • Harvest
  • Lend
  • Locate
  • Map
  • Monitor
  • Pay
  • Personalise
  • Rate
  • Register
  • Request
  • Reserve
  • Resolve
  • Save
  • Supply
  • Translate
  • Validate

Although it would be preferable to use an existing standard list, there does not seem to be a suitable one. The advantage of maintaining our own list is that it gives us flexibility for future extensibility.

This property will remain repeatable, though in most cases only one option would be used.

Some service types could be filled automatically by the Editor for particular service protocols:
  • ftp: Request; Contribute
  • ldap: Locate
  • oai-pmh: Harvest; Request
  • opengis: Find
  • opensearch: Find
  • openurl: Locate
  • rss: Alert
  • rsync: Request
  • soap: Request
  • sru: Find
  • srw: Find
  • webcgi: Find
  • z3950: Find
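The protocol defaults listed above could be held as a simple lookup table that the Editor consults when a service is first entered. This is a sketch under stated assumptions, not the Editor's implementation: protocol tokens are written as in this review, `default_service_functions` is a hypothetical name, and the result is only a pre-filled suggestion that the Contributor may change.

```python
# Sketch: suggested service function list (2.3) and per-protocol defaults.
from typing import Dict, List, Tuple

SERVICE_FUNCTIONS = frozenset({
    "Alert", "Annotate", "Archive", "Ask", "Authenticate", "Authorise",
    "Contribute", "Find", "Harvest", "Lend", "Locate", "Map", "Monitor",
    "Pay", "Personalise", "Rate", "Register", "Request", "Reserve",
    "Resolve", "Save", "Supply", "Translate", "Validate",
})

PROTOCOL_DEFAULTS: Dict[str, Tuple[str, ...]] = {
    "ftp": ("Request", "Contribute"),
    "ldap": ("Locate",),
    "oai-pmh": ("Harvest", "Request"),
    "opengis": ("Find",),
    "opensearch": ("Find",),
    "openurl": ("Locate",),
    "rss": ("Alert",),
    "rsync": ("Request",),
    "soap": ("Request",),
    "sru": ("Find",),
    "srw": ("Find",),
    "webcgi": ("Find",),
    "z3950": ("Find",),
}

# Every default must be a term in the service function list.
assert all(f in SERVICE_FUNCTIONS for fs in PROTOCOL_DEFAULTS.values() for f in fs)


def default_service_functions(protocol: str) -> List[str]:
    """Pre-filled service functions for a protocol; empty if there is no default."""
    return list(PROTOCOL_DEFAULTS.get(protocol.lower(), ()))
```

Protocols without an entry (for example a plain web page) get no default, leaving the choice entirely to the Contributor.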

Correlation with other digital library work in this area:
  • Obtain; Get: Request
  • Put: Contribute
  • Search: Find, and in some cases also Request (eg. Z39.50 Search/Retrieve)
  • Publish-Subscribe: Alert

2.4 Alternative (Title)

[Add property 'alternative' for OpenURL resolvers]

New property, specifically for OpenURL resolver, at least initially. This will capture a preferred `link text' to be used when displaying an OpenURL link to the resolver. An image button can be specified using logo. Occurrence: min 0, max 1. Indicated by Use Scenario 2.3.

It has been suggested that vendor and version of the software that implements an OpenURL Resolver is useful information. However capturing this detail seems to be beyond the scope of IESR. It could be included in the description, or in seeAlso if machine-readable XML.

2.5 Interface information for OAI-PMH services

[Add interface property for OAI-PMH]

The PerX comments suggested that we should provide more description of OAI-PMH services. We've previously been of the opinion that users could simply make an Identify request on the service. But if the user is human this involves extra steps in constructing this request from information about the service currently on our website. I suggest we introduce two `interface' properties for OAI-PMH services: the value of the first will be an Identify request to the service and the value of the second a ListMetadataFormats request. Both will be constructed automatically by IESR when the service details are entered in the Editor. This should give sufficient information. We could include details like sets support and metadata formats in IESR, but this seems like redundant information (unless there were a real use case indicating a discovery requirement). This should also satisfy Use Case step 1.4b1a1, because a using application could retrieve the ListMetadataFormats response via this interface to determine whether the Eprints AP metadata format is supported.
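A sketch of both halves of this proposal, assuming only the standard OAI-PMH 2.0 verbs and response namespace: building the two `interface' values from a service's base URL, and letting a using application read the metadataPrefix values out of a ListMetadataFormats response. The function names are hypothetical, and the example base URL is a placeholder.

```python
# Sketch: construct OAI-PMH interface URLs and read supported metadata prefixes.
import xml.etree.ElementTree as ET
from typing import Dict, Set
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_pmh_interface_urls(base_url: str) -> Dict[str, str]:
    """Build the Identify and ListMetadataFormats request URLs for a base URL."""
    sep = "&" if "?" in base_url else "?"
    return {
        verb: f"{base_url}{sep}{urlencode({'verb': verb})}"
        for verb in ("Identify", "ListMetadataFormats")
    }


def supported_prefixes(list_metadata_formats_xml: str) -> Set[str]:
    """Extract metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(list_metadata_formats_xml)
    return {
        el.text.strip()
        for el in root.findall(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
        if el.text
    }
```

An application following Use Case step 1.4b1a1 would fetch the ListMetadataFormats URL and test whether the prefix it needs is in the returned set; which prefix the Eprints AP format uses is not stated here.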

2.6 Output

[Drop Output from IESR use]

This was introduced to capture output from services such as webcgi and OpenURL, primarily to distinguish those that produce XML. This detail is now captured for webcgi in the WSDL file. We do not have a means of capturing it for OpenURL but this is part of a wider issue that probably would need more than this simple `output' field (see below) (the naive presumption is that OpenURL returns an HTML page). Within existing IESR data this field has been used mainly by webpage services to indicate their output is HTML (the vast majority) or XHTML (a small minority). There have previously been requests to describe the output formats of Z39.50 and more recently metadata formats for OAI-PMH (see 2.5), but these are captured within the `interface'. Only a real use case for discovery of output type would indicate a need for a specific property (and it is not currently a searchable property anyway). I suggest this property be dropped for IESR. It will remain in the Application Profile in case it is used by other users of the IESR schema.

The OpenURL standard does not define anything about the expected response from a resolver, specifically stating such detail as being out of scope, the standard not being a protocol. There is currently some discussion, including within the OpenURL Maintenance Agency Advisory Committee, about whether and how to capture such information. There are continuing requests for a standard response format, which indicates a need. If and when such a profile were defined, IESR could point to conformant records via the interface property.

2.7 SupportsStandard

[Make changes to vocabulary]

Some of the vocabularies contain entries that are probably not used. The lists may be confusing for data entry.

  • OAI-PMH: retain oai-pmh-2_0 only. Set this as default in the Editor (but allow Contributor to delete).
  • RSS: collapse all versions 0.9x into a single entry (rss-0_9x). Include Atom as an option (http://tools.ietf.org/html/rfc4287).
  • MXG: add compliance levels (see 2.1)
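The two vocabulary changes above could be applied to existing values with a small normalisation step. This is a sketch under loudly stated assumptions: only `oai-pmh-2_0` and `rss-0_9x` are named in this review, so the legacy RSS tokens (`rss-0_91`, `rss-0_92`, ...) and `rss-1_0` are hypothetical, as are the function names.

```python
# Sketch: collapse RSS 0.9x versions and pre-fill the OAI-PMH default.
# Legacy token pattern rss-0_9<digits> is an assumption.
import re
from typing import Iterable, List

_RSS_09X = re.compile(r"^rss-0_9\d+$")


def normalise_supports_standard(values: Iterable[str]) -> List[str]:
    """Map any RSS 0.9x version to rss-0_9x, dropping duplicates, order preserved."""
    out: List[str] = []
    for v in values:
        token = "rss-0_9x" if _RSS_09X.match(v) else v
        if token not in out:
            out.append(token)
    return out


def editor_default_supports_standard(protocol: str) -> List[str]:
    """Pre-filled value for a new service; the Contributor may delete it."""
    return ["oai-pmh-2_0"] if protocol == "oai-pmh" else []
```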

2.8 Access Control

[Add to vocabulary when needed]

Shibboleth version 2.0 is expected in about six months' time. There will probably be a need to distinguish Shibboleth versions. The easiest way to do this will be to add a new term `shibboleth2_0' to the AccessCtrl vocabulary when it becomes available.

2.9 See Also

[Add property to AP, with comment that it is not used by IESR]

Add a further option with no defined encoding scheme (i.e. scheme `dcterms:URI'). This could capture arbitrary further information and properties about a service relevant to a particular domain or application. This field will not be used by IESR or available through the IESR Editor, although it may be imported in harvested data.

3 Agent Metadata

3.1 Email

[Make change in AP. This has already been implemented in the IESR Editor and the Web interface]

Currently this is required for a service administrator, which in practice means it is effectively required for all agents. However, some agents do not have public contact emails, although service administrators probably do have a web page feedback form. It is proposed to make Agent email optional and to allow URI as a data type in addition to email. The label in the Web interface and in the Editor will become `Agent Contact' (was Agent Email).

This affects Use Scenario 6.1, if the agent has no registered contact email. Thus Contributors should still be encouraged to register an Agent contact email where possible. If a service such as Use Scenario 6.1 is run by IESR, IESR will use the Contributor contact email to report a problem if there is no Service Administrator email. Services external to IESR will be able to contact IESR only, with the presumption that IESR staff will forward the report to the Contributor contact email.
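A sketch of how the optional Agent Contact (email or URI) and the reporting fallback described above might be handled by a checking service. The email pattern is a deliberately loose illustration rather than a full RFC 5322 check, and `classify_contact` and `problem_report_address` are hypothetical names.

```python
# Sketch: classify an Agent Contact and choose where to send a problem report.
import re
from typing import Optional
from urllib.parse import urlparse

# Loose illustrative pattern: no whitespace, no ':' before '@', a dot in the domain.
_EMAIL = re.compile(r"^[^@\s:]+@[^@\s]+\.[^@\s]+$")


def classify_contact(value: Optional[str]) -> Optional[str]:
    """Return 'email', 'uri' or None (no contact registered)."""
    if value is None or not value.strip():
        return None  # Agent contact is now optional
    v = value.strip()
    if _EMAIL.match(v):
        return "email"
    parsed = urlparse(v)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"  # e.g. a web page feedback form
    if parsed.scheme == "mailto" and _EMAIL.match(parsed.path):
        return "uri"
    raise ValueError(f"not an email address or URI: {value!r}")


def problem_report_address(admin_contact: Optional[str], contributor_email: str) -> str:
    """Use the Service Administrator email if there is one, else the Contributor contact email."""
    if admin_contact is not None and classify_contact(admin_contact) == "email":
        return admin_contact.strip()
    return contributor_email
```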

3.2 See Also

[Add comment to AP, plus a comment that it is not used by IESR]

Add a comment in AP about its purpose: that it could capture arbitrary further information about an agent relevant to a particular domain or application. This field will not be used by IESR and will no longer be available through the IESR Editor (it isn't currently used), although it may be imported in harvested data.

4 Administrative Metadata

4.1 Date Modified

[Make change to AP]

Allow the inclusion of time in the date value in the general case. Restrict to date only for IESR use. This was requested by aDORe.
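The difference between the general AP and IESR use can be expressed as two parsing rules. A minimal sketch, assuming ISO 8601 values of the form `YYYY-MM-DD` or `YYYY-MM-DDThh:mm:ss`; `parse_date_modified` is a hypothetical helper built on Python's standard `datetime` module.

```python
# Sketch: date-only for IESR, date or date-with-time for the general AP.
from datetime import date, datetime
from typing import Union


def parse_date_modified(value: str, iesr: bool = True) -> Union[date, datetime]:
    """Parse a date modified value under IESR (date only) or general AP rules."""
    if iesr:
        # date.fromisoformat rejects a value that carries a time part.
        return date.fromisoformat(value)
    try:
        return date.fromisoformat(value)
    except ValueError:
        return datetime.fromisoformat(value)
```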

4.2 Status

[Add this property and a vocabulary]

A new property, to indicate currency of a record. Suggested values:

  • Created - new record
  • Approved - checked by IESR staff
  • Updated - edited, or checked as OK, by Contributor
  • ForReview - Contributor has been asked to review record
  • Deprecated - not updated for a time period after request for review (6 months?)
  • Harvested - created by machine ingest

In all cases `date modified' will show when the latest action happened.

For IESR: a single value from the above list and mandatory. However this may be revised in the future if a need for another vocabulary becomes apparent, e.g. for capturing some authority stamp.
In general: optional, repeatable and other values allowed.

Existing IESR records will be set to `updated'. All values will be generated automatically by the Editor, except `Approved' which will require an IESR admin setting (and Updated will need a Contributor to at least enter the Editor and click `Save'). `Harvested' status will not change unless a record is changed by manual updating.
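The status rules above can be sketched as a small record of administrative metadata in which every action sets a status and updates `date modified'. Assumptions, flagged here and in the comments: the action names are hypothetical, and the deprecation period uses 183 days because the review only tentatively suggests six months.

```python
# Sketch of the proposed status lifecycle for IESR use (single, mandatory value).
from dataclasses import dataclass
from datetime import date, timedelta

STATUS_VALUES = frozenset({"Created", "Approved", "Updated", "ForReview", "Deprecated", "Harvested"})

# Hypothetical action names mapped to the status each one sets.
ACTION_STATUS = {
    "create": "Created",            # new record
    "approve": "Approved",          # set by an IESR admin
    "contributor_save": "Updated",  # Contributor enters the Editor and clicks Save
    "request_review": "ForReview",  # Contributor asked to review the record
    "harvest": "Harvested",         # machine ingest
}

# Assumption: the review only suggests "6 months?" for deprecation.
REVIEW_PERIOD = timedelta(days=183)


@dataclass
class AdminMeta:
    status: str
    date_modified: date

    def _set(self, status: str, on: date) -> None:
        if status not in STATUS_VALUES:
            raise ValueError(f"unknown status: {status}")
        self.status = status
        self.date_modified = on  # date modified always shows the latest action

    def record_action(self, action: str, on: date) -> None:
        self._set(ACTION_STATUS[action], on)

    def expire_if_unreviewed(self, today: date) -> None:
        """Deprecate a record left unreviewed for longer than REVIEW_PERIOD."""
        if self.status == "ForReview" and today - self.date_modified > REVIEW_PERIOD:
            self._set("Deprecated", today)
```

A Harvested record keeps its status until a manual update, which in this sketch is the `contributor_save` action.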

This property is needed as part of the QA procedure. It is also indicated by Use Case steps 1.4e1, 7.6 and 8.4.

Other values from other vocabularies could be added by other users of the IESR metadata, eg. appropriate, suggested, reviewed, peer reviewed, local collection type rating. It could also indicate that a record has been `signed' as valid, official or appropriate by some authority, but such usage is outside the scope of IESR.

4.3 Source

[Make this change]

Change the use of this property (see 4.4) and remove it from population by Contributors. There are currently no instances in the data.

4.4 Source

[Make this change]

Use this field to capture provenance information for harvested data. For data gathered by OAI-PMH this will be an XML record according to the OAI-PMH provenance XML schema. It will be retrievable via an IESR URI disseminated as the value of `dc:source', except for OAI-PMH records where it will be disseminated as provenance. Occurrence will become: min 0, max 1.
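For orientation, a sketch of building such a record with Python's standard ElementTree, following the OAI provenance container (`provenance` wrapping an `originDescription` with `harvestDate` and `altered` attributes and `baseURL`, `identifier`, `datestamp` and `metadataNamespace` children). The function name and example values are hypothetical, and nested `originDescription` elements for multi-hop provenance are not handled.

```python
# Sketch: a single-hop OAI-PMH provenance record for a harvested item.
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"
ET.register_namespace("", PROV_NS)  # serialise with a default namespace


def provenance_record(base_url: str, identifier: str, datestamp: str,
                      metadata_namespace: str, harvest_date: str,
                      altered: bool = False) -> str:
    """Return provenance XML describing where and when a record was harvested."""
    root = ET.Element(f"{{{PROV_NS}}}provenance")
    origin = ET.SubElement(root, f"{{{PROV_NS}}}originDescription", {
        "harvestDate": harvest_date,
        "altered": "true" if altered else "false",
    })
    for tag, text in (("baseURL", base_url), ("identifier", identifier),
                      ("datestamp", datestamp), ("metadataNamespace", metadata_namespace)):
        ET.SubElement(origin, f"{{{PROV_NS}}}{tag}").text = text
    return ET.tostring(root, encoding="unicode")
```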

5 Application Profile

[All these changes to the AP will be made]

  • Update AP to show standard definitions and comments, eg DC ones, as well as IESR-specific ones.
  • Check and update AP with corrections. In particular:
    • Show that admeta is not contained within an entity. (This is an error in the AP left over from the original definition.)
    • Webcgi service arguments are now captured as WSDL
    • Some links need updating (eg. all DCMI Collection Description AP is now on DCMI website)
  • Make AP less restrictive on required properties to allow its use for other purposes (eg aDORe). Add another field to show IESR-specific requirements. Fields no longer required for general use, but mandatory for IESR:
    • Collection: subject, owner
    • Service: accessRights, administrator
    • Agent
    • Administrative: status (see 4.2)
  • Add comments to indicate mapping from some IESR properties, in particular RSLP ones, to DCMI Collection Description AP.

6 XML Descriptions

[No change to IESR XML. DC-Text and DC-in-XML examples will be produced]

Deferred in last review: The IESR XML format may need updating to be consistent with proposed new guidelines for DC-in-XML that conform to the DC Abstract Model.

It seems better not to update the IESR XML format because of the impact on existing users that such a drastic change would have.

It would be instructive to create an example DC-Text description, and possibly an example DC-in-XML version to illustrate the mapping. Although DC-in-XML is almost specified it has not yet been finally endorsed.

A future possibility could be to offer alternative formats, eg oai_iesr and oai_iesr_new.

7 IESR Interfaces

[These will be added to the development list]

  • Add a separate Dewey search box in the Web interface to allow searching specifically using Dewey. The other subject box will search all subject terms as now.
  • For Z39.50 have a separate index for Dewey (Bib-1 13) (Use Case step 1.3)
  • Add an OpenSearch interface

[This has been changed to 200 after advice from oai-implementers listserv and experiments]

  • The OAI-PMH interface currently returns a maximum of 25 records at a time. Should this be increased?

[These are noted for future development]

  • Add subject `names' to codes (eg Dewey, JACS) in internal metadata, so that searching using `words' will find records with Dewey terms
  • Create a `subject' dictionary and provide a select list for searching

8 Other IESR Services

[Add to software development list]

  • Provide an identifier discovery service. Possibly augment the OpenURL service to return identifier from title and entity type (and maybe service protocol). Suggested by Use Case step 4.1d1 and PerX comments. Possibly the service would return the entire entity record rather than just the identifier.
  • Alert / notification of changed records (from PerX comment about using Web Services). This would notify only significant changes, such as a new service locator address.
  • Provide an annotation service. This would allow users to provide overlay tagging on top of the canonical IESR data, such as relevance and quality indicators. (This will require investigation into design.)