Interopérabilité: 2007

Friday, December 07, 2007

Microsoft and ISO - how to destroy the ISO process

Report from the SC34 convener: Martin Bryan is stepping down (after his three-year term) as convener of ISO/IEC JTC1 SC34, the scene of the OOXML debacle. He says:

This year WG1 have had another major development that has made it almost impossible to continue with our work within ISO. The influx of P members whose only interest is the fast-tracking of ECMA 376 as ISO 29500 has led to the failure of a number of key ballots.

(ECMA 376 is OOXML), and

The days of open standards development are fast disappearing. Instead we are getting “standardization by corporation”, something I have been fighting against for the 20 years I have served on ISO committees.

So, Microsoft is single-handedly responsible for destroying the standardization efforts of numerous engaged individuals and organizations, as well as the work in an important international standardization organization. I can't find words for the disappointment I feel. Peter Kopelman, take some responsibility for the business ethics of your company!

Friday, November 30, 2007

Semanticizing metadata specifications

Jason Wrage asks:

Would you mind taking a moment to summarize the process of making a specification semantically compatible? I assume that this might entail development of a vocabulary and embedding RDF within the target specification?

That is an excellent question, and something I've spent a few years contemplating. To begin with, there is a huge difference between designing a specification semantically from scratch, and "semanticizing" it after the fact. In general, it depends a lot on the specification at hand, and in particular on things like:

Is the specification based on some form of vocabulary-independent abstract model?
Is the specification expressed in some kind of modeling language (UML etc.)?
Are the entities in the specification explicit?
How does the specification handle identity for the metadata terms?

and so on. I have experience with semanticizing IEEE LOM, and the answers to the above in the LOM case are:

Not explicitly - but the LOM tree structure is almost an abstract model.
No
No - there are many entities in the model that are not explicit (The Educational category/entity is a major issue)
Tree-based identification such as General.Title

Based on the above, one can start to see the issues:

Tree-based and semantic models don't fit well together. We will have to disassemble the tree to semanticize it, and then reconstruct it afterwards.
No UML model means no alternative to the tree view, so we need to base our decisions on the tree directly.
We will have major headaches trying to identify the entities.
We will need to make sure that information about the position in the hierarchy is preserved when introducing new properties. Compare General.Description and Educational.Description: very different semantics.

I wrote at length about the process here. The general method for LOM was:

Isolate properties and objects. The first step involves extracting an object-oriented view of the LOM data model. What LOM elements are objects, and which are relations between objects? This sounds relatively easy, but it's in effect the core of the semantic translation.
Find related Dublin Core elements and encodings. For the LOM case, it was very important to try to reuse existing vocabulary. After having found the relevant Dublin Core elements, the precise relation to the Dublin Core element needed to be defined. There are essentially four ways in which a LOM element might be related to Dublin Core:

By being identical to some Dublin Core Element.
By being a sub-property (=refinement) of a Dublin Core Element.
By being a super-property of a Dublin Core Element.
By using literal values that could be specified using a Dublin Core Syntax Encoding Scheme.

Define an RDF vocabulary matching your model.
Make the RDF namespaces available on the web, following vocabulary publishing guidelines.
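The first steps of the method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LOM` namespace, the property names and the "identical"/"distinct" relations in `MAPPING` are made up for the example and are not the actual LOM-to-Dublin-Core mapping.

```python
# Hypothetical namespaces; the LOM one is invented for this sketch.
DC = "http://purl.org/dc/terms/"
LOM = "http://example.org/lom#"

# Assumed mapping from LOM tree paths to RDF properties.
# Each entry: tree path -> (property URI, relation to Dublin Core).
MAPPING = {
    "General.Title": (DC + "title", "identical"),
    "General.Description": (DC + "description", "identical"),
    # Same leaf name, different position in the tree, so a different property.
    "Educational.Description": (LOM + "educationalDescription", "distinct"),
}

def semanticize(resource_uri, lom_record):
    """Turn a flat {tree path: value} record into (subject, property, value) triples."""
    triples = []
    for path, value in lom_record.items():
        prop, _relation = MAPPING[path]
        triples.append((resource_uri, prop, value))
    return triples

record = {
    "General.Description": "An introductory text on graph theory.",
    "Educational.Description": "Use as self-study material before the first lecture.",
}
triples = semanticize("http://example.org/resource/1", record)
```

The point of the sketch is that the two Description elements end up as two different properties, so the information carried by the position in the tree is not lost.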

Nowadays, there are a few additional steps that might be interesting.

The Dublin Core Description Set Profile model allows for the construction of application profiles of RDF data, promising syntactic validation of Dublin Core metadata. This is otherwise something that many people miss when going from XML to RDF. A general RDF equivalent is something Alistair Miles has written about.
GRDDL support in your XML formats will allow semantic web clients to extract RDF information from your XML data. With the above vocabularies, such data can be of high quality.
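To make the GRDDL idea concrete: GRDDL itself works by pointing at a transformation (typically XSLT) that turns the XML into RDF. The sketch below stands in for such a transformation using plain Python instead of XSLT, so it shows the kind of extraction involved rather than GRDDL proper. The XML format, its namespace and the `about` attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML metadata format, invented for this sketch.
NS = "http://example.org/simple-md#"
XML_DOC = """<record xmlns="http://example.org/simple-md#" about="http://example.org/doc/1">
  <title>Intro to RDF</title>
  <creator>A. Author</creator>
</record>"""

def extract_triples(xml_text):
    """Return one (subject, property URI, literal) triple per child element of the record."""
    root = ET.fromstring(xml_text)
    subject = root.get("about")
    triples = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}local-name".
        local_name = child.tag.split("}", 1)[1]
        triples.append((subject, NS + local_name, child.text))
    return triples

triples = extract_triples(XML_DOC)
```

With a published vocabulary behind the property URIs, the extracted triples carry well-defined semantics instead of being just renamed XML elements.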

I'm sure there are more things as well. See also the articles linked from this page. Not sure this is a summary, but still...

Thursday, November 29, 2007

Copyriot » Some notes on General Rights Management

The most interesting thing I've read about copyright, DRM, piracy etc. in a long time: Copyriot » Some notes on General Rights Management

Tuesday, November 20, 2007

Putting REST into perspective

From T.V. Raman at the W3C Tech Plenary: XML Applications: 2^W: The Second Coming Of The Web

Where Web 1.0 was about bringing useful content to the Web, Web 2.0 is about building Web artifacts out of Web parts. URLs play a central role in enabling such re-use --- notice that a necessary and sufficient condition for something to exist on the Web is that it be addressable via a URL.

The post puts the REST design principles and the importance of Web Architecture in perspective.

Sunday, November 11, 2007

The XO laptop is out

Buy one for $399, and one is given to a child in the developing world. One Laptop Per Child -- XO Giving Campaign starts tomorrow, and lasts for two weeks.

Explaining knowledge representation

Hmm, finally an interesting introduction to knowledge representation: How To Tell Stuff To A Computer - The Enigmatic Art of Knowledge Representation

Are SSD drives worth the money?

Interesting test run by Engadget: Samsung's 64GB SSD: better, faster, stronger - Engadget. Conclusion? Definitely better in many scenarios, but not really worth the cost at this point.

Friday, November 09, 2007

Standardizing semantic technologies for learning

(In preparation for the JISC CETIS conference)

As noted by the organizers, the educational technology field is still not very mature when it comes to semantic technology applications. I'd like to define three classes of applications, seen through the lens of semantic technology:

Applications that rely on non-semantic technologies (typically, XML) for interoperability (syntactic interoperability)
Applications that work with RDF data for interoperability without caring too much about complex reasoning abilities (semantic interoperability)
Applications that live on the “semantic web”, and use distributed vocabularies and ontologies, and use reasoning capabilities to enhance the user experience (semantic “cooperation”)

It's clear to me that most applications that care about interoperability still fall into Class 1. From where I sit, there are several possible explanations for this situation:

Lack of semantics in the base standards used in the educational technology field, such as IEEE LOM, IMS Learning Design, IMS QTI, ADL SCORM etc. (thus, Class 1)
Focus on LMSs and other “silo”-like vertical systems that feel they have little need for semantic interoperability (thus, not Class 2)
Skepticism based on bad experiences with “intelligent tutoring systems” and other attempts at replacing teachers with intelligent machines (thus, not Class 3)

While it's understandable that Class 3 systems are still not ubiquitous due to their complexity, it still surprises me greatly that so few Class 2 systems are developed. In 2001, I wrote a piece for CETIS on the topic: “The semantic web: How RDF will change learning technology standards”, where I discussed the benefits of Class 2 systems for educational technology. The visions described in that article have still not been fulfilled.

The main points of Class 2 applications, as described in that article, are:

RDF allows a single storage model for very different types of data and schemas. For example, storing metadata from different specifications in the same database is straightforward. Implementing search that spans dependencies between metadata expressed in different schemas is also simplified.
Reuse of existing metadata standards is greatly simplified, as RDF has built-in support for merging data.
The relationship between terms from different standards can be formalized in a machine-readable manner.
Vocabulary description and usage is straightforward in RDF. With RDF specifications such as SKOS, really powerful
RDF vocabularies can be easily extended and refined, and metadata descriptions can be easily extended thanks to the strong support for merging.
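The merging argument is easy to see when a graph is treated as a plain set of triples. A minimal sketch, assuming made-up data and a hypothetical `LOM` namespace (no RDF library involved):

```python
# Hypothetical namespaces and data, for illustration only.
DC = "http://purl.org/dc/terms/"
LOM = "http://example.org/lom#"

dc_metadata = {
    ("http://example.org/course/1", DC + "title", "Linear Algebra"),
    ("http://example.org/course/2", DC + "title", "Art History"),
}
lom_metadata = {
    ("http://example.org/course/1", LOM + "typicalAgeRange", "18-"),
}

# Merging two sets of triples is just set union; no schema negotiation is needed.
merged = dc_metadata | lom_metadata

def titles_with_age_range(graph):
    """A query that spans both schemas: DC titles of resources that have a LOM age range."""
    with_range = {s for s, p, _ in graph if p == LOM + "typicalAgeRange"}
    return sorted(o for s, p, o in graph if p == DC + "title" and s in with_range)

result = titles_with_age_range(merged)
```

The query only works on the merged graph, which is the whole point: neither source alone can answer it, and no Class 3 reasoning is involved.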

None of the above depends on Class 3 applications to be useful.

So, how do we tackle the apparently difficult step from Class 1 to Class 2 applications? My personal approach has been to deal with the base standards. Here's my Plan for Semantic Interoperability in Educational Technology Specifications.

Make sure the Dublin Core specifications are fully RDF-compatible. Progress:
- Publication of the new Dublin Core Abstract Model in June 2007, making the underlying model of Dublin Core metadata much more semantics-friendly
- Publication of revised Dublin Core terms with much more detailed and machine-processable semantics (domains and ranges) is only weeks away.
- Publication of revised Dublin Core RDF expression closes the gap between Dublin Core applications and RDF applications. Weeks away.
- Publication of revised Dublin Core in (X)HTML uses the W3C GRDDL specification to provide automatic extraction of RDF metadata from (X)HTML documents – a “semantic stylesheet” for HTML. Public Comment in progress.
- Publication of revised Dublin Core in XML with GRDDL support, within a few months
Make sure IEEE LOM is semantics-enabled. Work is in progress in the Joint DCMI/IEEE LTSC Task force to
- Publish a formal specification for an RDF vocabulary for IEEE LOM elements. Spring 2008.
- Publish a formal specification for a mapping from LOM to the Dublin Core Abstract Model, making LOM both DC and RDF-compatible. Spring 2008
- Make sure the LOM XML namespace contains information for GRDDL processors, so that all LOM XML instances can be automatically exposed as RDF. Spring 2008.
Make sure the new ISO Metadata for Learning Resources standard is semantics-enabled
- WARNING: this is currently not happening. Help is appreciated.
Make sure the library world is semantics-enabled. Work is in progress with the new version of the Cataloging Rules, RDA (Resource Description and Access).
- There is work in progress on defining an RDA-endorsed RDF vocabulary for the RDA metadata properties and vocabularies. This will essentially semanticize the ancient MARC format. During 2008.
Spread the semantics to other specifications. Which ones are the most pressing?

There is one big thing we will lose when moving from XML-based syntactic interoperability to RDF-based semantic interoperability, and that is syntactic quality assurance. Essentially, RDF currently lacks the necessary specifications and tools for performing any form of validation of metadata instances. The recent work within Dublin Core on Description Set Profiles might eventually provide a solution to that problem.
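To give an idea of what such validation could look like, here is a small sketch of occurrence constraints checked against the statements about one resource. The profile format is a hypothetical simplification made up for this example; it is not the Description Set Profile syntax.

```python
from collections import Counter

DC = "http://purl.org/dc/terms/"

# Hypothetical profile: property URI -> (min, max) occurrences; None means unbounded.
PROFILE = {
    DC + "title": (1, 1),
    DC + "subject": (0, None),
}

def validate(description, profile):
    """Check a list of (property, value) statements; return a list of error strings."""
    counts = Counter(prop for prop, _ in description)
    errors = []
    for prop, (minimum, maximum) in profile.items():
        n = counts[prop]
        if n < minimum:
            errors.append(f"{prop}: expected at least {minimum}, found {n}")
        if maximum is not None and n > maximum:
            errors.append(f"{prop}: expected at most {maximum}, found {n}")
    # Properties the profile does not mention are rejected in this sketch.
    for prop in counts:
        if prop not in profile:
            errors.append(f"{prop}: not allowed by the profile")
    return errors

ok = validate([(DC + "title", "Linear Algebra"), (DC + "subject", "Mathematics")], PROFILE)
bad = validate([(DC + "subject", "Mathematics"), (DC + "creator", "Anna")], PROFILE)
```

This gives back roughly what XML Schema validation gave in Class 1 applications: a yes/no answer on whether a metadata instance has the expected shape.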

Hopefully, these developments will act as enablers for semantically enhanced educational applications.

Saturday, October 20, 2007

ISO at a standstill after controversial format coup - Computer Sweden

ISO står stilla efter omstridd formatkupp ("ISO at a standstill after controversial format coup") - Computer Sweden. Holy crap! In English: OOXML Payback Time as Global Standards Work in SC 34 "Grinds to a Halt". Well done, Microsoft (and ISO itself gets to share the blame).

Friday, October 12, 2007

Time for interoperability!

Two important meetings announced (important for me, that is...).
First: the Dublin Core Architecture teleconference on October 26. This is going to be heavy on RDF/HTML/XML stuff for Dublin Core.
Then: the Joint IEEE LTSC/DCMI Task Force teleconference on October 31. If you're interested in LOM-DC interoperability, this is where it happens.

Tuesday, October 09, 2007

PRESS RELEASE: IT inquiry shuts out open standards

Say no to standards that cannot be used without paying a "toll". From FFII: PRESSMEDDELANDE: IT-utredning stänger öppna standarder ("PRESS RELEASE: IT inquiry shuts out open standards")

Friday, October 05, 2007

Boycotting MS Office documents

I've officially started my Microsoft Office document format boycott. Why? Because of two things:

1. The unavailability of open documentation of the old document formats (.doc, .ppt, etc.)
2. The behavior of Microsoft in the ISO process, in particular in Sweden, but elsewhere too, when voting on OOXML, the new, so-called "open" document format for MS Office.

Being engaged in standardization processes myself, I know how this hurts the standardization community. So, I'm not accepting files in Word or PowerPoint format, and I'm only sending ODF and PDF. What about everybody else? Well, either use an office application that supports the ISO standard for office documents, such as OpenOffice.org, or use the following plugin for MS Office: http://www.sun.com/software/star/odf_plugin/

Wednesday, October 03, 2007

OpenDisc - open source for Windows

OpenDisc is a free (open source and gratis) CD image that contains a lot of software that I use when I'm on Windows (from time to time). A great, cheap gift for friends and family, and a possible first step to Linux.

Tuesday, October 02, 2007

"A short update on Metadata specs"

Phil Barker of CETIS has just posted "A short update on Metadata specs". Sadly lacking from the summary is any mention of ISO MLR. Not that Phil is to blame: the MLR work is proceeding at quite a distance from the community. It's encouraging to note that KMR is involved in several of the mentioned items: LOM-DC, Harmony and the DCMI Abstract Model... There's more out there that we're involved in. The Description Set Profile work, including the wiki syntax, and the DC-in-RDF expressions come to mind as important.

Wednesday, September 19, 2007

Description Set Profiles

I'm currently working on Dublin Core Description Set Profiles. The hope is to have a machine-readable constraint language for the DCMI Abstract Model, which is something currently lacking in the DCMI task groups. At the same time, there's discussion about what Application Profiles are really intended to convey. Are they just recommendations, or can they be seen as constraints on a Description Set (DCAM terminology)? The current thinking is that they are actually constraints and that recommendations belong in other forms of documentation. Alistair got really excited about the notion of constraint languages for graph metadata; see his post for details. Expect interesting results coming out of this work. Oh, and Alistair, drop the "Son of DC" name before the DC community hears about it :-)

Interoperability means many things

I like to think about interoperability in terms of invariants - what does not change when the context changes? Interoperability is about defining and ensuring invariance. For metadata, we can therefore have syntactic invariance when syntaxes are produced and consumed in the same way. Semantic invariance is something else entirely and deals with making sure humans and machines interpret metadata in the same way. This blog will focus on my PhD thesis on metadata interoperability and the future of metadata interoperability.
