Scott's Workblog

scott.bradley.wilson@gmail.com




January 24, 2011

The Rise of Server-Side JavaScript (SSJS)

For a long time now the established pattern of web development has been to use one programming language for interactivity in the browser (JavaScript) and another for server-side logic and request processing (PHP, JSP, ASP, Java, Python, Ruby…). However, we're now seeing the rise of JavaScript as a sort of universal web language encompassing the server side too.

Having a single language makes some sense. For one thing, you can reuse code between browser and server implementations rather than try to map APIs between different languages. You can also happily use JSON as your default serialization. And you only need to learn one language.
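As a minimal sketch of that reuse (the validation function and its rules are invented for illustration, not taken from any real project), the same JavaScript file can be loaded by a browser page and by a server-side platform, with JSON carrying the data between them:

// validation.js - a shared module; browser and server apply exactly the same rules
function validateBooking(booking) {
	var errors = [];
	if (!booking.name) errors.push("name is required");
	if (!(booking.seats > 0)) errors.push("seats must be a positive number");
	return errors;
}

// in the browser the object comes from a form; on the server it arrives as a
// JSON request body - either way the call is identical
var booking = JSON.parse('{"name":"Scott","seats":2}');
console.log(validateBooking(booking)); // []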

Other benefits are perhaps less obvious - for example, in recent years there has been considerable investment made in making JavaScript interpreters as fast as possible to meet the rising complexity of web applications. This has resulted in the screamingly fast V8 JavaScript engine used by Chrome, for example. This provides an infrastructure to create lean, mean JavaScript-based server applications.

Platforms

Node.js is a native JavaScript server application running right up against the OS. Node.js builds directly on Chrome V8 and uses event-driven, asynchronous JavaScript to create a very distinctive development environment. Node.js has very few dependencies, runs very fast, and has a very active community. Most of all, it's fun! I ported the Google Wave Gadget API implementation from Apache Wookie over to Node.js in a couple of days (you can download it here). Node also seems to use very little memory regardless of demand - this is due to its threadless, event-driven server model rather than a more traditional thread pool approach. On the downside, there is some skepticism of the hype around the speed of Node.js. Also, while being very lightweight has its advantages, being so close to the OS makes it harder to manage and track server performance without a lot of Linux Jujitsu: there are, as yet, no simple graphical management tools or utilities for handling deployed applications, for example. However, if you need a platform to prototype demanding real-time applications such as multiplayer games then Node.js is well worth looking at.
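To give a flavour of the event-driven style, here is a minimal sketch using Node's standard http module (nothing to do with the Wookie port mentioned above):

// a single-process, event-driven HTTP server: the callback fires for each
// request, and nothing blocks while waiting for I/O
var http = require('http');

http.createServer(function (request, response) {
	response.writeHead(200, {'Content-Type': 'text/plain'});
	response.end('Hello from Node.js\n');
}).listen(8000);

console.log('Server listening on http://localhost:8000/');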

Ringo takes a more traditional approach, and uses the Rhino JavaScript engine to run JavaScript applications using Jetty, a Java application server. While Rhino isn't as fast as V8, it's usually good enough, and one author has noted some benefits of using the tried-and-tested JVM as the basis for server code as opposed to Node.js's more radical approach. Another plus is that by using Java you get access to tons of Java libraries in your code using Java-JavaScript integration, rather than having to access everything through spawning a console process as in Node. On the other hand, all this Java does weigh a fair amount, and so you won't see the small memory footprints you can achieve with Node.js. If you're a Java developer interested in server-side JavaScript, but think Node.js looks a bit scary - or you just have a ton of Java code you can't face porting - then I think Ringo is a great place to start.
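To show what the Java-JavaScript integration looks like, here is a minimal sketch that would run in the Rhino shell (the file name is just an example); Rhino exposes Java packages directly to JavaScript, so JDK classes can be used without writing any Java:

// use a JDK class straight from JavaScript via Rhino's java.* packages
var file = new java.io.File("example.txt");
print(file.getAbsolutePath() + " exists? " + file.exists());

// Java collections work too, and feel almost like native objects
var list = new java.util.ArrayList();
list.add("server-side");
list.add("JavaScript");
print(list.size() + " items: " + list);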

Jack is inspired by Ruby's Rack: it provides an interface between JavaScript and the web server. Jack provides handlers for JavaScript applications to respond to webserver calls building on JSGI in a similar manner to Rack or other lightweight CGI-style frameworks. You can use Jack with Jetty and other Java application servers such as Simple. In many ways, Jack is similar in final deployment to Ringo, and the two are broadly compatible. Jack is probably a good starting point if you want to develop your own specialized server-side JavaScript framework or middleware.
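For reference, a JSGI application is essentially a single function that takes a request object and returns a response as a status/headers/body triple; a minimal sketch of the convention (details such as module layout and header casing vary slightly between servers and JSGI versions):

// a minimal JSGI application: exported as a function of the request,
// returning {status, headers, body}
exports.app = function (request) {
	return {
		status: 200,
		headers: {"Content-Type": "text/plain"},
		body: ["Hello from JSGI, you requested " + request.pathInfo + "\n"]
	};
};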

Nitro is a set of libraries that build on JSGI and Rhino. Rather than a complete platform, Nitro provides components that are useful in building JavaScript frameworks, including templating and parsing engines. The most prominent use of Nitro is AppEngineJS, which lets you run JavaScript servers using Google's app engine infrastructure - so if you want to deploy your application using Google's App Engine, then clearly this is the platform you need to look at.

Opera Unite is a very different kind of system entirely - it uses JavaScript to deploy web services directly from your desktop rather than on a separate server, for example to share music or have your own personal chatroom. Opera Unite services are created using JavaScript with some special extensions, which are then packaged as W3C Widgets. It may not suit every purpose, but it makes sense that a personal web server uses JavaScript as the server programming language, and makes it very easy to create and share simple applications with your friends. Plus it uses Widgets, my other favourite web technology of the moment!

Apache Sling is an example of putting JavaScript on the same footing as other server-side scripting languages such as PHP and JSP. Sling lets you create JavaScript applications and deploy them using its common web application server engine; in fact "ESP", its implementation of server-side JavaScript, looks a lot like JSP. If you want to use server-side JavaScript with a Java content repository (JCR) then Sling is clearly where you need to be looking. (Sling, incidentally, is also at the core of the Sakai 3 LMS).

Standards

The most important emerging standard in this space is the CommonJS initiative. CommonJS defines standard APIs for basic functionality needed by non-browser JavaScript, including module loading, writing to the console, and filesystem access. Most of the platforms described above implement one or more of the existing CommonJS specifications. On the roadmap for CommonJS are areas such as JSGI, Web Sockets, and HTTP clients. Eventually we should see a high level of compatibility between applications written in JavaScript for deployment on any of the platforms described above (and the many others I've not looked at), making developing services in JavaScript less dependent on the foibles of any one deployment platform.
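The module system is the part of CommonJS most developers meet first; a minimal sketch of the require/exports convention it defines (console.log is Node.js-style output, other platforms may use print):

// greeting.js - a CommonJS module: anything attached to exports is public
exports.greet = function (name) {
	return "Hello, " + name;
};

// main.js - any CommonJS-compliant platform can load the module the same way
var greeting = require("./greeting");
console.log(greeting.greet("server-side JavaScript"));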

In Summary

Server-side JavaScript is an obvious direction for the evolution of web applications and services that ties in well with developments on the client side such as HTML5. While many of the current platforms are quite young, there is a very active community, and an active engagement in standardisation, which makes it a promising area for developers to look into. It also confirms for me that JavaScript is finally emerging from the scripting ghetto to become recognised as the web's most important programming language - the only language usable in both browsers and servers.

December 14, 2010

Parsing CC license information in different feed formats

This is something I wrote quite a long time ago as part of some help I was doing for the STEEPLE project and as advice to the UK OER programme; however, I'm sure other people have run into the same issue, so I'm posting it here. You can see it in use on the Ensemble prototype.

RSS and Atom are a natural way to share lists of educational materials. However, one of the main issues with feed standardisation is the CC license. I've summarised this below.

1. Elements

Currently we have several different elements to choose from when placing our CC license:
  • http://creativecommons.org/ns# : license
  • http://web.resource.org/cc/ : license
  • http://backend.userland.com/creativeCommonsRssModule : license
Different feed formats seem to prefer different options here, but it's not uncommon to find them mixed up, all of which are valid XML. There is also possible confusion with copyright, for which there are two elements:
  • RSS 2.0 copyright
  • DC rights
Feeds may sometimes put licensing information in these, which is technically not correct as license != copyright. But it happens.

2. Placement

There is licensing of items and of feeds; for Ensemble the main interest is the feed license, as we're dealing with "albums" rather than arbitrary collections. However we will have to deal with situations where items are of mixed licenses (see below).

3. Content

Then there is the issue of what content to use here. I personally don't mind as long as there is a valid CC URI that can be extracted from the text using regular expressions. Others prefer declaring the license conditions using RDF. There is also some debate as to whether these elements should contain attributes specifying the URI, or the URI should be placed within the text content such as "Licensed under a Creative Commons Attribution - NonCommercial-ShareAlike 2.0 Licence - see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/". I think ultimately we're going to have to agree some common practice here - I suggest making sure the CC URI is somewhere in the text content of the element. Coping with both text and RDF content for CC is way too taxing; also the RDF content I've seen is redundant as it basically sets out what is already meant by the CC URI.

Aggregation Algorithm

Ultimately any aggregator has to handle a lot of variation here, even with some best practice evening things out. Here's my first stab at the algorithm I'll code up for Ensemble (a rough JavaScript sketch of the URL-handling steps follows the list):

1. Find a channel-level element that matches any of:
  • http://creativecommons.org/ns# : license
  • http://web.resource.org/cc/ : license
  • http://backend.userland.com/creativeCommonsRssModule : license
2. If none of the above found, try:
  • rss2: copyright
  • dc: rights
3. Parse the text content of the elements, and extract any URLs using regular expressions
  1. If there is a single consistent URL and it matches a known CC type, mark against CC dimension for browsing/filtering purposes
  2. If there is a single consistent URL but it does not match a known CC type, mark as "unknown license"
  3. If there are multiple inconsistent URLs,  mark the channel as "mixed licensing"
4. Next, repeat steps 1 & 2 for all the items and extract URLs using regular expressions
  1. If the items have licenses, and they are not all consistent, mark the channel as "mixed licensing"
  2. If the items have licenses, and they are all consistent, and the channel has no license, set the channel license to the value of the item licenses and process as in Step 3
  3. If the items have licenses, and the channel has a license, and these are not all equal, mark the channel as "mixed licensing"
5. If neither channel nor items have any license information - even plaintext with no URLs - mark the channel as "unknown license"
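As a rough JavaScript sketch of the URL extraction and classification in step 3 (the same routine would be reused for the item-level checks in steps 4 and 5) - the regular expressions and status labels here are illustrative, not the actual Ensemble code:

// extract URLs from the text content of a license/copyright element
// and classify the result - illustration only, not the Ensemble implementation
var CC_LICENSE = /http:\/\/creativecommons\.org\/licenses\/[a-z\-]+\/[0-9.]+(\/[a-z]+)?\/?/;
var ANY_URL = /https?:\/\/[^\s<>"]+/g;

function classifyLicenseText(text) {
	var urls = text.match(ANY_URL) || [];
	if (urls.length === 0) return "unknown license";

	// reduce to the set of distinct URLs
	var distinct = [];
	for (var i = 0; i < urls.length; i++) {
		if (distinct.indexOf(urls[i]) === -1) distinct.push(urls[i]);
	}

	if (distinct.length > 1) return "mixed licensing";
	// a single consistent URL: is it a CC licence we recognise?
	return CC_LICENSE.test(distinct[0]) ? distinct[0] : "unknown license";
}

// e.g. classifyLicenseText("Licensed under ... see http://creativecommons.org/licenses/by-nc-sa/2.0/uk/")
// returns "http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"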

November 23, 2010

MEAOT Project - managing local innovation

This article is part of a series of brief reviews of recent projects I've been asked to write by JISC.

Overview

The MEAOT project at the University of Cambridge took two existing administration tools developed in-house by specific departments and sought to extend their use to other departments.

The key areas of innovation were the two tools themselves, (1) the Teaching Office Database (TODB) and (2) Student Choices, but also the models of development, customization and deployment that were undertaken. In particular, the "trunkless" customized development model used for TODB is identified as a significant innovation (3). For this review I'll look at both of the tools, but in terms of innovation the methodology for supporting local development is perhaps the most significant aspect of the project to look at.

TODB

TODB stands for "Teaching Office Database" and is a system for managing the allocation and tracking of staff time on common teaching tasks such as supervision, lectures and seminars. Administrators add teaching tasks to the system, which can then calculate approximate workloads. The best way to understand what TODB does is to go and try out the online demo.

TODB is a fairly basic web application in the general vein of time tracking and scheduling applications - this is something that people tend to have to do in a wide range of jobs and industries, so there is a lot of generic software in this category. The key issue with such systems is customising the information collected to fit the types of work and the way it is tracked (e.g. by cost centres or project codes) and to integrate it with other systems that provide data such as rates, codes and so on. However, TODB is not really intended to be a sophisticated and feature rich time/task management system, but rather as a simple database with some web forms that an administrator can customise themselves.

It is also worth looking at the Learning Design Support Environment (formerly the LKL Pedagogic Planner) which is another system for planning teaching activities, from a slightly different viewpoint where the emphasis is on the types of teaching methods being used rather than resource tracking. However the actual tools are quite similar in some basic respects, and it would be interesting to see just how much overlap there might be.

While TODB is certainly not that innovative as a system, it appears to have been successfully adopted at Cambridge, with eight departments evaluating the software, five of which seem committed to using it in production.

One of the reasons that is given for this success in adoption is an approach that encourages departments to freely modify the tool; this is something discussed later in this review under the heading of "trunkless development".

Student Choices

Student Choices emerged from a set of diverse requirements that originated from an in-house student and course management system within the Physics department. However the project quickly determined that there was relatively little scope for use of the tool beyond the original department - partly as it provided functions that were adequately met by existing central systems. It was an interesting consequence of the project that the central student data management team has begun work with the Physics department to ensure they have a solution that also fits the wider requirements of the University for central reporting of student numbers (for example). As the report says, "However, this action has also rendered our developed software redundant." I think this was actually a very good result!

Trunkless Development approach

One of the main innovations of the MEAOT project is an approach to supporting local innovation that the project team describe in their report as a "trunkless development approach".

Basically what this means is that each department that wanted to use TODB took a copy of the code and was then free to customise and extend it however they wished. This is not quite the same as a typical OSS "trunk and branches" model in that here the core initial code is a prototype with very limited functionality and little attempt made at abstraction. The idea is that each copy of the application is kept very close to the needs and to the understanding of the small group of user-developers so that they can easily rewrite the application if they wish to.

Looked at as a general process, it doesn't really look all that different from any other software process:

  1. Prototype developed by originating user
  2. Other users interested
  3. Central, one-off funding obtained for development
  4. Software is genericised, with a view to making expected range of stakeholder-specific customisations manageable
  5. Software deployed and customised user by user. Users and developers agree priorities for development.
  6. Lessons are incorporated back in to generic version and pushed to previously-addressed departments
  7. Detailed, tested documentation and handover to users

(from MEAOT final report, 2010)

One could take the above description and apply it to anything from Apache to Moodle. However, I think the difference in ethos could be characterised as "folk coding" - rather than the OSS processes that tend to be quite rigorous and applied by professional software developers, this approach was intended for use by departmental staff who dabbled in a bit of PHP and had no experience of, or interest in, such staples of software development as source control. I also think that, given the stated dislike of using standardised libraries as dependencies, the term "folk coding" makes a lot more sense than "trunkless development", as it seems to be a rejection of the techniques of modern software development in favour of a more DIY, almost anti-engineering approach to creating applications that are closer to the day-to-day concerns and problems of user-developers.

Off to do a spot of PHP development at Cambridge

In that sense it's somewhat counterintuitive. My first take on reading "trunkless development" was to think of Github, which is a social coding environment using the Git source control system. In Github, developers create "clones" of interesting code, improve on it, and then allow their derivatives to be merged into the parent project if so desired. Indeed a very similar activity took place within MEAOT, but using a completely manual process:

"Maintaining eight independent branches means each time a change is made to one that might be useful to all, a rather laborious merging process is required to ensure that the new functionality is available in other departments but does not overwrite some other department-specific changes."

Overall the model of "folk coding" to create department-maintained code is interesting, but as the report freely admits brings its own issues. For example, it seems to use more resources than a professional OSS-style software development model, however it distributes the usage of resources across independent departments, who have to then maintain their own variants. In some ways it resembles a software version of the widespread practice of departments keeping their own versions of enterprise data in the form of personal spreadsheets and Access databases - something which is very much frowned on in the wider world (I once worked for a company that provided data services to banks, where this type of practice still occurred, despite being absolutely forbidden on security and privacy grounds). Interestingly, the Student Choices work in this project provided a good counter-example, whereby a local student record system was identified as being potentially problematic, and central student data management have become involved to propose a centrally-supported solution.

The question JISC have asked me to answer in these reviews is basically "how innovative is it?" In this case I'm not quite sure how to answer. In many ways "trunkless development"/"folk coding" could be seen as reviving a set of outdated development practices (if you replace "PHP" with "Visual Basic" you could be looking at a description of common practice circa 1990). However there is innovation here, and that innovation I think is in two parts.

First, there is the identification of a potential business case for applying these kinds of "anti-patterns" in areas that are left relatively untouched by enterprise IT investments - business processes that are limited to only a few members of staff, or are a sort of "barely repeatable process" (a term coined by Sigurd Rinde) that is unlikely to be standardised as a formal business process but can benefit from some form of automation, as long as it doesn't take too long or cost too much. (Perhaps MEAOT would have been as well advised to have looked into solutions such as Thingamy as an alternative to relying on departments having people with PHP skills?) However, as the Student Choices experience also shows, in some cases there also need to be interventions to replace local applications with central systems.

There is a lot of "folk coding" already going on in institutions - for example in my own university I've seen quite a few bespoke departmental admin and workflow tools in a range of programming languages; I've even written a few myself (this blog software included). The second innovation introduced by this project was to try to embrace and improve this practice rather than to either ignore it or discourage it. I asked several colleagues in IT services and they had difficulty locating any policies or common practice in this area other than a few vague statements from some institutions discouraging the practice of local application development.

Currently, while there are best practice frameworks that consider application management and application service portfolios (e.g. ASL and ITIL) there is little that is specific to managing local innovation and local application development. I asked Sandra Whittelston, chair of the UK Education Special Interest Group in ITSMF (IT Service Management Forum) about this topic, and she commented that:

"In V3 ITIL we consider operational, tactical and strategic development (of course) and I think this may be useful in approaching this topic (from an apps developement sense) in that trying to control software developed locally (for local reasons) is a challenge. From a business perspective understanding why people do it and the risk of letting it get out of control whilst still allowing flexible handling of information is key." "From the strategic vantage point service portfolio management is crucial as it allows us to see a birds eye view of our whole portfolio and its status. This of course should include locally held systems and services. Live apps should be included in the live part of the portfolio.."

The role of a central IT project in MEAOT for the TODB solution was to introduce a kind of "limited professionalism" in terms of keeping common source code under control but not committing to maintaining the in-use code, and to support the requirements process. This in itself is an interesting innovation, and it also makes sense given the ethos at Cambridge of local autonomy.

The team write in their report:

"Criticism of the software development methodology used with the TODB has been leveled at the difficulty in supporting software that, at the time of release, has eight slightly different versions, and may have been independently modified further without knowledge of changes being fed back to the support team. The simple answer is - there is no support burden! The whole point of the trunkless, decentralised model is that departments are self-supporting. This is a 'launch and forget' model. Were such a model used more widely, it would be wonderful for software developers, who traditionally enjoy developing new things and despise the 'housework' of ongoing support. The temptation to compromise the constant focus on local maintainability must be held in check, however."

I think this last sentence is important - if a central IT service team introduce "limited professionalism" into department-based "folk coding" projects, it has to be with clear expectations on all sides. I think this would be one of the challenges to follow up with the team at Cambridge a year on to see if departments have continued to maintain their own variants without drawing in the project team members who presumably will have wanted to move onto other activities.

If I were to make one suggestion to throw into the mix it would be to make a concerted effort as part of the initial intervention to get those involved familiar with Github and its model of social coding, as while it doesn't impose a more structured way of working, it does make the process of merging much less problematic, reducing the effort required to coordinate the different variants. However, there will obviously be a temptation to try to turn "folk coders" into software engineers, which isn't the point at all.

However, when we look across the two systems that MEAOT worked with, the approach to application management could be characterised as pragmatic problem-solving rather than a top-down "enterprise architecture"-style approach - either supporting local innovation or enabling centralisation, whichever is appropriate to the situation. I think that given Sandra's comments it might be useful to see if there are lessons learned here that could contribute to the wider understanding of local application management, for example through UCISA's ITIL group and ITSMF.

Photo by Darren W

November 03, 2010

Erewhon: Mobile and location-based services in Oxford

This article is part of a series of brief reviews of recent projects I've been asked to write by JISC.

Overview

The Erewhon project at the University of Oxford investigated the areas of mobile applications, web services and institutional geo-spatial data. The key areas where there were innovation outcomes are in (1) an open-source framework for institutional mobile applications, (2) an open-source solution for managing institutional geo-spatial data, (3) the data set that was collected, and (4) advice to institutions on developing services, managing open data, and identifying a mobile strategy. The advice they've produced seems pretty sound, so I'll concentrate on the other items for this review.

Molly

Molly is an open-source framework developed by the project for exposing institutional web services as mobile web applications. Essentially it is a specialised web framework with mobile-friendly widget-style HTML templates for common campus applications (e.g. library search, maps) backed by connectors for taking data from web services at the institution.

The idea of mobile applications backed by web services is certainly not novel, nor the idea of mobile applications backed by geo-spatial data (these were some of the first mobile applications on the iPhone platform, for example). Nor is the case for developing mobile applications for universities a new idea; many institutions have been developing mobile applications, either as bespoke in-house development (e.g. TVU, Coventry and Northumbria) or in partnership with specialist companies (e.g. Duke Apps). JISC has also funded other projects to develop mobile applications, for example MyMobile Bristol.

At around the same time as the project was underway, several commercial offerings were developed that also enabled institutions to offer mobile applications connected to their institutional data, so the overall technical model is not itself an innovation - for example the CampusM platform operates in a similar manner, although in this case it offers native mobile applications (e.g. iPhone apps) rather than mobile web applications (see this article on ReadWriteWeb for a brief discussion of native applications on mobile versus web applications targeting mobile browsers).

There are also other offerings based on particular institutional systems, such as Blackboard Mobile Central and Moodle Touch. Molly itself provides connectors for the Sakai VLE.

It is likely that MIS vendors will follow suit and offer suites of mobile applications for their platforms. For example, Sungard is offering a range of mobile applications for its public-sector MIS applications.

It is also worth noting that the MIT Mobile Web framework, a very similar open-source mobile framework, emerged at around the same time as the Molly project; from it the Mobile Web OSP community open source project has been developed.

So the key innovation here is not the application itself, but its position as a UK-oriented community open source project as a sustainable alternative to both commercial products and also bespoke development by individual institutions.

This is a strategic intervention, and relies on the adoption of Molly by other institutions to share the costs of developing and maintaining mobile applications and contribute to the sustainability of the Molly project itself. The team have made a good start by taking a community-oriented approach, and have two production implementations (University of Oxford and Oxford Brookes).

The key challenges are promoting an open-source community alternative when vendors will be aggressively pushing their own solutions, and to attract more users and developers to sustain the project. Oxford has good support in place for this type of project, the project team have already been working with OSSWatch, and Oxford have their own deployment of Molly they will want to maintain; all of these are good indicators for sustainability for the near term.

To aid in monitoring Molly's progress I used Ohloh code analysis, and this shows that while development is ongoing, there is still a very small number of active contributors. Molly really needs to work hard on engaging a more diverse community of core developers to reduce the dependency of the project on institutional support at Oxford in the longer term.

For the JISC - the funders of this project - the main action points to take forward would be for its advisory services to point institutions looking to develop mobile web applications to the Molly project to consider it as an option, and to consider carefully whether new project proposals involving mobile applications should be encouraged to contribute to the sustainability of Molly in preference to either bespoke development or adopting commercial solutions.

Gaboto

Gaboto is a system for managing geo-spatial data; it emerged from a need identified by the Erewhon project to store and make available geo-spatial data for its mobile applications - in particular, a need to tag locations of institutional buildings and resources and to describe connections between locations and resources. The team had already evaluated a number of existing GIS systems and found them unsuitable.

The Gaboto system is now described as follows:

"Gaboto maps first class java objects onto RDF. By this it introduces a layer on top of RDF giving you RDF's flexibility in storing objects, their properties and the relationships between objects while preserving the full power of java objects."

So Gaboto is positioned as a more generic piece of Semantic Web middleware for Java applications rather than by its initial implementation as geo-spatial storage - this may improve its prospects for uptake elsewhere. However there are a large number of Semantic Web frameworks and tools that do approximately the same job: for example Jena, Elmo and Sommer. Gaboto seems positioned as a Java-RDF mapping framework on top of Jena, quite similar to Elmo and Sommer, with some pre-defined ontologies for geo-spatial and temporal data.

A good overview of the problem space that Gaboto addresses can be found here.

Clearly the Erewhon team felt that existing solutions in this space had some drawbacks for them; however it's not terribly clear what the advantages may be for other organisations with a similar requirement. The unique proposition of Gaboto seems less to be the framework itself so much as the components developed for it that serialize data in a range of geo-spatial formats such as KML.

Overall I think Gaboto is more of a point solution for the project to get over a particular problem rather than something innovative in itself; other institutions may find it a good approach for holding and serving geo-spatial data, but there are other options available. Gaboto is also not an out-of-the-box solution, but rather a toolkit that can be used by Java and RDF-savvy developers to create their own solution to similar problems.

Modelling institutional spaces and resources

In terms of innovation, I think what Oxford have done here that is new is to undergo the process of mapping buildings and other resources (including wireless access points and car parks), and linking the descriptions of these resources together into a coherent model (for example, to describe sites as well as individual buildings). This model is then used to drive applications using Gaboto as the framework to deliver the data to applications. The data is exposed for use in a variety of formats which can be found from the Oxpoints website.

This type of semantic location modelling is something which has been described quite well in the research literature (for example, see Roth, 2005; and Kolomvatsos et al., 2009) and there is a current EU FP7 project which seems to explore similar principles (see MUGGES). However there are no other live, practical examples of implementing this approach in the HE sector that I'm aware of, which is what makes this work innovative. It will be interesting to see what kinds of services the Oxpoints team can support with this data, and whether the benefits of those services will be sufficient to encourage other institutions to undertake similar exercises with their own resources.

As with open linked data generally, the benefits are only made visible in the applications that make use of it, but I think there is a lot of potential future innovation to be developed using the dataset.

References

Kolomvatsos, Kostas and Papataxiarhis, Vassilis and Tsetsos, Vassileios (2009). Semantic Location Based Services for Smart Spaces. In Metadata and Semantics. Sicilia Miguel-Angel and Lytras, Miltiadis D.(eds.), Springer US.

Roth, Jorg. (2005) The Role of Semantic Locations for Mobile Information Access. Mobiles Informationsmanagement und seine Anwendungen, Sept. 22, 2005, Bonn, Proceedings of the 35th annual GI conference, Vol. 2, 538-542

September 09, 2010

Making standards and specifications: Technical approaches

Later this month is the second CETIS Future of Interoperability Standards event, and as I've been involved in drafting interoperability specifications and standards for about a decade now, using quite a wide range of different techniques, it's a good time for me to articulate what I think I've learned so far. What follows is my position paper for this event.

When I first started, the specifications I worked on were based principally on lists or tables of elements, as many as people could think of, with an XML DTD. Since then I've seen the introduction of UML, Use Cases, WSDLs, REST, RDF and a whole host of other things into the specification process. Some of these work, some don't. Here's my personal view based on my experiences to date.

UML

I like UML, but I've seen it overused. In small doses, UML can bring clarity and simplicity to what can otherwise be an impenetrable wall of SHALL, MUST and MAYs. In large doses, it can bulk out a simple spec into a huge impenetrable tome full of arcane diagrams. I think a UML class diagram is a great way to summarise a data model. If you need more than one page for it, the spec is probably too complex. If you need more than one diagram, the spec may need breaking up into multiple smaller modules.

UML sequence diagrams can be handy when there is a very important choreography that needs to be implemented, particularly for things like security specifications where you need to understand how multiple parties interact (e.g. oAuth). However they aren't always very readable, even for developers, and so if there is a need for a sequence diagram then there is also a need for a step-by-step walkthrough. For example, Eran's simple oAuth workflow with pictures is much easier to follow than a UML sequence diagram. Without it I probably wouldn't have bothered trying to understand the detailed choreography.

Overall I think I would recommend using UML as an aid to explanation, and as a way of warning yourself when things are becoming too complex. During the specification process, using UML is also a good way to check mutual understanding of what the spec is and its current status, but it must be heavily moderated for the actual specification documentation.

Use Cases and Requirements

Specifications really do need requirements, and there are several ways to do this. IMS uses Use Cases in a fairly traditional format. W3C uses use cases for brainstorming, and then captures Requirements from them as brief, but normative statements (see, for example, the Widgets 1.0 Requirements document). In CEN I've worked on specs using high level "business cases" which are similar to use cases but structured slightly differently to capture things like non-functional requirements and the business context (see, for example, CWA 15903.)

In general I don't think it matters too much how these things are documented. But it does matter how requirements are managed.

One particular problem is defining the specification scope. It is very easy to stretch the scope to fit an edge case, particularly in a small community with a few vociferous members, as someone can latch onto such a case and easily distort the whole process. It is really difficult sometimes to make a distinction between requirements that have a direct implementation need (that is, it's part of an existing system or will be implemented as soon as the spec is in draft) versus those that are speculative with no identifiable implementation strategy. It's not necessarily a bad thing to design specifications so that they are flexible and can meet future needs - I think that is an excellent design goal (q.v.) - but it's quite another thing to invent speculative requirements and use cases to justify it.

Overall I think we're getting better at requirements and scoping, but some specifications are still far too broad.

Another problem is a requirement that defines its own solution, which then hampers the process of coming up with a specification design to suit a range of implementations.

Design Goals

Something I like about the way the webapps group has worked in W3C is setting out some general design goals independently of specific requirements. I think these are a good checklist to use when evaluating the effectiveness of a specification as a whole, rather than whether it implements a particular requirement. I've also introduced this approach in other specification work, such as XCRI and HEAR, and I think it's one I'd recommend more widely.

RDF and Semantic Stuff

I'm a bit ambivalent about Semantic Web technologies, but I do think a way of modelling semantics is very useful and worth applying to specifications. Most specifications involve concepts that are implemented in information models, and the way RDF properties and classes are defined provides a good model for how to do this in a way that builds on and references concepts in other specifications. For example, explicitly relating properties in a specification to elements in the Dublin Core Element Set (aka ISO15836). Also, if an information model is expressed using the semantic web constructs of "classes", "properties", "domains" and so on, it is then very clear how to relate this to a UML class diagram summarising the specification, and makes it easier to cross-check.

Another really good idea that came from RDF is the idea of assigning a URI to each property and class. This makes it very simple to reuse individual properties defined in other standards as you can identify them unambiguously.

On the negative side, there is a lot of academic complexity and obscure terminology in these technologies and this really should be avoided for specifications where possible.

Singapore Framework

A technique that emerged from the metadata and semantic web world is to create a distinction between "vocabulary" specifications and implementation profiles. This is subtly different from the approach taken to create application profiles (e.g. of the LOM); vocabulary specifications define only concepts, whereas profiles define relationships and constraints.

The Singapore Framework sets out a methodology for constructing "domain models" and "description set profiles" based on Dublin Core, but which applies equally well to any specification based on reusing core vocabularies.

Use of this framework is being explored in specifications such as CEN's European Learner Mobility (EuroLMAI) standards and the UK HEAR specification. For example, CEN ELM defines a core vocabulary of classes and properties used in achievement data, a generic description set profile for "european learner mobility documents" and then a specific profile for the Europass Diploma Supplement.

As another example, CWA 15903 defines the concepts of learning opportunities and their properties. It doesn't offer any constraints on how many instances of a property a model can have, or really very much about their syntax. Other specifications can then take the concepts and define the constraints and bindings, for example XCRI.

However, I don't think the specific language and techniques for defining a Description Set Profile are of as much value as the distinction itself (however realized), so I'd suggest we learn from and be inspired by the framework rather than adopt it. For example, in EuroLMAI, the Description Set Profile is actually realised using constraint clauses (e.g. "each instance of ClassX MUST have exactly one PropertyY").

A side effect of separating concepts from implementation profiles is that you have a specification where you just focus on definitions. I think this can be really important; for example in recent IMS specifications for web services the information on what a field is for is tiny compared with the big UML interface diagrams and interface definition stuff, and in some cases has been pretty vague and even incorrect. This isn't to malign the authors (I was one of them!) - it is just that the format makes it harder to focus on providing good explanations of the meaning of properties and to provide good guidance on their use.

I think this approach may be useful to make better reuse of concepts shared across the domain, and for making it clearer when a specification actually needs a binding and technical conformance, and when it doesn't.

Conformance Testing

Testing is something we've struggled with as a community, and there has been some confusion over conformance testing, badging and certification and so on.

Overall I think its important to be able to test implementations of a specification. In W3C, there is a requirement for having tested implementations of specifications before they can be approved, and Marcos Caceres from Opera has produced a very interesting methodology for developing these tests. Having worked on an implementation myself I found the tests developed using this method easy to work with. Also, having a nice visible performance gauge for my work was a good motivation for improving the implementation.

I think this does point up something important about conformance testing - I think it has to be open, free, and transparent. There is a temptation to politicize conformance, or to make it into a revenue stream. I think this misses the point - conformance is also about making better specifications, and you don't want that to be distorted by a "pay to test" environment or have aspects of testing that are based on a nod-and-a-wink from some staffer. If necessary it may be a case of having neutral, free conformance testing alongside paid certification and marketing, but with a good clean separation of the two.

Another thing about testing - it's useful to make the tests available early on, during the evolution of the specification. Often the tests themselves show up specification problems, and help identify scope issues. For example, if the specification mandates an untestable behaviour, maybe it should be optional; if it's unclear what the fallback position is when something is missing, maybe it has to be mandatory or have a specified fallback behaviour that can be tested. Again I'd point to Marcos & Dominique's work here on test generation at W3C, as well as to the work of Ingo Dahn.
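To make that concrete, a conformance test is often just a small fixture plus an assertion about the specified behaviour, fallback included. The sketch below is entirely hypothetical: parseWidgetConfig is a stand-in for the implementation under test (stubbed here so the example runs on its own), and the "default height of 200" rule is an invented spec clause, not W3C text.

// hypothetical spec clause under test: "if height is missing or not a
// positive integer, the user agent MUST use 200"
function parseWidgetConfig(xml) {
	var m = xml.match(/height="(\d+)"/); // crude stand-in parser, illustration only
	var height = m ? parseInt(m[1], 10) : NaN;
	return { height: (height > 0) ? height : 200 };
}

function assertEquals(expected, actual, message) {
	if (expected !== actual) throw new Error("FAIL: " + message);
	console.log("PASS: " + message);
}

// each test pins down one sentence of the spec, fallback behaviour included
assertEquals(200, parseWidgetConfig('<widget/>').height, "missing height uses default");
assertEquals(200, parseWidgetConfig('<widget height="-3"/>').height, "invalid height uses default");
assertEquals(480, parseWidgetConfig('<widget height="480"/>').height, "valid height used as given");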

Open Source (Reference) Implementations

Again, something the community has struggled with over the years. Overall I think there is considerable value in having running code for new specifications, particularly things like basic libraries for a range of platforms. In some cases this is uncontroversial, but there have been problems in terms of ownership conflicts and sustainability. In general I think its important to have viable open source implementations, independent of the specification body itself, but not necessarily considered "reference" implementations. I think much of the trouble comes from the SSO endorsing particular implementations rather than relying on an open conformance process (see above) to allow users and implementers to draw their own conclusions.

There is also the issue of OSS projects having access to specifications under development, and OSS contributors contributing to specifications. In some cases this isn't really a problem (e.g. the IETF); in others it is handled by having an MOU (e.g. W3C and the ASF). However I think given the value that OSS brings to standards, if the process of specification development doesn't allow ANY open source project to engage (not just cherry picking the most popular) then the development process needs rethinking.

Note that this only really applies to specifications that are aimed at direct implementation; "vocabulary"-style standards and domain models aren't implemented in this fashion. I guess a rule of thumb is, if there are conformance tests, then there should be OSS implementations.

If in doubt, throw it out

One final thing, not really a technology but certainly a technique, is to be really ruthless about what makes the final cut. That doesn't just apply to the appendices and guidance stuff kicking around in some specification documents, but also the core models and functionality. If the key implementations that are testing a spec can't find a use for a field or never use a method or interface, consider cutting it out completely. Keep the draft around as it might turn out useful in a revision. If there is a whole section of functionality that is only used by a few implementations, separate it out as a mini-spec published separately to keep the core as small and easy to understand and implement correctly as possible. This can continue right up to the end of the process - for example in the W3C Widgets specs we've removed API methods and properties at each stage of the spec, often very simply as a result of asking "is anyone using this?"

In the early days I think we were keen to capture as wide a set of requirements as we could and provided redundancy in the specification to avoid too many non-conforming extensions. I think one consequence was an explosion of application profiles, and just as many interoperability issues as if we'd kept the specifications lean and mean to begin with.

Also, large complex specifications need many, many more tests to check conformance. In my recent W3C work I think it's on average about 20 tests per XML element. So if a spec has 100 elements, that's about 2000 conformance tests to pass if you're doing it to the same level of detail. (W3C Widgets has 10 elements; some of its sub-specifications, like Widget Updates, have just a couple.)

Summing up

So what should we do in future? Or at least, until something better comes along?

  • Clearly separate standards for concepts from specifications for implementation
  • Use UML only where it adds clarity
  • Split up large standards/specifications into smaller documents

For standardising concepts:

  • Build on other standards (and reference rather than repeat)
  • Take note of the Singapore Framework for inspiration
  • Give each class and property its own URI

For specifications aimed at implementation:

  • Collect requirements broadly, but define scope narrowly (or push non-core cases into speclets/profiles)
  • Split up complex specifications into speclets
  • Have design goals as well as requirements
  • Encourage open source implementations during spec development, but don't necessarily label them as "reference" implementations
  • Provide useful tests, and provide them as early as possible to implementers (i.e. evolve them with the spec)
  • Remove things that don't get implemented during testing
  • Explain the specification in the terms an implementer will understand

July 06, 2010

Future of Interoperability Standards

If you're involved in developing standards and specifications then you should make a space in your diary for the second Future of Interoperability Standards event taking place in London on the 24th of September.

While the previous FIS event was focussed on areas such as governance and in particular the role of informal standardisation processes, this event is going to focus on technical aspects - how we model, develop, document, test and handle conformance for specifications and standards.

As before, we're inviting position papers, so if you have a view on how standards should be developed, or how the documentation should be written, or any other topic related to the area then you can either send us a paper by email, or blog it and send us a link, and we'll circulate it ahead of the meeting and make sure delegates have access to it during the event.

The papers from the previous event are still available, and are a great resource for people interested in this area.

June 29, 2010

Transfer Summit: Open Innovation-Development-Collaboration

Last week I attended TransferSummit, a conference organised by OSSWatch aimed at open innovation and collaboration between academia and the private sector.

I gave two talks at the event, and these were based around our experience at moving the outcomes of an EU-funded research project into the Apache Software Foundation, and in engaging with commercial partners.

The first was on barriers to community and focussed on areas such as governance, diversity and personal barriers to engaging in an Open Source development community, and how as a member of such a community you can make a contribution. Noirin Shirley, who gave another talk on a similar topic, made the useful suggestion of being a "greeter" for a project so that everyone who posts on a project mailing list gets a friendly response straight away.

The second talk was on dissemination beyond academic circles. This was a case study of the transfer of our work from a closed research project into an open project in the ASF incubator (Apache Wookie (Incubating)), looking at the process and business case. For us this move has proven to be extremely successful, and the value generated through adopting a fully-open development approach rather than the "open source, closed community" approach more typical of research projects has been far greater than the sum of our investment. While not every project can be as successful as this, hopefully it will at least help make the case easier for others.

Of course I didn't just go to do some talks! There were lots of sessions and two very interesting keynotes. I don't normally enjoy keynotes, but these were sufficiently different to be of interest. Steven Pemberton provided a historical perspective on open innovation before delving into what he sees as the key challenge facing open source: usability. Roland Harwood's keynote provided some very interesting case studies of open innovation, with examples including applying F1 logistics technology to hospital waiting lists, and Virgin Atlantic sourcing innovations from an online community of frequent flyers.

Other sessions I attended looked at open innovation between business and academic teams, FOSS business models, the CodePlex Foundation (not .com!), open source innovation, community, knowledge transfer partnerships... it's going to take me quite a while to let it all settle and figure out which of the things I learned about I can apply next.

Overall the "Open Innovation" message came through loud and clear, as did the clear willingness of both academic and commercial organisations to work together on this basis.

In my own mind I'm seeing "Open Innovation" as a methodology that can both support and be supported by the other "Open" agendas that the CETIS, OSSWatch and UKOLN innovation support centres have been pushing for some years now - Open Source, Open Standards, Open Content, Open Data - each of which also build upon and sustain each other.

The combination of these factors enables companies and universities to unlock innovation that generates far greater value than could be created by any one of them alone, or even by a more traditional "closed" partnership. The challenge ahead is to remove any remaining barriers to openness and collaboration, and to unlock the potential for open innovation involving universities and innovative companies; I think the event last week was an excellent start.

May 11, 2010

Transfer Summit

Next month I'm giving a couple of talks as part of the Transfer Summit event in Oxford, aimed at bringing together academia and business in Open Source. My own experiences in this area have been on the Apache Wookie (incubating) project, which started out in an academic project (TenCompetence) but is now being incubated by the Apache Software Foundation. Part of this process has involved connecting with a much broader range of organisations, including SMEs and large companies as well as universities and foundations, all of which bring something different to the project.

This meant persuading our Institute to actively commit to supporting our staff engaging in an open community process with no promise of future funding, rather than dumping some code in a repository at the end of the project and hoping somehow that something magical will happen. I had to draft and redraft a business case, all the time making the arguments about risks and opportunities for our management team. However, looking back I can safely say the actual results - in income alone - far exceeded my most optimistic projections.

An OSS strategy has proven to be extremely successful for our small research unit, in terms of funding, reputation, partnerships, and academic opportunities, and this event is an opportunity for me to tell that story to a wider audience - so do come along if you're interested to find out more.

For more details and to register see http://www.transfersummit.com

April 22, 2010

Simplifying Learning Design

Guillaume Durand has posted a proposal for a Simple Learning Design 2.0. Simplifying IMS LD is certainly something worth attempting, but there turn out to be different ways to go about it.

Durand comments that "The main idea behind SLD 2.0 was to keep the essence of learning design in a voluntary simplistic specification easily usable as an add-on for IMS-CC 1.0. Several documents are already available."

I think this is a reasonable position to take; the IMS LD specification is extremely complex, hard to implement, and perhaps most problematic, hard for authors and users to understand. Most of the effort in recent years to improve adoption has focussed on improvements to tools supporting the specification, for example with ReCourse and Astro.

However there is only so far you can go with building tools and doing odd spec tweaks (like bolting on support for Widgets into the LD Services element) and so a re-think of IMS LD from the ground up is something worth thinking about.

Looking at Durand's proposal my immediate thought is that his idea of the "essence" of Learning Design is rather different from my own - and in fact he keeps what I would have thrown out, and he throws out some of what I would have kept.

Specifically, Durand has kept the aspects of Learning Design that are concerned with conditional branching. This has always been my least-favourite part of LD for a number of reasons; one of which is that I don't like putting programming logic into XML (<if>x<then>y<else>z<endif>). If you're going to use this sort of logic in an XML document, I think it's better to use a scripting language like JavaScript or a functional programming language like Erlang. However, that's a pretty technical reason. The main reasons I would give for this being something to leave out of a simplified Learning Design language are that (a) SCORM already does this and is widely implemented and (b) most examples of simple learning designs don't use this feature, and are fairly linear flows.
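Just to illustrate the point about logic in markup: the same kind of branch as the <if><then><else> fragment above is a one-liner in ordinary script (the property names and threshold here are invented for illustration):

// the branch expressed as script rather than XML elements
function nextActivity(learner) {
	return (learner.score >= 0.8) ? "extension-activity" : "revision-activity";
}
console.log(nextActivity({ score: 0.9 })); // "extension-activity"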

This is brought home I think by what Durand has left out, which is grouping. In SLD 2.0, the only "groups" you can have for activities are everyone, and individuals. So as far as I can tell, no small-group activities are supported.

This positions SLD 2.0 much more closely to SCORM than to something like LAMS, which is probably the most popular LD platform. Which makes me wonder whether SLD 2.0 would have been better positioned as additional requirements for SCORM 2.0 rather than a simplification of LD?

So what would I do differently? Well, I think I'd start from somewhere else. I'd recognise that for individual, self-paced, adaptive content the only game in town is SCORM. And I'd take a look at existing implementations, like LAMS. And I'd focus on the things which make LD different from SCORM, which is around group and collaborative activities. And what I'd come up with would be something like this:

  • <sequence> a set of activities that have to be completed one after another
  • <choice> a set of activities that users can complete in any order they want
  • <dissolve> split the participants up to work as individuals
  • <merge> merge all the participants into one group
  • <group> split the participants into groups; this can be specified as dynamic, using some heuristics like preferred numbers per group, or with pre-defined groups. The design could specify whether the runtime should assign users randomly, let users select which group to join for themselves, or prompt the moderator to assign the users.
  • <synchronize> stop progress until everyone has completed the previous sequence or choice.
  • <wait> stop progress until the moderator decides to go on
  • <schedule> stop progress until a specified time.

Each of these concepts should seem pretty familiar to LAMS users, although the LAMS file format doesn't really look like this. It also looks an awful lot like a group workflow pattern language.

I think this specification is simpler than IMS LD, but at the same time it has almost no overlap with Durand's proposal. Which just goes to show that IMS LD is not only complex, but you can carve up the space it occupies in many different ways.

(For the activities themselves, they need titles, instructions, resources, and tools, and there are various ways you could specify that which I won't elaborate here.)

Here's an example of how it might look:

<schedule time="2010-05-05T12:00:00Z"/>
<sequence>
	<activity>
		<title>Getting started</title>
		<instructions>Read the briefing</instructions>
		<resource>
			<title>Briefing </title>
			<url>briefing.html</url>
		</resource>
		<content src="briefing.html"/>
	</activity>
</sequence>
<synchronize/>
<dissolve/>
<choice>
	<activity>
		<title>Read the articles</title>
		<instructions>Read each of the resources in this activity</instructions>
		<resource>
			<title>Article 1</title>
			<url>article-1.html</url>
		</resource>
		<resource>
			<title>Article 2</title>
			<url>article-2.html</url>
		</resource>
	</activity>
	<activity>
		<title>Do a quiz</title>
		<assessment src="self-test.xml"/>
	</activity>
</choice>
<group maxGroups="4" selection=":random"/>
<sequence>
	<activity>
		<title>Discuss</title>
		<instructions>Now in your group discuss the articles ...</instructions>
		<tool type="chat"/>
	</activity>
</sequence>
<wait/>
<dissolve/>
<sequence>
	<activity>
		<title>write individual log entry on activity</title>
		<tool type="text editor"/>
	</activity>
</sequence>
<synchronize/>
<merge/>
<sequence>
	<activity>
		<title>Plenary</title>
		<resource>
			<title>Debrief notes</title>
			 <url>debriefing.html</url>
		</resource>
	</activity>
</sequence>

April 20, 2010

Standardizing standardisation practices

Interesting set of reflections by Mattias Ganslandt on IBM's policy, set out in 2008, on working with standards organisations. IBM's policy initiative was aimed at more openness in processes and IPR policies by standards organisations, spurred on no doubt by the OOXML debacle.

At the JISC-CETIS future of interoperability standards event delegates also ranked lack of transparency and IPR issues as two of the things we most wanted to fix in the eLearning standardisation domain, so it's clearly still important.

Ganslandt makes the point that widespread adoption of such policies might lead towards homogenisation of standards setting organisations - not necessarily a good thing, as organisations differ along a wide range of criteria, and are often adapted to a particular set of conditions; for example, he cites the Danish review of openness in standards organisations, which concluded that openness is a trade-off proposition rather than an absolute criterion. For example, consortia may provide openness at the "front end" through open membership but have tighter editorial control from their Board of Directors at the "back end", or more restrictive membership criteria but greater openness and equality among members in the actual work of the consortium.

However as we see with the OWF, even at the most informal end of the standards spectrum there is a desire for more standardisation when it comes to IPR in particular. So perhaps it is practical to push for common IPR practices, irrespective of other characteristics of openness, as this would seem to be a more "absolute" criterion than process openness, which may indeed follow the pattern of tradeoffs that the Danish study concluded. And even then, I think it would be silly to conclude that all consortia are equally but differently open; there are clearly those that haven't even reached a tradeoff position yet, with a lack of openness at both ends.

Overall I have quite a lot of sympathy for the IBM stance, as poor IP policy and lack of transparency cause a lot of unnecessary friction and barriers in developing standards. However, pragmatically, some consortia are going to be strategic enough that you just end up gritting your teeth and trying to work past it rather than take a principled position. It would be interesting to see how IBM have put their policy into practice - and where.
