Thursday, July 13, 2017

Drifting

In a post reflecting on the software development practice in the Hydra/Samvera community, Jonathan Rochkind begins a late pivot towards a more general complaint by framing Samvera:

And finally, a particularly touchy evaluation of all for the hydra/samvera project; but the hydra project is 5-7 years old, long enough to evaluate some basic premises. I’m talking about the twin closely related requirements which have been more or less assumed by the community for most of the project’s history:
1) That the stack has to be based on fedora/fcrepo, and
2) that the stack has to be based on native RDF/linked data, or even coupled to RDF/linked data at all.
I believe these were uncontroversial assumptions rather than entirely conscious decisions, but I think it’s time to look back and wonder how well they’ve served us, and I’m not sure it’s well.

This five-sentence history of Hydra/Samvera is a fabrication. The Hydra project began in 2008 as an attempt to combine a Blacklight discovery layer and a Fedora 3 repository, debatably with the goal of improving the notion of services/disseminators in the Fedora 3 CMA by making them contained applications. The Fedora Commons project was one of its original partners. It's strange to characterize that backend as an assumption rather than the motivating use case when the core library from the project's outset is ActiveFedora (published February 2009).

I'm more sympathetic to interrogating the relationship of Samvera to linked data, but casting that decision as an assumption, rather than as the conscious development goal of Hydra/Samvera partners who were trying to focus their descriptions less on XML serializations and more on the description as data, is patronizing. I can agree that we should "look back and wonder how well they’ve served us", but it's always been the time to do that (as far as I can tell, the ActiveFedora:RDF package was introduced in 2013 as a reaction to frustration managing object descriptions as files). If I were an employee at Penn State, whose work prompted the accommodation of RDF as ActiveRecord-style properties rather than as serialized files, I'd be insulted to see my work characterized as the product of not "entirely conscious decisions" or "uncontroversial assumption[s]".

At more than one meeting now (in the interest of disclosure: I gave a talk on related topics at OR2016, participated in two panels touching on the issue at Hydra Connect 2016, and have been involved for some years with the Fedora Commons community and project), there's been open discussion of what the relationship of Samvera to Fedora ought to be going forward. It's clearly a question motivating the work on Valkyrie at Princeton. Over the years there's been more than one alternate backend written to mimic the Fedora APIs. There are analogous conversations in the world of Blacklight when a potential installer wants to use a different NoSQL store than Solr.

The critical question underlying those efforts and conversations is to what degree the software products of these various projects should be shareable, whether the surface of interaction is within a platform, across a shared index, across API abstractions, or at the achievement of consensus around use case and functional requirements. Rochkind takes a different tack, suggesting that a hard pivot away from abstraction should be the baseline and arguing that we need to justify any commitment beyond Rails and Paperclip. This strikes me as reductive and dismissive of my own experience: that generalizing description in a database moves pretty quickly towards re-inventing RDF in tables, and that storing blobs of serialized description leads to re-inventions of Fedora without the mediating APIs. If our response to the problems motivating Rochkind's post were to advocate interacting directly with the backing databases and file systems of Fedora, it might work - it might be faster! - but we would certainly not be proposing it as a minimalist path towards more sustainable software approaches.
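To make the "re-inventing RDF in tables" point concrete, here is a minimal sketch of where a generalized description table tends to end up. The schema, table name, and sample rows are all invented for illustration and are not taken from any Samvera, Fedora, or Rails codebase; it uses Python's standard sqlite3 module only to keep the example self-contained.

```python
import sqlite3

# Hypothetical "flexible" description table: any record can carry any
# field without a schema migration. All names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE descriptions (record_id TEXT, field TEXT, value TEXT)"
)

rows = [
    ("work:1", "title", "Drifting"),
    ("work:1", "creator", "person:7"),   # the value points at another record
    ("person:7", "name", "A. Author"),
]
conn.executemany("INSERT INTO descriptions VALUES (?, ?, ?)", rows)


def describe(record_id):
    """Return every (field, value) pair recorded for one record."""
    cur = conn.execute(
        "SELECT field, value FROM descriptions "
        "WHERE record_id = ? ORDER BY field",
        (record_id,),
    )
    return cur.fetchall()


def as_triples():
    """Read the same rows as (subject, predicate, object) statements."""
    cur = conn.execute(
        "SELECT record_id, field, value FROM descriptions ORDER BY rowid"
    )
    return cur.fetchall()
```

Nothing changes between `describe` and `as_triples` except the names given to the three columns: once the table is general enough to hold arbitrary fields and references between records, each row is already a subject, a predicate, and an object, only without shared vocabularies or a standard query language.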

We can scrutinize our approaches to the problem of managing assets and description in shareable ways without a fabulous and dismissive historical framing of the project and the use cases of its participating institutions. But we should also be cognizant of what *some kind* of abstraction yields: An Avalon or a Charon can function as a common tool to originate content subsequently repurposed for independent, locally developed publication platforms; I still think we'll inch towards shared practice, and thus shared content, with Islandora. Integrative projects like this require some kind of interface; the question is where to locate it.