Friday, August 23, 2019

The Island and the Archipelago

This post does not begin with an orthogonal observation about NYC as archipelago, but: Close Rikers.

On 22 August 2019 I sat in on a meeting of the Archipelago Advisory Board at METRO.

METRO developers describe Archipelago as:
... an evolving Open Source Digital Objects Repository / DAM Server Architecture based on the popular CMS Drupal8/9 and ... a mix of deeply integrated custom-coded Drupal8 modules (made with care by us) and a curated and well-configured Drupal8 instance, running under a discrete and well-planned set of service containers. All of this driven by a clear and concise but thoughtfully planned technical roadmap.
Archipelago was dreamt as a multi-tenant, distributed, capable system (as its name suggests!) and can live isolated or in flocks of similar deployments, sharing storage, services, or -- even better -- just the discovery layer. Learn more about the different Software Services used by Archipelago.
Archipelago's primary focus is to serve the GLAM community by providing a flexible, consistent, and unified way of describing, storing, linking, exposing metadata and media assets. We respect identities and existing workflows.
All of this operates under a different concept than the one we all have become used to in recent times.

I think this is a compelling project, but I want to push on that last sentence a bit.

Archipelago might be summarized as collapsing several parts of the early-2010s repository application system - certainly the management application and the repository itself, possibly the reader/researcher-facing publications - into a single Drupal management application. Part of this is accomplished by recognizing that at least some of the nuts-and-bolts work of the repository has been subsumed into external services: for example, an S3 bucket as a storage API, or a DOI provider as an identification API.

Archipelago eschews a system-determined schema for objects in favor of JSON (I think JSON-LD, actually) as an object storage format, leveraging an object interpretation of the stored JSON to expose each object to Twig templates.
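A rough illustration of the schema-less approach, as I understand it: the stored object is just JSON(-LD), and presentation comes from a template that reaches into whatever fields the object happens to have. Archipelago does this with Twig inside Drupal; the Python below only mimics the idea, with `str.format`'s `[key]` lookups standing in for Twig. The sample object, the `@type`-keyed template lookup, and the fallback behavior are my own assumptions, not Archipelago's actual dispatch.

```python
import json

# Hypothetical stored object: free-form JSON-LD, no system-imposed schema.
STORED = """{
  "@context": "http://schema.org/",
  "@type": "Photograph",
  "label": "Pier 11, East River",
  "creator": {"name": "Unknown"}
}"""

# Hypothetical display templates keyed on "@type". str.format's [key] lookups
# stand in for Twig's attribute access (e.g. data.creator.name).
TEMPLATES = {
    "Photograph": "{data[label]} ({data[@type]}), by {data[creator][name]}",
}
# Assumes every object carries a "label".
DEFAULT_TEMPLATE = "{data[label]}"


def render(raw: str) -> str:
    """Decode stored JSON and render it with whatever template matches its type."""
    data = json.loads(raw)
    template = TEMPLATES.get(data.get("@type"), DEFAULT_TEMPLATE)
    try:
        return template.format(data=data)
    except (KeyError, IndexError, TypeError):
        # Schema-less objects may lack fields a template expects;
        # degrade to the default template rather than failing.
        return DEFAULT_TEMPLATE.format(data=data)


print(render(STORED))  # Pier 11, East River (Photograph), by Unknown
```

The point of the fallback is that nothing upstream guarantees a field exists: the template, not the storage layer, decides what an object needs to look like.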

In subsuming the repository into a management and publication tool, Archipelago tracks with Princeton's Figgy (and, I might add, with the work our team does at Columbia on Hyacinth, although we still publish to a Fedora installation). Nor is its trajectory alien to the one Stanford's SDR has been on, or to that of Duke's Digital Repository, which was also recently redesigned more completely around Drupal.

In a talk I gave at Open Repositories 2016 (the Dublin one, where the video disappeared) about the future of Fedora Commons and APIs, I briefly digressed into the virtues of CDL's curation microservices, to the surprise, I think, of the CDL delegation. Even if Merritt didn't per se change the way we all go about this work, I think the footprints (feetprint?) of S3 and DataCite across digital libraries make it clear that the repository has been disarticulated into a set of services. That trend continues in Archipelago, which disarticulates a storage service, a description service, and an index service/API (Solr in particular, but the particulars are not necessary or even especially interesting to me this morning).

Listening to the METRO folks (that is, the inimitable Diego Pino) discuss Archipelago's templating system, I found myself reconsidering the Fedora Commons 3 Content Model Architecture (CMA) along these lines. No, seriously!

The CMA elaborated the disseminators of Sandy Payette and Carl Lagoze's Flexible and Extensible Digital Object Repository Architecture (that's right, F E D O R A) into a quasi-SOAP, aspirationally object-oriented set of behaviors. Those behaviors were specified, as best they could be, in other repository objects, and linked by RDF assertions between the content-bearing objects and the objects defining their types. In a frictionless world, this is an excellent model.

Unfortunately, the Fedora 3 CMA was deployed in the frictional world of J2EE. The practical constraints of the Fedora-side implementation made the syntax arcane and fragile, and made the model expensive to run: the linked services, hidden behind RESTfully accessed object "property" URLs, had to call back to the repository for the information they needed from the very object whose response they were building, all while the client waited. As with the Handle architecture (that's right, I said it), things didn't have to be this way, but the social, organizational, and platform considerations of the day determined them.
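To make that round-trip cost concrete, here is a toy, in-memory Python sketch of the dispatch the two preceding paragraphs describe: an object asserts a content model, the model declares a service definition, a service deployment contracts to implement it, and the deployment calls back into the repository for the object's data. The predicate names loosely echo Fedora 3's hasModel, hasService, isDeploymentOf, and isContractorOf relationships, but the `Repository` class, its methods, the `demo:` identifiers, and the round-trip counter are all hypothetical. The real CMA, roughly speaking, described these bindings in XML datastreams and dispatched to external HTTP services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
Service = Callable[["Repository", str], str]


@dataclass
class Repository:
    """Toy repository: RDF-ish triples, datastreams, and service deployments."""
    triples: Set[Triple] = field(default_factory=set)
    datastreams: Dict[Tuple[str, str], str] = field(default_factory=dict)
    deployments: Dict[str, Dict[str, Service]] = field(default_factory=dict)
    round_trips: int = 0  # calls a service makes back into the repository

    def objects(self, s: str, p: str) -> Set[str]:
        return {o for (s2, p2, o) in self.triples if (s2, p2) == (s, p)}

    def subjects(self, p: str, o: str) -> Set[str]:
        return {s for (s, p2, o2) in self.triples if (p2, o2) == (p, o)}

    def get_datastream(self, pid: str, dsid: str) -> str:
        self.round_trips += 1
        return self.datastreams[(pid, dsid)]

    def disseminate(self, pid: str, sdef: str, method: str) -> str:
        # 1. Which content models does the object assert?
        for cmodel in sorted(self.objects(pid, "hasModel")):
            # 2. Does that model declare the requested service definition?
            if sdef not in self.objects(cmodel, "hasService"):
                continue
            # 3. Find a deployment implementing this sdef for this model.
            sdeps = self.subjects("isDeploymentOf", sdef) & self.subjects("isContractorOf", cmodel)
            for sdep in sorted(sdeps):
                # 4. The service calls back into the repository while the client waits.
                return self.deployments[sdep][method](self, pid)
        raise LookupError(f"no deployment of {sdef} for {pid}")


repo = Repository()
repo.triples |= {
    ("demo:1", "hasModel", "demo:PhotoCModel"),
    ("demo:PhotoCModel", "hasService", "demo:CaptionSDef"),
    ("demo:CaptionSDep", "isDeploymentOf", "demo:CaptionSDef"),
    ("demo:CaptionSDep", "isContractorOf", "demo:PhotoCModel"),
}
repo.datastreams[("demo:1", "DC")] = "Pier 11"


def caption(r: Repository, pid: str) -> str:
    # The service fetches what it needs from the object it is describing.
    return f"<figure>{r.get_datastream(pid, 'DC')}</figure>"


repo.deployments["demo:CaptionSDep"] = {"getCaption": caption}
print(repo.disseminate("demo:1", "demo:CaptionSDef", "getCaption"))  # <figure>Pier 11</figure>
print(repo.round_trips)  # 1
```

Here every hop is a dictionary lookup; in the J2EE deployment, steps 1 through 4 each involved repository queries, and step 4 was a network request made while the original request was still open.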

Not long after, the Hydra project (staffed by Fedora Commons committers, growing out of the Blacklight project at Virginia, and aiming to manage and index Fedora content) began developing what might be understood as an overlay approach to disseminators in Rails apps. The mixins and gems of the resulting Rails framework might not seem to track towards Twig templates, but moving development into a platform with more front-end concerns and a diversity of dynamically evaluated template options (see also Islandora on Drupal, and Emory University's analogous work on Django) strikes me as a necessary conceptual step towards them. Or, if not necessary, at least one made easier by being less horrifying than storing JSPs and recompiling them to operate against your object, somehow exposed as JAXB. A chill runs up my spine.

This is all to say that I see the Archipelago project as being on a vector of repository work (not alone, as noted above) that intersects previous work. It's not the only vector that does: we remain in a holding pattern about the distinct repository at my place of work, and I think there are service preservation and sustainability arguments to be made for it, to say nothing of performance considerations. But I think Archipelago makes interesting observations about where to locate and value the labor of running, managing, and publishing digital collections. Its design approach also makes a claim about the optimal balance of abstraction and community of practice. I'm interested to see where it goes from here.