Monday, April 5, 2021

Featured Searches in a Blacklight/Solr Web Application

We've been working on a project to allow a Solr-backed application (an institutional repository built on Blacklight) to display configured "search features" - something akin to the breakout information boxes in a search engine - when a search is strongly affiliated with an organizational partner, a journal, or a similar entity.

Rather than trying to predict actual searches that should be associated with a feature, we decided to leverage the facets in a given result set: a Search Feature is associated with a faceted field and one or more of its values. When a search is executed and any facet value accounts for more than a 'component threshold' share of the result set (for example, 16%), we query a database for Features matching those values, then compare the aggregated tally for each Feature to a display threshold (set high: 80% or more in testing). This might be obvious to Blacklight veterans, but sorting the facets in the result set by count rather than by value is what makes this all possible, since the values most likely to cross the threshold come back first.
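The two-threshold check can be sketched in plain Ruby. This is a minimal illustration rather than the application's code: FeatureStub, the method names, the field names, and the in-memory array standing in for the database lookup are all hypothetical, and it assumes the aggregated tally is the sum of the counts for every value attached to a Feature.

```ruby
# Hypothetical stand-in for a stored Feature: a slug, the facet field its
# category maps to, and the facet values associated with it.
FeatureStub = Struct.new(:slug, :category, :values)

COMPONENT_THRESHOLD = 0.16 # share of results a single facet value must exceed
DISPLAY_THRESHOLD   = 0.80 # share of results a Feature's values must reach

# facet_counts looks like { "field" => { "value" => count, ... }, ... }.
# Returns { "field" => ["value", ...] } for values above the component threshold.
def candidate_values(facet_counts, total)
  return {} if total.zero?
  facet_counts.each_with_object({}) do |(field, counts), memo|
    hits = counts.select { |_value, count| count.to_f / total > COMPONENT_THRESHOLD }.keys
    memo[field] = hits unless hits.empty?
  end
end

# Stands in for the database query: Features sharing a value with the candidates.
def matched_features(features, candidates)
  features.select { |f| (candidates.fetch(f.category, []) & f.values).any? }
end

# Keeps the matched Features whose summed counts reach the display threshold.
def displayable_features(features, facet_counts, total)
  candidates = candidate_values(facet_counts, total)
  matched_features(features, candidates).select do |f|
    counts = facet_counts.fetch(f.category, {})
    tally = f.values.sum { |v| counts.fetch(v, 0) }
    tally.to_f / total >= DISPLAY_THRESHOLD
  end
end

features = [
  FeatureStub.new("journal-a", "journal_facet", ["Journal A", "J. A"]),
  FeatureStub.new("physics", "department_facet", ["Physics"])
]
facet_counts = {
  "journal_facet"    => { "Journal A" => 70, "J. A" => 15, "Other" => 5 },
  "department_facet" => { "Physics" => 30 }
}
featured = displayable_features(features, facet_counts, 100)
puts featured.map(&:slug).inspect # prints ["journal-a"]
```

In this example both Features are looked up because one of their values clears 16%, but only the journal, at 85 of 100 results across its two values, clears the display threshold. Summing counts can overstate coverage when a facet field is multi-valued, so a real implementation may need a different tally.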

Working against the analyzed result set rather than a predetermined set of queries permits the display of a Feature to be more emergent (for example, catching common acronyms for an academic unit or journal), but still accommodates a 'stable' link to a Feature that redirects to a search using the associated facet values as a filter query.

The data model for a Feature is pretty simple in its initial iteration - a slug identifier, a category (which maps to a faceted field), a description, links to a logo and external web site, and the associated facet values. We use the Feature data in two contexts - in the search results (with a compact display that can be expanded to show the description), and in "explore" pages presenting all the features for some categories. The application in question already has some authorization-restricted pages, so we were able to stand up a simple CRUD user interface for the features allowing us to delegate content management.
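As a rough picture of that data model, here is a plain-Ruby sketch. The attribute names, the category-to-field map, and the Solr field names are assumptions made for illustration; the application itself may well store Features differently.

```ruby
# Hypothetical shape of a Feature, following the fields listed above.
Feature = Struct.new(
  :slug,         # stable identifier used in links
  :category,     # maps to a faceted field
  :description,  # shown when the compact display is expanded
  :logo_url,
  :site_url,
  :facet_values, # values of the category's facet tied to this Feature
  keyword_init: true
)

# Assumed mapping from category to Solr facet field (field names are made up).
CATEGORY_FACET_FIELDS = {
  "journal"    => "journal_ssim",
  "department" => "department_ssim"
}.freeze

def facet_field_for(feature)
  CATEGORY_FACET_FIELDS.fetch(feature.category)
end

# Groups Features for an "explore" page covering the given categories.
def features_by_category(features, categories)
  features.select { |f| categories.include?(f.category) }.group_by(&:category)
end

sample = Feature.new(
  slug: "journal-a",
  category: "journal",
  description: "A hosted journal.",
  logo_url: "https://example.org/logo.png",
  site_url: "https://example.org/",
  facet_values: ["Journal A", "J. A"]
)
explore = features_by_category([sample], ["journal"])
```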

The stable links for features take advantage of duck-typing and Blacklight's deep-hash configuration to allow establishment of a filtered search context without precluding further filtering on the facet associated with a Feature's category: We define a query facet, but configure it not to display. Rather than an explicit query hash (which would be used to write out user-selectable values in a displayed facet), we have a "lazy" query proxy that implements the bracket method and builds named filters based on the configured facet values for a Feature, retrieved by the slug.
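The proxy idea can be sketched as follows. This is an illustration under stated assumptions, not the code in our repository: the class name, the lookup callable, the StoredFeature struct, and the field names are hypothetical. It relies only on the duck-typing described above, where whatever is configured as the facet's query responds to the bracket method and returns a hash with a label and an fq.

```ruby
# Hypothetical record returned by the slug lookup.
StoredFeature = Struct.new(:field, :values)

# Behaves like a query hash: proxy[slug] returns { label:, fq: } or nil.
class FeatureQueryProxy
  # lookup is any callable mapping a slug to a StoredFeature (or nil).
  def initialize(lookup)
    @lookup = lookup
  end

  def [](slug)
    feature = @lookup.call(slug.to_s)
    return nil if feature.nil? || feature.values.empty?
    terms = feature.values.map { |v| quote(v) }
    { label: slug.to_s, fq: "#{feature.field}:(#{terms.join(' OR ')})" }
  end

  private

  # Wraps a value in double quotes, escaping embedded quotes and backslashes.
  def quote(value)
    %("#{value.gsub(/(["\\])/) { "\\#{$1}" }}")
  end
end

store = {
  "physics" => StoredFeature.new("department_ssim", ["Physics", "Dept. of Physics"])
}
proxy = FeatureQueryProxy.new(->(slug) { store[slug] })
filter = proxy["physics"]

# Assumed wiring in a Blacklight catalog controller (not verified here):
#   config.add_facet_field 'feature', query: proxy, show: false
```

Because the filter is built only when a slug is requested, new Features become linkable without changing configuration, and the category's own facet stays free for further filtering.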

We anticipate ongoing work in two areas. The first is scoring the Features: with limited data to begin with, very simple rules like "the top feature from each category" are sufficient to get us started. The second is content management, where supporting Markdown in the descriptions seems likely in the foreseeable future. I'm interested to see how this develops in use, particularly in the context of some important counterpart efforts: we're also developing a reusable search "widget" to surface content associations in the university's centralized web content management system, and we are leveraging OJS's SWORD plugins to deposit articles from hosted journals immediately on publication (the hosted journals are all Features). Together these efforts suggest an intriguing capacity for our institutional repository to function as a partner platform.

Our IR is developed in a public source repository, so if you're interested in tracking this effort as a Blacklight developer you can find us on GitHub: