Tuesday, October 13, 2009

Describing a tile

Trying to use the Djatoka JPEG 2000 image viewer to display the image tiles/regions served by an installation of the now-defunct eRez image server underscored the value of a good web API.

eRez Tile API

Parm Name  Type             Function
src        string           path to the ptif src, relative to the eRez image root
width      integer          width of the resulting image tile (will stretch to fit)
height     integer          height of the resulting image tile (will stretch to fit)
top        float            the position of the top edge of the tile relative to the entire scaled image, expressed as a decimal fraction
left       float            see "top"
bottom     float            see "top"
right      float            see "top"
scale      float            the ratio of the dimensions of an entire image composed of tiles in the requested size to the dimensions of the original image, expressed as a decimal fraction
tmp        string constant  the value is "ajax-viewer", unquoted
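
As a rough illustration, the parameters above could be assembled into a query string like this. It is only a sketch: the helper name erez_tile_query and the sample values are made up, and the server endpoint that would receive the query is deliberately left out because it is not given here.

```python
from urllib.parse import urlencode


def erez_tile_query(src, width, height, top, left, bottom, right, scale):
    """Assemble the eRez tile parameters into a URL query string.

    Hypothetical helper: only the parameter names come from the table above.
    """
    params = [
        ("src", src),
        ("width", width),
        ("height", height),
        ("top", top),
        ("left", left),
        ("bottom", bottom),
        ("right", right),
        ("scale", scale),
        ("tmp", "ajax-viewer"),  # constant value, sent unquoted
    ]
    return urlencode(params)


# Made-up request: the top-left quarter of an image shown at quarter scale.
query = erez_tile_query("papyri/example.tif", 256, 256, 0.0, 0.0, 0.5, 0.5, 0.25)
```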

OpenURL getRegion API

Parm Name   Type                   Function
svc.level   integer                a scaling indicator, as specified here
svc.region  integer or float list  the top edge position, left edge position, region height and width. Concatenated as a comma-delimited value.
svc.scale   integer or float list  scaling factor as either a single value, or a targeted width and height. If the latter, a value of zero for one of the dimensions indicates the original proportions should be maintained.
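
To make the difference between the two region styles concrete, here is a minimal sketch that parses a comma-delimited svc.region value and turns it into the four edges eRez wants. The names Region, parse_region and region_edges are hypothetical, and the conversion assumes all four values are decimal fractions of the image; integer pixel values would first have to be divided by the image dimensions.

```python
from typing import NamedTuple


class Region(NamedTuple):
    top: float
    left: float
    height: float
    width: float


def parse_region(value):
    """Split a comma-delimited svc.region value into its four parts."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError("expected top,left,height,width")
    return Region(*parts)


def region_edges(region):
    """Return (top, left, bottom, right), assuming fractional values."""
    return (
        region.top,
        region.left,
        region.top + region.height,
        region.left + region.width,
    )


edges = region_edges(parse_region("0,0.5,0.25,0.25"))
```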

Translating OpenURL Level to eRez Scale

After calculating the maximum levels, any given level converts to scale as:

scale = 1 / 2^(maxLevels - requestedLevel)
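
Reading the formula as one over two raised to the difference in levels, the conversion can be sketched as follows. The function name level_to_scale is made up, and how the maximum number of levels is calculated is left to the caller, since it is not covered here.

```python
def level_to_scale(max_levels, requested_level):
    """Convert an OpenURL level to an eRez scale.

    Each level below the maximum halves the image dimensions.
    """
    return 1 / 2 ** (max_levels - requested_level)
```

Under that reading, the maximum level maps to a scale of 1 (the original size), and every step down halves it.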

Wednesday, September 30, 2009

you lying, non-ascii bastards

grep -l $'[\x80-\xff]' * > nonascii.txt
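
For when grep and the locale disagree, a rough Python equivalent works on raw bytes. This is a sketch, not a drop-in replacement: the function names are made up, and it reads each file fully into memory.

```python
from pathlib import Path


def has_non_ascii(data):
    """True if any byte falls in the 0x80-0xff range."""
    return any(b >= 0x80 for b in data)


def non_ascii_files(directory="."):
    """Names of the regular files in a directory that contain non-ASCII bytes."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and has_non_ascii(p.read_bytes())
    )
```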

Monday, September 28, 2009

brain dump

What about a collection of micro-apps that extract linked data from epidoc, a la SNERT/OC? One for date info normalization, one to spit back pleiades, etc.

Monday, February 9, 2009

Date ranges, Ontology, etc.

Digital Humanities 2007
Time Period Directory Standard, 2006
A naïve ontology for concepts of time and space for searching and learning, 2007

Pharaonic navigation has an advantage here in that (ignoring protests of some historians, I'm sure) there are commonly defined, named periods with determinate endpoints. It would be possible to suggest some vocabulary to incorporate them. But that wouldn't provide many linkages to currently-known partners, and it wouldn't accurately describe most of the collection. So, what other approach would be more appropriate and inclusive?

Thursday, January 29, 2009

Concordia Graph thoughts

Horosthesia links to the current Concordia Graph.

The ontology it indicates very nearly encompasses the relationships of metadata Descriptions, Texts, Objects, and Transcriptions that I'm working on for the APIS image repository. One of my open questions about this enterprise is the availability of URIs for the Objects (and Texts, really). The Images and Descriptions I'll be more-or-less in control of, but I want to avoid a scenario in which I incur responsibility for identifiers outside of my domain.

I am supposing that the custodian of the transcriptions in the DDb might be able to structure things in such a way that creates Text URIs to parallel the transcription URIs in the databank.

In any case, if that graph is augmented with a "depicts" relationship and an Image object, my conceptual work appears to be done.

Tuesday, January 27, 2009

Modeling relationships between Ancient Document Resources

Entities relating to ancient documents: Images, Texts, Text-Bearing Objects, Descriptions
* Text-Bearing Objects are Physical Objects
* Text-Bearing Objects have material and bibliographic Descriptions
* Images depict Texts and Text-Bearing Objects
* Texts are inscribed on Text-Bearing Objects
* Texts have bibliographic and interpretive Descriptions (e.g., a link to APIS or HGV metadata)
* Texts have Transcriptions (e.g., a link to a DDbDP transcription)
* Texts have Translations (e.g., a link to a DDbDP translation)

In situations where a Text or Text-Bearing Object is depicted by multiple Images, the Text should have a structural description indicating how each depicting Image relates to it.
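
The relationships above can be sketched as plain data classes. This is only one possible shape: every class and field name is an assumption, the URIs are placeholders, and the Depiction class is my guess at what a "structural description" might minimally carry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextBearingObject:
    uri: str
    descriptions: List[str] = field(default_factory=list)  # material, bibliographic


@dataclass
class Depiction:
    """Structural description: how one Image relates to what it depicts."""
    image_uri: str
    role: str  # free text, e.g. "left fragment"


@dataclass
class Text:
    uri: str
    inscribed_on: TextBearingObject
    descriptions: List[str] = field(default_factory=list)    # e.g. APIS or HGV
    transcriptions: List[str] = field(default_factory=list)  # e.g. DDbDP
    translations: List[str] = field(default_factory=list)
    depictions: List[Depiction] = field(default_factory=list)


# Placeholder URIs: one Text spanning two images of the same object.
papyrus = TextBearingObject(uri="urn:example:object:1")
text = Text(uri="urn:example:text:1", inscribed_on=papyrus)
text.depictions.append(Depiction("urn:example:image:1", "left fragment"))
text.depictions.append(Depiction("urn:example:image:2", "right fragment"))
```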

Thursday, January 15, 2009

This can't be right

Lines in the input file may contain a hex sequence interspersed among ASCII text, in the form 0x{hex characters}.

import fileinput
import re

# One byte is two hex digits; a full sequence is "0x" followed by hex digits.
hex_pair = re.compile("[0-9a-f]{2}")
pattern = re.compile("0x[0-9a-f]*")

def hexrepl(matchobj):
    text = matchobj.group(0)
    # Convert each pair of hex digits after the "0x" prefix to a byte value.
    numbers = [int(m, 16) for m in hex_pair.findall(text[2:])]
    # Read the bytes as UTF-8; raises UnicodeDecodeError if they are not valid.
    return bytes(numbers).decode('utf-8')

for line in fileinput.input():
    line = re.sub(pattern, hexrepl, line)
    print(line.rstrip())

This feels overly complicated, and I'm only an occasional Python hack. I'm forgetting something.

Wednesday, January 14, 2009

File Under: Reasons not to encode Data in File Names

Consider: Two adjoining papyrus fragments that connect on the horizontal. On one side, a relatively long text that spans the two fragments (text A). From the perspective of this text, there are two additional texts (in two different hands) in the right "margin" of the rightmost fragment, one (text B) above the other (text C). Finally, there is some writing on the other side (text D) of the joined fragments.

You have a cataloging scheme that separately identifies each text on the physical artifact. You have an image file naming scheme that effectively represents the relationship of a catalog record with an image file: {catalog id}.{front/back}.{tile coordinate if applicable}.{resolution}.{format}.

From the perspective of text A, images of the two fragments might be named A.front.left.high.tif, A.front.right.high.tif, etc. But wait! Isn't A.front.left.high.tif also D.back.left.high.tif? And isn't A.front.right.high.tif also B.front.high.tif?
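
The collision is easy to reproduce. In this sketch the naming scheme is applied to each catalog record's view of the same scan; the helper image_name and the scan labels are made up, and the views listed are just the ones from the example above.

```python
def image_name(catalog_id, side, tile=None, resolution="high", fmt="tif"):
    """Build {catalog id}.{front/back}.{tile if applicable}.{resolution}.{format}."""
    parts = [catalog_id, side]
    if tile:
        parts.append(tile)
    parts += [resolution, fmt]
    return ".".join(parts)


# Each scan is one physical image file, seen from more than one catalog record.
scans = {
    "scan 1": [("A", "front", "left"), ("D", "back", "left")],
    "scan 2": [("A", "front", "right"), ("B", "front", None)],
}

names = {
    scan: [image_name(*view) for view in views]
    for scan, views in scans.items()
}
```

Every scan ends up with more than one equally valid name, which is the problem: the name encodes a relationship, and the file takes part in several.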