Thursday, January 29, 2009

Concordia Graph thoughts

Horosthesia links to the current Concordia Graph.

The ontology it indicates very nearly encompasses the relationships among the metadata Descriptions, Texts, Objects, and Transcriptions that I'm working on for the APIS image repository. One of my open questions about this enterprise is the availability of URIs for the Objects (and Texts, really). The Images and Descriptions I'll be more-or-less in control of, but I want to avoid a scenario in which I incur responsibility for identifiers outside of my domain.

I am supposing that the custodian of the transcriptions in the DDb might be able to structure things so as to create Text URIs that parallel the transcription URIs in the databank.

In any case, if that graph is augmented with a "depicts" relationship and an Image object, my conceptual work appears to be done.

Tuesday, January 27, 2009

Modeling relationships between Ancient Document Resources

Entities relating to ancient documents: Images, Texts, Text-Bearing Objects, Descriptions
* Text-Bearing Objects are Physical Objects
* Text-Bearing Objects have material and bibliographic Descriptions
* Images depict Texts and Text-Bearing Objects
* Texts are inscribed on Text-Bearing Objects
* Texts have bibliographic and interpretive Descriptions (e.g. link to APIS or HGV metadata)
* Texts have Transcriptions (e.g. link to DDbDP transcription)
* Texts have Translations (e.g. link to DDbDP translation)

In situations where a Text or Text-Bearing Object is depicted by multiple images, the Text should have a structural description indicating the way each depicting image relates to it.
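The relationships listed above can be sketched as plain subject/predicate/object triples. This is only a rough illustration in Python: the identifiers (object:1, text:1, image:1 and so on), the predicate names, and the two helper functions are all made up for the example; real identifiers would be URIs from APIS, HGV, or the DDbDP.

```python
# Hypothetical identifiers and predicate names, for illustration only.
TRIPLES = [
    ("object:1", "is_a", "PhysicalObject"),
    ("object:1", "has_description", "description:object-1"),
    ("text:1", "inscribed_on", "object:1"),
    ("text:1", "has_description", "description:text-1"),
    ("text:1", "has_transcription", "transcription:1"),
    ("text:1", "has_translation", "translation:1"),
    ("image:1", "depicts", "text:1"),
    ("image:1", "depicts", "object:1"),
    ("image:2", "depicts", "text:1"),
]


def objects(subject, predicate):
    """Everything that `subject` points to via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Everything that points to `obj` via `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


# A Text depicted by more than one image is the case that needs a
# structural description of how each image relates to it.
depicting = subjects("depicts", "text:1")
needs_structural_description = len(depicting) > 1
```

Keeping the relationships as data like this makes the multiple-image case something you can query for, rather than something you have to notice by eye.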

Thursday, January 15, 2009

This can't be right

Lines in the input file may contain a hex sequence interspersed among ASCII text, in the form 0x{hex characters}.

import fileinput
import re
import codecs
utf = codecs.lookup('utf-8')
hex = re.compile("[0-9a-f]{2}")
pattern = re.compile("0x[0-9a-f]*")
def hexrepl(matchobj):
    input = matchobj.group(0)
    result = ""
    numbers = []
    for m in hex.findall(input):
        numbers.append(chr(int(m,16)))
    result = result.join(numbers)
    return utf.encode(utf.decode(result)[0])[0]
for line in fileinput.input():
    line = re.sub(pattern,hexrepl,line)
    print line.rstrip()
exit


This feels overly complicated, and I'm only an occasional Python hack. I'm forgetting something.
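A shorter version of the same idea might look like the sketch below. It assumes Python 3 (the script above is Python 2), assumes the hex digits are UTF-8 bytes, and assumes each run holds an even number of digits. The name decode_hex_runs is made up for the example, and unlike the original pattern it also accepts uppercase hex digits.

```python
import re

# "0x" followed by one or more pairs of hex digits.
HEX_RUN = re.compile(r"0x((?:[0-9a-fA-F]{2})+)")


def decode_hex_runs(line):
    """Replace each 0x... run with the text its bytes encode (assumed UTF-8)."""
    def repl(match):
        raw = bytes.fromhex(match.group(1))
        # Invalid byte sequences become U+FFFD instead of raising.
        return raw.decode("utf-8", errors="replace")
    return HEX_RUN.sub(repl, line)


sample = decode_hex_runs("caf0xc3a9 au lait")
```

One caveat applies to either version: if the ASCII text immediately following a hex run happens to begin with hex digits, the pattern cannot tell where the run ends.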

Wednesday, January 14, 2009

File Under: Reasons not to encode Data in File Names

Consider: Two adjoining papyrus fragments that connect on the horizontal. On one side, a relatively long text that spans the two fragments (text A). From the perspective of this text, there are two additional texts (in two different hands) in the right "margin" of the rightmost fragment, one (text B) above the other (text C). Finally, there is some writing on the other side (text D) of the joined fragments.

You have a cataloging scheme that separately identifies each text on the physical artifact. You have an image file naming scheme that effectively represents the relationship of a catalog record with an image file: {catalog id}.{front/back}.{tile coordinate if applicable}.{resolution}.{format}.

From the perspective of text A, images of the two fragments might be named A.front.left.high.tif, A.front.right.high.tif, etc. But wait! Isn't A.front.left.high.tif also D.back.left.high.tif? And isn't A.front.right.high.tif also B.front.high.tif?
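To make the collision concrete, here is a small Python sketch that gives each image an opaque identifier and keeps the text-to-image relationships in a separate table. The identifiers img-001 and img-002, the VIEWS table, and the names_for helper are all hypothetical; only the file names they generate come from the scheme described above.

```python
# (image id, catalog id, side as seen from that text, tile or None)
# Hypothetical records for the two images of the side bearing text A.
VIEWS = [
    ("img-001", "A", "front", "left"),
    ("img-001", "D", "back", "left"),
    ("img-002", "A", "front", "right"),
    ("img-002", "B", "front", None),
]


def names_for(image_id, resolution="high", fmt="tif"):
    """List every file name the naming scheme would allow for one image."""
    names = []
    for img, catalog_id, side, tile in VIEWS:
        if img == image_id:
            parts = [catalog_id, side]
            if tile:
                parts.append(tile)
            parts += [resolution, fmt]
            names.append(".".join(parts))
    return names


left_names = names_for("img-001")
right_names = names_for("img-002")
```

Each image ends up with more than one equally valid name, which is the trouble with letting the file name carry the relationship. With the relationships held as records instead, the file can keep a single name and still belong to as many catalog entries as it needs to.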