Apotelesm

Thursday, January 29, 2009

Concordia Graph thoughts

Horosthesia links to the current Concordia Graph.

The ontology it indicates very nearly encompasses the relationships of metadata Descriptions, Texts, Objects, and Transcriptions that I'm working on for the APIS image repository. One of my open questions about this enterprise is the availability of URIs for the Objects (and Texts, really). The Images and Descriptions I'll be more-or-less in control of, but I want to avoid a scenario in which I incur responsibility for identifiers outside of my domain.

I am supposing that the custodian of the transcriptions in the DDb might be able to structure things in a way that creates Text URIs paralleling the transcription URIs in the databank.

In any case, if that graph is augmented with a "depicts" relationship and an Image object, my conceptual work appears to be done.

Tuesday, January 27, 2009

Modeling relationships between Ancient Document Resources

Entities relating to ancient documents: Images, Texts, Text-Bearing Objects, Descriptions
* Text-Bearing Objects are Physical Objects
* Text-Bearing Objects have material and bibliographic Descriptions
* Images depict Texts and Text-Bearing Objects
* Texts are inscribed on Text-Bearing Objects
* Texts have bibliographic and interpretive Descriptions (e.g., a link to APIS or HGV metadata)
* Texts have Transcriptions (e.g., a link to a DDbDP transcription)
* Texts have Translations (e.g., a link to a DDbDP translation)

Where a Text or Text-Bearing Object is depicted by multiple images, the Text should have a structural description indicating how each depicting image relates to it.
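The relationships listed above can be sketched as plain data classes. This is only an illustration, assuming Python 3.7+; the class names, field names, and `urn:example:` identifiers are all hypothetical, and the `structure` field is one guess at what a structural description for multiple depicting images might hold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Description:
    uri: str   # hypothetical identifier, e.g. for an APIS or HGV record
    kind: str  # "material", "bibliographic", or "interpretive"


@dataclass
class TextBearingObject:
    uri: str
    descriptions: List[Description] = field(default_factory=list)


@dataclass
class Depiction:
    image_uri: str
    structure: str  # how this image relates to the whole, e.g. "left fragment"


@dataclass
class Text:
    uri: str
    inscribed_on: List[TextBearingObject] = field(default_factory=list)
    descriptions: List[Description] = field(default_factory=list)
    transcriptions: List[str] = field(default_factory=list)  # URIs
    translations: List[str] = field(default_factory=list)    # URIs
    depicted_by: List[Depiction] = field(default_factory=list)


# A text on one object, depicted by two images (all identifiers invented).
papyrus = TextBearingObject(uri="urn:example:object:1")
text = Text(
    uri="urn:example:text:1",
    inscribed_on=[papyrus],
    depicted_by=[
        Depiction("urn:example:image:1", "left fragment"),
        Depiction("urn:example:image:2", "right fragment"),
    ],
)
```

Putting the structural note on the depiction, rather than on the image itself, lets the same image carry a different note for each text it shows.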

Thursday, January 15, 2009

This can't be right

Lines in the input file may contain a hex sequence interspersed among ASCII text, in the form 0x{hex characters}.


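# Python 2: str is a byte string here.
import fileinput
import re
import codecs
utf = codecs.lookup('utf-8')
# One byte written as two hex digits.
hex_pair = re.compile("[0-9a-f]{2}")
# A run of hex digits introduced by "0x".
pattern = re.compile("0x[0-9a-f]*")
def hexrepl(matchobj):
  # Convert each hex pair in the matched run to its byte value.
  matched = matchobj.group(0)
  numbers = []
  for m in hex_pair.findall(matched):
    numbers.append(chr(int(m,16)))
  result = "".join(numbers)
  # Decoding validates the bytes as UTF-8; re-encoding returns a byte string.
  return utf.encode(utf.decode(result)[0])[0]
for line in fileinput.input():
  line = re.sub(pattern,hexrepl,line)
  print line.rstrip()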

This feels overly complicated, and I'm only an occasional Python hack. I'm forgetting something.
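One possible simplification, sketched under the assumption of Python 3 (where text and bytes are separate types): `bytes.fromhex` converts a run of hex pairs directly, so the per-pair loop goes away. The function name `decode_hex_runs` is made up for illustration. Unlike the listing above, this pattern only matches complete byte pairs, so a bare "0x" or a trailing odd digit is left in place rather than dropped, and bytes that are not valid UTF-8 raise `UnicodeDecodeError`.

```python
import re

# "0x" followed by one or more complete two-digit hex bytes.
HEX_RUN = re.compile(r"0x((?:[0-9a-f]{2})+)")


def decode_hex_runs(line: str) -> str:
    """Replace each 0x... run with the text its bytes encode as UTF-8."""
    def repl(match):
        return bytes.fromhex(match.group(1)).decode("utf-8")
    return HEX_RUN.sub(repl, line)


# 0xce 0xb1 is the UTF-8 encoding of Greek small alpha (U+03B1).
sample = decode_hex_runs("alpha 0xceb1 beta")
```

As with the original pattern, the match is greedy: hex-looking letters that directly follow a run with no separator would be consumed as part of it.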

Wednesday, January 14, 2009

File Under: Reasons not to encode Data in File Names

Consider: Two adjoining papyrus fragments that connect on the horizontal. On one side, a relatively long text that spans the two fragments (text A). From the perspective of this text, there are two additional texts (in two different hands) in the right "margin" of the rightmost fragment, one (text B) above the other (text C). Finally, there is some writing on the other side (text D) of the joined fragments.

You have a cataloging scheme that separately identifies each text on the physical artifact. You have an image file naming scheme that effectively represents the relationship of a catalog record with an image file: {catalog id}.{front/back}.{tile coordinate if applicable}.{resolution}.{format}.

From the perspective of text A, images of the two fragments might be named A.front.left.high.tif, A.front.right.high.tif, etc. But wait! Isn't A.front.left.high.tif also D.back.left.high.tif? And isn't A.front.right.high.tif also B.front.high.tif?
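One way around the problem, sketched here with invented names, is to give each image an opaque file name and record what it depicts as separate data. The file names, the `DEPICTIONS` table, and both helper functions are hypothetical; only the A/B/D relationships come from the example above.

```python
from collections import defaultdict

# One row per thing an image shows.
# Columns: image file, catalog id, side (from that text's perspective), tile.
DEPICTIONS = [
    ("img-0001.tif", "A", "front", "left"),
    ("img-0001.tif", "D", "back", "left"),
    ("img-0002.tif", "A", "front", "right"),
    ("img-0002.tif", "B", "front", None),
]


def index_by_text(rows):
    """Map each catalog id to the images that depict it."""
    by_text = defaultdict(list)
    for image, text_id, side, tile in rows:
        by_text[text_id].append((image, side, tile))
    return dict(by_text)


def index_by_image(rows):
    """Map each image file to every text it depicts."""
    by_image = defaultdict(list)
    for image, text_id, side, tile in rows:
        by_image[image].append((text_id, side, tile))
    return dict(by_image)
```

With the relationship held as rows, one file can belong to any number of catalog records without being renamed or duplicated.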

Saturday, November 22, 2008

Mapping note

Mapping structured citations: If there is a successful sided match, add the unsided cite to the matched list to preclude false recto/verso matches.
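A rough sketch of one reading of this rule, with loud assumptions: cites are modeled as `(base, side)` pairs, there is an exact sided index plus an unsided fallback index, and the failure being prevented is a verso cite falling back to the record already claimed by its recto. The function name, both index shapes, and the record ids are invented for illustration.

```python
def match_cites(cites, sided_index, unsided_index):
    """Match (base, side) cites; side is None for an unsided cite."""
    matched = set()  # cites already accounted for
    results = {}
    # Pass 1: exact sided matches.
    for base, side in cites:
        if side is not None and (base, side) in sided_index:
            results[(base, side)] = sided_index[(base, side)]
            matched.add((base, side))
            # The rule from the note: also mark the unsided cite as matched.
            matched.add((base, None))
    # Pass 2: unsided fallback, skipped when the unsided cite is already matched.
    for base, side in cites:
        if (base, side) in matched or (base, None) in matched:
            continue
        if base in unsided_index:
            results[(base, side)] = unsided_index[base]
    return results


cites = [("X", "r"), ("X", "v"), ("Y", None)]
sided_index = {("X", "r"): "rec-1"}
unsided_index = {"X": "rec-1", "Y": "rec-2"}
results = match_cites(cites, sided_index, unsided_index)
```

Here `("X", "v")` gets no match instead of being wrongly attached to `rec-1`. Running the sided pass to completion first keeps the outcome independent of the order of the cites.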

Friday, November 21, 2008

Concordia, naming, sparse relationships

http://www.atlantides.org/trac/concordia/wiki/ConcordiaThesaurus
Numbers server already resolves (ultimately) to RDF
low-hanging fruit: adding some known metadata properties to the leaves
high-hanging fruit: disentangling the many sparse relationships
**
What if, rather than having any organizational center for the index, we reorganized things around a more abstracted graph relating Objects (inventory numbers), Texts (citations), and CatalogEntries (metadata records; these might be Editions)?

Say we have two more relationships: METADATA ore:describes TEXT/OBJECT, and METADATA ore:similarTo METADATA

Number Server lets you drill down through identifier hierarchies as aggregates, OR lets you see a graph centered on a particular URI.
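To make the graph idea concrete, here is a minimal sketch that holds the two relationships as subject/predicate/object tuples and pulls out the neighborhood of one URI. The identifiers (`meta:…`, `text:…`, `object:…`) and the `neighborhood` function are placeholders invented for illustration; nothing here reflects how the server is actually implemented.

```python
# Each relationship is a (subject, predicate, object) triple.
TRIPLES = [
    ("meta:apis-1", "ore:describes", "text:cite-1"),
    ("meta:apis-1", "ore:describes", "object:inv-1"),
    ("meta:hgv-1", "ore:describes", "text:cite-1"),
    ("meta:apis-1", "ore:similarTo", "meta:hgv-1"),
]


def neighborhood(triples, uri):
    """Return every triple in which uri appears as subject or object."""
    return [t for t in triples if t[0] == uri or t[2] == uri]


# The graph as seen from one text: both metadata records that describe it.
around_text = neighborhood(TRIPLES, "text:cite-1")
```

No node is privileged in this layout, so the same list answers questions centered on an Object, a Text, or a metadata record.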

More at some indeterminate point in the future.

Wednesday, November 12, 2008

Encoding JavaScript UTF-16 characters for URLs

Dug this up from the bowels of the ddbdp webapp code. I wrote it a long time ago, but didn't end up needing it. Seems like it might be useful down the line...


var firstByteMark = [ 0x00,0x00,0xC0,0xE0,0xF0,0xF8,0xFC ];
var byteMask = 0xBF;
var byteMark = 0x80;

function UTF16toUTF8Bytes(u16){
    var bytes = new Array();
    if (u16 < 128){
        bytes.length = 1;
    } else if (u16 < 2048){
        bytes.length = 2;
    } else { // presuming max js charCode of 65535
        bytes.length = 3;
    }
    // Fill from the last byte to the first; each case falls through on purpose.
    switch (bytes.length){
        case 3:
            bytes[2] = ((u16 | byteMark) & byteMask);
            u16 >>= 6;
            // falls through
        case 2:
            bytes[1] = ((u16 | byteMark) & byteMask);
            u16 >>= 6;
            // falls through
        case 1:
            bytes[0] = (u16 | firstByteMark[bytes.length]);
    }
    return bytes;
}
function encode(input){
    var output = new Array();
    var inputArray = input.split(/\s+/);
    for(var i=0;i<inputArray.length;i++){
        var term = '';
        for(var j=0;j<inputArray[i].length;j++){
            var u16 = inputArray[i].charCodeAt(j);
            if (u16 < 128){
                term += inputArray[i].charAt(j);
                continue;
            }
            var utf8bytes = UTF16toUTF8Bytes(u16);
            for(var k=0;k<utf8bytes.length;k++){
                if(utf8bytes[k] < 16){
                    term += "%0";
                } else {
                    term += "%";
                }
                term += utf8bytes[k].toString(16);
            }
        }
        output[i] = term;
    }
    return output;
}
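As a cross-check on the bit arithmetic, the same steps can be mirrored in Python and compared with the built-in UTF-8 encoder. This is a sketch, not a port of the webapp code: the names `utf8_bytes` and `percent_encode_term` are invented, and because Python iterates over code points rather than UTF-16 code units, the sketch rejects anything above U+FFFF instead of handling surrogate pairs.

```python
FIRST_BYTE_MARK = [0x00, 0x00, 0xC0, 0xE0]
BYTE_MASK = 0xBF
BYTE_MARK = 0x80


def utf8_bytes(code_point):
    """UTF-8 bytes for one code point in the range U+0000..U+FFFF."""
    if code_point > 0xFFFF:
        raise ValueError("only code points up to U+FFFF are handled")
    if code_point < 0x80:
        n = 1
    elif code_point < 0x800:
        n = 2
    else:
        n = 3
    out = [0] * n
    # Continuation bytes, last to first: 10xxxxxx with six payload bits each.
    for i in range(n - 1, 0, -1):
        out[i] = (code_point | BYTE_MARK) & BYTE_MASK
        code_point >>= 6
    # Lead byte: the remaining bits plus the length marker.
    out[0] = code_point | FIRST_BYTE_MARK[n]
    return out


def percent_encode_term(term):
    """Leave ASCII as-is; percent-encode the UTF-8 bytes of everything else."""
    parts = []
    for ch in term:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)
        else:
            parts.extend("%%%02x" % b for b in utf8_bytes(cp))
    return "".join(parts)


# Compare against Python's own encoder for a few sample code points.
for cp in (0x41, 0xE9, 0x3B1, 0x20AC):
    assert bytes(utf8_bytes(cp)) == chr(cp).encode("utf-8")
```

Like the JavaScript above, this passes every ASCII character through untouched, including ones such as `&` and `%` that carry meaning in a URL.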