Wednesday, July 9, 2008

Why not CQL? Why not XTF?

As the good folks at XTF already know, the semantics of text searching are slightly different from those for searching structured metadata.

I had been trying to encode my queries as CQL, but there are some problems:
  • It's painfully verbose, resulting in ridiculous URLs
  • ((A prox B) not ((A prox B) prox C)) is not equivalent to (A near B) not near C
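
One way to read that second point, sketched under assumptions: CQL booleans combine sets of records, so the "not" throws out a whole document as soon as any A-B pair in it sits near C, while a match-level "not near" only discards the individual hits. The toy tokenization, the window semantics, and every function name below are hypothetical, made up purely for illustration.

```javascript
// Positions at which a term occurs in a tokenized document.
function positions(tokens, term) {
  const out = [];
  tokens.forEach((t, i) => { if (t === term) out.push(i); });
  return out;
}

// Spans [start, end] where a and b occur within `dist` tokens of each other.
function nearSpans(tokens, a, b, dist) {
  const spans = [];
  for (const i of positions(tokens, a)) {
    for (const j of positions(tokens, b)) {
      if (i !== j && Math.abs(i - j) <= dist) {
        spans.push([Math.min(i, j), Math.max(i, j)]);
      }
    }
  }
  return spans;
}

// True when some occurrence of c lies within `dist` tokens of the span.
function spanNearTerm(tokens, span, c, dist) {
  return positions(tokens, c).some(
    (k) => k >= span[0] - dist && k <= span[1] + dist
  );
}

// Record-level reading: (A prox B) not ((A prox B) prox C).
// The document is kept only if it has an A-B pair and no A-B pair near C.
function recordLevel(tokens, a, b, c, dist) {
  const ab = nearSpans(tokens, a, b, dist);
  const abc = ab.filter((s) => spanNearTerm(tokens, s, c, dist));
  return ab.length > 0 && abc.length === 0;
}

// Match-level reading: (A near B) not near C.
// Each A-B span is judged on its own; the survivors are returned.
function matchLevel(tokens, a, b, c, dist) {
  return nearSpans(tokens, a, b, dist).filter(
    (s) => !spanNearTerm(tokens, s, c, dist)
  );
}

// Two A-B pairs: the first is next to C, the second is not.
const doc = "a b c x x x x a b".split(" ");
const recordHit = recordLevel(doc, "a", "b", "c", 1);
const matchHits = matchLevel(doc, "a", "b", "c", 1);
console.log(recordHit, matchHits);
```

With this document the record-level query rejects the whole text, while the match-level query still reports the second A-B pair, which is the behavior a text search needs.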

XTF would be nice, except:
  • The tokenization and display requirements of the DDb project appear to be beyond the XTF customization options
  • Substring searching
  • Lemmatized forms
  • Honestly, I'm still a little unsatisfied with the way queries are encoded in URLs

If I slog along with my current collection of tokenizers and indexers, I still need a way to make the URL query encoding both more transparent and more flexible. My thought of the week (which is showing some progress) is embedding a stripped-down JavaScript parser, limiting it to the native JS types (Strings, Numbers, maybe functions), and creating basically a little DSL to express the queries.

So far, things look pretty promising. I think I can capture the text searching requirements pretty well with 8 or 9 defined functions and a few "barewords" to indicate mode of sensitivity to case, etc.

An ugly CQL example:
(((cql.keywords=/locale=grc.beta/ignoreCapitals/ignoreAccents "^kai^"
prox/unit=word/distance<=1 cql.keywords=/locale=grc.beta/ignoreCapitals/ignoreAccents "^upoqhkhs^") prox/unit=word/distance<=2 (cql.keywords=/locale=grc.beta/ignoreCapitals/ignoreAccents "^dik")))

versus something like:
then( beta("^kai^ ^upoqhkhs^",IA), beta("^dik",IA), 2 )
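
To make the idea concrete, here is a minimal sketch of how the functions in that expression might be defined so that evaluating it yields a query tree for the indexer. Everything in it is an assumption: the shape of the tree, the reading of "^" as a word-boundary anchor, and the treatment of IA as an opaque bareword whose exact sensitivity mode is left to the backend. It also calls the functions directly rather than going through the stripped-down parser.

```javascript
// Bareword: an opaque flag handed through to the backend.
// Its exact meaning (some case/accent sensitivity mode) is assumed here.
const IA = "IA";

// beta(terms, ...modes): a phrase of Beta Code terms.
// A leading or trailing "^" is assumed to anchor the term at a word boundary.
function beta(terms, ...modes) {
  return {
    op: "phrase",
    locale: "grc.beta",
    modes,
    terms: terms.trim().split(/\s+/).map((t) => ({
      text: t.replace(/^\^|\^$/g, ""),
      anchorStart: t.startsWith("^"),
      anchorEnd: t.length > 1 && t.endsWith("^"),
    })),
  };
}

// then(left, right, distance): right follows left within `distance` words.
function then(left, right, distance) {
  if (!Number.isInteger(distance) || distance < 0) {
    throw new RangeError("distance must be a non-negative integer");
  }
  return { op: "then", distance, operands: [left, right] };
}

// The expression from the post, evaluated as ordinary function calls.
const query = then(beta("^kai^ ^upoqhkhs^", IA), beta("^dik", IA), 2);
console.log(JSON.stringify(query, null, 2));
```

Because the query is only nested function calls over strings and numbers, a restricted parser needs to recognize just call expressions, literals, and a fixed list of barewords, and it can reject anything else before evaluation.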
