I had been trying to encode my queries as CQL, but there are some problems:
- It's painfully verbose, which leads to ridiculously long URLs
- ((A prox B) not ((A prox B) prox C)) is not equivalent to (A near B) not near C
XTF would be nice, except:
- The tokenization and display requirements of the DDb project appear to be beyond what XTF's customization options can handle
- Substring searching
- Lemmatized forms
- Honestly, I'm still a little dissatisfied with the way XTF encodes queries in URLs
If I keep slogging along with my current collection of tokenizers and indexers, I still need a way to make the URL query encoding both more transparent and more flexible. My idea of the week (which is showing some progress) is to embed a stripped-down JavaScript parser, limit it to the native JS types (Strings, Numbers, maybe functions), and build what is basically a little DSL for expressing queries.
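A minimal sketch of the parsing half of this idea, in JavaScript to match the DSL's syntax. It is a hand-rolled tokenizer and recursive-descent parser rather than an embedded JavaScript engine, and the grammar it accepts (double-quoted strings without escapes, non-negative numbers, identifiers as barewords, and name(arg, ...) calls) is my assumption about what "stripped-down" would cover; function values are left out. Anything outside that grammar is a syntax error, so nothing in a query string gets executed as arbitrary code.

```javascript
// A minimal sketch of a "stripped-down" parser for a JavaScript-like query DSL.
// It accepts only double-quoted strings (no escapes), non-negative numbers,
// barewords (plain identifiers), and calls of the form name(arg, ...).
// Operators, property access, assignment, etc. are rejected outright.

function tokenize(src) {
  // Sticky regex: each match must start exactly at lastIndex.
  const re = /\s*(?:"([^"]*)"|(\d+(?:\.\d+)?)|([A-Za-z_][A-Za-z0-9_]*)|([(),]))/y;
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (m === null) {
      if (/^\s*$/.test(src.slice(pos))) break; // only trailing whitespace remains
      throw new SyntaxError(`Unexpected input at offset ${pos}: ${src.slice(pos, pos + 10)}`);
    }
    if (m[1] !== undefined) tokens.push({ type: 'str', value: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: 'num', value: Number(m[2]) });
    else if (m[3] !== undefined) tokens.push({ type: 'id', value: m[3] });
    else tokens.push({ type: 'punc', value: m[4] });
    pos = re.lastIndex;
  }
  return tokens;
}

// Parses a single expression into a small AST of str/num/bareword/call nodes.
function parse(src) {
  const tokens = tokenize(src);
  let i = 0;
  const isPunc = (t, v) => t !== undefined && t.type === 'punc' && t.value === v;

  function expr() {
    const t = tokens[i++];
    if (t === undefined) throw new SyntaxError('Unexpected end of input');
    if (t.type === 'str' || t.type === 'num') return { type: t.type, value: t.value };
    if (t.type === 'id') {
      // An identifier not followed by '(' is a bareword such as IA.
      if (!isPunc(tokens[i], '(')) return { type: 'bareword', name: t.value };
      i++; // consume '('
      const args = [];
      if (!isPunc(tokens[i], ')')) {
        args.push(expr());
        while (isPunc(tokens[i], ',')) {
          i++; // consume ','
          args.push(expr());
        }
      }
      if (!isPunc(tokens[i], ')')) throw new SyntaxError(`Expected ')' after arguments to ${t.value}`);
      i++; // consume ')'
      return { type: 'call', name: t.value, args };
    }
    throw new SyntaxError(`Unexpected '${t.value}'`);
  }

  const ast = expr();
  if (i !== tokens.length) throw new SyntaxError('Unexpected trailing input');
  return ast;
}

const ast = parse('then( beta("^kai^ ^upoqhkhs^",IA), beta("^dik",IA), 2 )');
console.log(JSON.stringify(ast, null, 2));
```

The resulting tree only names functions and barewords; deciding which names are legal and what they mean is left to a separate evaluation step.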
So far, things look promising. I think I can capture the text-searching requirements with 8 or 9 defined functions and a few "barewords" to set sensitivity to case, accents, and so on.
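The barewords could be as simple as letters that each switch on one option. A hypothetical sketch, assuming I means ignoreCapitals and A means ignoreAccents (read off the IA bareword in the DSL example and the modifiers in the CQL example that follow); the actual bareword set isn't spelled out here.

```javascript
// Hypothetical mapping from bareword letters to sensitivity options.
// "I" and "A" are assumptions based on the IA bareword and the CQL
// modifiers in the example; the real set of barewords may differ.
const FLAG_LETTERS = {
  I: 'ignoreCapitals',
  A: 'ignoreAccents',
};

// Turns a bareword such as "IA" into an options object, rejecting unknown letters.
function decodeBareword(word) {
  const opts = {};
  for (const ch of word) {
    const name = FLAG_LETTERS[ch];
    if (name === undefined) throw new Error(`Unknown flag '${ch}' in bareword '${word}'`);
    opts[name] = true;
  }
  return opts;
}

// Renders the options as CQL relation modifiers, in the order FLAG_LETTERS lists them.
function toCqlModifiers(opts) {
  return Object.values(FLAG_LETTERS)
    .filter((name) => opts[name])
    .map((name) => `/${name}`)
    .join('');
}

console.log(toCqlModifiers(decodeBareword('IA'))); // /ignoreCapitals/ignoreAccents
```

One letter per option keeps the URL short, at the cost of being a little cryptic.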
An ugly CQL example:
(((cql.keywords=/locale=grc.beta/ignoreCapitals/ignoreAccents "^kai^"
prox/unit=word/distance<=1 cql.keywords=/locale=grc.beta/ignoreCapitals/ignoreAccents "^upoqhkhs^")
prox/unit=word/distance<=2 (cql.keywords=/locale=grc.beta/ignoreCapitals/ignoreAccents "^dik")))
versus something like:
then( beta("^kai^ ^upoqhkhs^",IA), beta("^dik",IA), 2 )
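Because the DSL is just JavaScript call syntax, one way to see how it maps onto CQL is to define beta() and then() as plain functions that emit CQL strings. This is a sketch under assumptions I'm reading off the two examples: beta() splits its text into words that must be adjacent (distance<=1), then(a, b, n) allows b within n words of a, and IA stands for ignoreCapitals plus ignoreAccents. If then is meant to imply word order, CQL's ordered prox modifier would also be needed, which this sketch leaves out.

```javascript
// Sketch: treat the DSL functions as ordinary JavaScript functions that
// emit CQL. The semantics are assumptions inferred from the example:
// beta() splits its text into words that must be adjacent, and
// then(a, b, n) puts b within n words of a.

// Hypothetical stand-in for the IA bareword: the CQL modifiers it implies.
const IA = ['ignoreCapitals', 'ignoreAccents'];

const INDEX = 'cql.keywords';
const LOCALE = 'grc.beta';

// One CQL search clause for a single Beta Code word.
function term(word, flags) {
  const mods = [`locale=${LOCALE}`, ...flags].map((m) => `/${m}`).join('');
  return `${INDEX}=${mods} "${word}"`;
}

// Word-proximity clause, in the same form as the hand-written CQL example.
function prox(left, right, distance) {
  return `(${left} prox/unit=word/distance<=${distance} ${right})`;
}

// beta("w1 w2 ...", flags): the words must occur next to each other.
function beta(text, flags = []) {
  return text
    .trim()
    .split(/\s+/)
    .map((w) => term(w, flags))
    .reduce((acc, clause) => prox(acc, clause, 1));
}

// then(a, b, n): b within n words of a.
function then(a, b, distance) {
  return prox(a, b, distance);
}

const cql = then(beta('^kai^ ^upoqhkhs^', IA), beta('^dik', IA), 2);
console.log(cql);
```

The generated string matches the hand-written CQL query up to redundant parentheses, which makes the difference in length fairly concrete.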