Monday, July 21, 2008

Unbound, Bound again

I've moved from CPU-bound to memory-bound. I think the large number of docs in the secondary index is making the bit vectors ungainly. Maybe I'll try testing performance with the scorer used directly as a doc id iterator for the bigram term resolution.
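For the record, here is roughly what I mean by walking doc ids instead of materializing bit vectors. It is only a sketch: `DocIdIterator`, `ArrayDocIdIterator`, and `intersect` are names made up for illustration, not Lucene classes, and the array-backed iterator stands in for whatever the real scorer would do. The point is that intersecting two sorted doc id streams needs memory proportional to the number of hits, not to the number of docs in the index.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramIntersect {

    /** Hypothetical iterator over ascending doc ids; not a Lucene API. */
    interface DocIdIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Moves to the first doc id >= target and returns it, or NO_MORE_DOCS. */
        int advance(int target);
    }

    /** Array-backed stand-in for a scorer's posting list (assumed sorted). */
    static final class ArrayDocIdIterator implements DocIdIterator {
        private final int[] docs;
        private int pos = 0;

        ArrayDocIdIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        @Override
        public int advance(int target) {
            // Linear scan for simplicity; a real posting list would use skip data.
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Leapfrog intersection: collects doc ids present in both iterators. */
    static List<Integer> intersect(DocIdIterator a, DocIdIterator b) {
        List<Integer> hits = new ArrayList<>();
        int da = a.advance(0);
        int db = b.advance(0);
        while (da != DocIdIterator.NO_MORE_DOCS && db != DocIdIterator.NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);
                da = a.advance(da + 1);
                db = b.advance(db + 1);
            } else if (da < db) {
                da = a.advance(db); // a is behind, jump it up to b's position
            } else {
                db = b.advance(da); // b is behind, jump it up to a's position
            }
        }
        return hits;
    }
}
```

Calling `intersect` with one iterator per half of the bigram gives the docs that contain both terms, without allocating a bit per doc in the index.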

The bigram resolution is fast, and also (embarrassingly) more accurate: it looks like Shanghai was a less accurate algorithm than I thought. Oh well.

Result documents are trickier: maybe I'll change the return type of docs() so that it sends back a scorer as well.

If that goes well, I may also try out the fast bitset implementation hiding in the trunk of lucene.util.
