Friday, July 11, 2008

on to the next bad idea?

Mini-dsl is looking pretty good. I think it's time to revisit the "foreign keys" I'm using to associate records in the two Lucene indices.

Lucene is slow when your application tries to effect joins: after querying one index, you have to build up a big follow-up BooleanQuery to get the related documents from the other. I got around the performance hit to a large extent by storing the related foreign doc IDs as binary fields in each index. Then I just scooped the values up into a bit vector, and it was as if I had executed the search.
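Roughly, the trick looks like the sketch below. It uses only the Java standard library rather than Lucene itself, so it shows the idea and not the real index code. The names (`ForeignKeySketch`, `encode`, `collect`, `relatedDocs`) are made up for illustration, and packing each ID as a 4-byte int is an assumption about the field layout.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class ForeignKeySketch {

    // Pack foreign doc IDs into a byte array, the kind of value one
    // could store in a binary field (4 bytes per ID is an assumption).
    public static byte[] encode(int... docIds) {
        ByteBuffer buf = ByteBuffer.allocate(docIds.length * Integer.BYTES);
        for (int id : docIds) {
            buf.putInt(id);
        }
        return buf.array();
    }

    // Decode one stored field and set a bit for every foreign doc ID in it.
    public static void collect(byte[] stored, BitSet related) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        while (buf.remaining() >= Integer.BYTES) {
            related.set(buf.getInt());
        }
    }

    // Given the binary field of each hit from the first index, build the
    // set of related doc IDs in the other index without a second query.
    public static BitSet relatedDocs(List<byte[]> hitFields) {
        BitSet related = new BitSet();
        for (byte[] field : hitFields) {
            collect(field, related);
        }
        return related;
    }

    public static void main(String[] args) {
        // Two hits in index A; both point at doc 7 in index B.
        List<byte[]> hits = Arrays.asList(encode(3, 7), encode(7, 12));
        System.out.println(relatedDocs(hits)); // prints {3, 7, 12}
    }
}
```

The bit vector dedupes for free, which is why it stands in for the result of the follow-up query. It also shows the weak spot: the stored IDs are only as good as the last time both indices were written together.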

Except that it's very easy to knock the indices out of sync. Still to be determined: how many places the crosswalk data will be stored, and where the ORE feed will draw its data from.
