BitSet Filters in ES/Lucene


(ElasticSearch Users mailing list) #1

Hi,
I have looked at TermsLookupFilter, and it is a good approach to caching
frequently used filters. However, even if I write a custom filter plugin, I
cannot use a BitSet to hold any kind of document identifier; even the _uid
field is converted into a TermFilter.

Assume a scenario where I need to tag millions of documents with tags such
as "Finance", "IT", "Legal", etc.

Unless I can cache these filters in memory, constructing them at run time
for every query is too expensive to be practical. If I could map each
document to a numeric long identifier and store those identifiers in a
BitSet, I could cache the filters, because their size shrinks drastically.
However, I cannot use such a numeric identifier in ES/Lucene filters,
whether through a custom filter plugin or the Terms Lookup Filter. Is there
any way to do this?
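To make the idea in the paragraph above concrete, here is a minimal sketch in plain Java of the structure being described: each external document id is mapped once to a dense int ordinal, and each tag keeps a java.util.BitSet of the ordinals that carry it. The class `TagBitSetCache` and its methods are hypothetical names for illustration only. This is not an Elasticsearch or Lucene API, and it does not address the actual question of plugging these bits into an ES query, since Lucene filters operate on Lucene's own internal document numbers rather than on an application's ordinals. Note also that java.util.BitSet is indexed by int, so the "numeric identifier" here is assumed to fit in int range.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory tag cache (plain java.util, not an ES/Lucene API).
 * Each external id gets a dense int ordinal; each tag stores a BitSet of ordinals.
 */
public class TagBitSetCache {
    // External id (e.g. a _uid-like string) -> dense ordinal used as a bit index.
    private final Map<String, Integer> ordinals = new HashMap<>();
    // Reverse mapping so bit positions can be translated back to ids.
    private final List<String> ids = new ArrayList<>();
    // Tag name ("Finance", "IT", ...) -> ordinals of documents carrying that tag.
    private final Map<String, BitSet> tagBits = new HashMap<>();

    /** Returns the ordinal for an id, assigning the next free one if unseen. */
    private int ordinalOf(String docId) {
        Integer ord = ordinals.get(docId);
        if (ord == null) {
            ord = ids.size();
            ordinals.put(docId, ord);
            ids.add(docId);
        }
        return ord;
    }

    /** Records that docId carries the given tag. */
    public void tag(String docId, String tag) {
        tagBits.computeIfAbsent(tag, t -> new BitSet()).set(ordinalOf(docId));
    }

    /** Ordinals of documents carrying every one of the given tags (logical AND). */
    public BitSet matchingAll(String... tags) {
        BitSet result = null;
        for (String tag : tags) {
            BitSet bits = tagBits.getOrDefault(tag, new BitSet());
            if (result == null) {
                result = (BitSet) bits.clone(); // copy so the cached set is never mutated
            } else {
                result.and(bits);
            }
        }
        return result == null ? new BitSet() : result;
    }

    /** Translates set bits back into external document ids, in ordinal order. */
    public List<String> idsOf(BitSet bits) {
        List<String> out = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.add(ids.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        TagBitSetCache cache = new TagBitSetCache();
        cache.tag("doc-1", "Finance");
        cache.tag("doc-2", "Finance");
        cache.tag("doc-2", "Legal");
        cache.tag("doc-3", "IT");

        System.out.println(cache.idsOf(cache.matchingAll("Finance", "Legal"))); // prints [doc-2]
        System.out.println(cache.matchingAll("Finance").cardinality());         // prints 2
    }
}
```

Once ids are dense, combining two cached tags is a single `BitSet.and`, which works over 64-bit words; the id-to-ordinal map is paid for once and shared by every tag.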

While looking for possible solutions in ES, I found this post:
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/.

Please help with this scenario. Thanks,
