Hello Jérôme!
Yes, you are right: it should be for HIGH cardinality fields, so LOW-frequency terms. I've updated the post. A very big thanks for pointing that out!
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
Hey Rafal,
While we are at it:
Is it just me, or is the description of the pulsing and bloom filter codecs on this page (http://elasticsearchserverbook.com/elasticsearch-0-90-using-codecs/), which says they are appropriate for LOW cardinality fields, wrong? Pulsing and bloom are good for ID lookups, which means high cardinality fields, and an ID field normally has the highest possible cardinality. Maybe I'm wrong, but if so I would like to know what I'm misunderstanding.
Jerome
On Friday, April 19, 2013 9:57:50 AM UTC-4, Jérôme Gagnon wrote:
Just found it: the class in the ElasticSearch source is BloomFilterPostingsFormatProvider, so the type to use is "bloom_filter".
The documentation should be updated accordingly.
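Going by that class name, a custom bloom filter postings format would be declared with the "bloom_filter" type plus a delegate. Below is a sketch of Rafał's request rewritten that way. It assumes 0.90.x and assumes "bloom_filter" accepts the same "delegate" setting the thread uses for "bloom"; it has not been verified against a running cluster.

```shell
# Sketch (untested): define a postings format named "custom" that keeps a bloom
# filter in front of the "pulsing" postings format, then use it on the id field.
# Assumption: the type name "bloom_filter" derived from BloomFilterPostingsFormatProvider.
curl -XPOST 'localhost:9200/posts/' -d '{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "custom" : {
            "type" : "bloom_filter",
            "delegate" : "pulsing"
          }
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "custom" }
      }
    }
  }
}'
```

Swapping "pulsing" for "default" in the delegate should give the equivalent of the wrapped default format, under the same assumptions.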
On Friday, April 19, 2013 9:55:49 AM UTC-4, Jérôme Gagnon wrote:
I'm experiencing the same issue, so any updates would be appreciated, thanks!
On Sunday, April 14, 2013 8:00:17 AM UTC-4, Rafał Kuć wrote:
Thanks Clinton,
I'm aware of the bloom_pulsing and bloom_default postings formats. I was wondering whether I was missing something after looking at the docs at http://www.elasticsearch.org/guide/reference/index-modules/codec/, because of the type name "bloom". I thought one could use the "bloom" type when defining a custom postings format and set the appropriate delegate, for example pulsing or default.
But now I've got another question. Is it possible to use a custom bloom-filter-based codec, similar to bloom_default or bloom_pulsing?
For example, the following request:
curl -XPOST 'localhost:9200/posts/' -d '{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "custom" : {
            "type" : "bloom_default",
            "delegate" : "default",
            "ffp" : "10k=0.01,1m=0.03"
          }
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "custom" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" }
      }
    }
  }
}'
Gives the following exception:
{"error":"IndexCreationException[[posts] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [bloom_default]]; nested: ClassNotFoundException[org.elasticsearch.index.codec.postingsformat.bloomdefault.BloomDefaultPostingsFormatProvider]; ","status":500}
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
The bloom codec needs to wrap another codec. Using "bloom" means "maintain a bloom filter in memory" but doesn't specify how the data should be stored on disk.
http://www.elasticsearch.org/guide/reference/index-modules/codec/
That said, it could throw a better error message:
https://github.com/elasticsearch/elasticsearch/issues/2893
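Since the bloom format has to wrap another one, the pre-configured bloom formats already carry the wrapped format in their names: as the names suggest, bloom_pulsing is a bloom filter over pulsing and bloom_default is a bloom filter over default. A minimal sketch, assuming a 0.90 node on localhost:9200 and reusing the posts index and post type from this thread, that references one of them directly in the mapping, so no custom postings format or delegate has to be defined:

```shell
# Use the pre-configured "bloom_pulsing" postings format directly on the id field.
# No "codec" section is needed in the index settings for this.
curl -XPOST 'localhost:9200/posts/' -d '{
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "bloom_pulsing" }
      }
    }
  }
}'
```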
On Sat, Apr 13, 2013 at 11:34 PM, Rafał Kuć <r....@solr.pl> wrote:
Hello!
I've got a question about the postings format. When reading the documentation, we can see that there is a bloom postings format type. However, when trying to use it, ElasticSearch throws an exception, for example:
curl -XPOST 'localhost:9200/posts' -d '{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "custom" : {
            "type" : "bloom",
            "delegate" : "pulsing"
          }
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "custom" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" }
      }
    }
  }
}'
And the exception is as follows:
{"error":"IndexCreationException[[posts] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [bloom]]; nested: ClassNotFoundException[org.elasticsearch.index.codec.postingsformat.bloom.BloomPostingsFormatProvider]; ","status":500}
According to the code, we are allowed to use the bloom_default or bloom_pulsing types, but not bloom itself (at least not as a pre-configured one). And of course, when configuring the id field with one of the mentioned postings formats, it works without any problem, as can be seen in the mappings:
{
  "posts" : {
    "post" : {
      "properties" : {
        "contents" : {
          "type" : "string"
        },
        "id" : {
          "type" : "long",
          "store" : true,
          "postings_format" : "bloom_pulsing",
          "precision_step" : 2147483647
        },
        "name" : {
          "type" : "string",
          "store" : true
        }
      }
    }
  }
}
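A mapping like the one above can be fetched back to check which postings format a field actually ended up with. A sketch of the call, assuming the same posts index and a node listening on localhost:9200:

```shell
# Fetch the mapping of the "posts" index and pretty-print the JSON response;
# the id field should list its "postings_format".
curl -XGET 'localhost:9200/posts/_mapping?pretty'
```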
Am I missing something when it comes to the bloom type? I'm using 0.90.RC2. Thanks for the answer.
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.