0.90.RC2 postings format

Hello!

I've got a question about the postings format. The documentation
lists a bloom postings format type. However, when I try to use it,
ElasticSearch throws an exception, for example:

curl -XPOST 'localhost:9200/posts' -d '{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "custom" : {
            "type" : "bloom",
            "delegate" : "pulsing"
          }
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "custom" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" }
      }
    }
  }
}'

And the exception is as follows:
{"error":"IndexCreationException[[posts] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [bloom]]; nested: ClassNotFoundException[org.elasticsearch.index.codec.postingsformat.bloom.BloomPostingsFormatProvider]; ","status":500}

According to the code, we are allowed to use the bloom_default or
bloom_pulsing types, but not bloom itself (at least among the
pre-configured ones). And of course, when the id field is configured
with one of the mentioned postings formats, it works without any
problem, as can be seen in the mappings:

{
  "posts" : {
    "post" : {
      "properties" : {
        "contents" : {
          "type" : "string"
        },
        "id" : {
          "type" : "long",
          "store" : true,
          "postings_format" : "bloom_pulsing",
          "precision_step" : 2147483647
        },
        "name" : {
          "type" : "string",
          "store" : true
        }
      }
    }
  }
}

Am I missing something when it comes to the bloom type? I'm using
0.90.RC2. Thanks for the answer.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The bloom codec needs to wrap another codec. Using "bloom" means "maintain
a bloom filter in memory" but doesn't specify how the data should be stored
on disk.

http://www.elasticsearch.org/guide/reference/index-modules/codec/

That said, it could throw a better error message:

https://github.com/elasticsearch/elasticsearch/issues/2893
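
For reference, here is a minimal sketch of the variant that the original post reports as working: the id field is mapped straight to the pre-configured bloom_pulsing format, so no custom postings format is declared in the settings. The field names mirror the original request. The BODY variable is only an illustration, and the request has not been run here.

```shell
# Request body that maps the id field to the pre-configured bloom_pulsing
# postings format (reported as working earlier in this thread).
BODY='{
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "bloom_pulsing" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" }
      }
    }
  }
}'

# Print the body for inspection. To create the index, send it with:
#   curl -XPOST 'localhost:9200/posts' -d "$BODY"
printf '%s\n' "$BODY"
```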


Thanks Clinton,

I'm aware of the bloom_pulsing and bloom_default postings formats. Because of the type name "bloom" in the docs at http://www.elasticsearch.org/guide/reference/index-modules/codec/ I was wondering whether I was missing something. I thought one could use the "bloom" type when defining a custom postings format and set the appropriate delegate, for example pulsing or default.

But now I've got another question. Is it possible to use a custom bloom-filter-based codec, like bloom_default or bloom_pulsing?

For example, the following request:

curl -XPOST 'localhost:9200/posts/' -d '{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "custom" : {
            "type" : "bloom_default",
            "delegate" : "default",
            "ffp" : "10k=0.01,1m=0.03"
          }
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "custom" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" }
      }
    }
  }
}'

Gives the following exception:

{"error":"IndexCreationException[[posts] failed to create index]; nested: NoClassSettingsException[Failed to load class setting [type] with value [bloom_default]]; nested: ClassNotFoundException[org.elasticsearch.index.codec.postingsformat.bloomdefault.BloomDefaultPostingsFormatProvider]; ","status":500}

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch


I'm experiencing the same issue, so any updates would be appreciated.
Thanks!


Just found it: the class in the ElasticSearch source is
BloomFilterPostingsFormatProvider, so the type to use is "bloom_filter".

The documentation should be updated accordingly.
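
Based on that finding, here is a sketch of the original request with only the type changed to bloom_filter. It assumes that the type name derived from BloomFilterPostingsFormatProvider is accepted together with the delegate setting used earlier in the thread. It has not been verified against 0.90.RC2 here, and the BODY variable is only an illustration.

```shell
# Original request body from this thread, with "type" changed from "bloom"
# to "bloom_filter" (assumption: this is the name the provider class maps to).
BODY='{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "custom" : {
            "type" : "bloom_filter",
            "delegate" : "pulsing"
          }
        }
      }
    }
  },
  "mappings" : {
    "post" : {
      "properties" : {
        "id" : { "type" : "long", "store" : "yes", "precision_step" : "0", "postings_format" : "custom" },
        "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
        "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" }
      }
    }
  }
}'

# Print the body for inspection. To create the index, send it with:
#   curl -XPOST 'localhost:9200/posts' -d "$BODY"
printf '%s\n' "$BODY"
```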


Hey Rafal,

While we are at it:

Is it just me, or is the description of the pulsing and bloom filter codecs
on this page, http://elasticsearchserverbook.com/elasticsearch-0-90-using-codecs/,
wrong in saying that they are appropriate for LOW cardinality fields? Pulsing
and bloom are good for id lookups, so that's a high cardinality field. An id
field normally has the highest possible cardinality. Maybe I'm wrong, but
if so I would like to know what I don't understand :)

Jerome


Hello Jérôme!

Yes, you are right: it should be for HIGH cardinality fields, that is, low-frequency terms. I've updated the post. Many thanks for pointing that out!

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
