Default codec


(aphalke) #1

Hello Team,

I am using Elasticsearch 0.90.3. According to the documentation, if I don't specify a codec in the mapping, the default one is used. However, a YourKit memory snapshot shows objects of BloomFilterPostingFormat whose delegate producer is BlockTreeTermsReader (the default). So the bloom_default codec is being used instead of the default codec. Is bloom_default the default codec for every field?

Screenshot: https://lh3.googleusercontent.com/-ZCsYnSH-CSI/Um387Y-Us3I/AAAAAAAAEO0/hP3BBWgs1Ck/s1600/codec.png

Thanks,
Atul.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi Atul,

Indeed, our defaults are not exactly the same as Lucene's: the difference
is that we add a bloom filter to the _uid field. We do that because _uid
values are unique in the index (by design), so putting a bloom filter on
top of the terms dictionary makes _uid lookups very fast: for a _uid that
is not in a segment, the filter can usually answer "absent" without
touching the terms dictionary at all. This matters e.g. for index requests,
since we need to check whether a document with the same _uid already
exists in the index.
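To illustrate why a bloom filter speeds up these existence checks, here is a minimal, generic Bloom filter in Python. It is not Elasticsearch's or Lucene's implementation: the hash scheme, the sizes, the helper names, and the use of a plain set as a stand-in for the terms dictionary are all illustrative assumptions. The key property is that a "no" from the filter is always correct, so a lookup for a new _uid can usually skip the terms dictionary; only a "maybe" needs the real lookup.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: k bit positions per key in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits < 1 or num_hashes < 1:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def uid_exists(uid: str, bloom: BloomFilter, terms_dictionary: set) -> bool:
    """Consult the (expensive) terms dictionary only when the filter says "maybe"."""
    if not bloom.might_contain(uid):
        return False  # definite miss: the terms dictionary is never touched
    return uid in terms_dictionary  # possible false positive, so confirm
```

For example, after adding "tweet#1" and "tweet#2" (assuming the type#id form of _uid values), uid_exists("tweet#1", ...) returns True, while an id that was never added usually gets its False straight from the filter.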

--
Adrien Grand



(aphalke) #3

Thanks, Adrien, for the clarification.
Suppose we are fine with the performance cost of not using bloom filters.
In that case, is there a way to override this behavior and use the default
postings format for _uid too? I am asking because, for our application in
a simulated environment, around 48% of the memory is taken by bloom
filters.
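For a rough sense of why this can add up, the textbook sizing formula for an optimally configured Bloom filter needs m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive rate p, so memory grows linearly with the number of _uid values (one per document). The sketch below applies that formula; the document count and the false-positive rate are hypothetical inputs, not the parameters Elasticsearch 0.90 actually uses.

```python
import math


def bloom_filter_bytes(num_keys: int, false_positive_rate: float) -> float:
    """Size in bytes of an optimally configured Bloom filter for num_keys keys.

    Uses the textbook formula m = -n * ln(p) / (ln 2)^2 bits.
    """
    if num_keys < 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need num_keys >= 0 and 0 < false_positive_rate < 1")
    bits = -num_keys * math.log(false_positive_rate) / math.log(2) ** 2
    return bits / 8


# Hypothetical inputs: 100 million documents, 10% false-positive rate.
size = bloom_filter_bytes(100_000_000, 0.10)
print(f"~{size / 2**20:.0f} MiB for 100M _uid values at p = 0.10")
```

Each halving of p costs only about 1/ln 2, roughly 1.44 extra bits per key, but the total still scales with the number of documents.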

Thanks,
Atul.

(In reply to Adrien Grand's message of Monday, 28 October 2013 14:04:53 UTC+5:30.)


(aphalke) #4

Hello Adrien, team,
Is there a way to override this behavior and use the default postings
format for _uid too?
Thanks in advance.
Regards,
Atul.

(In reply to Atul Phalke's message of Monday, 28 October 2013 17:04:48 UTC+5:30.)


(prasanna sivanandam) #5

Adrien,

We are indexing read-only data with ES, so we will never update the
indexed data. Is it possible to avoid storing the _uid field in the index?

Prasanna

(In reply to Adrien Grand's message of Monday, October 28, 2013 2:04:53 PM UTC+5:30.)


(simonw-2) #6

You can actually configure the Lucene default for this field. You need to
define a custom postings format like this:

curl -XPUT 'http://localhost:9200/indexname/' -d '{
  "settings" : {
    "index" : {
      "codec" : {
        "postings_format" : {
          "default_no_bloom" : {
            "type" : "default"
          }
        }
      }
    }
  }
}'

And then use it in the mapping like this (where "type" is your mapping type name):

{
  "type" : {
    "_uid" : {
      "postings_format" : "default_no_bloom"
    }
  }
}

That should be it.
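The two steps can also be folded into a single create-index request whose body carries both "settings" and "mappings". The Python sketch below only builds and prints that body (standard library only); the helper name create_index_body and the mapping type name tweet are hypothetical placeholders.

```python
import json


def create_index_body(type_name: str, format_name: str = "default_no_bloom") -> dict:
    """Hypothetical helper: settings + mappings for one create-index request."""
    return {
        "settings": {
            "index": {
                "codec": {
                    "postings_format": {
                        # Custom postings format of type "default", i.e. without bloom.
                        format_name: {"type": "default"}
                    }
                }
            }
        },
        "mappings": {
            type_name: {
                # Point the _uid field at the custom format defined above.
                "_uid": {"postings_format": format_name}
            }
        },
    }


if __name__ == "__main__":
    # "tweet" is a placeholder; use your own mapping type name.
    print(json.dumps(create_index_body("tweet"), indent=2))
```

The printed body can be piped straight into curl, which reads the request body from stdin with -d @- (the script name here is hypothetical): python make_body.py | curl -XPUT 'http://localhost:9200/indexname/' -d @-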

simon

(In reply to prasanna's message of Monday, November 11, 2013 8:51:29 AM UTC+1.)


(Anantha Govindarajan) #7

Hi,

I have added index.codec.postings_format.my_format.type: default to
elasticsearch.yml and "_uid" : {"postings_format" : "my_format"} to
my default-mapping.json file.

However, _es090_0.blm files are still being created. How do I get the
default postings format for the _uid field?

Atul, we are facing the same problem. Were you able to solve it, or did
you find any alternatives? If so, please help me resolve this.
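One quick way to check whether bloom filter files are still being written is to look for files ending in .blm under the index's data directory. The sketch below assumes, based on the file names reported above, that bloom filter data lives in *.blm files; the function name is hypothetical.

```python
from pathlib import Path


def find_bloom_files(data_dir: str) -> list:
    """Return the sorted paths of all regular files ending in .blm under data_dir."""
    return sorted(str(p) for p in Path(data_dir).rglob("*.blm") if p.is_file())
```

For example, call find_bloom_files() on the index directory, which in 0.90 is typically something like <path.data>/<cluster_name>/nodes/0/indices/<index> (adjust to your layout), and print the result.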



(aphalke) #8

Hi Simon,
Thanks for the reply. As Anantha mentioned, it is not working for the
_uid field. From our analysis, it looks like the bloom filter codec is the
default for the _uid field and cannot be overridden.
Anantha,
We were trying to resolve this to reduce the memory footprint. As an
alternative, we are thinking of keeping only a limited number of indices
open, opening and closing indices dynamically depending on the search
request. Do you have any option other than this?
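The open/close approach amounts to an LRU cache of open indices. Below is a minimal sketch, assuming hypothetical open_index/close_index callbacks that would issue POST /{index}/_open and POST /{index}/_close requests; the class name and the capacity are illustrative, not part of Elasticsearch.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class OpenIndexCache:
    """Keep at most `capacity` indices open, closing the least recently used one."""

    def __init__(self, capacity: int,
                 open_index: Callable[[str], None],
                 close_index: Callable[[str], None]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.open_index = open_index    # hypothetical: would send POST /{index}/_open
        self.close_index = close_index  # hypothetical: would send POST /{index}/_close
        self._open = OrderedDict()      # index name -> None, least recently used first

    def ensure_open(self, indices: Iterable[str]) -> None:
        """Open every index a search needs, evicting LRU indices it does not need."""
        needed = list(indices)
        needed_set = set(needed)
        for name in needed:
            if name in self._open:
                self._open.move_to_end(name)  # mark as most recently used
                continue
            self.open_index(name)
            self._open[name] = None
            while len(self._open) > self.capacity:
                victim = next((n for n in self._open if n not in needed_set), None)
                if victim is None:
                    break  # this search alone needs more than `capacity` indices
                del self._open[victim]
                self.close_index(victim)

    def open_indices(self) -> list:
        return list(self._open)
```

Passing the open/close actions in as callbacks keeps the sketch independent of any HTTP client. Closing an index frees its resources (including its bloom filters), but a closed index must be reopened, which takes time, before it can be searched.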

Thanks,
Atul.

(In reply to Anantha Govindarajan's message of Tuesday, 26 November 2013 15:01:40 UTC+5:30.)


(Anantha Govindarajan) #9

Hi Atul,

Thanks for your reply. Actually, it works nicely; it just has to be
verified with the Luke tool. Please go through the following post:

https://groups.google.com/forum/#!searchin/elasticsearch/luke/elasticsearch/Oi3PQqgFphQ/xoRfb54DjrQJ

Following that post, I built the jar, which I have attached; you can use
it.

In Luke, go to the Commits tab and look at the (A)ttributes, D, C, F info
of the selected field, which shows the postings format of every field. By
default it shows the es090 format for all fields, even though the bloom
filter is applied to the _uid field alone. So my advice is:

  • Without the two configuration changes I mentioned, run as usual and
    check with Luke: all fields will show the es090 postings format.
  • Now add the two settings, use my_format for all fields, and check
    again: there won't be any es090 files in your index.
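As a rough command-line complement to Luke, the per-field postings files in an index directory can be grouped by the format name embedded in their file names. This sketch assumes Lucene's per-field naming pattern _<segment>_<format>_<n>.<ext> (for example _0_es090_0.blm); treat that pattern as an assumption and confirm with Luke. Files packed inside compound (.cfs) files are not visible this way.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed per-field file name pattern: _<segment>_<format>_<n>.<ext>,
# e.g. "_0_es090_0.blm". Plain per-segment files such as "_0.si" do not match.
PER_FIELD_FILE = re.compile(r"^_([a-z0-9]+)_([A-Za-z0-9]+)_(\d+)\.([A-Za-z0-9]+)$")


def postings_formats_in(index_dir: str) -> Counter:
    """Count per-field postings files in one Lucene index directory, by format name."""
    counts = Counter()
    for path in Path(index_dir).iterdir():
        match = PER_FIELD_FILE.match(path.name)
        if path.is_file() and match:
            counts[match.group(2)] += 1
    return counts
```

Running it before and after the configuration change gives a quick signal of whether es090 files are still being written.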

Thanks, Simon, for your advice.



(simonw-2) #10

Thanks for reporting back!

(In reply to Anantha Govindarajan's message of Friday, November 29, 2013 1:48:57 PM UTC+1.)

