Elasticsearch index MUCH larger than similar Lucene index

I really want to track this down, guys: are you comparing an optimized
(single-segment) index or a multi-segment index? And are you comparing a single
shard with a single replica? Also note that ES enables positions (proximity) by
default, while Lucene, depending on the FieldType you use, may only store docs
and frequencies but no positions, which can make a massive difference. Can you
check that and report the individual file differences?
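A minimal Python sketch of the per-file comparison asked for here, assuming you run it after optimizing both indexes and point it at the Lucene index directory and at the directory holding the single ES shard's Lucene files (its location depends on your data path; the command-line paths are placeholders). It groups file sizes by extension so that, for Lucene 4.x codecs, positions (`.pos`), postings (`.doc`), terms (`.tim`/`.tip`) and stored fields (`.fdt`/`.fdx`) line up side by side. Exact extensions depend on the codec, and a segment packed into a compound file (`.cfs`) hides this breakdown.

```python
import os
import sys
from collections import defaultdict


def size_by_extension(index_dir):
    """Sum file sizes (bytes) in a Lucene index directory, grouped by extension."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and anything that is not a regular file
        # segments_N has no dotted extension, so group it under its prefix
        ext = os.path.splitext(name)[1] or name.split("_")[0]
        totals[ext] += os.path.getsize(path)
    return dict(totals)


def compare(lucene_dir, es_dir):
    """Return (extension, lucene_bytes, es_bytes) rows, largest ES contributor first."""
    lucene = size_by_extension(lucene_dir)
    es = size_by_extension(es_dir)
    rows = [(ext, lucene.get(ext, 0), es.get(ext, 0)) for ext in set(lucene) | set(es)]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder paths): python compare_index.py LUCENE_INDEX_DIR ES_SHARD_INDEX_DIR
    for ext, lucene_bytes, es_bytes in compare(sys.argv[1], sys.argv[2]):
        print(f"{ext:12} lucene={lucene_bytes:>14,} es={es_bytes:>14,}")
```

A large `.pos` row on the ES side only would point at the positions difference; a large `.fdt` row would point at stored fields or `_source`.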

simon

On Thursday, May 23, 2013 4:54:05 PM UTC+2, Jérôme Gagnon wrote:

+1 on that. We couldn't do much about it; we just hope that this doesn't
affect disk IO performance...

On Thursday, May 23, 2013 10:34:38 AM UTC-4, Ivan Brusic wrote:

Just wanted to add that I have always run into the same issue with
Elasticsearch. Indices are almost twice as big despite aggressive trimming.
I have simply come to accept it as a fact and moved on. :slight_smile:

--
Ivan

On Wed, May 22, 2013 at 12:35 PM, simonw simon.w...@elasticsearch.com wrote:

I suggest you provide your Lucene FieldTypes and your mapping, then run your
indexing against Lucene and against a single-shard, no-replica Elasticsearch
instance. Then optimize the index and provide the output of ls -al on the
index directory. It would also be interesting to know what exactly "much
larger" means.
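For the single-shard, no-replica run, a sketch of the two REST calls involved, built with Python's standard `urllib.request` but deliberately not sent. It assumes a node at `http://localhost:9200` and an index named `test` (both hypothetical), and the 0.90-era `_optimize` endpoint with `max_num_segments`; later Elasticsearch versions renamed this endpoint to `_forcemerge`, so check the docs for the version you run.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # hypothetical node address
INDEX = "test"                # hypothetical index name


def create_index_request(mapping):
    """PUT /INDEX with 1 primary shard, 0 replicas and the given type mapping."""
    body = {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": mapping,
    }
    return urllib.request.Request(
        f"{ES}/{INDEX}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def optimize_request():
    """POST /INDEX/_optimize?max_num_segments=1 to merge down to a single segment."""
    return urllib.request.Request(
        f"{ES}/{INDEX}/_optimize?max_num_segments=1", method="POST"
    )


# Send each one with urllib.request.urlopen(req) once a node is actually running.
```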

simon

On Wednesday, May 22, 2013 8:27:05 PM UTC+2, Matt Weber wrote:

Really we are just shooting in the dark here because of a lack of
information:

What version of ES? What version of Lucene? What do your Lucene
index settings (tokenizer, analyzers, etc.) look like? Have you configured
an ES mapping identical to what you use in Lucene? How are you measuring
your index size? Have you tried indexing a single document in Lucene and
ES and comparing the resulting index sizes?

Gist us your mapping (not the Clojure version), custom analyzer
settings, index settings, etc. and we might be able to figure this out for
you.

Thanks,
Matt Weber

On Wed, May 22, 2013 at 10:44 AM, Shlomi shlomi...@gmail.com wrote:

Hey,

Thanks for replying. ngram is the name of the field, and its values are
pre-computed.

Jörg - I think I might have misled you: I am not using the ngram
tokenizer. ":ngram-index" is a custom analyzer that uses the "lowercase"
tokenizer and a list of stopwords.
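A rough Python approximation of what the `:ngram-index` setup described here does (the mapping declares it as an analyzer): Lucene's `lowercase` tokenizer splits on non-letter characters and lowercases each token, and a stop filter then drops listed words. The stopword list and the filter name `my_stop` are hypothetical placeholders, since the real list isn't shown, and the tokenizing regex only approximates Lucene's letter test. The point is that, unlike an ngram tokenizer, it emits one token per word rather than many overlapping grams.

```python
import re

# Hypothetical stopword list; the one used in the thread is not shown.
STOPWORDS = {"a", "an", "and", "of", "the"}

# Roughly how such an analyzer could be declared in ES index settings.
ANALYSIS_SETTINGS = {
    "analysis": {
        "filter": {"my_stop": {"type": "stop", "stopwords": sorted(STOPWORDS)}},
        "analyzer": {
            "ngram-index": {
                "type": "custom",
                "tokenizer": "lowercase",
                "filter": ["my_stop"],
            }
        },
    }
}


def analyze(text):
    """Approximate lowercase tokenizer + stop filter: letter runs, lowercased, stopwords removed."""
    # [^\W\d_] matches Unicode letters only, so digits and punctuation split tokens.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    return [t for t in tokens if t not in STOPWORDS]
```

For example, `analyze("The Quick-Brown fox of 2013")` keeps only the non-stopword letter runs, dropping the digits entirely.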

David - Thanks for the suggestion, but yeah, my code fails if the
index already exists when it runs, so I am sure the index was in fact
deleted.

Mark - I tried with both a single shard and the default 5 shards.
There was no difference in size (surprisingly..)

Thanks for all your responses, but we have to keep thinking.. :slight_smile:

On Wednesday, May 22, 2013 5:22:53 PM UTC+3, Jörg Prante wrote:

You are using the ngram tokenizer, which explodes index size. If you use
ES default sharding, you have 5 shards (and therefore 5 Lucene indexes).
With ngrams, the tokens are scattered over all shards, and this
converges to 5x the space compared to 1 shard.
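How much sharding multiplies size can be reasoned about with a toy model. This Python sketch is a simplification that ignores compression, stored fields and per-segment overhead: it routes documents to shards by `zlib.crc32` of their ID (a stand-in, not Elasticsearch's real routing hash) and counts term-dictionary entries and postings. Postings are split across shards, so their total stays the same, while a term that occurs on several shards gets a dictionary entry on each; under this model the duplication affects the term dictionary and is bounded by the shard count.

```python
import zlib
from collections import defaultdict


def index_stats(docs, num_shards):
    """Return (term_dictionary_entries, postings) summed over all shards.

    docs maps a document ID to its list of already-analyzed tokens.
    """
    shard_terms = defaultdict(lambda: defaultdict(set))  # shard -> term -> doc IDs
    for doc_id, tokens in docs.items():
        # Toy routing: hash of the ID modulo the shard count.
        shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
        for term in tokens:
            shard_terms[shard][term].add(doc_id)
    dictionary_entries = sum(len(terms) for terms in shard_terms.values())
    postings = sum(len(ids) for terms in shard_terms.values() for ids in terms.values())
    return dictionary_entries, postings


# Example: 1,000 docs, each with 5 distinct terms from a shared 50-term vocabulary.
docs = {f"doc{i}": [f"term{(i * 7 + j) % 50}" for j in range(5)] for i in range(1000)}
print(index_stats(docs, 1))
print(index_stats(docs, 5))
```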

Also, store = yes for each field is kind of clumsy. You have to
enable each field to get it returned for a query (only _source is
returned by default). I don't see much sense in making an ngram-analyzed
field stored. Can you elaborate?
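With `_source` disabled, stored fields only come back when the search explicitly asks for them. A hedged sketch of such a request body for the 0.90-era search API, which accepted a top-level `fields` list (later versions moved this to `stored_fields`). The field names come from Shlomi's mapping; the `match_all` query is just a placeholder, and the body would be sent to the index's `_search` endpoint.

```python
import json


def search_body(stored_fields, query=None):
    """Build a search request body that asks ES to return the given stored fields."""
    return {
        "query": query if query is not None else {"match_all": {}},
        # Without _source, only fields listed here (and stored in the mapping) come back.
        "fields": list(stored_fields),
    }


body = search_body(["gram", "freq"])
print(json.dumps(body))
```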

Jörg

On 22.05.13 11:08, Shlomi wrote:

Does ES store its numeric fields as strings?

Can someone confirm that if you disable _source and keep each field
stored and indexed, your fields become invisible (although
queryable)? Or am I doing something totally wrong?..

Thanks

On Tuesday, May 21, 2013 7:10:07 PM UTC+3, Shlomi wrote:

here is a fraction of the mapping I have (I use Clojure, so it's a bit
different from JSON, but it's essentially the same):

           {:test {:_source {:enabled "false"}
                   :_all    {:enabled "false"}
                   :properties {:gram {:type "string" :store "yes"
                                       :analyzer :ngram-index :compress "true"}
                                :freq {:type "long" :store "yes"}}}}]
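Matt asks for the mapping in plain JSON rather than Clojure. A sketch of what the fragment above likely translates to, built in Python and serialized with `json.dumps`: keyword values such as `:ngram-index` become strings, and the string values (`"false"`, `"yes"`, `"true"`) are carried over unchanged rather than converted to booleans. The enclosing structure that the trailing `]` belongs to isn't shown in the thread, so it is left out.

```python
import json

# Direct translation of the Clojure map; the type name "test" comes from the fragment.
mapping = {
    "test": {
        "_source": {"enabled": "false"},
        "_all": {"enabled": "false"},
        "properties": {
            "gram": {
                "type": "string",
                "store": "yes",
                "analyzer": "ngram-index",
                "compress": "true",
            },
            "freq": {"type": "long", "store": "yes"},
        },
    }
}

print(json.dumps(mapping, indent=2))
```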

On Tuesday, May 21, 2013 7:07:44 PM UTC+3, Shlomi wrote:

    Hey,

    thanks all, let me reply:

    Michael - no, I set replicas to 0 (if that's what you meant..)

    Itamar & Matt - I disabled _all and _source, and explicitly
    set "store" to "yes" for both fields (I don't care about perf
    for now..). With this setting I still got a much larger size,
    and I was still unable to see the fields through queries (I only
    got IDs back), although I set store to yes.

    On Tuesday, May 21, 2013 7:03:19 PM UTC+3, Matt Weber wrote:

        Don't forget about the _all field. Also, if you don't
        store the source, you need to explicitly set "store" to
        yes on your field mappings so you can have them returned
        in the results.


        On Tue, May 21, 2013 at 8:59 AM, Shlomi
        <shlomi...@gmail.com> wrote:

            yes, so I was trying to exclude the source, but then
            queries didn't return anything besides the ID. But in any
            case, even disabling the source still gave me a large index..
            Any way to tell it to save just the fields?


            On Tuesday, May 21, 2013 6:54:38 PM UTC+3, Itamar
            Syn-Hershko wrote:

                Yes, because ES stores the entire source by default.

                On Tue, May 21, 2013 at 6:53 PM, Shlomi
                <shlomi...@gmail.com> wrote:

                    Hey,

                    We have some old Java code that uses Lucene
                    and Grizzly to serve queries over text. We
                    have two fields, a string field and a numeric
                    (long) field. The indexing code is pretty
                    straightforward.

                    I was trying to migrate this to Elasticsearch with
                    a pretty simple configuration, and indexed the
                    same data.

                    The Java-based implementation took about 6 GB,
                    while Elasticsearch took 17 GB..

                    Does this make sense? What could I do about
                    this?

                    Thanks!


                    --
                    You received this message because you are
                    subscribed to the Google Groups
                    "elasticsearch" group.
                    To unsubscribe from this group and stop
                    receiving emails from it, send an email to
                    elasticsearc...@googlegroups.com.
                    For more options, visit
                    https://groups.google.com/groups/opt_out.

Don't give up! This does matter and does affect performance (think disk
reads, think OS cache). There are _source, _all, compression, and other
factors that will affect index size, so it would be great to nail this down.

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html

Hey,
I am trying to pinpoint the difference between the two implementations,
and I am still working on replying to Matt and Simon. I am using Luke to look
inside the indices.

As soon as I have more complete results I'll post them here..

On Friday, May 24, 2013 7:19:24 PM UTC+3, Otis Gospodnetic wrote:

Don't give up! This does matter and does affect performance (think disk
reads, think OS cache). There are _source, _all, compression, and other
factors that will affect index size, so it would be great to nail this down.

Otis

thanks! I'd be really happy to see the outcome!

On Sunday, May 26, 2013 12:06:38 PM UTC+2, Shlomi wrote:

Hey,
I am trying to pinpoint the difference between the two implementations,
and I am still working on replying to Matt and Simon. I am using Luke to look
inside the indices.

As soon as I have more complete results I'll post them here..

On Friday, May 24, 2013 7:19:24 PM UTC+3, Otis Gospodnetic wrote:

Don't give up! This does matter and does affect performance (think disk
reads, think OS cache). There is _source, _all, compression, and other
factors that will affect index size, so it would be great to nail this down.

Otis

ELASTICSEARCH Performance Monitoring - Sematext Monitoring | Infrastructure Monitoring Servicehttp://sematext.com/spm/index.html
Search Analytics - Cloud Monitoring Tools & Services | Sematexthttp://sematext.com/search-analytics/index.html

On Thursday, May 23, 2013 10:54:05 AM UTC-4, Jérôme Gagnon wrote:

+1 on that, we couldn't do much about it, we just hope that this doesn't
affect the disk IO performance...

On Thursday, May 23, 2013 10:34:38 AM UTC-4, Ivan Brusic wrote:

Just wanted to add that I always encountered the same issue with
Elasticsearch. Indices are almost twice as big despite aggressive trimming.
I have simply come to accept the issue as a fact and moved on. :slight_smile:

--
Ivan

On Wed, May 22, 2013 at 12:35 PM, simonw simon.w...@elasticsearch.comwrote:

I suggest you provide your lucene FieldTypes and your mapping, run
your indexing against lucene and a single shard no-replica Elasticsearch
instance. Then optimize the index and provide the output of ls -al on the
index directory. it would also be interesting what exactly is "much
larger".

simon

On Wednesday, May 22, 2013 8:27:05 PM UTC+2, Matt Weber wrote:

Really we are just shooting in the dark here because of lack of
information:

What version of ES? What version of lucene? What does your lucene
index settings (tokenizer, analyzers, etc) look like? Have you configured
an ES mapping identical to what you use in lucene? How are you measuring
your index size? Have your tried indexing a single document in lucene and
ES and comparing the resulting index size?

Gist us your mapping (not the clojure version) , custom analyzer
settings, index settings, etc and we might be able to figure this out for
you.

Thanks,
Matt Weber

On Wed, May 22, 2013 at 10:44 AM, Shlomi shlomi...@gmail.com wrote:

Hey,

Thanks for replying, ngram is the name of the field, and is
pre-computed:

Jörg - I think i might have misled you, i am not using the ngram
tokenizer, ":ngram-index" is a custom tokenizer that uses
"lowercase" tokenizer, and a list of stopwords.

David - Thanks for the suggestion, but yeah, my code fails if the
index exists before it runs, this way i am sure the index was in fact
deleted..

Mark - I tried with both a single shard and the default 5 shards.
there was no different in size (surprisingly.. )

thanks for all your responses, but we have to keep thinking.. :slight_smile:

On Wednesday, May 22, 2013 5:22:53 PM UTC+3, Jörg Prante wrote:

You are using ngram tokenizer which explodes index size. If you use
ES
default sharding, you have 5 shards (and therefore, 5 Lucene
indexes).
With ngram, you have scattered tokens over all shards, and this
converges to 5x the space compared to 1 shard.

Also, store = yes for each field is kind of clumsy. You have to
enable
each field to get them returned for a query (only _source is
returned by
default). I don't see much sense in making an ngram analyzed field
stored. Can you elaborate?

Jörg

Am 22.05.13 11:08, schrieb Shlomi:

does ES store its numeric fields as strings?

can someone confirm that if you disable _source and keep each
field as
stored and indexed, your fields becomes invisible (although
queriable)? or am i doing something totally wrong?..

Thanks

On Tuesday, May 21, 2013 7:10:07 PM UTC+3, Shlomi wrote:

here is a fraction of the mapping i have (i use clojure so 

its a

bit different from json, but its essentially the same): 

           {:test  { 
                     :_source {:enabled "false" } 
                     :_all    {:enabled "false" } 
                     :properties {:gram  {:type "string" 

:store

"yes" :analyzer :ngram-index :compress "true"} 
                                      :freq    {:type "long" 
:store "yes"} }}}] 

On Tuesday, May 21, 2013 7:07:44 PM UTC+3, Shlomi wrote:

    Hey,

    Thanks all, let me reply:

    Michael - no, I set replicas to 0 (if that's what you meant..)

    Itamar & Matt - I disabled _all and _source, and explicitly
    set "store" to "yes" for both fields (I don't care about perf
    for now..). With this setting I still got a much larger size,
    and I was still unable to see the fields through queries
    (although I set store to yes); I only got IDs back.

    On Tuesday, May 21, 2013 7:03:19 PM UTC+3, Matt Weber wrote:

        Don't forget about the _all field. Also, if you don't
        store the source, you need to explicitly set "store" to
        yes on your field mappings so you can have them returned
        in the results.

        On Tue, May 21, 2013 at 8:59 AM, Shlomi
        <shlomi...@gmail.com> wrote:

            Yes, so I was trying to exclude the source, but then
            queries didn't return anything besides the ID. In any
            case, even disabling the source still gave me a large
            index..

            Is there any way to tell it to save just the fields?

            On Tuesday, May 21, 2013 6:54:38 PM UTC+3, Itamar
            Syn-Hershko wrote:

                Yes, because ES stores the entire source by default.

                On Tue, May 21, 2013 at 6:53 PM, Shlomi
                <shlomi...@gmail.com> wrote:

                    Hey,

                    We have some old Java code that uses Lucene
                    and Grizzly to serve queries over text. We
                    have two fields, a string field and a numeric
                    (long) field. The indexing code is pretty
                    straightforward.

                    I was trying to migrate this to Elasticsearch
                    with a pretty simple configuration, and indexed
                    the same data.

                    The Java-based implementation took about 6gb,
                    while Elasticsearch took 17gb..

                    Does this make sense? What could I do about
                    this?

                    Thanks!

That is exactly what I was talking about. Actually, I saw an improvement
going from 0.20.x to 0.90.x, which is great! I'm also waiting for the
outcome.

On Friday, May 24, 2013 12:19:24 PM UTC-4, Otis Gospodnetic wrote:

Don't give up! This does matter and does affect performance (think disk
reads, think OS cache). There is _source, _all, compression, and other
factors that will affect index size, so it would be great to nail this down.

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html

On Wed, May 22, 2013 at 10:44 AM, Shlomi shlomi...@gmail.com wrote:

Hey,

Thanks for replying. "ngram" is the name of the field, and its values are
pre-computed.

Jörg - I think I might have misled you. I am not using the ngram
tokenizer; ":ngram-index" is a custom analyzer that uses the
"lowercase" tokenizer and a list of stopwords.

David - Thanks for the suggestion, but my code fails if the
index exists before it runs, so I am sure the index was in fact
deleted..

Mark - I tried with both a single shard and the default 5 shards.
There was no difference in size (surprisingly..)

Thanks for all your responses, but we have to keep thinking.. :slight_smile:

On Wednesday, May 22, 2013 5:22:53 PM UTC+3, Jörg Prante wrote:

You are using the ngram tokenizer, which explodes index size. If you use
the ES default sharding, you have 5 shards (and therefore 5 Lucene
indexes). With ngram, you have tokens scattered over all shards, and this
converges to 5x the space compared to 1 shard.

Also, store = yes for each field is kind of clumsy. You have to enable
each field to get it returned for a query (only _source is returned by
default). I don't see much sense in making an ngram-analyzed field
stored. Can you elaborate?

Jörg
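Jörg's shard argument rests on each shard being a separate Lucene index with its own term dictionary. The JDK-only sketch below is a hypothetical simulation, not ES code: it routes documents to shards by a hash of their id modulo the shard count (an assumption standing in for ES routing) and counts term-dictionary entries summed over shards versus a single index. It models dictionary entries only; postings are not duplicated across shards, which may be part of why Shlomi reports no size difference between 1 and 5 shards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical simulation of term-dictionary entries across shards; not ES code. */
final class ShardTerms {

    /** Sums, over all shards, the number of distinct terms each shard would hold. */
    static int dictionaryEntries(List<List<String>> docs, int shards) {
        List<Set<String>> perShard = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            perShard.add(new HashSet<>());
        }
        for (int id = 0; id < docs.size(); id++) {
            // Assumption: route each document by hash(id) modulo the shard count.
            int shard = Math.floorMod(Integer.toString(id).hashCode(), shards);
            perShard.get(shard).addAll(docs.get(id));
        }
        int total = 0;
        for (Set<String> terms : perShard) {
            total += terms.size();
        }
        return total;
    }

    /** Random documents drawn from a fixed vocabulary; the fixed seed keeps runs repeatable. */
    static List<List<String>> randomDocs(int count, int termsPerDoc, int vocabulary, long seed) {
        Random random = new Random(seed);
        List<List<String>> docs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            List<String> doc = new ArrayList<>();
            for (int j = 0; j < termsPerDoc; j++) {
                doc.add("t" + random.nextInt(vocabulary));
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<List<String>> docs = randomDocs(10_000, 5, 1_000, 42L);
        System.out.println("1 shard:  " + dictionaryEntries(docs, 1) + " dictionary entries");
        System.out.println("5 shards: " + dictionaryEntries(docs, 5) + " dictionary entries");
    }
}
```

With a small vocabulary that every shard sees, the 5-shard total approaches five times the single-index count; terms that occur in only one document stay in one shard. Either way this counts dictionary entries, not bytes on disk.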

On 22.05.13 11:08, Shlomi wrote:

Does ES store its numeric fields as strings?

Can someone confirm that if you disable _source and keep each field
stored and indexed, your fields become invisible (although queryable)?
Or am I doing something totally wrong?..

Thanks
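On the "invisible" fields: as far as I know, with `_source` disabled, stored fields come back only when the search request names them through the `fields` parameter of the search body, which matches Jörg's remark that only `_source` is returned by default. A minimal sketch that just assembles such a body for the `gram` and `freq` fields from Shlomi's mapping; `StoredFieldsRequest` is a hypothetical helper, the `match_all` query is a placeholder, and the parameter name should be checked against the docs for your ES version.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a search body asking for stored fields. */
final class StoredFieldsRequest {

    /** Builds {"query":{"match_all":{}},"fields":[...]}; field names are not JSON-escaped. */
    static String body(List<String> fields) {
        String quoted = fields.stream()
                .map(f -> "\"" + f + "\"")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"match_all\":{}},\"fields\":[" + quoted + "]}";
    }

    public static void main(String[] args) {
        // Send this as the body of a search request, e.g. to /test/_search.
        System.out.println(body(List.of("gram", "freq")));
    }
}
```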

On Tuesday, May 21, 2013 7:10:07 PM UTC+3, Shlomi wrote:

Here is a fraction of the mapping I have (I use Clojure, so it's a
bit different from JSON, but it's essentially the same):

           {:test  {
                     :_source {:enabled "false" }
                     :_all    {:enabled "false" }
                     :properties {:gram  {:type "string" :store "yes"
                                          :analyzer :ngram-index :compress "true"}
                                  :freq  {:type "long" :store "yes"}}}}]

OK, my results are somewhat strange.

I started again from scratch and reindexed using both the pure Java +
Lucene implementation and Elasticsearch.

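Here is the Java code that adds both fields:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;

Document document = new Document();
// Stored and analyzed text field
document.add(new Field("ngram", ngram, Field.Store.YES,
        Field.Index.ANALYZED));
// Stored numeric field; the third argument (index) is false, so it is not indexed
NumericField frequencyField = new NumericField("frequency",
        Field.Store.YES, false);
frequencyField.setLongValue(frequency);
document.add(frequencyField);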

The Java version uses Lucene 3.5 with a custom Similarity class.

Here are the Elasticsearch (0.20.5) mappings:
{
  "test": {
    "_all": {
      "enabled": "false"
    },
    "properties": {
      "freq": {
        "store": "yes",
        "compress": "true",
        "index": "not_analyzed",
        "type": "long"
      },
      "gram": {
        "store": "yes",
        "compress": "true",
        "type": "string",
        "analyzer": "ngram-index"
      }
    },
    "_source": {
      "enabled": "false"
    }
  }
}

And the settings:
{
  "analysis": {
    "filter": {
      "myStop": {
        "stopwords": [
          "a", "b", "c"  // just for the example; there's a large list here
        ],
        "type": "stop"
      }
    },
    "analyzer": {
      "ngram-index": {
        "tokenizer": "lowercase",
        "filter": [
          "myStop"
        ],
        "type": "custom"
      }
    }
  },
  "similarity": {
    "search": {
      "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
      // the same similarity used in the Java version
    },
    "index": {
      "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
    }
  },
  "number_of_shards": 1,
  "number_of_replicas": 0
}
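One difference between the two setups is visible in the snippets themselves: the Java code creates `frequency` with `new NumericField("frequency", Field.Store.YES, false)`, where the third argument is `index`, so the value is stored but not indexed, while the ES mapping indexes `freq` (`"index": "not_analyzed"`). In Lucene 3.x an indexed numeric value is trie-encoded into one term per `precisionStep` bits, each adding a posting for the document. A back-of-the-envelope sketch, assuming a 64-bit long and Lucene's default precision step of 4 (`NumericUtils.PRECISION_STEP_DEFAULT`); the step ES 0.20 actually uses may differ, and `TrieTerms` is a hypothetical helper.

```java
/** Back-of-the-envelope count of trie terms per indexed numeric value (hypothetical helper). */
final class TrieTerms {

    /** One term per shift 0, step, 2*step, ... below valueBits, i.e. ceil(valueBits / step). */
    static int termsPerValue(int valueBits, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        if (precisionStep >= valueBits) {
            return 1; // a single full-precision term
        }
        return (valueBits + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        for (int step : new int[] {4, 8, 16}) {
            System.out.println("precisionStep " + step + ": "
                    + termsPerValue(64, step) + " terms per indexed long");
        }
    }
}
```

This prints 16, 8 and 4 terms per value. If `freq` never needs to be searched, setting `"index": "no"` on it in the ES mapping would, as far as I know, mirror the Java setup more closely.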

I ran both, and to my surprise, the Elasticsearch index was actually smaller
than the pure Java version:
ES: 15.8gb
Java: 17gb

But originally, the size (the one that made me complain so loudly) was ~8gb!

So I fired up Luke, opened the Java Lucene index, and ran "optimize". It
worked for a long time, but the size remained the same.
I tried the same for ES, using curl -XPOST
'http://192.161.101.61:9200/test/_optimize', but besides returning
{"ok":true,"_shards":{"total":1,"successful":1,"failed":0}} right away, it
didn't do much (should I have played with the parameters?).
So I opened the ES index in Luke too and hit "optimize". I let it work for
quite a while, and got back 32gb (???!!!)
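On the `_optimize` parameters: as far as I know, the optimize API of that era accepts `max_num_segments` (set it to 1 to merge down to a single segment) and `wait_for_merge`, and without `max_num_segments` it only checks whether a merge needs to run, which would be consistent with the immediate reply. Treat those parameter names and that behaviour as assumptions to verify against the docs for your version. A minimal JDK-only sketch that sends such a request; `OptimizeCall` is hypothetical, and the host and index name are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for calling the optimize endpoint with explicit parameters. */
final class OptimizeCall {

    /** Builds e.g. http://localhost:9200/test/_optimize?max_num_segments=1&wait_for_merge=true */
    static URI optimizeUri(String baseUrl, String index, int maxNumSegments) {
        return URI.create(baseUrl + "/" + index + "/_optimize?max_num_segments="
                + maxNumSegments + "&wait_for_merge=true");
    }

    /** POSTs to the URI and returns the response body (may block until the merge finishes). */
    static String post(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("POST");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder host; replace with your node's address.
        System.out.println(post(optimizeUri("http://localhost:9200", "test", 1)));
    }
}
```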

Wandering through Luke's options, I came across "cleanup index dir". I ran it
over the Java Lucene index and got back a wonderful 8.4gb, so I tried it
on the ES index too, with high hopes. It got reduced back to 15.8gb and
stayed there..

So now, here is the ls -ltra output for both dirs:
ES:

ls -ltra
total 16585720
drwxr-xr-x 5 elasticsearch elasticsearch 4096 May 27 15:21 ..
-rw-r--r-- 1 root root 16983734698 May 27 17:14 _38g.cfs
-rw-r--r-- 1 root root 285 May 27 17:26 segments_55
-rw-r--r-- 1 root root 20 May 27 17:26 segments.gen
drwxr-xr-x 2 elasticsearch elasticsearch 20480 May 27 17:26 .

Java:

ls -ltra
total 8759956
-rw-rw-r-- 1 shlomiv shlomiv 8970150460 May 27 17:04 _an.cfs
-rw-rw-r-- 1 shlomiv shlomiv 20 May 27 17:04 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 284 May 27 17:04 segments_38
drwxrwxrwt 34 root root 20480 May 27 17:17 ..
drwxrwxr-x 2 shlomiv shlomiv 4096 May 27 17:17 .
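simon asked for per-file differences, but with both indexes packed into a single `.cfs` these listings can only show totals. For an index that is not in compound format, each segment's parts appear as separate files (in Lucene 3.x, for example, `.tis`/`.tii` for the term dictionary, `.frq` for frequencies, `.prx` for positions and `.fdt`/`.fdx` for stored fields), and a small JDK-only sketch like the one below can total them by extension for a side-by-side comparison. `IndexSizes` is a hypothetical name and the directory argument is a placeholder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical helper: total file sizes in an index directory, grouped by extension. */
final class IndexSizes {

    /** Text after the last dot ("_an.frq" -> "frq"), or "(none)" for names like "segments_38". */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "(none)" : fileName.substring(dot + 1);
    }

    /** Sums the sizes of regular files directly inside dir, keyed by extension (sorted). */
    static Map<String, Long> sizesByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    totals.merge(extension(file.getFileName().toString()), Files.size(file), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexSizes <index-dir>
        sizesByExtension(Paths.get(args[0])).forEach((ext, bytes) ->
                System.out.printf("%-8s %,15d bytes%n", ext, bytes));
    }
}
```

Running it on both index directories and diffing the output would show, for instance, whether the `.prx` (positions) totals differ, which is the kind of difference simon raised.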

So it seems that our old dev guys used to manually clean their Java-generated
Lucene index with Luke, which gave back around 8gb of space.
Unfortunately, this trick didn't work on ES's Lucene index.

What do you think? Does this make sense at all?

Thanks,
Shlomi

On Sunday, May 26, 2013 10:41:49 PM UTC+3, simonw wrote:

Thanks! I'd be really happy to see the outcome!

On Sunday, May 26, 2013 12:06:38 PM UTC+2, Shlomi wrote:

Hey,
I am trying to pinpoint the difference between the two implementations,
and I'm still working on replying to Matt and Simon. I am using Luke to see
inside the indices.

As soon as I have more complete results, I'll post them here.

On Monday, May 27, 2013 4:21:59 PM UTC+3, Jérôme Gagnon wrote:

That is exactly what I was talking about. Actually, I saw an improvement
going from 0.20.x to 0.90.x, which is great! I'm also waiting for the
outcome.

Thanks for listing the files.

If you play with Luke's "optimize", you transform the Lucene index into
Lucene's so-called "compound" format (note the file suffix .cfs). The
compound format is space efficient (as you found out) since it packs all
segments into one large container, but it's not time efficient (it costs
around 20-30% more time to build and does not scale well with multicore
CPUs in query performance), so it is not used by ES.

But, thanks to some Lucene magic, ES silently supports a Lucene shard that
was converted to compound format, without further notice.

On the other hand, the ES "optimize" does true index optimizing: it reduces
the number of segments in an index to build an index structure optimized
for query time. Unlike Luke, it does not create a Lucene compound file.
Converting to compound format with Luke may have some flaws on an ES index,
because Luke is not aware of the additional information ES stores, so the
result may vary. The 1:2 ratio you observe makes sense.
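
The "1:2 ratio" can be checked directly against the two single .cfs files in
Shlomi's listings quoted below. A minimal Python sketch; the byte counts are
copied from those listings, and the constant and function names are only
illustrative:

```python
# Byte sizes of the single .cfs file in each optimized index, copied from
# the `ls -ltra` listings in this thread.
ES_CFS_BYTES = 16_983_734_698      # _38g.cfs, Elasticsearch shard
LUCENE_CFS_BYTES = 8_970_150_460   # _an.cfs, plain Lucene index

GIB = 2 ** 30  # bytes per GiB


def size_ratio(larger: int, smaller: int) -> float:
    """How many times bigger `larger` is than `smaller`."""
    return larger / smaller


if __name__ == "__main__":
    print(f"ES index:     {ES_CFS_BYTES / GIB:.2f} GiB")
    print(f"Lucene index: {LUCENE_CFS_BYTES / GIB:.2f} GiB")
    print(f"ratio:        {size_ratio(ES_CFS_BYTES, LUCENE_CFS_BYTES):.2f}")
```

Run as-is, this reports about 15.82 GiB versus 8.35 GiB, a ratio of roughly
1.89, so "1:2" is a fair rounding.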

My recommendation is not to use compound format and not to play with Luke's
"optimize" in ES; it only confuses people.

In exchange for much faster queries and faster indexing, it's best to
live with the "larger" index ES creates.

Jörg

On Monday, May 27, 2013 4:35:48 PM UTC+2, Shlomi wrote:

ES:

ls -ltra
total 16585720
drwxr-xr-x 5 elasticsearch elasticsearch 4096 May 27 15:21 ..
-rw-r--r-- 1 root root 16983734698 May 27 17:14 _38g.cfs
-rw-r--r-- 1 root root 285 May 27 17:26 segments_55
-rw-r--r-- 1 root root 20 May 27 17:26 segments.gen
drwxr-xr-x 2 elasticsearch elasticsearch 20480 May 27 17:26 .

java:

ls -ltra
total 8759956
-rw-rw-r-- 1 shlomiv shlomiv 8970150460 May 27 17:04 _an.cfs
-rw-rw-r-- 1 shlomiv shlomiv 20 May 27 17:04 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 284 May 27 17:04 segments_38
drwxrwxrwt 34 root root 20480 May 27 17:17 ..
drwxrwxr-x 2 shlomiv shlomiv 4096 May 27 17:17 .

So it seems that our old dev guys used to manually clean their Java-
generated Lucene index with Luke, which gave back around 8 GB of space.
But unfortunately, this trick didn't work on ES's Lucene index.

What do you think? Does this make sense at all?


Hey all,

On Tuesday, May 28, 2013 12:24:10 AM UTC+2, Jörg Prante wrote:

Thanks for listing the files.

Thanks for the files, but this listing doesn't buy us much. The CFS
format hides which files are the big consumers. Is it possible to get a
listing of the files before they are packed into the CFS?

If you play with Luke's "optimize", you transform the Lucene index into
Lucene's so-called "compound" format (note the file suffix .cfs). The
compound format is space efficient (as you found out) since it packs all
segments into one large container, but it's not time efficient (it costs
around 20-30% more time to build and does not scale well with multicore
CPUs in query performance), so it is not used by ES.

This is not correct: the CFS format is not more space efficient than a
multi-file segment. In fact it consumes slightly more space (metadata
overhead etc.), but it reduces the number of open file handles, which can be
important in some situations. In Lucene 4.0 the performance should be
nearly the same; the 20% to 30% might come from a combination of the
directory implementation used and CFS.
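
The file-handle difference is visible in the raw listings later in this
thread: every non-compound ES segment there consists of eight files (.fnm,
.fdx, .fdt, .tis, .tii, .prx, .frq, .nrm), while a compound segment is a
single .cfs. A small Python sketch that counts files per segment from a list
of names, assuming the `_<segment>.<ext>` naming seen in those listings (the
function name is hypothetical):

```python
from collections import Counter


def files_per_segment(file_names):
    """Count index files per Lucene segment (the name part before the dot).

    Assumes per-segment files are named like "_vr.fdt". Commit files such
    as "segments_4m" and "segments.gen" belong to no single segment, so
    they are skipped.
    """
    counts = Counter()
    for name in file_names:
        if name.startswith("segments"):
            continue  # commit point files
        segment, _, _ext = name.partition(".")
        counts[segment] += 1
    return dict(counts)


if __name__ == "__main__":
    names = ["_vr.fnm", "_vr.fdx", "_vr.fdt", "_vr.tis", "_vr.tii",
             "_vr.prx", "_vr.frq", "_vr.nrm", "_a0.cfs", "segments.gen"]
    per_segment = files_per_segment(names)
    print(per_segment)                # files to open for each segment
    print(sum(per_segment.values()))  # total files for these segments
```

Multiplying eight files by the number of segments in a multi-file index
gives a rough idea of how many file handles a reader needs, compared with
one per segment when every segment is compound.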

But, thanks to some Lucene magic, ES silently supports a Lucene shard that
was converted to compound format, without further notice.

Yes, that is correct: whether a segment uses CFS is recorded in the segment
info for that segment, and in that case we wrap the directory to allow CFS
reading.

On the other hand, the ES "optimize" does true index optimizing: it reduces
the number of segments in an index to build an index structure optimized
for query time. Unlike Luke, it does not create a Lucene compound file.
Converting to compound format with Luke may have some flaws on an ES index,
because Luke is not aware of the additional information ES stores, so the
result may vary. The 1:2 ratio you observe makes sense.

I have to disagree here again. Segment merging needs no schema knowledge,
i.e. an optimize call in Luke should result in the same index as calling
optimize in Elasticsearch. That said, this changes a little in Lucene 4.0,
since there you can specify a similarity per field and per-field codecs, but
everything seems to be pretty much default here. Also, we are clearly using
Lucene < 4.0 here, since in 4.0 CFS is written as 2 files (.cfs and .cfe)
rather than one. You can also use CFS in Elasticsearch if you enable it on
the merge policy. In fact, Lucene writes CFS for small segments (those up to
10% of the entire index size) by default, I think since 3.4 or maybe 3.3
(not sure here). We do this to reduce the number of files by default for the
NRT case, where you flush lots of small segments and the performance
difference is almost nonexistent when segments are small. So CFS is totally
OK to use, but optimizing in Luke has different problems...
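
The default Simon describes can be read as a simple size rule: a newly
merged segment is packed into a .cfs only if it is small relative to the
whole index. A minimal Python sketch, assuming the 10% ratio stated above;
the function and parameter names are hypothetical, loosely modelled on what
I believe is the `noCFSRatio` setting of Lucene's merge policies, and the
real policies may apply further checks this ignores:

```python
def should_use_compound_file(merged_segment_bytes: int,
                             total_index_bytes: int,
                             use_compound_file: bool = True,
                             no_cfs_ratio: float = 0.1) -> bool:
    """Hypothetical simplification of the compound-file decision.

    Returns True if the merged segment would be written as a .cfs file.
    """
    if not use_compound_file:
        return False  # compound files disabled entirely
    if no_cfs_ratio >= 1.0:
        return True   # a ratio of 1.0 means "always use CFS"
    # Only segments that are small relative to the whole index get packed.
    return merged_segment_bytes <= no_cfs_ratio * total_index_bytes
```

Under this rule, with an index of about 17 GB a segment of a few KB would be
packed into a .cfs, while a merged segment of several GB would stay as
separate files.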

My recommendation is not to use compound format and not to play with
Luke's "optimize" in ES; it only confuses people.

The optimization can have tricky side effects. Still, to give you an
explanation I need to see the listing of the non-compound index. You could
also run CheckIndex on both indices with -verbose enabled. Go to your
index directory and then execute

java -cp path/to/lucene.jar org.apache.lucene.index.CheckIndex . -verbose

This should give you a better idea... note that this can take a while to
finish.

In exchange for much faster queries and faster indexing, it's best to
live with the "larger" index ES creates.

One more thing: in Lucene 4.0 we write directly into the CFS during
indexing and merging, so the performance difference is much smaller now.

simon


hey,

Yes, I thought you might want to see the original file listing, so I ran
it again overnight.

From what you say, does it mean that ES's Lucene index will be about twice
the size of a plain Lucene index? Is it just something we have to live
with? (It's probably fine most of the time, but not for some use cases.)

The listing is quite large... Here is the one for ES (no optimizing at
all; I attempted a "clean index dir" but it didn't delete any files):

ls -ltra
total 16648036
drwxr-xr-x 5 elasticsearch elasticsearch 4096 May 27 17:34 ..
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 17:55 _vr.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 219332316 May 27 17:55 _vr.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1981355017 May 27 17:55 _vr.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 677757390 May 27 17:56 _vr.tis
-rw-r--r-- 1 elasticsearch elasticsearch 6519398 May 27 17:56 _vr.tii
-rw-r--r-- 1 elasticsearch elasticsearch 401141598 May 27 17:56 _vr.prx
-rw-r--r-- 1 elasticsearch elasticsearch 925178084 May 27 17:56 _vr.frq
-rw-r--r-- 1 elasticsearch elasticsearch 27416543 May 27 17:56 _vr.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:02 _16f.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 5065644 May 27 18:02 _16f.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 45741364 May 27 18:02 _16f.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 16803650 May 27 18:02 _16f.tis
-rw-r--r-- 1 elasticsearch elasticsearch 162668 May 27 18:02 _16f.tii
-rw-r--r-- 1 elasticsearch elasticsearch 9329742 May 27 18:02 _16f.prx
-rw-r--r-- 1 elasticsearch elasticsearch 20571312 May 27 18:02 _16f.frq
-rw-r--r-- 1 elasticsearch elasticsearch 633209 May 27 18:02 _16f.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:16 _1s0.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 240390660 May 27 18:16 _1s0.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 2178157235 May 27 18:16 _1s0.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 742546522 May 27 18:17 _1s0.tis
-rw-r--r-- 1 elasticsearch elasticsearch 7152131 May 27 18:17 _1s0.tii
-rw-r--r-- 1 elasticsearch elasticsearch 440466009 May 27 18:17 _1s0.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1017914310 May 27 18:17 _1s0.frq
-rw-r--r-- 1 elasticsearch elasticsearch 30048836 May 27 18:17 _1s0.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:18 _1ub.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 5279132 May 27 18:18 _1ub.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 47704644 May 27 18:18 _1ub.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 17641435 May 27 18:18 _1ub.tis
-rw-r--r-- 1 elasticsearch elasticsearch 171424 May 27 18:18 _1ub.tii
-rw-r--r-- 1 elasticsearch elasticsearch 9603799 May 27 18:18 _1ub.prx
-rw-r--r-- 1 elasticsearch elasticsearch 659895 May 27 18:18 _1ub.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 21415626 May 27 18:18 _1ub.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:38 _2oj.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2149916547 May 27 18:38 _2oj.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 238283772 May 27 18:38 _2oj.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:39 _2py.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 23177452 May 27 18:39 _2py.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 205782679 May 27 18:39 _2py.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 73967868 May 27 18:39 _2py.tis
-rw-r--r-- 1 elasticsearch elasticsearch 704213 May 27 18:39 _2py.tii
-rw-r--r-- 1 elasticsearch elasticsearch 41750825 May 27 18:39 _2py.prx
-rw-r--r-- 1 elasticsearch elasticsearch 95136302 May 27 18:39 _2py.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2897185 May 27 18:39 _2py.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 735613612 May 27 18:39 _2oj.tis
-rw-r--r-- 1 elasticsearch elasticsearch 7082393 May 27 18:39 _2oj.tii
-rw-r--r-- 1 elasticsearch elasticsearch 434339734 May 27 18:39 _2oj.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1005557319 May 27 18:39 _2oj.frq
-rw-r--r-- 1 elasticsearch elasticsearch 29785475 May 27 18:39 _2oj.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:40 _2rp.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 26444028 May 27 18:40 _2rp.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 239777276 May 27 18:40 _2rp.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 84715455 May 27 18:40 _2rp.tis
-rw-r--r-- 1 elasticsearch elasticsearch 808505 May 27 18:40 _2rp.tii
-rw-r--r-- 1 elasticsearch elasticsearch 48544290 May 27 18:40 _2rp.prx
-rw-r--r-- 1 elasticsearch elasticsearch 110352247 May 27 18:40 _2rp.frq
-rw-r--r-- 1 elasticsearch elasticsearch 3305507 May 27 18:40 _2rp.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:43 _2ve.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 23981876 May 27 18:43 _2ve.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 218838291 May 27 18:43 _2ve.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 76737689 May 27 18:43 _2ve.tis
-rw-r--r-- 1 elasticsearch elasticsearch 731850 May 27 18:43 _2ve.tii
-rw-r--r-- 1 elasticsearch elasticsearch 44207710 May 27 18:43 _2ve.prx
-rw-r--r-- 1 elasticsearch elasticsearch 100267293 May 27 18:43 _2ve.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2997738 May 27 18:43 _2ve.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:44 _2y8.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 25456740 May 27 18:44 _2y8.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 229765680 May 27 18:44 _2y8.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 80929752 May 27 18:45 _2y8.tis
-rw-r--r-- 1 elasticsearch elasticsearch 769492 May 27 18:45 _2y8.tii
-rw-r--r-- 1 elasticsearch elasticsearch 46988152 May 27 18:45 _2y8.prx
-rw-r--r-- 1 elasticsearch elasticsearch 105942428 May 27 18:45 _2y8.frq
-rw-r--r-- 1 elasticsearch elasticsearch 3182096 May 27 18:45 _2y8.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:46 _30h.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 37740247 May 27 18:46 _30h.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 4044764 May 27 18:46 _30h.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 13444775 May 27 18:46 _30h.tis
-rw-r--r-- 1 elasticsearch elasticsearch 130539 May 27 18:46 _30h.tii
-rw-r--r-- 1 elasticsearch elasticsearch 7427643 May 27 18:46 _30h.prx
-rw-r--r-- 1 elasticsearch elasticsearch 16383492 May 27 18:46 _30h.frq
-rw-r--r-- 1 elasticsearch elasticsearch 505599 May 27 18:46 _30h.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:47 _31f.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 21543756 May 27 18:47 _31f.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 197299138 May 27 18:47 _31f.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 68879876 May 27 18:47 _31f.tis
-rw-r--r-- 1 elasticsearch elasticsearch 656343 May 27 18:47 _31f.tii
-rw-r--r-- 1 elasticsearch elasticsearch 39593224 May 27 18:47 _31f.prx
-rw-r--r-- 1 elasticsearch elasticsearch 89855250 May 27 18:47 _31f.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2692973 May 27 18:47 _31f.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:47 _325.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3627348 May 27 18:47 _325.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 32383651 May 27 18:47 _325.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 12637394 May 27 18:47 _325.tis
-rw-r--r-- 1 elasticsearch elasticsearch 124679 May 27 18:47 _325.tii
-rw-r--r-- 1 elasticsearch elasticsearch 6240381 May 27 18:47 _325.prx
-rw-r--r-- 1 elasticsearch elasticsearch 14493173 May 27 18:47 _325.frq
-rw-r--r-- 1 elasticsearch elasticsearch 453422 May 27 18:47 _325.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _34x.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 22657484 May 27 18:49 _34x.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 203319124 May 27 18:49 _34x.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 72839221 May 27 18:49 _34x.tis
-rw-r--r-- 1 elasticsearch elasticsearch 695455 May 27 18:49 _34x.tii
-rw-r--r-- 1 elasticsearch elasticsearch 41510104 May 27 18:49 _34x.prx
-rw-r--r-- 1 elasticsearch elasticsearch 94734687 May 27 18:49 _34x.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2832189 May 27 18:49 _34x.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _357.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3226756 May 27 18:49 _357.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 29482762 May 27 18:49 _357.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 10932081 May 27 18:49 _357.tis
-rw-r--r-- 1 elasticsearch elasticsearch 106617 May 27 18:49 _357.tii
-rw-r--r-- 1 elasticsearch elasticsearch 5931548 May 27 18:49 _357.prx
-rw-r--r-- 1 elasticsearch elasticsearch 403348 May 27 18:49 _357.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 13060705 May 27 18:49 _357.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _35h.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2190948 May 27 18:49 _35h.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 19853187 May 27 18:49 _35h.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 7499717 May 27 18:49 _35h.tis
-rw-r--r-- 1 elasticsearch elasticsearch 73038 May 27 18:49 _35h.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4004660 May 27 18:49 _35h.prx
-rw-r--r-- 1 elasticsearch elasticsearch 273872 May 27 18:49 _35h.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 8850349 May 27 18:49 _35h.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _35r.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 639604 May 27 18:49 _35r.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 5879432 May 27 18:49 _35r.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 2309837 May 27 18:49 _35r.tis
-rw-r--r-- 1 elasticsearch elasticsearch 22840 May 27 18:49 _35r.tii
-rw-r--r-- 1 elasticsearch elasticsearch 1173359 May 27 18:49 _35r.prx
-rw-r--r-- 1 elasticsearch elasticsearch 79954 May 27 18:49 _35r.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 2564359 May 27 18:49 _35r.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _35y.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2354380 May 27 18:50 _35y.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 21611994 May 27 18:50 _35y.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8030981 May 27 18:50 _35y.tis
-rw-r--r-- 1 elasticsearch elasticsearch 78300 May 27 18:50 _35y.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4317653 May 27 18:50 _35y.prx
-rw-r--r-- 1 elasticsearch elasticsearch 294301 May 27 18:50 _35y.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 9554754 May 27 18:50 _35y.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _367.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2873548 May 27 18:50 _367.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 26261768 May 27 18:50 _367.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 9730410 May 27 18:50 _367.tis
-rw-r--r-- 1 elasticsearch elasticsearch 94696 May 27 18:50 _367.tii
-rw-r--r-- 1 elasticsearch elasticsearch 11684334 May 27 18:50 _367.frq
-rw-r--r-- 1 elasticsearch elasticsearch 5265938 May 27 18:50 _367.prx
-rw-r--r-- 1 elasticsearch elasticsearch 359197 May 27 18:50 _367.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 309897 May 27 18:50 _36e.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 33652 May 27 18:50 _36e.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 151593 May 27 18:50 _36e.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1642 May 27 18:50 _36e.tii
-rw-r--r-- 1 elasticsearch elasticsearch 61814 May 27 18:50 _36e.prx
-rw-r--r-- 1 elasticsearch elasticsearch 126563 May 27 18:50 _36e.frq
-rw-r--r-- 1 elasticsearch elasticsearch 4210 May 27 18:50 _36e.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36e.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 11788 May 27 18:50 _36f.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 107037 May 27 18:50 _36f.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 59456 May 27 18:50 _36f.tis
-rw-r--r-- 1 elasticsearch elasticsearch 666 May 27 18:50 _36f.tii
-rw-r--r-- 1 elasticsearch elasticsearch 21688 May 27 18:50 _36f.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1477 May 27 18:50 _36f.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 43858 May 27 18:50 _36f.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36f.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 1806 May 27 18:50 _36i.tis
-rw-r--r-- 1 elasticsearch elasticsearch 49 May 27 18:50 _36i.tii
-rw-r--r-- 1 elasticsearch elasticsearch 304 May 27 18:50 _36i.prx
-rw-r--r-- 1 elasticsearch elasticsearch 25 May 27 18:50 _36i.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 514 May 27 18:50 _36i.frq
-rw-r--r-- 1 elasticsearch elasticsearch 172 May 27 18:50 _36i.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1576 May 27 18:50 _36i.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36i.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36l.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2467780 May 27 18:50 _36l.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 22442774 May 27 18:50 _36l.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8411793 May 27 18:50 _36l.tis
-rw-r--r-- 1 elasticsearch elasticsearch 81892 May 27 18:50 _36l.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4520607 May 27 18:50 _36l.prx
-rw-r--r-- 1 elasticsearch elasticsearch 10027553 May 27 18:50 _36l.frq
-rw-r--r-- 1 elasticsearch elasticsearch 308476 May 27 18:50 _36l.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36v.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2480516 May 27 18:50 _36v.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 22822808 May 27 18:50 _36v.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8801263 May 27 18:50 _36v.tis
-rw-r--r-- 1 elasticsearch elasticsearch 87153 May 27 18:50 _36v.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4494453 May 27 18:50 _36v.prx
-rw-r--r-- 1 elasticsearch elasticsearch 10257485 May 27 18:50 _36v.frq
-rw-r--r-- 1 elasticsearch elasticsearch 310068 May 27 18:50 _36v.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 44132 May 27 18:50 _372.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 415201 May 27 18:50 _372.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 198527 May 27 18:50 _372.tis
-rw-r--r-- 1 elasticsearch elasticsearch 2133 May 27 18:50 _372.tii
-rw-r--r-- 1 elasticsearch elasticsearch 80597 May 27 18:50 _372.prx
-rw-r--r-- 1 elasticsearch elasticsearch 5520 May 27 18:50 _372.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 167499 May 27 18:50 _372.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _372.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _375.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3645828 May 27 18:50 _375.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 34062621 May 27 18:50 _375.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 106852 May 27 18:50 _374.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 990040 May 27 18:50 _374.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 453836 May 27 18:50 _374.tis
-rw-r--r-- 1 elasticsearch elasticsearch 4743 May 27 18:50 _374.tii
-rw-r--r-- 1 elasticsearch elasticsearch 196913 May 27 18:50 _374.prx
-rw-r--r-- 1 elasticsearch elasticsearch 420451 May 27 18:50 _374.frq
-rw-r--r-- 1 elasticsearch elasticsearch 13360 May 27 18:50 _374.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _374.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 202868 May 27 18:50 _376.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1898905 May 27 18:50 _376.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 828823 May 27 18:50 _376.tis
-rw-r--r-- 1 elasticsearch elasticsearch 8567 May 27 18:50 _376.tii
-rw-r--r-- 1 elasticsearch elasticsearch 372288 May 27 18:50 _376.prx
-rw-r--r-- 1 elasticsearch elasticsearch 816220 May 27 18:50 _376.frq
-rw-r--r-- 1 elasticsearch elasticsearch 25362 May 27 18:50 _376.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _376.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 20540 May 27 18:50 _377.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 189637 May 27 18:50 _377.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 103554 May 27 18:50 _377.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1156 May 27 18:50 _377.tii
-rw-r--r-- 1 elasticsearch elasticsearch 37645 May 27 18:50 _377.prx
-rw-r--r-- 1 elasticsearch elasticsearch 2571 May 27 18:50 _377.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 77707 May 27 18:50 _377.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _377.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 12573871 May 27 18:50 _375.tis
-rw-r--r-- 1 elasticsearch elasticsearch 123694 May 27 18:50 _375.tii
-rw-r--r-- 1 elasticsearch elasticsearch 6640864 May 27 18:50 _375.prx
-rw-r--r-- 1 elasticsearch elasticsearch 14999500 May 27 18:50 _375.frq
-rw-r--r-- 1 elasticsearch elasticsearch 455732 May 27 18:50 _375.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 380684 May 27 18:50 _378.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3608362 May 27 18:50 _378.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1471835 May 27 18:50 _378.tis
-rw-r--r-- 1 elasticsearch elasticsearch 14864 May 27 18:50 _378.tii
-rw-r--r-- 1 elasticsearch elasticsearch 696429 May 27 18:50 _378.prx
-rw-r--r-- 1 elasticsearch elasticsearch 47589 May 27 18:50 _378.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 1544148 May 27 18:50 _378.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _378.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 346908 May 27 18:50 _379.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3295714 May 27 18:50 _379.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1359640 May 27 18:50 _379.tis
-rw-r--r-- 1 elasticsearch elasticsearch 13724 May 27 18:50 _379.tii
-rw-r--r-- 1 elasticsearch elasticsearch 1411584 May 27 18:50 _379.frq
-rw-r--r-- 1 elasticsearch elasticsearch 633187 May 27 18:50 _379.prx
-rw-r--r-- 1 elasticsearch elasticsearch 43367 May 27 18:50 _379.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _379.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3671437 May 27 18:50 _37a.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 390892 May 27 18:50 _37a.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1506758 May 27 18:50 _37a.tis
-rw-r--r-- 1 elasticsearch elasticsearch 15167 May 27 18:50 _37a.tii
-rw-r--r-- 1 elasticsearch elasticsearch 713155 May 27 18:50 _37a.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1584719 May 27 18:50 _37a.frq
-rw-r--r-- 1 elasticsearch elasticsearch 48865 May 27 18:50 _37a.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _37a.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 33420 May 27 18:50 _37b.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 313623 May 27 18:50 _37b.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 160429 May 27 18:50 _37b.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1752 May 27 18:50 _37b.tii
-rw-r--r-- 1 elasticsearch elasticsearch 61122 May 27 18:50 _37b.prx
-rw-r--r-- 1 elasticsearch elasticsearch 4181 May 27 18:50 _37b.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 127012 May 27 18:50 _37b.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _37b.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 7863 May 27 18:51 segments_4m
-rw-r--r-- 1 elasticsearch elasticsearch 20 May 27 18:51 segments.gen
drwxr-xr-x 2 elasticsearch elasticsearch 20480 May 28 11:20 .

And here is the one for Java (no optimize, with clean):

ls -ltr
total 8778696
-rw-rw-r-- 1 shlomiv shlomiv 20 May 28 12:20 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _a3.fnm
-rw-rw-r-- 1 shlomiv shlomiv 16048551 May 28 12:20 _a0.cfs
-rw-rw-r-- 1 shlomiv shlomiv 350019 May 28 12:20 _7j.cfs
-rw-rw-r-- 1 shlomiv shlomiv 12482462 May 28 12:20 _af.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6799056 May 28 12:20 _91.cfs
-rw-rw-r-- 1 shlomiv shlomiv 7100132 May 28 12:20 _ai.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1010209161 May 28 12:20 _3n.frq
-rw-rw-r-- 1 shlomiv shlomiv 15936518 May 28 12:20 _a7.cfs
-rw-rw-r-- 1 shlomiv shlomiv 167517463 May 28 12:20 _7u.prx
-rw-rw-r-- 1 shlomiv shlomiv 1261800 May 28 12:20 _9.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1876833 May 28 12:20 _8g.cfs
-rw-rw-r-- 1 shlomiv shlomiv 132634863 May 28 12:20 _a3.prx
-rw-rw-r-- 1 shlomiv shlomiv 192364 May 28 12:20 _4j.cfs
-rw-rw-r-- 1 shlomiv shlomiv 117388 May 28 12:20 _3n.tii
-rw-rw-r-- 1 shlomiv shlomiv 816708539 May 28 12:20 _a3.frq
-rw-rw-r-- 1 shlomiv shlomiv 11859580 May 28 12:20 _ab.cfs
-rw-rw-r-- 1 shlomiv shlomiv 29063534 May 28 12:20 _a3.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8406001 May 28 12:20 _3n.tis
-rw-rw-r-- 1 shlomiv shlomiv 32785413 May 28 12:20 _ag.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1025078829 May 28 12:20 _7u.frq
-rw-rw-r-- 1 shlomiv shlomiv 4268372 May 28 12:20 _7w.cfs
-rw-rw-r-- 1 shlomiv shlomiv 2018359 May 28 12:20 _9m.cfs
-rw-rw-r-- 1 shlomiv shlomiv 26399240 May 28 12:20 _ak.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6083264 May 28 12:20 _8r.cfs
-rw-rw-r-- 1 shlomiv shlomiv 25375450 May 28 12:20 _a1.cfs
-rw-rw-r-- 1 shlomiv shlomiv 8117536 May 28 12:20 _a3.tis
-rw-rw-r-- 1 shlomiv shlomiv 20583984 May 28 12:20 _a2.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6141 May 28 12:20 segments_37
-rw-rw-r-- 1 shlomiv shlomiv 1409274404 May 28 12:20 _3n.fdt
-rw-rw-r-- 1 shlomiv shlomiv 29904079 May 28 12:20 _ae.cfs
-rw-rw-r-- 1 shlomiv shlomiv 42820363 May 28 12:20 _5u.cfs
-rw-rw-r-- 1 shlomiv shlomiv 24829554 May 28 12:20 _aj.cfs
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _3n.fnm
-rw-rw-r-- 1 shlomiv shlomiv 6358589 May 28 12:20 _9x.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1149812673 May 28 12:20 _a3.fdt
-rw-rw-r-- 1 shlomiv shlomiv 33184533 May 28 12:20 _ah.cfs
-rw-rw-r-- 1 shlomiv shlomiv 134733 May 28 12:20 _7u.tii
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _7u.fnm
-rw-rw-r-- 1 shlomiv shlomiv 35973348 May 28 12:20 _3n.nrm
-rw-rw-r-- 1 shlomiv shlomiv 1427907462 May 28 12:20 _7u.fdt
-rw-rw-r-- 1 shlomiv shlomiv 970593 May 28 12:20 _5q.cfs
-rw-rw-r-- 1 shlomiv shlomiv 325456778 May 28 12:20 _ad.cfs
-rw-rw-r-- 1 shlomiv shlomiv 9615876 May 28 12:21 _7u.tis
-rw-rw-r-- 1 shlomiv shlomiv 4476572 May 28 12:21 _8l.cfs
-rw-rw-r-- 1 shlomiv shlomiv 36315548 May 28 12:21 _7u.nrm
-rw-rw-r-- 1 shlomiv shlomiv 232508244 May 28 12:21 _a3.fdx
-rw-rw-r-- 1 shlomiv shlomiv 35754834 May 28 12:21 _aa.cfs
-rw-rw-r-- 1 shlomiv shlomiv 113337 May 28 12:21 _a3.tii
-rw-rw-r-- 1 shlomiv shlomiv 287786756 May 28 12:21 _3n.fdx
-rw-rw-r-- 1 shlomiv shlomiv 164840022 May 28 12:21 _3n.prx
-rw-rw-r-- 1 shlomiv shlomiv 1202651 May 28 12:21 _7d.cfs
-rw-rw-r-- 1 shlomiv shlomiv 290524356 May 28 12:21 _7u.fdx
-rw-rw-r-- 1 shlomiv shlomiv 25101670 May 28 12:21 _a4.cfs
-rw-rw-r-- 1 shlomiv shlomiv 25086068 May 28 12:21 _al.cfs

Here are Luke's screenshots of the term counts for ES:

https://lh5.googleusercontent.com/-LkgOdK3YWSQ/UaR5KzUFHoI/AAAAAAAAAYo/BLYxDSkkUgQ/s1600/ES.png
And again for Java:

https://lh3.googleusercontent.com/-7-kMQ7DbZ7Y/UaR5qkTQwbI/AAAAAAAAAYw/oN5NIgXUGCs/s1600/lucene.png

One thing I can say for sure: I indexed exactly the same documents.


Sorry to bother you again, but if you don't optimize I can't really tell
which file uses more space and dig deeper into the debugging. You noted that
ES stores way more data, but I need to see where it goes. Can you optimize,
make sure no CFS is used, and show me the listing again? (It should have the
same number of files for both.)
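
Once both optimized, non-compound listings are available, the per-extension
byte totals of the two indices can be put side by side to see where the
extra data goes. A minimal Python sketch, assuming the totals have already
been collected into dicts mapping extension to bytes (the function names
are hypothetical); rows are sorted by absolute byte difference, biggest
first:

```python
def compare_extensions(es: dict, lucene: dict) -> list:
    """Return (extension, es_bytes, lucene_bytes, ratio) rows, biggest difference first.

    `ratio` is es_bytes / lucene_bytes, or None when the extension is
    absent from the Lucene index.
    """
    rows = []
    for ext in sorted(set(es) | set(lucene)):
        a = es.get(ext, 0)
        b = lucene.get(ext, 0)
        rows.append((ext, a, b, a / b if b else None))
    # Stable sort: ties keep alphabetical order from the loop above.
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows


def format_report(rows: list) -> str:
    lines = [f"{'ext':<10}{'ES bytes':>16}{'Lucene bytes':>16}{'ratio':>8}"]
    for ext, a, b, ratio in rows:
        shown = f"{ratio:.2f}" if ratio is not None else "n/a"
        lines.append(f"{ext:<10}{a:>16,}{b:>16,}{shown:>8}")
    return "\n".join(lines)
```

If one extension (for example .fdt, the stored fields) accounts for most of
the difference, that points at what ES stores in addition to the plain
Lucene index.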

simon

On Tuesday, May 28, 2013 11:35:56 AM UTC+2, Shlomi wrote:

hey,

Yes, I thought you might want to see the original file listing, so I ran
it again overnight.

From what you say, does it mean that ES's Lucene index will be about twice
the size of a plain Lucene index? Is it just something we have to live
with? (It's probably fine most of the time, but not for some use cases.)

The listing is quite large... Here is the one for ES (no optimizing at
all; I attempted a "clean index dir" but it didn't delete any files):

ls -ltra
total 16648036
drwxr-xr-x 5 elasticsearch elasticsearch 4096 May 27 17:34 ..
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 17:55 _vr.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 219332316 May 27 17:55 _vr.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1981355017 May 27 17:55 _vr.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 677757390 May 27 17:56 _vr.tis
-rw-r--r-- 1 elasticsearch elasticsearch 6519398 May 27 17:56 _vr.tii
-rw-r--r-- 1 elasticsearch elasticsearch 401141598 May 27 17:56 _vr.prx
-rw-r--r-- 1 elasticsearch elasticsearch 925178084 May 27 17:56 _vr.frq
-rw-r--r-- 1 elasticsearch elasticsearch 27416543 May 27 17:56 _vr.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:02 _16f.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 5065644 May 27 18:02 _16f.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 45741364 May 27 18:02 _16f.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 16803650 May 27 18:02 _16f.tis
-rw-r--r-- 1 elasticsearch elasticsearch 162668 May 27 18:02 _16f.tii
-rw-r--r-- 1 elasticsearch elasticsearch 9329742 May 27 18:02 _16f.prx
-rw-r--r-- 1 elasticsearch elasticsearch 20571312 May 27 18:02 _16f.frq
-rw-r--r-- 1 elasticsearch elasticsearch 633209 May 27 18:02 _16f.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:16 _1s0.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 240390660 May 27 18:16 _1s0.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 2178157235 May 27 18:16 _1s0.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 742546522 May 27 18:17 _1s0.tis
-rw-r--r-- 1 elasticsearch elasticsearch 7152131 May 27 18:17 _1s0.tii
-rw-r--r-- 1 elasticsearch elasticsearch 440466009 May 27 18:17 _1s0.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1017914310 May 27 18:17 _1s0.frq
-rw-r--r-- 1 elasticsearch elasticsearch 30048836 May 27 18:17 _1s0.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:18 _1ub.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 5279132 May 27 18:18 _1ub.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 47704644 May 27 18:18 _1ub.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 17641435 May 27 18:18 _1ub.tis
-rw-r--r-- 1 elasticsearch elasticsearch 171424 May 27 18:18 _1ub.tii
-rw-r--r-- 1 elasticsearch elasticsearch 9603799 May 27 18:18 _1ub.prx
-rw-r--r-- 1 elasticsearch elasticsearch 659895 May 27 18:18 _1ub.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 21415626 May 27 18:18 _1ub.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:38 _2oj.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2149916547 May 27 18:38 _2oj.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 238283772 May 27 18:38 _2oj.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:39 _2py.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 23177452 May 27 18:39 _2py.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 205782679 May 27 18:39 _2py.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 73967868 May 27 18:39 _2py.tis
-rw-r--r-- 1 elasticsearch elasticsearch 704213 May 27 18:39 _2py.tii
-rw-r--r-- 1 elasticsearch elasticsearch 41750825 May 27 18:39 _2py.prx
-rw-r--r-- 1 elasticsearch elasticsearch 95136302 May 27 18:39 _2py.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2897185 May 27 18:39 _2py.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 735613612 May 27 18:39 _2oj.tis
-rw-r--r-- 1 elasticsearch elasticsearch 7082393 May 27 18:39 _2oj.tii
-rw-r--r-- 1 elasticsearch elasticsearch 434339734 May 27 18:39 _2oj.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1005557319 May 27 18:39 _2oj.frq
-rw-r--r-- 1 elasticsearch elasticsearch 29785475 May 27 18:39 _2oj.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:40 _2rp.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 26444028 May 27 18:40 _2rp.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 239777276 May 27 18:40 _2rp.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 84715455 May 27 18:40 _2rp.tis
-rw-r--r-- 1 elasticsearch elasticsearch 808505 May 27 18:40 _2rp.tii
-rw-r--r-- 1 elasticsearch elasticsearch 48544290 May 27 18:40 _2rp.prx
-rw-r--r-- 1 elasticsearch elasticsearch 110352247 May 27 18:40 _2rp.frq
-rw-r--r-- 1 elasticsearch elasticsearch 3305507 May 27 18:40 _2rp.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:43 _2ve.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 23981876 May 27 18:43 _2ve.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 218838291 May 27 18:43 _2ve.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 76737689 May 27 18:43 _2ve.tis
-rw-r--r-- 1 elasticsearch elasticsearch 731850 May 27 18:43 _2ve.tii
-rw-r--r-- 1 elasticsearch elasticsearch 44207710 May 27 18:43 _2ve.prx
-rw-r--r-- 1 elasticsearch elasticsearch 100267293 May 27 18:43 _2ve.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2997738 May 27 18:43 _2ve.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:44 _2y8.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 25456740 May 27 18:44 _2y8.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 229765680 May 27 18:44 _2y8.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 80929752 May 27 18:45 _2y8.tis
-rw-r--r-- 1 elasticsearch elasticsearch 769492 May 27 18:45 _2y8.tii
-rw-r--r-- 1 elasticsearch elasticsearch 46988152 May 27 18:45 _2y8.prx
-rw-r--r-- 1 elasticsearch elasticsearch 105942428 May 27 18:45 _2y8.frq
-rw-r--r-- 1 elasticsearch elasticsearch 3182096 May 27 18:45 _2y8.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:46 _30h.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 37740247 May 27 18:46 _30h.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 4044764 May 27 18:46 _30h.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 13444775 May 27 18:46 _30h.tis
-rw-r--r-- 1 elasticsearch elasticsearch 130539 May 27 18:46 _30h.tii
-rw-r--r-- 1 elasticsearch elasticsearch 7427643 May 27 18:46 _30h.prx
-rw-r--r-- 1 elasticsearch elasticsearch 16383492 May 27 18:46 _30h.frq
-rw-r--r-- 1 elasticsearch elasticsearch 505599 May 27 18:46 _30h.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:47 _31f.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 21543756 May 27 18:47 _31f.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 197299138 May 27 18:47 _31f.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 68879876 May 27 18:47 _31f.tis
-rw-r--r-- 1 elasticsearch elasticsearch 656343 May 27 18:47 _31f.tii
-rw-r--r-- 1 elasticsearch elasticsearch 39593224 May 27 18:47 _31f.prx
-rw-r--r-- 1 elasticsearch elasticsearch 89855250 May 27 18:47 _31f.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2692973 May 27 18:47 _31f.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:47 _325.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3627348 May 27 18:47 _325.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 32383651 May 27 18:47 _325.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 12637394 May 27 18:47 _325.tis
-rw-r--r-- 1 elasticsearch elasticsearch 124679 May 27 18:47 _325.tii
-rw-r--r-- 1 elasticsearch elasticsearch 6240381 May 27 18:47 _325.prx
-rw-r--r-- 1 elasticsearch elasticsearch 14493173 May 27 18:47 _325.frq
-rw-r--r-- 1 elasticsearch elasticsearch 453422 May 27 18:47 _325.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _34x.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 22657484 May 27 18:49 _34x.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 203319124 May 27 18:49 _34x.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 72839221 May 27 18:49 _34x.tis
-rw-r--r-- 1 elasticsearch elasticsearch 695455 May 27 18:49 _34x.tii
-rw-r--r-- 1 elasticsearch elasticsearch 41510104 May 27 18:49 _34x.prx
-rw-r--r-- 1 elasticsearch elasticsearch 94734687 May 27 18:49 _34x.frq
-rw-r--r-- 1 elasticsearch elasticsearch 2832189 May 27 18:49 _34x.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _357.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3226756 May 27 18:49 _357.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 29482762 May 27 18:49 _357.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 10932081 May 27 18:49 _357.tis
-rw-r--r-- 1 elasticsearch elasticsearch 106617 May 27 18:49 _357.tii
-rw-r--r-- 1 elasticsearch elasticsearch 5931548 May 27 18:49 _357.prx
-rw-r--r-- 1 elasticsearch elasticsearch 403348 May 27 18:49 _357.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 13060705 May 27 18:49 _357.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _35h.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2190948 May 27 18:49 _35h.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 19853187 May 27 18:49 _35h.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 7499717 May 27 18:49 _35h.tis
-rw-r--r-- 1 elasticsearch elasticsearch 73038 May 27 18:49 _35h.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4004660 May 27 18:49 _35h.prx
-rw-r--r-- 1 elasticsearch elasticsearch 273872 May 27 18:49 _35h.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 8850349 May 27 18:49 _35h.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:49 _35r.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 639604 May 27 18:49 _35r.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 5879432 May 27 18:49 _35r.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 2309837 May 27 18:49 _35r.tis
-rw-r--r-- 1 elasticsearch elasticsearch 22840 May 27 18:49 _35r.tii
-rw-r--r-- 1 elasticsearch elasticsearch 1173359 May 27 18:49 _35r.prx
-rw-r--r-- 1 elasticsearch elasticsearch 79954 May 27 18:49 _35r.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 2564359 May 27 18:49 _35r.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _35y.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2354380 May 27 18:50 _35y.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 21611994 May 27 18:50 _35y.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8030981 May 27 18:50 _35y.tis
-rw-r--r-- 1 elasticsearch elasticsearch 78300 May 27 18:50 _35y.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4317653 May 27 18:50 _35y.prx
-rw-r--r-- 1 elasticsearch elasticsearch 294301 May 27 18:50 _35y.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 9554754 May 27 18:50 _35y.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _367.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2873548 May 27 18:50 _367.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 26261768 May 27 18:50 _367.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 9730410 May 27 18:50 _367.tis
-rw-r--r-- 1 elasticsearch elasticsearch 94696 May 27 18:50 _367.tii
-rw-r--r-- 1 elasticsearch elasticsearch 11684334 May 27 18:50 _367.frq
-rw-r--r-- 1 elasticsearch elasticsearch 5265938 May 27 18:50 _367.prx
-rw-r--r-- 1 elasticsearch elasticsearch 359197 May 27 18:50 _367.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 309897 May 27 18:50 _36e.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 33652 May 27 18:50 _36e.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 151593 May 27 18:50 _36e.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1642 May 27 18:50 _36e.tii
-rw-r--r-- 1 elasticsearch elasticsearch 61814 May 27 18:50 _36e.prx
-rw-r--r-- 1 elasticsearch elasticsearch 126563 May 27 18:50 _36e.frq
-rw-r--r-- 1 elasticsearch elasticsearch 4210 May 27 18:50 _36e.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36e.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 11788 May 27 18:50 _36f.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 107037 May 27 18:50 _36f.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 59456 May 27 18:50 _36f.tis
-rw-r--r-- 1 elasticsearch elasticsearch 666 May 27 18:50 _36f.tii
-rw-r--r-- 1 elasticsearch elasticsearch 21688 May 27 18:50 _36f.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1477 May 27 18:50 _36f.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 43858 May 27 18:50 _36f.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36f.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 1806 May 27 18:50 _36i.tis
-rw-r--r-- 1 elasticsearch elasticsearch 49 May 27 18:50 _36i.tii
-rw-r--r-- 1 elasticsearch elasticsearch 304 May 27 18:50 _36i.prx
-rw-r--r-- 1 elasticsearch elasticsearch 25 May 27 18:50 _36i.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 514 May 27 18:50 _36i.frq
-rw-r--r-- 1 elasticsearch elasticsearch 172 May 27 18:50 _36i.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1576 May 27 18:50 _36i.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36i.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36l.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2467780 May 27 18:50 _36l.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 22442774 May 27 18:50 _36l.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8411793 May 27 18:50 _36l.tis
-rw-r--r-- 1 elasticsearch elasticsearch 81892 May 27 18:50 _36l.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4520607 May 27 18:50 _36l.prx
-rw-r--r-- 1 elasticsearch elasticsearch 10027553 May 27 18:50 _36l.frq
-rw-r--r-- 1 elasticsearch elasticsearch 308476 May 27 18:50 _36l.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36v.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2480516 May 27 18:50 _36v.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 22822808 May 27 18:50 _36v.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8801263 May 27 18:50 _36v.tis
-rw-r--r-- 1 elasticsearch elasticsearch 87153 May 27 18:50 _36v.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4494453 May 27 18:50 _36v.prx
-rw-r--r-- 1 elasticsearch elasticsearch 10257485 May 27 18:50 _36v.frq
-rw-r--r-- 1 elasticsearch elasticsearch 310068 May 27 18:50 _36v.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 44132 May 27 18:50 _372.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 415201 May 27 18:50 _372.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 198527 May 27 18:50 _372.tis
-rw-r--r-- 1 elasticsearch elasticsearch 2133 May 27 18:50 _372.tii
-rw-r--r-- 1 elasticsearch elasticsearch 80597 May 27 18:50 _372.prx
-rw-r--r-- 1 elasticsearch elasticsearch 5520 May 27 18:50 _372.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 167499 May 27 18:50 _372.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _372.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _375.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3645828 May 27 18:50 _375.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 34062621 May 27 18:50 _375.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 106852 May 27 18:50 _374.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 990040 May 27 18:50 _374.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 453836 May 27 18:50 _374.tis
-rw-r--r-- 1 elasticsearch elasticsearch 4743 May 27 18:50 _374.tii
-rw-r--r-- 1 elasticsearch elasticsearch 196913 May 27 18:50 _374.prx
-rw-r--r-- 1 elasticsearch elasticsearch 420451 May 27 18:50 _374.frq
-rw-r--r-- 1 elasticsearch elasticsearch 13360 May 27 18:50 _374.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _374.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 202868 May 27 18:50 _376.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1898905 May 27 18:50 _376.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 828823 May 27 18:50 _376.tis
-rw-r--r-- 1 elasticsearch elasticsearch 8567 May 27 18:50 _376.tii
-rw-r--r-- 1 elasticsearch elasticsearch 372288 May 27 18:50 _376.prx
-rw-r--r-- 1 elasticsearch elasticsearch 816220 May 27 18:50 _376.frq
-rw-r--r-- 1 elasticsearch elasticsearch 25362 May 27 18:50 _376.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _376.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 20540 May 27 18:50 _377.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 189637 May 27 18:50 _377.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 103554 May 27 18:50 _377.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1156 May 27 18:50 _377.tii
-rw-r--r-- 1 elasticsearch elasticsearch 37645 May 27 18:50 _377.prx
-rw-r--r-- 1 elasticsearch elasticsearch 2571 May 27 18:50 _377.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 77707 May 27 18:50 _377.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _377.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 12573871 May 27 18:50 _375.tis
-rw-r--r-- 1 elasticsearch elasticsearch 123694 May 27 18:50 _375.tii
-rw-r--r-- 1 elasticsearch elasticsearch 6640864 May 27 18:50 _375.prx
-rw-r--r-- 1 elasticsearch elasticsearch 14999500 May 27 18:50 _375.frq
-rw-r--r-- 1 elasticsearch elasticsearch 455732 May 27 18:50 _375.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 380684 May 27 18:50 _378.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3608362 May 27 18:50 _378.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1471835 May 27 18:50 _378.tis
-rw-r--r-- 1 elasticsearch elasticsearch 14864 May 27 18:50 _378.tii
-rw-r--r-- 1 elasticsearch elasticsearch 696429 May 27 18:50 _378.prx
-rw-r--r-- 1 elasticsearch elasticsearch 47589 May 27 18:50 _378.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 1544148 May 27 18:50 _378.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _378.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 346908 May 27 18:50 _379.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3295714 May 27 18:50 _379.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1359640 May 27 18:50 _379.tis
-rw-r--r-- 1 elasticsearch elasticsearch 13724 May 27 18:50 _379.tii
-rw-r--r-- 1 elasticsearch elasticsearch 1411584 May 27 18:50 _379.frq
-rw-r--r-- 1 elasticsearch elasticsearch 633187 May 27 18:50 _379.prx
-rw-r--r-- 1 elasticsearch elasticsearch 43367 May 27 18:50 _379.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _379.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3671437 May 27 18:50 _37a.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 390892 May 27 18:50 _37a.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1506758 May 27 18:50 _37a.tis
-rw-r--r-- 1 elasticsearch elasticsearch 15167 May 27 18:50 _37a.tii
-rw-r--r-- 1 elasticsearch elasticsearch 713155 May 27 18:50 _37a.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1584719 May 27 18:50 _37a.frq
-rw-r--r-- 1 elasticsearch elasticsearch 48865 May 27 18:50 _37a.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _37a.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 33420 May 27 18:50 _37b.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 313623 May 27 18:50 _37b.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 160429 May 27 18:50 _37b.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1752 May 27 18:50 _37b.tii
-rw-r--r-- 1 elasticsearch elasticsearch 61122 May 27 18:50 _37b.prx
-rw-r--r-- 1 elasticsearch elasticsearch 4181 May 27 18:50 _37b.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 127012 May 27 18:50 _37b.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _37b.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 7863 May 27 18:51 segments_4m
-rw-r--r-- 1 elasticsearch elasticsearch 20 May 27 18:51 segments.gen
drwxr-xr-x 2 elasticsearch elasticsearch 20480 May 28 11:20 .

And here is the one for the plain Lucene (Java) index (no optimize, with clean):

ls -ltr
total 8778696
-rw-rw-r-- 1 shlomiv shlomiv 20 May 28 12:20 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _a3.fnm
-rw-rw-r-- 1 shlomiv shlomiv 16048551 May 28 12:20 _a0.cfs
-rw-rw-r-- 1 shlomiv shlomiv 350019 May 28 12:20 _7j.cfs
-rw-rw-r-- 1 shlomiv shlomiv 12482462 May 28 12:20 _af.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6799056 May 28 12:20 _91.cfs
-rw-rw-r-- 1 shlomiv shlomiv 7100132 May 28 12:20 _ai.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1010209161 May 28 12:20 _3n.frq
-rw-rw-r-- 1 shlomiv shlomiv 15936518 May 28 12:20 _a7.cfs
-rw-rw-r-- 1 shlomiv shlomiv 167517463 May 28 12:20 _7u.prx
-rw-rw-r-- 1 shlomiv shlomiv 1261800 May 28 12:20 _9.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1876833 May 28 12:20 _8g.cfs
-rw-rw-r-- 1 shlomiv shlomiv 132634863 May 28 12:20 _a3.prx
-rw-rw-r-- 1 shlomiv shlomiv 192364 May 28 12:20 _4j.cfs
-rw-rw-r-- 1 shlomiv shlomiv 117388 May 28 12:20 _3n.tii
-rw-rw-r-- 1 shlomiv shlomiv 816708539 May 28 12:20 _a3.frq
-rw-rw-r-- 1 shlomiv shlomiv 11859580 May 28 12:20 _ab.cfs
-rw-rw-r-- 1 shlomiv shlomiv 29063534 May 28 12:20 _a3.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8406001 May 28 12:20 _3n.tis
-rw-rw-r-- 1 shlomiv shlomiv 32785413 May 28 12:20 _ag.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1025078829 May 28 12:20 _7u.frq
-rw-rw-r-- 1 shlomiv shlomiv 4268372 May 28 12:20 _7w.cfs
-rw-rw-r-- 1 shlomiv shlomiv 2018359 May 28 12:20 _9m.cfs
-rw-rw-r-- 1 shlomiv shlomiv 26399240 May 28 12:20 _ak.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6083264 May 28 12:20 _8r.cfs
-rw-rw-r-- 1 shlomiv shlomiv 25375450 May 28 12:20 _a1.cfs
-rw-rw-r-- 1 shlomiv shlomiv 8117536 May 28 12:20 _a3.tis
-rw-rw-r-- 1 shlomiv shlomiv 20583984 May 28 12:20 _a2.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6141 May 28 12:20 segments_37
-rw-rw-r-- 1 shlomiv shlomiv 1409274404 May 28 12:20 _3n.fdt
-rw-rw-r-- 1 shlomiv shlomiv 29904079 May 28 12:20 _ae.cfs
-rw-rw-r-- 1 shlomiv shlomiv 42820363 May 28 12:20 _5u.cfs
-rw-rw-r-- 1 shlomiv shlomiv 24829554 May 28 12:20 _aj.cfs
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _3n.fnm
-rw-rw-r-- 1 shlomiv shlomiv 6358589 May 28 12:20 _9x.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1149812673 May 28 12:20 _a3.fdt
-rw-rw-r-- 1 shlomiv shlomiv 33184533 May 28 12:20 _ah.cfs
-rw-rw-r-- 1 shlomiv shlomiv 134733 May 28 12:20 _7u.tii
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _7u.fnm
-rw-rw-r-- 1 shlomiv shlomiv 35973348 May 28 12:20 _3n.nrm
-rw-rw-r-- 1 shlomiv shlomiv 1427907462 May 28 12:20 _7u.fdt
-rw-rw-r-- 1 shlomiv shlomiv 970593 May 28 12:20 _5q.cfs
-rw-rw-r-- 1 shlomiv shlomiv 325456778 May 28 12:20 _ad.cfs
-rw-rw-r-- 1 shlomiv shlomiv 9615876 May 28 12:21 _7u.tis
-rw-rw-r-- 1 shlomiv shlomiv 4476572 May 28 12:21 _8l.cfs
-rw-rw-r-- 1 shlomiv shlomiv 36315548 May 28 12:21 _7u.nrm
-rw-rw-r-- 1 shlomiv shlomiv 232508244 May 28 12:21 _a3.fdx
-rw-rw-r-- 1 shlomiv shlomiv 35754834 May 28 12:21 _aa.cfs
-rw-rw-r-- 1 shlomiv shlomiv 113337 May 28 12:21 _a3.tii
-rw-rw-r-- 1 shlomiv shlomiv 287786756 May 28 12:21 _3n.fdx
-rw-rw-r-- 1 shlomiv shlomiv 164840022 May 28 12:21 _3n.prx
-rw-rw-r-- 1 shlomiv shlomiv 1202651 May 28 12:21 _7d.cfs
-rw-rw-r-- 1 shlomiv shlomiv 290524356 May 28 12:21 _7u.fdx
-rw-rw-r-- 1 shlomiv shlomiv 25101670 May 28 12:21 _a4.cfs
-rw-rw-r-- 1 shlomiv shlomiv 25086068 May 28 12:21 _al.cfs

Here are Luke's screenshots of the term counts for ES:

https://lh5.googleusercontent.com/-LkgOdK3YWSQ/UaR5KzUFHoI/AAAAAAAAAYo/BLYxDSkkUgQ/s1600/ES.png

and again for Java:

https://lh3.googleusercontent.com/-7-kMQ7DbZ7Y/UaR5qkTQwbI/AAAAAAAAAYw/oN5NIgXUGCs/s1600/lucene.png

One thing I can say for sure: I indexed exactly the same documents.
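The two `total` lines of the listings give the headline ratio. Here is a quick check in Python. It assumes GNU coreutils `ls`, which reports `total` in 1 KiB allocated blocks by default, so the numbers are close to, but not exactly, the sum of the byte sizes. This compares an unoptimized multi-segment ES directory with a Lucene directory that still uses compound files. It confirms the overall gap but says nothing about where it comes from.

```python
KIB = 1024  # assumption: GNU ls counts "total" in 1 KiB blocks

es_total_blocks = 16648036      # "total" line of the ES listing
lucene_total_blocks = 8778696   # "total" line of the plain Lucene listing

ratio = es_total_blocks / lucene_total_blocks
print(f"ES / Lucene: {ratio:.2f}")                             # ES / Lucene: 1.90
print(f"ES:     {es_total_blocks * KIB / 2**30:.1f} GiB")      # ES:     15.9 GiB
print(f"Lucene: {lucene_total_blocks * KIB / 2**30:.1f} GiB")  # Lucene: 8.4 GiB
```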

If I hit optimize, I'll get back to a single file per index...

Can you think of any reason why Elasticsearch is nearly twice the size of the
original Lucene index?

Is it because it saves the _uid field?
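One way to test the `_uid` theory is to look at how a single, complete ES segment splits across file types. The sketch below uses the `_vr` rows from the ES listing. In the Lucene 3.x file format, `.fdt`/`.fdx` hold all stored fields (in ES that includes `_source` and, assuming the default mapping of that era, `_uid`), while `.prx` holds the positions that ES enables by default. File sizes alone can't separate `_uid` from `_source` or any other stored field; that needs a per-field comparison.

```python
# Byte sizes of segment _vr, copied from the ES listing (Lucene 3.x file format).
vr_segment = {
    ".fnm": 31,           # field infos
    ".fdx": 219332316,    # stored fields index
    ".fdt": 1981355017,   # stored fields data
    ".tis": 677757390,    # term dictionary
    ".tii": 6519398,      # term dictionary index
    ".frq": 925178084,    # doc ids + term frequencies
    ".prx": 401141598,    # term positions
    ".nrm": 27416543,     # norms
}

STORED = {".fdt", ".fdx"}  # every stored field, e.g. _source (and _uid, assumed stored)
POSITIONS = {".prx"}       # positions (proximity), enabled by default in ES


def share(sizes: dict[str, int], exts: set[str]) -> float:
    """Fraction of the segment's bytes that live in the given file types."""
    return sum(sizes.get(e, 0) for e in exts) / sum(sizes.values())


print(f"stored fields: {share(vr_segment, STORED):.1%}")     # stored fields: 51.9%
print(f"positions:     {share(vr_segment, POSITIONS):.1%}")  # positions:     9.5%
```

This only shows where the bytes sit on the ES side. To tell whether stored fields or positions explain the gap, you would need the same breakdown for an optimized, non-CFS Lucene index.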

Is there any hope of getting reasonable sizes from Elasticsearch?

Thanks..

-rw-r--r-- 1 elasticsearch elasticsearch 81892 May 27 18:50 _36l.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4520607 May 27 18:50 _36l.prx
-rw-r--r-- 1 elasticsearch elasticsearch 10027553 May 27 18:50 _36l.frq
-rw-r--r-- 1 elasticsearch elasticsearch 308476 May 27 18:50 _36l.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _36v.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2480516 May 27 18:50 _36v.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 22822808 May 27 18:50 _36v.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 8801263 May 27 18:50 _36v.tis
-rw-r--r-- 1 elasticsearch elasticsearch 87153 May 27 18:50 _36v.tii
-rw-r--r-- 1 elasticsearch elasticsearch 4494453 May 27 18:50 _36v.prx
-rw-r--r-- 1 elasticsearch elasticsearch 10257485 May 27 18:50 _36v.frq
-rw-r--r-- 1 elasticsearch elasticsearch 310068 May 27 18:50 _36v.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 44132 May 27 18:50 _372.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 415201 May 27 18:50 _372.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 198527 May 27 18:50 _372.tis
-rw-r--r-- 1 elasticsearch elasticsearch 2133 May 27 18:50 _372.tii
-rw-r--r-- 1 elasticsearch elasticsearch 80597 May 27 18:50 _372.prx
-rw-r--r-- 1 elasticsearch elasticsearch 5520 May 27 18:50 _372.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 167499 May 27 18:50 _372.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _372.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _375.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3645828 May 27 18:50 _375.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 34062621 May 27 18:50 _375.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 106852 May 27 18:50 _374.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 990040 May 27 18:50 _374.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 453836 May 27 18:50 _374.tis
-rw-r--r-- 1 elasticsearch elasticsearch 4743 May 27 18:50 _374.tii
-rw-r--r-- 1 elasticsearch elasticsearch 196913 May 27 18:50 _374.prx
-rw-r--r-- 1 elasticsearch elasticsearch 420451 May 27 18:50 _374.frq
-rw-r--r-- 1 elasticsearch elasticsearch 13360 May 27 18:50 _374.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _374.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 202868 May 27 18:50 _376.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1898905 May 27 18:50 _376.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 828823 May 27 18:50 _376.tis
-rw-r--r-- 1 elasticsearch elasticsearch 8567 May 27 18:50 _376.tii
-rw-r--r-- 1 elasticsearch elasticsearch 372288 May 27 18:50 _376.prx
-rw-r--r-- 1 elasticsearch elasticsearch 816220 May 27 18:50 _376.frq
-rw-r--r-- 1 elasticsearch elasticsearch 25362 May 27 18:50 _376.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _376.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 20540 May 27 18:50 _377.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 189637 May 27 18:50 _377.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 103554 May 27 18:50 _377.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1156 May 27 18:50 _377.tii
-rw-r--r-- 1 elasticsearch elasticsearch 37645 May 27 18:50 _377.prx
-rw-r--r-- 1 elasticsearch elasticsearch 2571 May 27 18:50 _377.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 77707 May 27 18:50 _377.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _377.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 12573871 May 27 18:50 _375.tis
-rw-r--r-- 1 elasticsearch elasticsearch 123694 May 27 18:50 _375.tii
-rw-r--r-- 1 elasticsearch elasticsearch 6640864 May 27 18:50 _375.prx
-rw-r--r-- 1 elasticsearch elasticsearch 14999500 May 27 18:50 _375.frq
-rw-r--r-- 1 elasticsearch elasticsearch 455732 May 27 18:50 _375.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 380684 May 27 18:50 _378.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3608362 May 27 18:50 _378.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1471835 May 27 18:50 _378.tis
-rw-r--r-- 1 elasticsearch elasticsearch 14864 May 27 18:50 _378.tii
-rw-r--r-- 1 elasticsearch elasticsearch 696429 May 27 18:50 _378.prx
-rw-r--r-- 1 elasticsearch elasticsearch 47589 May 27 18:50 _378.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 1544148 May 27 18:50 _378.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _378.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 346908 May 27 18:50 _379.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3295714 May 27 18:50 _379.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1359640 May 27 18:50 _379.tis
-rw-r--r-- 1 elasticsearch elasticsearch 13724 May 27 18:50 _379.tii
-rw-r--r-- 1 elasticsearch elasticsearch 1411584 May 27 18:50 _379.frq
-rw-r--r-- 1 elasticsearch elasticsearch 633187 May 27 18:50 _379.prx
-rw-r--r-- 1 elasticsearch elasticsearch 43367 May 27 18:50 _379.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _379.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 3671437 May 27 18:50 _37a.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 390892 May 27 18:50 _37a.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 1506758 May 27 18:50 _37a.tis
-rw-r--r-- 1 elasticsearch elasticsearch 15167 May 27 18:50 _37a.tii
-rw-r--r-- 1 elasticsearch elasticsearch 713155 May 27 18:50 _37a.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1584719 May 27 18:50 _37a.frq
-rw-r--r-- 1 elasticsearch elasticsearch 48865 May 27 18:50 _37a.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _37a.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 33420 May 27 18:50 _37b.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 313623 May 27 18:50 _37b.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 160429 May 27 18:50 _37b.tis
-rw-r--r-- 1 elasticsearch elasticsearch 1752 May 27 18:50 _37b.tii
-rw-r--r-- 1 elasticsearch elasticsearch 61122 May 27 18:50 _37b.prx
-rw-r--r-- 1 elasticsearch elasticsearch 4181 May 27 18:50 _37b.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 127012 May 27 18:50 _37b.frq
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:50 _37b.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 7863 May 27 18:51 segments_4m
-rw-r--r-- 1 elasticsearch elasticsearch 20 May 27 18:51 segments.gen
drwxr-xr-x 2 elasticsearch elasticsearch 20480 May 28 11:20 .

And here is the one for Java (no optimize, with clean):

ls -ltr
total 8778696
-rw-rw-r-- 1 shlomiv shlomiv 20 May 28 12:20 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _a3.fnm
-rw-rw-r-- 1 shlomiv shlomiv 16048551 May 28 12:20 _a0.cfs
-rw-rw-r-- 1 shlomiv shlomiv 350019 May 28 12:20 _7j.cfs
-rw-rw-r-- 1 shlomiv shlomiv 12482462 May 28 12:20 _af.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6799056 May 28 12:20 _91.cfs
-rw-rw-r-- 1 shlomiv shlomiv 7100132 May 28 12:20 _ai.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1010209161 May 28 12:20 _3n.frq
-rw-rw-r-- 1 shlomiv shlomiv 15936518 May 28 12:20 _a7.cfs
-rw-rw-r-- 1 shlomiv shlomiv 167517463 May 28 12:20 _7u.prx
-rw-rw-r-- 1 shlomiv shlomiv 1261800 May 28 12:20 _9.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1876833 May 28 12:20 _8g.cfs
-rw-rw-r-- 1 shlomiv shlomiv 132634863 May 28 12:20 _a3.prx
-rw-rw-r-- 1 shlomiv shlomiv 192364 May 28 12:20 _4j.cfs
-rw-rw-r-- 1 shlomiv shlomiv 117388 May 28 12:20 _3n.tii
-rw-rw-r-- 1 shlomiv shlomiv 816708539 May 28 12:20 _a3.frq
-rw-rw-r-- 1 shlomiv shlomiv 11859580 May 28 12:20 _ab.cfs
-rw-rw-r-- 1 shlomiv shlomiv 29063534 May 28 12:20 _a3.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8406001 May 28 12:20 _3n.tis
-rw-rw-r-- 1 shlomiv shlomiv 32785413 May 28 12:20 _ag.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1025078829 May 28 12:20 _7u.frq
-rw-rw-r-- 1 shlomiv shlomiv 4268372 May 28 12:20 _7w.cfs
-rw-rw-r-- 1 shlomiv shlomiv 2018359 May 28 12:20 _9m.cfs
-rw-rw-r-- 1 shlomiv shlomiv 26399240 May 28 12:20 _ak.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6083264 May 28 12:20 _8r.cfs
-rw-rw-r-- 1 shlomiv shlomiv 25375450 May 28 12:20 _a1.cfs
-rw-rw-r-- 1 shlomiv shlomiv 8117536 May 28 12:20 _a3.tis
-rw-rw-r-- 1 shlomiv shlomiv 20583984 May 28 12:20 _a2.cfs
-rw-rw-r-- 1 shlomiv shlomiv 6141 May 28 12:20 segments_37
-rw-rw-r-- 1 shlomiv shlomiv 1409274404 May 28 12:20 _3n.fdt
-rw-rw-r-- 1 shlomiv shlomiv 29904079 May 28 12:20 _ae.cfs
-rw-rw-r-- 1 shlomiv shlomiv 42820363 May 28 12:20 _5u.cfs
-rw-rw-r-- 1 shlomiv shlomiv 24829554 May 28 12:20 _aj.cfs
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _3n.fnm
-rw-rw-r-- 1 shlomiv shlomiv 6358589 May 28 12:20 _9x.cfs
-rw-rw-r-- 1 shlomiv shlomiv 1149812673 May 28 12:20 _a3.fdt
-rw-rw-r-- 1 shlomiv shlomiv 33184533 May 28 12:20 _ah.cfs
-rw-rw-r-- 1 shlomiv shlomiv 134733 May 28 12:20 _7u.tii
-rw-rw-r-- 1 shlomiv shlomiv 24 May 28 12:20 _7u.fnm
-rw-rw-r-- 1 shlomiv shlomiv 35973348 May 28 12:20 _3n.nrm
-rw-rw-r-- 1 shlomiv shlomiv 1427907462 May 28 12:20 _7u.fdt
-rw-rw-r-- 1 shlomiv shlomiv 970593 May 28 12:20 _5q.cfs
-rw-rw-r-- 1 shlomiv shlomiv 325456778 May 28 12:20 _ad.cfs
-rw-rw-r-- 1 shlomiv shlomiv 9615876 May 28 12:21 _7u.tis
-rw-rw-r-- 1 shlomiv shlomiv 4476572 May 28 12:21 _8l.cfs
-rw-rw-r-- 1 shlomiv shlomiv 36315548 May 28 12:21 _7u.nrm
-rw-rw-r-- 1 shlomiv shlomiv 232508244 May 28 12:21 _a3.fdx
-rw-rw-r-- 1 shlomiv shlomiv 35754834 May 28 12:21 _aa.cfs
-rw-rw-r-- 1 shlomiv shlomiv 113337 May 28 12:21 _a3.tii
-rw-rw-r-- 1 shlomiv shlomiv 287786756 May 28 12:21 _3n.fdx
-rw-rw-r-- 1 shlomiv shlomiv 164840022 May 28 12:21 _3n.prx
-rw-rw-r-- 1 shlomiv shlomiv 1202651 May 28 12:21 _7d.cfs
-rw-rw-r-- 1 shlomiv shlomiv 290524356 May 28 12:21 _7u.fdx
-rw-rw-r-- 1 shlomiv shlomiv 25101670 May 28 12:21 _a4.cfs
-rw-rw-r-- 1 shlomiv shlomiv 25086068 May 28 12:21 _al.cfs

Here are Luke's screenshots of the term counts for ES:

https://lh5.googleusercontent.com/-LkgOdK3YWSQ/UaR5KzUFHoI/AAAAAAAAAYo/BLYxDSkkUgQ/s1600/ES.png

And again for Java:

https://lh3.googleusercontent.com/-7-kMQ7DbZ7Y/UaR5qkTQwbI/AAAAAAAAAYw/oN5NIgXUGCs/s1600/lucene.png

One thing I can say for sure: I indexed exactly the same documents.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

No, it's not the _uids. You just have different configurations.

In your file listings, ES is in Lucene's native non-CFS format and the
Lucene variant is in CFS (compound file) format. That does not help much:
you cannot compare the size of a non-CFS index with a CFS index.

If you really prefer the CFS index format for some reason, just set
index.compound_format to true in the ES config, or, better, disable CFS
in your Lucene variant. I repeat: CFS is not good for performance; you
will have higher search and indexing times.

Jörg
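To act on Jörg's point, and on Simon's earlier request to report the individual file differences, it can help to total each listing by file extension. The following is a minimal sketch of a hypothetical helper (it is not part of Elasticsearch or Lucene). It assumes plain ls -l lines like the ones above, with the size in the fifth column and the file name last, and sums sizes per extension, so compound .cfs segments show up as one opaque bucket while non-CFS segments break down into .fdt, .frq, .prx and so on.

```java
import java.util.Map;
import java.util.TreeMap;

final class ListingSizes {
    // Hypothetical helper: sums file sizes per extension from `ls -l` output.
    // Assumed column layout: perms, links, owner, group, size, month, day, time, name.
    static Map<String, Long> sizesByExtension(String listing) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : listing.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            // Skip "total N", directory entries and lines wrapped before the name.
            if (cols.length < 9 || !cols[0].startsWith("-")) {
                continue;
            }
            long size = Long.parseLong(cols[4]);
            String name = cols[cols.length - 1];
            int dot = name.lastIndexOf('.');
            // segments_N has no extension, so bucket it as "segments".
            String ext = dot >= 0 ? name.substring(dot + 1) : name.replaceAll("_.*", "");
            totals.merge(ext, size, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // A few lines copied from the plain Lucene listing above.
        String sample = String.join("\n",
                "-rw-rw-r-- 1 shlomiv shlomiv 16048551 May 28 12:20 _a0.cfs",
                "-rw-rw-r-- 1 shlomiv shlomiv 350019 May 28 12:20 _7j.cfs",
                "-rw-rw-r-- 1 shlomiv shlomiv 1010209161 May 28 12:20 _3n.frq",
                "-rw-rw-r-- 1 shlomiv shlomiv 1409274404 May 28 12:20 _3n.fdt");
        sizesByExtension(sample).forEach((ext, bytes) -> System.out.println(ext + "\t" + bytes));
    }
}
```

Running it over the full ES and Lucene listings side by side would show which file types account for the difference, rather than only the directory totals.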

On 29.05.13 16:01, Shlomi wrote:

If I hit optimize, I'll get back to a single file per index.

Can you think of any reason why Elastic is nearly twice the size of
the original Lucene index?

Is it because it saves _uids?

Is there any hope of getting reasonable sizes from Elastic?


Thanks Jörg,

OK, I disabled CFS in Lucene and ran it again. It didn't seem to make much
difference:

ls -ltra

total 8777268
drwxrwxrwt 30 root root 20480 May 29 18:30 ..
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:39 _34.fnm
-rw-rw-r-- 1 shlomiv shlomiv 263247124 May 29 18:39 _34.fdx
-rw-rw-r-- 1 shlomiv shlomiv 1295797937 May 29 18:39 _34.fdt
-rw-rw-r-- 1 shlomiv shlomiv 8154272 May 29 18:40 _34.tis
-rw-rw-r-- 1 shlomiv shlomiv 114159 May 29 18:40 _34.tii
-rw-rw-r-- 1 shlomiv shlomiv 151801202 May 29 18:40 _34.prx
-rw-rw-r-- 1 shlomiv shlomiv 925753058 May 29 18:40 _34.frq
-rw-rw-r-- 1 shlomiv shlomiv 32905894 May 29 18:40 _34.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:47 _66.fnm
-rw-rw-r-- 1 shlomiv shlomiv 244284132 May 29 18:47 _66.fdx
-rw-rw-r-- 1 shlomiv shlomiv 1200444042 May 29 18:47 _66.fdt
-rw-rw-r-- 1 shlomiv shlomiv 9156383 May 29 18:48 _66.tis
-rw-rw-r-- 1 shlomiv shlomiv 127850 May 29 18:48 _66.tii
-rw-rw-r-- 1 shlomiv shlomiv 140138826 May 29 18:48 _66.prx
-rw-rw-r-- 1 shlomiv shlomiv 862107109 May 29 18:48 _66.frq
-rw-rw-r-- 1 shlomiv shlomiv 30535520 May 29 18:48 _66.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:56 _99.fnm
-rw-rw-r-- 1 shlomiv shlomiv 254267076 May 29 18:56 _99.fdx
-rw-rw-r-- 1 shlomiv shlomiv 1245411536 May 29 18:56 _99.fdt
-rw-rw-r-- 1 shlomiv shlomiv 8429835 May 29 18:57 _99.tis
-rw-rw-r-- 1 shlomiv shlomiv 117255 May 29 18:57 _99.tii
-rw-rw-r-- 1 shlomiv shlomiv 144073653 May 29 18:57 _99.prx
-rw-rw-r-- 1 shlomiv shlomiv 891520868 May 29 18:57 _99.frq
-rw-rw-r-- 1 shlomiv shlomiv 31783388 May 29 18:57 _99.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:57 _9j.fnm
-rw-rw-r-- 1 shlomiv shlomiv 27082292 May 29 18:57 _9j.fdx
-rw-rw-r-- 1 shlomiv shlomiv 135824217 May 29 18:57 _9j.fdt
-rw-rw-r-- 1 shlomiv shlomiv 2030268 May 29 18:57 _9j.tis
-rw-rw-r-- 1 shlomiv shlomiv 28317 May 29 18:57 _9j.tii
-rw-rw-r-- 1 shlomiv shlomiv 15952977 May 29 18:57 _9j.prx
-rw-rw-r-- 1 shlomiv shlomiv 94820220 May 29 18:57 _9j.frq
-rw-rw-r-- 1 shlomiv shlomiv 3385290 May 29 18:57 _9j.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:58 _9u.fnm
-rw-rw-r-- 1 shlomiv shlomiv 27505988 May 29 18:58 _9u.fdx
-rw-rw-r-- 1 shlomiv shlomiv 136384808 May 29 18:58 _9u.fdt
-rw-rw-r-- 1 shlomiv shlomiv 1674541 May 29 18:58 _9u.tis
-rw-rw-r-- 1 shlomiv shlomiv 23094 May 29 18:58 _9u.tii
-rw-rw-r-- 1 shlomiv shlomiv 16176340 May 29 18:58 _9u.prx
-rw-rw-r-- 1 shlomiv shlomiv 96098118 May 29 18:58 _9u.frq
-rw-rw-r-- 1 shlomiv shlomiv 3438252 May 29 18:58 _9u.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:59 _a5.fnm
-rw-rw-r-- 1 shlomiv shlomiv 24784676 May 29 18:59 _a5.fdx
-rw-rw-r-- 1 shlomiv shlomiv 120749640 May 29 18:59 _a5.fdt
-rw-rw-r-- 1 shlomiv shlomiv 2457010 May 29 18:59 _a5.tis
-rw-rw-r-- 1 shlomiv shlomiv 33481 May 29 18:59 _a5.tii
-rw-rw-r-- 1 shlomiv shlomiv 13850257 May 29 18:59 _a5.prx
-rw-rw-r-- 1 shlomiv shlomiv 86453688 May 29 18:59 _a5.frq
-rw-rw-r-- 1 shlomiv shlomiv 3098088 May 29 18:59 _a5.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 18:59 _ag.fnm
-rw-rw-r-- 1 shlomiv shlomiv 26705412 May 29 18:59 _ag.fdx
-rw-rw-r-- 1 shlomiv shlomiv 131405660 May 29 18:59 _ag.fdt
-rw-rw-r-- 1 shlomiv shlomiv 1204948 May 29 19:00 _ah.fdx
-rw-rw-r-- 1 shlomiv shlomiv 5911795 May 29 19:00 _ah.fdt
-rw-rw-r-- 1 shlomiv shlomiv 330832 May 29 19:00 _ah.tis
-rw-rw-r-- 1 shlomiv shlomiv 4639 May 29 19:00 _ah.tii
-rw-rw-r-- 1 shlomiv shlomiv 696420 May 29 19:00 _ah.prx
-rw-rw-r-- 1 shlomiv shlomiv 150622 May 29 19:00 _ah.nrm
-rw-rw-r-- 1 shlomiv shlomiv 4183072 May 29 19:00 _ah.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _ah.fnm
-rw-rw-r-- 1 shlomiv shlomiv 2569983 May 29 19:00 _ag.tis
-rw-rw-r-- 1 shlomiv shlomiv 35436 May 29 19:00 _ag.tii
-rw-rw-r-- 1 shlomiv shlomiv 15571391 May 29 19:00 _ag.prx
-rw-rw-r-- 1 shlomiv shlomiv 94078982 May 29 19:00 _ag.frq
-rw-rw-r-- 1 shlomiv shlomiv 3338180 May 29 19:00 _ag.nrm
-rw-rw-r-- 1 shlomiv shlomiv 3123508 May 29 19:00 _ai.fdx
-rw-rw-r-- 1 shlomiv shlomiv 15925878 May 29 19:00 _ai.fdt
-rw-rw-r-- 1 shlomiv shlomiv 614128 May 29 19:00 _ai.tis
-rw-rw-r-- 1 shlomiv shlomiv 8534 May 29 19:00 _ai.tii
-rw-rw-r-- 1 shlomiv shlomiv 1811166 May 29 19:00 _ai.prx
-rw-rw-r-- 1 shlomiv shlomiv 390442 May 29 19:00 _ai.nrm
-rw-rw-r-- 1 shlomiv shlomiv 10911623 May 29 19:00 _ai.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _ai.fnm
-rw-rw-r-- 1 shlomiv shlomiv 3217708 May 29 19:00 _aj.fdx
-rw-rw-r-- 1 shlomiv shlomiv 15914211 May 29 19:00 _aj.fdt
-rw-rw-r-- 1 shlomiv shlomiv 577853 May 29 19:00 _aj.tis
-rw-rw-r-- 1 shlomiv shlomiv 7902 May 29 19:00 _aj.tii
-rw-rw-r-- 1 shlomiv shlomiv 1874837 May 29 19:00 _aj.prx
-rw-rw-r-- 1 shlomiv shlomiv 11189671 May 29 19:00 _aj.frq
-rw-rw-r-- 1 shlomiv shlomiv 402217 May 29 19:00 _aj.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _aj.fnm
-rw-rw-r-- 1 shlomiv shlomiv 671412 May 29 19:00 _ak.fdx
-rw-rw-r-- 1 shlomiv shlomiv 3394019 May 29 19:00 _ak.fdt
-rw-rw-r-- 1 shlomiv shlomiv 232124 May 29 19:00 _ak.tis
-rw-rw-r-- 1 shlomiv shlomiv 3258 May 29 19:00 _ak.tii
-rw-rw-r-- 1 shlomiv shlomiv 390852 May 29 19:00 _ak.prx
-rw-rw-r-- 1 shlomiv shlomiv 83930 May 29 19:00 _ak.nrm
-rw-rw-r-- 1 shlomiv shlomiv 2324403 May 29 19:00 _ak.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _ak.fnm
-rw-rw-r-- 1 shlomiv shlomiv 2320804 May 29 19:00 _al.fdx
-rw-rw-r-- 1 shlomiv shlomiv 11751843 May 29 19:00 _al.fdt
-rw-rw-r-- 1 shlomiv shlomiv 919429 May 29 19:00 _al.tis
-rw-rw-r-- 1 shlomiv shlomiv 12504 May 29 19:00 _al.tii
-rw-rw-r-- 1 shlomiv shlomiv 1279214 May 29 19:00 _al.prx
-rw-rw-r-- 1 shlomiv shlomiv 8255522 May 29 19:00 _al.frq
-rw-rw-r-- 1 shlomiv shlomiv 290104 May 29 19:00 _al.nrm
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _al.fnm
-rw-rw-r-- 1 shlomiv shlomiv 2432020 May 29 19:00 _am.fdx
-rw-rw-r-- 1 shlomiv shlomiv 12765052 May 29 19:00 _am.fdt
-rw-rw-r-- 1 shlomiv shlomiv 885852 May 29 19:00 _am.tis
-rw-rw-r-- 1 shlomiv shlomiv 12305 May 29 19:00 _am.tii
-rw-rw-r-- 1 shlomiv shlomiv 1360925 May 29 19:00 _am.prx
-rw-rw-r-- 1 shlomiv shlomiv 304006 May 29 19:00 _am.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8638946 May 29 19:00 _am.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _am.fnm
-rw-rw-r-- 1 shlomiv shlomiv 2304708 May 29 19:00 _an.fdx
-rw-rw-r-- 1 shlomiv shlomiv 12215320 May 29 19:00 _an.fdt
-rw-rw-r-- 1 shlomiv shlomiv 740581 May 29 19:00 _an.tis
-rw-rw-r-- 1 shlomiv shlomiv 10122 May 29 19:00 _an.tii
-rw-rw-r-- 1 shlomiv shlomiv 1325492 May 29 19:00 _an.prx
-rw-rw-r-- 1 shlomiv shlomiv 288092 May 29 19:00 _an.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8201619 May 29 19:00 _an.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _an.fnm
-rw-rw-r-- 1 shlomiv shlomiv 20 May 29 19:00 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 2888 May 29 19:00 segments_37
drwxrwxr-x 2 shlomiv shlomiv 4096 May 29 19:04 .

What do you think I should try next?
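Once per-extension totals exist for both indexes, the "nearly twice the size" question can be narrowed down to specific file types. For instance, the .prx ratio (positions, in this pre-4.0 file format) would show whether position data explains the gap, which is the point Simon raised about ES enabling positions by default. A minimal sketch, assuming the totals have already been collected for each index; the class and method names are hypothetical, and the numbers in main are made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SizeRatio {
    // Hypothetical helper: for every extension seen in either index, returns
    // ES bytes / Lucene bytes. NaN marks extensions the Lucene index lacks.
    static Map<String, Double> ratios(Map<String, Long> es, Map<String, Long> lucene) {
        Set<String> keys = new TreeSet<>(es.keySet());
        keys.addAll(lucene.keySet());
        Map<String, Double> out = new TreeMap<>();
        for (String ext : keys) {
            long esBytes = es.getOrDefault(ext, 0L);
            long luceneBytes = lucene.getOrDefault(ext, 0L);
            out.put(ext, luceneBytes == 0 ? Double.NaN : (double) esBytes / luceneBytes);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up totals, for illustration only.
        Map<String, Long> es = Map.of("fdt", 300L, "frq", 200L, "prx", 50L);
        Map<String, Long> lucene = Map.of("fdt", 300L, "frq", 100L);
        ratios(es, lucene).forEach((ext, r) -> System.out.printf("%s\t%.2f%n", ext, r));
    }
}
```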


I think Simon asked for a completely optimized listing of each index (max
segments = 1).
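A quick way to confirm that a listing really comes from a fully optimized index is to count the distinct segment names in it. This is a minimal sketch of a hypothetical helper, assuming the pre-4.0 Lucene file naming seen in the listings above (_34.fdt, _a0.cfs, segments_37, segments.gen). The handling of generation-stamped files such as _3n_1.del is an assumption about that naming scheme.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SegmentCounter {
    // Hypothetical helper: collects distinct segment names (the "_xx" prefix)
    // from pre-4.0 Lucene file names such as "_34.fdt" or "_a0.cfs".
    // segments_N and segments.gen describe the commit point, not a segment,
    // so anything not starting with "_" is ignored.
    static Set<String> segmentNames(Iterable<String> fileNames) {
        Set<String> segments = new TreeSet<>();
        for (String name : fileNames) {
            if (!name.startsWith("_")) {
                continue;
            }
            int dot = name.indexOf('.');
            String base = dot > 0 ? name.substring(0, dot) : name;
            // Assumed: generation-stamped files like "_3n_1.del" belong to segment "_3n".
            int gen = base.indexOf('_', 1);
            segments.add(gen > 0 ? base.substring(0, gen) : base);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "_34.fdt", "_34.frq", "_66.fdt", "_99.prx", "segments_37", "segments.gen");
        Set<String> segs = segmentNames(files);
        // A fully optimized index (max segments = 1) should report exactly one name.
        System.out.println(segs.size() + " segment(s): " + segs);
    }
}
```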


Yeah, I actually noticed that and optimized, but it then turned the index
into CFS format, so I am rerunning it now. I'll have the new listing tomorrow.

Although, my code already calls writer.optimize(), so I am not sure the
results will be any different.

On Wed, May 29, 2013 at 7:16 PM, Matt Weber matt.weber@gmail.com wrote:

I think Simon asked for a completely optimized listing of each index (max
segments = 1).

On Wed, May 29, 2013 at 9:12 AM, Shlomi Vaknin shlomivaknin@gmail.com wrote:

-rw-rw-r-- 1 shlomiv shlomiv 885852 May 29 19:00 _am.tis
-rw-rw-r-- 1 shlomiv shlomiv 12305 May 29 19:00 _am.tii
-rw-rw-r-- 1 shlomiv shlomiv 1360925 May 29 19:00 _am.prx
-rw-rw-r-- 1 shlomiv shlomiv 304006 May 29 19:00 _am.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8638946 May 29 19:00 _am.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _am.fnm
-rw-rw-r-- 1 shlomiv shlomiv 2304708 May 29 19:00 _an.fdx
-rw-rw-r-- 1 shlomiv shlomiv 12215320 May 29 19:00 _an.fdt
-rw-rw-r-- 1 shlomiv shlomiv 740581 May 29 19:00 _an.tis
-rw-rw-r-- 1 shlomiv shlomiv 10122 May 29 19:00 _an.tii
-rw-rw-r-- 1 shlomiv shlomiv 1325492 May 29 19:00 _an.prx
-rw-rw-r-- 1 shlomiv shlomiv 288092 May 29 19:00 _an.nrm
-rw-rw-r-- 1 shlomiv shlomiv 8201619 May 29 19:00 _an.frq
-rw-rw-r-- 1 shlomiv shlomiv 24 May 29 19:00 _an.fnm
-rw-rw-r-- 1 shlomiv shlomiv 20 May 29 19:00 segments.gen
-rw-rw-r-- 1 shlomiv shlomiv 2888 May 29 19:00 segments_37
drwxrwxr-x 2 shlomiv shlomiv 4096 May 29 19:04 .

What do you think I should try next?

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.


In Lucene, use forceMerge(1) (or optimize(1) on older versions), and in ES use
/_optimize?max_num_segments=1.

Thanks,
Matt Weber
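Both indices need to be merged down to a single segment before the comparison means much. A quick way to confirm that a merge actually took effect is to group the files in the index directory by segment name. The following is a minimal Python sketch, assuming Lucene's `_<segment>[_<suffix>].<ext>` file naming (as in the `_9j.tis` and `_1s0.fdt` files in this thread); the helper names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming: "_<segment>[_<suffix>].<ext>", e.g. "_9j.tis" or "_1s0_1.del".
# segments.gen and segments_N are commit metadata, not segment data, so they
# do not match this pattern.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)(?:_[^.]+)?\.[^.]+$")


def segments_in(filenames):
    """Group index file names by their segment name (e.g. '_9j')."""
    groups = defaultdict(list)
    for name in filenames:
        match = SEGMENT_FILE.match(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


def is_single_segment(filenames):
    """True when every segment data file belongs to the same segment."""
    return len(segments_in(filenames)) == 1


def count_segments(index_dir):
    """Count the segments in an on-disk index directory."""
    return len(segments_in(os.listdir(index_dir)))
```

After a successful `forceMerge(1)` or `_optimize?max_num_segments=1`, `count_segments` on the index directory should return 1; a multi-segment directory like the Lucene one listed here returns more.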


Just because this saga is annoying me (no offense), I ran a Perl one-liner
to sum up the file sizes. Here are the total sizes by extension:

ES (bytes):
fdt: 7,964,999,668
fdx: 880,736,240
fnm: 992
frq: 3,705,120,837
gen: 20
nrm: 110,092,142
prx: 1,610,367,435
tii: 26,441,783
tis: 2,749,100,849

Lucene (bytes):
fdt: 4,343,895,958
fdx: 883,151,808
fnm: 336
frq: 3,104,536,899
gen: 20
nrm: 110,394,025
prx: 506,303,552
tii: 538,856
tis: 38,773,091

I suspect the sizes after optimize will be pretty much the same.
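The Perl one-liner itself isn't shown in the thread. As a hypothetical Python equivalent, the sketch below sums sizes per extension (from `ls -l` output or directly from a directory) and lines up the two sets of totals above, largest difference first. It skips files without an extension such as `segments_N`, which matches the totals above (they include `gen` but not `segments_37`). The comments give the standard Lucene 3.x file roles; the function names are illustrative only.

```python
import os
from collections import Counter

# Lucene 3.x file roles, for reading the comparison:
#   .fdt/.fdx stored fields, .tis/.tii term dictionary,
#   .frq doc ids + term frequencies, .prx positions, .nrm norms, .fnm field infos.


def sizes_from_ls(lines):
    """Sum sizes per extension from `ls -l` lines (size is the 5th column)."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip the "total" header, directories and malformed lines.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        size, name = int(fields[4]), fields[-1]
        if "." in name:  # files without an extension (segments_N) are skipped
            totals[name.rsplit(".", 1)[1]] += size
    return totals


def sizes_from_dir(path):
    """Same totals, read directly from a directory on disk."""
    totals = Counter()
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file() and "." in entry.name:
                totals[entry.name.rsplit(".", 1)[1]] += entry.stat().st_size
    return totals


def compare(es, lucene):
    """Rows of (ext, es_bytes, lucene_bytes, difference), largest difference first."""
    rows = [(ext, es.get(ext, 0), lucene.get(ext, 0), es.get(ext, 0) - lucene.get(ext, 0))
            for ext in set(es) | set(lucene)]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Per-extension totals as posted above.
ES = {"fdt": 7964999668, "fdx": 880736240, "fnm": 992, "frq": 3705120837, "gen": 20,
      "nrm": 110092142, "prx": 1610367435, "tii": 26441783, "tis": 2749100849}
LUCENE = {"fdt": 4343895958, "fdx": 883151808, "fnm": 336, "frq": 3104536899, "gen": 20,
          "nrm": 110394025, "prx": 506303552, "tii": 538856, "tis": 38773091}

for ext, es_bytes, lucene_bytes, diff in compare(ES, LUCENE):
    print(f"{ext}  {es_bytes:>15,}  {lucene_bytes:>15,}  {diff:>+16,}")
print(f"total  {sum(ES.values()):,}  vs  {sum(LUCENE.values()):,}")
```

On the posted numbers, the ES total (17,046,859,966 bytes) is roughly 1.9 times the Lucene total (8,987,594,545 bytes), and `.fdt`, `.tis` and `.prx` together account for most of the gap.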


Just because this saga is annoying me (no offense)

That came out harsher than intended, sorry...


OK guys, I will spend some time writing up a test for this and explain the
difference based on it. This is all pointless :wink:

simon


Israel, sorry for any inconvenience my thread has caused you.

Now, back to the really annoying results:

ES version:

curl -XPOST 'http://host:9200/test/_optimize?max_num_segments=1'
{"ok":true,"_shards":{"total":0,"successful":0,"failed":0}}
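
The optimize response above reports "total":0 shards, so on its own it does not confirm that any shard was actually merged. A script that drives this call could check the counters before trusting the directory listing. Here is a minimal Python sketch, assuming the response body has the _shards shape shown above. The helper name optimize_covered_shards is hypothetical and is not part of any Elasticsearch client.

```python
import json


def optimize_covered_shards(response_text):
    """Return True only if the optimize response reports at least one shard,
    every reported shard succeeded, and none failed.

    Hypothetical helper: it only inspects the "_shards" counters in the JSON
    body and does not talk to Elasticsearch itself.
    """
    body = json.loads(response_text)
    shards = body.get("_shards", {})
    total = shards.get("total", 0)
    successful = shards.get("successful", 0)
    failed = shards.get("failed", 0)
    return total > 0 and failed == 0 and successful == total


# The response quoted in the thread.
resp = '{"ok":true,"_shards":{"total":0,"successful":0,"failed":0}}'
print(optimize_covered_shards(resp))  # False: no shards were reported
```

Treating a zero total as "not verified" is a deliberately conservative choice. Confirm what a zero total means for the ES version in use.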

ls -ltra
total 16623176
drwxr-xr-x 5 elasticsearch elasticsearch 4096 May 27 17:34 ..
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:16 _1s0.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 240390660 May 27 18:16 _1s0.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 2178157235 May 27 18:16 _1s0.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 742546522 May 27 18:17 _1s0.tis
-rw-r--r-- 1 elasticsearch elasticsearch 7152131 May 27 18:17 _1s0.tii
-rw-r--r-- 1 elasticsearch elasticsearch 440466009 May 27 18:17 _1s0.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1017914310 May 27 18:17 _1s0.frq
-rw-r--r-- 1 elasticsearch elasticsearch 30048836 May 27 18:17 _1s0.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 27 18:38 _2oj.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 2149916547 May 27 18:38 _2oj.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 238283772 May 27 18:38 _2oj.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 735613612 May 27 18:39 _2oj.tis
-rw-r--r-- 1 elasticsearch elasticsearch 7082393 May 27 18:39 _2oj.tii
-rw-r--r-- 1 elasticsearch elasticsearch 434339734 May 27 18:39 _2oj.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1005557319 May 27 18:39 _2oj.frq
-rw-r--r-- 1 elasticsearch elasticsearch 29785475 May 27 18:39 _2oj.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 0 May 30 11:49 write.lock
-rw-r--r-- 1 elasticsearch elasticsearch 31 May 30 11:50 _37c.fnm
-rw-r--r-- 1 elasticsearch elasticsearch 402061692 May 30 11:50 _37c.fdx
-rw-r--r-- 1 elasticsearch elasticsearch 3636925770 May 30 11:50 _37c.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 1229530031 May 30 11:52 _37c.tis
-rw-r--r-- 1 elasticsearch elasticsearch 11770457 May 30 11:52 _37c.tii
-rw-r--r-- 1 elasticsearch elasticsearch 735561692 May 30 11:52 _37c.prx
-rw-r--r-- 1 elasticsearch elasticsearch 1698617265 May 30 11:52 _37c.frq
-rw-r--r-- 1 elasticsearch elasticsearch 50257715 May 30 11:53 _37c.nrm
-rw-r--r-- 1 elasticsearch elasticsearch 828 May 30 11:53 segments_4n
-rw-r--r-- 1 elasticsearch elasticsearch 20 May 30 11:53 segments.gen
-rw-r--r-- 1 elasticsearch elasticsearch 138 May 30 11:53 _checksums-1369903982814
drwxr-xr-x 2 elasticsearch elasticsearch 20480 May 30 11:53 .
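
Whether both sides are single-segment indices matters for this comparison, and the ES listing above still contains files for three segment prefixes (_1s0, _2oj and _37c). You can see how the bytes split across segments by grouping the ls -l output by prefix. Below is a minimal Python sketch, assuming standard nine-column ls -l lines with file names that contain no spaces. The helper name bytes_per_segment is hypothetical. The listing alone cannot show whether the older segments are still referenced by segments_4n.

```python
from collections import defaultdict


def bytes_per_segment(listing):
    """Sum file sizes from `ls -l` output, grouped by Lucene segment prefix.

    The prefix is the part of the file name before the first dot (e.g. "_37c").
    Files such as segments_N, segments.gen, write.lock and _checksums-* are
    skipped because they do not belong to a single segment.
    """
    totals = defaultdict(int)
    for line in listing.splitlines():
        fields = line.split()
        # Keep only regular-file lines with the usual nine columns; this skips
        # the "total" line, directory entries and blank lines.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        size, name = int(fields[4]), fields[-1]
        if name.startswith("_") and "." in name:
            totals[name.split(".", 1)[0]] += size
    return dict(totals)


# A few lines copied from the ES listing above.
sample = """\
-rw-r--r-- 1 elasticsearch elasticsearch 2178157235 May 27 18:16 _1s0.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 2149916547 May 27 18:38 _2oj.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 3636925770 May 30 11:50 _37c.fdt
-rw-r--r-- 1 elasticsearch elasticsearch 828 May 30 11:53 segments_4n
"""
print(sorted(bytes_per_segment(sample)))  # ['_1s0', '_2oj', '_37c']
```

Pasting the full listing into the function gives one byte total per segment prefix.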

Java version, after optimizing to the normal file format with max_segments=1:

ls -ltr

total 8759876
-rw-rw-r-- 1 shlomiv shlomiv 24 May 30 11:34 _ao.fnm
-rw-rw-r-- 1 shlomiv shlomiv 883151756 May 30 11:35 _ao.fdx
-rw-rw-r-- 1 shlomiv shlomiv 4343895906 May 30 11:35 _ao.fdt
-rw-rw-r-- 1 shlomiv shlomiv 14132289 May 30 11:36 _ao.tis
-rw-rw-r-- 1 shlomiv shlomiv 197431 May 30 11:36 _ao.tii
-rw-rw-r-- 1 shlomiv shlomiv 506303552 May 30 11:36 _ao.prx
-rw-rw-r-- 1 shlomiv shlomiv 3111989398 May 30 11:36 _ao.frq
-rw-rw-r-- 1 shlomiv shlomiv 110393973 May 30 11:36 _ao.nrm
-rw-rw-r-- 1 shlomiv shlomiv 285 May 30 11:36 segments_38
-rw-rw-r-- 1 shlomiv shlomiv 20 May 30 11:36 segments.gen
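
The thread asked for the individual file differences, and summing the two listings per extension makes that comparison direct. In the Lucene 3.x file format these listings use, .prx holds term positions (the proximity data that ES enables by default), .frq holds document numbers and term frequencies, .tis/.tii hold the term dictionary, .fdt/.fdx hold stored fields, and .nrm holds norms. Below is a minimal Python sketch, assuming the same nine-column ls -l format as above. The helper names bytes_per_extension and compare are hypothetical. Passing only the lines of one segment (e.g. _37c) compares a single segment against the single Java segment.

```python
from collections import defaultdict


def bytes_per_extension(listing):
    """Sum `ls -l` file sizes by extension for Lucene segment files (_xxx.ext)."""
    totals = defaultdict(int)
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "total" line, directories and anything that is not a file.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        name = fields[-1]
        if name.startswith("_") and "." in name:
            totals[name.rsplit(".", 1)[1]] += int(fields[4])
    return dict(totals)


def compare(es_listing, lucene_listing):
    """Print per-extension byte totals side by side with the ES/Lucene ratio."""
    es = bytes_per_extension(es_listing)
    lu = bytes_per_extension(lucene_listing)
    print(f"{'ext':4} {'ES':>14} {'Lucene':>14} {'ratio':>7}")
    for ext in sorted(set(es) | set(lu)):
        a, b = es.get(ext, 0), lu.get(ext, 0)
        ratio = f"{a / b:.2f}" if b else "n/a"
        print(f"{ext:4} {a:>14,} {b:>14,} {ratio:>7}")


# Two extensions from each listing above; paste the full listings in practice.
es_37c = """\
-rw-r--r-- 1 elasticsearch elasticsearch 1229530031 May 30 11:52 _37c.tis
-rw-r--r-- 1 elasticsearch elasticsearch 735561692 May 30 11:52 _37c.prx
"""
java_ao = """\
-rw-rw-r-- 1 shlomiv shlomiv 14132289 May 30 11:36 _ao.tis
-rw-rw-r-- 1 shlomiv shlomiv 506303552 May 30 11:36 _ao.prx
"""
compare(es_37c, java_ao)
```

A large ratio on one extension points at the corresponding data structure (for example, term dictionary vs. positions) rather than at the index as a whole.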

Still twice the size after optimization. Israel is right, this saga is
really annoying :slight_smile:

Thanks a lot for your patience. I think it's important to understand the
cause of this size increase, and not just for my sake.
