Lucene vs elasticsearch file size

Hi,

I ran identical indexing tests for Elasticsearch and Lucene, and the index
in Elasticsearch is 3 times larger than the one in Lucene.

What is the explanation for this?

We're considering moving from Lucene to Elasticsearch, and file size is
important for a huge indexing system.

Thanks,

Ophir

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Ophir,

Are you using all the defaults?

If so, be aware that Elasticsearch stores not only your index but also your source document as is, and creates one replica of each shard (this may not affect you if you started only a single node).

Which version are you using? Did you enable compression? Starting from 0.90, compression is always enabled. http://www.elasticsearch.org/guide/reference/index-modules/store.html

Does that start to answer your question?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

I'm not using the defaults. This is the .NET NEST code for index creation:

MapAndAnalyze(
    // Register a snowball analyzer under the name held in `snowball`
    a => a.Analyzers(an => an.Add(snowball, new SnowballAnalyzer { Language = language })),
    m => m
        .Properties(p => p
            // Analyzed with snowball at index and search time, not stored
            .String(sm => sm
                .Name(f => f.Description)
                .IndexAnalyzer(snowball)
                .SearchAnalyzer(snowball)
                .Store(false)
            )
            .String(sm => sm.Name(f => f.Board).Store(false))
            .String(sm => sm.Name(f => f.User).Store(false))
            // The only two stored fields
            .String(sm => sm.Name(f => f.IDPin).Store())
            .String(sm => sm.Name(f => f.IDPicture).Store())
        )
        .DisableAllField()
);

I'm not saving the "All" field (_source), and I store only 2 fields (as I do
with Lucene).

From what I saw when working with 1 replica, and then with 1 shard, the
replica was not created until I started a new node.

I'm using Elasticsearch version 0.20.2.

Is compression enabled by default?
Thanks!

_all and _source are different fields.
http://www.elasticsearch.org/guide/reference/mapping/all-field.html
http://www.elasticsearch.org/guide/reference/mapping/source-field.html

Before 0.90, compression is not enabled by default.
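
To make the difference concrete, here is a rough sketch (in Python, only to build and print the JSON) of a create-index body that disables both _all and _source and stores only two fields, mirroring the NEST mapping above. The type name "doc", the analyzer name, and the field names as written are assumptions: the real names depend on how NEST serializes your class. The "yes"/"no" store values and the index.store.compress.stored setting are the pre-0.90 forms as I understand them, so check them against the pages linked above for your version.

```python
import json


def build_index_body(language="English", analyzer_name="snowball_text"):
    """Build a create-index request body as a plain dict (no network calls)."""
    not_stored = {"type": "string", "store": "no"}
    stored = {"type": "string", "store": "yes"}
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        # analyzer_name is a hypothetical name for this sketch
                        analyzer_name: {"type": "snowball", "language": language}
                    }
                },
                # Assumed pre-0.90 setting: compress stored fields on disk
                "store": {"compress": {"stored": True}},
            }
        },
        "mappings": {
            "doc": {  # hypothetical type name
                "_source": {"enabled": False},  # do not keep the original JSON
                "_all": {"enabled": False},     # do not build the catch-all field
                "properties": {
                    "Description": dict(
                        not_stored,
                        index_analyzer=analyzer_name,
                        search_analyzer=analyzer_name,
                    ),
                    "Board": dict(not_stored),
                    "User": dict(not_stored),
                    "IDPin": dict(stored),
                    "IDPicture": dict(stored),
                },
            }
        },
    }


body = build_index_body()
print(json.dumps(body, indent=2, sort_keys=True))
```

Keep in mind that with _source disabled, a search can only return the fields you explicitly stored (IDPin and IDPicture here).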

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

Hey Ophir,

As David said, make sure you disable the _source field; otherwise ES will
store the actual JSON used to create (index) your document. If you run this
kind of comparison, make sure you optimize (force merge) the index
afterwards; otherwise your segments might be of different sizes.
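
A minimal sketch of that comparison, assuming a node on localhost:9200 and an index called "pins" (both placeholders). It only builds the URL for the optimize call, which you would send as a POST, and sums file sizes under a directory; the data path of the Elasticsearch index and the path of the Lucene index are yours to fill in. The _optimize endpoint is the name used by Elasticsearch releases of this era; newer releases call it _forcemerge.

```python
import os
from urllib.parse import urlencode


def optimize_url(base_url, index, max_num_segments=1):
    """Build the URL for an optimize (force merge) request on one index."""
    query = urlencode({"max_num_segments": max_num_segments})
    return "{}/{}/_optimize?{}".format(base_url.rstrip("/"), index, query)


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Hypothetical host and index name, for illustration only
print(optimize_url("http://localhost:9200", "pins"))
# prints http://localhost:9200/pins/_optimize?max_num_segments=1
```

After both indices have been merged down to one segment, calling dir_size_bytes on each index directory gives numbers that are fair to compare. On the Lucene side, the corresponding merge in recent versions is IndexWriter.forceMerge(1).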

simon
