Lucene vs elasticsearch file size

Ophir_Michaeli · February 27, 2013, 1:26pm

Hi,

I ran identical indexing tests for Elasticsearch and Lucene, and the
Elasticsearch index is three times larger than the Lucene one.

What is the explanation for this?

We're considering moving from Lucene to Elasticsearch, and file size is
important for a huge indexing system.

Thanks,

Ophir

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

dadoonet · February 27, 2013, 2:03pm

Hi Ophir,

Are you using all the defaults?

If so, be aware that Elasticsearch stores not only your index but also your source document as is, and it creates one replica of each shard (the replica may not matter in your case if you started a single node).

What version are you using? Did you enable compression? Starting from 0.90, compression is always enabled.

Does that start to answer your question?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

Ophir_Michaeli · February 27, 2013, 2:38pm

I'm not using the defaults. This is the .NET NEST code for index creation:

MapAndAnalyze(
    a => a.Analyzers(an => an.Add(snowball, new SnowballAnalyzer { Language = language })),
    m => m
        .Properties(p => p
            .String(sm => sm
                .Name(f => f.Description)
                .IndexAnalyzer(snowball)
                .SearchAnalyzer(snowball)
                .Store(false)
            )
            .String(sm => sm.Name(f => f.Board).Store(false))
            .String(sm => sm.Name(f => f.User).Store(false))
            .String(sm => sm.Name(f => f.IDPin).Store())
            .String(sm => sm.Name(f => f.IDPicture).Store())
        )
        .DisableAllField()
);

I'm not saving the "All" field (_source), and I store only 2 fields (as I do
with Lucene).

From what I saw when working with 1 replica, and then with 1 shard, the
replica was not created until I started a second node.

I'm using Elasticsearch version 0.20.2.

Is compression enabled by default?
Thanks!

dadoonet · February 27, 2013, 3:03pm

_all and _source are different fields, so disabling _all does not disable _source.

Before 0.90, compression is not enabled by default.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs
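
To make the distinction between the two fields concrete, here is a minimal Python sketch (not NEST code, and not taken from this thread) of a mapping body that turns off both _all and _source while keeping the field settings from the code above. The type name `pin`, the JSON field names, and the assumption that the `snowball` variable holds the name of the built-in snowball analyzer are all hypothetical. The `yes`/`no` store values follow the 0.x mapping syntax; check them against the version you run.

```python
import json

# Hypothetical names: the thread does not show the type or the JSON field names.
TYPE_NAME = "pin"
ANALYZER = "snowball"  # assumed to be what the `snowball` variable refers to


def string_field(store, analyzer=None):
    """Build one string field; `store` mirrors .Store(...) in the NEST code."""
    field = {"type": "string", "store": "yes" if store else "no"}
    if analyzer is not None:
        field["index_analyzer"] = analyzer
        field["search_analyzer"] = analyzer
    return field


def build_mapping():
    """Return a mapping body with _all and _source both disabled."""
    return {
        TYPE_NAME: {
            "_all": {"enabled": False},     # what DisableAllField() covers
            "_source": {"enabled": False},  # separate switch for the original JSON
            "properties": {
                "description": string_field(store=False, analyzer=ANALYZER),
                "board": string_field(store=False),
                "user": string_field(store=False),
                "idPin": string_field(store=True),
                "idPicture": string_field(store=True),
            },
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent to the put-mapping endpoint.
    print(json.dumps(build_mapping(), indent=2))
```

With _source disabled, the original JSON can no longer be returned from the index; only the two stored fields can be fetched, which matches the Lucene setup described above.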

simonw_2 · February 28, 2013, 8:07am

Hey Ophir,

As David said, make sure you disable the _source field; otherwise ES will
store the actual JSON used to create (index) your document. If you run this
kind of comparison, make sure you optimize the index afterwards (force
merge); otherwise your segments might be of different sizes.

simon
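
A comparison along these lines could be sketched in Python as below. The sketch assumes the 0.x `_optimize` endpoint with its `max_num_segments` parameter (later releases renamed it `_forcemerge`); the host, the index name `pins`, and any directory paths are hypothetical. It only builds the request URL and sums file sizes on disk; it does not send anything to a cluster.

```python
import os
from urllib.parse import urlencode


def optimize_url(base_url, index, max_num_segments=1):
    """Build the URL for a force merge (optimize) down to N segments."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{base_url.rstrip('/')}/{index}/_optimize?{query}"


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


if __name__ == "__main__":
    # Hypothetical host and index name; POST to this URL before measuring.
    print(optimize_url("http://localhost:9200", "pins"))
```

After both indexes have been merged to a single segment, calling `dir_size_bytes` on the Lucene index directory and on the Elasticsearch shard's index directory gives figures that can be compared directly.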
