In my quest to switch from Sphinx to Elasticsearch again, I have found that
the indices take about 4x more space on disk in Elasticsearch than our
Sphinx files do. The actual size I have is 82 GB
for 166M documents, or about 2000 docs/MB. In Sphinx we were able to store
about 8000 docs/MB. I'm a little worried about the IO load those large files
will put on my nodes' disks. Plus, I found this (kind of old) article, http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/, saying that Lucene indices are smaller than Sphinx ones. Does anyone have
any ideas why Elasticsearch indices would be that much bigger than Sphinx
(and Lucene) ones?
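As a quick sanity check on the figures above, here is a small sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB), which the post does not specify, and the implied Sphinx size is derived from the quoted density rather than measured.

```python
# Figures quoted in the post.
ES_INDEX_GB = 82
DOC_COUNT = 166_000_000
SPHINX_DOCS_PER_MB = 8000

# Assumption: decimal units (1 GB = 1000 MB, 1 MB = 1_000_000 bytes).
MB_PER_GB = 1000
BYTES_PER_MB = 1_000_000

# Document density of the Elasticsearch index.
es_docs_per_mb = DOC_COUNT / (ES_INDEX_GB * MB_PER_GB)

# Size the same documents would take at the quoted Sphinx density.
sphinx_size_gb = DOC_COUNT / SPHINX_DOCS_PER_MB / MB_PER_GB

# Average on-disk footprint per document in each engine.
es_bytes_per_doc = BYTES_PER_MB / es_docs_per_mb
sphinx_bytes_per_doc = BYTES_PER_MB / SPHINX_DOCS_PER_MB

ratio = SPHINX_DOCS_PER_MB / es_docs_per_mb

print(f"Elasticsearch: {es_docs_per_mb:.0f} docs/MB, {es_bytes_per_doc:.0f} bytes/doc")
print(f"Sphinx: {SPHINX_DOCS_PER_MB} docs/MB, {sphinx_bytes_per_doc:.0f} bytes/doc")
print(f"Implied Sphinx size: {sphinx_size_gb:.2f} GB, ratio {ratio:.2f}x")
```

So the "about 4x" in the post corresponds to roughly 494 bytes per document in Elasticsearch against 125 bytes per document in Sphinx.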
I have already set
"_all" : {"enabled" : false}
"_source" : {"enabled" : false}
and I'm storing only 4 fields: 3 longs and 1 integer.
The biggest files are .frq and .tis, as one part of my index shows.
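For readers who want to see the settings above in context, here is a minimal sketch of how they might sit in an index-creation body. The type name `doc` and the field names are hypothetical (the post does not give them), and the layout assumes the typed-mapping format of the older Elasticsearch versions discussed in this thread.

```python
import json

# Hypothetical field names; the post only says "3 long and 1 integer".
FIELD_TYPES = {
    "field_a": "long",
    "field_b": "long",
    "field_c": "long",
    "field_d": "integer",
}

def build_mapping(type_name, field_types):
    """Build an index-creation body with _all and _source disabled."""
    return {
        "mappings": {
            type_name: {
                "_all": {"enabled": False},
                "_source": {"enabled": False},
                "properties": {
                    name: {"type": field_type}
                    for name, field_type in field_types.items()
                },
            }
        }
    }

# "doc" is a placeholder type name, not taken from the post.
mapping = build_mapping("doc", FIELD_TYPES)
print(json.dumps(mapping, indent=2))
```

The printed JSON is what would be sent as the body when creating the index; nothing here contacts a cluster.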
on your index and see if it will reduce the index size.
Which version of Elasticsearch are you using?
On Monday, October 22, 2012 10:24:22 AM UTC-4, Jérôme Gagnon wrote:
I tried optimizing the index, which did next to nothing for the size. I then
upgraded to 0.20.RC1, removed frequencies on some fields, and played with
precision_step on low-cardinality fields, but I think there are only peanuts
to gain from precision_step.
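To make the precision_step knob a bit more concrete: Lucene's trie encoding indexes each numeric value as several terms of decreasing precision, one per precision_step bits. The sketch below only counts those terms per document for the "3 long and 1 integer" layout from the original post. The field names are hypothetical, and the default step of 4 is an assumption about this Elasticsearch generation. It is a rough term count, not a prediction of on-disk size.

```python
import math

def terms_per_value(bits, precision_step):
    """Trie terms indexed for one numeric value of the given bit width."""
    return math.ceil(bits / precision_step)

# Hypothetical field names; widths follow "3 long and 1 integer".
FIELD_BITS = {"long_a": 64, "long_b": 64, "long_c": 64, "int_d": 32}

def terms_per_doc(precision_step):
    """Total trie terms indexed per document if every field uses this step."""
    return sum(terms_per_value(bits, precision_step) for bits in FIELD_BITS.values())

ASSUMED_DEFAULT_STEP = 4  # assumption, not confirmed in the thread

for step in (ASSUMED_DEFAULT_STEP, 8, 16, 64):
    print(f"precision_step={step:>2}: {terms_per_doc(step)} terms per document")
```

A larger step means fewer terms per value but slower range queries, so how much it helps depends on how the fields are queried and on how well the term dictionary and postings compress.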