With the settings and mapping below, I indexed a 1.3 MB text file, and the index under data/clusters_name/nodes/ is now almost four times that size. What is happening? What settings am I missing?
In another test this gives 9.5 kB of index for a base64 content of just 48 bytes. I am not sure whether this is because _all is enabled by default, because the content field is stored, or because the base64 content itself ends up stored as well.
I just want to index a file from the local file system with a title and content, with highlighting enabled. Can you please point out how to do this with an index-size-to-source ratio of 1 or less?
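For reference, here is a minimal sketch of the kind of mapping that usually shrinks the index: `_all` disabled and the content field not stored separately (highlighting can work from `_source`). The field names and the pre-5.0 `string` type are assumptions on my part, not taken from the original settings:

```python
import json

# Hypothetical mapping aimed at a smaller index:
# - _all disabled so every field is not copied into a catch-all field
# - content not stored, relying on _source for highlighting
# - term vectors enabled so the fast vector highlighter can be used
mapping = {
    "mappings": {
        "doc": {
            "_all": {"enabled": False},
            "properties": {
                "title": {"type": "string"},
                "content": {
                    "type": "string",
                    "store": False,
                    "term_vector": "with_positions_offsets",
                },
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Note that term vectors themselves add to the index size; drop `term_vector` and use the plain highlighter if raw size matters more than highlighting speed.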
Well, you also have some metadata, so with content this small you cannot really measure anything.
Also, keep in mind that stored fields are compressed, so the result again depends on the effective compression ratio.
What I recommend is to extract the text on your end, or to use Elasticsearch 5.0 with the ingest-attachment plugin (not yet suitable for production).
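"Extract the text on your end" means the base64 blob never enters `_source` at all: you run extraction yourself and index only the resulting text. A minimal sketch, where the function names and the extraction callback are mine, not from this thread:

```python
import json

def build_doc(title, raw_bytes, extract):
    """Build an index request body containing only extracted text.

    `extract` is whatever extraction you run client-side
    (e.g. a call out to Apache Tika); the binary itself is
    never placed in the document, so _source stays small.
    """
    return {"title": title, "content": extract(raw_bytes)}

# Toy "extraction" for illustration only: decode UTF-8 bytes.
doc = build_doc("readme", b"hello world", lambda b: b.decode("utf-8"))
print(json.dumps(doc))
```

The point of the design is that Elasticsearch only ever sees text it will actually index, so index size tracks the text size rather than the (base64-inflated) binary size.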
@dadoonet
So, will Elasticsearch 5 (with the ingest node / ingest-attachment plugin) become a valid and robust choice as a primary repository (blob content)?
thanks
Gianni
Ingest node does not mean that we will store the "blob" in Elasticsearch. It is exactly the opposite: we extract only the needed text from the binary and index that text alone.
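That "extract, then drop the binary" behavior can be expressed as an ES 5.x ingest pipeline: the `attachment` processor extracts text from a base64 field, and a `remove` processor drops the binary afterwards. A sketch, where the `data` field name is illustrative:

```python
import json

# Hypothetical ingest pipeline definition: the attachment processor
# reads base64 from the "data" field and writes extracted text under
# "attachment"; the remove processor then drops the binary so only
# the extracted text remains in the indexed document.
pipeline = {
    "description": "extract text from the blob, then drop the blob",
    "processors": [
        {"attachment": {"field": "data"}},
        {"remove": {"field": "data"}},
    ],
}

print(json.dumps(pipeline, indent=2))
```

With this shape, nothing binary survives into `_source`, which is why the ingest node cannot serve as a primary blob store.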
Thanks @dadoonet for your answer.
We are currently using the mapper-attachment plugin in a large environment, so the binary is in _source and Elasticsearch is the primary and only repository.
In this new scenario, will the blob continue to be stored in the same way?
Is the text extraction in the ingest node asynchronous with respect to the indexing request?