I am ingesting data as nested objects using type=nested and include_in_parent=true. Each event object contains 3000 nested objects, i.e. 3000 nested documents. I insert several of these event objects into Elasticsearch. Ingestion happens almost instantly and is not an issue.
However, when I search this data using Kibana or the Elasticsearch Java API, it takes around 10-11 seconds to read each nested object. Is there a size limit when creating/inserting nested objects?
Any pointers on this would be highly appreciated. The reason I used the nested type is that I have to insert 3000 objects per message and need an ingestion rate of 40 messages/sec. When I use non-nested objects, i.e. ingest each of the 3000 objects as separate objects, ingestion is very slow and I can only ingest 10 messages/sec, whereas ingesting them as nested objects lets me ingest 20 messages/sec.
System: Ubuntu virtual machine with 10 CPUs, 64GB disk, 15GB dedicated to Elasticsearch
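For readers following along, here is a rough sketch of what a mapping like the one described might look like, written as a plain Python dict. Only type=nested and include_in_parent=true come from the post; the field names (event_id, items, name, value) and the keyword/double types are assumptions made for illustration. The small helper shows why the write load is larger than the message rate suggests: each nested object is indexed as its own hidden Lucene document, plus one root document per message.

```python
import json

# Hypothetical field names and types; only "type": "nested" and
# "include_in_parent": True are taken from the post.
EVENT_MAPPING = {
    "properties": {
        "event_id": {"type": "keyword"},
        "items": {
            "type": "nested",
            "include_in_parent": True,
            "properties": {
                "name": {"type": "keyword"},
                "value": {"type": "double"},
            },
        },
    }
}


def lucene_docs_per_second(messages_per_sec: int, nested_per_message: int) -> int:
    """Lucene documents written per second: every nested object becomes
    one hidden document, plus one root document per message."""
    return messages_per_sec * (nested_per_message + 1)


if __name__ == "__main__":
    print(json.dumps(EVENT_MAPPING, indent=2))
    # Target rate from the post: 40 messages/sec with 3000 nested objects each.
    print(lucene_docs_per_second(40, 3000))
```

The same arithmetic applies to the non-nested case without the extra root document, so both approaches have to write on the order of 3000 documents per message.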
If you are having trouble displaying a result set in Kibana, try decreasing the value of discover:sampleSize.
I found that for very big documents, most of the time is spent on the network transferring a very big JSON doc.
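To make the "very big JSON doc" point concrete, here is a minimal sketch that estimates how large a 3000-object event is once serialized, and builds a search body that asks Elasticsearch to leave the nested array out of the returned _source. The event shape and the field name "items" are made up for illustration, and the "excludes" key assumes a reasonably recent Elasticsearch version, so check the source filtering docs for your release.

```python
import json


def build_event(nested_count: int) -> dict:
    """Synthetic event with made-up fields, used only to estimate payload size."""
    return {
        "event_id": "evt-1",
        "items": [
            {"name": f"item-{i}", "value": float(i)} for i in range(nested_count)
        ],
    }


def json_size_bytes(doc: dict) -> int:
    """Size of the document once serialized to UTF-8 JSON."""
    return len(json.dumps(doc).encode("utf-8"))


def search_body_without_nested(query: dict, nested_field: str, size: int = 10) -> dict:
    """Search body that drops the large nested array from each hit's _source."""
    return {
        "size": size,
        "query": query,
        "_source": {"excludes": [nested_field]},
    }


if __name__ == "__main__":
    print(json_size_bytes(build_event(3000)), "bytes per event")
    print(json.dumps(search_body_without_nested({"match_all": {}}, "items")))
```

If the response time drops sharply once the nested array is excluded, that would suggest the time is going into shipping and parsing the payload rather than into the search itself.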
Thank you very much for your reply. I am using the Bulk API to ingest the data for both the nested and non-nested test cases. Also, I initially thought Kibana was the issue, but even when I use the Java API (multi search API) to load the results, without sending them to Kibana, the load time from Elasticsearch alone is 10-11 seconds.
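One way to narrow down where the 10-11 seconds go is to compare the "took" value in the search response (milliseconds Elasticsearch spent executing the request) with the wall-clock time seen by the client. The sketch below is client-agnostic: run_search is a hypothetical callable standing in for whatever client call you use, and it is assumed to return the parsed response as a dict.

```python
import time
from typing import Callable


def time_search(run_search: Callable[[], dict]) -> dict:
    """Run a search and split the elapsed time into server-side and other time.

    run_search is a placeholder for the real client call; it must return
    the parsed search response, which carries a "took" field in milliseconds.
    """
    start = time.perf_counter()
    response = run_search()
    wall_ms = (time.perf_counter() - start) * 1000.0
    took_ms = response.get("took", 0)
    return {
        "took_ms": took_ms,
        "wall_ms": wall_ms,
        # Time not accounted for by "took": network transfer, JSON
        # serialization and client-side parsing.
        "overhead_ms": max(wall_ms - took_ms, 0.0),
    }


if __name__ == "__main__":
    # Stand-in for a real search call, for demonstration only.
    def fake_search() -> dict:
        time.sleep(0.05)
        return {"took": 12, "hits": {"hits": []}}

    print(time_search(fake_search))
```

A small took_ms next to a large wall_ms would point at payload size and transfer; a large took_ms would point at the query or the nested structure itself.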
To be honest it feels like an abuse of the nested capability to use it to improve indexing throughput. Whatever benefits it's giving you at write time you are likely to pay some costs at read time when you want to retrieve individual docs or perform analysis.
It's probably better to concentrate effort on figuring out why non-nested is only indexing 10 "messages" per second. Can you say more about the shape and number of these docs? Do you supply IDs for them?
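The question about IDs matters because, when an ID is supplied, Elasticsearch has to check whether a document with that ID already exists, whereas auto-generated IDs let it skip that lookup. As a rough sketch (not the poster's actual code), this is how a bulk request body can be built with or without explicit IDs. The index name "events" and the document fields are invented, and older Elasticsearch versions also expect a _type in the action line.

```python
import json
from typing import Iterable, Optional


def bulk_lines(index: str, docs: Iterable[dict], id_field: Optional[str] = None) -> str:
    """Build a newline-delimited bulk body.

    With id_field=None no _id is sent, so Elasticsearch generates one.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field is not None:
            action["index"]["_id"] = doc[id_field]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"name": "a", "value": 1.0}, {"name": "b", "value": 2.0}]
    print(bulk_lines("events", docs), end="")
```

Whether this explains the 10 messages/sec figure is not something the thread establishes; it is one of the first things worth ruling out.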