My data is only 700 MB, one index, 376824 documents.
I'm querying 100 words sequentially from Python, and it is taking over 2 seconds for all 100 words.
My data has 92 fields and 330k rows.
from elasticsearch import Elasticsearch

esclient = Elasticsearch()

for word in var:
    response = esclient.search(
        index='patientdb',
        body={
            "size": 2,
            "query": {
                "match": {
                    "_all": word
                }
            }
        }
    )
var has around 100 words.
So basically I'm matching every field against each word.
It is using only 55% of CPU and 40% of my memory. I have four cores.
Why is it taking so long?
So you are issuing around 100 queries in sequence from a single thread, meaning that the time it takes to execute each query, together with parsing and network round-trip time, is around 20 ms. What are you trying to achieve with this set of queries? Is there maybe some way to do it with fewer requests, e.g. by restructuring the query or simply sending it all in a single request using the multi-search API?
I'm just trying to gauge the performance. There is no correlation between the searches; I have just copied the first 100 first names from the database and am searching for them.
My end game is to create search-as-you-type using Elasticsearch. Still figuring out how to go about it.
I'm doing 100 queries at a time because once I build it, I'll be getting 100 requests per second, and I cannot afford a 2-second wait time. It is way too expensive. And right now I'm testing on just 330k rows; I'll be having around 5 million rows. Hence I have this issue.
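For the search-as-you-type goal mentioned above, one common starting point is a `match_phrase_prefix` query fired on each keystroke (edge-n-gram analyzers are the usual next step for better performance). A sketch under assumptions: `first_name` is a hypothetical field name, and `prefix` is whatever the user has typed so far.

```python
def suggest_query(prefix, field="first_name", size=5):
    # Hypothetical prefix query; field name and sizes are illustrative.
    # max_expansions caps how many terms the last word may expand to.
    return {
        "size": size,
        "query": {
            "match_phrase_prefix": {
                field: {"query": prefix, "max_expansions": 10}
            }
        },
    }

# response = esclient.search(index="patientdb", body=suggest_query("jo"))
```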
If you are serving search requests for multiple users, I assume these would be sent to Elasticsearch in parallel, not from a single thread as in your example. If you want to see how Elasticsearch performs in a certain production scenario, you need to simulate the load as accurately as possible. I would therefore recommend issuing queries from a number of threads and seeing how that performs. You can also use Rally for this kind of benchmark.
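To make the multi-threaded suggestion concrete, here is a minimal sketch using a thread pool, assuming the client and query from the question (`search_one` is a hypothetical helper wrapping the single-word query):

```python
from concurrent.futures import ThreadPoolExecutor

def search_one(esclient, word):
    # Same single-word query as in the question.
    return esclient.search(
        index="patientdb",
        body={"size": 2, "query": {"match": {"_all": word}}},
    )

def search_all(esclient, words, workers=4):
    # Several in-flight requests keep all four cores busy instead of
    # leaving the client idle during each network round trip.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda w: search_one(esclient, w), words))
```

This simulates concurrent users more faithfully than a sequential loop, since the server can process requests in parallel across shards and cores.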
So I had this question: how many indices should I have? How do I decide that? Because more indices means more shards, which should mean better performance when I query in parallel.
How do I form multiple indices when I have only one CSV file with 4 million rows?
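One detail worth noting on the shard question: parallelism comes from the number of primary shards, which is fixed per index at creation time via index settings, so a single index can already spread queries across cores. A minimal sketch (index name, shard count, and replica count here are illustrative assumptions, not recommendations):

```python
# Shard count is set once, when the index is created; changing it
# later requires reindexing. Values below are examples only.
settings_body = {
    "settings": {
        "number_of_shards": 4,    # e.g. one primary shard per core
        "number_of_replicas": 0,  # no replicas on a single-node test box
    }
}

# esclient.indices.create(index="patientdb", body=settings_body)
```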