For example, I want to search within 100,000 documents from each index, and no more than that. The documents are first loaded into the cache, and then only that data is searched.
If you have a timestamp field, you can filter on that field to search only in the now/m-15m documents.
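The timestamp filter suggested above can be sketched as a `range` filter wrapped around the user's query. A minimal sketch in Python, building the search body as a plain dict; the field name `@timestamp` and the example `match` query are assumptions:

```python
# Hypothetical sketch: restrict an arbitrary query to documents from the
# last 15 minutes, using Elasticsearch date math ("now-15m").
def last_15m_query(user_query: dict, ts_field: str = "@timestamp") -> dict:
    """Wrap user_query so it can only match recent documents."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # The filter clause is cached by Elasticsearch and does
                # not affect scoring, which suits a pure time cutoff.
                "filter": [
                    {"range": {ts_field: {"gte": "now-15m", "lte": "now"}}}
                ],
            }
        }
    }

body = last_15m_query({"match": {"message": "error"}})
```

Note that this restricts by time window, not by document count, which is exactly the gap discussed in the replies below.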
Yes, I have a timestamp field, but it does not solve the problem. I want the search to always run on the latest data only. For example, I have a lot of data on the hard drive, and I want only part of it to be loaded into the cache automatically and searched, while the rest of the data remains on the hard drive.
But I want this to take into account that new data is always being inserted.
What "cache" are you talking about?
RAM
I want to execute a query over the last 100,000 documents ever inserted into the database; if the query does not match any document among those, it should return null.
I'm not sure you can really do that on only the last 100,000 docs, unless you have a number which tells the position of the document.
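The "position number" idea mentioned here can be sketched as a two-step query: first ask for the current maximum of a monotonically increasing insertion counter, then filter the real query to the newest 100,000 values of it. The field name `seq` is a hypothetical assumption; the bodies below are plain request dicts, not tied to any client:

```python
# Step 1 (hypothetical): a max aggregation on the "seq" counter, with
# size=0 so no hits are returned, only the aggregated maximum.
max_seq_body = {"size": 0, "aggs": {"max_seq": {"max": {"field": "seq"}}}}

def last_n_query(user_query: dict, max_seq: int, n: int = 100_000) -> dict:
    """Restrict user_query to the n most recently inserted documents,
    assuming every document carries a monotonically increasing "seq"."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Step 2: only documents in the newest n counter
                # positions are allowed to match; everything older is
                # filtered out before scoring.
                "filter": [{"range": {"seq": {"gt": max_seq - n}}}],
            }
        }
    }

# Example: if step 1 reported max seq 1,234,567, only seq > 1,134,567 match.
body = last_n_query({"match": {"message": "error"}}, max_seq=1_234_567)
```

Keeping such a counter consistent under concurrent inserts is the hard part; Elasticsearch does not maintain one for you across an index.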
But I'm not sure why would you need to do that. Elasticsearch can search within millions of docs without any problem.
I don't understand the use case.
Because I don't have a large cache for processing a large number of documents, I want to use hashing so that the processing is faster.
Is there a similar way or approach to what I want, for example specific settings, or code close to my request, such as combining the timestamp field with something else so that this works better?
Another reason is that I don't want the search to cover a larger range than I want.
For example, I want the search to cover 100,000 documents and not go beyond that: either a matching result is found within them, or there is no result at all.
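For what it's worth, Elasticsearch does have a request-level `terminate_after` parameter that caps how many documents are collected per shard, after which the search stops early and the response reports `terminated_early: true`. Note it counts documents per shard in collection order, not the most recently inserted ones, so it only approximates the requirement above. A minimal sketch:

```python
# Sketch: bound the work done per shard with "terminate_after".
# This caps how many documents each shard collects; it does NOT select
# the newest documents, so it is an approximation of "search only the
# last 100,000 docs", not an exact implementation of it.
def capped_query(user_query: dict, cap: int = 100_000) -> dict:
    return {"query": user_query, "terminate_after": cap}

body = capped_query({"match": {"message": "error"}})
```

With two shards, as mentioned later in the thread, a cap of 100,000 per shard could examine up to 200,000 documents in total.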
What makes you think that Elasticsearch won't be efficient?
I still don't understand the business reason for your use case. I think that you are trying to solve a problem which actually does not exist.
Do you see any memory error in Elasticsearch logs? If so, please share what you are seeing.
Because I have a lot of data, querying it all at once takes time, and I don't want that latency; I want the query to be very fast.
Therefore, I would like to limit the search to part of the data, namely only the recent data.
How many indices and shards are you querying? What is the size of these indices and shards? What is the size and specification of the cluster?
What latencies are you seeing when you query the data?
No, there is nothing wrong
Let me explain.
For example, if I used the timestamp field and had it return the last 100,000 stored documents,
it would search the indices until all the required data is collected.
But that way it's going to search through all the data, and I don't want that.
I want it to look only within the range I specify and not exceed it: if it finds matching data in the first 100,000 documents, it returns the result; otherwise it returns nothing.
The latencies are large: for example, a search over 10,000 documents takes 6000 ms or more, sometimes a little less.
As for the number of indices and shards, I do not know precisely how many there are. But I am searching in only one index, and I think it has no more than 2 shards.
To explain further:
If I have 1,000,000 documents in total, I don't want all of these documents to be searched; I want to search only a part of them, for example 10,000 documents.
When I want to query for 1,000 documents, I don't want it to search in 1,000,000 documents; I want it to search in 10,000 documents only and not exceed that. If it finds a matching result, it returns it, and if it finds no match within those 10,000, it returns null.
You have a very big problem if it takes 6 s to search 10,000 docs. I'm not talking about extracting 10,000 documents, which is another story.
You need to be precise here. Please share what your query looks like and what are the first 10 lines of the response.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.