Hi.
We are trying to do the following and any help would be appreciated.
Say you run a search and 100,000 documents match.
We would like to increment a counter in each document that matched, and at the same time select the first page, say the first 50.
Can this be done in one operation, or maybe in parallel?
As I understand your problem, you can solve it with 2 queries.
base_query = {"query": { ... }}  # your search filter
query = {**base_query, "size": 50, "from": 0}  # here you add your limit, as you want only the first 50 results (page 1)
result = query.run()  # run the search; these 50 docs are what you display on your first page
Then, keeping the same base query that you use for your search, apply it to update all your docs:
update_query = {
    "script": {
        "source": "ctx._source.match_docs_increment++",
        "lang": "painless"
    },
    "query": base_query["query"]
}
update_query.run()  # run the update (update by query)
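To make the pseudocode above a bit more concrete, here is a sketch in Python that builds both request bodies from one base query. The counter field `match_docs_increment` and the example filter are assumptions; with the official Python client the bodies would be passed to `es.search(...)` and `es.update_by_query(...)`:

```python
def build_requests(base_query: dict, page: int = 0, page_size: int = 50):
    """Build the search body and the update-by-query body from one base query."""
    search_body = {
        **base_query,
        "size": page_size,
        "from": page * page_size,  # offset for the requested page
    }
    update_body = {
        "script": {
            "source": "ctx._source.match_docs_increment++",
            "lang": "painless",
        },
        "query": base_query["query"],  # same filter as the search
    }
    return search_body, update_body

# Example with a placeholder filter:
base = {"query": {"match": {"title": "elasticsearch"}}}
search_body, update_body = build_requests(base, page=0)

# With the official Python client, these would be sent as e.g.:
#   es.search(index="my-index", body=search_body)
#   es.update_by_query(index="my-index", body=update_body)
```

Keeping both bodies derived from one `base_query` guarantees the 50 displayed docs and the incremented docs always come from the same result set.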
If you have little search traffic this can be OK, but it will add load on your server, as you'll update your documents every time a search is made.
Is it to set a weight on your documents so you can sort by the most popular?
If it's just for statistics, you can dump the result of your request to a file and set up Filebeat to send it to Elasticsearch, in a different index or on a different server... it depends on what you want to do with it.
Many thanks for your response.
This is in fact for statistics purposes.
Running 2 queries will indeed be problematic.
Any chance you could help us achieve it using the most efficient method? Of course we will pay for your consultancy services.
I can help, but I can't provide a consultancy service.
Which language and framework are you using?
My solution is pretty simple: just send the content to a log file, which can be done in 2~3 lines of code (maybe less, depending on your framework), then use Filebeat to parse the logs and store them in Elasticsearch. I don't think this solution needs deep technical skill, as Filebeat is really easy to use.
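A minimal sketch of the log-file approach, assuming Python and the newline-delimited JSON format that Filebeat can parse; the file name and field names here are made up for illustration:

```python
import json
import time

def log_search(log_path: str, query: dict, total_hits: int) -> None:
    """Append one JSON line per search so Filebeat can ship it to Elasticsearch."""
    entry = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "query": query,
        "total_hits": total_hits,
    }
    # One JSON object per line: the format Filebeat's JSON parsing expects.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_search("searches.log", {"match": {"title": "elasticsearch"}}, 100000)
```

Filebeat would then tail `searches.log` and index each line as a document, keeping the counting work entirely off the search path.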
Hi,
Thanks for your response.
We are using the NEST library in an MVC Core application.
Unfortunately we have never used Filebeat, and it may take us more than a couple of lines of code compared to someone with more experience. Is there anyone you can recommend who could assist us with this task?
I can't think of a way in which you can efficiently do this in one operation (well, one request).
The update by query API would be the most logical way to increment a counter on 100,000 documents; however, it does not return the IDs of the documents that were updated, so you wouldn't be able to collect the first 50 documents and return them.
I think two requests, a search query that returns the first 50 documents and an update by query to increment the counters, executed at the same time, would be the most straightforward way to approach this.
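The two-requests-at-the-same-time idea can be sketched with Python's standard thread pool. `run_search` and `run_update` below are hypothetical stand-ins for the actual client calls (e.g. `es.search` and `es.update_by_query`):

```python
from concurrent.futures import ThreadPoolExecutor

def run_search(body: dict) -> list:
    # Placeholder for es.search(...): returns the first page of hits.
    return [{"_id": i} for i in range(body.get("size", 10))]

def run_update(body: dict) -> int:
    # Placeholder for es.update_by_query(...): returns the number of updated docs.
    return 100_000

search_body = {"query": {"match_all": {}}, "size": 50, "from": 0}
update_body = {
    "script": {"source": "ctx._source.match_docs_increment++", "lang": "painless"},
    "query": search_body["query"],
}

# Fire both requests concurrently, so the user-facing search is not
# blocked behind the (much slower) update by query.
with ThreadPoolExecutor(max_workers=2) as pool:
    search_future = pool.submit(run_search, search_body)
    update_future = pool.submit(run_update, update_body)
    hits = search_future.result()
    updated = update_future.result()
```

The search result can be returned to the user as soon as its future resolves; the update by query can be left to finish in the background (or even submitted fire-and-forget) since its result is only needed for statistics.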