Hi,
I need help creating a search query in Elasticsearch.
Imagine we have an index (client logins) with two fields: timestamp (when the action happened) and user id.
I would like to find which users have logged in after some date (for example, this year) but not before it (not in previous years).
In SQL, the query would look like this:
SELECT userId
FROM A
WHERE A.timestamp > [DATE]
AND A.userId NOT IN (SELECT userId FROM A WHERE A.timestamp < [DATE])
...and ideally, the returned userId values would be unique.
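To pin down what the query should return, here is a small in-memory Python sketch of the SQL semantics: users with a login after the cutoff, minus users with a login before it, de-duplicated. The sample records and the `new_users` helper are made up for illustration; they are not Elasticsearch code.

```python
from datetime import datetime


def new_users(logins, cutoff):
    """Return sorted, unique user ids that logged in after `cutoff` but never before it.

    Mirrors the SQL: WHERE timestamp > DATE AND userId NOT IN (... timestamp < DATE).
    """
    after = {user for ts, user in logins if ts > cutoff}
    before = {user for ts, user in logins if ts < cutoff}
    # Set difference implements the NOT IN; using sets also makes the ids unique.
    return sorted(after - before)


# Hypothetical sample data: (timestamp, userId) pairs.
logins = [
    (datetime(2016, 5, 1), "alice"),
    (datetime(2017, 2, 3), "alice"),  # returning user: excluded
    (datetime(2017, 3, 9), "bob"),    # first seen after the cutoff: included
    (datetime(2017, 4, 1), "bob"),    # repeated login collapses to one id
    (datetime(2015, 7, 7), "carol"),  # only old logins: excluded
]
print(new_users(logins, datetime(2017, 1, 1)))  # ['bob']
```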
My initial thought was to first get all users seen since the cutoff date in one query, and then look up in a second query which of these have been seen before it. If the cutoff timestamp is not recent and the first query returns a lot of users, however, this will be expensive and inefficient.
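The two-query approach described above could be sketched as follows, assuming a `date` field named `timestamp` and a `keyword` field named `userId` (the field names and the `max_users` default are assumptions). The second request carries every candidate id in a `terms` filter, which is where the cost grows when the first query returns many users.

```python
def users_after_query(cutoff, max_users=10000):
    """Query 1: distinct userId values with at least one login after the cutoff."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"timestamp": {"gt": cutoff}}},
        "aggs": {"users": {"terms": {"field": "userId", "size": max_users}}},
    }


def seen_before_query(cutoff, user_ids):
    """Query 2: which of the given users also logged in before the cutoff."""
    ids = sorted(user_ids)
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"range": {"timestamp": {"lt": cutoff}}},
            {"terms": {"userId": ids}},  # grows with the result of query 1
        ]}},
        "aggs": {"users": {"terms": {"field": "userId", "size": max(len(ids), 1)}}},
    }


def bucket_keys(response):
    """Collect the userId keys from a terms-aggregation response."""
    return {b["key"] for b in response["aggregations"]["users"]["buckets"]}
```

The new users are then computed client-side as `bucket_keys(response_1) - bucket_keys(response_2)`.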
The best way to get the new users since a specific date might be to create an entity-centric index. If you had a separate index with one document per user (user id as the key) containing the timestamp of that user's first activity, you could easily use it to get the new users after a specific timestamp.
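A minimal sketch of the entity-centric idea: collapse the login events into one document per user holding the earliest timestamp, after which "new since the cutoff" becomes a plain range filter. The field name `first_seen` is a hypothetical choice, and timestamps are assumed to be ISO-8601 strings in one consistent format (so string comparison orders them correctly).

```python
def build_user_docs(logins):
    """Collapse (timestamp, userId) events into one document per user with first activity."""
    docs = {}
    for ts, user in logins:
        doc = docs.setdefault(user, {"userId": user, "first_seen": ts})
        # Keep the earliest timestamp seen so far for this user.
        doc["first_seen"] = min(doc["first_seen"], ts)
    return docs


def new_users_query(cutoff):
    """Against the per-user index, new users are those whose first activity is at/after the cutoff."""
    return {"query": {"range": {"first_seen": {"gte": cutoff}}}}
```

Each value of `build_user_docs(...)` would be indexed with the user id as the document `_id`, so a user can only ever have one document.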
Hi,
An entity-centric index is a really good idea, but I would also like to try the "nested query" option.
I tried to create such a query (first get the users since the cutoff date, then look up which of them have NOT been seen before), but without success. Can you give me a hint?
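One way to get this in a single request is not a nested query (in Elasticsearch that term refers to querying `nested` object fields) but an aggregation: a `terms` aggregation on the user id, a `min` sub-aggregation for each user's first login, and a `bucket_selector` that keeps only users whose first login is at or after the cutoff. The sketch below just builds the request body; the field names `userId`/`timestamp` and the `max_users` default are assumptions, and it treats a login exactly at the cutoff as "new".

```python
from datetime import datetime, timezone


def epoch_millis(dt):
    """The min aggregation on a date field reports its value as epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def new_users_agg(cutoff, max_users=10000):
    """Request body: one bucket per user, keeping only users first seen at/after the cutoff."""
    return {
        "size": 0,  # hits are not needed, only the buckets
        "aggs": {
            "users": {
                # size must cover all users: buckets are filtered only after truncation.
                "terms": {"field": "userId", "size": max_users},
                "aggs": {
                    "first_login": {"min": {"field": "timestamp"}},
                    "only_new": {
                        "bucket_selector": {
                            "buckets_path": {"first": "first_login"},
                            "script": {
                                "source": "params.first >= params.cutoff",
                                "params": {"cutoff": epoch_millis(cutoff)},
                            },
                        }
                    },
                },
            }
        },
    }


body = new_users_agg(datetime(2017, 1, 1, tzinfo=timezone.utc))
```

The remaining bucket keys are the unique new user ids. Because every user gets a bucket before filtering, this is only practical while the number of distinct users is moderate.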
Hi,
Fortunately, the query only needs to run over a subset of our data (in a dedicated index).
About 500 new entries are added every day, and I have 3 years of data, so there are roughly 600,000 entries so far.
There are fewer than 100,000 distinct users.
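With up to 100,000 distinct users, a single `terms` aggregation may run into the cluster's bucket limit (`search.max_buckets`, whose default depends on the version). A hedged option is to split the aggregation with the `terms` `include` partitioning parameters, which hash the user ids into disjoint groups, and run one request per partition. The partition count and per-partition size below are illustrative assumptions (about 10,000 users per partition, with headroom for uneven hashing).

```python
def partitioned_new_user_requests(cutoff_millis, num_partitions=10, size_per_partition=20000):
    """Build one request body per partition of the userId terms aggregation.

    Each partition covers a disjoint subset of user ids, so running all of
    the requests and concatenating their buckets covers every user.
    """
    bodies = []
    for partition in range(num_partitions):
        bodies.append({
            "size": 0,
            "aggs": {
                "users": {
                    "terms": {
                        "field": "userId",
                        "size": size_per_partition,
                        "include": {"partition": partition, "num_partitions": num_partitions},
                    },
                    "aggs": {
                        "first_login": {"min": {"field": "timestamp"}},
                        # Keep only users whose earliest login is at/after the cutoff.
                        "only_new": {"bucket_selector": {
                            "buckets_path": {"first": "first_login"},
                            "script": {"source": "params.first >= params.cutoff",
                                       "params": {"cutoff": cutoff_millis}},
                        }},
                    },
                }
            },
        })
    return bodies
```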
I think you may need to do this through a scripted update, as you want to set fields based on the content of the document already in the index, not just the one you are inserting. As far as I know, however, the Reindex API does not support scripted updates, so it may need to be done outside of Elasticsearch, as outlined in this post on entity-centric indices.
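Done outside Elasticsearch, the scripted update could look like the sketch below: a Python helper that emits `_bulk` NDJSON with one `update` action per login, using a Painless script that only lowers `first_seen`, plus an `upsert` document for users not yet in the index. The index name `user-first-seen`, the `first_seen` field, and epoch-millisecond timestamps are all assumptions, and the action format assumes a typeless (7.x-style) bulk API.

```python
import json

# Painless: only move first_seen backwards in time, never forwards.
FIRST_SEEN_SCRIPT = (
    "if (ctx._source.first_seen == null || params.ts < ctx._source.first_seen) "
    "{ ctx._source.first_seen = params.ts; }"
)


def bulk_upsert_body(logins, index="user-first-seen"):
    """Build an NDJSON _bulk body with one scripted upsert per (epoch_millis, userId) login."""
    lines = []
    for ts, user in logins:
        # Action line: the user id is the document _id, so each user has one document.
        lines.append(json.dumps({"update": {"_index": index, "_id": user}}))
        # Source line: run the script if the doc exists, otherwise insert the upsert doc.
        lines.append(json.dumps({
            "script": {"source": FIRST_SEEN_SCRIPT, "lang": "painless", "params": {"ts": ts}},
            "upsert": {"userId": user, "first_seen": ts},
        }))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```

Pre-aggregating each batch to the minimum timestamp per user before building the body would reduce the number of updates sent for the same document.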