How to query for "max" value?


(Jürgen kartnaller) #1

Hi all,
I'm about to use ES for a large set of data and could not figure out
how to do a query I need.

Having this set of data:

curl -XDELETE localhost:9200/data/
curl -XPUT localhost:9200/data
curl -XPUT localhost:9200/data/values/1 -d '{"name":"1","ts":"2011-06-01","count":10}'
curl -XPUT localhost:9200/data/values/2 -d '{"name":"1","ts":"2011-06-02","count":20}'
curl -XPUT localhost:9200/data/values/3 -d '{"name":"2","ts":"2011-06-02","count":25}'
curl -XPUT localhost:9200/data/values/4 -d '{"name":"2","ts":"2011-06-04","count":15}'

For every name I want to have the latest document (newest timestamp).

In this case the result should be:
'{"name":"1","ts":"2011-06-02","count":20}'
'{"name":"2","ts":"2011-06-04","count":15}'

I can't figure out how to express this as an Elasticsearch query.

I should also note that this query will run on a large dataset
with well over a billion documents.
There will be about 50M different names.

To simplify this, it may be necessary to store the "latest" documents
in a separate index or type, so that queries can run against only the
"latest" documents.
In that case I would need to duplicate every "latest" document, keeping
one document per unique name.

Is this an option, also with regard to performance?

Jürgen


(Shay Banon) #2

What you are asking for, if I understand correctly, is basically grouping on the name, and it's not implemented. Even when implemented, it's going to come with memory and performance costs. An index holding the latest docs is a good solution.
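
For illustration, here is a minimal sketch of how such a "latest" index could be maintained on the client side. It is not taken from this thread: the index name `data_latest` and the helper functions are hypothetical, and a plain dict stands in for the index so the logic can be followed without a running cluster. The idea is to use the name as the document ID, so each name has at most one document, and to skip any write whose timestamp is not newer than the one already stored.

```python
import json
from urllib.parse import quote

LATEST_INDEX = "data_latest"  # hypothetical index name, not from the thread


def latest_put_request(doc, index=LATEST_INDEX, doc_type="values"):
    """Build (path, body) for a PUT that stores doc under its name as the ID.

    Because the ID is the name, a later PUT for the same name overwrites the
    earlier document, so the index holds at most one document per name.
    """
    path = "/%s/%s/%s" % (index, doc_type, quote(doc["name"], safe=""))
    return path, json.dumps(doc, sort_keys=True)


def apply_to_latest(latest, doc):
    """Update an in-memory stand-in for the "latest" index.

    Returns True if doc replaced (or created) the entry for its name.
    ISO dates such as "2011-06-02" compare correctly as plain strings.
    """
    current = latest.get(doc["name"])
    if current is None or doc["ts"] > current["ts"]:
        latest[doc["name"]] = doc
        return True
    return False


# The four sample documents from the first post.
docs = [
    {"name": "1", "ts": "2011-06-01", "count": 10},
    {"name": "1", "ts": "2011-06-02", "count": 20},
    {"name": "2", "ts": "2011-06-02", "count": 25},
    {"name": "2", "ts": "2011-06-04", "count": 15},
]

latest = {}
for doc in docs:
    # Only documents that are newer than the stored one are written.
    if apply_to_latest(latest, doc):
        path, body = latest_put_request(doc)
        print("PUT", path, body)
```

After the loop, `latest` holds the two documents listed as the expected result in the first post. In a real setup the dict would be replaced by the second index, and the timestamp check would have to be done against what is stored there, which this sketch does not attempt.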


(Jürgen kartnaller) #3

On Sat, Jun 4, 2011 at 11:04 AM, Shay Banon shay.banon@elasticsearch.com wrote:

What you are asking for, if I understand correctly, is basically grouping
on the name, and it's not implemented. Even when implemented, it's going to
come with memory and performance costs. An index holding the latest docs is
a good solution.

Thanks for the answer, that's what I thought. I will use a separate index :)

--
http://www.sfgdornbirn.at
http://www.mcb-bregenz.at


#4

Hi Shay,

I see that your reply was more than 4 years ago. Has anything changed in ES since then?

Is there any way to implement a "point in time" query? Say my data is continuously updated, and I need to query it and get results back as they were, say, 3 or 6 hours ago. Is there any way to implement that?

Thanks,
P.

