How would you design the store model in Elasticsearch for user behavior data


(潘飞) #1

Our user behavior data looks like this (transformed to JSON):

{"uid":"user001", "action":"click", "context":
{"level":21,"ip":"222.222.222.222", "val":87}}
{"uid":"user002", "action":"click", "context":
{"level":28,"ip":"222.222.222.221","val":96}} #1
{"uid":"user002", "action":"buy", "context":
{"level":28,"ip":"222.222.222.221","val":"abc"}} #2
...

  1. here val is in a numeric format
  2. here val is provided as a string

With dynamic mapping, the "val" field is mapped to the long type when the
first document is indexed. If a later user action also has a "val" in its
context, but not in numeric format, indexing fails with an exception:

error" : "RemoteTransportException[[Spike][inet[/192.168.2.246:9300]][index]];
nested: MapperParsingException[failed to parse [context.val]]; nested:
NumberFormatException[For input string: "abc"]; ",

We can avoid this by mapping the field as a string with "index" set to
"not_analyzed"; then everything is treated as a string and we can put
anything into this field.
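
For illustration, here is a minimal sketch of what such an explicit mapping could look like, built as a Python dict so it can be serialized and sent as the body when creating the index. The type name "event" and the helper name are assumptions made for this example, not something from our setup; the string/not_analyzed settings are the ones described above for Elasticsearch 1.x.

```python
import json


def build_string_mapping(type_name="event"):
    """Return an index-creation body that maps context.val as an unanalyzed string.

    type_name is a hypothetical mapping type name used only for this sketch.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    "context": {
                        "properties": {
                            # Every value, numeric or not, is stored as one exact term.
                            "val": {"type": "string", "index": "not_analyzed"}
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_string_mapping(), indent=2))
```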

But in that case, when we want to run analytics on the "click" action,
for example to calculate the average val across all users, every "val" has
to be converted from string to long. Doing that type conversion with a
script over millions of documents is really slow. In our case, it takes
about 90s to get the result on about 100 million records.

So, is there a better way to optimize this? Thanks very much in advance!

PS:
Elasticsearch 1.2.1
9 nodes, each with 8 CPU cores and 48GB RAM (ES_HEAP_SIZE=16GB)
10 indexes, each with 5 shards and 1 replica
total docs now: 130718318
total size of docs: 65GB

--
Without learning, one does not know.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CA%2BJstLBoekO3oCOLxdc5hUGdThtzs6Uwc8mCCqxZoinhsB3hog%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(潘飞) #2

Is there any way to convert the data type without using the script mechanism?
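
One possible direction, sketched here only as an assumption and not something we have measured, is to do the conversion once on the client before indexing instead of at query time. The sketch below splits "val" into two hypothetical fields, "val_num" for numeric values and "val_str" for everything else, so that dynamic mapping never sees mixed types in one field. The field names and the function name are invented for this example.

```python
import re


def normalize_event(event):
    """Return a copy of the event with context.val split by type.

    Numbers (and strings made only of the digits 0-9) go to the hypothetical
    field context.val_num; anything else goes to context.val_str.
    """
    context = dict(event.get("context", {}))
    if "val" in context:
        val = context.pop("val")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(val, (int, float)) and not isinstance(val, bool):
            context["val_num"] = val
        elif isinstance(val, str) and re.fullmatch(r"[0-9]+", val):
            context["val_num"] = int(val)
        else:
            context["val_str"] = str(val)
    normalized = dict(event)
    normalized["context"] = context
    return normalized


if __name__ == "__main__":
    click = {"uid": "user002", "action": "click",
             "context": {"level": 28, "ip": "222.222.222.221", "val": 96}}
    buy = {"uid": "user002", "action": "buy",
           "context": {"level": 28, "ip": "222.222.222.221", "val": "abc"}}
    print(normalize_event(click))
    print(normalize_event(buy))
```

With documents shaped like this, an average over the clicks could target the numeric field directly, at the cost of reindexing the existing data.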

In reply to the message from panfei (cnweike@gmail.com), 2014-07-23 17:22 GMT+08:00.


(潘飞) #3

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "regexp": {
          "who": "[0-9]+"
        }
      }
    }
  },
  "aggs": {
    "max_who": {
      "max": {
        "script": "Double.parseDouble(_source.who)"
      }
    }
  }
}

This query is very slow ...
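
For comparison, here is a sketch of the same aggregation without a script, under the assumption that a numeric copy of the value has been indexed in its own field. The field name "who_num" is hypothetical and does not exist in our current index. A metric aggregation on a field skips documents that have no value for it, so the regexp filter should not be needed in this form.

```python
import json


def build_max_query(numeric_field):
    """Build a max aggregation that reads a numeric field directly.

    numeric_field is assumed to be mapped as a numeric type; no script
    and no per-document parsing of _source is involved.
    """
    return {
        "size": 0,
        "aggs": {
            "max_who": {
                "max": {"field": numeric_field}
            }
        }
    }


if __name__ == "__main__":
    # "who_num" is a hypothetical field name used only for this sketch.
    print(json.dumps(build_max_query("who_num"), indent=2))
```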


