How can I split the value of a field into "xxxxx" and then aggregate on that "xxxxx"?

jaxma · April 19, 2019, 5:46am

Here is a sample message:
message: send alarm success, code:195, content:{"code":195,"ip":"172.16.92.22","desc":"[dmp-custom-api] reqId: [2073d1ef3971ef2e] 查询async-hbase异常 table: fraud:general_feature_m_v2, rowKey: 861189030090467, cost: 3001 msg: Timed out after 3000ms when joining Deferred@718887266(state=PENDING, result=null, callback=type get response -> cn.jiguang.data.common.HbaseHelper$$Lambda$35/479850351@43b2d9e6 -> passthrough -> wakeup thread http-nio-11003-exec-5, errback=passthrough -> passthrough ->

How can I extract "查询async-hbase异常 table: fraud:general_feature_m_v2" from the value and aggregate on it? I need the number of documents that contain it.

gabriel_tessier · April 20, 2019, 3:55am

Hi,

Do you need to count how many documents contain the string 查询async-hbase异常 table: fraud:general_feature_m_v2, inside the message?
If so, you could consider indexing your data using a whitespace tokenizer with multi-fields.

more details in the doc:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/analysis-whitespace-tokenizer.html
and about search
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/analyzer.html
and multi fields
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/multi-fields.html

You would then have these 3 tokens:
1- 查询async-hbase异常
2- table:
3- fraud:general_feature_m_v2,

Then you can do a simple count with a query string.

This approach may be the simpler one for your case.
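
To illustrate the three tokens listed above, here is a minimal Python sketch. It only approximates the whitespace tokenizer by splitting on runs of whitespace; it does not call Elasticsearch, and the variable names are made up for the example.

```python
# Rough stand-in for a whitespace tokenizer: split on whitespace only,
# so punctuation such as ":" and "," stays attached to the tokens.
fragment = "查询async-hbase异常 table: fraud:general_feature_m_v2,"

tokens = fragment.split()

for position, token in enumerate(tokens, start=1):
    print(position, token)
```

Note that the trailing comma stays part of the third token, which matters when you write the query string.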

jaxma · April 20, 2019, 3:22pm

Thanks for your help!
I will try that!

warkolm · April 20, 2019, 9:37pm

Your other option would be to use the ingest API and split the message apart.
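
As a rough idea of what "splitting the message apart" could mean, here is a Python sketch that pulls the error text and table name out of a shortened copy of the demo message. It is not ingest pipeline syntax; the regular expression and the function name `extract_error_key` are assumptions for illustration, and an ingest processor would need an equivalent pattern of its own.

```python
import re

# Hypothetical pattern: the fixed error text, followed by the table name
# up to the next comma or whitespace.
ERROR_KEY = re.compile(r"(查询async-hbase异常 table: [^,\s]+)")


def extract_error_key(message):
    """Return the 'error text + table' part of a log message, or None."""
    match = ERROR_KEY.search(message)
    return match.group(1) if match else None


# Shortened copy of the demo message from the first post.
sample = (
    'send alarm success, code:195, content:{"code":195,"ip":"172.16.92.22",'
    '"desc":"[dmp-custom-api] reqId: [2073d1ef3971ef2e] 查询async-hbase异常 '
    'table: fraud:general_feature_m_v2, rowKey: 861189030090467, cost: 3001'
)

print(extract_error_key(sample))
```

Storing the extracted value in its own field at index time is what would make it usable as an aggregation key later.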

gabriel_tessier · April 21, 2019, 2:31am

Hi,
As @warkolm said, you can use the ingest API or Logstash.

Could you give more information about the structure of your document? As I understand from your first question, you have a "message" field holding a string separated by "," that contains something like a JSON dict, but your search is for a substring of a value contained in the message field.

Is it correct?

Also keep in mind that, depending on the structure of your document, you will not be able to parse it if it is not normalized enough: you will end up with too many fields.
You can read about it here:

jaxma · April 21, 2019, 3:13pm

After reading the docs you suggested, I still have some doubts, so let me describe this in more detail.
My message field holds the log output of my application, as in LOGGER.info(message), and it has already been indexed into Elasticsearch (changing how it is indexed is not my first choice), so it does not have a normalized structure.
Now I want to know whether I can aggregate on only the part of the message field that contains “查询async-hbase异常 table”.
My expected result would look like this:
[{"key":"查询async-hbase异常 table:xx",
"docCount":55},
{"key":"查询async-hbase异常 table:yy",
"docCount":99}]
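
The grouping behind that expected result can be sketched in plain Python. This only simulates the counting outside Elasticsearch; the messages are made up using the xx / yy placeholders, and the regular expression and the function name `count_by_error_key` are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: the fixed error text plus the table name.
ERROR_KEY = re.compile(r"查询async-hbase异常 table: [^,\s]+")


def count_by_error_key(messages):
    """Group messages by their extracted key and count each group."""
    counts = Counter()
    for message in messages:
        match = ERROR_KEY.search(message)
        if match:  # messages without the error text are skipped
            counts[match.group(0)] += 1
    # Highest count first, in the key / docCount shape of the expected result.
    return [{"key": key, "docCount": n} for key, n in counts.most_common()]


# Made-up messages, using the xx / yy placeholders.
messages = [
    "... 查询async-hbase异常 table: xx, rowKey: 1",
    "... 查询async-hbase异常 table: yy, rowKey: 2",
    "... 查询async-hbase异常 table: yy, rowKey: 3",
    "unrelated log line",
]

print(count_by_error_key(messages))
```

Inside Elasticsearch, the same grouping would need the extracted key to exist as a field of its own, which is why the message has to be split first.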

system · May 19, 2019, 3:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
