How can I split a field's value down to "xxxxx" and then aggregate on that "xxxxx"?

Here is an example:
message: send alarm success, code:195, content:{"code":195,"ip":"172.16.92.22","desc":"[dmp-custom-api] reqId: [2073d1ef3971ef2e] 查询async-hbase异常 table: fraud:general_feature_m_v2, rowKey: 861189030090467, cost: 3001 msg: Timed out after 3000ms when joining Deferred@718887266(state=PENDING, result=null, callback=type get response -> cn.jiguang.data.common.HbaseHelper$$Lambda$35/479850351@43b2d9e6 -> passthrough -> wakeup thread http-nio-11003-exec-5, errback=passthrough -> passthrough ->

How can I split the value down to "查询async-hbase异常 table: fraud:general_feature_m_v2" and aggregate on it? I need to get the number of documents that contain it.

Hi,

Do you need to count how many documents contain the string 查询async-hbase异常 table: fraud:general_feature_m_v2?
If so, you could consider indexing your data with a whitespace tokenizer, using multi-fields.

More details in the docs, on the whitespace tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/analysis-whitespace-tokenizer.html
on the analyzer used for search:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/analyzer.html
and on multi-fields:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/multi-fields.html

So you will have these 3 tokens:
1- 查询async-hbase异常
2- table:
3- fraud:general_feature_m_v2,
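
As a rough illustration of those tokens: Python's `str.split()` with no arguments also splits on runs of whitespace, so it approximates what a whitespace tokenizer would emit for this fragment. This is only a local sketch, not a call to Elasticsearch:

```python
# Fragment of the log message from the question.
fragment = "查询async-hbase异常 table: fraud:general_feature_m_v2,"

# str.split() with no arguments splits on whitespace only, so punctuation
# such as ":" and the trailing "," stays attached to each token.
tokens = fragment.split()

for position, token in enumerate(tokens, start=1):
    print(position, token)
```

Note that the trailing comma stays part of the third token, so a search against a whitespace-analyzed field would presumably need to include it as well.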

Then you can get a plain count with a query string query.

This may be the simpler way to do it. :grinning:

Thanks for your help!
I will try that! :grin:

Your other option would be to use the ingest API and split the message apart.
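
A minimal sketch of what that could look like, written as Python dicts for the two request bodies. The pipeline uses a grok processor to pull the table name into its own field, and a terms aggregation then counts documents per table. The field name `hbase_table` and the aggregation name `by_table` are made up for illustration, and the `.keyword` sub-field assumes default dynamic mapping; treat this as a starting point rather than a tested setup:

```python
import json

# Hypothetical pipeline body; it would be sent with
# PUT _ingest/pipeline/<some-name>. The field name "hbase_table" is an assumption.
pipeline = {
    "description": "Extract the HBase table name from the log message",
    "processors": [
        {
            "grok": {
                "field": "message",
                # Capture everything between "table: " and the next comma.
                "patterns": ["查询async-hbase异常 table: %{DATA:hbase_table},"],
                # Leave documents without this error text untouched.
                "ignore_failure": True,
            }
        }
    ],
}

# Terms aggregation counting documents per extracted table name. The
# ".keyword" sub-field assumes default dynamic mapping for new string fields.
search_body = {
    "size": 0,
    "aggs": {"by_table": {"terms": {"field": "hbase_table.keyword"}}},
}

print(json.dumps(pipeline, ensure_ascii=False, indent=2))
print(json.dumps(search_body, indent=2))
```

A pipeline only affects documents that are indexed through it, so documents already in the index would have to be reindexed to get the new field.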


Hi,
As @warkolm said, you can use the ingest API or Logstash.

Could you give more information about the structure of your documents? As I understand from your first question, you have a "message" field holding a string separated by "," that contains something like a JSON dict, but your search is for a substring of a value inside that message field.

Is that correct?

Also consider that, depending on the structure of your documents, you may not be able to parse them if they are not normalized enough: you would end up with too many fields.
You can read about here:

After reading the docs you suggested, I still have some doubts, so let me describe this in more detail.
My message field holds the log output of my application, e.g. LOGGER.info(message), and it has already been indexed into my Elasticsearch (changing how it is indexed is not my first choice :joy:), so it does not have a normalized structure.
Now I want to know whether I can aggregate on the message field for only the messages that contain "查询async-hbase异常 table".
My expected result would look like this:
[{"key":"查询async-hbase异常 table:xx",
"docCount":55},
{"key":"查询async-hbase异常 table:yy",
"docCount":99}]
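
To make the expected output concrete, here is a minimal Python sketch of the grouping itself, run client-side on a few made-up messages rather than inside Elasticsearch. The sample messages and the second table name are hypothetical; only the first table name comes from the log line above:

```python
import re
from collections import Counter

# Hypothetical sample messages; only the first table name comes from the thread.
messages = [
    "reqId: [a1] 查询async-hbase异常 table: fraud:general_feature_m_v2, rowKey: 1, cost: 3001",
    "reqId: [a2] 查询async-hbase异常 table: fraud:general_feature_m_v2, rowKey: 2, cost: 3002",
    "reqId: [a3] 查询async-hbase异常 table: fraud:other_table, rowKey: 3, cost: 3003",
    "reqId: [a4] request finished normally",
]

# Match the fixed error text, then capture the table name up to the next
# comma or whitespace.
PATTERN = re.compile(r"查询async-hbase异常 table: ?([^,\s]+)")

counts = Counter()
for message in messages:
    match = PATTERN.search(message)
    if match:
        counts["查询async-hbase异常 table:" + match.group(1)] += 1

# Same shape as the expected result: one bucket per distinct key.
buckets = [{"key": key, "docCount": n} for key, n in counts.most_common()]
print(buckets)
```

Messages without the error text are skipped, which mirrors the "only contain" requirement; the same extraction would have to happen at index time or in a script for Elasticsearch to do the bucketing.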
