I'd like to know a query that counts documents per URL context

nidcode · February 12, 2019, 6:22am

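Elasticsearch currently holds our web server access logs, and the URL path of each request is stored in a field named "path".

Is there a good way to count documents per path context (the first segment of the path) in a single query?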

Data
/aaaa/index.html
/aaaa/api/v1/user
/bbbb/index.html
Expected result
{
"aaaa": 2,
"bbbb": 1
}
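To pin down what is being asked, here is a minimal plain-Python sketch that runs outside Elasticsearch. It assumes that the context is the first path segment and, like the scripts in the answer below, counts a segment only when another `/` follows it. `count_by_context` is a hypothetical helper name.

```python
from collections import Counter


def count_by_context(paths):
    """Count paths by their first segment, e.g. "/aaaa/index.html" -> "aaaa"."""
    counts = Counter()
    for path in paths:
        rest = path[1:] if path.startswith("/") else path  # drop one leading "/"
        head, sep, _ = rest.partition("/")
        if head and sep:  # require a non-empty segment followed by "/"
            counts[head] += 1
    return dict(counts)


print(count_by_context(["/aaaa/index.html", "/aaaa/api/v1/user", "/bbbb/index.html"]))
# {'aaaa': 2, 'bbbb': 1}
```

Under this rule `/` and `/index.html` have no context and are not counted.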

tsgkdt · February 12, 2019, 11:28am

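I think this can be done in one of two ways:

Extract the string corresponding to the context at ingest time, store it in a separate field, and run the aggregation on that field.
Run the aggregation on the result of a script specified in the search request.

If possible, processing path at ingest time and indexing the result should give better search performance, since with the second approach the script has to run for every matching document on every search.

Both approaches use a script; focusing on the "return a substring" part below should make the idea easier to picture.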

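1. Example of extracting the context at ingest time

This uses the Script Processor of an Ingest Node.

Creating the pipeline

The script takes the string in the path field, drops the leading /, takes the characters up to the next /, and stores them in a field named path_context.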

PUT _ingest/pipeline/pipe0212
{
  "processors": [
    {
      "script": {
        "source": """
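            // ctx is the document being indexed
            def path = ctx.path;
            if (path != null) {
              if (path.startsWith('/')) {
                // drop only the leading '/'
                path = path.substring(1);
              }
              int slashIndex = path.indexOf('/');
              // set path_context only when a non-empty first segment is followed by '/'
              if (slashIndex > 0) {
                ctx.path_context = path.substring(0, slashIndex);
              }
            }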
"""
      }
    }
  ]
}
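A rough way to check the processor's logic without a cluster is a Python mirror in which a plain dict stands in for `ctx`. This is an illustrative re-implementation, not the Painless code itself, and `apply_pipe0212` is a hypothetical name. It strips only the leading `/` (the equivalent of `path.substring(1)`) and leaves documents without a context, or without `path`, unchanged.

```python
def apply_pipe0212(ctx):
    """Python mirror of the pipe0212 script processor; mutates ctx in place."""
    path = ctx.get("path")
    if path is not None:
        if path.startswith("/"):
            path = path[1:]  # like path.substring(1)
        slash_index = path.find("/")  # like path.indexOf('/'): -1 if absent
        if slash_index > 0:
            ctx["path_context"] = path[:slash_index]
    return ctx


print(apply_pipe0212({"path": "/aaaa/index.html"}))
# {'path': '/aaaa/index.html', 'path_context': 'aaaa'}
print(apply_pipe0212({"path": "/index.html"}))
# {'path': '/index.html'}
```

Because nothing is written when no context is found, such documents simply have no `path_context` field and will not appear in a terms aggregation on it.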

Indexing example

Index the documents, specifying the pipeline created above.

PUT forum0212/_doc/1?pipeline=pipe0212
{
  "title": "pathで最初の方の文字がほしい",
  "path": "/aaaa/index.html"
}
PUT forum0212/_doc/2?pipeline=pipe0212
{
  "title": "pathで最初の方の文字がほしいです",
  "path": "/bbbb/index.html"
}

Result

GET forum0212/_doc/1

The response shows that aaaa has been extracted into the path_context field, so running an aggregation on this field should give the expected result.

{
  "_index" : "forum0212",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "path" : "/aaaa/index.html",
    "title" : "pathで最初の方の文字がほしい",
    "path_context" : "aaaa"
  }
}
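The aggregation on `path_context` mentioned above could look like the following request body, built here as a Python dict and printed as JSON for Kibana Console. The field name `path_context.keyword` is an assumption: under default dynamic mapping a new string field is indexed as `text` with a `keyword` sub-field, and a terms aggregation needs the keyword one (text fields have fielddata disabled by default). If `path_context` is explicitly mapped as `keyword`, pass `path_context` instead. `context_terms_request` is a hypothetical helper.

```python
import json


def context_terms_request(field="path_context.keyword", size=10):
    """Search body that counts documents per value of `field` (hits are not returned)."""
    return {
        "size": 0,
        "aggs": {
            "contextaggs": {
                "terms": {"field": field, "size": size},
            },
        },
    }


# The printed JSON can be pasted below "GET forum0212/_search" in Kibana Console.
print(json.dumps(context_terms_request(), indent=2))
```

Documents without `path_context` are not counted; the terms aggregation's `missing` parameter can put them into a bucket of their own if needed.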

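2. Example of running the aggregation on a script's result

You are probably already using a terms aggregation; instead of a field, give it a script that extracts the target string in the same way as the pipeline above. The script reads path.keyword, the keyword sub-field that default dynamic mapping adds to a string field such as path.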

GET forum0212/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "contextaggs": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": """
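            // documents without a path value go to the "" bucket
            if (doc['path.keyword'].size() == 0) {
              return "";
            }
            def path = doc['path.keyword'].value;
            if (path.startsWith('/')) {
              // drop only the leading '/'
              path = path.substring(1);
            }
            int slashIndex = path.indexOf('/');
            if (slashIndex > 0) {
              return path.substring(0, slashIndex);
            }
            // paths without a context also go to the "" bucket
            return "";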
"""
        },
        "size": 10
      }
    }
  }
}
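One consequence of `return ""` is that documents without a context are not dropped: they are counted in a bucket whose key is the empty string. The following Python simulation shows the effect; it is not how Elasticsearch executes the aggregation, and the sample paths, the use of `None` for a document without a path, and the helper names are assumptions. Buckets are ordered by count and then by key.

```python
def script_value(path):
    """Mirror of the aggregation script: first segment, or "" when there is none."""
    if path is None:  # stands in for a document without a path value
        return ""
    if path.startswith("/"):
        path = path[1:]
    slash_index = path.find("/")
    return path[:slash_index] if slash_index > 0 else ""


def simulate_terms(paths, size=10):
    """Count script values; order buckets by doc_count desc, then key asc."""
    counts = {}
    for p in paths:
        key = script_value(p)
        counts[key] = counts.get(key, 0) + 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]


sample = ["/aaaa/index.html", "/aaaa/api/v1/user", "/bbbb/index.html", "/", None]
for bucket in simulate_terms(sample):
    print(bucket)
# {'key': '', 'doc_count': 2}
# {'key': 'aaaa', 'doc_count': 2}
# {'key': 'bbbb', 'doc_count': 1}
```

The empty-string bucket also occupies one of the `size` slots, so it is worth ignoring it on the client side.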

Result

The contextaggs aggregation returns the expected aaaa and bbbb buckets.

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "contextaggs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "aaaa",
          "doc_count" : 1
        },
        {
          "key" : "bbbb",
          "doc_count" : 1
        }
      ]
    }
  }
}
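To get the shape asked for in the question, the buckets can be flattened into a `{key: doc_count}` map on the client side. Below is a minimal Python sketch over the relevant part of the response above; `buckets_to_counts` is a hypothetical helper, and it drops the `""` bucket that the script produces for paths without a context. The counts are 1 each here because only two documents were indexed in this example; `/aaaa/api/v1/user` from the question was not.

```python
import json

# Only the part of the response above that is read here.
RESPONSE = json.loads("""
{
  "aggregations": {
    "contextaggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "aaaa", "doc_count": 1},
        {"key": "bbbb", "doc_count": 1}
      ]
    }
  }
}
""")


def buckets_to_counts(response, agg_name="contextaggs"):
    """Turn terms-aggregation buckets into {key: doc_count}, skipping the "" bucket."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets if b["key"] != ""}


print(buckets_to_counts(RESPONSE))  # {'aaaa': 1, 'bbbb': 1}
```

A non-zero `sum_other_doc_count` would mean that some contexts did not fit into the requested `size` (10 here) and were left out of the buckets.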

nidcode · February 15, 2019, 6:22am

Thank you, tsgkdt!

Method 2 gave me the counts I needed.
It did not work when path was just '/', so I changed it as follows.
Sharing it just in case.

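// documents without a path value go to the "" bucket
if (doc['path.keyword'].size() == 0) {
  return "";
}
def path = doc['path.keyword'].value;
// the length check keeps a bare '/' out of the substring call
if (path.length() > 1 && path.startsWith('/')) {
  // drop only the leading '/'
  path = path.substring(1);
}
int slashIndex = path.indexOf('/');
if (slashIndex > 0) {
  return path.substring(0, slashIndex);
}
return "";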
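The failure for `/` comes from `String.substring` in Painless, which follows Java semantics: `substring(1, path.length()-1)` on `"/"` becomes `substring(1, 0)`, and a begin index greater than the end index throws. Python slices never throw, so this sketch adds a hypothetical `java_substring` helper that mimics that check. It compares the original call with a variant that uses `substring(1)` plus the `path.length() > 1` guard; the variant also keeps the context of a path with a trailing slash such as `/aaaa/`, which the original loses because it drops the last character.

```python
def java_substring(s, begin, end=None):
    """Mimic java.lang.String.substring: invalid indexes raise instead of clamping."""
    if end is None:
        end = len(s)
    if begin < 0 or end > len(s) or begin > end:
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]


def context_original(path):
    """Original logic: substring(1, length()-1) also drops the last character."""
    if path.startswith("/"):
        path = java_substring(path, 1, len(path) - 1)
    slash_index = path.find("/")
    return path[:slash_index] if slash_index > 0 else ""


def context_guarded(path):
    """Length guard plus substring(1): only the leading "/" is removed."""
    if len(path) > 1 and path.startswith("/"):
        path = java_substring(path, 1)
    slash_index = path.find("/")
    return path[:slash_index] if slash_index > 0 else ""


for p in ["/", "/aaaa/", "/aaaa/index.html"]:
    try:
        original = repr(context_original(p))
    except IndexError as err:
        original = f"error ({err})"
    print(f"{p!r}: original={original}, guarded={context_guarded(p)!r}")
# '/': original=error (begin 1, end 0, length 1), guarded=''
# '/aaaa/': original='', guarded='aaaa'
# '/aaaa/index.html': original='aaaa', guarded='aaaa'
```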

system · March 15, 2019, 6:22am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
