Hashtag searches and Japanese full text search

hari-ram-s · November 16, 2023, 11:57am

We are trying to support both hashtag searches and Japanese full-text search on our data. Each works on its own, but when we combine the two configs, the result doesn't behave as expected.
I found a similar thread, "How can I correctly index @screen_name, #hashtag and url in Japanese text?", but it has no reply.
PS: Our data will be stored in multiple languages.

Below are the configs that we used.

Hashtag search:

{
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "hashtag_filter": {
                        "type": "word_delimiter",
                        "type_table": [
                            "# => ALPHA"
                        ]
                    }
                },
                "analyzer": {
                    "hashtag_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": [
                            "lowercase",
                            "hashtag_filter"
                        ]
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "caption": {
                "type": "text",
                "analyzer": "hashtag_analyzer"
            }
        }
    }
}

CJK full text search:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "kuromoji_analyzer": {
                        "char_filter": [
                            "icu_normalizer"
                        ],
                        "tokenizer": "kuromoji_tokenizer",
                        "filter": [
                            "kuromoji_baseform",
                            "kuromoji_part_of_speech",
                            "cjk_width",
                            "ja_stop",
                            "kuromoji_stemmer",
                            "lowercase"
                        ]
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "caption": {
                "type": "text",
                "analyzer": "kuromoji_analyzer"
            }
        }
    }
}
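
To see which tokens the Kuromoji side produces for text containing a hashtag, the `_analyze` API can be called with just the tokenizer. The body below is a minimal sketch to send as `POST /_analyze`; it assumes the `analysis-kuromoji` plugin is installed, and the sample text is made up for illustration.

```json
{
  "tokenizer": "kuromoji_tokenizer",
  "text": "#東京 に行きました"
}
```

If `#` is missing from the returned tokens, the tokenizer has already dropped it. `kuromoji_tokenizer` has a `discard_punctuation` option that defaults to `true`, so this is worth checking before looking at any token filter.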

The two configs combined, which doesn't seem to work:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "hashtag_filter": {
            "type": "word_delimiter",
            "type_table": [
              "# => ALPHA"
            ]
          }
        },
        "analyzer": {
          "kuromoji_hashtag_analyzer": {
            "char_filter": [
              "icu_normalizer"
            ],
            "type": "custom",
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase",
              "hashtag_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "caption": {
        "type": "text",
        "analyzer": "kuromoji_hashtag_analyzer"
      }
    }
  }
}
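
A possible reason the combined analyzer fails is that `kuromoji_tokenizer` discards punctuation by default, so `hashtag_filter` may never see the `#`. One approach that might help, offered as an untested sketch rather than a confirmed fix, is to keep the two analyzers separate and index `caption` twice with a multi-field. The sub-field name `hashtags` is an assumption chosen for illustration; everything else is taken from the two working configs above.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "hashtag_filter": {
            "type": "word_delimiter",
            "type_table": ["# => ALPHA"]
          }
        },
        "analyzer": {
          "hashtag_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "hashtag_filter"]
          },
          "kuromoji_analyzer": {
            "type": "custom",
            "char_filter": ["icu_normalizer"],
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "caption": {
        "type": "text",
        "analyzer": "kuromoji_analyzer",
        "fields": {
          "hashtags": {
            "type": "text",
            "analyzer": "hashtag_analyzer"
          }
        }
      }
    }
  }
}
```

With this mapping, full-text queries would target `caption` and hashtag queries would target `caption.hashtags`. One caveat: the `whitespace` tokenizer only splits on whitespace, so a hashtag written directly after Japanese text with no space would likely stay glued to the preceding characters.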

hari-ram-s · November 21, 2023, 6:11am

Any help here?

system · December 19, 2023, 6:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
