Sorry about the Korean values, which might be confusing, but they should not be too hard to follow as you read this.
I am using an index named sliced_data, and it has millions of documents in it.
I am using Kibana and have set Mecab_Ko (a Korean tokenizer/analyzer) as the analyzer.
The analyzer is working fine. When I run the command below,
POST /sliced_data/_analyze
{
  "analyzer": "korean",
  "text": "꽃을든남자"
}
these are the results:
{
  "tokens": [
    {
      "token": "꽃을",
      "start_offset": 0,
      "end_offset": 2,
      "type": "EOJEOL",
      "position": 0
    },
    {
      "token": "꽃",
      "start_offset": 0,
      "end_offset": 1,
      "type": "NNG",
      "position": 0
    },
    {
      "token": "든",
      "start_offset": 2,
      "end_offset": 3,
      "type": "INFLECT",
      "position": 1
    },
    {
      "token": "들/VV",
      "start_offset": 2,
      "end_offset": 3,
      "type": "VV",
      "position": 1
    },
    {
      "token": "남자",
      "start_offset": 3,
      "end_offset": 5,
      "type": "NNG",
      "position": 2
    }
  ]
}
I want to collect only the tokens that have "NNG" as the value of "type".
So this means that I have to _analyze millions of texts to get the result I want.
It would take a long time to run the query millions of times.
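To make the problem concrete, the one-text-at-a-time approach would look roughly like the sketch below (Python, using the requests library). This is only a sketch: the http://localhost:9200 address is just a placeholder for my cluster.

import requests

ES = "http://localhost:9200"  # placeholder address for the cluster

def nng_tokens(text):
    """Run _analyze on a single string and keep only the NNG tokens."""
    resp = requests.post(
        f"{ES}/sliced_data/_analyze",
        json={"analyzer": "korean", "text": text},
    )
    resp.raise_for_status()
    return [t["token"] for t in resp.json()["tokens"] if t["type"] == "NNG"]

print(nng_tokens("꽃을든남자"))  # ['꽃', '남자']

Doing this once per document over millions of documents is exactly what I want to avoid.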
Is there any way that Elasticsearch provides to _analyze multiple documents at once?
I found a way to analyze an array of texts, like the request below, but it would be hard to paste in all the texts by hand since I have millions of documents (a scripted sketch of this batching idea follows the request).
POST /sliced_data/_analyze
{
  "analyzer": "korean",
  "text": ["꽃을 든 남자", "초보개발자", "blashhs", "blahblah", "blahblahblahblah"]
}
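Scripting that batching would look roughly like the sketch below: pull pages of documents out of sliced_data with the scroll API and send each page to the array form of _analyze. This is only a sketch; the field name "title", the page size, and the localhost address are assumptions, not my real setup.

import requests

ES = "http://localhost:9200"  # placeholder address for the cluster

def iter_text_batches(field="title", size=500):
    """Yield one list of field values per scroll page of sliced_data."""
    resp = requests.post(
        f"{ES}/sliced_data/_search?scroll=2m",
        json={"size": size, "_source": [field]},
    ).json()
    while resp["hits"]["hits"]:
        yield [hit["_source"][field] for hit in resp["hits"]["hits"]]
        resp = requests.post(
            f"{ES}/_search/scroll",
            json={"scroll": "2m", "scroll_id": resp["_scroll_id"]},
        ).json()

collected = []
for batch in iter_text_batches():
    result = requests.post(
        f"{ES}/sliced_data/_analyze",
        json={"analyzer": "korean", "text": batch},
    ).json()
    collected.extend(t["token"] for t in result["tokens"] if t["type"] == "NNG")

But this still means a lot of round trips and scripting outside Elasticsearch, so I would prefer something Elasticsearch provides natively.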
Is there a good solution??
Thank you.