Is there a processor that will do shingles or can I make a custom one somehow?
In the pipeline below, I split the title on whitespace, but I'd also like to combine the words the way a shingle analyzer would:
PUT _ingest/pipeline/split
{
  "processors": [
    {
      "split": {
        "field": "title",
        "target_field": "title_suggest.input",
        "separator": "\\s+"
      }
    }
  ]
}
Example:
"Senior Business Developer" needs a suggestion field with these terms:
Senior Business Developer
Business Developer
Developer
Any ideas are appreciated, thanks!
Well, I just created a script to do it. It's very basic, but here it is:
PUT _ingest/pipeline/script
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // Skip documents that have no title.
          if (!ctx.containsKey('title')) { return; }
          // Split on single spaces; assumes exactly one space between words.
          def title_words = ctx['title'].splitOnToken(' ');
          def title_suggest = [];
          // For each start word i, emit every run of consecutive words beginning at i.
          for (def i = 0; i < title_words.length; i++) {
            def shingle = title_words[i];
            title_suggest.add(shingle);
            for (def j = i + 1; j < title_words.length; j++) {
              shingle = shingle + ' ' + title_words[j];
              title_suggest.add(shingle);
            }
          }
          ctx['title_suggest'] = title_suggest;
        """
      }
    }
  ]
}
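To sanity-check the loop outside Elasticsearch, here is a minimal Python sketch that mirrors the Painless script line for line. The function name `all_shingles` is only an illustration, not part of any Elasticsearch API, and like the script it assumes words are separated by exactly one space.

```python
def all_shingles(title):
    """Return every run of consecutive words, in the order the Painless loop emits them."""
    words = title.split(" ")  # mirrors splitOnToken(' ')
    shingles = []
    for i in range(len(words)):
        # Start a new shingle at word i, then keep extending it to the right.
        shingle = words[i]
        shingles.append(shingle)
        for j in range(i + 1, len(words)):
            shingle = shingle + " " + words[j]
            shingles.append(shingle)
    return shingles


print(all_shingles("Senior Business Developer"))
# ['Senior', 'Senior Business', 'Senior Business Developer',
#  'Business', 'Business Developer', 'Developer']
```

A title of n words produces n(n+1)/2 entries this way, so long titles grow the suggestion field quickly.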
Usage:
PUT /item
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "suggest_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "title_suggest": {
        "type": "completion",
        "analyzer": "suggest_analyzer"
      }
    }
  }
}
PUT /item/_doc/1?pipeline=script
{
  "title": "Diabetes Mellitus Type 1"
}
Result:
GET /item/_doc/1
{
  "_index" : "item",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 24,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title_suggest" : [
      "Diabetes",
      "Diabetes Mellitus",
      "Diabetes Mellitus Type",
      "Diabetes Mellitus Type 1",
      "Mellitus",
      "Mellitus Type",
      "Mellitus Type 1",
      "Type",
      "Type 1",
      "1"
    ],
    "title" : "Diabetes Mellitus Type 1"
  }
}
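This result contains every run of consecutive words, which is more than the original question asked for: its example lists only the phrases that run to the end of the title ("Senior Business Developer", "Business Developer", "Developer"). If only those are wanted, the inner loop is unnecessary. Below is a rough Python sketch of that suffix-only variant, meant for checking the idea outside Elasticsearch. `suffix_shingles` is a hypothetical helper name, and splitting on any whitespace run is an assumption borrowed from the `\\s+` separator in the question.

```python
def suffix_shingles(title):
    """Return one phrase per word: that word plus everything after it."""
    words = title.split()  # splits on runs of whitespace and ignores leading/trailing space
    return [" ".join(words[i:]) for i in range(len(words))]


print(suffix_shingles("Senior Business Developer"))
# ['Senior Business Developer', 'Business Developer', 'Developer']
```

This yields n entries for an n-word title instead of n(n+1)/2. The same change in the Painless script would mean building each entry from word i through the last word and adding only that one string.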
Note: It sucks that I can't just use the built-in shingle analyzer to break up the text into shingles and then insert that into another field.