POST analyzed, Index

Hi,
Sorry for this basic question, but I can't make any sense of it.

I have an analyzer and want it to analyze the data I index.
But when I do this:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "This is a test sentence"
}
GET my_index/_search?q=*

I get this:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
Why does my query not return anything?

_analyze just analyzes the text and does not index anything.
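
For example, the response is only the token stream the analyzer would produce; nothing is written to the index. A sketch, assuming my_analyzer tokenizes roughly like the standard analyzer:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "This is a test sentence"
}

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    ... one entry per token: "is", "a", "test", "sentence" ...
  ]
}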

Thanks, @dadoonet.
But how can I index something while analyzing it beforehand?
I don't see the point of the analyze function if the result isn't stored.

I mean, when I do this, I can analyze, but the analyzer doesn't work at indexing time:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "field" : "this_field",
  "text": "This is a test sentence"
}

To test how your analyzer works before actually indexing stuff and potentially messing up your index?

@val Thanks, I didn't think of that, since I just test whatever I want on a throwaway index I create beforehand.

But do you have an example of a POST request that analyzes and indexes something?
It would be very helpful.

If you want to index your data, you can simply POST/PUT it into your index, provided that this_field is properly mapped with your analyzer.

So first create your index with the mapping:

PUT my-index
{
    "mappings": {
        "doc": {
            "properties": {
               "this_field": {
                   "type": "text",
                   "analyzer": "my_analyzer"
               }
            }
        }
    }
}

And then index your data:

PUT my-index/doc/1
{
    "this_field": "This is a test sentence"
}
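
Once the document is in, a quick sanity check is a match query on the analyzed field. A minimal sketch; it assumes my_analyzer emits a token for "test":

GET my-index/_search
{
  "query": {
    "match": {
      "this_field": "test"
    }
  }
}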

Thanks again @val, but I can't seem to make it work.
Could I show it to you?
I define a char filter to clean a phone number and use the keyword tokenizer to keep the whole entry as a single term.

    PUT phone
    {
      "settings": {
        "analysis": {
           "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                    "( => ",
                    ") => ",
                    ", => ",
                    ". => ",
                    "; => ",
                    "\\u0020 => ",
                    "+ => 00"
              ]
            }
          },
          "analyzer": {
            "one": {
              "tokenizer": "keyword",
              "char_filter": [
                "my_char_filter"]
            }
          }
        }
      },
      "mapping": {
        "_doc": {
          "properties": {
            "phone_number": {
              "type": "text",
              "analyzer": "one"
            }
          }
        }
      }
    }

PUT /phone/doc/1
{         
  "phone_number": "077 , 1.436;25 "
}

With this, the number is indexed, but no filtering has been applied to it.

The goal would be to index "077143625" instead of "077 , 1.436;25 ".

PS: \u0020 stands for the space character.

There's a much simpler way to do it, using a pattern_replace character filter that removes all non-digit characters. Try it out:

            "digit_only": {
                "type": "pattern_replace",
                "pattern": "\\D+",
                "replacement": ""
            },
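
For a quick check without touching an index, the filter can be passed inline to _analyze. A sketch; the expected single token is 077143625:

POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "\\D+",
      "replacement": ""
    }
  ],
  "text": "077 , 1.436;25 "
}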

Also note that the value you have in your source will never be changed, i.e. the source will still contain 077 , 1.436;25 even though 077143625 is indexed.
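
In other words, once the analyzer is actually applied at index time, a match query on the cleaned token should find the document, while the hit's _source still shows the raw value. A sketch, assuming the phone index and field above:

GET phone/_search
{
  "query": {
    "match": {
      "phone_number": "077143625"
    }
  }
}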

Thanks @val, but I prefer my filter, since it allows for exceptions like this rule: "+ => 00".
But still, when I run this:

PUT /phone/doc/1
{         
  "phone_number": "0777 , 1.436;25 "
}

I manage to index something unanalyzed, even though the field is mapped with the correct analyzer.
And when I run this:

POST /phone/_analyze
{         
  "analyzer":"one",
  "text": "0777 , 1.436;25 "
}

It returns an analyzed result, but nothing is indexed.

I'm sorry to keep bothering you.

Fair enough

At this point, please show your index settings and mappings, i.e. what you get when running:

GET phone
{
  "phone": {
    "aliases": {},
    "mappings": {
      "doc": {
        "properties": {
          "phone_number": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "phone",
        "creation_date": "***",
        "analysis": {
          "analyzer": {
            "one": {
              "char_filter": [
                "my_char_filter"
              ],
              "tokenizer": "two"
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                "( => ",
                ") => ",
                ", => ",
                ". => ",
                "; => ",
                "\\u0020 => ",
                "+ => 00"
              ]
            }
          },
          "tokenizer": {
            "two": {
              "type": "keyword",
              "min_gram": "3",
              "max_gram": "4"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "c-XyKl__Q9GTF2bkNqdgbQ",
        "version": {
          "created": "6020299"
        }
      }
    }
  }
}

Here's the whole thing. I had just added the ngram tokenizer, but that's beside the point.

The phone_number field doesn't have any analyzer set, which could explain the problem.

Change your mapping to this instead:

      "phone_number": {
        "type": "text",
        "analyzer": "one",                   <--- add this line
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }

Please ignore the previous message; I messed up. I was switching between the ngram tokenizer and the keyword tokenizer to keep things simple for you to read, and I ended up posting a mix of both, which doesn't make sense.
Here's the real thing:

{
  "phone": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "phone",
        "creation_date": "***",
        "analysis": {
          "analyzer": {
            "one": {
              "char_filter": [
                "my_char_filter"
              ],
              "tokenizer": "two"
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                "( => ",
                ") => ",
                ", => ",
                ". => ",
                "; => ",
                "\\u0020 => ",
                "+ => 00"
              ]
            }
          },
          "tokenizer": {
            "two": {
              "type": "ngram",
              "min_gram": "3",
              "max_gram": "4"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "a6xJMLUOQAeBbI7wFKQYcQ",
        "version": {
          "created": "6020299"
        }
      }
    }
  }
}

I notice that the mappings section is empty, and if I send this PUT request:

PUT /phone/doc/1
{         
  "phone_number": "0777 , 1.436;25 "
}

then GET phone returns a mapping that was added dynamically:

{
  "phone": {
    "aliases": {},
    "mappings": {
      "doc": {
        "properties": {
          "phone_number": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "phone",
        "creation_date": "***",
        "analysis": {
          "analyzer": {
            "one": {
              "char_filter": [
                "my_char_filter"
              ],
              "tokenizer": "two"
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                "( => ",
                ") => ",
                ", => ",
                ". => ",
                "; => ",
                "\\u0020 => ",
                "+ => 00"
              ]
            }
          },
          "tokenizer": {
            "two": {
              "type": "ngram",
              "min_gram": "3",
              "max_gram": "4"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "a6xJMLUOQAeBbI7wFKQYcQ",
        "version": {
          "created": "6020299"
        }
      }
    }
  }
}

Thanks again for all of your answers.

There are no mappings in the first code snippet, and in the third one there's still no analyzer set on the phone_number field. You need to set the mapping yourself; if you let ES create it for you, there's no way it can know to apply your analyzer to the field.

Yeah, I saw that no mapping was picked up when I ran GET phone, but one got added nevertheless. I didn't let ES do the mapping for me; I just saw that it ignored my mapping and put one in by itself.

Anyway, I split my index creation into two parts: one with the settings, containing the char_filter and the tokenizer that make up the analyzer, and one with the mapping (in that order; it doesn't work the other way around, since the mapping would reference an analyzer that isn't declared yet).

And the GET phone output is good this time; it shows my mapping. I don't know why it didn't appear earlier.
But still, my data won't get analyzed.
Here are the split requests:

PUT phone
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "( => ",
            ") => ",
            ", => ",
            ". => ",
            "; => ",
            "\\u0020 => ",
            "+ => 00"
          ]
        }
      },
      "tokenizer": {
        "two": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "one": {
          "tokenizer": "two",
          "char_filter": [
            "my_char_filter"
          ]
        }
      }
    }
  }
}
PUT phone/_mapping/_doc
{
  "properties": {
    "phone_number": {
      "type": "text",
      "analyzer": "one"
    },
    "favoris": {
      "type": "boolean"
    }
  }
}

And the response to the GET phone command:

{
  "phone": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "properties": {
          "favoris": {
            "type": "boolean"
          },
          "phone_number": {
            "type": "text",
            "analyzer": "one"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "phone",
        "creation_date": "1522240891958",
        "analysis": {
          "analyzer": {
            "one": {
              "char_filter": [
                "my_char_filter"
              ],
              "tokenizer": "two"
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "mapping",
              "mappings": [
                "( => ",
                ") => ",
                ", => ",
                ". => ",
                "; => ",
                "\\u0020 => ",
                "+ => 00"
              ]
            }
          },
          "tokenizer": {
            "two": {
              "type": "ngram",
              "min_gram": "3",
              "max_gram": "4"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "tEKKrMe7Qu2TKXlPzzCm-Q",
        "version": {
          "created": "6020299"
        }
      }
    }
  }
}

I don't think I'm far off; I've probably just missed a place where I should declare things differently.

After modifying the mapping to include the analyzer, you need to reindex your document in order to analyze it again using the new analyzer.
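
For example, simply PUT the same document again, or reindex everything in place; both re-run analysis with the current mapping (a sketch):

PUT phone/_doc/1
{
  "phone_number": "0777 , 1.436;25 "
}

POST phone/_update_by_query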


I think it works! :grinning:

So, for posterity: splitting the mapping and the settings into two separate requests made it work.
Wrong:

PUT phone
{
  "mapping": {
    "_doc": {
      "properties": {
        "phone_number": {
          "type": "text",
          "analyzer": "one"
        },
        "favoris": {
          "type": "boolean"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "( => ",
            ") => ",
            ", => ",
            ". => ",
            "; => ",
            "\\u0020 => ",
            "+ => 00"
          ]
        }
      },
      "tokenizer": {
        "two": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "one": {
          "tokenizer": "two",
          "char_filter": [
            "my_char_filter"
          ]
        }
      }
    }
  }
}

Good:

PUT phone
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "( => ",
            ") => ",
            ", => ",
            ". => ",
            "; => ",
            "\\u0020 => ",
            "+ => 00"
          ]
        }
      },
      "tokenizer": {
        "two": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "one": {
          "tokenizer": "two",
          "char_filter": [
            "my_char_filter"
          ]
        }
      }
    }
  }
}
PUT phone/_mapping/_doc
{
  "properties": {
    "phone_number": {
      "type": "text",
      "analyzer": "one"
    },
    "favoris": {
      "type": "boolean"
    }
  }
}

So when I run:

PUT /phone/_doc/1
{         
  "phone_number": "0777 , 1.436;25 "
}

And then:

GET phone/_search?q=07771

It returned the document with the original number "0777 , 1.436;25 ".
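
For the record, you can also double-check which tokens actually got indexed with the field-aware form of _analyze; the grams below assume the min_gram 3 / max_gram 4 settings from the mapping:

POST phone/_analyze
{
  "field": "phone_number",
  "text": "0777 , 1.436;25 "
}

This returns the 3- and 4-character grams of 0777143625, such as 077, 0777, 777, 7771, and so on.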

Thanks @val for the help and for showing me how to declare the mapping properly!

Awesome, glad it worked. You can make it work in a single call, though; you just must not index a document before installing the mapping. You had a typo: mapping should read mappings.

This would work:

PUT phone
{
  "mappings": {                     <--- there was a typo here in your last post
    "_doc": {
      "properties": {
        "phone_number": {
          "type": "text",
          "analyzer": "one"
        },
        "favoris": {
          "type": "boolean"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "( => ",
            ") => ",
            ", => ",
            ". => ",
            "; => ",
            "\\u0020 => ",
            "+ => 00"
          ]
        }
      },
      "tokenizer": {
        "two": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "one": {
          "tokenizer": "two",
          "char_filter": [
            "my_char_filter"
          ]
        }
      }
    }
  }
}

Then just index your document:

PUT /phone/_doc/1
{         
  "phone_number": "0777 , 1.436;25 "
}
