PHP client: Check input before indexing

I am trying to use the PHP client to check if the user's input already exists before indexing it. Here is the code I tried.

My idea was:

  • Check if the variables are set (I have four required fields: titel, thema, type and description)
  • If yes, search in the ES index if they already exist
  • If they exist, show a message (Should I use do/while instead of if/else?)
  • If they don't, index the input and show a message

The flow seems entirely reasonable. You may run into trouble with exact vs. partial matches to the search request, though.

E.g. if someone searches for "foo bar baz" in the description field, that will match any document containing "foo", "bar", or "baz", since you are using a match query. You may have multiple matches to deal with.

Instead, you may want to use a term query, since that skips analysis and performs an exact match against the indexed terms. But that also means that case sensitivity matters, as well as spacing and special characters. You could also try using phrase searches, but they have edge cases too.
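To make the difference concrete, here is a minimal sketch of the two request bodies as plain PHP arrays. The helper names, the index name `my_index`, and the `description.keyword` sub-field are assumptions for illustration (a `.keyword` sub-field only exists if your mapping defines one, as the default dynamic mapping for strings usually does); adjust them to your own mapping.

```php
<?php
/**
 * Full-text query: the input is analyzed, and by default a document
 * matches if it contains any one of the resulting tokens.
 */
function matchQuery(string $field, string $text): array
{
    return ['query' => ['match' => [$field => $text]]];
}

/**
 * Exact query: the input is not analyzed and is compared as-is
 * against the indexed terms, so case and spacing matter.
 */
function termQuery(string $field, string $value): array
{
    return ['query' => ['term' => [$field => $value]]];
}

// Hypothetical usage with an already configured $client:
// $response = $client->search([
//     'index' => 'my_index',
//     'body'  => termQuery('description.keyword', 'foo bar baz'),
// ]);
// $exists = count($response['hits']['hits']) > 0;
```

The term version is only meaningful against a field that is not analyzed; against an analyzed text field it would be compared with individual lowercased tokens and usually find nothing.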

Basically, search is a bit more nuanced than just a database table lookup, since you need to deal with partial matches.

Perhaps show the results to the user and ask if one of the matching docs was their input? If not, they can add it to the index. That saves you the trouble of trying to determine how well partial matches match the input.


Thank you so much for your answer! I am still at the beginning of the whole process, so I will take note of your hints and work with them. The problem right now is that the search part doesn't notice if a document already exists.

E.g.: I used the word "test" for title and description, plus type1/thema1, and added the document to the index. I got the message that the document had been added, along with its ID. When I used exactly the same input and pressed add again, I got the same response.

I made a few changes to the code as I am trying to figure it out.

EDIT: I managed to make it work! I will try to improve the search query as you suggested. Any further suggestions are much appreciated :)

Ah, good to hear you got it working :)

No particular advice, just play around with analyzers and get a feel for how they tokenize/transform the text. You may want to do a combination of exact match (term queries), phrase matching (match_phrase or phrase queries) and partial matching (match) to suit your needs.
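One way such a combination could look is a bool query with three should clauses, so that exact matches score highest, phrase matches next, and partial matches last. This is only a sketch: the function name, the boost values, and the `.keyword` sub-field are assumptions, not something taken from your mapping.

```php
<?php
/**
 * Builds a search body that combines exact, phrase and partial matching
 * on one field. Boost values are arbitrary and only express the ordering.
 */
function duplicateCandidatesQuery(string $field, string $text): array
{
    return [
        'query' => [
            'bool' => [
                'should' => [
                    // Exact match on the non-analyzed sub-field (assumed to exist)
                    ['term' => [$field . '.keyword' => ['value' => $text, 'boost' => 3.0]]],
                    // All tokens present, in the same order
                    ['match_phrase' => [$field => ['query' => $text, 'boost' => 2.0]]],
                    // Any token present
                    ['match' => [$field => ['query' => $text]]],
                ],
                // At least one of the clauses has to match
                'minimum_should_match' => 1,
            ],
        ],
    ];
}

// Hypothetical usage with an already configured $client:
// $response = $client->search([
//     'index' => 'my_index',
//     'body'  => duplicateCandidatesQuery('description', 'foo bar baz'),
// ]);
```

The hits come back ordered by score, so the most likely duplicates are at the top of the list you show to the user.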

You could also structure your document IDs so that they are deterministic, and use simple GETs instead of searches to fetch the documents by ID. May or may not be a possibility for your system.
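A rough sketch of what a deterministic ID could look like: hash the normalised required fields, so the same input always maps to the same ID. The function name and the normalisation rules (trim plus lowercase) are assumptions about what should count as "the same" document; the field names are the four from the question.

```php
<?php
/**
 * Derives a stable document ID from the required fields.
 * Identical input (ignoring case and surrounding whitespace) gives the same ID.
 */
function documentId(array $doc, array $fields = ['titel', 'thema', 'type', 'description']): string
{
    $parts = [];
    foreach ($fields as $field) {
        // strtolower() only folds ASCII; mb_strtolower() would be the better
        // choice for umlauts etc. if the mbstring extension is available.
        $parts[] = strtolower(trim((string) ($doc[$field] ?? '')));
    }
    // Unit separator keeps ['ab', 'c'] and ['a', 'bc'] from colliding
    return sha1(implode("\x1f", $parts));
}

// Hypothetical usage with an already configured $client:
// $id = documentId($input);
// $client->index(['index' => 'my_index', 'id' => $id, 'body' => $input]);
```

With IDs like this, a lookup by ID (for example the client's `exists()` or `get()` call, whose return types depend on the client version) replaces the search. Indexing the same input twice would also just overwrite the same document instead of creating a duplicate.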


Will do! Thank you for your help :)
