Modelling data and query

Hello there,

I'm pretty new to the ES stack and I'm not quite sure how to model my problem. Right now there seems to be an endless number of options for modelling the queries and the data.

The case is quite simple: I have a bunch of documents, each a piece of plain text. I want to find the phrase (e.g. a sentence) within all documents that best matches the query. I also have some additional keyword information at the document level, which I haven't fully considered yet.

In broad terms: I would like to find the best phrases, with the context of the document taken into account. But I'm not sure how to model the data or how to create "optimal" queries (as this probably depends on the model structure).

As I see it, I could model the data as just the separate sentences. But that way I feel some context signal gets lost, namely the other phrases in the document and the document keywords.

Hope you guys have some input :slightly_smiling_face:

EDIT:

So far I have modelled it as a phrase index with the document as a field, and then used a multi_match query where I boost the phrase field.
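For readers following along, here is a minimal sketch of what such a setup could look like, written as plain Python dicts that mirror the Elasticsearch request bodies. The index name `phrases`, the field names `phrase` and `document`, and the boost value of 3 are assumptions made for illustration; the post does not give the actual names or values.

```python
import json

# Hypothetical index name; the thread does not state the real one.
INDEX_NAME = "phrases"

# One indexed item per phrase, carrying the parent document's text for context.
# Field names "phrase" and "document" are assumptions.
mapping = {
    "mappings": {
        "properties": {
            "phrase": {"type": "text"},
            "document": {"type": "text"},
        }
    }
}


def build_query(text, phrase_boost=3):
    """Build a multi_match body that weights the phrase field above the document field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "field^N" is the multi_match syntax for a per-field boost.
                "fields": [f"phrase^{phrase_boost}", "document"],
            }
        }
    }


# Print the bodies as JSON, ready to send to the index-creation and _search APIs.
print(json.dumps(mapping, indent=2))
print(json.dumps(build_query("how to model my data"), indent=2))
```

The boost only changes how much a match in `phrase` contributes relative to a match in `document`, so the right value would need tuning against real queries.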

Any inputs are still very welcome!

Best regards
Mathias

Hi Mathias,

As you mentioned in your post, indexing the text of the document as a field and using the multi_match (or match) query with the type: phrase sounds exactly like what you are looking for.
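As a rough sketch of the phrase variant, the request body could be built like this. The field names `phrase` and `document` and the boost of 3 are assumptions, not taken from the thread. For a single field, the `match_phrase` query is the equivalent.

```python
import json


def build_phrase_query(text, slop=0):
    """Build a multi_match body that requires the terms to appear as a phrase."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "phrase" runs a match_phrase query on each listed field.
                "type": "phrase",
                # Hypothetical field names, with the phrase field boosted.
                "fields": ["phrase^3", "document"],
                # slop = how many positions the terms may be moved apart.
                "slop": slop,
            }
        }
    }


print(json.dumps(build_phrase_query("best matching sentence", slop=1), indent=2))
```

With `slop` left at 0 the terms must appear adjacent and in order; raising it trades precision for recall.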

Another thing you could look into, if you are interested, is shingles. They would allow you to run cheaper phrase queries (assuming you don't need slop). Otherwise, you are on the right track.
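To make the shingle idea concrete, here is a sketch of index settings that add a shingled sub-field, plus a small pure-Python function that imitates what a shingle filter emits. The analyzer, filter, and field names are hypothetical, and the shingle sizes are just one plausible choice.

```python
import json

# Hypothetical analyzer, filter, and field names, chosen for illustration only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "phrase_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    # Emit only word n-grams, not the single words as well.
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "phrase_shingles"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "phrase": {
                "type": "text",
                # Sub-field holding the shingled version of the same text.
                "fields": {
                    "shingles": {"type": "text", "analyzer": "shingle_analyzer"}
                },
            }
        }
    },
}


def shingles(text, size=2):
    """Imitate a shingle filter: lowercase, split on whitespace, emit word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(json.dumps(index_body, indent=2))
print(shingles("Find the best phrase"))
```

Because each shingle is indexed as a single term, a plain `match` query on `phrase.shingles` can stand in for a positional phrase check, at the cost of a larger index and no slop.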

Thank you for your reply.

Just knowing that I'm on the right track is quite helpful, given the number of possibilities available in ES.

I'll look into shingles :slight_smile:

Best regards
Mathias
