Hi all,
I would like to index interviews into Elasticsearch.
An interview is made up of paragraphs, and each paragraph is said by either the interviewer or the interviewee. My problem is that I am only interested in what the interviewee says; I don't want to index the interviewer's questions.
As far as I understand, I can achieve this by adding "index": false to a given property when creating the mapping, so that the field is neither indexed nor searchable. However, when I run a search query, I need the whole interview as a result, so somehow the interviewer's questions have to be included in the response.
Let me clarify it with an example:
This is an interview:
- [interviewer]: How are you?
- {interviewee}: I am fine. And you?
- [interviewer]: Fine, thanks.
I only want to make the sentence "I am fine. And you?" searchable, because it is the only paragraph said by the interviewee, but when Elasticsearch returns a response, I want to get the whole interview back.
My question is, how should I go about achieving this? I am not asking for the exact mapping (though any input there is also appreciated), but rather which properties I should have in the mapping. How can I make one part of the conversation searchable and the other part not searchable, while search responses still contain the whole interview?
For reference, here is the mapping I came up with so far:
PUT /interviews
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {"type": "text"},
        "lead": {"type": "text"},
        "body": {
          "type": "text",
          "analyzer": "hungarian",
          "index_phrases": true
        },
        "interviewerBody": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}
The problem with this mapping is that I put the questions and the answers into separate properties, so the order in which the interviewer's questions and the interviewee's answers came is lost.
Here is an example interview for reference. The p elements with a direct em child element are said by the interviewer; the p elements without any child elements are said by the interviewee.
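To make the rule above concrete, here is a minimal Python sketch of how I imagine splitting the HTML before indexing. It is only an illustration under assumptions, not a confirmed solution: the names InterviewParser, build_document and the "paragraphs" field are hypothetical, and only "body" comes from my mapping. It labels each p element by speaker, keeps all paragraphs in their original order, and collects only the interviewee's text into "body".

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class InterviewParser(HTMLParser):
    """Collects <p> elements in document order and labels each speaker."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []        # ordered list of {"speaker", "text"}
        self._stack = []            # currently open (non-void) tags
        self._text = None           # text chunks of the open <p>, else None
        self._has_em_child = False  # True if the open <p> has a direct <em>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._text = []
            self._has_em_child = False
        elif tag == "em" and self._stack and self._stack[-1] == "p":
            # <em> whose parent is the <p> itself: interviewer paragraph
            self._has_em_child = True
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop up to and including the matching open tag.
            while self._stack.pop() != tag:
                pass
        if tag == "p" and self._text is not None:
            text = " ".join("".join(self._text).split())
            if text:
                speaker = "interviewer" if self._has_em_child else "interviewee"
                self.paragraphs.append({"speaker": speaker, "text": text})
            self._text = None

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def build_document(html):
    """Returns a dict with ordered paragraphs and interviewee-only body."""
    parser = InterviewParser()
    parser.feed(html)
    parser.close()
    answers = [p["text"] for p in parser.paragraphs
               if p["speaker"] == "interviewee"]
    return {
        "paragraphs": parser.paragraphs,  # hypothetical field, whole interview
        "body": "\n".join(answers),       # the only text meant to be searchable
    }
```

For the small interview above, written as three p elements with the interviewer's lines wrapped in em, build_document would put only "I am fine. And you?" into "body", while "paragraphs" would hold all three paragraphs in order. Whether the ordered list should live in a non-indexed field or somewhere else is exactly the part I am unsure about.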