How to index an interview

Hi all,

I would like to index interviews into Elasticsearch.

The interview is made up of paragraphs, and each paragraph is said either by the interviewer or by the interviewee. My problem is that I am only interested in what the interviewee says: I don't want to index the interviewer's questions.

As far as I understand, if I add "index": false (or "index": "no" in pre-5.0 versions) to a property when creating a mapping, that field won't be indexed and won't be searchable. However, when I run a search query, I need the whole interview as the result, so the interviewer's questions somehow have to be included in the response.

Let me clarify it with an example:

This is an interview:

  • [interviewer]: How are you?
  • {interviewee}: I am fine. And you?
  • [interviewer]: Fine, thanks.

I only want the sentence "I am fine. And you?" to be searchable, because it is the only paragraph said by the interviewee, but when Elasticsearch returns a response, I want to get the whole interview back.
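One way to keep both requirements in view is to store the interview as a single ordered list of paragraphs, each tagged with its speaker. The minimal Python sketch below (the names paragraphs, speaker and text are hypothetical, not from any existing mapping) shows that the searchable part can be derived by filtering on the speaker, while the full transcript keeps its original order.

```python
import json

# The example interview from the question, kept as ONE ordered list of
# paragraphs. "paragraphs", "speaker" and "text" are hypothetical names.
interview = {
    "title": "Example interview",
    "paragraphs": [
        {"speaker": "interviewer", "text": "How are you?"},
        {"speaker": "interviewee", "text": "I am fine. And you?"},
        {"speaker": "interviewer", "text": "Fine, thanks."},
    ],
}


def searchable_text(doc):
    """Only what the interviewee said, in the original order."""
    return [p["text"] for p in doc["paragraphs"] if p["speaker"] == "interviewee"]


def full_transcript(doc):
    """The whole interview as 'speaker: text' lines, in the original order."""
    return [f'{p["speaker"]}: {p["text"]}' for p in doc["paragraphs"]]


if __name__ == "__main__":
    print(searchable_text(interview))
    print(json.dumps(full_transcript(interview), indent=2))
```

Because the speaker travels with each paragraph, nothing about the order is lost, which is exactly what separate question and answer properties cannot guarantee.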

My question is: how should I go about achieving this? I am not asking how to create the exact mapping (though any input there is also appreciated), but rather which properties the mapping should have. How can I make part of the text searchable and the other part not searchable, while search responses still contain the whole interview?

For reference, here is the mapping I came up with so far:

PUT /interviews
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {"type": "text"},
        "lead": {"type": "text"},
        "body": {
          "type": "text",
          "analyzer": "hungarian",
          "index_phrases": true
        },
        "interviewerBody": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}

The problem with this mapping is that the questions and the answers go into separate properties, so I lose the order in which the interviewer's questions and the interviewee's answers came.

Here is an example interview for reference. The p elements with a direct em child are said by the interviewer; the p elements without any child elements are said by the interviewee.
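The speaker rule above can be applied while extracting the paragraphs from the article HTML. The sketch below uses Python's standard html.parser module and assumes a simplification of the rule: any p with a direct em child is the interviewer, and every other p is the interviewee. The class name InterviewParser is hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "img", "hr", "wbr"}


class InterviewParser(HTMLParser):
    """Collect each <p> as {"speaker": ..., "text": ...}, in document order.

    Assumed rule: a <p> with a DIRECT <em> child is the interviewer;
    any other <p> is treated as the interviewee.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._depth = 0            # element nesting depth inside the current <p>
        self._has_em_child = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._depth = 0
            self._has_em_child = False
            self._chunks = []
        elif self._in_p and tag not in VOID_TAGS:
            # Depth 0 means this element sits directly inside the <p>.
            if tag == "em" and self._depth == 0:
                self._has_em_child = True
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            speaker = "interviewer" if self._has_em_child else "interviewee"
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._chunks).split())
            self.paragraphs.append({"speaker": speaker, "text": text})
            self._in_p = False
        elif self._in_p and tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


if __name__ == "__main__":
    html = """
    <p><em>How are you?</em></p>
    <p>I am fine. And you?</p>
    <p><em>Fine, thanks.</em></p>
    """
    parser = InterviewParser()
    parser.feed(html)
    parser.close()
    for para in parser.paragraphs:
        print(para)
```

An em nested deeper (for example inside a strong) does not count as a direct child here, so such a paragraph would be classified as the interviewee; adjust the rule if the real articles differ.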

You probably want to look at something like parent/child, where the parent is the interview "event" and the children are the questions and answers. Then you can control things much more easily.
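A rough sketch of the parent/child suggestion, using Elasticsearch's join field type, is below. The field names (relation, speaker, position, text) and the id scheme are assumptions for illustration; the mapping is written without a mapping type (7.x style), and on 6.x it would sit under "_doc" as in the mapping above. Child documents have to be indexed with routing set to the parent id so they land on the parent's shard.

```python
import json

# Hypothetical parent/child layout: each interview is a parent document and
# each paragraph is a child document linked through a "join" field.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"interview": "paragraph"}},
            "title": {"type": "text"},
            "speaker": {"type": "keyword"},
            "position": {"type": "integer"},
            "text": {"type": "text"},
        }
    }
}


def to_parent_child(interview_id, title, paragraphs):
    """Build (doc_id, routing, body) triples for one interview.

    The parent has no routing; every child is routed to its parent's id.
    """
    docs = [(interview_id, None, {"relation": "interview", "title": title})]
    for pos, para in enumerate(paragraphs):
        docs.append((
            f"{interview_id}-{pos}",  # hypothetical child id scheme
            interview_id,
            {
                "relation": {"name": "paragraph", "parent": interview_id},
                "speaker": para["speaker"],
                "position": pos,  # keeps the original order recoverable
                "text": para["text"],
            },
        ))
    return docs


def interviews_matching(terms):
    """has_child query: interviews with an interviewee paragraph matching terms."""
    return {
        "query": {
            "has_child": {
                "type": "paragraph",
                "query": {
                    "bool": {
                        "must": [{"match": {"text": terms}}],
                        "filter": [{"term": {"speaker": "interviewee"}}],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    paras = [
        {"speaker": "interviewer", "text": "How are you?"},
        {"speaker": "interviewee", "text": "I am fine. And you?"},
        {"speaker": "interviewer", "text": "Fine, thanks."},
    ]
    for doc_id, routing, body in to_parent_child("1", "Example", paras):
        print(doc_id, routing, json.dumps(body))
    print(json.dumps(interviews_matching("fine"), indent=2))
```

The trade-off: has_child returns the parent interviews, but the paragraphs live in separate child documents, so reassembling the whole interview takes a second request (for example a parent_id query sorted by position).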

After a bit of research, I found that I needed the nested type, which is intended for indexing arrays of objects so that each object can be queried independently.
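A minimal sketch of the nested approach, again building the request bodies as Python dicts: the whole interview stays one document, paragraphs is a nested array (field names are assumptions), and the query only matches paragraphs whose speaker is the interviewee. A hit returns the complete document in _source, with the paragraphs in their original array order; inner_hits additionally reports which paragraph matched. The mapping omits a mapping type (7.x style); on 6.x it would sit under "_doc".

```python
import json

# Hypothetical nested layout: one document per interview, paragraphs kept
# as a nested array so their order is preserved in _source.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "lead": {"type": "text"},
            "paragraphs": {
                "type": "nested",
                "properties": {
                    "speaker": {"type": "keyword"},
                    "text": {"type": "text", "analyzer": "hungarian"},
                },
            },
        }
    }
}


def interviewee_query(terms):
    """Match terms only inside paragraphs whose speaker is the interviewee."""
    return {
        "query": {
            "nested": {
                "path": "paragraphs",
                "query": {
                    "bool": {
                        "must": [{"match": {"paragraphs.text": terms}}],
                        "filter": [{"term": {"paragraphs.speaker": "interviewee"}}],
                    }
                },
                # Optional: also return the matching paragraph(s).
                "inner_hits": {},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(NESTED_MAPPING, indent=2))            # body for PUT /interviews
    print(json.dumps(interviewee_query("fine"), indent=2))  # body for GET /interviews/_search
```

Note that in this layout the interviewer's text is still indexed; it is only kept out of matches by the speaker filter. If it must not be indexed at all, each paragraph object could instead carry a separate question field with "index": false and an answer field that is indexed.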
