Site or App Search? Indexing issues

I'm trying to enable site-wide search across a database of Shakespeare films. Each film has slightly different timings for its lines, and the interactive transcripts are stored as JSON files, alongside their corresponding videos, in an S3 bucket.

Lines in a script are stored as anchor tags so that users can jump to a particular line and share it (e.g. line 606 of Macbeth has the anchor link https://scriptspeare.co.uk/Tragedy/Macbeth/#606=line when clicked). The search bar should display each line as a result: for example, typing in ‘to be’ should bring up ‘to be or not to be, that is the question’ (#1611=line).
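To make the goal concrete, here is a minimal sketch of turning one play's lines into one search document per line, each carrying its own anchor URL. The function name `build_line_documents`, the document field names, and the input shape (a dict of line number to text) are assumptions for illustration only; the real transcript JSON in the S3 bucket may look quite different.

```python
def build_line_documents(play_url, lines):
    """Build one search document per script line.

    play_url: page URL of the play, e.g. the Macbeth page.
    lines: hypothetical mapping of line number -> line text.
    """
    base = play_url.rstrip("/") + "/"
    return [
        {
            "id": str(number),
            "line_number": number,
            "body": text,
            # Anchor pattern taken from the site: <page>/#<line number>=line
            "url": f"{base}#{number}=line",
        }
        for number, text in sorted(lines.items())
    ]


docs = build_line_documents(
    "https://scriptspeare.co.uk/Tragedy/Macbeth/",
    {606: "placeholder text for line 606"},
)
print(docs[0]["url"])
```

With documents shaped like this, each search hit would map to a single line and its shareable link rather than to a whole page.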

I'm currently stuck on a few bugs in Elastic's Site Search trial (no lines show up! photo attached), but I'm thinking this might not be the correct format, since the queries run against individual word indices rather than the IDs of lines within a page. It's highly likely that I'm missing something obvious as well; Shakespeare fans, help appreciated!

https://scriptspeare.co.uk/Tragedy/Macbeth/0/

I believe the engine is using the english analyzer by default.

GET _analyze
{
  "text": ["to be or not to be"],
  "analyzer": "english"
}

This produces no tokens, because every word in the phrase is an English stop word and gets removed:

{
  "tokens" : [ ]
}

The standard analyzer, which does not remove stop words by default, generates:

GET _analyze
{
  "text": ["to be or not to be"],
  "analyzer": "standard"
}

{
  "tokens" : [
    {
      "token" : "to",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "be",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "or",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "not",
      "start_offset" : 9,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "be",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}
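The difference between the two outputs can be imitated with a rough Python sketch. This is not how Elasticsearch is implemented: it only lowercases, splits on word characters, and optionally drops stop words, whereas the real english analyzer also applies stemming. The stop word list below is my understanding of the default English set and should be treated as an assumption; `tokenize` and `analyze` are hypothetical helper names.

```python
import re

# Assumed to mirror the default English stop word list.
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def analyze(text, stop_words=frozenset()):
    """Return the tokens that survive stop word removal."""
    return [token for token in tokenize(text) if token not in stop_words]


# Roughly what the english analyzer does to the phrase: nothing survives.
print(analyze("to be or not to be", ENGLISH_STOP_WORDS))  # -> []

# Roughly what the standard analyzer does: all six tokens are kept.
print(analyze("to be or not to be"))
```

So a query made up entirely of stop words has nothing left to match against once the english analyzer has processed it.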

I'm not yet familiar with all the settings available in Site Search, so I'm moving your question to #site-search to get more insights.

Much appreciated; More is thy due than more than all can pay.
