Document not being returned by query


(Brian The Coder) #1

So I have a document
{ "_index": "pod_spree", "_type": "product", "_id": "2809", "_version": 1, "found": true, "_source": { "sku": "1PGC2690", "name": "Fancy Scroll Frame", "description": "Invite family and friends to celebrate your big day with the Fancy Scroll Frame wedding invitation from Hallmark.\r\n|Edit the message in the card, or write your own.\r\n|Add a message and photo to the back.\r\n|Change the font style, color and size in many message areas.\r\n|Layout options are available on the back. \r\n|5\" x 7\"\r\n|An envelope is included with every card.|Learn more about our ", "price": "1.99", "inside_message": "", "available_on": "2015-11-01T07:00:00.000Z", "number_of_photos_no_photos_taxon": "No Photos", "number_of_photos_taxon": "Number of photos", "occasion_wedding_taxon": "Wedding", "occasion_taxon": "Occasion", "recipient_for_anyone_taxon": "For Anyone", "recipient_taxon": "Recipient", "format_and_size_5x7_flat_taxon": "5x7 Flat", "format_and_size_taxon": "Format \u0026 Size", "tone_simply_stated_taxon": "Simply Stated", "tone_taxon": "Tone", "wedding_taxon": "Wedding", "wedding_all_invitations_taxon": "Invitations", "wedding_all_invitations_wedding_invitations_taxon": "Wedding Invitations", "invitations_taxon": "Invitations", "invitations_all_wedding_taxon": "Wedding", "invitations_all_wedding_invitations_taxon": "Wedding Invitations" } }

but when I issue the following query
`{"query":{"multi_match":{"analyzer":"pod_analyzer","query":"wedding","operator":"or","type":"best_fields","fields":["name","sku","description","inside_message","*_taxon"]}}}`

it comes back with no results. This happens with several other terms too, even though I can find documents containing each of those terms. I am using a custom analyzer with the lowercase and snowball filters. Any ideas?


(Adrien Grand) #2

Most likely, the terms produced by your index analyzer are different from the terms produced by your pod_analyzer. I would advise experimenting with the analyze API to see which tokens are generated. https://www.elastic.co/guide/en/elasticsearch/reference/2.1/indices-analyze.html
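
As a rough sketch of that comparison, the Ruby below builds `_analyze` requests using only the standard library. The node address `http://localhost:9200` and the helper names `analyze_body` and `analyze_tokens` are assumptions made for illustration; the index, analyzer, and field names come from the question. It is not a definitive client, just one way to look at the tokens on both sides.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed address of a local Elasticsearch 2.x node; adjust as needed.
ES_URL = 'http://localhost:9200'

# Hypothetical helper: build the request body for the _analyze API.
# Pass either an analyzer name or a field name. With a field,
# Elasticsearch applies whatever analyzer the mapping gave that field.
def analyze_body(text, analyzer: nil, field: nil)
  body = { text: text }
  body[:analyzer] = analyzer if analyzer
  body[:field] = field if field
  body
end

# Hypothetical helper: POST the body to /<index>/_analyze and return
# only the token strings from the response.
def analyze_tokens(index, text, analyzer: nil, field: nil)
  uri = URI("#{ES_URL}/#{index}/_analyze")
  payload = JSON.generate(analyze_body(text, analyzer: analyzer, field: field))
  response = Net::HTTP.post(uri, payload, 'Content-Type' => 'application/json')
  JSON.parse(response.body).fetch('tokens').map { |t| t['token'] }
end
```

Against a running node, calling `analyze_tokens('pod_spree', 'Wedding', field: 'occasion_wedding_taxon')` shows what was indexed for that field, and `analyze_tokens('pod_spree', 'wedding', analyzer: 'pod_analyzer')` shows what the query searches for. If the two token lists differ, the query cannot match.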


(Brian The Coder) #3

I'm using the same custom analyzer for both indexing and querying.


(Adrien Grand) #4

Can you provide us with a minimal recreation of the problem?


(Brian The Coder) #5

I posted the query and a sample document that should be returned. Not sure what else you need. I was using the snowball, lowercase, and English stopwords filters. I added another one for synonyms, but wedding is not in the synonym list.


(Adrien Grand) #6

I would just like to know the exact analyzer and mapping that you used in order to be able to identify the problem. Having a recreation will help ensure I use the exact same commands as you instead of variants that might not expose the problem.


(Brian The Coder) #7

Here are the settings for the index:

        index: :pod_spree,
        type: :product,
        body: {
          settings: {
            analysis: {
              filter: {
                pod_synonyms: {
                  type: :synonym,
                  ignore_case: true,
                  synonyms: YAML.load_file(Rails.root + 'config/synonyms.yml')
                },
                pod_stopwords: {
                  type: :stop,
                  stopwords: YAML.load_file(Rails.root + 'config/stopwords.yml')
                },
                standard_stopwords: {
                  type: :stop,
                  stopwords: '_english_'
                }
              },
              analyzer: {
                pod_analyzer: {
                  tokenizer: :standard,
                  language: 'English',
                  filter: [
                    :lowercase,
                    :pod_synonyms,
                    :standard_stopwords,
                    :pod_stopwords,
                    :snowball
                  ]
                }
              }
            }
          },
          mappings: {
            product: {
              properties: mappings
            }
          }
        }

(Adrien Grand) #8

Can you share the mappings as well?


(Brian The Coder) #9

The only mappings I customized are these
mappings = { price: { type: 'double', index: 'not_analyzed' }, available_on: { type: 'date', index: 'not_analyzed' }, sku: { type: 'string', analyzer: 'simple' } }

Everything else is just a string and uses the default analyzer specified.


(Adrien Grand) #10

Since you did not specify which analyzer to use in your mappings, your string fields are using the default standard analyzer while you are using pod_analyzer at query time. I suspect this explains the problems that you are having. You need to explicitly say which analyzer to use in your mappings using the analyzer property. https://www.elastic.co/guide/en/elasticsearch/reference/2.1/analyzer.html
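
A minimal sketch of what that could look like with the hashes posted above, assuming Elasticsearch 2.x mapping syntax. The helper names `text_field` and `product_mapping` and the template name `taxon_fields` are made up for this example; the field names and `pod_analyzer` come from the thread. Because the `*_taxon` fields are created dynamically, the sketch uses a dynamic template to give them the analyzer.

```ruby
# Hypothetical helper: mapping for a full-text string field that is
# analyzed with the given analyzer at index time (ES 2.x syntax).
def text_field(analyzer)
  { type: 'string', analyzer: analyzer }
end

# Hypothetical helper: the "product" mapping with the analyzer set
# explicitly on every field the multi_match query searches.
def product_mapping(analyzer = 'pod_analyzer')
  {
    product: {
      # Any new string field whose name ends in _taxon gets the analyzer.
      dynamic_templates: [
        {
          taxon_fields: {
            match: '*_taxon',
            match_mapping_type: 'string',
            mapping: text_field(analyzer)
          }
        }
      ],
      properties: {
        # Unchanged from the mappings posted in the thread.
        price:          { type: 'double', index: 'not_analyzed' },
        available_on:   { type: 'date', index: 'not_analyzed' },
        sku:            { type: 'string', analyzer: 'simple' },
        # Explicit analyzer on the remaining searched fields.
        name:           text_field(analyzer),
        description:    text_field(analyzer),
        inside_message: text_field(analyzer)
      }
    }
  }
end
```

The result of `product_mapping` would take the place of the `mappings: { product: { properties: mappings } }` part of the index creation body. The analyzer of a field that already exists cannot be changed in place, so the index would have to be recreated and the documents reindexed for this to take effect.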


(Brian The Coder) #11

When I was using the standard analyzer, that query worked. Once I added the custom analyzer, it stopped working. It works if I remove the snowball filter. I'm using the same analyzer on the query too.

