Strange Highlighting

Along with the matched word, Elasticsearch is also highlighting single letters present in the content field.
I am using Elasticsearch 1.7.2. Below is the mapping I am using.
I am highlighting the content field.

PUT nms_repository
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_sort": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "document": {
      "properties": {
        "dataclass": {
          "type": "nested",
          "properties": {
            "dataclassname": {"type": "string", "index": "not_analyzed", "copy_to": "suggestf"},
            "PickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            },
            "NonPickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            }
          }
        },
        "globalindex": {
          "type": "nested",
          "properties": {
            "NonPickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed", "copy_to": "suggestf"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            },
            "PickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed", "copy_to": "suggestf"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            }
          }
        },
        "documentname": {"type": "string", "index_analyzer": "case_insensitive_sort", "copy_to": "suggestf"},
        "content": {
          "type": "string"
        },
        "owner": {"type": "string", "index_analyzer": "case_insensitive_sort"},
        "suggestf": {
          "type": "completion",
          "analyzer": "simple",
          "search_analyzer": "simple",
          "payloads": true
        }
      }
    },
    "folder": {
      "properties": {
        "dataclass": {
          "type": "nested",
          "properties": {
            "dataclassname": {"type": "string", "index": "not_analyzed", "copy_to": "suggestf"},
            "PickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            },
            "NonPickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            }
          }
        },
        "globalindex": {
          "type": "nested",
          "properties": {
            "NonPickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed", "copy_to": "suggestf"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            },
            "PickableFields": {
              "type": "nested",
              "properties": {
                "key": {"type": "string", "index": "not_analyzed", "copy_to": "suggestf"},
                "value": {"type": "string", "index": "not_analyzed", "dynamic": true}
              }
            }
          }
        },
        "documentname": {"type": "string", "index_analyzer": "case_insensitive_sort", "copy_to": "suggestf"},
        "owner": {"type": "string", "index_analyzer": "case_insensitive_sort"},
        "suggestf": {
          "type": "completion",
          "analyzer": "simple",
          "search_analyzer": "simple",
          "payloads": true
        }
      }
    }
  }
}
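For reference, the two named analysis chains in this mapping can be roughly approximated in plain Java. This is only a sketch, not Lucene's implementation: "case_insensitive_sort" (keyword tokenizer plus lowercase filter) keeps the whole value as one lowercased token, while the "simple" analyzer used for "suggestf" splits on every non-letter and lowercases. The class and method names are hypothetical, and Character.isLetter only approximates Lucene's letter test. Note that "content" declares no analyzer, so it is indexed with the index's default analyzer, not with either of these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerSketch {

    // Approximates "case_insensitive_sort": keyword tokenizer + lowercase filter.
    // The entire input becomes a single lowercased token.
    static List<String> keywordLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text.toLowerCase(Locale.ROOT));
        return tokens;
    }

    // Approximates the "simple" analyzer: split on any non-letter, then lowercase.
    // Character.isLetter is only an approximation of Lucene's letter check.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "Name of Product(s)";
        System.out.println(keywordLowercase(value)); // [name of product(s)]
        System.out.println(simple(value));           // [name, of, product, s]
    }
}
```

Splitting on non-letters is why a value like "Product(s)", which appears in the content below, yields a lone "s" token under this kind of analysis.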

Hi,

Can you give an example query and an example document that matches the query but exhibits the "single letter" highlighting problem? What does the response for that particular document look like?

Thanks for responding!

When I query from Sense (the Chrome plugin), the results are all good and no single letters are highlighted, but when I query through the Java API I am not happy with the results.

Here is the query:

response = client
        .prepareSearch(index)                             // index in which the query will search for documents
        //.setTypes(type)                                 // optional type restriction (currently disabled)
        .setAggregations(builder)
        .setSearchType(SearchType.QUERY_THEN_FETCH)
        .addSort(SortBuilders.fieldSort(sort).order(so))
        .setQuery(query)
        .addHighlightedField("content", 70, 2)             // field, fragment size, number of fragments
        .setHighlighterPreTags("<b>")
        .setHighlighterPostTags("</b>")
        .setHighlighterOrder("score")
        .setFrom(from).setSize(size).setExplain(false)
        .execute()
        .actionGet();
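Since the search reportedly behaves correctly from Sense, one way to compare like with like is to send Sense exactly the request the Java call builds. The sketch below assembles that request body by hand, assuming a query_string query for the term "sheet" and the highlight settings from the Java code above (field content, fragment size 70, 2 fragments, <b>/</b> tags, score order). The class and method names are hypothetical, the index name nms_repository is taken from the mapping as an assumption about the value of "index", and sort, aggregations and paging are deliberately left out.

```java
public class HighlightRequestSketch {

    // Builds an Elasticsearch 1.x style search body mirroring the Java API call:
    // a query_string query plus highlighting on "content" with fragment_size 70,
    // number_of_fragments 2, <b>/</b> tags and score ordering.
    // Sort, aggregations and paging are intentionally omitted.
    static String buildBody(String queryText) {
        return "{\n"
            + "  \"query\": {\"query_string\": {\"query\": \"" + escapeJson(queryText) + "\"}},\n"
            + "  \"highlight\": {\n"
            + "    \"order\": \"score\",\n"
            + "    \"pre_tags\": [\"<b>\"],\n"
            + "    \"post_tags\": [\"</b>\"],\n"
            + "    \"fields\": {\"content\": {\"fragment_size\": 70, \"number_of_fragments\": 2}}\n"
            + "  }\n"
            + "}";
    }

    // Minimal escaping of backslashes and double quotes inside the query text.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Paste the printed body into Sense under: GET nms_repository/_search
        System.out.println(buildBody("sheet"));
    }
}
```

If the Sense response for this body already shows the single-letter fragments, the difference lies in the query or sort/aggregation parts rather than in the highlighter settings.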

I am using a QueryStringQuery. I searched for the term "sheet" (without quotes), and some of the highlighted fragments returned were good. But in some matched documents the fragments looked like this:

  1. Trans_OK_formula L A H
  2. 51 + Worksteps Form GUI Design Form Module s1-3 Modules 4 to 6

Some expected results were also returned, like:

VerbalAdditional Committment Milestone Sheet Delivery Checklist
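To separate good fragments from bad ones across many hits, it can help to pull out exactly what the highlighter wrapped in tags. Below is a hypothetical helper, assuming the <b>/</b> tags configured in the Java call and fragments already collected as plain strings; the sample fragments in main are illustrative only, since the post does not show which letters were actually tagged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightInspector {

    // Matches the text between the <b> and </b> tags set in the Java request.
    private static final Pattern HIGHLIGHT = Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL);

    // Returns every highlighted span in the fragment, in order of appearance.
    static List<String> highlightedTerms(String fragment) {
        List<String> terms = new ArrayList<>();
        Matcher m = HIGHLIGHT.matcher(fragment);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    // Flags a fragment if any highlighted span is at most one character long
    // after trimming whitespace (the single-letter symptom).
    static boolean hasSingleLetterHighlight(String fragment) {
        for (String term : highlightedTerms(fragment)) {
            if (term.trim().length() <= 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical tagging; real fragments come from the search response.
        String good = "Milestone <b>Sheet</b> Delivery Checklist";
        String bad = "Trans_OK_formula <b>L</b> <b>A</b> <b>H</b>";
        System.out.println(highlightedTerms(good) + " " + hasSingleLetterHighlight(good)); // [Sheet] false
        System.out.println(highlightedTerms(bad) + " " + hasSingleLetterHighlight(bad));   // [L, A, H] true
    }
}
```

Logging the document IDs of flagged hits gives exactly the minimal failing examples that are useful to post back to the thread.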

The content field I am highlighting looks like this:

\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nA Butler Group Technology Audit is an independent assessment of a solution or service, which will be published on Butler Group’s CIO Knowledge Centre Web site, and made available to over 15,000 IT and business decision makers in end-user organisations. \r\nTo assist in gathering factual information for the Technology Audit, we would kindly ask you to complete this pre-briefing questionnaire, and return it to us at least one week prior to the meeting. Completing the questionnaire helps analysts to formulate questions for discussion during the briefing call or meeting.\r\nFollowing the meeting, the analyst(s) working on the project will write a draft document, which we will send to you for factual review. We request that you return this to us with your comments within five working days of receipt.\r\nOnce any changes have been incorporated, we will ask you to confirm final sign-off of the Technology Audit, which will then be passed into our publishing process.\r\n\r\n\r\n\r\n\r\n\r\nVendor Name \t\tNewgen Software Technologies Limited\t\t\r\nName of Product(s) \tVersion No.\t\tOmniDocs™\t6.0\t\t\t\t\t\t\

Can you please help?

Please be patient. This forum is largely managed by volunteers. If you require an SLA for your questions, Elastic offers commercial subscriptions that provide this.

Hi,

Nothing obvious comes to mind at the moment, and I wasn't able to recreate the problem with the information provided, which leaves me guessing. You mention that querying with the Query DSL through Sense works but querying through the Java API doesn't, so I can only guess what the difference might be.
It would also help if you could try to recreate the problem with a minimal example (containing, e.g., only one bad document) that can be used to show the behaviour from both Sense and the Java API. Also, please note (as @Christian_Dahlqvist mentioned) that just because someone answers one of your posts in this forum doesn't necessarily mean she can solve your problem. At this point I'm just asking for more information that might help me (or others) look into this more closely.
