How to index substrings

anjireddy543 · August 4, 2018, 1:25am

Hi,

I am having a problem indexing the data below.
Documents:

CPC, 1908 — Order 43 Rule 1(r ), Order 39 Rule 1, 2
CPC, 1908 — Section 96, 105(1), Order 9 Rule 7
CPC, 1908 — Order 39 Rules 1, Order 39 Rules 1, Order 39 Rule 4, Order 39 Rule 2A, Order 18 Rule 1, Section 1, 151

Currently I am using the whitespace analyzer.

Problem:
If I search for "section 9", it matches the second record, but that record contains Section 96, not Section 9.

What is the best solution for this problem? I have looked at the simple pattern split tokenizer, but I think it will get complicated.

Can I use nested objects, parent-child relations, or denormalization for all my sections/orders?

I would appreciate any suggestions.
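One plausible reading of the false match, sketched in plain Python rather than Elasticsearch itself: splitting on whitespace leaves "96," as one token, but the document also contains a standalone "9" (from "Order 9"), so a query that only checks whether each term is present somewhere will succeed. The function name `whitespace_tokens` and the term-presence check are assumptions made for illustration; the actual query used in the thread is not shown.

```python
def whitespace_tokens(text):
    # Rough stand-in for a whitespace tokenizer: split on whitespace only,
    # keep punctuation attached, do not lowercase.
    return text.split()

# Fragment of the second document from the post above.
doc = "Section 96, 105(1), Order 9 Rule 7"
tokens = whitespace_tokens(doc)
query_terms = whitespace_tokens("Section 9")

# Checking terms independently ignores where they occur in the document.
all_terms_present = all(term in tokens for term in query_terms)

print(tokens)
print(all_terms_present)
```

Here "Section" and "9" are both present as tokens, but they are not adjacent, which is why a position-aware query is worth considering.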

spinscale · August 6, 2018, 4:04pm

Can you provide a fully reproducible example that contains the documents you are indexing, along with the mapping containing the whitespace analyzer? That way it is possible to follow your steps and see where things go sideways. You should also explain what you expect to happen, so the problem is easier to understand.

anjireddy543 · August 7, 2018, 8:50am

Thanks,
Document data :

(CPC) - Order 2 Rule 2, Order 2 Rule 2(3), Order 22 Rule 9, Order 23 Rule 1, Order 39 Rule 1, Order 39 Rule 2, Order 9 Rule 4, Order 9 Rule 8, Order 9 Rule 9, Section 11,12 , Article 35,38
(CPC) Section 8
(CPC) Order 11 Rule 1

If I search for "Section 8", it also matches the first document, but it should match only the second document.

I think I can achieve this with nested objects, but there is an issue with highlighting in nested objects.

spinscale · August 7, 2018, 11:33am

This is nowhere near the fully reproducible example I asked for, so my answer will be more of a guess.

In order to understand how your data gets indexed, you should run the analyze API against your fields. This will show you how the data is indexed into Elasticsearch and which tokens are generated. Trying different analyzers will show you different tokens.

A good analyzer in this case would leave the numbers fully intact, and you might then be able to use a match phrase query to find the right documents.

Again, as you have specified neither how your data is indexed nor which query you are using, it's hard to help.
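To make the suggestion above concrete, here is a minimal sketch in plain Python. It simulates phrase matching over the three documents from the previous post, using a simplified tokenizer (lowercase, split on anything that is not a letter or digit). That tokenizer is only an approximation and will not produce exactly the tokens a real Elasticsearch analyzer does, so the analyze API remains the way to check actual output. The field name `sections` in the request bodies is hypothetical.

```python
import json
import re

def simple_tokens(text):
    # Approximation only: lowercase, then keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(doc, phrase):
    # True if the phrase tokens appear consecutively in the document tokens.
    doc_toks = simple_tokens(doc)
    phrase_toks = simple_tokens(phrase)
    n = len(phrase_toks)
    return any(doc_toks[i:i + n] == phrase_toks
               for i in range(len(doc_toks) - n + 1))

docs = [
    "(CPC) - Order 2 Rule 2, Order 2 Rule 2(3), Order 22 Rule 9, "
    "Order 23 Rule 1, Order 39 Rule 1, Order 39 Rule 2, Order 9 Rule 4, "
    "Order 9 Rule 8, Order 9 Rule 9, Section 11,12 , Article 35,38",
    "(CPC) Section 8",
    "(CPC) Order 11 Rule 1",
]

matches = [phrase_match(d, "Section 8") for d in docs]
print(matches)

# Request bodies one might send to the analyze API and the search API.
# "sections" is a hypothetical field name, not taken from the thread.
analyze_body = {"analyzer": "standard", "text": docs[1]}
search_body = {"query": {"match_phrase": {"sections": "Section 8"}}}
print(json.dumps(analyze_body))
print(json.dumps(search_body))
```

In this simulation only the second document matches, because the first one contains "section" and "8" but never next to each other.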

anjireddy543 · August 8, 2018, 11:03am

Hi Spinscale,

Thanks for the reply. My analyzer problem is now solved, but I am facing a highlighting issue with nested objects.

Mapping:

{
  "mappings" : {
    "jdj_v2" : {
      "properties" : {
        "field1" : {
          "type" : "nested",
          "include_in_parent" : true,
          "properties" : {
            "nested-field1" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "nested-field2" : {
              "type" : "nested",
              "properties" : {
                "value" : {
                  "type" : "text",
                  "analyzer" : "standard"
                }
              }
            },
            "nested-field3" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "nested-field4" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "nested-field5" : {
              "type" : "text",
              "analyzer" : "standard"
            }
          }
        }
      }
    }
  }
}

I am able to highlight individual fields (field1.nested-field1, field1.nested-field2, ...),
but I am not able to highlight them as one object.
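One option that may be worth trying is a nested query with `inner_hits`, which accepts its own `highlight` section so that highlights are returned per matching nested object. The sketch below only builds the request body as a Python dictionary. The helper name `nested_highlight_query` and the search text are assumptions for illustration, and the thread does not establish whether this produces the "one object" highlighting being asked for.

```python
import json

def nested_highlight_query(path, field, text):
    # Hypothetical helper: builds a nested query whose inner hits
    # carry highlights for the given nested field.
    full_field = f"{path}.{field}"
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {full_field: text}},
                "inner_hits": {
                    "highlight": {"fields": {full_field: {}}}
                },
            }
        }
    }

# Field names come from the mapping above; the search text is an example.
body = nested_highlight_query("field1", "nested-field1", "Section 8")
print(json.dumps(body, indent=2))
```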
