How to index substrings

Hi,

I am having a problem indexing the data below.
Documents:

  1. CPC, 1908 — Order 43 Rule 1(r ), Order 39 Rule 1, 2
  2. CPC, 1908 — Section 96, 105(1), Order 9 Rule 7
  3. CPC, 1908 — Order 39 Rules 1, Order 39 Rules 1, Order 39 Rule 4, Order 39 Rule 2A, Order 18 Rule 1, Section 1, 151

Currently I am using the whitespace analyzer.

Problem:
If I search for section 9, it matches the second record, but that record contains section 96.

What is the best solution for this problem? I have looked at the simple pattern split tokenizer, but I think it will get complicated.

Can I use nested objects, parent-child, or denormalization for all my sections/orders?

I would appreciate any suggestions.

Can you provide a fully reproducible example which contains the documents you are indexing, along with the mapping containing the whitespace analyzer? This way it is possible to follow your steps and see where things go sideways. Please also explain what you expect to happen, so the problem is easier to understand.

Thanks,
Document data:

  1. (CPC) - Order 2 Rule 2, Order 2 Rule 2(3), Order 22 Rule 9, Order 23 Rule 1, Order 39 Rule 1, Order 39 Rule 2, Order 9 Rule 4, Order 9 Rule 8, Order 9 Rule 9, Section 11,12 , Article 35,38
  2. (CPC) Section 8
  3. (CPC) Order 11 Rule 1

If I search for Section 8, it also matches the first document, but it should match only the second document.

I think I can achieve this with nested objects, but there is an issue with highlighting in nested objects.

This is nowhere near the fully reproducible example I asked for, so my answer will be more of a guess.

In order to understand how your data gets indexed, you should run the analyze API against your fields. This will show you how the data is indexed into Elasticsearch and which tokens are generated. Experimenting with different analyzers will show you different tokens.
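To make that concrete, here is a minimal sketch in Python. It builds the body you would POST to `/_analyze`, and, since no cluster is assumed here, it also approximates the whitespace analyzer locally with `str.split()` (whitespace only, no lowercasing, punctuation kept). The helper names are made up for illustration, and the idea that the query is a plain `match` with the default OR operator is only a guess, because the query was never posted.

```python
import json

def analyze_request(analyzer, text):
    """Body for POST /_analyze; send it with any HTTP client or Kibana."""
    return {"analyzer": analyzer, "text": text}

def whitespace_tokens(text):
    """Local approximation of the whitespace analyzer:
    split on whitespace only, keep case and punctuation."""
    return text.split()

# First document from the second post, copied verbatim.
doc1 = ("(CPC) - Order 2 Rule 2, Order 2 Rule 2(3), Order 22 Rule 9, "
        "Order 23 Rule 1, Order 39 Rule 1, Order 39 Rule 2, Order 9 Rule 4, "
        "Order 9 Rule 8, Order 9 Rule 9, Section 11,12 , Article 35,38")

print(json.dumps(analyze_request("whitespace", doc1)))

# Tokens the query and the document have in common.
shared = set(whitespace_tokens("Section 8")) & set(whitespace_tokens(doc1))
print(shared)
```

Under these assumptions the only shared token is `Section` (the document has `8,` with a trailing comma, not `8`). A `match` query that ORs its terms would therefore still return the first document, which may be what you are seeing.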

A good analyzer in this case would leave the numbers fully intact, and you might then be able to use a match phrase query to find the right documents.
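A rough sketch of that idea in Python, run locally rather than against a cluster. The `tokens` function is a hypothetical stand-in for whatever analyzer you end up with (it lowercases and keeps runs of letters and digits), `phrase_matches` only imitates what a phrase query checks (same tokens, adjacent, in order), and the field name `sections` is a placeholder because the real field name was not posted.

```python
import re

def match_phrase_body(field, phrase):
    """Search body for a match_phrase query; `field` is a placeholder name."""
    return {"query": {"match_phrase": {field: phrase}}}

def tokens(text):
    # Hypothetical stand-in tokenizer: lowercase, keep runs of letters/digits,
    # so "8," becomes "8" and "96" stays "96".
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_matches(phrase, text):
    """True if the phrase tokens occur consecutively and in order in the text."""
    q, d = tokens(phrase), tokens(text)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# The three documents from the second post, copied verbatim.
docs = [
    "(CPC) - Order 2 Rule 2, Order 2 Rule 2(3), Order 22 Rule 9, "
    "Order 23 Rule 1, Order 39 Rule 1, Order 39 Rule 2, Order 9 Rule 4, "
    "Order 9 Rule 8, Order 9 Rule 9, Section 11,12 , Article 35,38",
    "(CPC) Section 8",
    "(CPC) Order 11 Rule 1",
]

results = [phrase_matches("Section 8", d) for d in docs]
print(results)  # [False, True, False]
print(match_phrase_body("sections", "Section 8"))
```

One caveat with this approach: a phrase search for "Section 12" would not match the first document, because the text reads "Section 11,12" and the two tokens are not adjacent. Entries written as comma-separated lists would need to be split up at index time for that to work.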

Again, as you have specified neither how your data is indexed nor which query you are using, it's hard to help.

Hi Spinscale,

Thanks for the reply. My analyzer problem is now solved, but I am facing a highlighting issue with nested objects.

Mapping:

{
  "mappings" : {
    "jdj_v2" : {
      "properties" : {
        "field1" : {
          "type" : "nested",
          "include_in_parent" : true,
          "properties" : {
            "nested-field1" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "nested-field2" : {
              "type" : "nested",
              "properties" : {
                "value" : {
                  "type" : "text",
                  "analyzer" : "standard"
                }
              }
            },
            "nested-field3" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "nested-field4" : {
              "type" : "text",
              "analyzer" : "standard"
            },
            "nested-field5" : {
              "type" : "text",
              "analyzer" : "standard"
            }
          }
        }
      }
    }
  }
}

I am able to highlight individual fields (field1.nested-field1, field1.nested-field2, ...),
but I am not able to highlight them as one object.
