Analyzing a list without analyzing the values in the list (for faceting)


(Simon Orr) #1

I've got a Tags field in my index defined as follows:

  "Tags" : {
    "type" : "multi_field",
    "fields" : {
      "Tags" : {
        "type" : "string"
      },
      "Raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs",
        "include_in_all" : false
      }
    }
  },

I'm using this field to store a list of tags. Here are three sample documents:

{"Tags": ["SomeTag", "SomeOtherTag", "Some.Tag3"]},
{"Tags": ["SomeTag", "Some.Tag3"]},
{"Tags": ["SomeTag"]}

This works fine for storing, retrieving and searching.

The problem occurs when I want to retrieve a full list of all tags in the
system (and a count of their usage).

My initial approach was to facet on the Tags field. If I facet on Tags directly
(effectively, the analyzed field), I get:

"sometag": 3,
"someothertag": 1,
"some": 2,
"tag3": 2

Note that Some.Tag3 has been split by the tokenizer into two terms. I also
lose the original casing (although this is a lesser issue).

On the other hand, if I facet on the non-analyzed version of the field, I
get:

"[\"SomeTag\", \"SomeOtherTag\", \"Some.Tag3\"]": 1,
"[\"SomeTag\", \"Some.Tag3\"]": 1,
"[\"SomeTag\"]": 1

Which is, of course, absolutely correct but not what I want.
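To illustrate the difference, here is a simplified model (not Elasticsearch code): a not_analyzed field indexes each element of a JSON array as its own term, verbatim, but if a document's Tags value arrives as a single string, for example a list that was already serialized to JSON, that whole string becomes one term. The second shape reproduces the output above; whether it is the actual cause here is an assumption. The helper names raw_terms and facet_counts are hypothetical, and the model assumes tags are unique within each document.

```python
import json
from collections import Counter


def raw_terms(value):
    """Terms a not_analyzed string field would index for one document.

    Simplified model: a JSON array contributes each element verbatim as its
    own term; a plain string contributes the whole string as a single term.
    A set is returned so each document counts at most once per term.
    """
    if isinstance(value, list):
        return set(value)
    return {value}


def facet_counts(docs, field):
    """Count, per term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(raw_terms(doc[field]))
    return counts


# The three sample documents, with Tags sent as real JSON arrays.
as_arrays = [
    {"Tags": ["SomeTag", "SomeOtherTag", "Some.Tag3"]},
    {"Tags": ["SomeTag", "Some.Tag3"]},
    {"Tags": ["SomeTag"]},
]

# Assumed failure mode: each list was serialized to a JSON string first.
as_strings = [{"Tags": json.dumps(doc["Tags"])} for doc in as_arrays]

for name, docs in (("arrays", as_arrays), ("strings", as_strings)):
    print(name, dict(facet_counts(docs, "Tags")))
```

With real arrays the counts come out as SomeTag: 3, Some.Tag3: 2, SomeOtherTag: 1; with pre-serialized strings, each document yields exactly one term with a count of 1.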

So... how can I tell Elasticsearch to split the list into separate values
but not analyze the contents of each entry?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

You didn't show the commands you used to create the documents, but I think
there must be an error there.

In this example https://gist.github.com/jprante/7099277 everything works
as expected.

Jörg



(Simon Orr) #3

Thanks for the reply.

Your example does indeed work as expected (I had to tweak it slightly because
we have auto-mapping turned off, so I changed "default" to "data"), so your
main point holds true.

Our index mapping is being created by a Java program based on a document
definition which is shared between multiple services.

What's really interesting is that reading the mappings back from Elasticsearch
gives identical definitions for your test index and mine, apart from the
casing of the field names:

  "tags" : {
    "type" : "multi_field",
    "fields" : {
      "tags" : {
        "type" : "string"
      },
      "raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs",
        "include_in_all" : false
      }
    }
  }

  "Tags" : {
    "type" : "multi_field",
    "fields" : {
      "Tags" : {
        "type" : "string"
      },
      "Raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs",
        "include_in_all" : false
      }
    }
  },

In any case, thanks for taking the time to point out that it should be
working. Clearly there's something funky going on, either in the way the
index is being created or in the way documents are being passed for
indexing.
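If the problem turns out to be in how documents are passed for indexing, one thing worth checking is that the Tags value in the request body is a JSON array and not a string. The Python sketch below is illustrative only: the indexer in this thread is a Java program, and the helper names build_body, build_body_double_encoded and tags_are_array are hypothetical. It shows how encoding a list twice turns it into a single string, and a simple round-trip check that catches it before the document is sent.

```python
import json


def build_body(tags):
    """Serialize the document once, keeping Tags as a JSON array."""
    return json.dumps({"Tags": list(tags)})


def build_body_double_encoded(tags):
    """Hypothetical bug: the list is turned into a JSON string first."""
    return json.dumps({"Tags": json.dumps(list(tags))})


def tags_are_array(body):
    """Return True if Tags in the request body is an array of strings."""
    value = json.loads(body).get("Tags")
    return isinstance(value, list) and all(isinstance(t, str) for t in value)


good = build_body(["SomeTag", "Some.Tag3"])
bad = build_body_double_encoded(["SomeTag", "Some.Tag3"])
print(good)  # {"Tags": ["SomeTag", "Some.Tag3"]}
print(bad)   # {"Tags": "[\"SomeTag\", \"Some.Tag3\"]"}
print(tags_are_array(good), tags_are_array(bad))  # True False
```

Running the same kind of check on whatever body the real indexer produces (or on the _source Elasticsearch returns for an indexed document) should show quickly which of the two shapes is being sent.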

At least I've got an idea where to look now - thanks again.

Simon

(This was a reply to Jörg Prante's message of Tuesday, October 22, 2013, 12:57:55 PM UTC+1.)

