Analyzing a list without analyzing the values in the list (for faceting)

Simon_Orr · October 22, 2013, 11:39am

I've got a Tags field in my index defined as ....

  "Tags" : {
    "type" : "multi_field",
    "fields" : {
      "Tags" : {
        "type" : "string"
      },
      "Raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs",
        "include_in_all" : false
      }
    }
  },

I'm using this field to store a list of tags. 3 sample documents:

{"Tags": ["SomeTag", "SomeOtherTag", "Some.Tag3"]},
{"Tags": ["SomeTag", "Some.Tag3"]},
{"Tags": ["SomeTag"]}

This works fine for storing, retrieving and searching.

The problem occurs when I want to retrieve a full list of all tags in the
system (and a count of their usage).

My initial approach was to facet on the Tags field. If I facet on Tags directly
(effectively, the Analysed field), I get...

"sometag": 3,
"someothertag": 1,
"some": 2,
"tag3": 2

Note that Some.Tag3 has been split by the tokenizer into two terms. I also
lose the casing (although this is a lesser issue).

On the other hand, if I facet on the non-analyzed version of the field, I
get:

"["SomeTag", "SomeOtherTag", "Some.Tag3"]" : 1,
"["SomeTag", "Some.Tag3"]": 1,
"["SomeTag"]": 1

Which is, of course, absolutely correct but not what I want.

So... how can I tell Elasticsearch to split the list into its entries but
not analyze the contents of each entry?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · October 22, 2013, 11:57am

You did not show the commands you used to create the documents, but I think
there must be an error somewhere.

In this example https://gist.github.com/jprante/7099277 everything works
as expected.

Jörg
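
The gist itself is not reproduced in this thread. As a rough sketch of the kind of request involved, assuming the terms facet API of that era and the Tags.Raw sub-field from the mapping above, a facet on the not_analyzed sub-field could be built like this. The function name, the facet name tag_counts and the size value are illustrative choices, not taken from the gist.

```python
import json


def build_tag_facet_request(field="Tags.Raw", facet_name="tag_counts", size=100):
    """Build a search body that requests a terms facet on a single field.

    field      -- full path of the field to facet on; for a multi_field,
                  the sub-field is addressed as "<field>.<sub-field>"
    facet_name -- arbitrary label under which the facet is returned (hypothetical)
    size       -- maximum number of distinct terms to return (hypothetical)
    """
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits needed, only the facet counts
        "facets": {
            facet_name: {
                "terms": {"field": field, "size": size}
            }
        },
    }


body = build_tag_facet_request()
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request against the index. Faceting on "Tags" instead of "Tags.Raw" would count the analyzed terms.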


Simon_Orr · October 22, 2013, 1:00pm

Thanks for the reply.

Your example does indeed work as expected (I had to tweak it slightly as we
have auto-mapping turned off, so changed "default" to "data") but the
main point holds true.

Our index mapping is being created by a Java program based on a document
definition which is shared between multiple services.

What's really interesting is that reading the mappings back from elastic
results in identical definitions for your test index and mine...

  "tags" : {
    "type" : "multi_field",
    "fields" : {
      "tags" : {
        "type" : "string"
      },
      "raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs",
        "include_in_all" : false
      }
    }
  }

  "Tags" : {
    "type" : "multi_field",
    "fields" : {
      "Tags" : {
        "type" : "string"
      },
      "Raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs",
        "include_in_all" : false
      }
    }
  },

In any case, thanks for taking the time to point out that it should be
working. Clearly there's something funky going on, either in the way the
index is being created or in the way documents are being passed for
indexing.

At least I've got an idea where to look now - thanks again.
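
One possibility consistent with the facet output in the first post, offered only as an assumption since the thread does not confirm the cause, is that the list reaches Elasticsearch as a single serialized string rather than as a JSON array. The sketch below is a plain-Python simulation, not Elasticsearch code. It models a not_analyzed field as keeping each supplied value verbatim as one term, and all function and variable names are hypothetical.

```python
import json
from collections import Counter


def raw_terms(value):
    """Terms a not_analyzed field would hold for one field value.

    Each element of an array becomes one term, kept verbatim;
    a single string becomes exactly one term.
    """
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def facet_counts(docs, field="Tags"):
    """Count, per term, the number of documents containing that term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(raw_terms(doc[field])))
    return dict(counts)


tag_lists = [
    ["SomeTag", "SomeOtherTag", "Some.Tag3"],
    ["SomeTag", "Some.Tag3"],
    ["SomeTag"],
]

# Case 1: the field value is a real array of strings.
as_arrays = [{"Tags": tags} for tags in tag_lists]

# Case 2 (assumed bug): the list was serialized to one string before indexing.
as_strings = [{"Tags": json.dumps(tags)} for tags in tag_lists]

print(facet_counts(as_arrays))
print(facet_counts(as_strings))
```

In the first case each tag is counted separately; in the second, each document contributes one term that is the whole serialized list, which resembles the output I was seeing.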

Simon

