Correctly indexing data into one place with multiple analyzers


(Kevin Claggett) #1

I have some documents with ~30 fields. Most of them I just want to analyze
with the defaults, but on a couple of them I want to use snowball or other
custom analyzers.

The recommended way to do this seems to be to use the index_name property to
alias a custom _all field, such as:

# Create index "ss"; $elasticloc and $datattl are shell variables set elsewhere.
curl -XPOST $elasticloc:9200/ss -d '{
  "settings" : {
    "index.query.default_field" : "newall",
    "analysis" : {
      "filter" : {
        "email" : {
          "type" : "pattern_capture",
          "preserve_original" : true,
          "patterns" : [
            "(\\w+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)"
          ]
        }
      },
      "analyzer" : {
        "email" : {
          "tokenizer" : "uax_url_email",
          "filter" : ["email", "lowercase", "unique"]
        },
        "mysnowball" : {
          "type" : "snowball",
          "language" : "English"
        }
      }
    }
  },
  "mappings" : {
    "filter" : {
      "properties" : {
        "ts" : {"type" : "date"},
        "cid" : {
          "type" : "multi_field",
          "path" : "just_name",
          "fields" : {
            "cid" : {"type" : "string"},
            "newall" : {"type" : "string", "index_name" : "shared"}
          }
        },
        "score" : {
          "type" : "multi_field",
          "path" : "just_name",
          "fields" : {
            "spamscore" : {"type" : "integer"},
            "newall" : {"type" : "integer", "index_name" : "shared"}
          }
        },
        "action" : {
          "type" : "multi_field",
          "path" : "just_name",
          "fields" : {
            "action" : {"type" : "string"},
            "newall" : {"type" : "string", "index_name" : "shared"}
          }
        },
        "from" : {
          "type" : "multi_field",
          "path" : "just_name",
          "fields" : {
            "env_from" : {"type" : "string", "analyzer" : "email"},
            "newall" : {"type" : "string", "analyzer" : "email", "index_name" : "shared"}
          }
        },
        "rcpt" : {
          "type" : "multi_field",
          "path" : "just_name",
          "fields" : {
            "env_rcpt" : {"type" : "string", "analyzer" : "email"},
            "newall" : {"type" : "string", "analyzer" : "email", "index_name" : "shared"}
          }
        },
        "subject" : {
          "type" : "multi_field",
          "path" : "just_name",
          "fields" : {
            "subject" : {"type" : "string", "analyzer" : "mysnowball"},
            "newall" : {"type" : "string", "analyzer" : "mysnowball", "index_name" : "shared"}
          }
        }
      },
      "_ttl" : {"enabled" : true, "default" : "'$datattl'"},
      "_timestamp" : {"enabled" : true},
      "_all" : {"enabled" : false}
    }
  }
}'

Now my problem is: how do I access this aliased field from Kibana (Kibana 3,
milestone 4)? Also, is there a better way to create this custom _all field
so that it includes tokens from a couple of different analyzers?

Sorry if this is a terrible question,

Kevin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ba4a7958-5ea8-4db0-aa7e-687f3ccab644%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly) #2

I'm not sure your mapping actually does what you expect it to do. In fact,
I don't believe you can combine already-analyzed tokens from different
fields into a single field at all. Your best bet for correctness is
probably to leave all the multi-fields alone and then run queries like
this in Kibana: field1.newall:blah OR field2.newall:blahblah, etc.
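The OR-across-fields query can be generated rather than typed by hand. The sketch below is a hypothetical Python helper (the function names, the field list and the escaping rule are assumptions, not part of Kibana or Elasticsearch) that turns one search term into a Lucene-syntax string of the form field1.newall:term OR field2.newall:term. Whether names like from.newall actually resolve depends on the mapping's path/index_name settings.

```python
import re

# Characters the Lucene classic query parser treats specially (assumed list);
# a leading backslash makes them literal. && and || are covered by & and |.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape Lucene special characters in a user-supplied term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def or_across_fields(term, fields):
    """Build 'f1:term OR f2:term ...' for a Kibana / query_string search.

    Multi-word input is wrapped in quotes so it becomes a phrase on each
    field instead of leaking extra words into the default field.
    """
    t = escape_term(term)
    if re.search(r"\s", t):
        t = f'"{t}"'
    return " OR ".join(f"{field}:{t}" for field in fields)


if __name__ == "__main__":
    # Hypothetical field names following the field1.newall pattern above.
    fields = ["cid.newall", "action.newall", "from.newall",
              "rcpt.newall", "subject.newall"]
    print(or_across_fields("bob@example.com", fields))
```

Each field is still searched with its own analyzer, which is what keeps this approach correct where a merged field is not.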

BTW, not sure if this helps at all, but you can still use the _all field
and selectively add fields to it using the "include_in_all" property.
However, this still does not aggregate the already-analyzed tokens. It
only aggregates the raw input field values, which then go through a
single analyzer: the one you assign to the _all field.
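A minimal sketch of what this looks like, assuming the Elasticsearch 0.90/1.x mapping syntax used in this thread: _all stays enabled with a single analyzer, and each field opts in or out through include_in_all. The selection of fields and the choice of mysnowball as the _all analyzer are illustrative assumptions; the script only builds and prints the JSON body, which could then be sent with the same curl -XPOST $elasticloc:9200/ss -d '...' request as in the original question.

```python
import json

# Analysis settings copied from Kevin's request (regexes as raw strings so
# json.dumps emits the doubled backslashes that JSON requires).
ANALYSIS = {
    "filter": {
        "email": {
            "type": "pattern_capture",
            "preserve_original": True,
            "patterns": [r"(\w+)", r"(\p{L}+)", r"(\d+)", "@(.+)"],
        }
    },
    "analyzer": {
        "email": {
            "tokenizer": "uax_url_email",
            "filter": ["email", "lowercase", "unique"],
        },
        "mysnowball": {"type": "snowball", "language": "English"},
    },
}

# Plain per-field mappings (no multi_field / index_name tricks).
PROPERTIES = {
    "ts": {"type": "date"},
    "cid": {"type": "string"},
    "score": {"type": "integer"},
    "action": {"type": "string"},
    "from": {"type": "string", "analyzer": "email"},
    "rcpt": {"type": "string", "analyzer": "email"},
    "subject": {"type": "string", "analyzer": "mysnowball"},
}

# Hypothetical choice of which fields feed _all; all others opt out.
IN_ALL = {"cid", "action", "from", "rcpt", "subject"}


def build_index_body():
    """Return an index-creation body with _all enabled and include_in_all
    set explicitly on every field (PROPERTIES itself is not modified)."""
    properties = {
        name: dict(mapping, include_in_all=(name in IN_ALL))
        for name, mapping in PROPERTIES.items()
    }
    return {
        "settings": {"analysis": ANALYSIS},
        "mappings": {
            "filter": {
                # _all gets exactly ONE analyzer: the raw values of the
                # included fields are analyzed with it, not with the
                # per-field analyzers listed in PROPERTIES.
                "_all": {"enabled": True, "analyzer": "mysnowball"},
                "properties": properties,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

With this body, the from and rcpt values would reach _all as raw text and be processed by mysnowball, not by the email analyzer, which is the limitation described in the paragraph above.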



(Kevin Claggett) #3

So what you are saying is that there is no way to aggregate all the
tokens generated by one document into one place?

I mostly wanted to do this so that an end user doesn't have to know
which fields are in the document, or Lucene query syntax, to get the
results they are looking for.

Thanks
Kevin

On Wednesday, February 12, 2014 6:10:04 AM UTC-8, Binh Ly wrote:



(Binh Ly) #4

Kevin,

Just want to be clear. You certainly can aggregate field values into the
_all field (and selectively, field by field, too). However, the values that
are aggregated are the raw field values from your source document. They are
all appended together and then analyzed uniformly by whatever analyzer is
assigned to the _all field (the standard analyzer by default). I do not
believe there is a way to aggregate the already-analyzed tokens (as opposed
to the raw source field values) into a single field. Just for
clarification. :slight_smile:
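To make the distinction concrete, here is a toy Python model, not Elasticsearch code: toy_standard and toy_email are rough, hypothetical stand-ins for the standard analyzer and Kevin's email analyzer. Per-field analysis keeps the whole address as a token, while the _all-style path joins the raw values and runs them through one analyzer, so the whole-address token never appears.

```python
import re


def toy_standard(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def toy_email(text):
    """Rough stand-in for the email analyzer: each address whole
    (preserve_original), its word pieces, and the part after '@'."""
    tokens = []
    for addr in re.findall(r"\S+@\S+", text.lower()):
        tokens.append(addr)                      # original token kept
        tokens.extend(re.findall(r"\w+", addr))  # (\w+) captures
        tokens.append(addr.split("@", 1)[1])     # @(.+) capture
    return list(dict.fromkeys(tokens))           # drop duplicates, like "unique"


# Hypothetical per-field analyzer assignment for a two-field document.
FIELD_ANALYZERS = {"from": toy_email, "action": toy_standard}


def per_field_tokens(doc):
    """Each field analyzed with its own analyzer."""
    return {field: FIELD_ANALYZERS[field](value) for field, value in doc.items()}


def all_field_tokens(doc, analyzer=toy_standard):
    """_all-style: raw values appended together, then analyzed ONCE."""
    return analyzer(" ".join(doc.values()))


if __name__ == "__main__":
    doc = {"from": "Bob@Example.com", "action": "Quarantine"}
    print(per_field_tokens(doc))
    # {'from': ['bob@example.com', 'bob', 'example', 'com', 'example.com'], 'action': ['quarantine']}
    print(all_field_tokens(doc))
    # ['bob', 'example', 'com', 'quarantine']
```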


