Existing multi-index cluster; now I need to search across it


(Andrew O'Brien) #1

Hi,

I'm working with a cluster that has 20 or so indices whose types are pretty
different from each other in the details, but have a couple of
commonalities across them. The application has "advanced search" forms that
are specific to each index and have a lot of filters on various fields.

Now I need to add a "global search" that does a simple text match across
all of them. It should leave the advanced forms' queries unchanged, keep
duplication of data to a minimum, and still give good relevance ranking.

At one time, I proposed a single multi-type index, but that's a lot of
duplication and adds quite an ETL burden. Right now, I'm considering adding
"copy_to" to the mappings to copy the specialized fields into consistently
named ones so that I can do a multi-index search that at least produces
similar result objects, but there's still duplication there (if there were
an "alias_to", I might be able to avoid that... but I'm sure that would add
complexity somewhere).
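
For illustration, the "copy_to" idea above could be sketched as follows.
This is only a sketch under assumptions: the type names ("book", "film"),
the specialized field names, and the shared target field "global_text" are
all made up for the example, and the mapping uses the "string" type of
Elasticsearch 1.x. It only builds the mapping bodies; sending them to the
cluster is left out.

```python
import json

# Hypothetical name of the consistently named field shared by all indices.
SHARED_FIELD = "global_text"


def mapping_with_copy_to(type_name, text_fields):
    """Build a 1.x-style mapping body in which every specialized text
    field is also copied into one consistently named field."""
    properties = {
        name: {"type": "string", "copy_to": SHARED_FIELD}
        for name in text_fields
    }
    # Declare the copy target explicitly rather than relying on dynamic mapping.
    properties[SHARED_FIELD] = {"type": "string"}
    return {"mappings": {type_name: {"properties": properties}}}


# Two made-up indices whose field names differ in the details.
books = mapping_with_copy_to("book", ["book_title", "synopsis"])
films = mapping_with_copy_to("film", ["film_name", "tagline"])

print(json.dumps(books, indent=2))
```

Each specialized field stays queryable under its own name for the advanced
forms, while the shared field holds a second indexed copy of the text, which
is the duplication mentioned above.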

Another option is to have the application do the alias mapping and
unmapping itself, but that would make queries enormous, add complexity to
result rendering, and, perhaps worst of all, make it almost impossible to
figure out how scoring works (at least, that's what I've argued to anyone
who's suggested it). I'd love to hear some more informed opinions, though.

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/55973bfa-59a9-41bd-b24f-a0d69de707ed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Binh Ly-2) #2

Not sure if this would help, but in your mapping, on a field, you can
specify "index_name", in which case you can refer to that name in your
queries.
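
A minimal sketch of what such a mapping might look like, assuming the 1.x
"index_name" field option behaves as described above. The type names, field
names, and common names ("title", "summary") are hypothetical and chosen
only for the example.

```python
import json


def mapping_with_index_name(type_name, field_to_common):
    """Build a 1.x-style mapping body in which each field keeps its own
    name in the document but is indexed under a common name."""
    properties = {
        field: {"type": "string", "index_name": common}
        for field, common in field_to_common.items()
    }
    return {"mappings": {type_name: {"properties": properties}}}


# Made-up indices: different field names, same index_name per concept.
books = mapping_with_index_name(
    "book", {"book_title": "title", "synopsis": "summary"}
)
films = mapping_with_index_name(
    "film", {"film_name": "title", "tagline": "summary"}
)

print(json.dumps(films, indent=2))
```

With mappings like these, a query could refer to "title" in either index
without the text being stored a second time.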



(Andrew O'Brien) #3

Thanks Binh. That looks very promising.

I was able to make a multi-index multi_match querying my index_named
fields. Here's my Sense session: https://gist.github.com/AndrewO/10930544
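
The general shape of such a request can be sketched like this. It is not
the content of the gist: the index names and the common field names are
hypothetical, and the code only builds the request path and JSON body
without sending anything.

```python
import json


def global_search_request(indices, common_fields, text):
    """Return the search path and body for one multi_match query that
    runs over several indices at once."""
    # A comma-separated list of index names searches all of them together.
    path = "/" + ",".join(indices) + "/_search"
    body = {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(common_fields),
            }
        }
    }
    return path, body


path, body = global_search_request(
    ["books", "films"], ["title", "summary"], "space opera"
)

print("POST " + path)
print(json.dumps(body, indent=2))
```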

I noticed a slight but consistent increase in size (~200 bytes for 1
document) for an index with an index_name'd mapping vs. one without. Any
idea what could account for that? Or is this just noise?

There are two problems that are still open for me with this approach:

  1. _source doesn't seem to respect index_names, so I can't limit the
    response fields using my common index_names.
  2. Highlighting doesn't seem to include copy_to fields. I might be missing
    a setting here...

The gist has some examples demonstrating these. Any pointers would be
appreciated.

On Wednesday, April 16, 2014 9:55:45 AM UTC-4, Binh Ly wrote:

Not sure if this would help, but in your mapping, on a field, you can
specify "index_name", in which case you can refer to that name in your
queries.


