Excluding terms from highlighting


(Beau Keogh) #1

Hi, I have an index with documents which contain a "companyid" field and some
other text fields. The companyid is our client's id number. So in our
application, when presenting results to each client, we need to construct
our search to include both the criteria they enter AND their companyid,
so that we only show them their own records. We also have highlighting
turned on. The issue is that if the companyid is "1", the number 1 gets
highlighted in the snippets along with the other valid keywords.

So, is there a way to exclude a term from highlighting? Or is our approach
even the best way? We're using the CouchDB river to index a single CouchDB
database and it's working well. I'd rather not get into having a db/index
for each client...

Thanks in advance

--


(David Pilato) #2

I strongly think that it's better to isolate clients by index. There is no additional cost in terms of storage or performance.

You can also take a look at aliases:
http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html

Filtered aliases should help here.

HTH

David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Beau Keogh) #3

Thanks David,

We chose to use a single index based on feedback from Shay:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/beau$20keogh/elasticsearch/HcoeGoPXJ_U/bM5SpYnrH1wJ
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/data$20flow/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ[1-25]
Also, since we're indexing CouchDB, I thought it would be less of a headache
to maintain one couchdb->es river than potentially thousands.

But the alias idea sounds like it might be the right track, in tandem with
routing. I'm still trying to figure out if/how that would work. From what
I've digested thus far, I think the following might work:

  • One large index with 50-100 shards
  • Set up a mapping with the _routing field and set its path to our
    companyid field
  • Create an alias for each company, with the routing value set to the
    companyid and the filter set to the same companyid
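To make the third bullet concrete, here is a rough sketch in Python that only builds the JSON body for the `_aliases` endpoint (one "add" action per company, each carrying a routing value and a term filter). The index name `clients` and the alias naming scheme `company_<id>` are made up for illustration, and the _routing mapping from the second bullet is not shown. Nothing is sent to a cluster here.

```python
import json


def company_alias_action(index, company_id):
    """Return one 'add' action for a filtered, routed alias for a single company."""
    return {
        "add": {
            "index": index,                         # hypothetical index name
            "alias": f"company_{company_id}",       # hypothetical naming scheme
            "routing": str(company_id),             # route requests by companyid
            "filter": {"term": {"companyid": company_id}},  # only this company's docs
        }
    }


def build_alias_request(index, company_ids):
    """Build the body that would be POSTed to the _aliases endpoint."""
    return {"actions": [company_alias_action(index, cid) for cid in company_ids]}


request_body = build_alias_request("clients", ["1", "2"])
print(json.dumps(request_body, indent=2))
```

The application would then search against `company_1` instead of the big index, so the companyid never has to appear in the user-facing query at all.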

Any input/feedback is much appreciated!

--


(Clinton Gormley) #4

Hi David

On Fri, 2012-08-17 at 18:08 +0200, David Pilato wrote:

> I strongly think that it's better to isolate clients by index. There
> is no additional cost in terms of storage or performance.

It depends on how big each client is. If you have 10,000 small clients,
you're going to struggle to have one real index per client.

In that case I'd say it's much better to use filtered aliases instead.

clint

--


(Clinton Gormley) #5

On Fri, 2012-08-17 at 08:06 -0700, Beau Keogh wrote:

> Hi, I have an index with documents which contain a "companyid" and
> some other text fields. The companyid refers to our clients' id
> number. So in our application when presenting results to each client
> we need to construct our search to include both the criteria they
> enter AND their clientid number so that we're only showing them their
> records. We also have highlighting turned on. The issue is that if the
> client id is "1", the number 1 gets highlighted in the snippets along
> with the other valid keywords.

Don't include the companyid in the _all field. Or, better, just search
on the fields you really want to search on, instead of the _all field.
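As a rough illustration of the second suggestion, the Python sketch below builds a search body where the user's text is matched only against named fields, the companyid is applied as a non-scoring filter, and highlighting is requested only for those same fields. It uses the bool/filter form from newer Elasticsearch releases, which is an assumption on my part (older versions would express the same idea differently), and the field names `title` and `body` are invented for the example.

```python
import json


def build_search_body(user_text, company_id, search_fields):
    """Build a search body that keeps companyid out of the highlighted query."""
    return {
        "query": {
            "bool": {
                # The user's criteria: searched only on the named fields, not _all.
                "must": [
                    {"multi_match": {"query": user_text, "fields": search_fields}}
                ],
                # Restriction to one client's documents, kept separate from the text query.
                "filter": [{"term": {"companyid": company_id}}],
            }
        },
        # Ask for snippets only from the fields the user actually searched.
        "highlight": {"fields": {name: {} for name in search_fields}},
    }


body = build_search_body("invoice overdue", "1", ["title", "body"])
print(json.dumps(body, indent=2))
```

Because the "1" is no longer part of the text query, there is nothing in the user-facing query for the highlighter to match it against.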

clint

--

