Filtering substring

To filter by substring I have tried a regexp filter with the ".*my-string.*" expression. Is there a faster way to filter by substring? I know some developers
try to avoid regular expressions, treating them as "very slow".

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Andrew,
could you elaborate a little more about what you want to achieve? Could you
maybe post your current query?

Anyway, yes, it's usually recommended to index data in such a way that
regular expressions, or similar operations that might slow down your
searches, are not required at query time.

On the other hand, it all depends on your performance expectations. With
your real data, your queries (not only their type but also their volume)
and your hardware (including the number of available nodes), performance
might be satisfactory anyway. Still, you could make it even faster by
indexing your data in a way that suits the queries you are going to run
against your index.

Cheers
Luca

(In reply to Andrew Gaydenko, Tuesday, July 23, 2013 10:25:08 AM UTC+2.)

Luca, hi!

Every document in the index has from zero to 7-8 associated keys, and these
keys are available at index time. Every key is just a string over a fixed
alphabet (digits, Latin letters and a few other punctuation symbols) with a
fixed length (~40 characters). For now I have simply added a "chain" field
in which the associated keys are stored as a single space-separated string.
I'd like to avoid analyzing this field, i.e. to stick to filtering, since
there are other fields for analyzing and querying.

Filtering is simple: return the documents whose list of keys contains a
given key.

> Could you elaborate a little more about what you want to achieve? Could
> you maybe post your current query?

At the moment in my experiments it is just a filter with regexp:

{"filter":{"and":[{"regexp":{"chain":".*key-to-filter.*"}}]},"sort":[{"_id":{"order":"asc"}}]}
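Lucene regular expressions must match the whole term, which is why a substring pattern needs .* at both ends. Because "chain" is not analyzed, each document's entire key string is one term, so the filter has to test the pattern against every stored string. The Python sketch below is a toy emulation of that behaviour using the standard re module, not Elasticsearch's implementation; the sample documents and keys are hypothetical. It also escapes the key, on the assumption that punctuation such as '.' or '+' in a key should be matched literally rather than as regex syntax.

```python
import re

# Hypothetical documents: each "chain" value is one space-separated string,
# kept as a single untokenized term (as with a not_analyzed field).
docs = {
    "1": "abc+123 def.456",
    "2": "xyz-789",
    "3": "def.456 qqq-000",
}


def regexp_filter(docs, key):
    """Toy emulation of a regexp filter on an untokenized field.

    The key is wrapped in .* on both sides and checked with re.fullmatch,
    mirroring Lucene's whole-term anchoring. re.escape stops punctuation in
    the key from acting as regex syntax. Every document's string is tested.
    """
    pattern = re.compile(".*" + re.escape(key) + ".*")
    return sorted(doc_id for doc_id, chain in docs.items() if pattern.fullmatch(chain))


print(regexp_filter(docs, "def.456"))  # ['1', '3']
```

Without re.escape, a hypothetical query for "abc.123" would also match document 1, because an unescaped '.' matches the '+' in "abc+123".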

Eventually this filter may be just one of many in a long array :) but all
the others are simple term or range filters.


Hi Andrew,
well, the best way to index that information would be to analyze the field
and tokenize it on whitespace, so that every key becomes a separate token.
Querying by key is then much faster than keeping a single string and having
to run a regexp to find partial matches.
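A rough Python sketch of this suggestion follows. It is a toy model, not Elasticsearch's implementation: the request bodies are plain dicts assuming a 0.90-era mapping with a hypothetical type name my_type, and a small in-memory inverted index stands in for the Lucene term dictionary. Once each key is its own term, finding the documents for a key is a direct lookup instead of a pattern test against every stored string.

```python
from collections import defaultdict

# Hypothetical sample documents, shaped like the "chain" field in this thread.
docs = {
    "1": "abc+123 def.456",
    "2": "xyz-789",
    "3": "def.456 qqq-000",
}

# Hypothetical 0.90-style mapping ("my_type" is a made-up type name): the
# built-in whitespace analyzer splits "chain" on whitespace without
# lowercasing, so every key is indexed as its own exact term.
mapping = {"my_type": {"properties": {"chain": {"type": "string", "analyzer": "whitespace"}}}}

# With one term per key, the regexp filter can become a plain term filter.
query = {"filter": {"term": {"chain": "key-to-filter"}}}


def build_index(docs):
    """Toy inverted index: map each whitespace-separated key to the ids containing it."""
    index = defaultdict(set)
    for doc_id, chain in docs.items():
        for key in chain.split():  # split() breaks on runs of whitespace, like a whitespace tokenizer
            index[key].add(doc_id)
    return index


def term_filter(index, key):
    """Exact lookup of a single term: one dictionary access, no scan over all terms."""
    return sorted(index.get(key, set()))


index = build_index(docs)
print(term_filter(index, "def.456"))  # ['1', '3']
print(term_filter(index, "def"))      # [] (a partial key is not a term)
```

Since the whitespace analyzer does not lowercase, keys are matched exactly as stored, which suits opaque identifiers like these.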

I didn't quite get why you'd prefer not to analyse that field. Maybe I'm
making it simpler than it is with your real data, but that's what I would
try.

Cheers
Luca

P.S. I noticed that you are sorting on the _id field. Maybe you already
know, but the _id field is not indexed by default, so you need to change
its mapping (index: not_analyzed); otherwise, I'm afraid, your sort
directive won't have any effect.
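The mapping change from the P.S. could be written as the Python dict below. This is a sketch of the mapping fragment only, assuming 0.90-era _id options and a hypothetical type name my_type; the sort clause is the one from the original query, unchanged.

```python
import json

# Hypothetical type name "my_type"; "index": "not_analyzed" makes _id indexed
# as a single untokenized term so that it can be sorted on.
mapping = {"my_type": {"_id": {"index": "not_analyzed"}}}

# The sort clause from the original query, unchanged.
sort_clause = [{"_id": {"order": "asc"}}]

print(json.dumps(mapping))                # {"my_type": {"_id": {"index": "not_analyzed"}}}
print(json.dumps({"sort": sort_clause}))  # {"sort": [{"_id": {"order": "asc"}}]}
```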

(In reply to Andrew Gaydenko <andrew.gaydenko@gmail.com>, Tue, Jul 23, 2013 at 2:08 PM.)

Luca, thanks! OK, I'll reconsider analyzing this field. And thanks, I'll
fix the sorting.


Cool, one more small thing about sorting on _id. If you query only a single
type, you can rely on the _uid field, which is indexed by default; it
contains both the type and the _id. On the other hand, remember that both
_id and _uid are treated as strings by default, so they sort
lexicographically; I'm not sure that's what you expect when sorting on them.
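To see why a string sort may surprise, here is a small Python illustration with hypothetical numeric-looking ids. It is not Elasticsearch code; it only shows the character-by-character comparison that a string sort performs, next to the numeric order one might expect.

```python
# Hypothetical numeric-looking ids; stored as strings, they are compared
# character by character rather than by numeric value.
ids = ["9", "10", "100", "2"]

lexicographic = sorted(ids)     # what a string sort produces
numeric = sorted(ids, key=int)  # what one might expect for numeric ids

print(lexicographic)  # ['10', '100', '2', '9']
print(numeric)        # ['2', '9', '10', '100']
```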

Cheers
Luca

(In reply to Andrew Gaydenko <andrew.gaydenko@gmail.com>, Tue, Jul 23, 2013 at 2:47 PM.)

(In reply to Luca Cavanna, Tuesday, July 23, 2013 4:52:28 PM UTC+4.)

In fact _id is the only field I'm not going to use in queries or searches
(_id is just a return value) :) In that fragment I had only added sorting to
compare different search variants, and I haven't used it yet. Now, thanks to
your help, I know this sorting doesn't work.
