Filtering substring

Andrew_Gaydenko · July 23, 2013, 8:25am

To filter by substring I have tried a regexp filter with the ".*my-string.*" expression. Is there a faster way to filter by substring? I know some developers
try to avoid regular expressions, treating them as "very slow".

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

javanna · July 23, 2013, 11:50am

Hi Andrew,
could you elaborate a little more about what you want to achieve? Could you
maybe post your current query?

Anyway, yes: it's usually recommended to index data in such a way that
regular expressions, or similar operations that might slow down your
searches, are not required at query time.
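Roughly speaking, the difference looks like this. The sketch below is a toy in-memory model in Python, not how Lucene actually executes queries, and the documents and keys are made up: a substring regexp has to be checked against every stored value, whereas splitting each value into tokens once at index time turns the lookup into a single dictionary access. The last lines also show a semantic difference: the regexp matches a fragment of a key, while the token lookup only matches whole keys.

```python
import re
from collections import defaultdict

# Hypothetical documents: id -> space-separated "chain" of keys.
docs = {
    "1": "aaa111 bbb222",
    "2": "ccc333",
    "3": "bbb222 ddd444",
}

def regexp_scan(docs, key):
    """Query-time work: test an anchored '.*key.*' pattern against every stored string."""
    pattern = re.compile(".*" + re.escape(key) + ".*")
    return sorted(doc_id for doc_id, chain in docs.items() if pattern.fullmatch(chain))

def build_inverted_index(docs):
    """Index-time work: split each chain on whitespace and map token -> doc ids."""
    index = defaultdict(set)
    for doc_id, chain in docs.items():
        for token in chain.split():
            index[token].add(doc_id)
    return index

def term_lookup(index, key):
    """Query-time work: a single dictionary lookup (no scan over documents)."""
    return sorted(index.get(key, set()))

index = build_inverted_index(docs)
print(regexp_scan(docs, "bbb222"))   # ['1', '3']
print(term_lookup(index, "bbb222"))  # ['1', '3']
print(regexp_scan(docs, "bbb2"))     # ['1', '3']  (fragment of a key still matches)
print(term_lookup(index, "bbb2"))    # []          (only whole keys match)
```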

On the other hand, it all depends on your performance expectations. With
your real data, your queries (not only their type but also their volume)
and your hardware (together with the number of available nodes),
performance might be satisfactory anyway. But you could still make it even
faster by indexing the data in a way that suits the queries you are going
to execute against your index.

Cheers
Luca

Andrew_Gaydenko · July 23, 2013, 12:08pm

Luca, hi!

Every document in the index has between zero and 7-8 associated keys, and
the keys are available at index time. Every key is a string over a fixed
alphabet (digits, Latin letters and a few punctuation symbols) with a fixed
length (~40 characters). At the moment I have just added a "chain" field in
which the space-separated keys are stored as a single string. I'd like to
avoid analyzing this field, i.e. to stick to filtering, since there are
other fields for analyzing and querying.

The filter is simple: return the documents whose list of keys contains a
given key.

could you elaborate a little more about what you want to achieve? Could
you maybe post your current query?

At the moment, in my experiments, it is just a regexp filter:

{"filter":{"and":[{"regexp":{"chain":".*key-to-filter.*"}}]},"sort":[{"_id":{"order":"asc"}}]}

Potentially this filter will be just one in a long array, but all the
others are simple term or range filters.

javanna · July 23, 2013, 12:32pm

Hi Andrew,
well, the best way to index that information would be to analyse that
field and tokenize it on whitespace, so that every key becomes a separate
token. Querying by key then becomes much faster than keeping a single
string and running a regexp to find partial matches.
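A minimal sketch of this suggestion, written as Python dicts serialized to the JSON the REST API expects. It assumes the 0.90-era string mapping syntax, a hypothetical type name "doc", and Elasticsearch's built-in "whitespace" analyzer, which splits only on whitespace and, unlike "standard", does not lowercase, so each key keeps its exact form. With the field tokenized this way, the regexp can be replaced by a plain term filter inside the existing "and" array; the "status" and "created" filters are made-up placeholders for the other term/range filters.

```python
import json

# Hypothetical type name "doc"; 0.90-era mapping syntax for a string field.
mapping = {
    "doc": {
        "properties": {
            # Split the space-separated chain into one token per key,
            # preserving case and punctuation inside each key.
            "chain": {"type": "string", "analyzer": "whitespace"}
        }
    }
}

def key_filter_body(key, extra_filters=()):
    """Build a search body keeping docs whose chain contains `key` as a token.

    The term filter value is not analyzed, so it is compared verbatim
    against the tokens produced at index time.
    """
    return {"filter": {"and": [{"term": {"chain": key}}, *extra_filters]}}

body = key_filter_body(
    "key-to-filter",
    extra_filters=[
        {"term": {"status": "active"}},                 # placeholder term filter
        {"range": {"created": {"gte": "2013-01-01"}}},  # placeholder range filter
    ],
)
print(json.dumps(mapping))
print(json.dumps(body))
```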

I didn't quite get why you'd prefer not to analyse that field. Maybe I'm
making it sound easier than it is with your real data, but that's what I
would try to do.
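If the real goal is to keep the field unanalyzed, one alternative (not proposed in the thread, named here plainly as a different technique) is to send the keys as a JSON array instead of one space-joined string and map the field as "not_analyzed": each array element is then indexed as one exact term, so a term filter works without any analyzer. The sketch assumes 0.90-era mapping syntax; the type name "doc" and the sample keys are hypothetical.

```python
import json

# Alternative (assumption, not from the thread): store keys as an array of
# exact, unanalyzed terms. "doc" is a hypothetical type name.
mapping = {
    "doc": {
        "properties": {
            "chain": {"type": "string", "index": "not_analyzed"}
        }
    }
}

def to_document(keys):
    """Turn a list of keys (zero to ~8 per document) into a source document."""
    return {"chain": list(keys)}

document = to_document(["AbC123-x", "Zz9_y"])
query = {"filter": {"term": {"chain": "AbC123-x"}}}  # exact, case-sensitive match

print(json.dumps(document))  # {"chain": ["AbC123-x", "Zz9_y"]}
print(json.dumps(query))
```

A document with no keys simply becomes {"chain": []} and matches no term filter on that field.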

Cheers
Luca

P.S. I noticed that you are sorting on the _id field. Maybe you already
know this, but the _id field is not indexed by default, so you need to
change its mapping (index: not_analyzed); otherwise, I'm afraid, your sort
directive won't have any effect.

Andrew_Gaydenko · July 23, 2013, 12:47pm

Luca, thanks! OK, I'll reconsider analyzing this field. And thanks, I'll
fix the sorting.

javanna · July 23, 2013, 12:52pm

Cool. One more small thing about sorting on _id: if you query only a single
type, you can rely on the _uid field, which is indexed by default and
contains both the type and the _id. Keep in mind, though, that both _id and
_uid are treated as strings by default, so they sort lexicographically (for
example, "10" comes before "9"); I'm not sure that's what you expect when
sorting on them.
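To see what string ordering means in practice, here is a small local Python illustration (it sorts in memory and does not query Elasticsearch). String IDs are compared character by character, so "10" sorts before "9". Zero-padding numeric IDs to a fixed width, an assumption about how one might generate IDs rather than something suggested in the thread, makes lexicographic and numeric order agree. The "type#id" shape used for _uid reflects how Elasticsearch of that era built the field; the type name "doc" is hypothetical.

```python
ids = ["9", "10", "2", "100"]

# String ordering (what a sort on _id or _uid gives you): character by character.
lexicographic = sorted(ids)
# Numeric ordering, which is often what people actually expect.
numeric = sorted(ids, key=int)

# Hypothetical workaround: zero-pad ids to a fixed width at index time.
padded = sorted(i.zfill(5) for i in ids)

# _uid values of a single type share the "type#" prefix, so they sort like _id.
uids = sorted("doc#" + i for i in ids)

print(lexicographic)  # ['10', '100', '2', '9']
print(numeric)        # ['2', '9', '10', '100']
print(padded)         # ['00002', '00009', '00010', '00100']
print(uids)           # ['doc#10', 'doc#100', 'doc#2', 'doc#9']
```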

Cheers
Luca

Andrew_Gaydenko · July 23, 2013, 1:04pm

In fact, _id is the only field I'm not going to use in queries or searches
(_id is just a return value). In that fragment I had only added sorting to
compare different search variants, and I haven't used it yet. Now, with
your help, I know this sorting doesn't work.

Related topics

Topic	Category	Replies	Views	Activity
Mappings & Analyzer	Elasticsearch	2	267	July 6, 2017
Java API substring matching	Elasticsearch	2	670	July 6, 2017
Finding substring anchored at beginning of field	Elasticsearch	5	426	July 6, 2017
Speed of query with many filters	Elasticsearch	6	400	July 6, 2017
Better effective substring query idea?	Elasticsearch	13	1529	July 6, 2017