Optimise fuzzy search

I have about 20 million documents containing book titles. I want to find similar
titles by applying edit distance to the titles.
According to the documentation, the fuzzy query does exactly that:
http://www.elasticsearch.org/guide/reference/query-dsl/fuzzy-query/
The problem is that the searches are very slow. It takes about 3 seconds to
search for a 5-6 letter title in ES with the fuzzy query.
Is there any other query type I can use that is faster and gives
the same or similar results?
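
For readers unfamiliar with the term, "edit distance" here most likely means Levenshtein distance. Below is a minimal Python sketch of that metric, only to illustrate what is being matched. It is not how Elasticsearch evaluates a fuzzy query, and the function name `edit_distance` is made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]
```

Comparing one title against 20 million others this way would be far too slow, which is why an index-backed query is needed in the first place.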

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

What are your memory settings?
How many nodes do you have?
How many shards?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

I have allocated about 16 GB of RAM in a 4-node cluster. I have 5 shards. My total
document size is about 10 GB, so I am sure everything fits in memory.

Thanks,
Arjit

16 GB of the 32 GB available per box, right?
Do you see anything in the logs?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Hi,

Yes, 16 GB of 32 GB is available on the box. I can't see anything in the logs.
There are no exceptions in the logs.
And I think 2-3 seconds is a lot of time.

Thanks,
Arjit

What does your query look like?
Can you reduce the dataset you query on a bit?
I mean filter first on dates, for example. It depends on your use case.
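
To make the "filter first" idea concrete, here is a rough Python sketch that builds such a request body. It assumes a recent Elasticsearch release where the `bool` query accepts a `filter` clause and the `fuzzy` query takes `fuzziness`; older releases used different syntax. The field names `title` and `published_year` and the helper `build_fuzzy_title_query` are hypothetical, since the actual mapping and query were not posted in this thread.

```python
def build_fuzzy_title_query(title, year_from=None, fuzziness=1):
    """Build a search body that runs a fuzzy match on a title field,
    optionally restricted by a range filter on a date-like field.

    Field names ("title", "published_year") are placeholders, not
    taken from the original poster's mapping.
    """
    fuzzy_clause = {
        "fuzzy": {
            "title": {
                "value": title,
                "fuzziness": fuzziness,  # maximum edit distance allowed
            }
        }
    }
    bool_query = {"must": [fuzzy_clause]}
    if year_from is not None:
        # The filter narrows the candidate documents and does not
        # contribute to scoring.
        bool_query["filter"] = [
            {"range": {"published_year": {"gte": year_from}}}
        ]
    return {"query": {"bool": bool_query}}
```

Calling `build_fuzzy_title_query("hobit", year_from=2000)` returns a dict that could be serialised with `json.dumps` and sent as the body of a search request. Whether this actually helps depends on how selective the filter is for the use case.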

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
