Search for similar words in a big text


(Jorge von Rudno) #1

Hi everybody!!!

First of all, I should mention that I am currently a beginner with
Elasticsearch. I have the following situation, and I don't know whether it
can be solved with Elasticsearch.

I have an index in which every document holds the content of a web page.
I want to write a query where I pass in a text and it returns the part of the
field that contains that text. Perhaps an example will explain it better.

document 1:
field_id: 1
content : "This is one example to search a text in a long string"

document 2:
field_id: 2
content : "The second example help us to texting the function to search"

If I send the word "tex", I expect to get back "text" and
"texting".

Please tell me whether Elasticsearch is powerful enough to solve this, or
whether I have to implement an algorithm myself.
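For comparison, the "implement it myself" route can be sketched in plain Python, outside Elasticsearch. This is only an illustration under simple assumptions: `find_prefix_matches` is a hypothetical helper, words are split with a naive `\w+` regex, and a match is a case-insensitive prefix test.

```python
import re

# Sample documents from the question above.
DOCUMENTS = [
    {"field_id": 1, "content": "This is one example to search a text in a long string"},
    {"field_id": 2, "content": "The second example help us to texting the function to search"},
]

def find_prefix_matches(documents, prefix):
    """Hypothetical helper: distinct words (first-seen order) starting with `prefix`."""
    prefix = prefix.lower()
    matches = []
    for doc in documents:
        # Naive tokenizer: runs of word characters, lowercased.
        for word in re.findall(r"\w+", doc["content"].lower()):
            if word.startswith(prefix) and word not in matches:
                matches.append(word)
    return matches

print(find_prefix_matches(DOCUMENTS, "tex"))  # ['text', 'texting']
```

This scans every word of every document on each query, so it will not scale well to a large index of web pages.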

Thanks a lot in advance.

Regards

Jorge von Rudno

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/571daeee-5889-4a01-8ce8-3f3fc91eee92%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello Jorge,

Could you please expand on your requirements? There are a couple of
things out there that could help you:

  1. Stemmers: they reduce words such as "run, running, runs" to their
    base form, "run". ES supports the snowball and porter_stem stemmers:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html#analysis-snowball-tokenfilter
  2. Fuzzy query: finds similar words, e.g. for spell-check style matching:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html#query-dsl-fuzzy-query
  3. Phonetic analysis plugin: finds words that sound similar when
    pronounced, like "cool" and "kool":
    https://github.com/elasticsearch/elasticsearch-analysis-phonetic
  4. Finally, for the particular example you quoted, I believe you need
    the shingle or NGram token filter:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html#analysis-shingle-tokenfilter
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html
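As a rough illustration of item 4 (plain Python, not Elasticsearch code), the sketch below simulates what an edge n-gram token filter does at index time: every word is expanded into its leading substrings, so a query term such as "tex" becomes an exact lookup against the indexed grams. The gram sizes (2 to 10), the regex tokenizer and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Sample data copied from the question.
DOCUMENTS = [
    {"field_id": 1, "content": "This is one example to search a text in a long string"},
    {"field_id": 2, "content": "The second example help us to texting the function to search"},
]

def edge_ngrams(word, min_gram=2, max_gram=10):
    """Return the leading substrings of `word` with lengths min_gram..max_gram."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

def build_index(documents, min_gram=2, max_gram=10):
    """Index-time step: map every edge n-gram to the whole words it came from."""
    index = defaultdict(set)
    for doc in documents:
        # Naive lowercase tokenizer standing in for an Elasticsearch analyzer.
        for word in re.findall(r"\w+", doc["content"].lower()):
            for gram in edge_ngrams(word, min_gram, max_gram):
                index[gram].add(word)
    return index

index = build_index(DOCUMENTS)

# Search-time step: the query term is not split into grams; it is looked up as-is.
print(edge_ngrams("text"))   # ['te', 'tex', 'text']
print(sorted(index["tex"]))  # ['text', 'texting']
```

Only prefixes are generated here, which fits the "tex" example; a full NGram filter also emits inner substrings, so "tex" would additionally match words that merely contain it, such as "context".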

Thanks
Vineeth

(In reply to Jorge von Rudno, Wed, Jul 9, 2014 at 12:40 PM)


(Jorge von Rudno) #3

(In reply to vineeth mohan, 07/09/2014 09:26 AM)

Hello Vineeth, thanks a lot for your great support. I will work through
your suggestions and let you know the result.

Best regards

Jorge von Rudno


