Need help to apply Analyzers


(aps) #1

Hi,

We are indexing data with various fields such as Address, Phone Number, and
Street Number. The problem we are facing is that the Address field contains
space-separated values, e.g. "Oak Avenue", "Pretoria Avenue", "B Street",
"A Street", etc.

When we search for Pretoria Avenue, we get every document that contains
either Pretoria or Avenue. We need the search to match Pretoria Avenue
only.
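
To make the behaviour concrete, here is a minimal Python sketch of what appears to be happening. It is only a simplified stand-in for the real analysis chain: it assumes the field is split on whitespace and lowercased, and that the query terms are combined with OR. The function names `tokenize` and `matches_any` are made up for illustration.

```python
# Simplified model of an analyzed field; not the real Lucene/Elasticsearch analyzer.
def tokenize(text):
    """Split on whitespace and lowercase (an assumption for illustration)."""
    return text.lower().split()


def matches_any(query, value):
    """OR semantics: match if the value shares at least one token with the query."""
    return bool(set(tokenize(query)) & set(tokenize(value)))


addresses = ["Oak Avenue", "Pretoria Avenue", "B Street", "A Street"]
hits = [a for a in addresses if matches_any("Pretoria Avenue", a)]
print(hits)  # ['Oak Avenue', 'Pretoria Avenue']
```

"Oak Avenue" comes back only because it shares the token "avenue" with the query, which is the result we want to avoid.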

We have the same problem with Phone Number and Street Number: the data
contains "-" and "." characters, and the value is split into two tokens
wherever it contains a "-".

Which analyzers should we use, and how, to get precise results in this
case?

Any help would be appreciated.

Thanks in advance.


(Shaun Farrell) #2

In your mapping you can put

"index" : "not_analyzed"

You can also do this through your custom analyzer if you have one
of those too.

not_analyzed will not tokenize the string at all. The only issue I
would see in your scenario is that a search for "Oak Ave" wouldn't
come back with anything. You might need to set up some synonyms or
create a custom analyzer.
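
As a rough sketch, the mapping could look something like the following, built here as a Python dict and printed as JSON. The type name `record` and the field names `address` and `phone_number` are placeholders, not taken from your index.

```python
import json

# Hypothetical type and field names; adjust them to the real index.
mapping = {
    "record": {
        "properties": {
            "address": {"type": "string", "index": "not_analyzed"},
            "phone_number": {"type": "string", "index": "not_analyzed"},
        }
    }
}

# JSON body that would be sent when putting the mapping.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

With a mapping like this the whole value is indexed as a single term, so "Pretoria Avenue" and a phone number containing "-" stay in one piece, but a partial search such as "Pretoria" alone would no longer match.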

Shaun Farrell


(aps) #3

Thanks for the prompt response. Using synonyms could be difficult because
the data set is huge and we cannot predict its contents in advance.

We will look into custom analyzers for this, because we need documents to
be searchable if I write only "pretoria".


(Shaun Farrell) #4

I would also have a look at multi-fields. A multi-field stores the same data in different ways: http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html
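
A minimal sketch of such a mapping, written as a Python dict and printed as JSON. The field name `address` is an assumption about your data; the sub-field name `untouched` simply follows the example on the linked page.

```python
import json

# Hypothetical field name "address"; "untouched" mirrors the linked docs' example.
address_mapping = {
    "address": {
        "type": "multi_field",
        "fields": {
            # Tokenized version, found by partial input such as "pretoria".
            "address": {"type": "string", "index": "analyzed"},
            # Whole value kept as one term, for exact matches.
            "untouched": {"type": "string", "index": "not_analyzed"},
        },
    }
}

print(json.dumps(address_mapping, indent=2))
```

The analyzed version would be queried as `address` and the exact version as `address.untouched`, so the data is indexed once but can be searched both ways.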

--
Shaun Farrell


(aps) #5

Hi Shaun,

The multi-field type needs two searchable fields; in the example, one is
name and the second one is name.untouched. This will not be suitable in
our case because we search on a single field.

Is there any other alternative that would satisfy our requirement? I tried
to make a custom analyzer; which filters would make the field behave as
not_analyzed while still being searchable by parts of the field value?

Thanks in advance.


(Igor Motov) #6

Assuming that you are using text queries
(http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html),
you might want to take a look at changing the default operator from OR to
AND or using text_phrase queries.
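
Roughly, the two request bodies would look like the sketch below, built as Python dicts and printed as JSON. The field name `address` is a guess at your mapping, so treat it as a placeholder.

```python
import json

FIELD = "address"  # hypothetical field name

# Text query with the operator switched to AND: every term must be present.
and_query = {
    "query": {
        "text": {FIELD: {"query": "Pretoria Avenue", "operator": "and"}}
    }
}

# Phrase variant: the terms must appear next to each other, in this order.
phrase_query = {
    "query": {
        "text_phrase": {FIELD: "Pretoria Avenue"}
    }
}

for body in (and_query, phrase_query):
    print(json.dumps(body))
```

Because the field stays analyzed in both cases, a search for "pretoria" alone should still find the document, while "Pretoria Avenue" no longer pulls in "Oak Avenue".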

