I'm new to ES, and struggling with something simple?


(Edward Fjellskål) #1

Hi list,

I'm new to ES. I have googled and played with this for a while now and
tried out different things without getting the results I'm looking for.

I'm trying to store domain names and be able to search them as I would in
a MySQL MyISAM indexed field.

So I'm putting this into ES using Ruby + Tire:

create :mappings => {
  :logline => {
    :properties => {
      :id    => { :type => 'string', :index => 'not_analyzed', :store => true },
      :query => { :type => 'string', :analyzer => 'whitespace' }
    }
  }
}

So, executing:

curl 'localhost:9200/_analyze?pretty=1&analyzer=whitespace' -d 'this.is.a.very.very.very.long.domain.name.com'
{
  "tokens" : [ {
    "token" : "this.is.a.very.very.very.long.domain.name.com",
    "start_offset" : 0,
    "end_offset" : 45,
    "type" : "word",
    "position" : 1
  } ]
}

If I understand this correctly, this should index the entire domain name as
one big token?
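
As a rough illustration of why the whole name comes out as a single token, here is a minimal Ruby sketch of whitespace-only tokenization. It is a stand-in written for this example (the `whitespace_tokens` helper is made up), not what Elasticsearch runs internally.

```ruby
# Simplified stand-in for a whitespace tokenizer: split only on whitespace
# and record character offsets. This is NOT Elasticsearch code.
def whitespace_tokens(text)
  tokens = []
  text.scan(/\S+/) do |match|
    start = Regexp.last_match.begin(0)
    tokens << { token: match, start_offset: start, end_offset: start + match.length }
  end
  tokens
end

domain = 'this.is.a.very.very.very.long.domain.name.com'
tokens = whitespace_tokens(domain)

puts tokens.length                                  # number of tokens produced
puts tokens.first[:end_offset]                      # end offset of the first token
puts tokens.any? { |t| t[:token] == 'very.long' }   # is the fragment a token of its own?
```

In this sketch the only term produced is the full name, so a fragment such as "very.long" is not a term on its own and an exact term lookup for it would find nothing.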

Searching for "this.is.a.very.very.very.long.domain.name.com." or even
"this.is.a.very.very.very.long.domain.name.com"
gives no results :(

Searching for parts of the domain name does hit on the domain name.

How can I make ES behave more like the full-text search in MySQL, if that
is possible at all?

If I search for "this.is.a.very.very.very.long.domain.name.com", I would
like to hit documents that contain exactly that string. If I search for
"very.long", it should also give me the documents where that string is
part of a domain name.
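
One possible way to express the two searches described above is a term query for the exact name and a wildcard query for fragments. The sketch below only builds the JSON request bodies in Ruby and sends nothing. The helper names are hypothetical, the field name `query` is taken from the mapping above, and whether these queries behave as wanted depends on how the field was actually indexed.

```ruby
require 'json'

# Exact match: the value is compared against the indexed term as-is.
# Helper name is made up for this example.
def exact_match_body(field, value)
  { 'query' => { 'term' => { field => value } } }
end

# Substring match: wrap the fragment in wildcards.
# Helper name is made up for this example.
def substring_match_body(field, fragment)
  { 'query' => { 'wildcard' => { field => "*#{fragment}*" } } }
end

puts JSON.pretty_generate(
  exact_match_body('query', 'this.is.a.very.very.very.long.domain.name.com')
)
puts JSON.pretty_generate(substring_match_body('query', 'very.long'))
```

Neither query analyzes its input, and a wildcard that starts with `*` has to walk through the terms of the field, so it can be slow on a large index.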

I have tried different analyzers (standard, whitespace, custom, others),
but it does not seem to work the way I was hoping :(

Searching for shorter domains, like "www.google.com", works.

Is there a max_token_length that is playing a trick on me, maybe? Should that
not show up in the output of:

curl 'localhost:9200/_analyze?pretty=1&analyzer=whitespace' -d 'this.is.a.very.very.very.long.domain.name.com'

?

Any help on the subject would be appreciated.
I can verify with a "/_mapping" query that my field is using the right
analyzer, etc.

Regards,
Edward

--


(David Pilato) #2

What kind of search are you doing? QueryString? Term? Match?

A full gist with a curl recreation would be useful. See: http://www.elasticsearch.org/help/

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 17 Oct 2012 at 22:05, Edward Fjellskål edwardfjellskaal@gmail.com wrote:

--


(Edward Fjellskål) #3

Well, if only it were that simple :)

I made a small test case (5 log lines) and it works fine, but when I
import 1000 log lines, it fails :(

My curl looks like:

curl 'http://localhost:9200/test/_search?q=query:this.is.a.very.very.very.long.domain.name.com&pretty=true'

To demonstrate it better, I wrapped up a small script to import my log,
along with a log with 1K entries.
$ wget http://networktotal.com/tmp/es-fail.tgz

In my test case I use google.com domains, and the case I use is to search
for:
"p2.jlnq5gukp245a.2rz6qtl7mhjfqwck.if.v4.ipv6-exp.l.google.com."

It seems like the ES whitespace analyzer is struggling here with the "-" char.

Please try it out; any feedback would be super nice :)

Regards,
Edward

PS: I'm new to Ruby too, so the code might not be that 1337 :)

On Wed, Oct 17, 2012 at 10:21 PM, David Pilato david@pilato.fr wrote:

--
Edward Bjarte Fjellskål
Senior Security Analyst
http://www.gamelinux.org/

--


(David Pilato) #4

I think your problem is that you are doing a queryString search and the
search is applied to the _all field. See
http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html

_all has its own analyzer. When the search is performed, your query is
analyzed with the analyzer of the field you are searching in: the default
one here.

That’s, IMHO, why it does not work.
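
To make the analyzer point concrete, here is a small Ruby sketch of how a search can miss when the search text is tokenized differently from the indexed text. Both tokenizers are simplified stand-ins written for this example. The second one, which lowercases and also splits on hyphens, is a hypothetical rule and not the actual behaviour of any Elasticsearch analyzer.

```ruby
# Index-time stand-in: split on whitespace only.
def index_tokens(text)
  text.split(/\s+/).reject(&:empty?)
end

# Query-time stand-in for "some other analyzer": lowercases and also splits
# on hyphens. Hypothetical simplification, NOT the rules of a real analyzer.
def other_tokens(text)
  text.downcase.split(/[\s-]+/).reject(&:empty?)
end

# A search hits only if every query-time token exists among the indexed tokens.
def would_match?(indexed_value, search_text)
  indexed = index_tokens(indexed_value)
  other_tokens(search_text).all? { |token| indexed.include?(token) }
end

short = 'www.google.com'
long  = 'p2.jlnq5gukp245a.2rz6qtl7mhjfqwck.if.v4.ipv6-exp.l.google.com.'

puts would_match?(short, short)   # no hyphen: both sides produce the same token
puts would_match?(long, long)     # hyphen: the query side produces two tokens
```

Under these assumptions the short name still matches while the hyphenated one does not, which is the kind of symptom a mismatch between index-time and search-time analysis can produce.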

You can try a "more complex" query such as:

{
  "query_string" : {
    "query" : "p2.jfns4i7euxjsi.vj4i455oywujlfz2.if.v4.ipv6-exp.l.google.com.",
    "analyzer" : "whitespace"
  }
}

Or:

{
  "match" : {
    "query" : "p2.jfns4i7euxjsi.vj4i455oywujlfz2.if.v4.ipv6-exp.l.google.com."
  }
}

Note that "query" here is your field name.
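
Since the field is itself called query, a full request body nests one "query" inside another. Here is a minimal Ruby sketch that builds both bodies without sending them, assuming the field name `query` from the original mapping. The helper names are made up, and `default_field` is used as one way to point query_string at a single field instead of _all.

```ruby
require 'json'

HOST = 'p2.jfns4i7euxjsi.vj4i455oywujlfz2.if.v4.ipv6-exp.l.google.com.'

# The outer "query" key is the search request's query element;
# the inner key is the document field, which happens to be named "query".
def match_request(field, text)
  { 'query' => { 'match' => { field => text } } }
end

# query_string restricted to one field, with an explicit search analyzer.
def query_string_request(field, text, analyzer)
  {
    'query' => {
      'query_string' => {
        'default_field' => field,
        'query'         => text,
        'analyzer'      => analyzer
      }
    }
  }
end

puts JSON.pretty_generate(match_request('query', HOST))
puts JSON.pretty_generate(query_string_request('query', HOST, 'whitespace'))
```

The printed JSON can be passed to curl with -d against the _search endpoint of the index.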

HTH

David

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of Edward Fjellskål
Sent: Thursday, 18 October 2012 12:35
To: elasticsearch@googlegroups.com
Subject: Re: I'm new to ES, and struggling with something simple?...

--


(David Pilato) #5

Or you can change the default analyzer for the _all field and use a
whitespace analyzer.
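
A sketch of what that could look like, built as Ruby hashes and printed as JSON. The first body sets the index-wide default analyzer and the second sets the analyzer on _all for one type. The type name logline comes from the original mapping, the helper names are hypothetical, and the exact settings keys are an assumption that should be checked against the docs for the ES version in use.

```ruby
require 'json'

# Index settings that make "whitespace" the default analyzer for the index.
# Assumed key layout; verify against the documentation for your version.
def default_whitespace_settings
  { 'settings' => { 'analysis' => { 'analyzer' => { 'default' => { 'type' => 'whitespace' } } } } }
end

# Mapping fragment that sets the analyzer of the _all field for one type.
# Assumed key layout; verify against the documentation for your version.
def all_field_mapping(type_name)
  { type_name => { '_all' => { 'analyzer' => 'whitespace' } } }
end

puts JSON.pretty_generate(default_whitespace_settings)
puts JSON.pretty_generate(all_field_mapping('logline'))
```

Analyzer changes of this kind generally only affect documents indexed afterwards, so existing data would need to be reindexed.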

David.

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of David Pilato
Sent: Thursday, 18 October 2012 13:37
To: elasticsearch@googlegroups.com
Subject: RE: I'm new to ES, and struggling with something simple?...

--


(Edward Fjellskål) #6

David,

Your reply is much appreciated, I learned something new :)

curl -XGET 'http://localhost:9200/pdns/_search?pretty=1' -d '
{
  "query" : {
    "query_string" : {
      "query" : "p2.jfns4i7euxjsi.vj4i455oywujlfz2.if.v4.ipv6-exp.l.google.com.",
      "analyzer" : "whitespace"
    }
  }
}'

And:

curl -XGET 'http://localhost:9200/pdns/_search?pretty=1' -d '
{
  "query" : {
    "query_string" : {
      "query" : "p2.jfns4i7euxjsi.vj4i455oywujlfz2.if.v4.ipv6-exp.l.google.com.",
      "fields" : [ "query" ]
    }
  }
}'

Neither seems to work.
Changing the default analyzer does, so that's my workaround for now.

I could not see how to implement your examples in Tire either :/
According to karmi on IRC, match is not implemented in Tire yet.

And I could not find out how to set the analyzer for a search either.

I'll keep looking.

E

On Thu, Oct 18, 2012 at 1:37 PM, David Pilato david@pilato.fr wrote:

--


(David Pilato) #7

OK, just a note: MatchQuery is the new name for TextQuery, which is now
deprecated.

I think (but Karmi can confirm or not) that Text exists in Tire, doesn't it?

From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com]
On behalf of Edward Fjellskål
Sent: Thursday, 18 October 2012 19:04
To: elasticsearch@googlegroups.com
Subject: Re: I'm new to ES, and struggling with something simple?...

--


(Edward Fjellskål) #8

Karmi confirmed on IRC earlier today that Text exists.

Thanks

On Thu, Oct 18, 2012 at 7:36 PM, David Pilato david@pilato.fr wrote:

--

