Huge recovery time at start-up for just 1.5 lakh / 3 GB of documents

Hi,

I have an index with just 1.5 lakh (150,000) documents / 3 GB of data, and it is very slow.
Indexing a single feed takes about 5 minutes, and start-up takes an extremely long time (around 15 minutes).
Meanwhile, another index with 36 GB of data recovers in about a second.

One thing about my slow index is that its schema is quite unusual.
To make facets work, its field structure is X.Y for one feed, X.Z for another, and X.W for yet another.
In short, the number of possible absolute field paths is enormous.
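For illustration, here is a small sketch (with hypothetical field names, not my real schema) of how documents like these produce a growing set of distinct absolute field paths:

```python
# Hypothetical sketch: each feed nests its values under a different
# sub-field, so every new feed adds new absolute field paths to the
# index mapping.

def field_paths(doc, prefix=""):
    """Collect every absolute (dotted) field path in a document."""
    paths = set()
    for key, value in doc.items():
        path = prefix + key
        paths.add(path)
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
    return paths

docs = [
    {"X": {"Y": "value from feed one"}},
    {"X": {"Z": "value from feed two"}},
    {"X": {"W": "value from feed three"}},
]

all_paths = set()
for d in docs:
    all_paths |= field_paths(d)

print(sorted(all_paths))  # ['X', 'X.W', 'X.Y', 'X.Z']
```

With thousands of feeds, each contributing its own sub-fields, this set keeps growing.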

Is this the reason for the slowness?

Please find the log attached.

Thanks
Vineeth

By the way, it's lakh and not lack.

Thanks
Vineeth


This is the jstack output from when ES was busy recovering the index, which takes 15 to 20 minutes: gist:3248219 · GitHub

Thanks
Vineeth


Can you gist your mappings for the slow case?


Shay, the mapping is in the log file and it's about 10 MB long. Vineeth,
could you elaborate a little on the reasons for such a mapping? Some
examples of your data and of the requests you are running would be helpful.
I'm not sure I fully understand the reason for having so many fields.


Agreed, it would be interesting to understand why such mappings are needed. In any case, I identified an optimisation that can greatly improve performance in such a case and opened an issue: Improve recovery time when processing large mappings · Issue #2138 · elastic/elasticsearch · GitHub.


OK, the story goes like this...

I have a field called categories where I store the parent tag, its child tag, and any number of grandchild tags as JSON.
One such example would be:

categories: [{
    tag: "europe",
    europe: [{
        tag: "germany",
        germany: [{
            tag: "place in germany",
            place_in_germany: [{ /* this can nest recursively to any depth */ }, {}, {}]
        }], ...
    }], ...
}]

Now I want to see all the continents that are there, and when I click on Europe,
I want to see all the places that come under Europe;
and when I click on any of those places, such as Germany, I want to drill down further.
The depth of the drill-down is not fixed; it can go to any number of
levels. Note that this is just an example to explain the use case.

So, based on the current tag value, the path for the next facet is
computed.
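As a rough sketch (field names here are illustrative, not my actual schema), the facet-path computation looks roughly like this:

```python
# Illustrative sketch of computing the next facet field path from the
# tags the user has drilled into so far; names are hypothetical.

def next_facet_path(clicked_tags):
    """Return the dotted field path to facet on at the next level.

    clicked_tags: tags clicked so far, e.g. ["europe", "germany"].
    """
    return ".".join(["categories"] + clicked_tags) + ".tag"

print(next_facet_path([]))                     # categories.tag
print(next_facet_path(["europe"]))             # categories.europe.tag
print(next_facet_path(["europe", "germany"]))  # categories.europe.germany.tag
```

Since every tag value becomes part of a field path, the set of absolute field paths in the mapping grows with the tag vocabulary.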

I would like to know if there is a better way to do the same thing.

Thanks
Vineeth


I think I understand the reasons for such a complex schema. One alternative
solution that comes to mind is to index the tags in a flat structure. Assuming
that a record has two tags, "Europe/Germany/Frankfurt" and
"Europe/France/Lyon", we can index it like this:

tag_level_1: ["Europe"]
tag_level_2: ["Europe/Germany", "Europe/France"]
tag_level_3: ["Europe/Germany/Frankfurt", "Europe/France/Lyon"]

During search, on the first level we can just do a faceted search on
tag_level_1. If the user clicks on "Europe", we can perform another search
with an additional filter tag_level_1:Europe and a facet on tag_level_2
that filters out all tags that don't start with "Europe/", using a regex
pattern or a terms script. If the user then clicks on Germany, we can repeat
the search with a filter tag_level_2:Europe/Germany and a facet on tag_level_3
filtered to tags that start with "Europe/Germany/", and so on.
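A minimal sketch of this scheme in plain Python (no ES client; the prefix check stands in for the regex or terms-script facet filter, and the field names follow the example above):

```python
# Sketch of the flat tag_level_N indexing scheme: split each
# slash-separated tag path into per-level fields, then emulate the
# per-level facet filter with a prefix check.

def flatten_tags(tag_paths):
    """Split slash-separated tag paths into tag_level_N fields."""
    fields = {}
    for path in tag_paths:
        parts = path.split("/")
        for level in range(1, len(parts) + 1):
            prefix = "/".join(parts[:level])
            fields.setdefault(f"tag_level_{level}", set()).add(prefix)
    return {k: sorted(v) for k, v in fields.items()}

doc = flatten_tags(["Europe/Germany/Frankfurt", "Europe/France/Lyon"])
# tag_level_1: ['Europe']
# tag_level_2: ['Europe/France', 'Europe/Germany']
# tag_level_3: ['Europe/France/Lyon', 'Europe/Germany/Frankfurt']

def facet_under(level_values, clicked):
    """Keep only next-level tags that fall under the clicked tag."""
    return [t for t in level_values if t.startswith(clicked + "/")]

print(facet_under(doc["tag_level_2"], "Europe"))
# ['Europe/France', 'Europe/Germany']
print(facet_under(doc["tag_level_3"], "Europe/Germany"))
# ['Europe/Germany/Frankfurt']
```

This keeps the mapping to a fixed, small set of fields (one per depth level) regardless of the tag vocabulary, which avoids the mapping explosion entirely.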
