How to correctly store documents with page numbers?

Hi,

I was wondering if someone could help me with one issue. What would be
the best way to store a document so that I can get the original
document's page number back from Elasticsearch? Right now I store the
page number as the field name in Elasticsearch. Everything works well
except for one thing: highlighting. For highlighting to work you have
to supply the field name, and in my case I don't know which page the
result will be on, so I have no way to supply the field name.

In short, what I'm trying to achieve:

I need to store a document in Elasticsearch, get the corresponding page
number in the results, and have the search hits highlighted.

Cheers,
Vitaly

Hi,

I'm not sure I fully understand where your problem is. Could you share
more info about how you model your domain objects? For example, do you
break the book into pages and index each page as a separate document in ES?

Regards,
Lukas

(In reply to Vitaly vshestovskiy@gmail.com, Thu, Apr 12, 2012 at 12:08 PM)

Hi Lukas,

Right now I create a new document in Elasticsearch for each book and
then store the separate pages in that document, with the page number as
the field name. Something like this:

{
  "_index" : "data",
  "_type" : "documents",
  "_id" : "1",
  "_source" : {
    "title" : "Document name",
    "1" : "Page 1 data",
    "2" : "Page 2 data"
  }
}

Cheers,
Vitaly

(In reply to Lukáš Vlček lukas.vl...@gmail.com, Apr 12, 1:39 pm)

Hi Vitaly,

Why not separate them? One page per doc in ES:

{
  "_index" : "data",
  "_type" : "documents",
  "_id" : "1",
  "_source" : {
    "title" : "Document name",
    "content" : "Page 1 data",
    "pageNum" : 1
  }
}
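
A minimal sketch of the one-page-per-doc layout suggested above, written in Python as plain dictionaries with no client library. It converts a per-book source (page numbers as field names) into one dictionary per page, and builds a search body that highlights the single fixed field `content`, so the field name no longer depends on the page. The `bookId` field and both function names are assumptions added for illustration; they do not appear in the thread. The search body assumes an Elasticsearch version that provides the `match` query.

```python
def split_into_pages(book_id, source):
    """Turn one per-book source (page numbers as field names) into one dict per page."""
    pages = []
    for key, value in source.items():
        # Keys made only of digits are treated as page numbers; "title" etc. are skipped.
        if key.isdigit():
            pages.append({
                "bookId": book_id,  # hypothetical field linking a page back to its book
                "title": source.get("title"),
                "content": value,
                "pageNum": int(key),
            })
    return sorted(pages, key=lambda page: page["pageNum"])


def build_search_body(text):
    """Search body that matches on `content` and highlights that one known field."""
    return {
        "query": {"match": {"content": text}},
        "highlight": {"fields": {"content": {}}},
    }


book = {"title": "Document name", "1": "Page 1 data", "2": "Page 2 data"}
page_docs = split_into_pages("1", book)
search_body = build_search_body("data")
```

Each entry of `page_docs` would be indexed as its own document; every hit then carries `pageNum` in its `_source` next to the highlighted fragments.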

But in that case I will get results weighted by page, not by document,
no?

(In reply to chenguohui trgo...@gmail.com, Apr 12, 4:22 pm)
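
One possible way to address the concern about results being weighted by page is to regroup the page hits on the client. The sketch below is only an illustration under assumptions: it expects hits shaped like the `hits.hits` array of a search response, it relies on a hypothetical `bookId` field in each page's `_source`, and it ranks books by their best-scoring page, which is an arbitrary choice and not a true per-document score computed by Elasticsearch.

```python
from collections import defaultdict


def group_hits_by_book(hits):
    """Collapse per-page hits into one entry per book, ranked by the best page score."""
    books = defaultdict(lambda: {"score": 0.0, "pages": []})
    for hit in hits:
        src = hit["_source"]
        book = books[src["bookId"]]  # "bookId" is a hypothetical linking field
        book["title"] = src["title"]
        book["score"] = max(book["score"], hit["_score"])
        book["pages"].append({
            "pageNum": src["pageNum"],
            # Highlighted fragments for the page, or an empty list if none came back.
            "highlight": hit.get("highlight", {}).get("content", []),
        })
    ranked = sorted(books.items(), key=lambda item: item[1]["score"], reverse=True)
    return [dict(bookId=book_id, **info) for book_id, info in ranked]


sample_hits = [
    {"_score": 1.0, "_source": {"bookId": "a", "title": "A", "pageNum": 2},
     "highlight": {"content": ["<em>data</em>"]}},
    {"_score": 3.0, "_source": {"bookId": "b", "title": "B", "pageNum": 1}},
    {"_score": 2.0, "_source": {"bookId": "a", "title": "A", "pageNum": 5}},
]
grouped = group_hits_by_book(sample_hits)
```

This only regroups the hits that were actually returned, so pages beyond the requested result size are not taken into account.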

What about using the parent/child feature:
the parent is the document,
the children are the pages.

Not sure that it will help ;)

David.
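
As a rough illustration of the parent/child idea, the sketch below builds a `has_child` query body as a plain Python dictionary. It assumes pages are indexed as children of each book under a child type named `page` with their text in a `content` field; both names are hypothetical. The mapping that declares the parent/child relation differs between Elasticsearch versions and is not shown. The query returns the parent (book) documents that have at least one matching page; whether the matching page numbers and highlights can be returned alongside is not settled in this thread.

```python
def build_has_child_query(text, child_type="page"):
    """Query body matching parent documents that have a child page containing `text`."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # hypothetical child type name
                "query": {"match": {"content": text}},
            }
        }
    }


parent_query = build_has_child_query("data")
```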

-----Original Message-----
From: elasticsearch@googlegroups.com
[mailto:elasticsearch@googlegroups.com] On Behalf Of Vitaly
Sent: Friday, April 13, 2012 15:54
To: elasticsearch
Subject: Re: How to correctly store documents with page numbers?

But in that case I will get results weighted by page, not by document,
no?
