How to correctly store documents with page numbers?


(Vitaly) #1

Hi,

I was wondering if someone could help me with one issue. What is the
best way to store documents so that I can get the original page number
back from Elasticsearch? Currently I store each page under a field named
after its page number. Everything works well except highlighting: to
highlight, you have to supply the field name, and since I don't know in
advance which page a match will be on, I have no way to supply it.

In short, here is what I'm trying to achieve:

I need to store documents in Elasticsearch, get the corresponding page
number in the results, and have the search hits highlighted.

Cheers,
Vitaly


(Lukáš Vlček) #2

Hi,

I'm not sure I fully understand where your problem is. Could you share
more about how you model your domain objects? For example, do you break
each book into pages and index each page as a separate document in ES?

Regards,
Lukas



(Vitaly) #3

Hi Lukas,

Right now I create one Elasticsearch document per book and store the
pages as separate fields, using the page number as the field name.
Something like this:

{
  "_index" : "data",
  "_type" : "documents",
  "_id" : "1",
  "_source" : {
    "title" : "Document name",
    "1" : "Page 1 data",
    "2" : "Page 2 data"
  }
}

Cheers,
Vitaly
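To make the highlighting problem with this layout concrete, here is a minimal sketch of the request body it forces on you. Because every page lives in its own field, every page field has to be listed under "highlight" explicitly. The helper name `highlight_all_pages` and the `max_pages` parameter are hypothetical; the sketch assumes the maximum page count is known before the query is built, and it only builds the JSON body as a Python dict (no client library).

```python
import json


def highlight_all_pages(query_text, max_pages):
    # Hypothetical helper: with one field per page ("1", "2", ...), the
    # matching page is unknown, so every page field must be listed for
    # highlighting. Assumes max_pages is known up front.
    fields = {str(page): {} for page in range(1, max_pages + 1)}
    return {
        "query": {"query_string": {"query": query_text}},
        "highlight": {"fields": fields},
    }


body = highlight_all_pages("elasticsearch", max_pages=3)
print(json.dumps(body, indent=2))
```

The list of highlight fields grows with the longest book, which is part of why the replies below suggest changing the data model instead.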



(chenguohui) #4

Hi Vitaly,
Why not split it up? Index one page per document in ES:
{
  "_index" : "data",
  "_type" : "documents",
  "_id" : "1",
  "_source" : {
    "title" : "Document name",
    "content" : "Page 1 data",
    "pageNum" : 1
  }
}
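A minimal sketch of the one-page-per-document approach: split a book into page documents that all share a single "content" field, so one highlight field covers every page and "pageNum" comes back in `_source`. The "bookId" field and the `<book>-<page>` id scheme are hypothetical additions (so page hits can be traced back to their book). The request body uses current query DSL syntax (`match`); releases from around 2012 may expect different query names.

```python
import json


def split_into_pages(book_id, title, pages):
    # One document per page. "bookId" is a hypothetical extra field that
    # links each page back to its book; the "_id" scheme is illustrative.
    docs = []
    for page_num, text in enumerate(pages, start=1):
        docs.append({
            "_id": f"{book_id}-{page_num}",
            "_source": {
                "bookId": book_id,
                "title": title,
                "content": text,
                "pageNum": page_num,
            },
        })
    return docs


def search_body(query_text):
    # All page text is in "content", so a single highlight field is enough
    # no matter which page matches.
    return {
        "query": {"match": {"content": query_text}},
        "highlight": {"fields": {"content": {}}},
    }


docs = split_into_pages("1", "Document name", ["Page 1 data", "Page 2 data"])
print(json.dumps(docs[1]))
print(json.dumps(search_body("data")))
```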


(Vitaly) #5

But in that case the results will be scored per page rather than per
document, no?
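One way to address per-page scoring is to regroup the page hits on the client side: keep one entry per book, score each book by its best-matching page, and remember which pages matched. This is a sketch under assumptions: each hit carries the hypothetical "bookId" field and "pageNum" in `_source`, and only pages within the returned result window are grouped, so a book whose pages fall outside that window is missed.

```python
def group_hits_by_book(hits):
    # Collapse page-level hits into one entry per book.
    books = {}
    for hit in hits:
        src = hit["_source"]
        entry = books.setdefault(src["bookId"], {"score": 0.0, "pages": []})
        entry["pages"].append(src["pageNum"])
        # Book score = best page score; summing would favour long books.
        entry["score"] = max(entry["score"], hit["_score"])
    # Rank books by their best page, highest first.
    return sorted(books.items(), key=lambda kv: kv[1]["score"], reverse=True)


hits = [
    {"_score": 0.9, "_source": {"bookId": "a", "pageNum": 4}},
    {"_score": 1.2, "_source": {"bookId": "b", "pageNum": 1}},
    {"_score": 0.5, "_source": {"bookId": "a", "pageNum": 7}},
]
for book_id, info in group_hits_by_book(hits):
    print(book_id, info["score"], info["pages"])
```

Using the maximum page score is a design choice; an average or sum would rank books differently.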



(David Pilato) #6

What about using the parent/child feature?
Parent is the document
Children are pages

Not sure that it will help :wink:

David.
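A sketch of what a parent/child search might look like: a `has_child` query returns the parent (book) documents whose child pages match, with each book scored by its best page. The type names "book"/"page" and the "content" field are hypothetical. The body uses current Elasticsearch syntax; `score_mode` and `inner_hits` (used here to get the matching pages back with highlights) are not available in the releases that were current when this thread was written, so treat this as illustrative only.

```python
import json


def books_with_matching_pages(query_text):
    # Hypothetical child type "page" under parent type "book".
    # has_child returns the parents whose children match the inner query.
    return {
        "query": {
            "has_child": {
                "type": "page",
                # Score each book by its best-matching page.
                "score_mode": "max",
                "query": {"match": {"content": query_text}},
                # Return the matching pages themselves, highlighted.
                "inner_hits": {"highlight": {"fields": {"content": {}}}},
            }
        }
    }


print(json.dumps(books_with_matching_pages("data"), indent=2))
```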


