I did provide information, or at least I thought it had all the info. Pasting
my previous mail:
I have a large XML document that consists of a set of forms that users fill
out. So essentially each JSON doc has a list of forms, each form has a list of
fields, and each field has a value that the user types. This comes as an XML
doc, which I would then need to convert to JSON. User requirements:
- Users want every field to be searchable.
- Users want to know just the filename that a particular field might be in.
- They might want to pull the entire doc if needed, but in most cases
they won't need that.
What do you suggest I should do?
A typical JSON doc looks like this:
Json doc 1
{
  "fileName":"filename",
  "createdDate":"05/20/12 16:21:56",
  "setModel":[
    {
      "id":"1",
      "compliance":false,
      "forms":[
        {
          "id":"40",
          "copy":null,
          "tpsId":null,
          "forms":[
            {
              "id":"F40_SW_2",
              "copy":null,
              "tpsId":"1/F40",
              "forms":[],
              "tables":[],
              "fields":[
                {
                  "id":"L31A",
                  "security":null,
                  "value":"3000."
                },
                {
                  "id":"MRSSN1",
                  "security":null,
                  "value":"656465464"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
Json doc 2
{
  "fileName":"filename",
  "createdDate":"05/20/12 16:21:56",
  "setModel":[
    {
      "id":"1",
      "compliance":false,
      "forms":[
        {
          "id":"50",
          "copy":null,
          "tpsId":null,
          "forms":[
            {
              "id":"F50_SW_2",
              "copy":null,
              "tpsId":"1/F50",
              "forms":[],
              "tables":[],
              "fields":[
                {
                  "id":"L31A",
                  "security":null,
                  "value":"3000."
                },
                {
                  "id":"MRSSN1",
                  "security":null,
                  "value":"656465464"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
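To make the second requirement concrete (find a field, get back only the filename), here is a minimal Python sketch of the lookup being asked for. It is only an in-memory illustration of the data flow, not how Elasticsearch stores or queries anything. The helper names `iter_fields` and `build_index` and the file names `a.xml` and `b.xml` are made up for the example, and the sample docs are trimmed versions of the ones above.

```python
from collections import defaultdict


def iter_fields(forms):
    """Yield (field_id, value) for every field in a nested list of forms."""
    for form in forms:
        for field in form.get("fields", []):
            yield field["id"], field["value"]
        # Forms can contain further forms, so recurse into them.
        yield from iter_fields(form.get("forms", []))


def build_index(docs):
    """Map (field_id, value) -> set of file names that contain that pair."""
    index = defaultdict(set)
    for doc in docs:
        for set_model in doc.get("setModel", []):
            for key in iter_fields(set_model.get("forms", [])):
                index[key].add(doc["fileName"])
    return index


# Hypothetical sample data: two trimmed docs with made-up file names.
docs = [
    {"fileName": "a.xml", "setModel": [{"id": "1", "forms": [
        {"id": "40", "forms": [
            {"id": "F40_SW_2", "forms": [], "fields": [
                {"id": "L31A", "value": "3000."},
                {"id": "MRSSN1", "value": "656465464"}]}]}]}]},
    {"fileName": "b.xml", "setModel": [{"id": "1", "forms": [
        {"id": "50", "forms": [
            {"id": "F50_SW_2", "forms": [], "fields": [
                {"id": "L31A", "value": "3000."},
                {"id": "MRSSN1", "value": "111"}]}]}]}]},
]

index = build_index(docs)
print(sorted(index[("L31A", "3000.")]))
print(sorted(index[("MRSSN1", "656465464")]))
```

The point of the sketch is that the search result only needs the file name; the full document would be fetched separately in the rare case it is wanted.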
On Mon, May 21, 2012 at 11:48 AM, Patrick patrick@eefy.net wrote:
Hi Mohit,
Unfortunately you've not provided answers to all of the questions listed
in your thread. While I'm certain this is something you eagerly want to get
going, this is not paid support, and responses come on a best-effort
basis. You have some of the smartest search guys in the open source
community here, but alas, they have day jobs, and while they'll do
their best to respond to you, it's not usually the best etiquette to
poke for responses within a few hours of your last mail.
I think for us to best assist you, we should probably get an example
document (or documents), some idea of the statistics you're looking for
(inserts? searches? etc.), perhaps example searches you're planning on
running, and a breakdown of hardware (which you've partially given). The
more information you can provide, the better the answers you'll get.
Patrick
Patrick Ancillotti - New York | about.me
patrick eefy net
On Mon, May 21, 2012 at 12:12 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:
Could someone help answer my questions?
On Sun, May 20, 2012 at 4:35 PM, Mohit Anchlia mohitanchlia@gmail.com wrote:
On Sun, May 20, 2012 at 11:29 AM, Randy randall.mcree@gmail.com wrote:
One suggestion is to avoid indexing large docs, e.g. break them into
smaller units like chapters, paragraphs, even sentence groups. Elastic's
parent-child feature is a natural fit in this context.
Do you really want to present a 500k doc in response to a phrase
search? Probably not. You want to present the matching context.
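As a rough illustration of breaking a large doc into smaller units, the Python sketch below splits one nested form document into one small record per form that actually has fields. Each record carries the file name so it can be tied back to its parent. The function name `split_into_units`, the `formPath` key, and the file name `a.xml` are assumptions made for the example, and how such units would map onto an Elasticsearch parent-child mapping is deliberately not shown.

```python
def split_into_units(doc):
    """Return one small record per form that has fields, tagged with its file."""
    units = []

    def walk(forms, path):
        for form in forms:
            here = path + [form["id"]]
            if form.get("fields"):
                units.append({
                    "fileName": doc["fileName"],  # link back to the parent doc
                    "formPath": "/".join(here),   # where the form sat in the tree
                    "fields": {f["id"]: f["value"] for f in form["fields"]},
                })
            # Forms can nest, so keep descending.
            walk(form.get("forms", []), here)

    for set_model in doc.get("setModel", []):
        walk(set_model.get("forms", []), [set_model["id"]])
    return units


# Hypothetical sample: a trimmed form document with a made-up file name.
doc = {"fileName": "a.xml", "setModel": [{"id": "1", "forms": [
    {"id": "40", "forms": [
        {"id": "F40_SW_2", "forms": [], "fields": [
            {"id": "L31A", "value": "3000."},
            {"id": "MRSSN1", "value": "656465464"}]}]}]}]}

units = split_into_units(doc)
for unit in units:
    print(unit["fileName"], unit["formPath"], unit["fields"])
```

A match on one of these small units gives the matching context directly, without returning the whole 500k document.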
Of course we need to know more about the requirements in order to be more
helpful.
On May 20, 2012, at 8:10 AM, Mohit Anchlia mohitanchlia@gmail.com wrote:
Are there any performance suggestions for indexing documents of size
300k-500k? I am planning to do the following:
- Increase the number of shards from the default of 5 to 20
- Increase the heap size
- Enable compression
- Use 4 nodes
We expect a volume of 100 documents/sec. Is there anything else I
should do?
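For a sense of scale, the stated numbers can be turned into a raw ingest rate. The short Python calculation below assumes "300k-500k" means kilobytes per document and uses decimal units (1 MB = 1000 KB). It ignores compression, replicas, and index overhead, so it is only a back-of-the-envelope figure for sustained load; `ingest_rate` is a name made up for the example.

```python
DOCS_PER_SEC = 100
SIZES_KB = (300, 500)  # assumption: "300k-500k" means kilobytes per document
SECONDS_PER_DAY = 86_400


def ingest_rate(size_kb, docs_per_sec=DOCS_PER_SEC):
    """Return (MB per second, TB per day) of raw document data."""
    mb_per_sec = size_kb * docs_per_sec / 1000
    tb_per_day = mb_per_sec * SECONDS_PER_DAY / 1_000_000
    return mb_per_sec, tb_per_day


for size_kb in SIZES_KB:
    mb_s, tb_day = ingest_rate(size_kb)
    print(f"{size_kb} KB docs: {mb_s:.0f} MB/s, {tb_day:.2f} TB/day")
```

Under those assumptions the raw input is roughly 30 to 50 MB/s, or about 2.6 to 4.3 TB per day if the rate were sustained around the clock.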