Index design for very dynamic Form/Fields

Hi,

We have a system very much like Google Forms, which allows users to design
their own forms with various fields (single-line text, paragraph, number,
address, etc.; you can imagine the rest). It runs on top of MongoDB and
now holds 120K forms with nearly 10 million entries.

Recently we hit a performance bottleneck on queries. After doing every
performance tuning we could on the MongoDB side, we decided to index the
form entries into Elasticsearch. And that is where the trouble starts:

Say Form A has field_1 as a string and field_2 as a number. A data entry
might look like: { field_1: "hello", field_2: 100 }

Form B could have field_1 as a number and field_2 as a string, so a data
entry will look like: { field_1: 100, field_2: "hello form" }

We have successfully created an index "entries" in ES and can index the
first entry. But the second one fails for an obvious reason: type
mismatch.
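
To make the failure concrete, here is a toy model in Python. It is not Elasticsearch code and is far simpler (and stricter) than what Elasticsearch really does; the names `ToyIndex`, `json_type` and `TypeConflict` are made up for illustration. It only shows the idea that the first value seen for a field name fixes that field's type for the whole index.

```python
class TypeConflict(Exception):
    """Raised when a value does not match the type already fixed for its field."""


def json_type(value):
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


class ToyIndex:
    def __init__(self):
        self.mapping = {}  # field name -> type fixed by the first document
        self.docs = []

    def index(self, doc):
        for name, value in doc.items():
            actual = json_type(value)
            # The first document to use a field name decides its type.
            expected = self.mapping.setdefault(name, actual)
            if expected != actual:
                raise TypeConflict(f"{name}: mapped as {expected}, got {actual}")
        self.docs.append(doc)
```

Indexing the Form A entry and then the Form B entry into the same `ToyIndex` raises `TypeConflict`, which mirrors the situation described above.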

I am not sure how to deal with this problem. I definitely don't want to
create 120K indices, one for every single form. And I am not sure it's
feasible to write a custom transform script that makes the field types
identical across all entries.

Any suggestions? Any response is much appreciated.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4a6d47d7-ae0e-44f5-bd3a-756ea94e3899%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You could have one type per form although the cluster state will be very big.
But you should test that option.

Or if you don't really search for numbers as numbers (I mean with Range queries/filters), you could force each field to be a String and do the transformation at a client level.
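
A minimal sketch of what "force each field to be a String at the client level" could look like, assuming the client is written in Python. The function name `stringify_entry` is hypothetical, and using JSON text for non-string values is just one possible choice.

```python
import json


def stringify_entry(entry):
    """Return a copy of entry in which every top-level value is a string."""
    out = {}
    for name, value in entry.items():
        if isinstance(value, str):
            out[name] = value
        else:
            # json.dumps gives a text form for numbers, lists and objects.
            out[name] = json.dumps(value, sort_keys=True)
    return out
```

With this, both example entries end up with string values for field_1 and field_2, so they no longer disagree on type. The cost is the one mentioned above: numbers are no longer searchable as numbers.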

My 2 cents

--
David Pilato | Technical Advocate | elasticsearch.com
david.pilato@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


Thanks David. Given how the system behaves, having every type as a string
is fine for queries. But at the aggregation level it might be trouble. For
example, an address field is a complex JSON object:

{ field_1: { country: "US", province: "CA", city: "New York", address:
"Street Address" } }

If we transform this into any form of string and then try to aggregate by
country/state, it will be VERY hard, if not impossible.


--
Michael Chen

Blog: http://michael.nona.name
GTalk/Twitter/Facebook/Yahoo/Skype: mechiland


I don't get it.

If field_1.country is a String, why can't you aggregate on it?
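
One way to read this question: only the leaf values are turned into strings, while the object structure is kept, so field_1.country still exists as its own string field. The sketch below assumes that reading. `stringify_leaves` and `count_by` are hypothetical helper names; `count_by` only imitates in plain Python what a terms aggregation on `field_1.country` would count, and `AGG_BODY` is a request body that is built here but never sent anywhere.

```python
from collections import Counter


def stringify_leaves(value):
    """Convert scalar leaves to strings; keep dicts and lists as containers."""
    if isinstance(value, dict):
        return {k: stringify_leaves(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_leaves(v) for v in value]
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def count_by(entries, dotted_path):
    """Count string values found at dotted_path, skipping entries without it."""
    counts = Counter()
    for entry in entries:
        node = entry
        for key in dotted_path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, str):
            counts[node] += 1
    return counts


# Terms aggregation request body for the same grouping (not sent anywhere here).
AGG_BODY = {"size": 0, "aggs": {"by_country": {"terms": {"field": "field_1.country"}}}}
```

This keeps aggregation on the address parts possible, but it does not by itself help when field_1 is an object in one form and a plain value in another.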

--
David Pilato | Technical Advocate | elasticsearch.com
david.pilato@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


We cannot guarantee that field_1 is always an address. In Form 1, field_1
might be an address, while in another form it might be a string, a number,
or whatever. Think about designing the storage for Google Forms and its
data entries.

Re "you could force each field to be a String and do the transformation at
a client level."

Forcing means serializing all data into a string, right? The example JSON
from my previous email would be transformed into something like

{ field_1: "{country: \"US\", province: \"CA\", city: \"New York\",
address: \"Street Address\"}" }

Then we are not able to do the aggregation.


--
Michael Chen

Blog: http://michael.nona.name
GTalk/Twitter/Facebook/Yahoo/Skype: mechiland


Let me make the question clearer. The challenge we have now is how to
index an EAV[1] model database.

Let's take Google Forms as an example. Every user can create a form,
choosing from various field types including text, number, choice, etc.
They construct a form like this:

Form 1: a survey

  • field_1: type=text
  • field_2: type=number
  • field_3: type=choice

And people submit data entries to this form, with data like:

{
field_1: "hello",
field_2: 20,
field_3: ["red"]
}

And you can imagine that all these data entries are saved into one single
Mongo collection, "entries".

Well, a second user might create another form like this:

Form 2: a questionnaire

  • field_1: type=number
  • field_2: type=text
  • field_3: type=number
  • field_4: type=text

The data submission might look like this:

{
field_1: 100,
field_2: "hello questionnaire",
field_3: 20,
field_4: "this is my answer"
}

Indexing the second data entry while the first one is already in ES will
throw a NumberFormatException, because ES guessed from the first entry that
field_2 should be a number. So transforming all values into strings makes
sense, but...

Any thoughts?

[1] EAV: Entity–attribute–value model (Wikipedia)
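
For what it's worth, one possible workaround that nobody in this thread has proposed is to put the declared field type into the key name, so that a given key always carries the same type no matter which form it came from. The sketch below assumes the form definition is available as a dict of field name to declared type; `SUFFIX` and `typed_keys` are hypothetical names, and this is only an illustration of the idea, not a tested Elasticsearch setup.

```python
# Hypothetical mapping from a form's declared field type to a key suffix.
SUFFIX = {"text": "str", "number": "num", "choice": "choice"}


def typed_keys(form_fields, entry):
    """Rename each key to '<name>_<suffix>' using the form's declared type."""
    out = {}
    for name, value in entry.items():
        declared = form_fields[name]  # e.g. "text", "number", "choice"
        out[f"{name}_{SUFFIX[declared]}"] = value
    return out


# Form definitions taken from the two example forms above.
FORM_1 = {"field_1": "text", "field_2": "number", "field_3": "choice"}
FORM_2 = {"field_1": "number", "field_2": "text",
          "field_3": "number", "field_4": "text"}
```

With these definitions, Form 1's field_2 becomes field_2_num and Form 2's field_2 becomes field_2_str, so the two entries no longer compete for one key. The number of distinct keys grows with the number of (field name, type) combinations actually in use, which would need checking against real data.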


--
Michael Chen

Blog: http://michael.nona.name
GTalk/Twitter/Facebook/Yahoo/Skype: mechiland


Michael, did you ever solve this? I'm about to encounter this very same
issue and I'm looking into what solutions are available to me.
