Definitely!
It's a cool idea.
So you agree that you will have to build a document on your side from
entry_data.
Be aware that you will probably need to define some fields explicitly
in a mapping. Otherwise, the default analysis could give you
undesired results.
For example, for the email field, you should define an email analyzer.
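As a rough illustration of the mapping advice above, here is a minimal sketch that builds an index-creation body with a dedicated email analyzer. The analyzer name email_analyzer and the type name form_42 are made up for the example; the uax_url_email tokenizer and lowercase filter are built into Elasticsearch, and the "string" field type matches Elasticsearch versions of that period (newer versions use "text" or "keyword" instead).

```python
import json


def build_index_body():
    """Return settings plus a mapping that analyzes `email` with a custom analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name; keeps each address as one token.
                    "email_analyzer": {
                        "type": "custom",
                        "tokenizer": "uax_url_email",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            # Hypothetical type name standing in for one form.
            "form_42": {
                "properties": {
                    "email": {"type": "string", "analyzer": "email_analyzer"},
                    "birthday": {"type": "date"},
                }
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON could be sent as the body of an index-creation request.
    print(json.dumps(build_index_body(), indent=2))
```

The function only builds the request body; sending it to a cluster is left out on purpose.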
Be careful with field names and types. I would avoid having the same field
name, for example answer1, mapped as a Number in type1 and as a String in
type2.
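One way to catch such clashes before indexing is to compare the intended field types across all types. The sketch below is only an assumption about how you might hold that information (a plain dict of type name to field definitions); find_type_conflicts is a hypothetical helper, not part of any Elasticsearch client.

```python
def find_type_conflicts(mappings):
    """Return fields that are declared with more than one data type.

    mappings: {type_name: {field_name: field_type}}
    result:   {field_name: {field_type: [type_names using it]}}
    """
    seen = {}
    for type_name, fields in mappings.items():
        for field, field_type in fields.items():
            seen.setdefault(field, {}).setdefault(field_type, []).append(type_name)
    # Keep only fields that appear with at least two different data types.
    return {field: by_type for field, by_type in seen.items() if len(by_type) > 1}


if __name__ == "__main__":
    planned = {
        "type1": {"answer1": "long", "email": "string"},
        "type2": {"answer1": "string", "email": "string"},
    }
    print(find_type_conflicts(planned))
```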
Regarding your first question: I also think that #2 is one of the best
options here.
I also like #3, as I can imagine that after some months you may want to
remove old surveys. It's much more efficient to remove a full index than
to remove individual documents.
You can also think of mixing #2 and #3 and fine-tuning the number of
shards needed for each index. Let's say that you have a very big
survey with 10,000,000 answers. You probably want to index it in its own
index. Let's say that you have 240,000 small surveys. You probably want
them to share the same index…
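The mixed approach could be sketched as a small routing function in the application. Everything named here is an assumption for illustration: the shared index name entries_shared, the form_<id> naming scheme, and the 1,000,000-entry threshold would all need to be tuned to your data.

```python
SHARED_INDEX = "entries_shared"   # hypothetical index shared by all small forms
BIG_FORM_THRESHOLD = 1_000_000    # hypothetical cut-off, tune for your cluster


def index_for_form(form_id, entry_count, threshold=BIG_FORM_THRESHOLD):
    """Pick the index a form's entries should be written to.

    Very large forms get a dedicated index (so it can have its own shard
    count and be dropped as a whole); small forms share one index.
    """
    if entry_count >= threshold:
        return f"form_{form_id}"
    return SHARED_INDEX


if __name__ == "__main__":
    print(index_for_form(7, 10_000_000))
    print(index_for_form(8, 200))
```

A form that grows past the threshold would have to be reindexed into its own index; the sketch does not handle that migration.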
Make sense?
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr
| @scrutmydocs https://twitter.com/scrutmydocs
On 11 Apr 2013 at 15:54, mmattax <matt...@gmail.com> wrote:
The entry and entry_data tables represent responses to forms that customers
have built (I work for Formstack http://formstack.com/). The data can
represent anything: surveys, event registration forms, order forms,
etc. Ideally everything in entry_data becomes searchable and sortable.
entry_data might have a few rows in MySQL:
id  entry_id  key       value
1   100       name      first=Michael&last=Mattax
2   100       email     mic...@example.com
3   100       sex       male
4   100       birthday  1985-08-15
... there could be hundreds more ...
Using Elasticsearch we want to make all of this data searchable so the
document might look like
curl -XPUT 'http://localhost:9200/entries/<FORM_ID>/100' -d '{
  "name": { "first": "Michael", "last": "Mattax" },
  "email": "mic...@example.com",
  "sex": "male",
  "birthday": "1985-08-15"
}'
That way we can use Elasticsearch power and not do any joins (which we do
now for complex searches). Note that each form has a different schema
(besides some common shared metadata).
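The step from entry_data rows to such a document could look roughly like the sketch below. It assumes that composite values (like the name row) are URL query-string encoded and that any value parse_qsl can split into key=value pairs should become a nested object; a free-text answer that happens to contain "=" would be misread by this heuristic. rows_to_document and parse_value are hypothetical helper names.

```python
from urllib.parse import parse_qsl


def parse_value(raw):
    """Turn 'first=Michael&last=Mattax' into a dict; leave plain values as-is."""
    pairs = parse_qsl(raw)
    return dict(pairs) if pairs else raw


def rows_to_document(rows):
    """Build one document from (key, value) rows that share an entry_id."""
    return {key: parse_value(value) for key, value in rows}


if __name__ == "__main__":
    rows = [
        ("name", "first=Michael&last=Mattax"),
        ("sex", "male"),
        ("birthday", "1985-08-15"),
    ]
    print(rows_to_document(rows))
```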
Does that clarify things at all?
On Thursday, April 11, 2013 4:24:24 AM UTC-4, David Pilato wrote:
IMHO, the first question should always be user-centric, not
technical.
I mean that you should ask yourself: "What do my users want to find?"
I don't think that they want to search for entries or for data entries,
do they?
You probably have a real use case behind this flexible design that you
could share with us.
I would probably try to build real business documents in my application,
rather than technical ones, and then send those documents to
Elasticsearch.
That said, I don't have any idea of your use case and I might be wrong.
My 2 cents
On 11 Apr 2013 at 04:17, mmattax matt...@gmail.com wrote:
Hi,
I'm just getting started with Elasticsearch and need some advice. We
have 3 tables in MySQL that we'd like to combine into a simpler schema
with elasticsearch:
+---------------------+
| form                |
+---------------------+
| id                  |
| name                |
+---------------------+
+---------------------+
| entry               |
+---------------------+
| id                  |
| form_id             |
| timestamp           |
+---------------------+
+---------------------+
| entry_data          |
+---------------------+
| id                  |
| entry_id            |
| key                 |
| value               |
+---------------------+
Constraints are as follows: a form can have many entries, and an entry can
have many entry_data rows. Each form has its own schema. Our DB
currently has 250,000 forms and 47,000,000 entries.
A few ways that I think we can accomplish this within Elasticsearch:
1. Keep the same structure we have in MySQL: create an entry index;
the "form_id" will be a part of the entry document.
2. Have an entry index; each form becomes a type, and the entry is the
document.
3. Each form is an index.
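To make the three layouts concrete, here is a small sketch of where a single entry would live under each one, using the /index/type/id path convention. The index and type names (entries, entry, form_<id>) are placeholders chosen for the example, not names Elasticsearch requires.

```python
def document_path(option, form_id, entry_id):
    """Return the /index/type/id path for an entry under each layout."""
    if option == 1:
        # One index, one type; form_id is stored as a field in the document.
        return f"/entries/entry/{entry_id}"
    if option == 2:
        # One index; each form is its own type.
        return f"/entries/{form_id}/{entry_id}"
    if option == 3:
        # One index per form.
        return f"/form_{form_id}/entry/{entry_id}"
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    for option in (1, 2, 3):
        print(option, document_path(option, form_id=42, entry_id=100))
```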
My main questions:
Is one of the schemes above better than the others? I'm thinking that
#2 will be the simplest. Is there a performance cost to having that
many types (250k)?
Would you architect this differently?
Thanks,
Michael
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.