Design question


(Mohit Anchlia) #1

I am just starting to use Elasticsearch. We have some documents in EDI
format and XML format that we need to index. For example, the EDI format
looks something like this:

0120TRANA 770034661 PREPARER'S
AGENTE20080522010080014302AV901005 TZ #
0120
TRANB 7700346616220 GREENWICH DR SAN DIEGO CA
92122 8585258010 #
0120ACK 5618383330100800143020001000000000000C0004
200805220090100500838801 1 NJ#
0120
ACKR 561838333 01FORM 1040 00001000000100100504
#
0120****RECAP
000000000001010080014302000000000000000001000000000000000000000001000000
#

and everyone knows XML :slight_smile:

Is there a best practice for how these should be indexed in terms of
the JSON format, index fields, or settings? Or should I just throw the
whole document into a single field named, say, "document"?


(David Pilato) #2

For the same use case, I use the mapper-attachment plugin.
I send the XML file as an attachment.
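As a rough illustration of what sending a file as an attachment can look like: the plugin expects the file content base64-encoded inside a field mapped with type `attachment`. The sketch below only builds the JSON bodies and does not talk to a cluster. The field name `file`, the type name `doc`, and the sample XML are assumptions made up for this example, not something taken from this thread.

```python
import base64
import json


def build_attachment_mapping(doc_type="doc", field="file"):
    """Mapping body declaring `field` as an attachment (names are hypothetical)."""
    return {doc_type: {"properties": {field: {"type": "attachment"}}}}


def build_attachment_doc(raw_bytes, field="file"):
    """JSON document carrying the raw file bytes as a base64 string."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


if __name__ == "__main__":
    sample_xml = b"<note><body>xyz</body></note>"  # made-up sample content
    print(json.dumps(build_attachment_mapping()))
    print(build_attachment_doc(sample_xml))
```

The mapping would be registered on the index first, and the document body would then be indexed like any other JSON document.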

HTH
David :wink:
@dadoonet


(Mohit Anchlia) #3

Does this plugin index the text in the document as separate terms, or
does it just store the file as an attachment? For example, if I wanted
to get all XML documents that contain the value "xyz", would that be possible?


(David Pilato) #4

Yes, it indexes the content, as it does for PDF, OpenOffice (ooo) files, and so on.
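For the "xyz" example, a search over the extracted content could look roughly like the sketch below, which builds a `query_string` query body. The field name `file` is an assumption (it has to match whatever attachment field the mapping declares), and the body is only constructed here, not sent anywhere.

```python
import json


def build_content_query(term, field="file"):
    """Search body matching `term` in the extracted text of `field` (hypothetical name)."""
    return {"query": {"query_string": {"default_field": field, "query": term}}}


if __name__ == "__main__":
    print(json.dumps(build_content_query("xyz"), indent=2))
```

Note that `query_string` parses its input as query syntax, so terms containing special characters would need escaping.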

HTH
David :wink:
@dadoonet


(Shay Banon) #5

I suggest just converting the XML to JSON; it should be simple, as they
are quite close in format.
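A minimal sketch of such a conversion, using only the Python standard library. The conventions here are one possible choice, not a standard: attributes get an `@` prefix, mixed text goes under `#text`, and repeated child tags become a list. The sample XML is invented for illustration.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an XML element into plain dicts, lists and strings."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not node:
        return text  # leaf element without attributes: just its text
    if text:
        node["#text"] = text
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values in a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return a JSON string ready to be indexed."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


if __name__ == "__main__":
    sample = '<return form="1040"><state>NJ</state><line>1</line><line>2</line></return>'
    print(xml_to_json(sample))
```

With the XML turned into JSON this way, each element becomes its own field, so individual values can be searched by field name rather than only through one big text blob.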

