XML with multiple fields to Elasticsearch

Hi, I am a newbie to ELK and have been browsing these forums and other tutorials in search of a comprehensive guide to handling XML files. Most of the examples I have seen handle simple (and pretty small) XML files.

Basically, my aim is to load data from some big XML files into Elasticsearch. Here's a small part of one of my XML files:

<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
      <PMID Version="1">25871255</PMID>
      <DateCompleted>
        <Year>2015</Year>
        <Month>09</Month>
        <Day>10</Day>
      </DateCompleted>
      <DateRevised>
        <Year>2015</Year>
        <Month>04</Month>
        <Day>15</Day>
      </DateRevised>
      <Article PubModel="Print-Electronic">
        <Journal>
          <ISSN IssnType="Electronic">1550-2376</ISSN>
          <JournalIssue CitedMedium="Internet">
            <Volume>91</Volume>
            <Issue>3</Issue>
            <PubDate>
              <Year>2015</Year>
              <Month>Mar</Month>
            </PubDate>
          </JournalIssue>
          <Title>Physical review. E, Statistical, nonlinear, and soft matter physics</Title>
          <ISOAbbreviation>Phys Rev E Stat Nonlin Soft Matter Phys</ISOAbbreviation>
        </Journal>
        <ArticleTitle>Comment on "Acoustical observation of bubble oscillations induced by bubble popping".</ArticleTitle>
        <Pagination>
          <MedlinePgn>036401</MedlinePgn>
        </Pagination>
        <Abstract>
          <AbstractText>We have reproduced the experiment of acoustic monitoring of spontaneous popping of single soap bubbles standing in air reported by Ding et al. [Phys. Rev. E 75, 041601 (2007)]. By using a single microphone and two different signal acquisition systems recording in parallel the signal at the microphone output, among them the system used by Ding et al., we have experimentally evidenced that the acoustic precursors of bubble popping events detected by Ding et al. actually result from an acausal artifact of the signal processing performed by their acquisition system which lies outside of its prescribed working frequency range. No acoustic precursor of popping could be evidenced with the microphone used in these experiments, whose sensitivity is 1VPa-1 and frequency range is 500 Hz-100 kHz.</AbstractText>
        </Abstract>
        <AuthorList CompleteYN="Y">
          <Author ValidYN="Y">
            <LastName>Blanc</LastName>
            <ForeName>É</ForeName>
            <Initials>É</Initials>
            <AffiliationInfo>
              <Affiliation>Sorbonne Universités, UPMC Univ Paris 06, UMR 7190, Institut Jean Le Rond d'Alembert, F-75005 Paris, France.</Affiliation>
            </AffiliationInfo>
            <AffiliationInfo>
              <Affiliation>and CNRS, UMR 7190, Institut Jean Le Rond d'Alembert, F-75005 Paris, France.</Affiliation>
            </AffiliationInfo>
          </Author>
          <Author ValidYN="Y">
            <LastName>Ollivier</LastName>
            <ForeName>F</ForeName>
            <Initials>F</Initials>
            <AffiliationInfo>
              <Affiliation>Sorbonne Universités, UPMC Univ Paris 06, UMR 7190, Institut Jean Le Rond d'Alembert, F-75005 Paris, France.</Affiliation>
            </AffiliationInfo>
            <AffiliationInfo>
              <Affiliation>and CNRS, UMR 7190, Institut Jean Le Rond d'Alembert, F-75005 Paris, France.</Affiliation>
            </AffiliationInfo>
          </Author>
          <Author ValidYN="Y">
            <LastName>Antkowiak</LastName>
            <ForeName>A</ForeName>
            <Initials>A</Initials>

  • Kindly suggest the best possible way to handle these files.
  • Do I have to manually specify all the fields in the filter, or is there a shortcut (using the DTD?)?
  • How do I handle repeated fields? For example, AuthorList has multiple authors.

Please guide me in the right direction. Thanks for any help!
