
I am trying to parse an XML file in Logstash, using XPath to extract fields from the XML document. When I run my config file, the data loads into Elasticsearch, but not in the way I want: each line of the XML file ends up as a separate document in Elasticsearch.

Structure of my XML file

(screenshot of the XML file)
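
Roughly, the file is a <stations> root containing <station> elements with <id> and <name> children (simplified; only the elements referenced by the XPath expressions below are shown):

    <stations>
        <station>
            <id>1</id>
            <name>Finch</name>
        </station>
    </stations>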

What I want to achieve:

Create fields in Elasticsearch that store the following:

ID = 1
Name = "Finch"

My Config file:

input{
    file{
        path => "C:\Users\186181152\Downloads\stations.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
        type => "xml"
    }
}
filter{
    xml{
        source => "message"
        store_xml => false
        target => "stations"
        xpath => [
            "/stations/station/id/text()", "station_id",
            "/stations/station/name/text()", "station_name"
        ]
    }
}

output{
    elasticsearch{
        codec => json
        hosts => "localhost"
        index => "xmlns"
    }
    stdout{
        codec => rubydebug
    }
}

Output in Logstash:

{
    "station_name" => "%{station_name}",
    "path" => "C:\Users\186181152\Downloads\stations.xml",
    "@timestamp" => 2018-02-09T04:03:12.908Z,
    "station_id" => "%{station_id}",
    "@version" => "1",
    "host" => "BW",
    "message" => "\t\r",
    "type" => "xml"
}
  • I don't think dev/null is supported on Windows. Commented Feb 9, 2018 at 17:02
  • Is the whole xml file on the same line, i.e. no line break? Because if it's not the case, the file will be treated line by line (as indicated in the doc), thus causing the empty station_id and station_name. Commented Feb 9, 2018 at 17:06
  • @baudsp Dev/null works fine. I tried a csv file and it loaded the data correctly Commented Feb 9, 2018 at 17:11
  • @baudsp The whole XML file is not on the same line; it follows standard XML conventions, one tag per line. Commented Feb 9, 2018 at 17:13
  • The file input reads line by line, creating one message per line, which explains your result. You'll have to use the multiline codec on your input. See stackoverflow.com/questions/34800559/… Commented Feb 9, 2018 at 17:18

1 Answer


The multiline codec lets the whole XML file be read as a single event, which the xml filter (with its xpath option) can then parse so the data can be ingested into Elasticsearch. In the multiline codec we specify a pattern (<stations> in the example below) that Logstash uses to scan the file; with negate => true and what => previous, every line that does not match the pattern is appended to the previous event, so the entire file ends up as one message.

The following is an example of a working config file for my data:

input {
    file {
        path => "C:\Users\186181152\Downloads\stations3.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
        type => "xml"
        codec => multiline {
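            # negate => true + what => previous: every line that does NOT match
            # "<stations>" is appended to the previous event, so the whole file
            # becomes a single message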
            pattern => "<stations>" 
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    xml {
        source => "message"
        store_xml => false
        target => "stations"
        xpath => [
            "/stations/station/id/text()", "station_id",
            "/stations/station/name/text()", "station_name"
        ]
    }
}

output {
    elasticsearch {
        codec => json
        hosts => "localhost"
        index => "xmlns24"
    }
    stdout {
        codec => rubydebug
    }
}   
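
Note that fields extracted with the xpath option are stored as arrays (e.g. ["1"] rather than "1"). If you prefer plain string fields and the event holds a single station, one possible addition is a mutate filter that takes the first element of each array, for example:

    filter {
        mutate {
            # copy the first element of each xpath result array into a plain string field
            replace => {
                "station_id"   => "%{[station_id][0]}"
                "station_name" => "%{[station_name][0]}"
            }
        }
    }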