I am trying to use Logstash to convert an XML file into JSON for Elasticsearch. I am able to get the values read and sent to Elasticsearch. The issue is that all the values come out as arrays; I would like them to come out as plain strings. I know I can do a replace for each field individually, but then I run into an issue with nested fields that are 3 levels deep.

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
    <acs2:locationId>Location Id</acs2:locationId>
    <acs2:userId>User Id</acs2:userId>
    <acs2:TestResult>
        <acs1:CreatedBy>My Name</acs1:CreatedBy>
        <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
        <acs1:Output>10.5</acs1:Output>
    </acs2:TestResult>
</acs2:SubmitTestResult>

Logstash Config

input {
    file {
        path => "/var/log/logstash/test.xml"
    }
}
filter {
    multiline {
        pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
        what => "previous"
    }
    if "multiline" in [tags] {
        mutate {
            replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
        }
        xml {
            target => "SubmitTestResult"
            source => "message"
        }
        mutate {
            remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
            remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]

            # This works
            replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]

            # This does NOT work
            replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
        }
    }
}
output {
    stdout {
        codec => "rubydebug"
    }
    elasticsearch {
        index => "xmltest"
        cluster => "logstash"
    }
}

Example Output

{
   "_index": "xmltest",
   "_type": "logs",
   "_id": "AU8IZBURkkRvuur_3YDA",
   "_version": 1,
   "found": true,
   "_source": {
      "SubmitTestResult": {
         "locationId": "Location Id",
         "userId": [
            "User Id"
         ],
         "TestResult": [
            {
               "CreatedBy": [
                  "My Name"
               ],
               "CreatedDate": [
                  "2015-08-07"
               ],
               "Output": [
                  "10.5"
               ]
            }
         ]
      }
    }
}

As you can see, every element comes out as an array (except for locationId, which I fixed with a replace). I would like to avoid doing a replace for each element. Is there a way to adjust the config so that the output comes out properly? If not, how do I reach 3 levels deep in the replace?

--UPDATE--

I figured out how to get to the 3rd level in Test Results. The replace is:

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]

Looks like you found your own answer. Thanks for posting both, I found this useful. Commented Jan 27, 2016 at 14:19

1 Answer

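I figured it out. As the example output shows, TestResult is itself parsed as an array, so the field reference needs the index [0] before the nested field name: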

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
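
For context, the arrays appear because the xml filter wraps every parsed element in an array by default, which is also why TestResult has to be addressed through an index. As a possible alternative to replacing each field one by one, newer releases of the xml filter plugin provide a force_array option. The following is a minimal sketch, assuming the installed plugin version supports force_array; it is not part of the original solution, and elements that repeat in the XML would still come out as arrays:

```
filter {
    xml {
        source => "message"
        target => "SubmitTestResult"
        # Assumption: this plugin version supports force_array.
        # Setting it to false keeps single-occurrence elements as plain
        # values instead of wrapping each one in an array.
        force_array => false
    }
}
```

With this setting, TestResult should parse as a single object rather than a one-element array, so the nested field would then be referenced as [SubmitTestResult][TestResult][CreatedBy], without the [0] index.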