I am trying to extract data from an XML file; a sample of its structure is below:

<pwx creator="PerfPRO" version="1.0">
<workout>
<athlete></athlete>
<title></title>
<sportType>Bike</sportType>
<cmt></cmt>
<device id=""></device>
<time>2016-01-19T08:01:00</time>
<summarydata>
    <beginning>0</beginning>
    <duration>3600.012</duration>
</summarydata>
<segment>
    <summarydata>
        <beginning>0</beginning>
        <duration>120</duration>
    </summarydata>
</segment>
<segment>
    <summarydata>
        <beginning>120</beginning>
        <duration>120</duration>
    </summarydata>
</segment>
<segment>
    <summarydata>
        <beginning>240</beginning>
        <duration>120</duration>
    </summarydata>
</segment>

I would like to extract the data in the 'segment' blocks (both 'beginning' and 'duration'), ideally as a data frame. The file contains many segment blocks.

I have tried many things and still can't extract it; all I get is an empty list. Here is what I have done (pwx is the file name):

library(XML)

xmlData <- xmlInternalTreeParse(pwx, useInternalNodes = TRUE)
xmltop <- xmlRoot(xmlData)

# Returns an empty list for my file
d <- xpathSApply(doc = xmlData, path = "//pwx/workout/segment/summarydata/beginning", fun = xmlValue)

I also seem to be able to access all the segments with:

segment <- xmltop[[1]]["segment"]

but I can't get the values out of them. I have tried many variations on the above.

Any help greatly appreciated, thanks.

Edit:

> summary(xmlData)
$nameCounts

        cad        dist          hr         pwr      sample         spd  timeoffset   beginning 
       3274        3274        3274        3274        3274        3274        3274          16 
   duration summarydata     segment     athlete         cmt      device        make       model 
         16          16          15           1           1           1           1           1 
       name         pwx   sportType        time       title     workout 
          1           1           1           1           1           1 

$numNodes
[1] 22992

3 Answers

Here's some raw xml2 processing with a little purrr thrown in:

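library(xml2)
library(purrr)

doc <- read_xml(pwx)  # pwx is the file path, as in the question

# Each segment-level <summarydata> node becomes one row
nodes <- xml_find_all(doc, ".//segment/summarydata")

map_df(nodes, function(x) {
  kids <- xml_children(x)  # <beginning> and <duration>
  # Convert the text values and name each one after its tag
  setNames(as.list(type.convert(xml_text(kids), as.is = TRUE)), xml_name(kids))
})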

## Source: local data frame [3 x 2]
## 
##   beginning duration
##       (int)    (int)
## 1         0      120
## 2       120      120
## 3       240      120
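One plausible reason this kind of XPath returns nothing on the real file: if the document declares a default namespace (the asker's own answer binds the root element's namespace before calling getNodeSet, which suggests it does), an un-prefixed path such as ".//segment/summarydata" matches no nodes. A minimal sketch, assuming a trimmed copy of the sample and a made-up namespace URI (http://example.com/hypothetical-pwx), showing the failure and one fix with xml2's xml_ns_strip():

```r
library(xml2)

# Hypothetical, trimmed stand-in for the .pwx file: same layout as the
# question's sample, plus a made-up default namespace on the root element.
pwx_text <- '<pwx xmlns="http://example.com/hypothetical-pwx" version="1.0">
<workout>
  <summarydata><beginning>0</beginning><duration>3600.012</duration></summarydata>
  <segment><summarydata><beginning>0</beginning><duration>120</duration></summarydata></segment>
  <segment><summarydata><beginning>120</beginning><duration>120</duration></summarydata></segment>
</workout>
</pwx>'

doc <- read_xml(pwx_text)

# With a default namespace in place, the un-prefixed XPath matches nothing.
before <- length(xml_find_all(doc, ".//segment/summarydata"))

# xml_ns_strip() removes default namespaces from the document in place,
# after which the same XPath finds the segment-level nodes.
xml_ns_strip(doc)
nodes <- xml_find_all(doc, ".//segment/summarydata")
after <- length(nodes)

c(before = before, after = after)
```

If you would rather not modify the document, xml2 can keep the namespace instead: xml_ns(doc) assigns the default namespace the prefix d1, so xml_find_all(doc, ".//d1:segment/d1:summarydata", xml_ns(doc)) should match the same nodes.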

You should check out rvest. The following may not be the most elegant way to use it, but it works.

library(rvest)  # also provides the %>% pipe
library(xml2)

# your_xml is the sample from the question; append the closing tags it is missing
some_xml <- paste0(your_xml, '</workout></pwx>')

some_xml %>% read_xml %>% xml_nodes('summarydata') -> nodes
nodes %>% xml_nodes('beginning') %>% xml_text -> beginning
nodes %>% xml_nodes('duration') %>% xml_text -> duration
data.frame(beginning, duration, stringsAsFactors = FALSE)
#   beginning duration
# 1         0 3600.012
# 2         0      120
# 3       120      120
# 4       240      120
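The result above includes the workout-level summarydata (the 0 / 3600.012 row) and keeps both columns as character. A minimal sketch of one way to match only segment-level nodes and get numeric columns, using xml2 directly (rvest builds on it). It assumes a trimmed copy of the question's sample, closed with </workout></pwx>, and no namespace declaration:

```r
library(xml2)

# Trimmed copy of the question's sample, closed with </workout></pwx>.
some_xml <- '<pwx creator="PerfPRO" version="1.0"><workout>
<summarydata><beginning>0</beginning><duration>3600.012</duration></summarydata>
<segment><summarydata><beginning>0</beginning><duration>120</duration></summarydata></segment>
<segment><summarydata><beginning>120</beginning><duration>120</duration></summarydata></segment>
<segment><summarydata><beginning>240</beginning><duration>120</duration></summarydata></segment>
</workout></pwx>'

doc <- read_xml(some_xml)

# Only <summarydata> nodes whose parent is a <segment>; the workout total is skipped.
nodes <- xml_find_all(doc, "//segment/summarydata")

# xml_find_first() returns one result per input node (a missing child gives NA),
# so the two columns stay aligned row by row.
segments <- data.frame(
  beginning = xml_double(xml_find_first(nodes, "beginning")),
  duration  = xml_double(xml_find_first(nodes, "duration"))
)
segments
```

Pairing children per parent node, rather than collecting all beginning and all duration nodes separately, keeps rows aligned if some block is missing a field.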

1 Comment

I added the ending line to the original question, just fyi.

Thank you to everyone who replied and offered an answer. I couldn't get the suggested answers to work as written (possibly my own failing).

For completeness and reference I managed to get this to work:

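library(XML)

pwx <- "myfile.pwx"
xmlData <- xmlInternalTreeParse(pwx, useInternalNodes = TRUE)
xmltop <- xmlRoot(xmlData)
# Bind the root element's namespace URI to the prefix "as" so the XPath can match namespaced nodes
nodes <- getNodeSet(xmltop, '//as:summarydata', namespaces = c(as = xmlNamespace(xmltop)))
df <- xmlToDataFrame(nodes)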

Output:

     beginning duration
1          0 3600.012
2          0      120
3        120      120
4        240      120
5        360      120
6        480      600
7       1080      180
8       1260      300
9       1560      300
10      1860      180
11      2040      300
12      2340      300
13      2640      180
14      2820      300
15      3120      300
16      3420  180.015
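These 16 rows are the workout-level total (row 1) plus the 15 segments, matching the summarydata and segment counts in the question's summary() output, and xmlToDataFrame() does not convert the columns to numbers on its own. A sketch of a variant that keeps only segment-level nodes and requests numeric columns, assuming a hypothetical inline document with a made-up namespace URI in place of myfile.pwx:

```r
library(XML)

# Hypothetical stand-in for myfile.pwx: a made-up default namespace URI,
# one workout-level summarydata and two segments.
pwx_text <- '<pwx xmlns="http://example.com/hypothetical-pwx" version="1.0"><workout>
<summarydata><beginning>0</beginning><duration>3600.012</duration></summarydata>
<segment><summarydata><beginning>0</beginning><duration>120</duration></summarydata></segment>
<segment><summarydata><beginning>120</beginning><duration>120</duration></summarydata></segment>
</workout></pwx>'

xmlData <- xmlParse(pwx_text, asText = TRUE)
xmltop <- xmlRoot(xmlData)

# Same namespace binding as the getNodeSet() call above; adding as:segment/
# to the XPath skips the workout-level <summarydata>.
nodes <- getNodeSet(xmltop, "//as:segment/as:summarydata",
                    namespaces = c(as = xmlNamespace(xmltop)))

# colClasses converts each column, in document order, to numeric.
df <- xmlToDataFrame(nodes, colClasses = c("numeric", "numeric"))
df
```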

Thanks,

M
