I have the following code to extract data from XML into a CSV file, but it raises an error and I don't know how to solve it.

If anyone can help, please do.

url = "http://90.161.233.78:65519/services/user/records.xml?begin=04052022?end=06052022?var=EDSLINEEMBEDDED.Module2.VI1?var=EDSLINEEMBEDDED.Module2.API1?period=900"

s = unescape(requests.get(url).text)[5:-6]
df = pd.read_xml(s, xpath="//record/* | //dateTime")
df["field"] = df["field"].ffill()
df.to_csv('output0.csv')

The error is:

  doc = fromstring(
  File "src\lxml\etree.pyx", line 3252, in lxml.etree.fromstring
  File "src\lxml\parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src\lxml\parser.pxi", line 1800, in lxml.etree._parseDoc
  File "src\lxml\parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src\lxml\parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 2

1 Answer

Consider passing the URL directly to pandas.read_xml(), without using requests or unescaping the content yourself. Per the docs for the path_or_buffer parameter, the string can itself be a URL:

path_or_buffer: str, path object, or file-like object

String, path object (implementing os.PathLike[str]), or file-like object implementing a read() function. The string can be any valid XML string or a path. The string can further be a URL. Valid URL schemes include http, ftp, s3, and file.

import pandas as pd

url = (
    "http://90.161.233.78:65519/services/user/records.xml?"
    "begin=04052022?end=06052022?var=EDSLINEEMBEDDED.Module2.VI1?"
    "var=EDSLINEEMBEDDED.Module2.API1?period=900"
)

df = pd.read_xml(url, xpath="//record/* | //dateTime")

# FILL PARENT TEXT FORWARD TO CHILD ITEMS
df["dateTime"] = df["dateTime"].ffill()

# DROP UNNEEDED ROWS (ASSIGN THE RESULT, OTHERWISE THE FILTER IS DISCARDED)
df = df[(pd.notnull(df["id"])) & (pd.notnull(df["value"]))]

df.to_csv('output0.csv')

2 Comments

Thanks. I got the following data:

dateTime,id,value
1,9052022000000000.0,EDSLINEEMBEDDED.Module2.VI1,240.944444
2,9052022000000000.0,EDSLINEEMBEDDED.Module2.API1,10.981189
4,9052022001500000.0,EDSLINEEMBEDDED.Module2.VI1,240.155556

But could you please tell me how I can change the format of the dateTime column to look like 2022-05-09 00:15:00?
Simply convert the numeric timestamps with pandas.to_datetime(), passing a format that matches their layout.
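A minimal sketch of that conversion, using the three rows quoted in the comment above. It assumes the values follow a DDMMYYYYHHMMSS layout followed by three millisecond digits; that layout is only inferred from the requested output (2022-05-09 00:15:00) and is not confirmed by the device's documentation. Because the column was parsed as a number, the leading zero of the day is lost, so the sketch pads it back before parsing.

```python
import pandas as pd

# Sample rows copied from the comment above.
df = pd.DataFrame(
    {
        "dateTime": [9052022000000000.0, 9052022000000000.0, 9052022001500000.0],
        "id": [
            "EDSLINEEMBEDDED.Module2.VI1",
            "EDSLINEEMBEDDED.Module2.API1",
            "EDSLINEEMBEDDED.Module2.VI1",
        ],
        "value": [240.944444, 10.981189, 240.155556],
    }
)

# ASSUMPTION: layout is DDMMYYYYHHMMSS + 3 millisecond digits (17 digits total).
# Convert float -> int -> string and restore the leading zero of the day.
stamps = df["dateTime"].astype("int64").astype(str).str.zfill(17)

# Parse the first 14 digits (date and time) and ignore the millisecond digits.
df["dateTime"] = pd.to_datetime(stamps.str[:14], format="%d%m%Y%H%M%S")

print(df)
```

Large timestamps stored as floats can lose precision, so if the values ever carry non-zero milliseconds it may be safer to keep the column as text from the start and parse that text instead.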
