
I’m trying to split downloaded data into a 2D array with different data types. The downloaded data looks like this:

000|17:40
000|17:45
010|17:50
025|17:55
056|18:00
178|18:05
202|18:10
203|18:15
190|18:20
072|18:25
013|18:30
002|18:35
000|18:40
000|18:45
000|18:50
000|18:55
000|19:00
000|19:05
000|19:10
000|19:15
000|19:20
000|19:25
000|19:30
000|19:35
000|19:40

I’m using the following code to parse this into a two dimensional array:

#!/usr/bin/python

import urllib2

response = urllib2.urlopen('http://gps.buienradar.nl/getrr.php?lat=52&lon=4')
html = response.read()
htmlsplit = []

for record in html.split("\r\n"):
    htmlsplit.append(record.split("|"))

print htmlsplit

This is working great, but as expected, it treats everything as a string. I’ve found some examples that split into integers. That would be great if both sides were integers, but in my case it’s an integer | string (or maybe some kind of Python time format).

How can I split this directly into different data types?

1 Comment
  • What kind of array? module array.array (weird)? List? Numpy array? Commented Jun 17, 2014 at 21:26

2 Answers


Something like this?

for record in html.split("\r\n"):  # beware, newlines are treacherous!
    s = record.split("|")
    htmlsplit.append((int(s[0]), s[1]))

Just write a parser for each record if your data is this simple. However, I would add a try/except clause to catch errors from non-conforming lines, empty lines, etc., which may be present in the data; the code above is fragile. Also, you might want to split on only \n and then clean your strings with strip() (i.e. replace s[1] with s[1].strip()). The integer conversion takes care of the left-hand side automatically, since int() ignores surrounding whitespace.
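A defensive version of that loop might look like the following sketch (the records list and variable names are illustrative, not from the original code):

```python
# Sample input including a malformed line and a trailing empty line,
# as might come from splitting the downloaded text on newlines.
records = "000|17:40\r\n025|17:55\r\nbadline\r\n".split("\r\n")

parsed = []
for record in records:
    try:
        value, timestamp = record.split("|")
        parsed.append((int(value), timestamp.strip()))
    except ValueError:
        # Raised both when the line has no "|" (unpack fails) and when
        # the left field is not an integer; skip such lines.
        continue

print(parsed)  # [(0, '17:40'), (25, '17:55')]
```

Catching ValueError covers both failure modes at once, so empty lines and junk lines are silently dropped instead of crashing the loop.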


3 Comments

Hi DrV, thank you! I've changed it to this: for record in html.splitlines(): s = record.split("|"); htmlsplit.append((int(s[0]), s[1])). I used splitlines on the advice from Aaron Hall below, and added the : you forgot ;) One question: do I need to "free" the temporary s to rule out memory leaks?
I will use try/except, thanks for pointing that out. (This is my first time on Stack Overflow, and what a horrible reply editor this is; I hope I can make sense without the line breaks.)
@Satoer I just omitted the colon to see if you are awake :) (Fixed now.) No need to free variables in Python; if you are not using them anymore (nothing references them), a big yellow lorry with the text "Garbage Collector" comes and picks them up. I suggest you do some reading on GC and Python's "everything is an object" model, as understanding the basics is sometimes useful. BTW, Aaron Hall's solution is in a way more Pythonic than mine; once you learn the basics, you'll learn to love the nice modules available!

Use str.splitlines instead of splitting on \r\n, and use the csv module to iterate over the lines:

import csv
txt = '000|17:40\n000|17:45\n000|17:50\n000|17:55\n000|18:00\n000|18:05\n000|18:10\n000|18:15\n000|18:20\n000|18:25\n000|18:30\n000|18:35\n000|18:40\n000|18:45\n000|18:50\n000|18:55\n000|19:00\n000|19:05\n000|19:10\n000|19:15\n000|19:20\n000|19:25\n000|19:30\n000|19:35\n000|19:40\n'

reader = csv.reader(txt.splitlines(), delimiter='|')
column1 = []
column2 = []
for c1, c2 in reader:
    column1.append(c1)
    column2.append(c2)

You can also use the DictReader:

import StringIO
reader2 = csv.DictReader(StringIO.StringIO(txt), 
                         fieldnames=['int', 'time'], 
                         delimiter='|')

column1 = []
column2 = []
for row in reader2:
    column1.append(row['time'])
    column2.append(row['int'])
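Since the asker wondered about a Python time format: both fields can be converted to real Python types as they are read, for instance with int() and datetime.strptime. A minimal sketch building on the DictReader example above (the 'int' and 'time' field names are the ones chosen there):

```python
import csv
from datetime import datetime

txt = '000|17:40\n010|17:50\n'

reader = csv.DictReader(txt.splitlines(),
                        fieldnames=['int', 'time'],
                        delimiter='|')

# Convert each row: left field to int, right field to a datetime.time
rows = [(int(row['int']), datetime.strptime(row['time'], '%H:%M').time())
        for row in reader]
```

After this, rows holds tuples like (0, datetime.time(17, 40)), so you can do arithmetic on the values and compare the times directly.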

1 Comment

Hi Aaron, thanks for the splitlines advice. I've discovered that this keeps the array free of a trailing empty record. Thanks for the solution, but the solution from DrV does exactly what I need.
