0

I'm wondering how I could build a .csv file with a proper structure. As an example, my data has the form:

(index, latitude, longitude, value)

- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389

I would like to be able to save this data in a .csv file with the format:

         | longitude | 
latitude |   value   |   

That is, all the values with the same latitude would be on the same line and all the values with the same longitude would be in the same column. I know how to write a .csv file in Python; what I'm wondering is how to perform this transformation properly.

Thank you in advance.

3
  • You will first have to loop over the data to collect all longitudes. Those will be your columns. Then I would probably create a dictionary for each latitude which contains longitude/value pairs. Then you can write a line for each latitude. You should take a look at the csv.DictWriter class. Commented Sep 16, 2014 at 15:33
  • I'd break up the lines with a regex and then use nested dicts to record the values mydict[latitude][longitude] = value. I'd also make a set of longitudes. The size of this set is the number of columns, make it a list and sort it to get an indexer into the nested list. Sort the latitude keys and off you go. Commented Sep 16, 2014 at 15:36
  • What happens if there are more values per lat/lon pair? What if there are two latitudes or longitudes which are almost the same but not exactly? Commented Sep 16, 2014 at 16:17
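
The approach described in these comments could be sketched roughly as follows. This is only a minimal sketch, assuming the records are already parsed into (lat, lon, value) tuples; the `pivot_to_csv` name and the sample `records` are made up for illustration. If a lat/lon pair occurs more than once the last value wins, and coordinates are compared exactly, so neither concern raised in the last comment is handled here.

```python
import csv
import io


def pivot_to_csv(records, out):
    """Write (lat, lon, value) records as a lat-by-lon table to `out`."""
    # First pass: collect every longitude; these become the columns.
    lons = sorted({lon for _, lon, _ in records})

    # Second pass: one dict per latitude holding longitude -> value.
    rows = {}
    for lat, lon, value in records:
        rows.setdefault(lat, {})[lon] = value  # a repeated pair overwrites

    # restval fills the cells for which a latitude has no value.
    writer = csv.DictWriter(out, fieldnames=["latitude"] + lons, restval="")
    writer.writeheader()
    for lat in sorted(rows):
        row = {"latitude": lat}
        row.update(rows[lat])
        writer.writerow(row)


# Hypothetical sample data, not taken from the question.
records = [(-51.49, 264.313, 7.27), (-51.49, 264.504, 7.23), (-52.49, 264.504, 1.5)]
buffer = io.StringIO()
pivot_to_csv(records, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; passing a file opened with `newline=''` instead should work the same way.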

4 Answers

1

I wrote a little program for you :) see below.

I'm assuming for now that your data is stored as a list of dicts, but if it is a list of lists the code shouldn't be too hard to fix.

#!/usr/bin/env python

import csv

data = [
    dict(lat=1, lon=1, val=10),
    dict(lat=1, lon=2, val=20),
    dict(lat=2, lon=1, val=30),
    dict(lat=2, lon=2, val=40),
    dict(lat=3, lon=1, val=50),
    dict(lat=3, lon=2, val=60),
]

# get a unique list of all longitudes
headers = list({d['lon'] for d in data})
headers.sort()

# make a dict of latitudes
data_as_dict = {}
for item in data:
    # default value: a list of empty strings
    lst = data_as_dict.setdefault(item['lat'], ['']*len(headers))
    # get the longitude for this item
    lon = item['lon']
    # where in the line should it be?
    idx = headers.index(lon)
    # save value in the list
    lst[idx]=item['val']


# in the actual file, we start with an extra header for the latitude
headers.insert(0,'latitude')

# newline='' lets the csv module control line endings (Python 3)
with open('latitude.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(headers)
    # sorted() works on the dict's keys in both Python 2 and Python 3,
    # whereas keys().sort() fails on Python 3
    for latitude in sorted(data_as_dict):
        # a line starts with the latitude, followed by list of values
        l = data_as_dict[latitude]
        l.insert(0, latitude)
        writer.writerow(l)

output:

latitude 1 2
1 10 20
2 30 40
3 50 60

Granted, it's not the prettiest code, but I hope you get the idea.

4 Comments

Hi @rje. Thank you for your answer. One small thing I forgot to ask: is it possible to order by lat and lon? My data is ordered, but with this code the result isn't. Thank you.
Nope, ordering is not necessary, it'll work with unordered data too!
Yes, but I guess I wasn't clear. It worked; however, my data isn't ordered in the output file as it was in the input. I'm trying to sort that out here.
Ah, I see. Changed the code a bit to sort the headers and keys :)
1

I'm assuming you have this data in a text file. Let's use regular expressions to parse the data (though string splitting looks like it could work if your format stays the same).

import re

data = list()

with open('path/to/data/file','r') as infile:
    for line in infile:
        matches = re.match(r".*(?<=lat=)(?P<lat>(?:\+|-)?[\d.]+).*(?<=value=)(?P<longvalue>(?:\+|-)?[\d.]+)", line)
        if matches:  # skip lines that do not fit the expected format
            data.append((matches.group('lat'), matches.group('longvalue')))

To unroll that nasty regex:

pat = re.compile(r"""
  .*                         # match anything any number of times
  (?<=lat=)                  # assert that the last 4 characters are "lat="
  (?P<lat>                   # begin named capturing group "lat"
      (?:\+|-)?              #   allow one or none of either + or -
      [\d.]+                 #   and one or more digits or decimal points
  )                          # end named capturing group "lat"
  .*                         # another wildcard
  (?<=value=)                # assert that the last 6 characters are "value="
  (?P<longvalue>             # begin named capturing group "longvalue"
      (?:\+|-)?              #   allow one or none of either + or -
      [\d.]+                 #   and one or more digits or decimal points
  )                          # end named capturing group "longvalue"
""", re.X)

# and a terser way of writing the code, since we've compiled the pattern above:

with open('path/to/data/file', 'r') as infile:
    # match every line lazily, then keep only the lines that matched
    matches_per_line = (pat.match(line) for line in infile)
    data = [(m.group('lat'), m.group('longvalue')) for m in matches_per_line if m]
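
Note that the pattern above captures only the latitude and the value. A possible variant that also keeps the longitude is sketched below; the `LINE_RE` and `parse_line` names are invented for this example, and it assumes every line has the `- N - lat=... lon=... value=...` shape shown in the question.

```python
import re

# One named group per field; assumes plain decimal numbers with an optional sign.
LINE_RE = re.compile(
    r"lat=(?P<lat>[+-]?[\d.]+)\s+"
    r"lon=(?P<lon>[+-]?[\d.]+)\s+"
    r"value=(?P<value>[+-]?[\d.]+)"
)


def parse_line(line):
    """Return (lat, lon, value) as floats, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return tuple(float(match.group(name)) for name in ("lat", "lon", "value"))


sample = "- 0 - lat=-51.490000 lon=264.313000 value=7.270077"
print(parse_line(sample))
print(parse_line("not a data line"))
```

Converting to `float` here is a design choice rather than a requirement; keeping the original strings would preserve the exact formatting of the input.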

1

Given your input data, I came up with the following:

from __future__ import print_function


def decode(line):
    line = line.replace('- ', ' ')
    fields = line.split()
    index = fields[0]
    data = dict([_.split('=') for _ in fields[1:]])
    return index, data


def transform(filename):
    transformed = {}
    columns = set()
    with open(filename) as infile:  # the file is closed when the block ends
        for line in infile:
            index, data = decode(line.strip())
            element = transformed.setdefault(data['lat'], {})
            element[data['lon']] = data['value']
            columns.add(data['lon'])
    return columns, transformed


def main(filename):
    columns, transformed = transform(filename)
    columns = sorted(columns)
    print(',', ','.join(columns))
    for lat, data in transformed.items():
        print(lat, ',', ', '.join([data.get(_, 'NULL') for _ in columns]))

if __name__ == '__main__':
    main('so.txt')

To cover the case where the data contains more than one latitude, I added one extra line to the example, so my input data (so.txt) contained this:

- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389
- 22 - lat=-52.490000 lon=268.324000 value=2948.090389

(note the last line)

With that input file, the above program creates the following output:

, 264.313000,264.504000,264.695000,264.886000,265.077000,265.268000,265.459000,265.650000,265.841000,266.032000,266.223000,266.414000,266.605000,266.796000,266.987000,267.178000,267.369000,267.560000,267.751000,267.942000,268.133000,268.324000
-51.490000 , 7.270077, 7.231014, 21.199764, 49.176327, 91.160702, 147.152889, 217.152889, 301.160702, 399.176327, 511.199764, 637.231014, 777.270077, 931.316952, 1099.371639, 1281.434139, 1477.504452, 1687.582577, 1911.668514, 2149.762264, 2401.863827, 2667.973202, 2948.090389
-52.490000 , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 2948.090389
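
Since the rows are assembled by hand with `print`, the spacing around the commas differs between the header and the data rows, and the row order depends on the dict. One way to get uniform output might be to hand the same `columns` and `transformed` structures to `csv.writer`. The sketch below is only an illustration: `write_table` is a hypothetical helper, and the small `columns`/`transformed` values are a cut-down stand-in for what `transform()` returns.

```python
import csv
import io


def write_table(columns, transformed, out, missing="NULL"):
    """Write the nested {lat: {lon: value}} mapping as CSV rows to `out`."""
    columns = sorted(columns, key=float)  # numeric order, not string order
    writer = csv.writer(out)
    writer.writerow([""] + columns)  # empty corner cell above the latitudes
    for lat in sorted(transformed, key=float):
        cells = transformed[lat]
        writer.writerow([lat] + [cells.get(lon, missing) for lon in columns])


# Cut-down stand-in for the output of transform(); the values are strings there too.
columns = {"268.133000", "268.324000"}
transformed = {
    "-51.490000": {"268.133000": "2667.973202", "268.324000": "2948.090389"},
    "-52.490000": {"268.324000": "2948.090389"},
}
buffer = io.StringIO()
write_table(columns, transformed, buffer)
print(buffer.getvalue())
```

Letting the `csv` module join the fields also takes care of quoting, should a value ever contain a comma.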

1

You can pull lat/lon/value from each line using a regex. You'll want to look up lat and lon later, so use a nested dict of the form d[lat][lon]=value to track it all. Add a set to keep track of the unique longitudes you see, and it's pretty straightforward to generate the csv.

I sorted it in the example, but you may not care about that.

import re
import collections

data = """- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702"""

regex = re.compile(r'- \d+ - lat=([\+\-]?[\d\.]+) lon=([\+\-]?[\d\.]+) value=([\+\-]?[\d\.]+)')

# lat/lon index will hold lats[latitude][longitude] = value
lats = collections.defaultdict(dict)
# longitude columns
lonset = set()

for line in data.split('\n'):
    match = regex.match(line)
    if match:
        lat, lon, val = match.groups()
        lats[lat][lon] = val
        lonset.add(lon)

latkeys = sorted(lats.keys())
lonkeys = sorted(list(lonset))

header = ['latitude'] + lonkeys
print(header)

for lat in latkeys:
    lons = lats[lat]
    row = [lat] + [lons.get(lon, '') for lon in lonkeys]
    print(row)
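
One caveat with the sorting step: the keys here are strings, so `sorted()` orders them character by character rather than by numeric value. That happens to work for the sample longitudes, which all have the same width, but it may not for negative latitudes or for values with a different number of digits. A small illustration, using made-up keys, of sorting with `key=float` instead:

```python
# Made-up coordinate strings, chosen to show the difference.
lat_strings = ["-51.490000", "-52.490000", "9.000000", "10.000000"]

lexical = sorted(lat_strings)             # compares character by character
numeric = sorted(lat_strings, key=float)  # compares the parsed numbers

print(lexical)
print(numeric)
```

With `key=float` the strings themselves are left untouched, so they can still be used as dict keys and written out exactly as they were read.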
