0

I'm pretty new to all this python and Excel things. I have a really big excel sheet with 1 million rows, my goal is to create a list with a few for list (I think that is what I need to do) and the problem is the follow:

I have something like this:

enter image description here

I want to add to a list all the POINT_Y and POINT_X (gps coordinates) where the objectid is the same and when the objectid changes I want to add those values to another list.

I only manage to get the data from the excel, my problem is in the construction of the lists.

import pandas
from shapely.geometry import Point

polygon_sheet =pandas.read_excel("C:...\coords.xlsx")
for row in polygon_sheet.index:
    data = (polygon_sheet['POINT_Y'][row]) + ", " + \
           (polygon_sheet['POINT_X'][row])

I'm sure I need a nested for loop but I don't know how to obtain what I want, that is something like this:

id1 = [[41.15226404, 25.41327506], [41.15221095, 25.41363158], [41.15220779, 25.41395401], [...(all coords from id1)]]

id2= [[41.11617356, 25.53488120], [41.11541559, 25.53517417], [41.11494893, 25.53531324], [...(all coords from id2]]

And so on until all ID's are iterated.

Hope I could explain what I expect correctly. Appreciate any help

4
  • 2
    Look up Pandas groupby. Commented Sep 16, 2019 at 10:26
  • where do you want to store those output lists ? to another pandas df ? or in a dict ? Commented Sep 16, 2019 at 10:29
  • @rusu_ro1 in a dict. For each ID I need a dict? Commented Sep 16, 2019 at 10:31
  • if you want to store in a dict than each id is a key from dict Commented Sep 16, 2019 at 10:33

2 Answers 2

1

you can apply a custom function over each row in order to create a dict with your desired output:

from collections import defaultdict

id_points = defaultdict(list)

def extrct_points(row):
    o_id, point_x, point_y = row
    id_points[o_id].append([point_x, point_y])

polygon_sheet.apply(extrct_points, axis=1)

for example:

polygon_sheet = pd.DataFrame({'OBJECTID': [1, 1,  2, 2, 2, 3], 'POINT_X':[0.23, 0.123, 0.1, 0.1, 0.2, 1],
                  'POINT_Y':[0.223, 0.3123, 0.11, 0.11, 0.22, 11]})

enter image description here

from pprint import pprint

polygon_sheet.apply(extrct_points, axis=1)

pprint(id_points)

output:

defaultdict(<class 'list'>,
            {1.0: [[0.23, 0.223], [0.123, 0.3123]],
             2.0: [[0.1, 0.11], [0.1, 0.11], [0.2, 0.22]],
             3.0: [[1.0, 11.0]]})
Sign up to request clarification or add additional context in comments.

Comments

1

You can create a dict of lists.Try something like this:

import pandas
from shapely.geometry import Point

dt = {}
polygon_sheet =pandas.read_excel("C:...\coords.xlsx")
for row in polygon_sheet.index:
    try:
        dt[polygon_sheet['ID'][row]] += [[polygon_sheet['POINT_Y'][row], polygon_sheet['POINT_X'][row]]]
    except:
        dt.update({polygon_sheet['ID'][row]: [[polygon_sheet['POINT_Y'][row], polygon_sheet['POINT_X'][row]]]})

print(dt)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.