Extracting parts of array elements using python

Question

I am trying to extract all integer values from a specific column (left, top, width and height) in a CSV file with multiple rows and columns. I have used pandas to isolate the columns I am interested in, but I'm stuck on how to access specific parts of each array.

Let me explain: I need to use the CSV file's column with "left", "top", "width" and "height" attributes to obtain xmin, ymin, xmax and ymax (these are the coordinates of bounding boxes in images). An example of a row in this column looks like this:

[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]

I need to extract 171, 0, 163 and 137 to do the operations that give me xmin, ymin, xmax and ymax.

The line above is a single cell of that column in my pandas DataFrame. How do I extract the numbers I need to run my operations?
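A minimal sketch of that conversion for the example cell, assuming the usual image convention that xmin = left, ymin = top, xmax = left + width and ymax = top + height (origin at the top-left corner, y growing downward). The helper name `to_corners` is hypothetical. Because the cell text is valid JSON, `json.loads` turns it into a list of dicts:

```python
import json

# One cell of the 'Answer.annotation_data' column, as shown above
row = '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]'

def to_corners(box):
    """Hypothetical helper: convert one left/top/width/height dict to corners.

    Assumes image coordinates with the origin at the top-left corner.
    """
    xmin, ymin = box["left"], box["top"]
    return {
        "label": box["label"],
        "xmin": xmin,
        "ymin": ymin,
        "xmax": xmin + box["width"],
        "ymax": ymin + box["height"],
    }

# Parse the JSON text, then convert every box in the cell
boxes = [to_corners(b) for b in json.loads(row)]
for b in boxes:
    print(b)
```

For the first box this gives xmax = 171 + 163 = 334 and ymax = 0 + 137 = 137.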

Here is the code I wrote to extract the column and this is what I have so far:

import numpy as np
import pandas as pd

csvPath = "/path/of/my/csvfile/csvfile.csv"

data = pd.read_csv(csvPath)
csv_coords = data['Answer.annotation_data'].values  # column with the coordinates
image_name = data['Input.image_url'].values  # column with the image URLs
print(csv_coords[2])  # Python 3 print function
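The printed cell is plain text, not a list: `read_csv` leaves the JSON unparsed. A minimal sketch of parsing the whole column with `json.loads`, which works here because the cells are valid JSON (`ast.literal_eval`, used in the answer below, works too). The in-memory CSV and the `img0.jpg`/`img1.jpg` values are made up so the example runs without the real file:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for csvfile.csv; to_csv quotes the JSON column
sample = pd.DataFrame({
    "Input.image_url": ["img0.jpg", "img1.jpg"],
    "Answer.annotation_data": [
        '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"}]',
        '[{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
    ],
})
buf = io.StringIO()
sample.to_csv(buf, index=False)
buf.seek(0)

data = pd.read_csv(buf)
cell = data["Answer.annotation_data"].iloc[0]
print(type(cell))  # read_csv does not parse the JSON, so this is str

# Parse every cell into a Python list of dicts
data["Answer.annotation_data"] = data["Answer.annotation_data"].apply(json.loads)
first_box = data["Answer.annotation_data"].iloc[0][0]
print(first_box["left"], first_box["top"], first_box["width"], first_box["height"])
```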

The expected output is the height, width, left and top for each instance, so that they can be manipulated for further use. – Veejay, Commented Aug 17, 2018 at 8:00

jezrael · Accepted Answer · 2018-08-17 08:18:19Z

1

Use:

import ast
import numpy as np
import pandas as pd

d = {'Answer.annotation_data': ['[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
                                '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']}
df = pd.DataFrame(d)

print (df)
                              Answer.annotation_data
0  [{"left":171,"top":0,"width":163,"height":137,...
1  [{"left":170,"top":10,"width":173,"height":157...

#convert string data to list of dicts if necessary
df['Answer.annotation_data'] = df['Answer.annotation_data'].apply(ast.literal_eval)

For each key in cols, extract that key's value from every dict (np.nan if it is missing) and return a DataFrame; finally, join the pieces together with concat:

def get_val(val):
    comb = [[y.get(val, np.nan) for y in x] for x in df['Answer.annotation_data']]
    return pd.DataFrame(comb).add_prefix('{}_'.format(val))

cols = ['left','top','width','height']
df1 = pd.concat([get_val(x) for x in cols], axis=1)
print (df1)
   left_0  left_1  top_0  top_1  width_0  width_1  height_0  height_1
0     171     222      0     42      163       45       137        70
1     170     222     10     42      173       45       157        70
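If one row per box is more convenient than one column per box, a hedged alternative (not part of the original answer) is to `explode` the parsed lists into a long DataFrame and compute the corners directly. This also handles any number of boxes per image. `Series.explode` needs pandas 0.25 or later, and the `xmax = left + width`, `ymax = top + height` convention assumes a top-left image origin:

```python
import json

import pandas as pd

d = {'Answer.annotation_data': [
    '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
    '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']}
df = pd.DataFrame(d)

# One entry per box; the index still identifies the source row.
# dropna() discards rows whose list was empty (explode turns [] into NaN).
boxes = df['Answer.annotation_data'].apply(json.loads).explode().dropna()

# Turn each dict into columns and keep the source row number as 'row'
long_df = pd.DataFrame(boxes.tolist(), index=boxes.index).rename_axis('row').reset_index()

# Corner coordinates, assuming a top-left image origin
long_df['xmin'] = long_df['left']
long_df['ymin'] = long_df['top']
long_df['xmax'] = long_df['left'] + long_df['width']
long_df['ymax'] = long_df['top'] + long_df['height']
print(long_df[['row', 'label', 'xmin', 'ymin', 'xmax', 'ymax']])
```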

edited Aug 17, 2018 at 8:18

answered Aug 17, 2018 at 7:47

jezrael



3 Comments

Veejay Over a year ago

This is great! There are two lefts, tops, widths and heights in this particular case, and df["left"] is an object that looks like [171,222]. How would you go about splitting it into separate integer values, say left_1 and left_2?

jezrael Over a year ago

@Veejay - added general solution for multiple values in lists, not only for 2, but for N values.

Veejay Over a year ago

Perfect! Thank you

tif · Accepted Answer · 2018-08-17 08:16:12Z

0

To access one field in your DataFrame:

`data.loc[row][column]` or `data.loc[row,column]`

e.g.

`data.loc[0]['left']`

To find, e.g., the global minimum of the top values:

min(data['top'])
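These lookups assume `data` already has `left` and `top` columns. In the question's CSV those values sit inside the JSON text of `Answer.annotation_data`, which is likely why `data.loc[0]['left']` raises `KeyError: 'left'` for the asker. A minimal sketch, assuming that column layout, of reaching the same values from the raw column:

```python
import json

import pandas as pd

data = pd.DataFrame({'Answer.annotation_data': [
    '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
    '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']})

print('left' in data.columns)  # False: 'left' exists only inside the JSON text

# .loc with the real column name returns the raw string for row 0
cell = data.loc[0, 'Answer.annotation_data']
first_left = json.loads(cell)[0]['left']

# Global minimum of every 'top' value across all rows and boxes
parsed = data['Answer.annotation_data'].apply(json.loads)
min_top = min(box['top'] for boxes in parsed for box in boxes)
print(first_left, min_top)
```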

edited Aug 17, 2018 at 8:16

answered Aug 17, 2018 at 7:43

tif


2 Comments

Veejay Over a year ago

I get a "KeyError: 'left'" when I try to use data.loc. Could this be because data.loc isn't reading a specific column but the entire CSV file instead?

tif Over a year ago

data.loc does not read the file; read_csv() already did that. Afterwards, everything is in memory, i.e., in the DataFrame referenced by data. For me, both of the above solutions work and return 171.
