
I am trying to extract all integer values from a specific column (left, top, width and height) in a CSV file with multiple rows and columns. I have used pandas to isolate the columns I am interested in, but I'm stuck on how to use specific parts of each value in that column.

Let me explain: I need to use the CSV file's column with the "left, top, width and height" attributes to obtain xmin, ymin, xmax and ymax (these are the coordinates of boxes in images). An example of a row in this column looks like this:

[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]

I need to extract the 171, 0, 163 and 137 to do the operations that give me xmin, ymin, xmax and ymax.

The line above is a single row of one column in my pandas DataFrame. How do I extract the numbers I need to run my operations?
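A minimal sketch of the conversion described above, assuming each cell holds the JSON text shown and that `left`/`top` give the top-left corner of each box in pixels, so `xmax = left + width` and `ymax = top + height` (some annotation tools use an inclusive convention and subtract 1; check yours). The helper name `to_corners` is hypothetical.

```python
import json

# One cell from the 'Answer.annotation_data' column, assumed to be a JSON string
row = ('[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
       '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]')


def to_corners(box):
    """Hypothetical helper: convert a left/top/width/height box to (xmin, ymin, xmax, ymax).

    Assumes left/top is the top-left corner of the box in pixels.
    """
    xmin = box["left"]
    ymin = box["top"]
    xmax = box["left"] + box["width"]
    ymax = box["top"] + box["height"]
    return xmin, ymin, xmax, ymax


boxes = json.loads(row)  # parse the string into a list of dicts
corners = [to_corners(b) for b in boxes]
print(corners)  # [(171, 0, 334, 137), (222, 42, 267, 112)]
```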

Here is the code I wrote to extract the column and this is what I have so far:

import os
import csv
import pandas
import numpy as np

csvPath = "/path/of/my/csvfile/csvfile.csv"

data = pandas.read_csv(csvPath)
csv_coords = data['Answer.annotation_data'].values  # column with the coordinates
image_name = data['Input.image_url'].values
print(csv_coords[2])
  • What is the expected output? Commented Aug 17, 2018 at 7:48
  • The expected output is the height, width, left and top for each instance, so that they can be manipulated further. Commented Aug 17, 2018 at 8:00

2 Answers


Use:

import ast
import numpy as np
import pandas as pd

d = {'Answer.annotation_data': ['[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
                                '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"},{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]']}
df = pd.DataFrame(d)

print(df)
                              Answer.annotation_data
0  [{"left":171,"top":0,"width":163,"height":137,...
1  [{"left":170,"top":10,"width":173,"height":157...

#convert string data to list of dicts if necessary
df['Answer.annotation_data'] = df['Answer.annotation_data'].apply(ast.literal_eval)
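Since the cells look like JSON, the standard-library `json.loads` is a reasonable alternative to `ast.literal_eval`. This sketch checks that both parse the example string identically, and shows one difference: JSON literals such as `false` are not Python literals, so only `json.loads` accepts them. The `occluded` key is hypothetical, added only for illustration; it is not in the question's data.

```python
import ast
import json

cell = '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"}]'

# For this string both parsers return the same list of dicts
assert json.loads(cell) == ast.literal_eval(cell)

# Hypothetical 'occluded' key using a JSON boolean literal
flagged = '[{"left":1,"top":2,"width":3,"height":4,"occluded":false}]'
print(json.loads(flagged))  # json.loads maps false -> False

try:
    ast.literal_eval(flagged)  # 'false' is not a Python literal
except ValueError as exc:
    print("literal_eval failed:", type(exc).__name__)
```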

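For each key in cols, collect that key's value from every dict in every row, turn the result into a DataFrame with one column per box, and finally join the per-key DataFrames side by side with concat:

def get_val(val):
    # one inner list per row: the value of key `val` for each box in that row
    comb = [[y.get(val, np.nan) for y in x] for x in df['Answer.annotation_data']]
    # columns 0..N-1 (one per box) get a prefix, e.g. left_0, left_1
    return pd.DataFrame(comb).add_prefix('{}_'.format(val))

cols = ['left','top','width','height']
df1 = pd.concat([get_val(x) for x in cols], axis=1)
print(df1)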
   left_0  left_1  top_0  top_1  width_0  width_1  height_0  height_1
0     171     222      0     42      163       45       137        70
1     170     222     10     42      173       45       157        70
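If the number of boxes varies from image to image, a long layout with one row per box may be easier to work with than the wide left_0, left_1, ... columns. A sketch, assuming JSON-string cells and that left/top is the top-left corner; the image URLs are placeholders, and `json.loads` stands in for `ast.literal_eval`:

```python
import json

import pandas as pd

# Two example rows shaped like the question's CSV; the image URLs are placeholders
df = pd.DataFrame({
    "Input.image_url": ["img_0.jpg", "img_1.jpg"],
    "Answer.annotation_data": [
        '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
        '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]',
        '[{"left":170,"top":10,"width":173,"height":157,"label":"styrofoam container"}]',
    ],
})

# Flatten: one record per box, tagged with the image it belongs to
records = [
    {"image": url, **box}
    for url, cell in zip(df["Input.image_url"], df["Answer.annotation_data"])
    for box in json.loads(cell)
]
boxes = pd.DataFrame(records)

# Corner coordinates, assuming left/top is the top-left corner
boxes["xmin"] = boxes["left"]
boxes["ymin"] = boxes["top"]
boxes["xmax"] = boxes["left"] + boxes["width"]
boxes["ymax"] = boxes["top"] + boxes["height"]
print(boxes[["image", "label", "xmin", "ymin", "xmax", "ymax"]])
```

Each box keeps its label and source image, and rows with different box counts do not produce NaN-padded columns.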

3 Comments

This is great! Considering that there are two lefts, tops, widths and heights in this particular case, and df["left"] is an object that looks like [171,222], how would you go about breaking this into separate integer values, say left_1 and left_2?
@Veejay - added a general solution for lists with any number of values (N), not only 2.
Perfect! Thank you

To access one field in your DataFrame, use

`data.loc[row][column]` or `data.loc[row,column]`

e.g.

`data.loc[0]['left']`
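With the question's CSV, left is not a column of its own: it sits inside the JSON stored in Answer.annotation_data, which would plausibly explain the KeyError: 'left' reported in the comments. A sketch of reaching the same value with `.loc`, assuming the cells are JSON strings that are parsed first:

```python
import json

import pandas as pd

# One example row shaped like the question's CSV
data = pd.DataFrame({
    "Answer.annotation_data": [
        '[{"left":171,"top":0,"width":163,"height":137,"label":"styrofoam container"},'
        '{"left":222,"top":42,"width":45,"height":70,"label":"chopstick"}]'
    ]
})

# The cells are JSON strings, so parse them into lists of dicts first
data["Answer.annotation_data"] = data["Answer.annotation_data"].apply(json.loads)

# Row 0, the annotation column, first box, then its 'left' key
left = data.loc[0, "Answer.annotation_data"][0]["left"]
print(left)  # 171
```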

To find, e.g., the minimum of the top values globally:

`min(data['top'])`
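The same idea applied to the nested data in the question: a sketch, again assuming JSON-string cells, that scans every box in every row for the smallest top value (the rows are made-up examples modeled on the ones above):

```python
import json

import pandas as pd

data = pd.DataFrame({
    "Answer.annotation_data": [
        '[{"left":171,"top":0,"width":163,"height":137},{"left":222,"top":42,"width":45,"height":70}]',
        '[{"left":170,"top":10,"width":173,"height":157}]',
    ]
})

# Walk every box in every row and keep the smallest 'top'
min_top = min(
    box["top"]
    for cell in data["Answer.annotation_data"]
    for box in json.loads(cell)
)
print(min_top)  # 0
```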

2 Comments

I get a "KeyError: 'left'" when I try to work with data.loc. Could this be because data.loc isn't reading a specific column but the entire CSV file?
data.loc does not read the file at all; read_csv() already did that. Afterwards everything is in memory, in the DataFrame referenced by data. For me, both of the above solutions work and return 171.
