0

I have a CSV file which looks like:

Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944

I load it as pandas.Dataframe named: imageannotation - I am interested in extracting a dictionary which has as key the imagename (note: Imagename can have duplicate rows), and as value, an other dictionary whit 2 keys: ['bbox',, 'species'], where bbox is a list given by the TL_x, TL_y, BR_x, BR_y values

I can accomplish this with the following code:


test = {
    i: {
        "bbox": imageannotation[imageannotation["Imagename"] == i][
            ["TL_x", "TL_y", "BR_x", "BR_y"]
        ].values,
        "species": imageannotation[imageannotation["Imagename"] == i][
            ["Species"]
        ].values,
    }
    for i in imageannotation["Imagename"].unique()
}

The results looks like this:

mydict = {'201503.20150619.181140817.204628': {'bbox': array([[272.   , 142.375, 382.5  , 340.   ]]),
  'species': array([['fish']], dtype=object)},
 '201503.20150619.181141498.204632': {'bbox': array([[267.75 ,   6.375, 422.875,  80.75 ]]),
  'species': array([['fish']], dtype=object)},
 '201503.20150619.181141662.204633': {'bbox': array([[ 820.25 ,   78.625,  973.25 ,  382.5  ],
         [1257.   ,   75.   , 1280.   ,  116.   ]]),
  'species': array([['fish'],
         ['fish']], dtype=object)},
 '201503.20150619.181141834.204634': {'bbox': array([[194., 281., 233., 336.],
         [766., 271., 789., 293.]]),
  'species': array([['fish'],
         ['fish']], dtype=object)}}

which is what I wanted but can get extremely slow when working on large files.

Q: Do you have any better way to accomplish this?

My final target is to add a new column to a dataframe imagemetadata which is bigger than the has an Imagename field with unique values - and I do this last operation with:

for i in mydict:
    imagemetadata.loc[imagemetadata.Imagename == i, "annotation"] = [test[I]]

1 Answer 1

1

(Revised answer now that I re-read things.)

This seems to be what you might be after; group the annotations by Imagename, make a dict-of-lists out of them, map them into the other dataframe.

import io

import pandas as pd

imageannotation = pd.read_csv(
    io.StringIO(
        """
Detection,Imagename,Frame_Identifier,TL_x,TL_y,BR_x,BR_y,detection_Confidence,Target_Length,Species,Confidence
0,201503.20150619.181140817.204628.jpg,0,272,142.375,382.5,340,0.475837,0,fish,0.475837
1,201503.20150619.181141498.204632.jpg,3,267.75,6.375,422.875,80.75,0.189145,0,fish,0.189145
2,201503.20150619.181141662.204633.jpg,4,820.25,78.625,973.25,382.5,0.615788,0,fish,0.615788
3,201503.20150619.181141662.204633.jpg,4,1257,75,1280,116,0.307278,0,fish,0.307278
4,201503.20150619.181141834.204634.jpg,5,194,281,233,336,0.586944,0,fish,0.586944
"""
    )
)

# (Pretend this comes from a separate file)
imagemetadata = pd.DataFrame({"Imagename": imageannotation.Imagename.unique()})


def make_annotation(r):
    return {
        "bbox": [r.TL_x, r.TL_y, r.BR_x, r.BR_y],
        "species": r.Species,
    }


annotations_by_image = (
    imageannotation.groupby("Imagename")
    .apply(lambda r: r.apply(make_annotation, axis=1).to_list())
    .to_dict()
)
imagemetadata = pd.DataFrame({"Imagename": imageannotation.Imagename.unique()})
imagemetadata["annotation"] = imagemetadata.Imagename.map(annotations_by_image)

print(imagemetadata)

The output is

                              Imagename                                         annotation
0  201503.20150619.181140817.204628.jpg  [{'bbox': [272.0, 142.375, 382.5, 340.0], 'spe...
1  201503.20150619.181141498.204632.jpg  [{'bbox': [267.75, 6.375, 422.875, 80.75], 'sp...
2  201503.20150619.181141662.204633.jpg  [{'bbox': [820.25, 78.625, 973.25, 382.5], 'sp...
3  201503.20150619.181141834.204634.jpg  [{'bbox': [194.0, 281.0, 233.0, 336.0], 'speci...

If you want imagemetadata to have multiple lines if annotation has multiple entries,

imagemetadata = imagemetadata.explode("annotation").reset_index(drop=True)

Revised, again

For a dict-of-lists instead of a list-of-dicts, it's even simpler:

# Generate a bbox column
imageannotation["bbox"] = imageannotation.apply(lambda x: [x.TL_x, x.TL_y, x.BR_x, x.BR_y], axis=1)

# Get the columns we want as a dict
annotations_by_image = imageannotation.groupby("Imagename").agg({"bbox": list, "Species": list}).to_dict("index")

# Apply the annotations to the other df
imagemetadata["annotation"] = imagemetadata.Imagename.map(annotations_by_image)
print(imagemetadata)

The output is

                              Imagename                                         annotation
0  201503.20150619.181140817.204628.jpg  {'bbox': [[272.0, 142.375, 382.5, 340.0]], 'Sp...
1  201503.20150619.181141498.204632.jpg  {'bbox': [[267.75, 6.375, 422.875, 80.75]], 'S...
2  201503.20150619.181141662.204633.jpg  {'bbox': [[820.25, 78.625, 973.25, 382.5], [12...
3  201503.20150619.181141834.204634.jpg  {'bbox': [[194.0, 281.0, 233.0, 336.0]], 'Spec...
Sign up to request clarification or add additional context in comments.

2 Comments

It is much cleaner, but the output is not exactly the one I needed - probably my fault in building the question - I fixed it - in the example, the image 201503.20150619.181141662.204633.jpg has 2 rows, which means 2 bbox and 2 species - so the resulting output should be ``` {'201503.20150619.181141662.204633.jpg': {'bbox': [[820.25, 78.625, 973.25, 382.5], [1257, 75, 1280, 116]], 'species': [['fish'], ['fish']]}} ```
@epifanio Sure. Please see my edit.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.