Using a tree-like structure in Python

Question

Background

I have a bucket on S3 called sample-level-test. It contains one folder per day, such as 2020-10-08, 2020-10-09 and 2020-10-10.

Each date folder contains many folders, each named after a player ID, like 2020-10-08/31001457373383, 2020-10-08/31001457373383 etc.

The folders 31001457373383 and 31001457373383 are player-level folders, and each player-level folder contains 3 files.

My Code

import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket(name="sample-level-test")

# Print a summary of every object in the bucket
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)

My code sample output

s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-08/31001457373383/player-DNA.json')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-08/31001457373383/player-DNA.csv')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-08/31001457373383/player-DNA_report.tsv')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-09/31001461776686/player-DNA.json')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-09/31001461776686/player-DNA.csv')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-09/31001461776686/player-DNA_report.tsv')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-10/310014685532736/player-DNA.json')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-10/310014685532736/player-DNA.csv')
s3.ObjectSummary(bucket_name='sample-level-test', key='2020-10-10/310014685532736/player-DNA_report.tsv')

My Problem

I am trying to create a tiny service that, given some number X, returns the first X player keys in the bucket, oldest date first.

For example if X = 1 then my output should be ['2020-10-08/31001457373383'].

For example if X = 2 then my output should be ['2020-10-08/31001457373383', '2020-10-09/31001461776686'].

My Current Approach

Currently I loop through the entire output, which is essentially the list of all objects in the bucket, and parse out the individual date folders. Then I check each date folder and collect keys until I hit X.

I think my approach is flawed and very slow, and I am wondering if there is a better way. I know Java has tree data structures where I could store this kind of directory listing in a tree format, and it would be fast to retrieve info when needed. Is there something similar I can use in Python?

Is a tree-like structure essential? It seems like your output is already sorted.
– user2056487, Commented Nov 25, 2020 at 2:58
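
Following up on the comment above: if the listing really does arrive sorted (S3 general purpose buckets return list results in UTF-8 binary order, and ISO dates sort chronologically as strings), the loop can stop as soon as X distinct player prefixes have been seen. Below is a minimal sketch of that idea; the function names `iter_player_prefixes` and `oldest_player_prefixes` are hypothetical, and it assumes all files of one player are adjacent in the input.

```python
from itertools import islice


def iter_player_prefixes(keys):
    """Yield each distinct 'date/player_id' prefix once, in input order.

    Assumption: keys arrive sorted, so all files of one player are adjacent.
    """
    previous = None
    for key in keys:
        prefix = key.rsplit('/', 1)[0]  # drop the trailing file name
        if prefix != previous:
            previous = prefix
            yield prefix


def oldest_player_prefixes(keys, x):
    """Return the first x distinct prefixes, consuming no more keys than needed."""
    return list(islice(iter_player_prefixes(keys), x))
```

With the question's bucket this could be called as `oldest_player_prefixes((obj.key for obj in my_bucket.objects.all()), 2)`. Because the generator is consumed lazily, iteration stops once enough prefixes are found instead of walking every object.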

Jonathan Leon · Accepted Answer · 2020-11-26 20:08:37Z


Assuming all your keys follow the same structure, you can split the keys, remove duplicates, sort, and get your answer by slicing the resulting list.

keys_list = ['2020-10-08/31001457373383/player-DNA.json',
'2020-10-08/31001457373383/player-DNA.csv',
'2020-10-08/31001457373383/player-DNA_report.tsv',
'2020-10-09/31001461776686/player-DNA.json',
'2020-10-09/31001461776686/player-DNA.csv',
'2020-10-09/31001461776686/player-DNA_report.tsv',
'2020-10-10/310014685532736/player-DNA.json',
'2020-10-10/310014685532736/player-DNA.csv',
'2020-10-10/310014685532736/player-DNA_report.tsv']

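x = 2
# Keep only the 'date/player_id' part of each key; the set removes duplicates
new_list = list(set([s.split('/player')[0] for s in keys_list]))
# ISO dates sort chronologically as strings, so the oldest comes first
new_list.sort()
answer_list = new_list[0:x]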

Output for x=2

['2020-10-08/31001457373383', '2020-10-09/31001461776686']
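
If a tree-like grouping is still wanted, as the question asks, a nested dictionary is a common stand-in in Python. The sketch below is one possible way to do it, not part of the answer above: `build_tree` and `oldest_players` are hypothetical names, and it assumes every key has exactly the shape `date/player_id/filename`.

```python
from collections import defaultdict


def build_tree(keys):
    """Group keys as tree[date][player_id] -> list of file names.

    Assumption: every key looks like 'date/player_id/filename'.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for key in keys:
        date, player_id, filename = key.split('/', 2)
        tree[date][player_id].append(filename)
    return tree


def oldest_players(tree, x):
    """Walk dates in ascending order and collect up to x 'date/player_id' paths."""
    result = []
    for date in sorted(tree):
        for player_id in sorted(tree[date]):
            if len(result) == x:
                return result
            result.append(f"{date}/{player_id}")
    return result
```

Calling `oldest_players(build_tree(keys_list), x)` gives the same kind of result as the slicing approach, and the tree can be reused for other lookups, such as listing the files of one player on one date.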
