
I've got a dictionary of pandas DataFrames, one per file, and I'm looping over them to extract the 'Country' column from each. I want to collect these columns into lists and then take the set of unique values across the whole list of lists. Here is the code and my predicament:

    country_setter = []
    for file in files_list:
        country_setter.append(all_comps[file]['Country'].tolist())

    uni_country_setter = ?

The result is a list of lists, with each DataFrame's 'Country' column stored as one list inside the parent list. It looks like this:

[['France',
  'United States',
  'Poland',
  'Poland',
  'Poland',
  'Poland',
  'Hungary',
  'Poland',
  'France',
  'United Kingdom',
    ....
  'Namibia',
  'China',
  'China',
  'Ireland'],
 ['Netherlands',
  'Canada',
  'United States',
  'Canada',
  'Canada',
  'United States',
  'Sweden',
  'Sweden',
  'United Kingdom',
   ....
  'Ireland',
  'Netherlands',
  'Netherlands',
  'France',
  'Hong Kong',
  'France',
  'France',
  'United States',
  'France',
  'United States']]

It's a list with 40 individual lists within it. I can take set(country_setter[0]), and that works fine for getting the unique values of the first list, but I need the unique values across all files combined.

Let me know if any of you can help. I've pored through Stack Overflow and only found one slightly similar question, but their goal was to maintain the list structure in the unique extraction, and they used itertools. I want the unique individual values across all of the lists here.

Thank you in advance!

  • Can you add data sample? Commented Oct 14, 2017 at 19:08
  • Sure, I'll give the structure. Commented Oct 14, 2017 at 19:16
  • @jezrael does that help? Commented Oct 14, 2017 at 19:21
  • Not 100% sure, but you need unique values from all lists? Commented Oct 14, 2017 at 19:27
  • yes - I need unique values across all of the lists. Commented Oct 14, 2017 at 19:32

1 Answer

I think you need to flatten the list of lists and then build the unique values with set:

    uni_country_setter = list(set([item for sublist in country_setter for item in sublist]))
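
The two `for` clauses in that comprehension read left to right, like nested loops: the first picks each sublist, the second picks each item inside it. A minimal sketch of the equivalence, using made-up sample data in place of the real `country_setter` (the values below are illustrative only):

```python
from itertools import chain

# Hypothetical stand-in for the list of lists built in the question.
country_setter = [
    ["France", "Poland", "Poland"],
    ["Canada", "France", "Sweden"],
]

# The comprehension is equivalent to these nested loops:
flat = []
for sublist in country_setter:   # outer loop: one list per file
    for item in sublist:         # inner loop: each country in that list
        flat.append(item)
unique_from_loops = set(flat)

# Two other standard-library ways to get the same set:
unique_from_chain = set(chain.from_iterable(country_setter))
unique_from_union = set().union(*country_setter)
```

All three results are the same set; note that a set has no defined order, so wrap it in `sorted(...)` if a stable ordering matters.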

EDIT:

The first loop is not necessary; you can build the result directly:

    uni_country_setter = list(set([item for file in files_list
                                   for item in all_comps[file]['Country'].tolist()]))
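
As a variation, a set comprehension skips the intermediate list entirely. The sketch below is an assumption-laden illustration: plain dicts of lists stand in for the DataFrames so it runs without pandas (iterating a pandas Series column yields its values in the same way), and the file names and countries are made up.

```python
# Hypothetical stand-ins: each "file" maps to a dict with a 'Country' column.
all_comps = {
    "a.csv": {"Country": ["France", "Poland", "Poland"]},
    "b.csv": {"Country": ["Canada", "France", "Sweden"]},
}
files_list = list(all_comps)

# Set comprehension: same double-for pattern, but builds the set directly.
unique_countries = {
    item
    for file in files_list
    for item in all_comps[file]["Country"]
}

# Sets are unordered; sort for a reproducible list.
uni_country_setter = sorted(unique_countries)
```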

2 Comments

Thank you! I don't think I could have figured that on my own. Can you explain the logic behind that double "for" call? Are you defining each sublist and then iterating through them?
Maybe a better explanation of flattening is here.
