
Good morning,

Basically I have 2 pandas dataframes from CSVs:

Dataframe 1: each row is a group, where the row index is a geographical area code and the columns hold that area's most similar areas (column 0 is the area itself, followed by its top 5 matches). e.g:

       0    1    2    3    4    5    
Rank                                                       
00C   00C  03H  02D  05H  02E  04E  
00D   00D  02P  02X  01X  03R  06M  

Dataframe 2: This is a larger dataframe with hospital activity numbers broken down by age group, gender and the geographical areas. e.g:

     MALE_0-4  MALE_5-9           FEMALE_80-84  FEMALE_85+
06M        75        59                     43          48
00C       132       121                    173         204
01X        84        63                    124         102
03H       127       131                    130          83
02P        93        89                    208         151
02D        70        62                     92          81
05H        96        76                     52          32
00C       106        62                    123         106
03R        75        59                     43          48
02P        10       121                    173         204
03R        84        63                    124         102
03R        30       131                    130          83
02E        93        89                    208         151
06M        70        62                     92          81
04E        96        76                     52          32
00D       106        62                    123         106

What I am trying to do is create smaller dataframes from Dataframe 2 - filtered by the groupings from Dataframe 1. Each geographical area code can appear in multiple lookups. I have a basic idea of for loops, but can't quite get it to work.

Dataframe Output 1:

    MALE_0-4  MALE_5-9           FEMALE_80-84  FEMALE_85+
00C       132       121                    173         204
03H       127       131                    130          83
02D        70        62                     92          81
05H        96        76                     52          32
00C       106        62                    123         106
02E        93        89                    208         151
04E        96        76                     52          32

Dataframe Output 2:

    MALE_0-4  MALE_5-9           FEMALE_80-84  FEMALE_85+
06M        75        59                     43          48
01X        84        63                    124         102
02P        93        89                    208         151
03R        75        59                     43          48
02P        10       121                    173         204
03R        84        63                    124         102
03R        30       131                    130          83
06M        70        62                     92          81
00D       106        62                    123         106

...

Hope this makes sense and any help would be appreciated.

  • Can't understand what you want! Can you please provide one of the expected outputs for a specific input? Commented Nov 15, 2018 at 12:19
  • Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks! Commented Nov 15, 2018 at 14:03
  • Possible duplicate of "Filter dataframe rows if value in column is in a set list of values" Commented Nov 15, 2018 at 14:34
  • Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have minimal reproducible example. Commented Nov 15, 2018 at 14:36
  • Thanks all. I've changed the examples so the outputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first row, and all the codes in Output 2 are in Dataframe 1's second row. Commented Nov 15, 2018 at 15:37

2 Answers


Going by the linked duplicate question, this is roughly what you should use (a sketch):

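# Assumes df2 holds the area code in a 'region' column; if the codes are the index, use df2.index.isin(row).
for _, row in df1.iterrows():
    # row is a Series of area codes, so it is list-like and valid for isin()
    broken_down = df2[df2['region'].isin(row)]  # overwritten on each pass; collect the results to keep them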
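For anyone who wants to run this end to end, here is a minimal sketch using a few rows copied from the question. It assumes the area code sits in a column named 'region' (in the question's printout the codes look like the index, so that column name is an assumption), and the `subsets` dict is just an illustrative way to keep each result.

```python
import pandas as pd

# Lookup table from the question: one row per group, indexed by area code.
df1 = pd.DataFrame(
    [["00C", "03H", "02D", "05H", "02E", "04E"],
     ["00D", "02P", "02X", "01X", "03R", "06M"]],
    index=pd.Index(["00C", "00D"], name="Rank"),
)

# A handful of activity rows from the question.
# Holding the area code in a 'region' column is an assumption.
df2 = pd.DataFrame({
    "region": ["06M", "00C", "01X", "03H", "02P", "00C"],
    "MALE_0-4": [75, 132, 84, 127, 93, 106],
    "MALE_5-9": [59, 121, 63, 131, 89, 62],
})

subsets = {}  # hypothetical container: group code -> filtered frame
for group_code, row in df1.iterrows():
    # row is a Series of area codes, which isin() accepts as list-like
    subsets[group_code] = df2[df2["region"].isin(row)]

for group_code, subset in subsets.items():
    print(group_code, subset["region"].tolist())
```

Codes that never appear in df2 (02X here) are simply ignored, and a code that appears several times in df2 (00C here) keeps all of its rows.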

4 Comments

Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
As I wrote, the solution was sketchy. With the correction it should no longer throw an error.
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
I am glad I could help! Best of luck!

Adding the code that appends each result to a list, for future reference. Thanks to sophros for the solution:

broken_down = []
for _, row in df1.iterrows():
    broken_down.append(df2[df2['region'].isin(row)])
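If the area codes are the index of the second dataframe (as they appear to be in the question's printout) and you want to look up each result by its group code rather than by list position, a dict works too. This is only a sketch under those assumptions; `groups` is a hypothetical name and the sample rows are a subset of the question's data.

```python
import pandas as pd

# Lookup table from the question: one row per group, indexed by area code.
df1 = pd.DataFrame(
    [["00C", "03H", "02D", "05H", "02E", "04E"],
     ["00D", "02P", "02X", "01X", "03R", "06M"]],
    index=pd.Index(["00C", "00D"], name="Rank"),
)

# Activity rows with the area code as the index (assumed layout).
df2 = pd.DataFrame(
    {"MALE_0-4": [75, 132, 84, 127, 93],
     "FEMALE_85+": [48, 204, 102, 83, 151]},
    index=["06M", "00C", "01X", "03H", "02P"],
)

# One filtered frame per lookup row, keyed by that row's area code.
groups = {
    code: df2[df2.index.isin(row)]
    for code, row in df1.iterrows()
}

print(groups["00D"])
```

`groups["00C"]` then gives the frame for the first lookup row, and `list(groups.values())` gives the same list of dataframes as the loop above.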
