1

If I have a dataframe such as:

data = [[10,"Apple"],[4,"Banana"],[3,"Strawberry"],[15,"Chocolate"],[5,"Kiwi"],[75,"Apple"],[4,"Potato"],[6,"Apple"],[45,"Banana"],[10,"Strawberry"],[10,"Apple"],]

df = pd.DataFrame(data, columns=('Cost', 'Fruit'))

Produces:

   Cost Fruit
0   10  Apple
1   4   Banana
2   3   Strawberry
3   15  Chocolate
4   5   Kiwi
5   75  Apple
6   4   Potato
7   6   Apple
8   45  Banana
9   10  Strawberry
10  10  Apple

I also have a dictionary with shop names assigned certain values:

 shop_names = {
    0 : "Shop 1",
    4 : "Shop 2",
    7 : "Shop 3",
}

Produces:

{0: 'Shop 1', 4: 'Shop 2', 7: 'Shop 3'}

I want to create a "Shop" column and assign values for each shop in the dictionary, based on the keys. So Rows 0-3 = Shop 1, Rows 4-6 = Shop 2, Rows 7-10 = Shop 3. The dictionary only has the first index as the key for each shop.

Desired output:

    Cost    Fruit   Shop
0   10  Apple       Shop 1
1   4   Banana      Shop 1
2   3   Strawberry  Shop 1
3   15  Chocolate   Shop 1
4   5   Kiwi        Shop 2
5   75  Apple       Shop 2
6   4   Potato      Shop 2
7   6   Apple       Shop 3
8   45  Banana      Shop 3
9   10  Strawberry  Shop 3
10  10  Apple       Shop 3 

This is an example dataset - the real dataset has 10,000 rows - but I have a dictionary with the shop intervals based on their equivalent index positions the original dataframe.

I am struggling to assign the values based on the index ranges from my dictionary. Any help would be appreciated.

Many thanks

2 Answers 2

5

Create Series by Index.map with ffill for forward filling missing values:

df['Shop'] = pd.Series(df.index.map(shop_names), index=df.index).ffill()
print (df)
    Cost       Fruit    Shop
0     10       Apple  Shop 1
1      4      Banana  Shop 1
2      3  Strawberry  Shop 1
3     15   Chocolate  Shop 1
4      5        Kiwi  Shop 2
5     75       Apple  Shop 2
6      4      Potato  Shop 2
7      6       Apple  Shop 3
8     45      Banana  Shop 3
9     10  Strawberry  Shop 3
10    10       Apple  Shop 3
Sign up to request clarification or add additional context in comments.

Comments

2

First map the shop names onto locations using Index.map(). And then fill the column with the last non-nan value (inspired by this answer)

df["Shop"] = df.index.map(shop_names).values
df["Shop"] = df["Shop"].fillna(method="ffill")

df
Out[36]: 
    Cost       Fruit    Shop
0     10       Apple  Shop 1
1      4      Banana  Shop 1
2      3  Strawberry  Shop 1
3     15   Chocolate  Shop 1
4      5        Kiwi  Shop 2
5     75       Apple  Shop 2
6      4      Potato  Shop 2
7      6       Apple  Shop 3
8     45      Banana  Shop 3
9     10  Strawberry  Shop 3
10    10       Apple  Shop 3

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.