0

I have a Pandas dataframe that looks like this:

Location Number Position
New York 111 1 through 12
Dallas 222 1 through 12
San Francisco 333 1 through 4

I would like to basically explode the dataframe based on the Position column so that the output looks like this:

Location Number Position
New York 111 1
New York 111 2
New York 111 3
New York 111 4
New York 111 5
New York 111 6
New York 111 7
New York 111 8
New York 111 9
New York 111 10
New York 111 11
New York 111 12
Dallas 222 etc. etc.

I would like to do this for every instance of Location and Number based on what the values in Position are.

Is there a quick and easy way to do this??

1
  • Is position a string like you have showed? Commented Jun 8, 2022 at 14:38

1 Answer 1

1

One option using explode:

out = (df
 .assign(Position=[range(a, b+1) for x in df['Position']
                   for a,b in [map(int, x.split(' through '))]])
 .explode('Position')
)

Another approach using reindexing:

df2 = df['Position'].str.extract('(\d+) through (\d+)').astype(int)
#    0   1
# 0  1  12
# 1  1  12
# 2  1   4

rep = df2[1].sub(df2[0]).add(1)

out = (df
 .loc[df.index.repeat(rep)]
 .assign(Position=lambda d: d.groupby(level=0).cumcount().add(df2[0]))
)

output:

        Location  Number  Position
0       New York     111         1
0       New York     111         2
0       New York     111         3
0       New York     111         4
0       New York     111         5
0       New York     111         6
0       New York     111         7
0       New York     111         8
0       New York     111         9
0       New York     111        10
0       New York     111        11
0       New York     111        12
1         Dallas     222         1
1         Dallas     222         2
1         Dallas     222         3
1         Dallas     222         4
1         Dallas     222         5
1         Dallas     222         6
1         Dallas     222         7
1         Dallas     222         8
1         Dallas     222         9
1         Dallas     222        10
1         Dallas     222        11
1         Dallas     222        12
2  San Francisco     333         1
2  San Francisco     333         2
2  San Francisco     333         3
2  San Francisco     333         4
Sign up to request clarification or add additional context in comments.

2 Comments

Try to run [x.split(' through ') for x in df['Position']] ensure this always yields 2 numbers (as string)
And what about the second approach?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.