Python/Pandas: Get index of item in a column

Question

I have a pandas DataFrame (df) with the following columns:

df["ids"]

0         18281483,1658391547
1           1268212,128064430
2                  1346542425
3  13591493,13123669,35938208

df["id"]

0      18281483
1       1268212
2    1346542425
3      13123669

I would like to find out at which position in "ids" the respective "id" appears, and write that position to a new column "order". I tried the following code without success:

df["order"] = df["ids"].str.split(",").index(df["id"])

----------------------------------------------------------------------
TypeError: 'Int64Index' object is not callable

Is there a syntax error? I tried the split and index function with every row manually (by inserting the lists and string), and it worked.
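A note on where the error comes from: `df["ids"].str.split(",")` returns a Series whose values are lists, and `.index` on a Series is its row-label index (an attribute), not the `list.index` method, so calling it raises the TypeError shown above. The sketch below illustrates this with the sample data. It assumes "id" is stored as integers, which the question does not state but the later comments suggest.

```python
import pandas as pd

# Sample data from the question; the integer dtype of "id" is an assumption.
df = pd.DataFrame({
    "ids": ["18281483,1658391547", "1268212,128064430",
            "1346542425", "13591493,13123669,35938208"],
    "id": [18281483, 1268212, 1346542425, 13123669],
})

split_ids = df["ids"].str.split(",")  # Series whose values are Python lists

# Series.index is the row-label index object, not a method, so it cannot be called.
index_is_callable = callable(split_ids.index)

# list.index has to be called on each row's own list, one row at a time.
df["order"] = [parts.index(str(value))
               for parts, value in zip(split_ids, df["id"])]
print(df)
```

Calling `list.index` by hand on a single row works because there the object really is a list; on the whole column it is a Series.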

Desired output:

df["order"]

I want a column "order" that gives the position at which "id" appears in "ids". For instance, for rows 0, 1 and 2 that would be 0, and for row 3 it would be 1, given that positions start at 0. I added an example, thanks for your suggestion.
– Christopher, Commented Jul 31, 2020 at 12:26

Quang Hoang · Accepted Answer · 2020-07-31 12:51:21Z

1

Try:

df['output'] = df.astype(str).apply(lambda x: x['ids'].split(',').index(x['id']), axis=1)

Output:

                          ids          id  output
0         18281483,1658391547    18281483       0
1           1268212,128064430     1268212       0
2                  1346542425  1346542425       0
3  13591493,13123669,35938208    13123669       1

edited Jul 31, 2020 at 12:51

answered Jul 31, 2020 at 12:34

Quang Hoang


3 Comments

Christopher Over a year ago

"ValueError: 18281483 is not in list". Same for you?

Quang Hoang Over a year ago

It looks like the id column is an integer column. You can convert the DataFrame to strings, as in the updated answer.

Christopher Over a year ago

`df.apply(lambda x: x['ids'].split(',').index(str(x['id'])), axis=1)` works indeed.
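The ValueError discussed in these comments can be reproduced without pandas. A minimal sketch using the first row's values from the question: splitting a string always yields strings, so looking up an integer in the result fails until the integer is converted with `str`.

```python
parts = "18281483,1658391547".split(",")  # ['18281483', '1658391547']

try:
    parts.index(18281483)  # an int never compares equal to a str element
    found_as_int = True
except ValueError:
    found_as_int = False

# Converting the lookup value to str compares like with like.
position = parts.index(str(18281483))
print(found_as_int, position)
```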

sushanth · Accepted Answer · 2020-07-31 12:30:52Z

1

Here is an approach:

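def index_(ids, target):
    # Position of target in the comma-separated ids string, or -1 if absent.
    split_ = ids.split(",")
    if target in split_:
        return split_.index(target)
    else:
        return -1


print(
    # Cast "id" to str so it can match the string pieces of "ids".
    df.assign(id=df["id"].astype(str))
        .apply(lambda x: index_(x["ids"], x["id"]), axis=1)
)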

0    0
1    0
2    0
3    1
dtype: int64

answered Jul 31, 2020 at 12:30

sushanth


user3483203 · Accepted Answer · 2020-07-31 12:54:22Z

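You really shouldn't need to use apply here. On larger DataFrames it will be noticeably slower (roughly 7x in the timings below). A broadcasted comparison works just fine, provided "id" holds strings like the pieces of "ids". On pandas 2.0 and later, indexing a Series with [:, None] is no longer supported, so convert to a NumPy array first:

(df["ids"].str.split(",", expand=True) == df["id"].to_numpy()[:, None]).idxmax(1)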

0    0
1    0
2    0
3    1
dtype: int64
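One caveat with `idxmax`: when a row contains no match at all, every comparison is False and `idxmax` still returns the first column label, 0, which is indistinguishable from a real match at position 0. A minimal sketch of one way to guard against that, using hypothetical data in which the last row has no match and borrowing the -1 convention from the other answer:

```python
import pandas as pd

# Hypothetical data: the last row's id does not occur in its ids string.
df = pd.DataFrame({
    "ids": ["18281483,1658391547", "13591493,13123669,35938208", "1,2,3"],
    "id": ["18281483", "13123669", "99"],
})

# Boolean frame: True where a split piece equals the row's id.
matches = df["ids"].str.split(",", expand=True) == df["id"].to_numpy()[:, None]

found = matches.any(axis=1)                      # rows with at least one match
order = matches.idxmax(axis=1).where(found, -1)  # -1 marks "not found"
print(order)
```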

Performance

import pandas as pd

# Both columns hold strings so the comparison can match.
d = {'ids': {0: '18281483,1658391547',
             1: '1268212,128064430',
             2: '1346542425',
             3: '13591493,13123669,35938208'},
      'id': {0: '18281483',
             1: '1268212',
             2: '1346542425',
             3: '13123669'}}

df = pd.DataFrame(d)
df = pd.concat([df] * 1000)  # 4,000 rows

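# Timed as originally posted; on pandas 2.0+ use df["id"].to_numpy()[:, None] instead of df["id"][:, None].
%timeit (df["ids"].str.split(",", expand=True) == df["id"][:, None]).idxmax(1)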
7.51 ms ± 61.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.apply(lambda x: x['ids'].split(',').index(x['id']), axis=1)                         
54.1 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
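For anyone who wants to rerun the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The function names are hypothetical, `ignore_index=True` is added so the row labels are unique, and no particular timings are claimed since they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "ids": ["18281483,1658391547", "1268212,128064430",
            "1346542425", "13591493,13123669,35938208"],
    "id": ["18281483", "1268212", "1346542425", "13123669"],
})
df = pd.concat([df] * 1000, ignore_index=True)  # 4,000 rows


def broadcast_version(frame):
    # Split into columns, then compare every piece with the row's id.
    split = frame["ids"].str.split(",", expand=True)
    return (split == frame["id"].to_numpy()[:, None]).idxmax(axis=1)


def apply_version(frame):
    # Row-by-row lookup with list.index.
    return frame.apply(
        lambda row: row["ids"].split(",").index(row["id"]), axis=1)


# Check that the two approaches agree before timing them.
same_result = bool((broadcast_version(df) == apply_version(df)).all())

t_broadcast = timeit.timeit(lambda: broadcast_version(df), number=10)
t_apply = timeit.timeit(lambda: apply_version(df), number=10)
print(f"same result: {same_result}")
print(f"broadcast: {t_broadcast:.3f}s, apply: {t_apply:.3f}s (10 runs each)")
```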
