I have a pandas DataFrame (df) with the following columns:

df["ids"]

0         18281483,1658391547
1           1268212,128064430
2                  1346542425
3  13591493,13123669,35938208

df["id"]

0      18281483
1       1268212
2    1346542425
3      13123669

I would like to find the position at which each "id" appears within the comma-separated "ids" of the same row, and write that position to a new column "order". I tried the following code without success:

df["order"] = df["ids"].str.split(",").index(df["id"])

----------------------------------------------------------------------
TypeError: 'Int64Index' object is not callable

Is there a syntax error? When I ran split and index manually for each row (by pasting in the lists and strings), it worked.

Desired output:

df["order"]

0 0
1 0
2 0 
3 1
  • What's the expected output for this data? Commented Jul 31, 2020 at 12:23
  • I want to have a column "order" that tells me in which index number the "id" appears in "ids". For instance, for row indices 0, 1 and 2 that would be "0" and for row 3 it would be "1", given indices start with 0. I added an example, thanks for your suggestion. Commented Jul 31, 2020 at 12:26

3 Answers

Try:

df['output'] = df.astype(str).apply(lambda x: x['ids'].split(',').index(x['id']), axis=1)

Output:

                          ids          id  output
0         18281483,1658391547    18281483       0
1           1268212,128064430     1268212       0
2                  1346542425  1346542425       0
3  13591493,13123669,35938208    13123669       1
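
For context, the original attempt fails because `df["ids"].str.split(",")` returns a Series, and `Series.index` is the row-label Index attribute rather than `list.index`, so calling it raises the TypeError. A minimal sketch of that, plus a list-comprehension variant of the row-wise lookup, assuming "id" is an integer column as the comments below suggest (the names `split_ids` and `error_name` are illustrative only):

```python
import pandas as pd

# Sample data from the question; "id" is assumed to be stored as integers.
df = pd.DataFrame({
    "ids": ["18281483,1658391547", "1268212,128064430",
            "1346542425", "13591493,13123669,35938208"],
    "id": [18281483, 1268212, 1346542425, 13123669],
})

# Each element of this Series is a list of strings.
split_ids = df["ids"].str.split(",")

# Series.index is an Index object (an attribute), so calling it fails.
try:
    split_ids.index(df["id"])
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Row-wise lookup: convert only "id" to str and call list.index per row.
df["order"] = [
    parts.index(str(value)) for parts, value in zip(split_ids, df["id"])
]

print(error_name)            # TypeError
print(df["order"].tolist())  # [0, 0, 0, 1]
```

Like the `apply` version, this raises ValueError if an id is missing from its row's list.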

3 Comments

"ValueError: 18281483 is not in list". Same for you?
It looks like the id column is an integer. You can convert the DataFrame to strings, as in the updated answer.
`df.apply(lambda x: x['ids'].split(',').index(str(x['id'])), axis=1)` works indeed.
Here is an approach that returns -1 when the id is not found:

def index_(ids, id):
    split_ = ids.split(",")
    if id in split_:
        return split_.index(id)
    else:
        return -1


print(
    df.assign(id=df["id"].astype(str))  # compare as strings, since split() yields strings
        .apply(lambda x: index_(x.ids, x.id), axis=1)
)

0    0
1    0
2    0
3    1
dtype: int64
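
The sample data never exercises the -1 branch. A small sketch that does, assuming an integer "id" column and adding a hypothetical third row whose id is absent from its list (the helper name `position_or_minus_one` is made up for this example):

```python
import pandas as pd


def position_or_minus_one(ids: str, target: str) -> int:
    """Return the position of target in the comma-separated ids, or -1."""
    parts = ids.split(",")
    return parts.index(target) if target in parts else -1


df = pd.DataFrame({
    "ids": ["18281483,1658391547", "13591493,13123669,35938208", "111,222"],
    # The last row is hypothetical: 999 does not occur in "111,222".
    "id": [18281483, 13123669, 999],
})

# Convert each id to str at lookup time so it matches the split strings.
order = df.apply(
    lambda row: position_or_minus_one(row["ids"], str(row["id"])), axis=1
)

print(order.tolist())  # [0, 1, -1]
```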

You shouldn't need apply here; on larger DataFrames it will be very slow. A broadcasted comparison works just fine, provided "id" holds strings like the pieces of "ids":

# Index the NumPy array: df["id"][:, None] is deprecated and fails on pandas 2.0+.
(df["ids"].str.split(",", expand=True) == df["id"].to_numpy()[:, None]).idxmax(1)

0    0
1    0
2    0
3    1
dtype: int64
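
One caveat with this approach: `idxmax` returns the first column when no cell in a row matches, so a missing id also yields 0. A minimal sketch of one way to tell the two cases apart, assuming both columns hold strings and using a hypothetical third row with no match (`parts`, `matches` and the -1 sentinel are choices made for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "ids": ["18281483,1658391547", "13591493,13123669,35938208", "111,222"],
    # Strings, like the pieces of "ids"; the last row is a hypothetical no-match.
    "id": ["18281483", "13123669", "999"],
})

# One column per position in the comma-separated string.
parts = df["ids"].str.split(",", expand=True)

# Compare every position column with "id", aligned row by row.
matches = parts.eq(df["id"], axis=0)

# idxmax gives the first True column; rows without any True are set to -1.
order = matches.idxmax(axis=1).where(matches.any(axis=1), -1)

print(order.tolist())  # [0, 1, -1]
```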

Performance

import pandas as pd

d = {'ids': {0: '18281483,1658391547',
             1: '1268212,128064430',
             2: '1346542425',
             3: '13591493,13123669,35938208'},
     'id': {0: '18281483',
            1: '1268212',
            2: '1346542425',
            3: '13123669'}}

df = pd.DataFrame(d)
df = pd.concat([df] * 1000)  # repeat the 4 sample rows to get 4,000 rows

%timeit (df["ids"].str.split(",", expand=True) == df["id"][:, None]).idxmax(1)                 
7.51 ms ± 61.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.apply(lambda x: x['ids'].split(',').index(x['id']), axis=1)                         
54.1 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
