I have the following pandas DataFrame.

   Id  UserId            Name                     Date  Class TagBased
0   2      23  Autobiographer  2016-01-12T18:44:49.267      3    False
1   3      22  Autobiographer  2016-01-12T18:44:49.267      3    False
2   4      21  Autobiographer  2016-01-12T18:44:49.267      3    False
3   5      20  Autobiographer  2016-01-12T18:44:49.267      3    False
4   6      19  Autobiographer  2016-01-12T18:44:49.267      3    False

I want to iterate through the "TagBased" column and collect the UserId values where TagBased is True into a list. I used the following code, but it prints an empty list, which is wrong because TagBased contains 18 True values.

user_tagBased = []

for i in range(len(df)):
    if (df['TagBased'] is True):
        user_tagBased.append(df['UserId'])
print(user_tagBased)

Output: []
  • Try df.loc[df['TagBased'], 'UserId'].tolist(). You don't need loops most of the time in pandas. Commented Jun 24, 2020 at 15:15
  • I am getting the following error by trying this method: KeyError: "None of [Index(['False', 'False', 'False', 'False', 'False', 'False', 'False', 'False',\n 'False', 'False',\n ...\n 'False', 'False', 'False', 'False', 'False', 'False', 'False', 'False',\n 'False', 'False'],\n dtype='object', length=18087)] are in the [index]" Commented Jun 24, 2020 at 15:17
  • Better is df.loc[df['TagBased'].eq("True"), 'UserId'].tolist(), since the values are strings. Commented Jun 24, 2020 at 15:35
  • This one works, Thanks a lot!! Commented Jun 24, 2020 at 15:48

3 Answers

1

As others have suggested, pandas conditional filtering without loops is the best choice here. Still, to explain why your code did not work as expected:

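Inside the for-loop you append df['UserId'], which is the entire column, not the value for the current row. The same goes for the df['TagBased'] check: it tests the whole column, and df['TagBased'] is True can never hold for a Series, so nothing is ever appended.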
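A minimal sketch of that point, using a small made-up frame (the values are illustrative, not your data, and TagBased is assumed to hold strings as the comments suggest). It shows that the column is a Series object, that an identity check against True fails, and that a single cell has to be compared with the string "True":

```python
import pandas as pd

# Hypothetical sample data; TagBased holds strings, as in the question.
df = pd.DataFrame({
    "UserId": [23, 22, 21],
    "TagBased": pd.Series(["False", "True", "False"], dtype=object),
})

column = df["TagBased"]

# The whole column is a Series object, never the singleton True.
is_series = isinstance(column, pd.Series)
identity_check = column is True

# A single cell is a plain string, so it is compared with "True".
cell = df.loc[1, "TagBased"]
cell_matches = cell == "True"
```

Because the condition is evaluated on the column rather than on row i, the loop index is never actually used in the original code.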

I assume you want to append the userId at the current row in the for-loop.

You can do that by iterating through the df rows:

user_tagBased = []

for index, row in df.iterrows():
    if row['TagBased'] == 'True': # Because it is a string and not a boolean here
        user_tagBased.append(row['UserId'])

1

Try this, you don't need to use loops for this:

user_list = df[df['TagBased']==True]['UserId'].tolist()
print(user_list)

[19, 19]
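Whether this comparison selects anything depends on the dtype of TagBased. A rough sketch with made-up data (not the asker's frame): the same values are stored once as real booleans and once as strings, which is an assumption based on the KeyError in the comments.

```python
import pandas as pd

# Hypothetical data: the same flags as booleans and as strings.
bool_df = pd.DataFrame({
    "UserId": [23, 22, 21],
    "TagBased": [False, True, True],
})
str_df = pd.DataFrame({
    "UserId": [23, 22, 21],
    "TagBased": pd.Series(["False", "True", "True"], dtype=object),
})

# With a bool column, == True keeps the matching rows.
from_bools = bool_df[bool_df["TagBased"] == True]["UserId"].tolist()

# With a string column, == True matches nothing ...
from_strings = str_df[str_df["TagBased"] == True]["UserId"].tolist()

# ... so the comparison has to be against the string "True".
from_strings_fixed = str_df[str_df["TagBased"] == "True"]["UserId"].tolist()
```

Checking df['TagBased'].dtype first tells you which of the two comparisons applies.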


0

There is no need to use any loop.

Note that:

  • df.TagBased - yields the TagBased column as a Series (I assume this column is of bool type).
  • df[df.TagBased] - is an example of boolean indexing - it retrieves rows where TagBased is True
  • df[df.TagBased].UserId - limits the above result to just UserId, almost what you want, but this is a Series, whereas you want a list.

So the code that produces your expected result and saves it in the destination variable is:

user_tagBased = df[df.TagBased].UserId.to_list()
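If the column turns out to hold the strings "True" and "False" rather than booleans, one option is to convert it once and then use the same boolean indexing. This is only a sketch on made-up data, and it assumes those two spellings are the only values present:

```python
import pandas as pd

# Hypothetical sample data with string flags.
df = pd.DataFrame({
    "UserId": [23, 22, 21],
    "TagBased": pd.Series(["False", "True", "True"], dtype=object),
})

# Map the two string values to real booleans, then fix the dtype.
df["TagBased"] = df["TagBased"].map({"True": True, "False": False}).astype(bool)

# Boolean indexing now works as described above.
user_tagBased = df[df.TagBased].UserId.to_list()
```

Any value outside the mapping would become NaN and then be cast to True by astype(bool), so it is worth checking the distinct values with df['TagBased'].unique() before converting.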
