Removing rows from a pandas DataFrame using a condition

Question

I'm trying to remove rows of data that I don't need after importing from files and concatenating my list of dataframes. Here is what my current DataFrame looks like:

                            Best Movie
0                        Movie: Orphan
1                                   2.
2                        Movie: Avatar
3                                   3.
4          Movie: Inglourious Basterds
...                                ...
2371  Movie: The Deep End of the Ocean
2372                               49.
2373         Movie: Drop Dead Gorgeous
2374                               50.
2375                         Movie: Go

I need to remove all rows that contain just a number, so that the result looks like this:

                            Best Movie
0                        Movie: Orphan
2                        Movie: Avatar
4          Movie: Inglourious Basterds
...                                ...
2371  Movie: The Deep End of the Ocean
2373         Movie: Drop Dead Gorgeous
2375                         Movie: Go

Thank you for your help!

df[~df['Best Movie'].str.endswith('.')] ? Try that.

– Scott Boston, Commented Jul 29, 2022 at 18:25
stackoverflow.com/questions/48996822/… check this.

– Jui Sen, Commented Jul 29, 2022 at 18:26
Wouldn't keeping only the even-numbered rows be a better way?

– MoRe, Commented Jul 29, 2022 at 19:30
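
The two suggestions from the comments can be sketched as follows. The `df` below is a small hypothetical sample that mirrors the layout in the question, not the asker's actual data. Keeping every other row by position only works if titles and numbers alternate strictly, which is an assumption about the imported files.

```python
import pandas as pd

# Hypothetical sample mirroring the layout in the question.
df = pd.DataFrame(
    {"Best Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3.", "Movie: Go"]}
)

# Suggestion 1: drop rows whose text ends with a period.
by_suffix = df[~df["Best Movie"].str.endswith(".")]

# Suggestion 2: keep every other row by position, starting at row 0.
# Assumes titles and numbers alternate strictly.
by_position = df.iloc[::2]

print(by_suffix)
print(by_suffix.equals(by_position))
```

On this sample both approaches keep the same rows. On real data, the suffix test would also drop any title that happens to end with a period.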

Dani Mesejo · Accepted Answer · 2022-07-29 19:24:56Z

2

One solution using str.match

mask = ~df["Best Movie"].str.match(r"^\s*\d+\.$")
res = df[mask]
print(res)

Output

                         Best Movie
0                     Movie: Orphan
2                     Movie: Avatar
4       Movie: Inglourious Basterds
5  Movie: The Deep End of the Ocean
7         Movie: Drop Dead Gorgeous
9                         Movie: Go
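
For a self-contained version of the approach above, here is a minimal sketch on a hypothetical sample frame (the data is illustrative only). `str.match` anchors at the start of each string, and the trailing `$` ensures that only rows consisting entirely of digits plus a period are flagged.

```python
import pandas as pd

# Hypothetical sample; " 3." has leading whitespace, "10. Movie" has trailing text.
df = pd.DataFrame(
    {"Best Movie": ["Movie: Orphan", "2.", "Movie: Avatar", " 3.", "10. Movie"]}
)

# True for rows made up only of optional whitespace, digits, and a literal period.
is_number_row = df["Best Movie"].str.match(r"^\s*\d+\.$")

# Invert the mask to keep everything that is not a number-only row.
res = df[~is_number_row]
print(res)
```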

UPDATE

To replace "Movie:" and reset the index, do:

res = df[mask].reset_index(drop=True)  # drop=True discards the old index
res = res["Best Movie"].str.replace(r"^\s*Movie:\s*", "", regex=True)  # also strips the space after the prefix
print(res)

Output

0                        Orphan
1                        Avatar
2          Inglourious Basterds
3     The Deep End of the Ocean
4            Drop Dead Gorgeous
5                            Go
Name: Best Movie, dtype: object
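
By default `reset_index` keeps the old index as a new column named `index`; passing `drop=True` discards it. A minimal sketch on hypothetical sample data that shows the difference and keeps the cleaned titles in a DataFrame rather than a Series:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Best Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]})
mask = ~df["Best Movie"].str.match(r"^\s*\d+\.$")

# Default: the old index survives as an extra "index" column.
with_old_index = df[mask].reset_index()
print(list(with_old_index.columns))

# drop=True: the old index is discarded and rows are renumbered from 0.
res = df[mask].reset_index(drop=True)

# Strip the prefix together with the whitespace around it.
res["Best Movie"] = res["Best Movie"].str.replace(r"^\s*Movie:\s*", "", regex=True)
print(res)
```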

edited Jul 29, 2022 at 19:24

answered Jul 29, 2022 at 18:28


2 Comments

Faisal Over a year ago

Thank you very much, Dani, it worked beautifully. Now I just need to reset the index so that the row numbering matches the actual number of rows, and also remove the word "Movie: " from each row.

Dani Mesejo Over a year ago

@Faisal See the update

SomeDude · Answer · 2022-07-29 18:32:32Z

1

You can do:

df.loc[~df['Best Movie'].str.match(r'^\d+\.$')]
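
One detail worth checking in patterns like this is the dot: unescaped, `.` matches any character, while `\.` matches only a literal period. A small sketch with made-up strings (not taken from the asker's data) that shows the difference:

```python
import pandas as pd

# Made-up strings for illustration.
s = pd.Series(["2.", "49.", "300", "Movie: Avatar"])

# Unescaped dot: any single character may follow the digits, so "300" matches too.
loose = s.str.match(r"^\d+.$")

# Escaped dot: only digits followed by a literal period match.
strict = s.str.match(r"^\d+\.$")

print(loose.tolist())
print(strict.tolist())
```

Using a raw string (`r'...'`) also avoids Python treating `\d` as an invalid string escape.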

answered Jul 29, 2022 at 18:32


srinath · Answer · 2022-07-29 18:29:09Z

0

Sample input

import pandas as pd

df = pd.DataFrame({
    "Best_Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]
})

Apply pd.to_numeric with errors='coerce'. Rows that contain only a number are converted to float, and all other rows become NaN.

df["nums"] = pd.to_numeric(df['Best_Movie'], errors='coerce')

Extract the rows that contain text (i.e. the rows marked as NaN):

df.loc[df.nums.isnull(), "Best_Movie"]

Sample output

0    Movie: Orphan
2    Movie: Avatar
Name: Best_Movie, dtype: object
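
The same idea can be written without adding a helper column to the frame. A minimal sketch, assuming the number rows are strings that `pd.to_numeric` can parse (such as "2."); the sample data is illustrative:

```python
import pandas as pd

# Illustrative sample data.
df = pd.DataFrame({"Best_Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]})

# Values that cannot be parsed as numbers become NaN.
parsed = pd.to_numeric(df["Best_Movie"], errors="coerce")

# Keep only the rows that failed to parse, i.e. the text rows.
text_rows = df[parsed.isna()]
print(text_rows)
```

This keeps `df` unchanged and returns the filtered rows as a DataFrame.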

answered Jul 29, 2022 at 18:29


Nuri Taş · Answer · 2022-07-29 18:29:14Z

0

Try the following. Here '|' means "or", so the pattern matches any row that contains a digit.

df[~df['Best Movie'].str.contains('|'.join(str(i) for i in range(10)))]
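
Note that this pattern flags a row if a digit appears anywhere in it, so a title with a number in it is removed as well. A small sketch on hypothetical data that shows this, alongside the shorter character class `[0-9]`, which selects the same rows:

```python
import pandas as pd

# Hypothetical sample; "Movie: 2012" is a title that contains digits.
df = pd.DataFrame(
    {"Best Movie": ["Movie: Orphan", "2.", "Movie: 2012", "3."]}
)

# Build "0|1|...|9" as in the answer above.
joined = "|".join(str(i) for i in range(10))
kept_joined = df[~df["Best Movie"].str.contains(joined)]

# The character class [0-9] matches the same rows.
kept_class = df[~df["Best Movie"].str.contains(r"[0-9]")]

# "Movie: 2012" is dropped along with the number-only rows.
print(kept_joined)
```

If titles can contain digits, an anchored pattern that matches only number-only rows is the safer choice.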

answered Jul 29, 2022 at 18:29
