1

I'm trying to remove rows of data that I don't need after importing from files and concatenating my list of dataframes. Here is what my current DataFrame looks like:

                            Best Movie
0                        Movie: Orphan
1                                   2.
2                        Movie: Avatar
3                                   3.
4          Movie: Inglourious Basterds
...                                ...
2371  Movie: The Deep End of the Ocean
2372                               49.
2373         Movie: Drop Dead Gorgeous
2374                               50.
2375                         Movie: Go

I need to remove all rows that contain only a number, so that the result looks like this:

                            Best Movie
0                        Movie: Orphan
2                        Movie: Avatar
4          Movie: Inglourious Basterds
...                                ...
2371  Movie: The Deep End of the Ocean
2373         Movie: Drop Dead Gorgeous
2375                         Movie: Go

Thank you for your help!

3
  • df[~df['Best Movie'].str.endswith('.')] ? Try that. Commented Jul 29, 2022 at 18:25
  • stackoverflow.com/questions/48996822/… check this. Commented Jul 29, 2022 at 18:26
  • Wouldn't keeping only the even-indexed rows be a better way? Commented Jul 29, 2022 at 19:30

4 Answers

2

One solution is to use str.match:

mask = ~df["Best Movie"].str.match(r"^\s*\d+\.$")
res = df[mask]
print(res)

Output

                         Best Movie
0                     Movie: Orphan
2                     Movie: Avatar
4       Movie: Inglourious Basterds
5  Movie: The Deep End of the Ocean
7         Movie: Drop Dead Gorgeous
9                         Movie: Go

UPDATE

To remove the "Movie:" prefix and reset the index, do:

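# drop=True discards the old index instead of adding it as an "index" column
res = df[mask].reset_index(drop=True)
# the trailing \s* also removes the space that follows "Movie:"
res = res["Best Movie"].str.replace(r"^\s*Movie:\s*", "", regex=True)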
print(res)

Output

0                        Orphan
1                        Avatar
2          Inglourious Basterds
3     The Deep End of the Ocean
4            Drop Dead Gorgeous
5                            Go
Name: Best Movie, dtype: object
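
For reference, the filter and the update above can be combined into one self-contained sketch. The six-row frame below is made up for illustration (modeled on the question's data), and the names `is_rank` and `titles` are hypothetical. Using `drop=True` and a trailing `\s*` in the pattern are assumptions about what is wanted: a clean 0..n-1 index and no leading space on the titles.

```python
import pandas as pd

# Hypothetical sample modeled on the question's data
df = pd.DataFrame(
    {"Best Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3.", "Movie: Go", "50."]}
)

# True for rows that consist only of digits followed by a dot
is_rank = df["Best Movie"].str.match(r"^\s*\d+\.$")

titles = (
    df.loc[~is_rank, "Best Movie"]                  # keep the non-number rows
    .str.replace(r"^\s*Movie:\s*", "", regex=True)  # strip the "Movie: " prefix
    .reset_index(drop=True)                         # renumber rows from 0
)
print(titles.tolist())  # ['Orphan', 'Avatar', 'Go']
```

Here `titles` is a Series; call `titles.to_frame()` if a one-column DataFrame is needed.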

2 Comments

Thank you very much, Dani. It worked beautifully. Now I just need to reset the index so that the row numbering matches the actual number of rows, and also remove the "Movie: " prefix from each row.
@Faisal See the update
1

You can do:

df.loc[~df['Best Movie'].str.match(r'^\d+\.$')]
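
A small sketch of why the dot in this pattern is worth escaping: an unescaped `.` matches any character, so a purely numeric string such as "12" would also be treated as a number row. The sample Series and the names `loose` and `strict` are hypothetical. In the question's data every title starts with "Movie:", so both patterns would likely give the same result there.

```python
import pandas as pd

# Hypothetical values; "12" stands in for an entry without a trailing dot
s = pd.Series(["2.", "49.", "12", "Movie: Go"])

loose = s.str.match(r"^\d+.$")    # "." matches any single character
strict = s.str.match(r"^\d+\.$")  # "\." matches only a literal dot

print(loose.tolist())   # [True, True, True, False]
print(strict.tolist())  # [True, True, False, False]
```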


0

Sample input

import pandas as pd

df = pd.DataFrame({
    "Best_Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]
})

Apply pd.to_numeric with errors='coerce'. Rows that contain only a number are converted to float; all other rows become NaN.

df["nums"] = pd.to_numeric(df['Best_Movie'], errors='coerce')

Then select the rows that contain text (i.e., the rows marked as NaN):

df.loc[df.nums.isnull(), "Best_Movie"]

Sample output

0    Movie: Orphan
2    Movie: Avatar
Name: Best_Movie, dtype: object
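
The same idea also works without the helper column. The sketch below is one possible variant, not part of the original answer: it builds a boolean mask directly (the names `is_text` and `res` are hypothetical) and keeps the result as a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Best_Movie": ["Movie: Orphan", "2.", "Movie: Avatar", "3."]})

# Strings that cannot be parsed as numbers become NaN under errors="coerce",
# so isna() is True exactly for the text rows
is_text = pd.to_numeric(df["Best_Movie"], errors="coerce").isna()

res = df[is_text]
print(res)
```

Note that this treats any parseable number as a row to drop, so a title that is itself just a number would be removed as well.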


0

Try the following. In a regular expression, '|' means "or", so the joined pattern matches any row that contains a digit from 0 to 9:

df[~df['Best Movie'].str.contains('|'.join(str(i) for i in range(10)))] 
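
One caveat with matching any digit: titles that contain a number are dropped too. The sketch below illustrates this with a made-up frame ("Movie: 2012" is a hypothetical entry) and compares it with a stricter check using `str.fullmatch`, which assumes pandas 1.1 or later. The names `any_digit` and `only_rank` are illustrative.

```python
import pandas as pd

# "Movie: 2012" is a hypothetical title that contains digits
df = pd.DataFrame({"Best Movie": ["Movie: Orphan", "2.", "Movie: 2012", "3."]})

any_digit = df["Best Movie"].str.contains(r"\d")         # a digit anywhere in the string
only_rank = df["Best Movie"].str.fullmatch(r"\s*\d+\.")  # the whole string is "<digits>."

print(df[~any_digit]["Best Movie"].tolist())  # ['Movie: Orphan']
print(df[~only_rank]["Best Movie"].tolist())  # ['Movie: Orphan', 'Movie: 2012']
```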

