
I have a column in a dataframe with two data types, like this:

25                3037205
26    2019-09-04 19:54:57
27    2019-09-09 17:55:45
28    2019-09-16 21:40:36
29                3037206
30    2019-09-06 14:49:41
31    2019-09-11 17:17:11
32                3037207
33    2019-09-11 17:19:04

I'm trying to slice it and build a new data frame like this:

26    3037205    2019-09-04 19:54:57
27    3037205    2019-09-09 17:55:45
28    3037205    2019-09-16 21:40:36
29    3037206    2019-09-06 14:49:41
30    3037206    2019-09-11 17:17:11
31    3037207    2019-09-11 17:19:04

I can't find how to slice between the rows that are plain numbers, not datetimes.

Some ideas?

Thx!

  • What does "I can't find how to slice between numbers 'no datetype'" mean? Is that part of an error message? Commented Dec 3, 2019 at 17:37

2 Answers


Another approach:

s = pd.to_numeric(df['col1'], errors='coerce')        # ids become numbers, dates become NaN
df.assign(val=s.ffill().astype(int)).loc[s.isnull()]  # forward-fill each id, keep only the date rows

Output:

                   col1      val
26  2019-09-04 19:54:57  3037205
27  2019-09-09 17:55:45  3037205
28  2019-09-16 21:40:36  3037205
30  2019-09-06 14:49:41  3037206
31  2019-09-11 17:17:11  3037206
33  2019-09-11 17:19:04  3037207

1 Comment

Hi, this returns an error: "ValueError: Cannot convert non-finite values (NA or inf) to integer"
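The error reported in the comment above occurs when the column starts with a datetime: `ffill()` then leaves a leading NaN, which plain `int` cannot hold. A minimal sketch (with a hypothetical `col1` frame illustrating that case) that sidesteps it by using pandas' nullable `Int64` dtype instead:

```python
import pandas as pd

# Hypothetical data where a date precedes the first id
df = pd.DataFrame({'col1': ['2019-09-01 00:00:00', 3037205,
                            '2019-09-04 19:54:57']})

s = pd.to_numeric(df['col1'], errors='coerce')
# Nullable Int64 tolerates the NaN left before the first id,
# where .astype(int) would raise the ValueError above
out = df.assign(val=s.ffill().astype('Int64')).loc[s.isnull()]
print(out)
```

Rows before the first id keep a `<NA>` in `val`; they can be dropped afterwards if unwanted.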

I'm not sure this is the most efficient way of solving the issue, but it seems to get the job done. I've added an option to rename the second column (since its name is not specified) after the `#`:

import pandas as pd
import numpy as np

data = {'dates': [3037205, '2019-09-04 19:54:57', '2019-09-09 17:55:45',
                  '2019-09-16 21:40:36', 3037206, '2019-09-06 14:49:41',
                  '2019-09-11 17:17:11', 3037207, '2019-09-11 17:19:04']}

df = pd.DataFrame(data)

# .str.isnumeric() is False for the date strings and NaN for the non-string
# ids; np.where treats NaN as truthy, so the ids land in 'mask'
df['mask'] = np.where(df['dates'].str.isnumeric(), df['dates'], np.nan)
df['mask_2'] = np.where(df['dates'].str.isnumeric(), np.nan, df['dates'])
df['mask'] = df['mask'].ffill()  # propagate each id down to its dates
df = df.dropna(subset=['mask_2']).drop(columns=['mask_2'])#.rename(columns={'mask':'desired_name'})
print(df)

Output:

                 dates     mask
1  2019-09-04 19:54:57  3037205
2  2019-09-09 17:55:45  3037205
3  2019-09-16 21:40:36  3037205
5  2019-09-06 14:49:41  3037206
6  2019-09-11 17:17:11  3037206
8  2019-09-11 17:19:04  3037207

7 Comments

df.dropna(how='any') is pretty dangerous given that the data only represents one column of the OP's frame.
It is indeed. But the structure of the data doesn't seem to suffer from this issue, since there is an id-like value followed by a certain number of dates, and so on... The major problem would arise if the id value contained strings, which would leave NaN values in the mask column.
I would say the major issue is not the dropna() but the np.where(), since the former is based on the latter. I'd like your opinion on this take, since I'm a beginner and these kinds of discussions are really useful to me.
No, as presented, your solution would work just fine. However, I'm talking about NaN in other columns not mentioned in this post. With df.dropna(how='any') you might drop rows because of those NaN, even if this column holds actual datetimes.
You are correct, I'll add subset to the dropna. I assumed the information OP gave is the full dataframe. Thanks for your feedback. Differences like these between a seasoned coder and a beginner are what I really enjoy learning from!
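The point raised in this comment thread can be seen with a tiny hypothetical frame that has a NaN in an unrelated column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an extra column with its own NaN, unrelated to 'dates'
df = pd.DataFrame({'dates': ['2019-09-04 19:54:57', '2019-09-09 17:55:45'],
                   'other': [np.nan, 'x']})

# how='any' drops row 0 because of the NaN in 'other'...
assert len(df.dropna(how='any')) == 1
# ...while subset= only considers the column we actually care about
assert len(df.dropna(subset=['dates'])) == 2
```

Restricting `dropna` with `subset=` keeps the cleanup local to the column being reshaped, which is why the answer above was edited to use it.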
