My data looks like this:

    900324492   900405679   900472531
1   2017-04-03 08:04:09 2017-04-03 07:49:53 2017-04-03 07:52:39
2   2017-04-03 08:05:36 2017-04-03 07:54:36 2017-04-03 07:52:19
3   2017-04-03 08:05:28 2017-04-03 07:43:00 2017-04-03 07:50:52
4   2017-04-03 08:06:05 2017-04-03 07:49:42 2017-04-03 07:53:55

So each column holds a set of timestamps (datetime objects, to be exact). I'd like to make a scatter plot where x is the DataFrame index or row number (i.e. x = [1, 2, 3, 4, ...]) and y is the time point. For example, if df has 4 rows and 10 columns, the x axis should show 1, 2, 3, 4, and at x = 1 there should be one point per entry in the first row (10 points).
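A minimal sketch of the data layout described above, assuming the CSV parses into a DataFrame whose index is the row number. Here the frame is rebuilt from the four rows and three columns shown rather than read from test.csv. Flattening it gives exactly the x/y pairs the plot needs: each row number repeated once per column, paired with that row's timestamps.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above (assumption: stands in for test.csv).
raw = {
    "900324492": ["2017-04-03 08:04:09", "2017-04-03 08:05:36",
                  "2017-04-03 08:05:28", "2017-04-03 08:06:05"],
    "900405679": ["2017-04-03 07:49:53", "2017-04-03 07:54:36",
                  "2017-04-03 07:43:00", "2017-04-03 07:49:42"],
    "900472531": ["2017-04-03 07:52:39", "2017-04-03 07:52:19",
                  "2017-04-03 07:50:52", "2017-04-03 07:53:55"],
}
df = pd.DataFrame(raw, index=[1, 2, 3, 4]).apply(pd.to_datetime)

# Row-major flattening: row 1's three timestamps, then row 2's, and so on.
y = df.to_numpy().ravel()
# Matching x values: each row number repeated once per column.
x = np.repeat(df.index.to_numpy(), df.shape[1])
```

With 4 rows and 3 columns this yields 12 points, three stacked above each of x = 1, 2, 3, 4.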

It seemed like a simple task, but I'm struggling a bit. My code so far:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('test.csv')
df2 = df.apply(lambda x: pd.to_datetime(x))

fig = plt.figure()
ax = fig.add_subplot(111)
y = df2.iloc[:, 1]
x = df2.index.values
# returns nonsense
ax.plot(x, y)
# TypeError: invalid type promotion
ax.scatter(x=x, y=df2.iloc[:, 1])
# TypeError: Empty 'DataFrame': no numeric data to plot
df2.iloc[:, 1].plot()
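For comparison, a sketch of the plot the question asks for, assuming a reasonably recent matplotlib whose date converter accepts datetime64 values directly (so no manual conversion step is needed). It scatters every column against the row index and formats the y ticks as clock times. The two-column frame is a hypothetical stand-in for test.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for test.csv: first two rows and columns of the table above.
df2 = pd.DataFrame(
    {"900324492": ["2017-04-03 08:04:09", "2017-04-03 08:05:36"],
     "900405679": ["2017-04-03 07:49:53", "2017-04-03 07:54:36"]},
    index=[1, 2],
).apply(pd.to_datetime)

fig, ax = plt.subplots()
for col in df2.columns:
    # matplotlib's date converter handles datetime64 y values directly.
    ax.scatter(df2.index, df2[col], label=col, s=10)

# Show y ticks as HH:MM:SS instead of the converter's default labels.
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
ax.set_xticks(df2.index)
ax.set_xlabel("row")
ax.set_ylabel("time")
ax.legend()
# In an interactive session, call plt.show() here.
```

Looping over columns keeps one scatter call per column, which also gives each column its own color and legend entry.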

Test file: test.csv

Answer

Here is an example adapted from yours. The key pieces are to_pydatetime(), date2num(), and np.nan. (You still need to format the y axis as dates at the end.)
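To see what those three pieces do on their own, here is a small round trip with one timestamp from the question's data. The helper `to_num` is a hypothetical, narrowed-down variant of the answer's `fix()`: it catches only `AttributeError` (raised when a value has no `to_pydatetime()`, e.g. a plain string), which is an assumption about what the non-date values look like.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as dates

ts = pd.Timestamp("2017-04-03 08:04:09")

# Timestamp -> datetime.datetime -> float number of days since matplotlib's epoch.
num = dates.date2num(ts.to_pydatetime())

# num2date() reverses the conversion; the result is a timezone-aware datetime (UTC).
back = dates.num2date(num)

def to_num(value):
    """Hypothetical helper: date2num float for a Timestamp, NaN otherwise."""
    try:
        return dates.date2num(value.to_pydatetime())
    except AttributeError:
        # e.g. a stray string such as a column header that was read as data
        return np.nan

values = [to_num(v) for v in [ts, "900324492"]]
```

NaN y values are simply not drawn by scatter(), which is why falling back to np.nan is a convenient way to skip unparseable cells.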

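import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates


df = pd.read_csv('test.csv', header=None)
# errors='coerce' turns cells that are not dates into NaT instead of raising
df2 = df.apply(lambda x: pd.to_datetime(x, errors='coerce'))

fig = plt.figure()
ax = fig.add_subplot(111)
y = df2.iloc[:, 1]  # .ix has been removed from pandas; .iloc selects by position
x = df2.index.values

def fix(x):
    # Convert one Timestamp to matplotlib's float date representation;
    # anything that cannot be converted becomes NaN and is not drawn.
    try:
        return dates.date2num(x.to_pydatetime())
    except Exception:
        return np.nan

y_lab = [str(e) for e in y]
y_ = [fix(e) for e in y]

ax.scatter(x=x, y=y_)

# One y tick per point, labelled with the original timestamp text.
plt.yticks(y_, y_lab, rotation=0, size=6)
plt.show()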

Comments

Thanks! But look at the y axis: it's hard to interpret, which somewhat defeats the purpose. Is there a way to make it show times instead of decimals?
You can use the num2date() function for the y-axis labels.
I'll post an example in about 3 hours (AFK and on mobile right now), so please hold on.
Updated, sorry for the delay.
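Following the num2date() suggestion in the comments, here is a sketch of one way to get readable times on the y axis without placing one tick per point: keep the answer's date2num() floats as y values and install a tick formatter that converts each tick back with num2date(). `FuncFormatter`, the helper name `to_clock`, and the `%H:%M:%S` format are choices made here, not part of the original answer; `matplotlib.dates.DateFormatter` performs the same num2date conversion internally.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as dates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# y values as floats, the way the answer produces them with date2num().
times = [dt.datetime(2017, 4, 3, 8, 4, 9),
         dt.datetime(2017, 4, 3, 7, 49, 53),
         dt.datetime(2017, 4, 3, 7, 52, 39)]
y_ = dates.date2num(times)
x = [1, 2, 3]

def to_clock(value, pos=None):
    """Hypothetical helper: turn a date2num float into an HH:MM:SS label."""
    return dates.num2date(value).strftime("%H:%M:%S")

fig, ax = plt.subplots()
ax.scatter(x, y_)
# matplotlib still chooses the tick positions; only the label text changes.
ax.yaxis.set_major_formatter(FuncFormatter(to_clock))
```

Because tick placement is left to matplotlib, the labels stay legible even when many points share similar times.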
