AttributeError: 'numpy.int64' object has no attribute 'to_timestamp'

Question

I am trying to plot a time series from a pandas DataFrame. The code is below.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import datetime
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, YearLocator, MonthLocator
plt.style.use('ggplot')


def plot(df, filename, heading=None):

    fig, ax = plt.subplots(figsize=(8, 4))

    min_date = None
    max_date = None
    for col_name in df.columns.values:

        # plot the column
        col = df[col_name]
        col = col[col.notnull()] # drop NAs
        dates = [zzz.to_timestamp().date() for zzz in col.index]
        ax.plot_date(x=dates, y=col, fmt='-', label=col_name,
            tz=None, xdate=True, ydate=False, linewidth=1.5)

        # establish the date range for the data
        if min_date:
            min_date = min(min_date, min(dates))
        else:
            min_date = min(dates)
        if max_date:
            max_date = max(max_date, max(dates))
        else:
            max_date = max(dates)

    # give a bit of space at each end of the plot - aesthetics
    span = max_date - min_date
    extra = int(span.days * 0.03) * datetime.timedelta(days=1)
    ax.set_xlim([min_date - extra, max_date + extra])

    # format the x tick marks
    ax.xaxis.set_major_formatter(DateFormatter('%Y'))
    ax.xaxis.set_minor_formatter(DateFormatter('\n%b'))
    ax.xaxis.set_major_locator(YearLocator())
    ax.xaxis.set_minor_locator(MonthLocator(bymonthday=1, interval=2))

    # grid, legend and yLabel
    ax.grid(True)
    ax.legend(loc='best', prop={'size':'x-small'})
    ax.set_ylabel('Percent')

    # heading
    if heading:
        fig.suptitle(heading, fontsize=12)
    fig.tight_layout(pad=1.5)

    # footnote
    fig.text(0.99, 0.01, 'nse-timeseries-plot', ha='right',
        va='bottom', fontsize=8, color='#999999')

    # save to file
    fig.savefig(filename, dpi=125)


url = "https://www.google.com/finance/historical?cid=207437&startdate=Jan%201%2C%201971&enddate=Jul%201%2C%202017&start={0}&num=30"
how_many_pages=138
start=0

for i in range(how_many_pages):
    new_url = url.format(start)
    page = requests.get(new_url)
    soup = BeautifulSoup(page.content, "lxml")
    table = soup.find_all('table', class_='gf-table historical_price')[0]

    columns_header = [th.getText() for th in table.findAll('tr')[0].findAll('th')]
    data_rows=table.findAll('tr')[1:]
    data=[[td.getText() for td in data_rows[i].findAll(['td'])] for i in range(len(data_rows))]

    if start == 0:
        final_df = pd.DataFrame(data, columns=columns_header)
    else:
        df = pd.DataFrame(data, columns=columns_header)
        final_df = pd.concat([final_df, df],axis=0)
    start += 30
    final_df.to_csv('nse_data.csv', sep='\t', encoding='utf-8')


plot(final_df,'nsetsplot')

When I run the code, the following line raises AttributeError: 'numpy.int64' object has no attribute 'to_timestamp':

dates = [zzz.to_timestamp().date() for zzz in col.index]

I am using 64-bit Anaconda on Windows 7 (x86_64).

Which library supports to_timestamp()? numpy, scipy, pandas?
– Brian Cain, Commented Jul 3, 2017 at 1:15
I found to_timedelta() and to_datetime() in pandas but no to_timestamp(). Maybe you're just calling the wrong method or from the wrong object/scope.
– Brian Cain, Commented Jul 3, 2017 at 1:29
@BrianCain - I got the time series plotting code from http://markthegraph.blogspot.com.au/2015/05/plotting-time-series-dataframes-in.html
– liv2hak, Commented Jul 3, 2017 at 1:34

Warren Weckesser · Accepted Answer · 2017-07-03 05:43:29Z


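Apparently the index of your DataFrame is not a pandas.PeriodIndex. Instead, the index appears to hold integers. The code that you posted requires the index of the data frame to be a PeriodIndex, because to_timestamp() is a method of pandas.Period (the element type of a PeriodIndex), not of numpy.int64. For example: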

In [36]: df
Out[36]: 
                a         b
2012-01  1.457900  7.084201
2012-02  1.775861  6.448277
2012-03  1.069051  7.861898

In [37]: df.index
Out[37]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

When the index is the correct type, the following code (similar to the line in the code you posted) works:

In [39]: dates = [zzz.to_timestamp().date() for zzz in df.index]

In [40]: dates
Out[40]: 
[datetime.date(2012, 1, 1),
 datetime.date(2012, 2, 1),
 datetime.date(2012, 3, 1)]
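
If the index is a plain integer range, as it would be for a DataFrame built from scraped table rows, one way to get a PeriodIndex is to parse the date column, make it the index, and convert it with to_period(). This is only a minimal sketch: it assumes the scraped table has a Date column formatted like "Jun 30, 2017" and a Close column with thousands separators, and the sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: every cell is text and the
# index is the default integer RangeIndex, which is what triggers the error.
raw = pd.DataFrame(
    {
        "Date": ["Jun 30, 2017", "Jun 29, 2017", "Jun 28, 2017"],
        "Close": ["9,520.90", "9,504.10", "9,491.25"],
    }
)

# Parse the date strings (assumed format) and use them as the index.
raw["Date"] = pd.to_datetime(raw["Date"].str.strip(), format="%b %d, %Y")
prices = raw.set_index("Date").sort_index()

# Turn the DatetimeIndex into a daily PeriodIndex, which the plotting code expects.
prices.index = prices.index.to_period("D")

# Convert the text cells to numbers so they can be plotted.
prices["Close"] = pd.to_numeric(prices["Close"].str.replace(",", "", regex=False))

# The line from the question now works on this index.
dates = [zzz.to_timestamp().date() for zzz in prices.index]
```

With the real data, the same steps would be applied to final_df before calling plot(), using whatever column names the scraped header row actually contains.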


Kadir A. Peker · Accepted Answer · 2018-07-18 07:49:32Z


This may be due to an Excel formatting issue if you imported your DataFrame from Excel. I had a similar problem: the dates looked fine in Excel but appeared as integers (Excel's internal serial-number representation of a date) in the imported DataFrame. This solved the problem for me: I selected the whole column of dates in Excel and applied a date format to it. When I imported the sheet as a DataFrame after that, the dates came out as dates.
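
If reformatting the spreadsheet is not convenient, the integers could probably also be converted on the pandas side. The sketch below assumes the integers are serial day numbers from Excel's default 1900 date system, for which 1899-12-30 is the usual origin for dates from March 1900 onwards. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical column of Excel serial day numbers, as they might appear
# after importing a sheet whose date cells were not formatted as dates.
df = pd.DataFrame({"date": [43101, 43102, 43103], "value": [1.0, 2.0, 3.0]})

# Assumption: Excel 1900 date system, so day counts are taken from 1899-12-30.
df["date"] = pd.to_datetime(df["date"], unit="D", origin="1899-12-30")

# Use the parsed dates as a daily PeriodIndex so to_timestamp() is available.
df = df.set_index("date")
df.index = df.index.to_period("D")

dates = [p.to_timestamp().date() for p in df.index]
```

Workbooks saved with the 1904 date system would need a different origin, so it is worth checking a known date before relying on the result.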
