I want to get the monthly sum with my script, but I always get an AttributeError that I don't
understand. The column Timestamp does exist in my combined_csv.
I know for sure that the last line (the monthlySum assignment) is causing the problem, since I tested all of my other code before.
AttributeError: 'DataFrame' object has no attribute 'Timestamp'
I'd appreciate any help I can get. Thanks.

import os
import glob
import pandas as pd

# set working directory
os.chdir("Path to CSVs")

# find all csv files in the folder
# use glob pattern matching -> extension = 'csv'
# save result in list -> all_filenames
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
# print(all_filenames)

# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f, sep=';') for f in all_filenames])
# Format CSV
# Transform Timestamp column into datetime
combined_csv['Timestamp'] = pd.to_datetime(combined_csv.Timestamp)
# Read out first entry of every day of every month
combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']))
# Daily yield = HtmDht_Energy of the current day minus HtmDht_Energy of the previous day
combined_csv["dailyYield"] = combined_csv["first"] - combined_csv["first"].shift()
# combined_csv.reset_index()
# combined_csv.index.set_names(["year", "month"], inplace=True)
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum()

Output of combined_csv.columns

Index(['Timestamp', 'teHst0101', 'teHst0102', 'teHst0103', 'teHst0104',
       'teHst0105', 'teHst0106', 'teHst0107', 'teHst0201', 'teHst0202',
       'teHst0203', 'teHst0204', 'teHst0301', 'teHst0302', 'teHst0303',
       'teHst0304', 'teAmb', 'teSolFloHexHst', 'teSolRetHexHst',
       'teSolCol0501', 'teSolCol1001', 'teSolCol1501', 'vfSol', 'prSolRetSuc',
       'rdGlobalColAngle', 'gSolPump01_roActual', 'gSolPump02_roActual',
       'gHstPump03_roActual', 'gHstPump04_roActual', 'gDhtPump06_roActual',
       'gMB01_isOpened', 'gMB02_isOpened', 'gCV01_posActual',
       'gCV02_posActual', 'HtmDht_Energy', 'HtmDht_Flow', 'HtmDht_Power',
       'HtmDht_Volume', 'HtmDht_teFlow', 'HtmDht_teReturn', 'HtmHst_Energy',
       'HtmHst_Flow', 'HtmHst_Power', 'HtmHst_Volume', 'HtmHst_teFlow',
       'HtmHst_teReturn', 'teSolColDes', 'teHstFloDes'],
      dtype='object')

Traceback when I select the column with bracket notation:
combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum()

Traceback (most recent call last):
  File "D:\Users\wink\PycharmProjects\csvToExcel\main.py", line 28, in <module>
    combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum()
  File "D:\Users\wink\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\Users\wink\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Timestamp'

Traceback with Mustafa's solution:

Traceback (most recent call last):
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3862, in reindexer
    value = value.reindex(self.index)._values
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\util\_decorators.py", line 312, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4176, in reindex
    return super().reindex(**kwargs)
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\generic.py", line 4811, in reindex
    return self._reindex_axes(
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4022, in _reindex_axes
    frame = frame._reindex_index(
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 4038, in _reindex_index
    new_index, indexer = self.index.reindex(
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 2492, in reindex
    target = MultiIndex.from_tuples(target)
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 175, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\indexes\multi.py", line 531, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas\_libs\lib.pyx", line 2527, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\main.py", line 28, in <module>
    combined_csv["monthlySum"] = combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum()
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3888, in _sanitize_column
    value = reindexer(value).T
  File "C:\Users\winklerm\PycharmProjects\csvToExcel\venv\lib\site-packages\pandas\core\frame.py", line 3870, in reindexer
    raise TypeError(
TypeError: incompatible index of inserted column with frame index
  • What happens if you use combined_csv['Timestamp'] instead of combined_csv.Timestamp? Commented Apr 26, 2021 at 7:06
  • After the line combined_csv = pd.concat([pd.read_csv(f, sep=';') for f in all_filenames]), can you add print(combined_csv.columns) to see which columns combined_csv has and share the output here? Maybe there is a stray whitespace issue. Commented Apr 26, 2021 at 7:06
  • Hi, I added the output of the columns to the post. Commented Apr 26, 2021 at 7:07
  • That's what I thought and tried too, with this line of code: combined_csv["monthlySum"] = combined_csv.groupby([combined_csv['Timestamp'].dt.year, combined_csv['Timestamp'].dt.month]).sum() Commented Apr 26, 2021 at 7:09
  • I get a KeyError. Commented Apr 26, 2021 at 7:10

1 Answer

This line makes the Timestamp column the index of combined_csv:

combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first']))

and therefore you get an error when you try to access .Timestamp: after resample(..., on='Timestamp'), Timestamp is the index rather than a column.

The remedy is to call reset_index. Instead of the line above, you can try this:

combined_csv = round(combined_csv.resample('D', on='Timestamp')['HtmDht_Energy'].agg(['first'])).reset_index()

This moves Timestamp from the index back into the regular columns, so you can access it again.
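
To see the effect in isolation, here is a minimal sketch on a small made-up frame. The sample values and the names df, daily, and daily_reset are assumptions for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the combined CSV files
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 12:00",
        "2021-01-02 00:00", "2021-01-02 12:00",
    ]),
    "HtmDht_Energy": [100.0, 104.0, 110.0, 115.0],
})

# resample(..., on="Timestamp") uses Timestamp as the index of the result
daily = df.resample("D", on="Timestamp")["HtmDht_Energy"].agg(["first"])
print(daily.columns.tolist())   # only the aggregated column is left
print(daily.index.name)         # Timestamp now lives in the index

# reset_index() moves Timestamp back into the regular columns
daily_reset = daily.reset_index()
print(daily_reset.columns.tolist())
```

After reset_index, both daily_reset.Timestamp and daily_reset['Timestamp'] should work again.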


Side note:
combined_csv["dailyYield"] = combined_csv["first"] - combined_csv["first"].shift()

is equivalent to

combined_csv["dailyYield"] = combined_csv["first"].diff()
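
A quick way to convince yourself of the equivalence is a small check on made-up numbers. The Series below is a hypothetical stand-in for the "first" column.

```python
import pandas as pd

# Hypothetical daily first readings of an energy counter
first = pd.Series([100.0, 110.0, 125.0, 125.0], name="first")

manual = first - first.shift()   # current value minus the previous one
built_in = first.diff()          # same operation as a single call

# Series.equals treats NaN values in the same position as equal
print(manual.equals(built_in))
```

In both versions the first entry is NaN, because the first day has no previous day to subtract.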

9 Comments

Thanks for your answer, it seems a bit clearer now, but I get another error when I do it the way you suggested. I updated my post.
@OTRAY Which pandas version are you using, by the way: print(pd.__version__)?
I'm using 1.2.3
@OTRAY Okay. With the line combined_csv.groupby([combined_csv.Timestamp.dt.year, combined_csv.Timestamp.dt.month]).sum(), you group by year and month and sum each group. Summing reduces every group to a single row, so the result has fewer rows than the original dataframe's index. If you look at the output of this line separately, you will see it has fewer elements than the dataframe's index, hence the error.
@OTRAY For example, say Timestamp is [2012-09-01, 2012-09-24, 2012-09-30, 2013-01-05, 2013-01-06] and you do that summation: you get only 2 outputs, one for September 2012 and one for January 2013. The original df has 5 entries, so the result cannot be put back as a column; hence the error.
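
The size mismatch from the example above can be reproduced with a short sketch. The dailyYield numbers are made up, and the use of transform("sum") at the end is a suggestion that does not appear in the original answer; it is one possible way to get a value per row, assuming that is what the monthlySum column is meant to hold.

```python
import pandas as pd

# The five example timestamps from the comment, with hypothetical yields
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2012-09-01", "2012-09-24", "2012-09-30", "2013-01-05", "2013-01-06"]
    ),
    "dailyYield": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Group keys, renamed so the resulting index levels have distinct names
year = df["Timestamp"].dt.year.rename("year")
month = df["Timestamp"].dt.month.rename("month")

# sum() reduces each (year, month) group to one row: 2 rows, not 5
monthly = df.groupby([year, month])["dailyYield"].sum()
print(len(monthly), len(df))

# transform("sum") broadcasts each group's sum back to its rows,
# so the result lines up with the original index and can be assigned
df["monthlySum"] = df.groupby([year, month])["dailyYield"].transform("sum")
print(df)
```

If one row per month is what you need instead, keeping monthly as its own Series (or calling reset_index on it) avoids assigning it back into the daily frame at all.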