
I have created a class with two methods, NRG_load and NRG_flat. The first loads a CSV, converts it into a DataFrame and applies some filtering; the second takes this DataFrame, adds two columns, and then melts it into a flat (long) format.

I am trying out these methods with the following code:

nrg105 = eNRG.NRG_load('nrg_105a.tsv')
nrg105_flat = eNRG.NRG_flat(nrg105, '105')

where eNRG is the class, and the second argument '105' is used by an if statement within the method to create the aforementioned columns.

The behaviour I cannot explain is that the second line - the one with the NRG_flat method - changes the nrg105 values.

Note that if I only run the NRG_load method, I get the expected DataFrame.

What am I missing? I have used this kind of syntax before without problems, so I don't know where to look.

Thank you in advance for all of your suggestions.

EDIT: as requested, here is the class' code:

# -*- coding: utf-8 -*-
"""
Created on Tue Apr 16 15:22:21 2019

@author: CAPIZZI Filippo Antonio
"""

import pandas as pd
from FixFilename import FixFilename as ff
from SplitColumn import SplitColumn as sc
from datetime import datetime as ddt


class EurostatNRG:
    # This class includes the modules needed to load and filter
    # the Eurostat NRG files

    # Default countries' lists to be used by the functions
    COUNTRIES = [
        'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
        'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
        'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
        'TR', 'UA', 'UK', 'XK'
    ]

    # Default years of analysis
    YEARS = list(range(2005, int(ddt.now().year) - 1))

    # NOTE: the 'datetime' library will call the current year, but since
    # the code is using the 'range' function, the end years will be always
    # current-1 (e.g. if we are in 2019, 'current year' will be 2018).
    # Thus, I have added "-1" because the end year is t-2.

    INDIC_PROD = pd.read_excel(
        './Datasets/VITO/map_nrg.xlsx',
        sheet_name=[
            'nrg105a_indic', 'nrg105a_prod', 'nrg110a_indic', 'nrg110a_prod',
            'nrg110'
        ],
        convert_float=True)

    def NRG_load(dataset, countries=COUNTRIES, years=YEARS, unit='ktoe'):
        # This module will load and refine the NRG dataset,
        # preparing it to be filtered

        # Fix eventual flags
        dataset = ff.fix_flags(dataset)

        # Load the dataset into a DataFrame
        df = pd.read_csv(
            dataset,
            delimiter='\t',
            encoding='utf-8',
            na_values=[':', ': ', ' :'],
            decimal='.')

        # Clean up spaces from the column names
        df.columns = df.columns.str.strip()

        # Removes the mentioned column because it's not needed
        if 'Flag and Footnotes' in df.columns:
            df.drop(columns=['Flag and Footnotes'], inplace=True)

        # Split the first column into separate columns
        df = sc.nrg_split_column(df)

        # Rename the columns
        df.rename(
            columns={
                'country': 'COUNTRY',
                'fuel_code': 'KEY_PRODUCT',
                'nrg_code': 'KEY_INDICATOR',
                'unit': 'UNIT'
            },
            inplace=True)

        # Filter the dataset
        df = EurostatNRG.NRG_filter(
            df, countries=countries, years=years, unit=unit)

        return df

    def NRG_filter(df, countries, years, unit):
        # This module will filter the input DataFrame 'df'
        # showing only the 'countries', 'years' and 'unit' selected

        # First, all of the units not of interest are removed
        df.drop(df[df.UNIT != unit.upper()].index, inplace=True)

        # Then, all of the countries not of interest are filtered out
        df.drop(df[~df['COUNTRY'].isin(countries)].index, inplace=True)

        # Finally, all of the years not of interest are removed,
        # and the columns are rearranged according to the desired output
        main_cols = ['KEY_INDICATOR', 'KEY_PRODUCT', 'UNIT', 'COUNTRY']
        cols = main_cols + [str(y) for y in years if y not in main_cols]
        df = df.reindex(columns=cols)

        return df

    def NRG_flat(df, name):
        # This module prepares the DataFrame to be flattened,
        # then it gives it as output

        # Assign the indicators and products' names
        if '105' in name:  # 'name' is the name of the dataset
            # Creating the 'INDICATOR' column
            indic_dic = dict(
                zip(EurostatNRG.INDIC_PROD['nrg105a_indic'].KEY_INDICATOR,
                    EurostatNRG.INDIC_PROD['nrg105a_indic'].INDICATOR))
            df['INDICATOR'] = df['KEY_INDICATOR'].map(indic_dic)
            # Creating the 'PRODUCT' column
            prod_dic = dict(
                zip(
                    EurostatNRG.INDIC_PROD['nrg105a_prod'].KEY_PRODUCT.astype(
                        str), EurostatNRG.INDIC_PROD['nrg105a_prod'].PRODUCT))
            df['PRODUCT'] = df['KEY_PRODUCT'].map(prod_dic)
        elif '110' in name:
            # Creating the 'INDICATOR' column
            indic_dic = dict(
                zip(EurostatNRG.INDIC_PROD['nrg110a_indic'].KEY_INDICATOR,
                    EurostatNRG.INDIC_PROD['nrg110a_indic'].INDICATOR))
            df['INDICATOR'] = df['KEY_INDICATOR'].map(indic_dic)
            # Creating the 'PRODUCT' column
            prod_dic = dict(
                zip(
                    EurostatNRG.INDIC_PROD['nrg110a_prod'].KEY_PRODUCT.astype(
                        str), EurostatNRG.INDIC_PROD['nrg110a_prod'].PRODUCT))
            df['PRODUCT'] = df['KEY_PRODUCT'].map(prod_dic)

        # Delete the columns 'KEY_INDICATOR' and 'KEY_PRODUCT', and
        # rearrange the columns in the desired order
        df.drop(columns=['KEY_INDICATOR', 'KEY_PRODUCT'], inplace=True)
        main_cols = ['INDICATOR', 'PRODUCT', 'UNIT', 'COUNTRY']
        year_cols = [y for y in df.columns if y not in main_cols]
        cols = main_cols + year_cols
        df = df.reindex(columns=cols)

        # Pivot the DataFrame to have it in flat format
        df = df.melt(
            id_vars=df.columns[:4], var_name='YEAR', value_name='VALUE')

        # Convert the 'VALUE' column into float numbers
        df['VALUE'] = pd.to_numeric(df['VALUE'], downcast='float')

        # Drop rows that have no indicators (it means they are not in
        # the Excel file with the products of interest)
        df.dropna(subset=['INDICATOR', 'PRODUCT'], inplace=True)

        return df

EDIT 2: if this could help, this is the error I receive when using the EurostatNRG class in IPython:

[autoreload of EurostatNRG failed: Traceback (most recent call last):
  File "C:\Users\CAPIZZIF\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\extensions\autoreload.py", line 244, in check
    superreload(m, reload, self.old_objects)
  File "C:\Users\CAPIZZIF\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\extensions\autoreload.py", line 394, in superreload
    update_generic(old_obj, new_obj)
  File "C:\Users\CAPIZZIF\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\extensions\autoreload.py", line 331, in update_generic
    update(a, b)
  File "C:\Users\CAPIZZIF\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\extensions\autoreload.py", line 279, in update_class
    if (old_obj == new_obj) is True:
  File "C:\Users\CAPIZZIF\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
]

  • We will need to see the class's code. Commented Apr 17, 2019 at 11:49
  • As @brunns mentioned, we need to check those methods. Probably the NRG_flat method is changing the first parameter in-place Commented Apr 17, 2019 at 12:02
  • try just doing nrg105_flat = eNRG.NRG_flat(nrg105.copy(), '105'). As others have said, you are probably changing the parameter inplace Commented Apr 17, 2019 at 12:10
  • Thank you all for your replies, I have added the class code in the original post. Commented Apr 17, 2019 at 14:14
  • Your class definition uses all class level variables. You even call the methods explicitly through the class. You need to use instance variables, and properly define your methods to accept self as the first parameter. Frankly, this is all wrong. You should read the documentation on classes: docs.python.org/3/tutorial/classes.html Commented Apr 17, 2019 at 17:28
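
The `.copy()` workaround suggested in the comments can be checked on a small stand-in. The function `add_column` and the sample data below are hypothetical, not the asker's code; this is only a sketch of how passing a copy shields the caller's DataFrame from item assignment inside a function.

```python
import pandas as pd


def add_column(df):
    # Hypothetical stand-in for NRG_flat: item assignment writes into
    # the very object the caller passed in.
    df['INDICATOR'] = df['KEY_INDICATOR'].str.lower()
    return df


original = pd.DataFrame({'KEY_INDICATOR': ['B_100900', 'B_101000']})

# Passing a copy: the function mutates the copy, not 'original'.
result = add_column(original.copy())
columns_after_copy_call = list(original.columns)

# Passing the object itself: 'original' gains the new column too.
add_column(original)
columns_after_direct_call = list(original.columns)
```

The first call leaves `original` with its single column, while the second adds `INDICATOR` to it, which matches the symptom described in the question.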

1 Answer

I managed to find the culprit.

In the NRG_flat method, the lines:

df['INDICATOR'] = df['KEY_INDICATOR'].map(indic_dic)
...
df['PRODUCT'] = df['KEY_PRODUCT'].map(prod_dic)

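modify the DataFrame that was passed in, because the method works on the caller's object rather than on a copy. I therefore replaced them with the pandas assign method, which returns a new DataFrame: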

df = df.assign(INDICATOR=df.KEY_INDICATOR.map(indic_dic))
...
df = df.assign(PRODUCT=df.KEY_PRODUCT.map(prod_dic))

I no longer get the error.
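
A small sketch of why this change helps, using made-up data and a hypothetical mapping rather than the real Eurostat files. `assign` returns a new DataFrame, so rebinding `df` inside the function leaves the caller's object alone, and the later `drop(..., inplace=True)` then acts on that new object only.

```python
import pandas as pd


def flat_with_assign(df, mapping):
    # assign() returns a new DataFrame; 'df' now names that new object.
    df = df.assign(INDICATOR=df['KEY_INDICATOR'].map(mapping))
    # The in-place drop now affects only the new object, not the caller's.
    df.drop(columns=['KEY_INDICATOR'], inplace=True)
    return df


# Illustrative input; column names follow the question, values are invented.
source = pd.DataFrame({'KEY_INDICATOR': ['A', 'B'], '2017': [1.0, 2.0]})
flat = flat_with_assign(source, {'A': 'Alpha', 'B': 'Beta'})

source_columns = list(source.columns)
flat_columns = list(flat.columns)
```

Here `source` keeps its `KEY_INDICATOR` column after the call, whereas the version with `df['INDICATOR'] = ...` followed by an in-place drop would have changed it.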

Thank you for replying!
