I am building an algorithm and want my Python script to work without having to call pandas' read_csv function again and again.

The following is the code I am using.

import pandas as pd
from itertools import chain, combinations

start_date = '2016-06-01'
end_date = '2017-09-22'

#Pool of symbols that I want to use
usesymbols = ['GLAXO', 'AVN']

#Function to build a dataframe 
def data(symbols):
    dates=pd.date_range(start_date,end_date) 
    df=pd.DataFrame(index=dates)
    for symbol in symbols:
        df_temp=pd.read_csv('/home/furqan/Desktop/python_data/{}.csv'.format(str(symbol)),usecols=['Date','Close'],
                            parse_dates=True,index_col='Date',na_values=['nan'])
        df_temp = df_temp.rename(columns={'Close': symbol})
        df=df.join(df_temp)
        # fillna(method=...) is deprecated in recent pandas; ffill()/bfill() do the same fill
        df=df.ffill()
        df=df.bfill()
    return df

#Function to build powerset from list of "usesymbols"
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)+1))

power_set = list(powerset(usesymbols))
dataframe = data(usesymbols)
print(dataframe)
for j in range(0, len(power_set)):
    pass  # the temporary dataframe for power_set[j] should be built here

Using usesymbols, I first generated a power set, which looks as follows:

[('GLAXO',), ('AVN',), ('GLAXO', 'AVN')]

Then I created a dataframe, which looks as follows:

             GLAXO    AVN
2016-06-01  205.93  31.42
2016-06-02  206.22  32.62
2016-06-03  207.86  31.65
2016-06-04  207.86  31.65
2016-06-05  207.86  31.65

After that I added a loop. Inside that loop I want to create a temporary dataframe such that when j = 0 it consists of one column, 'GLAXO'; when j = 1 it consists of one column, 'AVN'; and finally when j = 2 it comprises both columns, 'GLAXO' and 'AVN'.

I am having difficulty building that temporary dataframe. A second option is to call the data function for each subset, but that would end up calling pandas' read_csv every time.

1 Answer

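# df is the dataframe built once by data(usesymbols)
# Note: ('GLAXO') and ('AVN') are plain strings, not 1-tuples, so they select a Series
powerset = [('GLAXO'), ('AVN'), ('GLAXO', 'AVN')]
j = 1
print(df.loc[:,powerset[j]])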

2016-06-01    31.42
2016-06-02    32.62
2016-06-03    31.65
2016-06-04    31.65
2016-06-05    31.65
Name: AVN, dtype: float64
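
Note that the j = 1 result above is a Series rather than a one-column dataframe, because ('AVN') is just the string 'AVN'. If a dataframe is wanted in every case, a list of labels can be passed instead. A minimal sketch of the difference, using a small in-memory frame built from the first sample rows shown in the question (the names as_series and as_frame are illustrative, not from the original post):

```python
import pandas as pd

# First three sample rows copied from the question's output; no CSV is read here.
df = pd.DataFrame(
    {"GLAXO": [205.93, 206.22, 207.86], "AVN": [31.42, 32.62, 31.65]},
    index=pd.date_range("2016-06-01", periods=3),
)

as_series = df.loc[:, "AVN"]     # single string label -> Series
as_frame = df.loc[:, ["AVN"]]    # list of labels -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```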

j=2
print(df.loc[:,powerset[j]])

             GLAXO    AVN
2016-06-01  205.93  31.42
2016-06-02  206.22  32.62
2016-06-03  207.86  31.65
2016-06-04  207.86  31.65
2016-06-05  207.86  31.65
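
Applied to the loop from the question, the same idea might look like the sketch below: the dataframe is built once, and each subset of the power set is then sliced out of it by column, so read_csv is never called inside the loop. This assumes the full frame already exists; the in-memory sample data stands in for data(usesymbols), and the name temp_frames is illustrative.

```python
from itertools import chain, combinations

import pandas as pd


def powerset(iterable):
    """Return every non-empty subset of iterable as a tuple."""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))


usesymbols = ["GLAXO", "AVN"]

# Stand-in for data(usesymbols): first three sample rows from the question.
dataframe = pd.DataFrame(
    {"GLAXO": [205.93, 206.22, 207.86], "AVN": [31.42, 32.62, 31.65]},
    index=pd.date_range("2016-06-01", periods=3),
)

temp_frames = {}
for subset in powerset(usesymbols):
    # Converting the tuple to a list always yields a DataFrame,
    # even when the subset holds a single symbol.
    temp = dataframe[list(subset)]
    temp_frames[subset] = temp
    print(subset, temp.shape)
```

Selecting with a list of column names returns a new dataframe, so each temporary frame can be modified without touching the original.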

1 Comment

Perfect solution.
