
I am looking to generate a list of tuples from my DataFrame. Here is my DataFrame:

data.csv

,Date,Open,High,Low,Close,min,max
2022-10-03 12:00:00+01:00,19268.458333333332,141.95199584960938,141.97999572753906,141.30999755859375,141.42999267578125,141.42999267578125,
2022-10-04 16:00:00+01:00,19269.625,143.83799743652344,144.07699584960938,143.72999572753906,143.99000549316406,,143.99000549316406
2022-10-05 15:00:00+01:00,19270.583333333332,142.83299255371094,142.87100219726562,142.4199981689453,142.66000366210938,142.66000366210938,
2022-10-06 06:00:00+01:00,19271.208333333332,143.36000061035156,143.43600463867188,143.24000549316406,143.4010009765625,,143.4010009765625
2022-10-07 13:00:00+01:00,19272.5,141.85899353027344,142.1219940185547,141.17999267578125,141.45599365234375,141.45599365234375,

I want to extract ('Date', 'Close') from each row as a tuple, like ('2022-10-03', 141.42999267578125), and build a list of those tuples in which each row is paired with the next one.
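One detail worth checking: in data.csv the date strings sit in the unnamed first column, while the `Date` column holds floats. Judging from the numbers, those floats appear to be days since 1970-01-01 UTC (19268 days after the epoch is 2022-10-03, and 0.458333 of a day is 11:00 UTC, i.e. 12:00+01:00). That is an inference from the data, not something stated in the question. A minimal sketch, assuming the file is read with `index_col=0`; the CSV is inlined through `io.StringIO` only to keep the example self-contained:

```python
import io

import pandas as pd

# The question's data.csv, inlined so the example runs on its own.
CSV = """,Date,Open,High,Low,Close,min,max
2022-10-03 12:00:00+01:00,19268.458333333332,141.95199584960938,141.97999572753906,141.30999755859375,141.42999267578125,141.42999267578125,
2022-10-04 16:00:00+01:00,19269.625,143.83799743652344,144.07699584960938,143.72999572753906,143.99000549316406,,143.99000549316406
2022-10-05 15:00:00+01:00,19270.583333333332,142.83299255371094,142.87100219726562,142.4199981689453,142.66000366210938,142.66000366210938,
2022-10-06 06:00:00+01:00,19271.208333333332,143.36000061035156,143.43600463867188,143.24000549316406,143.4010009765625,,143.4010009765625
2022-10-07 13:00:00+01:00,19272.5,141.85899353027344,142.1219940185547,141.17999267578125,141.45599365234375,141.45599365234375,
"""

# Use the unnamed first column (the timestamp strings) as the index.
df = pd.read_csv(io.StringIO(CSV), index_col=0)

# The date part of each timestamp string is its first 10 characters.
dates_from_index = [ts[:10] for ts in df.index]

# Assumption: 'Date' counts days since 1970-01-01 UTC.
dates_from_float = pd.to_datetime(df["Date"], unit="D").dt.strftime("%Y-%m-%d").tolist()

print(dates_from_index)
print(dates_from_float)
```

For these five rows both lists agree, but for timestamps close to midnight the UTC date from `Date` can differ from the local (+01:00) date, so the index strings are the safer source for the tuples.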

I created the list of tuples manually to show exactly what I am looking for:

tuples_list = [
        ('2022-10-03', 141.42999267578125), ('2022-10-04', 143.99000549316406), # row[0-1]
        ('2022-10-04', 143.99000549316406), ('2022-10-05', 142.66000366210938), # row[1-2]
        ('2022-10-05', 142.66000366210938), ('2022-10-06', 143.4010009765625),  # row[2-3]
        ('2022-10-06', 143.4010009765625), ('2022-10-07', 141.45599365234375),  # row[3-4]
    ]


One approach could be as follows:

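import pandas as pd

df = pd.read_csv('data.csv', index_col=0)
df.index = pd.to_datetime(df.index).date.astype(str)  # keep only the date part

s = pd.concat([df.Close]*2).sort_index()  # every row twice, in date order
tuples_list = list(zip(s.index, s))[1:-1]  # drop first and last to get pairs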

print(tuples_list)

[('2022-10-03', 141.42999267578125),('2022-10-04', 143.99000549316406),
 ('2022-10-04', 143.99000549316406),('2022-10-05', 142.66000366210938),
 ('2022-10-05', 142.66000366210938),('2022-10-06', 143.4010009765625),
 ('2022-10-06', 143.4010009765625),('2022-10-07', 141.45599365234375)]
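The `[1:-1]` slice works because duplicating every row gives `a, a, b, b, c, c`, and dropping the first and last element leaves the consecutive pairs `a, b, b, c`. A variant of the same idea, sketched here as an alternative rather than the answerer's code, uses `Series.repeat(2)`, which duplicates each element in place and so needs no `sort_index()`. The date is taken as the first 10 characters of the index label, which assumes the timestamp format shown in data.csv; the trimmed CSV is inlined only for self-containment.

```python
import io

import pandas as pd

# A trimmed copy of data.csv (only the column used here), inlined for self-containment.
CSV = """,Close
2022-10-03 12:00:00+01:00,141.42999267578125
2022-10-04 16:00:00+01:00,143.99000549316406
2022-10-05 15:00:00+01:00,142.66000366210938
2022-10-06 06:00:00+01:00,143.4010009765625
2022-10-07 13:00:00+01:00,141.45599365234375
"""

df = pd.read_csv(io.StringIO(CSV), index_col=0)

# Keep only the 'YYYY-MM-DD' part of each timestamp label.
df.index = df.index.str[:10]

# repeat(2) turns [a, b, c] into [a, a, b, b, c, c], keeping the index labels.
s = df["Close"].repeat(2)

# Dropping the first and last items leaves [a, b, b, c]: consecutive row pairs.
tuples_list = list(zip(s.index, s))[1:-1]
print(tuples_list)
```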

The line below gives the desired list of tuples, assuming that df is your pandas DataFrame:

list_tuples = list(df[['Date', 'Close']].to_records(index=True))

Edit: Edited answer so that the result is exactly the tuples you want.

3 Comments

This is what it gives me: [(19268.45833333, 141.42999268), (19269.625, 143.99000549),
@tiberhockey I thought you had independent indexes. Just change the False in "index=False" to True and I think you are good to go.
to_records does not actually give tuples; it gives numpy records. For some purposes this won't matter, but if I pass list_tuples to loc as an indexer into another DataFrame, I get a KeyError. See print(type(list_tuples[0])). To get a list that works as an indexer, one option is list(pd.MultiIndex.from_frame(df[['Date', 'Close']])).
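To see the difference the last comment describes, the sketch below checks the element type of `to_records()` against two ways of getting plain Python tuples: `Series.items()` and iterating a `MultiIndex` built with `pd.MultiIndex.from_frame`. It assumes the timestamps are the index (read with `index_col=0`) and uses a trimmed inline copy of data.csv; pairing consecutive rows is a separate step, as in the other answers.

```python
import io

import pandas as pd

# A trimmed copy of data.csv, inlined for self-containment.
CSV = """,Close
2022-10-03 12:00:00+01:00,141.42999267578125
2022-10-04 16:00:00+01:00,143.99000549316406
2022-10-05 15:00:00+01:00,142.66000366210938
"""

df = pd.read_csv(io.StringIO(CSV), index_col=0)
df.index = df.index.str[:10]  # keep 'YYYY-MM-DD'

# to_records() returns a numpy recarray; its elements are numpy.record, not tuple.
records = df[["Close"]].to_records(index=True)
print(type(records[0]))

# Series.items() yields plain (index, value) tuples.
pairs_from_items = list(df["Close"].items())

# Iterating a MultiIndex also yields plain tuples, one per row.
pairs_from_multiindex = list(pd.MultiIndex.from_frame(df[["Close"]].reset_index()))

print(pairs_from_items)
print(pairs_from_multiindex)
```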

With such simple data, and a non-pandas desired output, using pandas may be overkill.

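import csv

with open('data.csv', newline='') as f:
    file = csv.reader(f)
    header = next(file)  # skip the header row
    # column 0 is the timestamp (keep 'YYYY-MM-DD'), column 5 is Close
    tuples_list = [(x[0][:10], float(x[5])) for x in file]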

print(tuples_list)

Output:

[('2022-10-03', 141.42999267578125),
 ('2022-10-04', 143.99000549316406),
 ('2022-10-05', 142.66000366210938),
 ('2022-10-06', 143.4010009765625),
 ('2022-10-07', 141.45599365234375)]
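Selecting columns by position (`x[0]`, `x[5]`) depends on the column order in data.csv. If that order might change, a variant using `csv.DictReader` picks the fields by header name instead; because the first header cell is empty, the timestamp column's key is `''`. A sketch, reading a trimmed inline copy through `io.StringIO` in place of the file:

```python
import csv
import io

# A trimmed inline copy of data.csv standing in for open('data.csv', newline='').
CSV = """,Date,Open,High,Low,Close,min,max
2022-10-03 12:00:00+01:00,19268.458333333332,141.95199584960938,141.97999572753906,141.30999755859375,141.42999267578125,141.42999267578125,
2022-10-04 16:00:00+01:00,19269.625,143.83799743652344,144.07699584960938,143.72999572753906,143.99000549316406,,143.99000549316406
"""

# DictReader takes the field names from the header row.
reader = csv.DictReader(io.StringIO(CSV))

# The first header cell is empty, so the timestamp is stored under the key ''.
tuples_list = [(row[""][:10], float(row["Close"])) for row in reader]
print(tuples_list)
```

The pairing step that follows works the same on this list.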

from itertools import pairwise, chain  # pairwise requires Python 3.10+

tuples_list = list(chain.from_iterable(pairwise(tuples_list)))
print(tuples_list)

Output:

[('2022-10-03', 141.42999267578125), ('2022-10-04', 143.99000549316406),
 ('2022-10-04', 143.99000549316406), ('2022-10-05', 142.66000366210938),
 ('2022-10-05', 142.66000366210938), ('2022-10-06', 143.4010009765625),
 ('2022-10-06', 143.4010009765625), ('2022-10-07', 141.45599365234375)]
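`itertools.pairwise` was added in Python 3.10. On older versions, zipping the list with itself shifted by one gives the same consecutive pairs. A minimal sketch, using the five (date, close) tuples from the first step:

```python
from itertools import chain

rows = [
    ('2022-10-03', 141.42999267578125),
    ('2022-10-04', 143.99000549316406),
    ('2022-10-05', 142.66000366210938),
    ('2022-10-06', 143.4010009765625),
    ('2022-10-07', 141.45599365234375),
]

# zip(rows, rows[1:]) yields (row0, row1), (row1, row2), ... like pairwise(rows).
pairs = zip(rows, rows[1:])

# Flatten the pairs into one list, matching chain.from_iterable(pairwise(...)).
tuples_list = list(chain.from_iterable(pairs))
print(tuples_list)
```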
