Using variable to pull data from a DataFrame in a loop

Question

I have a DataFrame and I am trying to get columns A, B, and C lined up next to each other in a new DataFrame.

I have 15 columns for each letter. I tried to write a for loop so that A1, B1, and C1 are next to each other, and so on up to A15, B15, and C15.

def organize_data(df):

    rng = int(input('How many peptides do you have to analyze:  '))

    number = 1
    frames = []
    for i in range(rng):
        if number == 16:
            break
        else:
            Ax='A'+ str(number)
            Bx='B'+ str(number)
            Cx='C'+ str(number)

            A = df.Ax[:41]
            B = df.Bx[:41]
            C = df.Cx[:41]
            dfABC = pd.concat([A,B,C], axis=1)
            frames.append(dfABC)
            number = number+1

    df1 = pd.concat(frames)
    return(df1)

I keep getting this error: AttributeError: 'DataFrame' object has no attribute 'Ax'

Is there a way to get around this?

Here is my data set that I'm trying to organize: The "Wavelength" cell is at B29.

It seems you need A = df.iloc[:41, df.columns.get_loc(Ax)]
– jezrael, Commented Aug 30, 2017 at 14:17
If you're trying to dynamically access columns, you'll need to use the [...] notation.
– cs95, Commented Aug 30, 2017 at 14:21
I reopened the question because the better solution is in my comment: df[Ax][:41] is not a nice solution, since multiple ][ create chained indexing.
– jezrael, Commented Aug 30, 2017 at 14:35
Can you show an example of your data and fix the indentation, please?
– zipa, Commented Aug 30, 2017 at 14:38
@zipa I just edited the question to include a picture of my data.
– Sarah Allen, Commented Aug 30, 2017 at 15:17

jezrael · Accepted Answer · 2017-08-30 17:42:37Z

You need iloc with get_loc to select the first 41 rows of a column whose name is stored in a variable:

A = df.iloc[:41, df.columns.get_loc(Ax)]
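
As a minimal sketch of why the attribute access in the question fails and how the selection above behaves, the example below uses a small made-up DataFrame (the column names and values are illustrative, not the asker's data). `df.Ax` looks for a column literally named `Ax`, whereas `df.columns.get_loc(Ax)` and `df.loc[:, Ax]` use the string stored in the variable.

```python
import pandas as pd

# Hypothetical toy data; the real data set has 15 columns per letter.
df = pd.DataFrame({
    'A1': [1, 2, 3, 4],
    'B1': [10, 20, 30, 40],
    'C1': [100, 200, 300, 400],
})

number = 1
Ax = 'A' + str(number)  # the string 'A1'

# Attribute access uses the literal name "Ax", not the value of the variable.
try:
    df.Ax
    attribute_failed = False
except AttributeError:
    attribute_failed = True

# Positional selection: first 2 rows of the column whose name is stored in Ax.
first_rows = df.iloc[:2, df.columns.get_loc(Ax)]

# Label-based equivalent for a default RangeIndex (.loc slices include the end label).
same_rows = df.loc[:1, Ax]

print(attribute_failed)
print(first_rows.tolist())
```

Both selections use a single indexer call, which avoids the chained `df[Ax][:41]` form mentioned in the comments.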

EDIT:

I changed the solution completely. The idea is to use a MultiIndex in the columns, with one level of strings and one level of numbers, then sort by the second (numeric) level, and finally filter by rng. The concat function is not necessary.

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
mux = pd.MultiIndex.from_product([list('ABC'), range(1,16)])

df = pd.DataFrame(np.random.randint(10, size=(3,45)), columns=mux)
# flatten the MultiIndex into plain names such as 'A1', 'A2', ...
df.columns = [''.join((x[0], str(x[1]))) for x in df.columns]
print(df)
   A1  A2  A3  A4  A5  A6  A7  A8  A9  A10 ...   C6  C7  C8  C9  C10  C11  \
0   8   8   3   7   7   0   4   2   5    2 ...    9   3   2   5    8    1   
1   0   8   2   5   1   8   1   5   4    2 ...    6   6   0   7    2    3   
2   3   7   9   0   0   5   9   6   6    5 ...    9   0   9   8    6    2   

   C12  C13  C14  C15  
0    0    7    6    2  
1    5    4    2    4  
2    0    5    3    2  

[3 rows x 45 columns]

# helper DataFrame: column 0 holds the letters, column 1 the digits
df1 = df.columns.to_series().str.extract(r'([a-zA-Z]+)(\d+)', expand=True)
# convert the second column to int so that 10 sorts after 9
df1[1] = df1[1].astype(int)
# create a MultiIndex from df1
df.columns = df1.T.values.tolist()
# sort by the second (numeric) level
df = df.sort_index(level=1, axis=1)
print(df)
   A  B  C  A  B  C  A  B  C  A ...  C  A  B  C  A  B  C  A  B  C
  1  1  1  2  2  2  3  3  3  4  ... 12 13 13 13 14 14 14 15 15 15
0  8  4  7  8  0  7  3  9  0  7 ...  0  1  7  7  0  1  6  8  1  2
1  0  3  2  8  6  4  2  3  2  5 ...  5  5  7  4  0  6  2  9  6  4
2  3  2  8  7  3  5  9  8  2  0 ...  0  7  4  5  3  8  3  9  9  2

# keep only the columns whose number is at most rng
rng = 4
df2 = df.loc[:, df.columns.get_level_values(1) <= rng]
# flatten the MultiIndex back into plain column names
df2.columns = [''.join((x[0], str(x[1]))) for x in df2.columns]
print(df2)
   A1  B1  C1  A2  B2  C2  A3  B3  C3  A4  B4  C4
0   8   4   7   8   0   7   3   9   0   7   6   2
1   0   3   2   8   6   4   2   3   2   5   4   7
2   3   2   8   7   3   5   9   8   2   0   7   7

All together in a function:

def organize_data(df):

    rng = int(input('How many peptides do you have to analyze:  '))

    # work on a copy so the caller's column labels are not overwritten
    df = df.copy()
    df1 = df.columns.to_series().str.extract(r'([a-zA-Z]+)(\d+)', expand=True)
    df1[1] = df1[1].astype(int)
    df.columns = df1.T.values.tolist()
    df = df.sort_index(level=1, axis=1)

    df2 = df.loc[:, df.columns.get_level_values(1) <= rng]
    df2.columns = [''.join((x[0], str(x[1]))) for x in df2.columns]
    return df2

# df must still have the original string column names (A1, A2, ...) here
a = organize_data(df)
print(a)
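
If the column names always follow the letter-plus-number pattern, the same ordering can also be sketched without a MultiIndex by building the list of names directly. This is an alternative technique, not the method used in the function above; the name `interleave_columns` and its parameters are hypothetical, and `n` replaces the `input()` prompt so the function can be called non-interactively.

```python
import numpy as np
import pandas as pd

def interleave_columns(df, n, letters=('A', 'B', 'C'), n_rows=None):
    """Return columns ordered A1, B1, C1, A2, ... up to number n (hypothetical helper)."""
    # Build names such as 'A1', 'B1', 'C1', 'A2', ... in the desired order.
    cols = [f'{letter}{i}' for i in range(1, n + 1) for letter in letters]
    # Selecting with a list of labels reorders the columns;
    # recent pandas versions raise KeyError if a name is missing.
    out = df.loc[:, cols]
    # Optionally keep only the first n_rows rows (41 in the question).
    return out if n_rows is None else out.iloc[:n_rows]

# Toy frame with the same layout as the sample: A1..A15, B1..B15, C1..C15.
names = [f'{letter}{i}' for letter in 'ABC' for i in range(1, 16)]
df = pd.DataFrame(np.arange(3 * 45).reshape(3, 45), columns=names)

result = interleave_columns(df, n=4)
print(result.columns.tolist())
```

The list-based version relies on every expected name being present, while the regex-based function above works with whatever letters and numbers it finds in the column labels.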

This did the trick for me! My only issue now is aligning the data correctly in the final DataFrame.
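
One possible source of misalignment, assuming the loop from the question is still in use with the column selection fixed: `pd.concat(frames)` stacks the pieces vertically by default, while `axis=1` places them side by side. Whether this is the issue described in the comment is an assumption; the sketch below uses made-up data only to show the difference.

```python
import pandas as pd

# Hypothetical toy data with two numbered groups of columns.
df = pd.DataFrame({
    'A1': [1, 2], 'A2': [3, 4],
    'B1': [5, 6], 'B2': [7, 8],
    'C1': [9, 10], 'C2': [11, 12],
})

frames = []
for number in range(1, 3):
    names = [letter + str(number) for letter in 'ABC']
    # Select by the strings held in `names`, not by attribute access.
    frames.append(df.loc[:, names])

# Default axis=0: rows are stacked and non-matching columns are filled with NaN.
stacked = pd.concat(frames)
# axis=1: the groups are placed next to each other, aligned on the row index.
side_by_side = pd.concat(frames, axis=1)

print(stacked.shape)
print(side_by_side.columns.tolist())
```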
