Using Python, how to take input from an Excel file, define a function, and generate output in a new sheet of that Excel file?

Question

Please help me understand how to approach this problem; I'm a beginner in Python.

I have a specific task where I have to import data from an Excel file (.xlsx) and take the column 'Count' to perform normalization in Python.

Then, using the NumPy library, I need to define a function in Python that performs this normalization (or any other operation in the future) and writes the output (the normalized result) to a new sheet in the same Excel workbook.

Is it possible to do this task strictly using NumPy?

Formula used in Excel, which is to be translated into a Python function using NumPy: =(A2-MIN($A$2:$A$11))/(MAX($A$2:$A$11)-MIN($A$2:$A$11))*10

The instructions provided to me are as follows:

import numpy as nd

def normalize (x):
    """ This function has the logic for normalization
    Inputs
    ------
      x: input count 
    Returns
    ------
      the transformed f(x)  
    """
    return x

Sample Data:

Count	Constant
10	100
20	100
30	100
40	100
50	100
60	100
70	100
80	100
90	100
100	100

This is what I've coded so far:

import pandas as pd
import numpy as np

data = pd.read_excel(r"path of file")  # read the Excel file into a DataFrame
data = data['Count']  # select the 'Count' column (a pandas Series)
data2 = data.to_numpy()  # convert the Series to a NumPy array
print(data2)

def normalize(data2):
    return ((data2 - min(data2))/(max(data2)-min(data2)))*10

print(normalize(data2))

But this code doesn't seem to be completely in line with the instructions provided.

Is the file a delimited text file or an actual .xls, .xlsx, ...? Numpy has ufuncs equivalent to Python's min and max. You should spend some time with the Numpy user guide - at least the quickstart, absolute basics and fundamentals section. The way you wrote the specifications in your question it doesn't sound like you are required to load the data using Numpy.
– wwii, Commented Jul 16, 2021 at 13:02
@wwii thanks, it is a .xlsx file. I will start reading up on the Numpy user guide. Thank you
– Jackk, Commented Jul 16, 2021 at 13:22
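
As a rough illustration of the comment above, the Excel formula can be written with NumPy's own np.min and np.max while keeping the normalize(x) signature from the instructions. The scale parameter and the guard for a constant column are assumptions added for this sketch; they are not part of the original task.

```python
import numpy as np


def normalize(x, scale=10.0):
    """Min-max normalize x to the range [0, scale].

    Sketch only: `scale` is an assumed parameter, defaulting to the
    factor of 10 used in the Excel formula.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = np.min(x), np.max(x)
    if hi == lo:
        # Assumption: a constant column (such as 'Constant') maps to all
        # zeros instead of dividing by zero.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo) * scale


counts = np.arange(10, 101, 10)  # the 'Count' column from the sample data
print(normalize(counts))
```

Because the arithmetic is applied to the whole array at once, no explicit loop over cells is needed.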

Hugo_Hensoldt · Accepted Answer · 2021-07-16 13:28:46Z

1

I assumed your Excel file is in CSV format; if not, you can open it and save it as CSV.

import numpy as np

#Opening data just with numpy lib
from numpy import genfromtxt
data = genfromtxt('Sample data.csv', delimiter=';') 

#Defining normalize function
def normalize(x,MA,MI):
  return ((x - MI)/(MA-MI))*10

#Cleaning ignored values
data2 = np.delete(data, 1, axis=1)     #Constant
data3 = np.delete(data2, 0, axis=0)    #Column Names

#Precalculating Min and Max
MI=np.amin(data3) 
MA=np.amax(data3)

#Applying function to the array
data4=np.apply_along_axis(normalize,1,data3,MA,MI)

print(data4)

Output array:

[[ 0.        ]
 [ 1.11111111]
 [ 2.22222222]
 [ 3.33333333]
 [ 4.44444444]
 [ 5.55555556]
 [ 6.66666667]
 [ 7.77777778]
 [ 8.88888889]
 [10.        ]]
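
For an actual .xlsx workbook, NumPy itself has no Excel reader, so one possible approach is to let pandas handle the file I/O and keep the calculation in NumPy. The sketch below assumes pandas 1.3 or later with the openpyxl engine installed; the function name write_normalized, the sheet name "Normalized" and the parameter names are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def normalize(x):
    """Min-max normalize to [0, 10], mirroring the Excel formula in the question."""
    x = np.asarray(x, dtype=float)
    return (x - np.min(x)) / (np.max(x) - np.min(x)) * 10


def write_normalized(path, column="Count", sheet_name="Normalized"):
    """Read `column` from the first sheet and add the result as a new sheet.

    `path`, `column` and `sheet_name` are hypothetical names for this sketch.
    """
    data = pd.read_excel(path)
    result = pd.DataFrame({"Normalized": normalize(data[column].to_numpy())})
    # mode="a" appends to the existing workbook (openpyxl engine only);
    # if_sheet_exists="replace" overwrites the sheet on a second run.
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="replace"
    ) as writer:
        result.to_excel(writer, sheet_name=sheet_name, index=False)
    return result
```

Calling write_normalized with the path to the workbook would leave the original sheet in place and add the normalized values next to it.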

answered Jul 16, 2021 at 13:28

Hugo_Hensoldt


3 Comments

wwii Over a year ago

OP said it is a .xlsx file in one of the comments.

Jackk Over a year ago

@Hugo_Hensoldt thanks for sharing this. When I tried running this code, I got the following error: "AxisError: axis 1 is out of bounds for array of dimension 1" for the line data2 = np.delete(data, 1, axis=1). When I changed that line to axis=0, I got the same error for the next line, data3 = np.delete(data2, 0, axis=0).

Hugo_Hensoldt Over a year ago

You're welcome. I've successfully run this code with my CSV sample data, so my best guess is that you have a problem in the part that opens your data. Try print(data) to see what it looks like. It should look like:

[[ nan  nan]
 [ 10. 100.]
 [ 20. 100.]
 [ 30. 100.]
 [ 40. 100.]
 [ 50. 100.]
 [ 60. 100.]
 [ 70. 100.]
 [ 80. 100.]
 [ 90. 100.]
 [100. 100.]]
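
One possible cause of the AxisError, assuming the CSV was saved with a delimiter other than ';', is that genfromtxt then reads each row as a single unparseable field and returns a 1-D array. The sketch below reproduces that with a hypothetical comma-delimited excerpt of the sample data and also shows skip_header and usecols as an alternative to np.delete.

```python
import io

import numpy as np

# Hypothetical comma-delimited export of the sample data (first rows only).
csv_text = "Count,Constant\n10,100\n20,100\n30,100\n"

# Wrong delimiter: every row is one field that cannot be parsed as a float,
# so the result is a 1-D array of nan and axis=1 does not exist.
wrong = np.genfromtxt(io.StringIO(csv_text), delimiter=";")
print(wrong.ndim)

# Matching delimiter: the result is 2-D, with the header row read as nan.
right = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(right.shape)

# skip_header and usecols pick out the 'Count' column directly.
counts = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", skip_header=1, usecols=0
)
print(counts)
```

Checking data.ndim or data.shape right after loading is a quick way to tell which case applies.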

anonymouscat · Accepted Answer · 2021-07-16 12:48:50Z

1

I do not think you are actually accessing the value at A2. You save the array into data2, but when you go to use A2 in your normalization equation, you are passing the entire array. I think that your normalization function should be as follows:

def normalize(data2):
   return ((data2[INDEX OF A2] - min(data2))/(max(data2)-min(data2)))*10

answered Jul 16, 2021 at 12:48

anonymouscat


5 Comments

Jackk Over a year ago

getting SyntaxError: invalid syntax for INDEX OF A2

wwii Over a year ago

All of the operations should be performed on the Series - data2[INDEX OF A2].

anonymouscat Over a year ago

@Jackk what are you inputting for [INDEX OF A2]

Jackk Over a year ago

@anonymouscat I don't know how to input the index location of A2 , I was using [INDEX OF A2] as it is

anonymouscat Over a year ago

@Jackk arrays are split by different indexes, and based on the way you are doing it, you have entered each cell into an array. To get the value of A2, you will have to find the index in the array that corresponds to that cell. To access an array at an index, you use array_name[INDEX]. Array indices start at 0, meaning that index 1 is actually the second entry in the array. Does that make sense?
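
To make the indexing discussion concrete, here is a minimal sketch using the sample 'Count' values. It assumes the header sits in row 1 of the sheet, so cell A2 is the first data value at index 0. It also shows that NumPy applies the arithmetic elementwise when the whole array is used, so an explicit index is only needed for a single cell.

```python
import numpy as np

counts = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])  # cells A2..A11

# A2 is the first data row, so it sits at index 0 of the array.
a2 = counts[0]
single = (a2 - counts.min()) / (counts.max() - counts.min()) * 10
print(single)

# Using the whole array applies the same formula to every cell at once.
all_cells = (counts - counts.min()) / (counts.max() - counts.min()) * 10
print(all_cells)
```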
