Please help me understand how to approach this problem; I'm a beginner in Python.

I have a specific task: import data from an Excel file (.xlsx) and normalize the 'Count' column in Python.

Then, using the NumPy library, I need to define a Python function that performs this normalization (or any other operation in the future) and writes the output (the normalized result) to a new sheet in the same Excel workbook.

Is it possible to do this task strictly using NumPy?

[ Formula used in Excel -> ( =(A2-MIN($A$2:$A$11))/(MAX($A$2:$A$11)-MIN($A$2:$A$11))*10 ), which is to be translated into a Python function using NumPy ]

The instructions provided to me are as follows:

import numpy as nd

def normalize (x):
    """ This function has the logic for normalization
    Inputs
    ------
      x: input count 
    Returns
    ------
      the transformed f(x)  
    """
    return x 

Sample Data:

Count Constant
10 100
20 100
30 100
40 100
50 100
60 100
70 100
80 100
90 100
100 100

This is what I've coded so far:

import pandas as pd
import numpy as np

data = pd.read_excel(r"path of file")  # read the Excel file into a DataFrame
data = data['Count']  # select the 'Count' column (a pandas Series)
data2 = data.to_numpy()  # convert the Series to a NumPy array
print(data2)

def normalize(data2):
    return ((data2 - min(data2))/(max(data2)-min(data2)))*10

print(normalize(data2))

But this code doesn't seem to fully match the instructions I was given.
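For reference, the task described above could be sketched roughly as follows. This is only one possible layout, not the expected solution: the helper `normalize_workbook` and the sheet name "Normalized" are hypothetical, and it assumes pandas with the openpyxl engine is installed, since NumPy itself has no .xlsx reader or writer. The `normalize` function keeps the signature from the instructions and uses only NumPy.

```python
import numpy as np


def normalize(x):
    """Rescale x linearly so its minimum maps to 0 and its maximum to 10."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) * 10


def normalize_workbook(path, column="Count", sheet_name="Normalized"):
    """Read `column` from the first sheet and add the result as a new sheet.

    Hypothetical helper: pandas is imported here, not at module level,
    so that normalize() above depends on NumPy alone.
    """
    import pandas as pd

    counts = pd.read_excel(path)[column].to_numpy()
    result = pd.DataFrame({"Normalized": normalize(counts)})
    # mode="a" appends to the existing workbook instead of overwriting it.
    with pd.ExcelWriter(path, mode="a", engine="openpyxl") as writer:
        result.to_excel(writer, sheet_name=sheet_name, index=False)
    return result
```

Calling `normalize_workbook(r"path of file")` would then leave the original sheet untouched and add the new one next to it. The normalization step is pure NumPy; only the file reading and writing goes through pandas.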

  • Please fix the indentation of the return statement. Commented Jul 16, 2021 at 12:51
  • Is the file a delimited text file or an actual .xls, .xlsx, ...? Numpy has ufuncs equivalent to Python's min and max. You should spend some time with the Numpy user guide - at least the quickstart, absolute basics and fundamentals section. The way you wrote the specifications in your question it doesn't sound like you are required to load the data using Numpy. Commented Jul 16, 2021 at 13:02
  • @wwii thanks, it is .xlsx file. I will start reading up the numpy user guide. Thank you Commented Jul 16, 2021 at 13:22
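To illustrate the remark about NumPy having its own equivalents of Python's min and max, here is a small sketch using the sample data from the question. The variable names are made up for the example. On a 1-D array the built-ins and the NumPy reductions agree, but on a 2-D array only the NumPy versions work as intended.

```python
import numpy as np

counts = np.arange(10, 101, 10, dtype=float)  # 10, 20, ..., 100

# On a 1-D array the built-in functions and the NumPy reductions agree.
builtin_min, builtin_max = min(counts), max(counts)
numpy_min, numpy_max = np.min(counts), np.max(counts)

# On a 2-D array (Count and Constant side by side) NumPy can reduce
# per column, while the built-in min() fails when comparing whole rows.
table = np.column_stack([counts, np.full(10, 100.0)])
col_min = table.min(axis=0)  # minimum of each column
col_max = table.max(axis=0)  # maximum of each column

try:
    min(table)
    builtin_works_on_2d = True
except ValueError:
    builtin_works_on_2d = False
```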

2 Answers

I assumed your Excel file is in CSV format; if not, you can open it and save it as CSV.

import numpy as np

#Opening data just with numpy lib
from numpy import genfromtxt
data = genfromtxt('Sample data.csv', delimiter=';') 

#Defining normalize function
def normalize(x,MA,MI):
  return ((x - MI)/(MA-MI))*10

#Cleaning ignored values
data2 = np.delete(data, 1, axis=1)     #Constant
data3 = np.delete(data2, 0, axis=0)    #Column Names

#Precalculating Min and Max
MI=np.amin(data3) 
MA=np.amax(data3)

#Applying function to the array
data4=np.apply_along_axis(normalize,1,data3,MA,MI)

print(data4)

Output array:

[[ 0.        ]
 [ 1.11111111]
 [ 2.22222222]
 [ 3.33333333]
 [ 4.44444444]
 [ 5.55555556]
 [ 6.66666667]
 [ 7.77777778]
 [ 8.88888889]
 [10.        ]]
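
A possible variation on the same idea, offered as a sketch and not a replacement: `genfromtxt` can skip the header row and pick a single column while loading, so no rows or columns have to be deleted afterwards, and the formula can be applied to the whole array at once through broadcasting. The inline text below is a stand-in for the CSV file, and the semicolon delimiter is an assumption that has to match the actual file.

```python
from io import StringIO

import numpy as np

# Stand-in for the CSV file; the ';' delimiter is an assumption.
csv_text = (
    "Count;Constant\n"
    "10;100\n20;100\n30;100\n40;100\n50;100\n"
    "60;100\n70;100\n80;100\n90;100\n100;100\n"
)

# skip_header=1 drops the column names, usecols=0 keeps only 'Count'.
counts = np.genfromtxt(StringIO(csv_text), delimiter=";", skip_header=1, usecols=0)

# Broadcasting applies the formula to every element in one expression.
normalized = (counts - counts.min()) / (counts.max() - counts.min()) * 10
print(normalized)
```

With this loading step the result is a 1-D array of ten values, where the code above produces a 10x1 column.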

3 Comments

OP said it is a .xlsx file in one of the comments.
@Hugo_Hensoldt thanks for sharing this. When I tried running this code I got the following error: AxisError: axis 1 is out of bounds for array of dimension 1, for the line data2 = np.delete(data, 1, axis=1). When I changed it to axis=0 in that line, I got the same error for the next line, data3 = np.delete(data2, 0, axis=0).
You're welcome. I've successfully run this code with my CSV sample data, so my best guess is that you have a problem in the part that opens your data. Try print(data) to see what it looks like. It should look like: [[ nan nan] [ 10. 100.] [ 20. 100.] [ 30. 100.] [ 40. 100.] [ 50. 100.] [ 60. 100.] [ 70. 100.] [ 80. 100.] [ 90. 100.] [100. 100.]]

I do not think you are actually accessing the value at A2. You save the array into data2, but when you refer to A2 in your normalization equation, you are using the entire array. I think your normalization method should be as follows:

def normalize(data2):
   return ((data2[INDEX OF A2] - min(data2))/(max(data2)-min(data2)))*10

5 Comments

getting SyntaxError: invalid syntax for INDEX OF A2
All of the operations should be performed on the Series - data2[INDEX OF A2].
@Jackk what are you inputting for [INDEX OF A2]
@anonymouscat I don't know how to input the index location of A2; I was using [INDEX OF A2] as it is.
@Jackk arrays are addressed by index, and the way you have done it, each cell has become one entry in the array. To get the value of A2, you will have to find the index in the array that corresponds to that cell. To access an array at an index, you use array_name[INDEX]. Array indices start at 0, meaning that index 1 is actually the second entry in the array. Does that make sense?
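
The indexing described in these comments could look like the sketch below. The helper name `normalize_one` is hypothetical, and it assumes the header sits in row 1 of the sheet, so that Excel cell A2 is the first data value and lands at index 0. The last line also shows that passing the whole array applies the same formula to every row at once, because NumPy broadcasts the subtraction and division elementwise.

```python
import numpy as np

data2 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])


def normalize_one(data2, index):
    """Normalize the single entry at `index` (hypothetical helper)."""
    return (data2[index] - data2.min()) / (data2.max() - data2.min()) * 10


a2 = normalize_one(data2, 0)  # Excel A2 is the first data row: index 0
a3 = normalize_one(data2, 1)  # Excel A3 is the second data row: index 1

# The whole array at once: one result per row, same formula.
all_rows = (data2 - data2.min()) / (data2.max() - data2.min()) * 10
```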
