8

I have a NumPy array:

z = np.array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica'])

I want to replace each label with an integer:

Iris-setosa - 0
Iris-versicolor - 1
Iris-virginica - 2

so that I can apply logistic regression.

The final output should look like:

z = [0, 0, ..., 1, 1, ..., 2, 2, ...]

Is there a simple way to do this instead of iterating through the array and calling replace on each element?

3
  • 1
    Not exactly what you want, but maybe another idea: pd.Series(z, dtype="category"), see pandas.pydata.org/pandas-docs/stable/categorical.html Commented Feb 18, 2018 at 15:00
  • Your example is ambiguous. Are the strings supposed to be numbered in order of appearance or substituted with a given value? Commented Feb 18, 2018 at 15:15
  • The fact that you want to subsequently apply logistic regression does not make this a machine-learning question; please do not spam the tag (removed) Commented Feb 18, 2018 at 23:22

4 Answers 4

14

Use pandas.factorize, which numbers the labels in order of first appearance:

import pandas as pd

a = pd.factorize(z)[0].tolist()
print(a)
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

Or numpy.unique, which numbers the labels in sorted order:

import numpy as np

a = np.unique(z, return_inverse=True)[1].tolist()
print(a)
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
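
As a minimal sketch of what np.unique hands back: the first return value holds the sorted unique labels and the second holds each element's index into them, so the integer codes can also be mapped back to the original strings. The shuffled sample array and the names classes, codes and decoded are made up for illustration.

```python
import numpy as np

# Hypothetical sample in a non-sorted order, to show that codes follow sorted label order
z = np.array(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa'])

# classes: sorted unique labels; codes: index of each element of z within classes
classes, codes = np.unique(z, return_inverse=True)
print(classes.tolist())  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(codes.tolist())    # [2, 0, 1, 0]

# Indexing classes with codes recovers the original labels
decoded = classes[codes]
print((decoded == z).all())  # True
```

Keeping classes around is useful if predictions need to be translated back into label names later.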

1 Comment

@Sanjay - Glad I could help!
11

You can use a dictionary:

my_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

Then use a list comprehension:

z = [my_dict[zi] for zi in z]
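
A sketch of the same dictionary lookup wrapped in a helper that reports every label missing from the mapping, rather than stopping at the first KeyError. The function name encode_labels and the sample data are hypothetical and not part of the answer above.

```python
def encode_labels(labels, mapping):
    """Return the integer code for each label, using an explicit mapping."""
    # Collect any labels that the mapping does not cover before encoding
    unknown = sorted(set(labels) - set(mapping))
    if unknown:
        raise ValueError(f"labels missing from mapping: {unknown}")
    return [mapping[label] for label in labels]


my_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
z = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa']

codes = encode_labels(z, my_dict)
print(codes)  # [0, 2, 1, 0]
```

An explicit mapping like this is the option to reach for when specific labels must get specific numbers, as in the question.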

2 Comments

That really helped. I need to convert it from a NumPy array to a list before doing the operation.
This syntactic sugar is useful for me right now.
0

Are you trying to count the number of occurrences of each label as part of doing logistic regression?

If so, you can use the following as well.

import collections
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa','Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor','Iris-virginica', 'Iris-virginica', 'Iris-virginica']
print(collections.Counter(z))

It will print the following:

Counter({'Iris-setosa': 4, 'Iris-versicolor': 3, 'Iris-virginica': 3})

If you want to print the counts one per line, you can do the following:

import collections
z = ['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa','Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor','Iris-virginica', 'Iris-virginica', 'Iris-virginica']
counts = collections.Counter(z)  # count once, then reuse the result
for item, count in counts.items():
    print(f"{item} {count}")

The output will be:

Iris-setosa 4
Iris-versicolor 3
Iris-virginica 3


-1
[list(set(z)).index(val) for val in z]

Simply put: build a set from z to get only the unique values, convert that set to a list so it can be indexed, then use a list comprehension to look up the position of each value. Note that a set has no guaranteed order, so which label receives which number is arbitrary. If you have a very large list of strings, I would suggest assigning list(set(z)) to a variable outside the list comprehension so that it is not rebuilt for every element.
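
A possible variant, sketched with the standard library only, that makes the numbering deterministic and builds the lookup once: dict.fromkeys keeps labels in order of first appearance, and a dictionary lookup avoids the repeated list.index scan. The name index_of and the sample data are illustrative.

```python
z = ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor']

# dict.fromkeys drops duplicates while keeping first-appearance order (Python 3.7+)
index_of = {label: i for i, label in enumerate(dict.fromkeys(z))}

# One dictionary lookup per element instead of one list scan per element
codes = [index_of[label] for label in z]
print(codes)  # [0, 0, 1, 2, 1]
```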

2 Comments

I got the output [2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2], but shouldn't Iris-setosa be set to 0?
How about this: [list(np.unique(z)).index(val) for val in z]
