0

I have an array like this

array([[('Weather1', 57), 428, '74827'],
       [('weather1', 57), 429, '74828'],
       [('weather1', 57), 409, '74808'],
       [('weather2', 57), 11553, '76568'],
       [('weather2', 57), 11573, '76574']])

I want to collect only the values at index [2] into a new array, grouped by the values at index [0].

Final outcome:

array([['74827', '74828', '74808'], ['76568', '76574']])

I use this code:

read_data = [] # stores Weather1, Weather2 etc. as we read that
final_array = [] # stores final arrays

# stores data for weather1, then clears it out and
# then stores data for weather2, and so on...
sub_array = [] 

# read each item of array
for x in array:

    # e.g. for first row, is Weather1 already read?
    # No, it's not read
    if x[0].lower() not in read_data:

        # when you reach weather 2 and hit this statement,
        # sub_array will have data from weather1. So, if you find
        # sub_array with data, it is time to add it to the final_array
        # and start fresh with the sub_array
        if len(sub_array) > 0:
            final_array.append(sub_array)
            sub_array = [x[2]]
        # if sub_array is empty, just add data to it
        else:
            sub_array.append(x[2])
        
        # make sure that read_data contains the item you read
        read_data.append(x[0].lower())

    # if weather1 has been read already, just add item to sub_array
    else:
        sub_array.append(x[2])

# After you are done reading all the lines, sub_array may have data in it;
# if so, add it to the final array
if len(sub_array) > 0:
    final_array.append(sub_array)

However, as index 0 is a tuple I get back

AttributeError: 'tuple' object has no attribute 'lower'

Any ideas on how to fix it?

3
  • x[0] is the tuple ('Weather1', 57). You need to take its first field as well, i.e. x[0][0]. Commented Dec 20, 2021 at 17:51
  • Well, it's just like the error says - you can't call .lower() on a tuple, because the method doesn't exist. Are you trying to lowercase the first element of the tuple? Commented Dec 20, 2021 at 17:51
  • array([['74827', '74828', '74808'],['76568', '76574']] does not have a regular shape, and is not something you have in NumPy. Commented Dec 20, 2021 at 17:55
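The fix suggested in the first comment can be sketched roughly as follows. This is a minimal illustration, not the asker's original loop: it uses plain Python lists instead of a NumPy array, and the helper name `group_third_column` is made up for this example. It indexes into the tuple with `key[0]` before calling `.lower()`, and relies on dicts preserving insertion order (Python 3.7+).

```python
# Plain-list stand-in for the array from the question (an assumption for this sketch).
rows = [
    [('Weather1', 57), 428, '74827'],
    [('weather1', 57), 429, '74828'],
    [('weather1', 57), 409, '74808'],
    [('weather2', 57), 11553, '76568'],
    [('weather2', 57), 11573, '76574'],
]


def group_third_column(rows):
    """Group the third item of each row by the lowercased name in the tuple."""
    groups = {}
    for key, _, value in rows:
        # key is a tuple such as ('Weather1', 57); take its first field
        # before lowercasing, since tuples have no .lower() method.
        name = key[0].lower()
        groups.setdefault(name, []).append(value)
    # Dicts keep insertion order, so groups come out in first-seen order.
    return list(groups.values())


final_array = group_third_column(rows)
print(final_array)
```

Unlike the loop in the question, this version does not require rows with the same name to be adjacent.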

3 Answers

1
import numpy as np
import pandas as pd

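# dtype=object keeps each tuple as a single element; newer NumPy releases
# raise a ValueError for ragged input like this unless it is given.
data = np.array([[('Weather1', 57), 428, '74827'],
                 [('weather1', 57), 429, '74828'],
                 [('weather1', 57), 409, '74808'],
                 [('weather2', 57), 11553, '76568'],
                 [('weather2', 57), 11573, '76574']], dtype=object)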

df = pd.DataFrame(data)

# Fix uppercase "Weather"
df[0] = df[0].apply(lambda x: x[0].lower())

newdata = [group[1].loc[:, 2].values for group in df.groupby(0)]

print(newdata)

[array(['74827', '74828', '74808'], dtype=object), array(['76568', '76574'], dtype=object)]

If you want a list of lists instead of a list of NumPy arrays, you can add the following:

newdata = [item.tolist() for item in newdata]

print(newdata)

[['74827', '74828', '74808'], ['76568', '76574']]


4 Comments

This works, but how do I avoid having the words array and dtype=object in the output?
You can't, really, because your resulting array is not regular, so it's an array of objects. The two objects here are an array of length 2 and an array of length 3 (both holding strings). You can turn it into a list of lists, but not into a 2-D array of a single (string) dtype.
It must be possible somehow. With the code from my question, if the first element were not a tuple (say just the word weather, without the 57), I get back the desired outcome without these words.
That is because your final result from your code is a list of lists. While the result given as "Final outcome" is a (NumPy) array. These are not the same thing. If you want a list of lists, that is straightforward: just add newdata = [item.tolist() for item in newdata] at the end.
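To illustrate the point made in these comments, here is a small sketch of what happens when the ragged result is stored in a NumPy array. The variable names are made up for the example. With `dtype=object`, NumPy builds a 1-D array whose two elements are Python lists, which is why the printed form mentions `dtype=object`; converting back with `.tolist()` gives the plain list of lists.

```python
import numpy as np

# The desired outcome: two groups of different lengths.
ragged = [['74827', '74828', '74808'], ['76568', '76574']]

# A regular 2-D string array is impossible here, so ask for an object array.
arr = np.array(ragged, dtype=object)

print(arr.shape)     # one dimension only: the rows have different lengths
print(arr.dtype)     # object
print(arr.tolist())  # back to a plain list of lists
```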
0

Cast it to str first?

str(x[0]).lower()
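As a quick sketch of what this produces for one row from the question: the key becomes the lowercased text of the whole tuple rather than just the name. That is enough to avoid the AttributeError, but note that the number is then part of the key, so ('weather1', 57) and ('weather1', 58) would land in different groups. The variable names below are only for illustration.

```python
x = [('Weather1', 57), 428, '74827']  # one row from the question

# Casting the tuple to str first, as suggested above.
key_from_str = str(x[0]).lower()

# Taking only the name out of the tuple, as suggested in the comments.
key_from_name = x[0][0].lower()

print(key_from_str)   # ('weather1', 57)
print(key_from_name)  # weather1
```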


0

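You can do this in a much shorter and more efficient way by using a combination of np.unique and np.split (here a is the array from the question; note the tup[0] to get the name out of each tuple before lowercasing):

_, counts = np.unique(np.array([tup[0].lower() for tup in a[:, 0]]), return_counts=True)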
splits = np.split(a, counts.cumsum()[:-1])
splits = [s[:, 2].tolist() for s in splits]

Output:

>>> splits
[['74827', '74828', '74808'], ['76568', '76574']]
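One caveat worth spelling out: np.unique returns counts in sorted key order, so splitting by cumulative counts only lines up with the rows when equal names are already contiguous and in sorted order, as they are in the question. A possible sketch for unordered input, under the assumption that the data is built as an object array, is to sort the rows with a stable argsort first. The shuffled `rows` below are made up for the example.

```python
import numpy as np

# Same data as in the question, deliberately shuffled (an assumption for this sketch).
rows = [
    [('weather2', 57), 11553, '76568'],
    [('Weather1', 57), 428, '74827'],
    [('weather1', 57), 429, '74828'],
    [('weather2', 57), 11573, '76574'],
    [('weather1', 57), 409, '74808'],
]
a = np.array(rows, dtype=object)  # shape (5, 3); tuples stay single elements

# Lowercased name of each row, taken from the first field of the tuple.
names = np.array([tup[0].lower() for tup in a[:, 0]])

# Stable sort keeps the original relative order inside each group.
order = np.argsort(names, kind="stable")

# Counts come back in sorted key order, matching the sorted rows.
_, counts = np.unique(names, return_counts=True)

splits = [s[:, 2].tolist() for s in np.split(a[order], counts.cumsum()[:-1])]
print(splits)
```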

