How to Remove Columns in a NumPy Array That Contain Non-Numeric Values
When a NumPy numeric array contains NaN values, you often want to drop any column that has at least one NaN so the remaining array contains only fully numeric columns.
Example: Here, a column containing NaN is removed from a 2×2 array.
import numpy as np
arr = np.array([[1.2, np.nan], [3.4, 5.6]])
print(arr[:, ~np.isnan(arr).any(axis=0)])
Output
[[1.2]
 [3.4]]
Explanation: Columns with non-numeric (NaN) values are removed by combining np.isnan(), any(axis=0), and boolean indexing:
- np.isnan(arr): Creates a boolean array where NaN values are marked as True.
- .any(axis=0): Checks each column and marks it True if that column contains at least one NaN.
- ~ (NOT operator): Inverts the result so that only columns without NaNs are kept.
- arr[:, mask]: Selects and returns only the valid columns.
This way, all columns containing NaNs are removed in one clean step.
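The four steps above can also be written out one at a time, which makes each intermediate mask easy to inspect. This is a minimal sketch using the same 2×2 array; the variable names nan_mask, cols_with_nan, keep, and result are chosen here for illustration only.

```python
import numpy as np

arr = np.array([[1.2, np.nan], [3.4, 5.6]])

# Step 1: element-wise test, True wherever the value is NaN
nan_mask = np.isnan(arr)

# Step 2: collapse along rows, giving one flag per column
cols_with_nan = nan_mask.any(axis=0)

# Step 3: invert, so True now means "this column is clean"
keep = ~cols_with_nan

# Step 4: boolean indexing on the column axis
result = arr[:, keep]

print(nan_mask)
print(cols_with_nan)
print(keep)
print(result)
```

Boolean indexing returns a new array, so arr itself is left unchanged.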
More Examples
Example 1: In this example, we remove the columns containing NaN values from a 2×3 NumPy array.
import numpy as np
n_arr = np.array([[10.5, 22.5, np.nan], [41, 52.5, np.nan]])
print("Array:")
print(n_arr)
print("Result: ")
print(n_arr[:, ~np.isnan(n_arr).any(axis=0)])
Output
Array:
[[10.5 22.5  nan]
 [41.  52.5  nan]]
Result: 
[[10.5 22.5]
 [41.  52.5]]
Example 2: Here, we remove the columns containing NaN values from a 3×3 NumPy array.
import numpy as np
n_arr = np.array([[10.5, 22.5, 10.5], [41, 52.5, 25], [100, np.nan, 41]])
print("Array:")
print(n_arr)
print("Result: ")
print(n_arr[:, ~np.isnan(n_arr).any(axis=0)])
Output
Array:
[[ 10.5  22.5  10.5]
 [ 41.   52.5  25. ]
 [100.    nan  41. ]]
Result: 
[[ 10.5  10.5]
 [ 41.   25. ]
 [100.   41. ]]
Example 3: In this example, we remove the columns containing NaN values from a 5×3 NumPy array. Two of the three columns contain at least one NaN, so only the first column remains.
import numpy as np
n_arr = np.array([[10.5, 22.5, 3.8],
                  [23.45, 50, 78.7],
                  [41, np.nan, np.nan],
                  [20, 50.20, np.nan],
                  [18.8, 50.60, 8.8]])
print("Array:")
print(n_arr)
print("Result:")
print(n_arr[:, ~np.isnan(n_arr).any(axis=0)])
Output
Array:
[[10.5  22.5   3.8 ]
 [23.45 50.   78.7 ]
 [41.     nan   nan]
 [20.   50.2    nan]
 [18.8  50.6   8.8 ]]
Result:
[[10.5 ]
 [23.45]
 [41.  ]
 [20.  ]
 [18.8 ]]
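Since every example repeats the same expression, it can be convenient to wrap it in a small helper. The sketch below is one possible way to do that; drop_nan_columns is a hypothetical name, not a NumPy function, and it assumes the input is 2-D and can be converted to float. It also shows the edge case where every column contains a NaN, which yields an array with zero columns.

```python
import numpy as np

def drop_nan_columns(a):
    """Return a copy of 2-D array `a` without the columns that contain NaN.

    Hypothetical helper for illustration; assumes `a` converts to float.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2:
        raise ValueError("expected a 2-D array")
    return a[:, ~np.isnan(a).any(axis=0)]

data = np.array([[10.5, 22.5, 3.8],
                 [41.0, np.nan, np.nan]])
cleaned = drop_nan_columns(data)
print(cleaned)

# Edge case: every column has a NaN, so no columns survive
all_bad = drop_nan_columns(np.array([[np.nan, 1.0],
                                     [2.0, np.nan]]))
print(all_bad.shape)
```

An empty result keeps its row count, so downstream code that checks shape[1] can detect that nothing was left.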