32

I have been working on scikit-learn SVMs for a binary classification problem. I have calculated the features of audio files and wrote them into a CSV file. This is how each row in a CSV file looks like:

"13_10 The Long And Winding Road " "[-6.5633095666136669e-16,-1.56E-15,-3.21E-15,-2.20E-
15,-2.52E-15,-3.04E-15,-3.39E-15,-3.47E-15,-3.07E-15,-6.02E-15,-3.00E-15,-4.77E-15,-3.05E-
15,-2.13E-15,-1.57E-15,-1.87E-15,-2.05E-15,-1.76E-15,-1.38E-15,-9.89E-16,-7.89E-16,-8.99E-
16,-1.09E-15,-7.26E-16,-8.68E-16,-4.68E-16,-2.82E-16,-1.99E-16,-1.75E-16,-2.18E-16,-1.43E-
16,-1.56E-16,-1.91E-16,-1.21E-16,-4.82E-17,-4.39E-17,-2.89E-17,-2.05E-17,0.0]" 0

The first column has the name of the Audio, second column has the feature array and the last element is the label {0,1} for binary classification.

There are 39 float values in the array. I am using the following code to extract them from the CSV file.

with open('File.csv', 'rb') as csvfile:
    albumreader = csv.reader(csvfile, delimiter=' ')
    data = list()
    for row in albumreader:
        data.append(row[0:]) 
data = np.array(data)

X_train = list()
Y_train = list()
k = data.shape[0]

for i in range(k):
    feature = data[i][1]
    x = map(float, feature[1:-2].split(','))
    X_train.append(x)
    label = data[i][2]
    y = float(label)
    Y_train.append(y)    

So when I print X_train and Y_train I get exact values in an array. But when I use

clf = svm.SVC(C=1.0, cache_size=200,kernel='linear', max_iter=-1)
clf.fit(X_train,Y_train)

I get the error saying

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-
packages\spyderlib\widgets\externalshell\sitecustomize.py",line 540, in runfile
execfile(filename, namespace)
File "SVM_test.py", line 55, in <module>
clf.fit(X_train,Y_train)
File "sklearn\svm\base.py", line 137, in fit
File "sklearn\utils\validation.py", line 165, in atleast2d_or_csr
File "sklearn\utils\validation.py", line 142, in _atleast2d_or_sparse
File "sklearn\utils\validation.py", line 120, in array2d
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 460, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

Can someone help me as to what I can do now? I am really not sure what is happening inside. Both the dimensions of X_train and Y_train are same [X_train has 21 vectors with 39 elements and Y_train has 21 floats {0 or 1}, I don't see what made these errors.

Note: I have a feeling that something might be wrong while I convert the numpy array to string and then a string to a numpy array. Thanks in advance.

Edit: X_Train is very large. Here it is..

[[93812.4999999983, 73189.57452, 48892.17363, 37682.69053, 33709.51536, 20815.68443,   12476.88854, 13364.13645, 9574.010981, 5844.293383, 7910.017736, 12721.38592, 14184.99241, 6988.131481, 9407.380437, 6333.852471, 5688.156663, 7167.61338, 6911.084942, 9210.064235, 5732.338515, 3585.039683, 4433.278772, 4757.658741, 3387.832928, 2711.640327, 2680.255742, 1649.410788, 2024.333977, 997.2348795, 1102.115501, 1386.86396, 1160.477719, 883.941971, 881.2712624, 749.3620066, 885.6355941, 514.1635441, 0.0], [93411.33935126709, 90714.51224, 89773.71828, 61018.71033, 28082.94493, 10120.93228, 11106.07725, 6204.140734, 5968.528906, 4970.099848, 6967.870007, 6990.611982, 7656.630743, 6615.957476, 5573.621516, 8957.245225, 8512.408652, 6976.021692, 7774.215884, 5301.046573, 4666.784091, 2539.587812, 2953.578612, 3529.863917, 2365.101263, 2579.870258, 2890.325096, 3302.179572, 2078.005268, 1425.18236, 1297.961119, 736.4896705, 640.0635888, 819.022382, 659.9559469, 438.2773842, 359.3957991, 193.9937669, 0.0], [95528.45827960827, 79000.64725, 75540.32258, 47915.39365, 29573.63325, 13554.15721, 10101.04124, 6935.685456, 13681.96711, 7726.754596, 9413.96529, 9468.785586, 10479.23762, 10070.81121, 8893.475453, 9517.553541, 8493.077533, 8021.721412, 8568.069341, 7687.282084, 9902.16325, 5442.263263, 5575.258138, 4748.557573, 4580.647869, 3014.91771, 3958.708771, 2851.846841, 3407.31788, 1982.369432, 1937.459179, 1689.049684, 1457.579778, 1055.411047, 1048.471861, 661.6174333, 827.8371903, 414.802354], [101683.46698748806, 62367.04137, 66444.15995, 49621.45404, 31623.19485, 16585.34427, 12271.46378, 12114.5615, 6666.281052, 9335.886213, 19314.70299, 22588.00911, 14133.31813, 12723.03772, 7994.399321, 11447.449, 15457.39519, 7419.208867, 9286.751692, 6128.746537, 5617.886066, 4461.131891, 4651.73188, 5835.270092, 3876.10397, 4499.228748, 2661.999151, 1431.362029, 1378.115091, 1048.827946, 1470.297845, 1087.453644, 825.6318213, 861.5003481, 804.8519616, 397.0719915, 368.8037827, 293.36727], [96614.66763477474, 89674.79785, 73045.22026, 55387.48162, 32450.76131, 26161.93729, 16379.95699, 13446.77762, 6178.297767, 4499.9064, 6128.624979, 4928.968691, 7139.579976, 6442.404748, 7303.917218, 9064.476552, 8246.412739, 4526.169172, 4931.980606, 4022.38625, 3193.080061, 3991.709836, 4894.262891, 4523.545798, 5013.65655, 3165.268896, 2252.272798, 1971.857637, 1543.455559, 1248.305408, 1340.303682, 1069.466847, 1062.971087, 596.4763587, 541.7390803, 481.9598053, 261.6165905, 135.050925], [77116.86410716272, 85174.88022, 48949.81474, 39272.16867, 28721.41507, 26604.82082, 17057.75385, 11417.45143, 12775.94149, 8095.318819, 8318.738856, 7768.406613, 9501.155323, 8215.579012, 5801.439936, 6997.611748, 8358.126592, 6710.072432, 7903.976639, 4770.389995, 4443.449546, 3622.278619, 3628.985312, 4025.879147, 3378.124716, 1681.144815, 1873.675902, 1813.454359, 1203.261884, 734.9896092, 612.7767898, 581.1641439, 554.9952946, 338.9208239, 329.6306536, 210.3361409, 124.684456, 95.1698974], [86000.24134707314, 54315.80346, 61723.06357, 48194.93238, 34145.18298, 18060.21908, 17759.95552, 13594.71484, 10034.81255, 6892.428679, 13609.12234, 11345.97425, 12640.27575, 13636.73634, 8353.154837, 11543.51778, 9620.892875, 5364.536625, 6645.647746, 6939.929388, 6404.367983, 4279.002491, 5473.449778, 5173.72645, 4161.012572, 3189.349797, 1868.016199, 2370.813774, 1991.805589, 1862.750613, 1535.097522, 1195.019326, 824.4997101, 836.5762868, 758.8865079, 739.0096703, 426.339462, 495.362511], [88356.8775920093, 68677.18631, 56499.17126, 41069.83582, 34004.99481, 21584.94408, 16827.63584, 10875.88263, 8838.404327, 10399.33201, 10247.97332, 11592.57345, 6888.99984, 8027.86374, 4396.353004, 4926.542018, 4160.408132, 4829.051031, 5104.507749, 4445.908694, 4113.401198, 2070.059053, 2331.063956, 3091.764189, 2708.490628, 1357.792132, 1476.379979, 1099.46743, 895.2046416, 1017.410994, 855.9326154, 807.2299975, 817.8896259, 688.1633806, 620.1147918, 404.4791452, 355.3012015, 155.124636], [129161.3158422606, 99871.12426, 69682.53863, 42152.57846, 27722.10719, 16851.46834, 12503.65957, 15820.8482, 10208.86252, 3737.281589, 11388.29292, 9216.418551, 8412.969115, 8915.691889, 7214.795344, 6312.935476, 5691.760401, 4452.333587, 6080.803383, 3169.211512, 4640.513939, 2965.070935, 2603.678979, 3427.596811, 2650.097593, 3407.197764, 2399.210804, 1585.540133, 900.6057596, 1562.799097, 1414.458688, 1085.727804, 862.853398, 1046.809149, 1299.422095, 452.1395434, 416.0278005, 342.487369], [97676.58730158686, 85928.37013, 60031.54702, 50283.65633, 30440.49477, 23396.44028, 17693.84492, 13834.72723, 13079.6, 9484.172923, 11026.12866, 15489.77935, 14751.23748, 7719.575611, 6916.062149, 9947.922301, 9860.230801, 6685.554777, 5314.504743, 6412.026375, 5126.472976, 3994.412881, 3469.94381, 3087.75188, 2150.012155, 2510.441776, 1633.896465, 1468.22101, 1451.997957, 1594.288508, 1208.749937, 1539.411357, 846.1440547, 1015.738147, 760.9050287, 531.4752058, 352.2906744, 256.992846], [99873.48353552721, 96128.33417, 56062.95108, 48316.51261, 33803.61475, 20090.40769, 14532.69355, 16973.62408, 11745.412, 10555.56359, 12415.12332, 11311.00716, 13055.02538, 13457.43473, 11949.02017, 13726.34027, 13210.19444, 6924.913491, 7526.293551, 6489.797287, 7504.193589, 3693.345327, 3173.144967, 4589.951959, 3817.607517, 2296.577132, 4241.66248, 2298.259695, 2104.233705, 1894.800787, 1435.902299, 1237.861542, 1008.052264, 743.557111, 447.3644689, 360.231905, 263.6887002, 252.53243], [118318.40927047582, 96894.04475, 72455.95855, 53538.90521, 34270.2485, 14028.66282, 6110.994324, 10831.06944, 6500.061124, 5648.546259, 9746.722376, 11098.67455, 12414.31738, 11859.15818, 5661.36057, 6467.490449, 7160.019668, 4986.101354, 4805.715894, 4384.860917, 4818.433908, 2776.480858, 2906.711958, 4180.355966, 3029.563639, 2121.677425, 2977.055372, 1650.875378, 1328.284924, 1641.967101, 1374.844716, 1269.983055, 756.2822371, 746.9782069, 635.1025738, 901.5181204, 500.4240422, 124.234986], [99496.1074660524, 91134.19642, 64615.65163, 51749.95315, 27017.75136, 17498.19736, 8686.464718, 6354.494714, 6279.181765, 6011.661362, 9583.683802, 16802.58819, 12848.82539, 12448.85086, 9717.906293, 6025.712047, 8968.944145, 6116.427844, 8009.500521, 5857.252734, 5994.629798, 4602.865888, 5568.279578, 3847.961198, 3664.838032, 2285.641295, 2343.300802, 1538.656643, 1595.004126, 1438.685894, 1278.233128, 1138.847548, 1387.660031, 727.3346259, 443.3437923, 399.422316, 202.3671643, 210.818774], [97897.81181619188, 81534.24658, 86124.34023, 55859.41234, 43498.35095, 16317.93548, 9240.704588, 8335.639737, 5398.77203, 2959.587234, 7638.934756, 9237.569061, 9669.92492, 6395.762472, 5297.481894, 4628.757031, 5965.00084, 5360.168945, 4918.802753, 5403.035015, 7760.124783, 4316.46269, 3586.003412, 4862.517393, 2722.334238, 1950.153709, 2308.64693, 1738.602095, 1431.956923, 1195.875585, 903.5619486, 628.8441079, 378.5951575, 279.5559759, 290.8523867, 185.8872588, 124.4224622, 102.6474251], [92929.03125177219, 66037.16827, 85713.22692, 60594.81708, 21299.03928, 9728.745394, 7164.560274, 7530.287996, 3986.197072, 4768.423334, 7965.588661, 6884.742393, 7813.113615, 6783.772795, 5068.375149, 5563.205324, 4549.089711, 4178.977925, 7176.864923, 3595.204266, 4075.654498, 3667.874878, 5018.867408, 4632.204595, 4236.022945, 2419.634542, 1965.732854, 2017.314496, 1125.444672, 1776.994722, 1380.972752, 877.5693874, 1048.039171, 698.4293241, 587.3589805, 425.3561446, 374.9688448, 242.143167], [112279.4547224921, 85626.58906, 82479.88981, 41194.72139, 18581.67331, 17171.661, 11041.06798, 7470.697485, 5647.489476, 5413.921458, 6258.45235, 7817.02576, 5690.588758, 5018.057148, 2835.675844, 4192.365122, 5264.669752, 2899.863762, 4722.075443, 4359.368543, 4475.52712, 4364.193393, 4366.760559, 4466.265791, 3581.127965, 3229.694902, 3061.592084, 2761.368431, 2924.520852, 2278.74424, 1842.130366, 1353.160812, 1061.970453, 801.4987863, 559.7692834, 542.6125554, 365.0923416, 345.207555], [103517.74691357967, 75660.08945, 77823.62831, 44178.34395, 30627.84085, 16822.14795, 12153.82383, 10477.03604, 6737.154621, 3948.567091, 5952.492101, 6657.190597, 8458.524435, 4644.542091, 3262.595869, 6196.748153, 4725.493005, 3131.648336, 3043.832975, 2397.211069, 2221.444205, 1846.007568, 1906.256992, 2565.899774, 1879.678929, 1983.431392, 2057.925713, 1379.158985, 1161.566123, 1269.932159, 1882.60896, 2175.463202, 1945.131584, 2617.451168, 1724.479089, 934.2682688, 703.1608361, 325.546], [93568.70860926574, 94747.49539969431, 46432.77848910925, 34021.920729891295, 20420.991633759644, 11780.466421174959, 11808.677934216039, 8356.053623407755, 5251.866299007888, 1837.1346095714694, 3388.9444867864604, 4160.840941722876, 3099.1858407062873, 2498.7020047692304, 2320.2543950190047, 3123.103649596546, 2353.3994748227874, 2282.188646923959, 2461.343070326571, 1867.3943363024234, 2097.623570329987, 2009.6550578189285, 3308.2027220730074, 2765.1388333698114, 1557.1149504889588, 1365.611056602633, 1960.1916988919656, 1357.4554303292293, 1174.7058492639005, 1132.4243713198962, 845.4972050001261, 1168.2703928255426, 792.9426082625839, 670.3864757268884, 462.94718979251763, 418.287362938786, 440.0999154920886, 177.49705973335398, 0.0], [95355.29766162521, 80982.187596427, 52611.83577098383, 32094.021907352668, 17954.16900608238, 10539.432714398477, 8455.780912660928, 7826.206728864228, 6509.019127983875, 3428.593131775805, 4133.834750579424, 4897.866949416399, 4527.962919676826, 3097.9755890532115, 2644.294656259542, 3715.9623636641186, 2820.3307205895694, 2502.4555417041665, 4294.009887075389, 3305.480815069842, 3473.739729060158, 3436.008663252062, 2646.057627969427, 2915.118003316749, 2807.214040627724, 2182.2047542975124, 2307.7279832228096, 2051.914227220658, 1701.1785697138466, 1387.86622139378, 1717.4780638249865, 1444.4320186566786, 1543.3397450160378, 1008.7972827019012, 804.7763630817929, 727.0076251244793, 661.7971983605773, 328.023137248546, 0.0], [106950.75545607107, 83171.33894927795, 98570.84168082179, 53995.601284217235, 34045.2113137451, 32682.002511908893, 20258.01771044016, 17863.78159524713, 10999.026649078776, 7606.650910143417, 8186.182643934389, 12307.240199947704, 6014.871257290792, 4781.08981401508, 5131.609324634855, 4391.107045269739, 4364.496837469433, 3795.810404058682, 5693.929878923241, 3511.0866864164072, 4967.40355405853, 3290.291028496737, 2401.232195128987, 2787.2578565673602, 2210.985797970096, 2106.714353398232, 1799.725035771931, 2223.1076215378416, 1189.234114777526, 1003.1624544891614, 1046.7700894681655, 812.1805254193989, 750.3209854314467, 893.172975198784, 492.44092578555313, 379.87738447537436, 169.4616484512177, 100.56120686339501, 0.0]]

Y_Train is just the labels for individual sets of features. It is like this:

[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

Hope that helps!

8
  • Is this your actual indentation? It's a little unusual to me that you do feature = data[i][1] inside a loop, but then don't use feature at all until the loop is over. And something has to be inside the with open('File.csv', 'rb') as csvfile: block, or the code would crash with an IndentationError. Commented Aug 25, 2014 at 12:13
  • Sorry @Kevin, thanks for pointing it out. I did not check it properly. I edited it now. Commented Aug 25, 2014 at 12:46
  • Please show what X_train and Y_train look like. Commented Aug 25, 2014 at 14:44
  • 3
    If you do X_train = np.asarray(X_train) and similarly for Y_train, what are the shapes? (NumPy arrays are also easier to print.) Commented Aug 25, 2014 at 14:56
  • 2
    This points to non uniform length lists. If np.asarray(X_train).dtype is o or object, then check out np.unique(map(len, X_train)) and see how many values there are. There should be only one. Commented Aug 25, 2014 at 19:47

3 Answers 3

61

Finally I found the answer to my question with the help of some ideas from @larsmans and @eickenberg. The problem was that X_train did not have the same number of elements in all the arrays. So, it was not able to form a 2D array. Now that I have added an additional value to that array, the dimensionality matched for all the 1D arrays and X_train was able to form a 2D array. Thanks for the ideas guys!

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks. Feature extractor module had been returning a None value in case of exception rather than creating a zero vector - thus the same problem.
I have the same issue. What extra values did you pad it with? Zeros?
@Nikki_Champ can you tell the exact steps you did for that. I am also facing the same issue.
Great solution. The problem was with different size sub arrays.
18

I had the same problem. The size of the data was consistent, so that did not help. What did work: clf.fit(list(X_train),Y_train)

2 Comments

Abso-freaking-lutely! Seems to be an issue with GridSearchCV after updating from Python 2 to Python 3
It did work for me, not sure why it didn't work without list, it was np.array
2

I see the error when X_train has n elements and all n elements are of not same size. Workaround is to make sure all n elements in the X_train are of same size either by padding or trimming techniques.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.