I tried to use a nested list to hold scraped data from HTML, but after about 50,000 list appends I got a MemoryError, so I decided to switch from lists to a NumPy array:
```python
import numpy as np

SapList = []
ListAll = np.array([])

def eachshop():  # fill SapList with one shop's data
    global ListAll
    SapList.append(RowNum)
    SapList.extend([sap])  # from 1 to 10 values in one list: ["sap1", "sap2", "sap3", ..., "sap10"]
    SapList.extend([[strLink, ProdName], ProdCode, ProdH, NewPrice, OldPrice,
                    [FileName + '#Komp!A1', KompPrice], [FileName + '#Sav!A1', 'Sav']])
    SapList.extend([ss])  # from 0 to 80 sublists of 3 values: [["id1", "link", "address"], ..., ["id80", "link", "address"]]
    ListAll = np.append(np.array(SapList))
```
Then when I do `print(ListAll)` I get an exception at `ListAll = np.append(np.array(SapList))` (C:\Python36\scrap.py, line 307): `setting an array element with a sequence`.
Now, to speed things up, I am using `pool.map`:
```python
from multiprocessing.pool import ThreadPool

def makePool(cP, func, iters):
    try:
        pool = ThreadPool(cP)
        # iterate over the URLs
        pool.map_async(func, enumerate(iters, start=2)).get(99999)
        pool.close()
        pool.join()
    except:
        print('Pool Error')
        raise
    finally:
        pool.terminate()
```
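As an aside, `ThreadPool` supports the context-manager protocol, which handles the close/join/terminate bookkeeping for you. A minimal sketch (the `fetch` worker here is a hypothetical placeholder for the real scraping function):

```python
from multiprocessing.pool import ThreadPool

def fetch(item):
    idx, url = item  # enumerate() yields (index, value) pairs
    return idx, len(url)  # placeholder for the real scraping work

def make_pool(n_workers, func, iters):
    # The context manager closes and joins the pool even if an error occurs.
    with ThreadPool(n_workers) as pool:
        return pool.map(func, enumerate(iters, start=2))

results = make_pool(4, fetch, ["http://a", "http://bb"])
print(results)  # [(2, 8), (3, 9)]
```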
So how can I use a NumPy array in my example, reduce memory usage, and speed up the I/O operations using NumPy?
What is `ListAll = np.append(np.array(SapList))` supposed to be doing? It's obviously not going to append anything to `ListAll`; it's going to call `append` on nothing but the temporary array created from `SapList`, then store the result in `ListAll`, replacing whatever used to be there. I'm pretty sure that's not what you want, but I'm not sure what you do want, so I can't tell you how to fix it.

`ListAll = np.append(np.array(SapList))` is not the same as `ListAll.append([SapList])`, which would call an `append` method on `ListAll`. The former calls an `append` function on the `np` module, doesn't even pass `ListAll` to it, and then just assigns the result to `ListAll`.