All,
I am computing some feature values with the following Python code. Because the inputs are very large, it is very time-consuming. Please help me optimize it.
leaving_volume=len([x for x in pickup_ids if x not in dropoff_ids])
arriving_volume=len([x for x in dropoff_ids if x not in pickup_ids])
transition_volume=len([x for x in dropoff_ids if x in pickup_ids])
union_ids=list(set(pickup_ids + dropoff_ids))
busstop_ids=[x for x in union_ids if self.geoitems[x].fare>0]
busstop_density=np.sum([Util.Geodist(self.geoitems[x].orilat, self.geoitems[x].orilng, self.geoitems[x].destlat, self.geoitems[x].destlng)/(1000*self.geoitems[x].fare) for x in busstop_ids])/len(busstop_ids) if len(busstop_ids) > 0 else 0
busstop_ids=[x for x in union_ids if self.geoitems[x].balance>0]
smartcard_balance=np.sum([self.geoitems[x].balance for x in busstop_ids])/len(busstop_ids) if len(busstop_ids) > 0 else 0
Hi, All,
Here is my revised version. I ran this code on my GPS trace data, and it is faster.
intersect_ids=set(pickup_ids).intersection( set(dropoff_ids) )
union_ids=list(set(pickup_ids + dropoff_ids))
leaving_ids=set(pickup_ids)-intersect_ids
leaving_volume=len(leaving_ids)
arriving_ids=set(dropoff_ids)-intersect_ids
arriving_volume=len(arriving_ids)
transition_volume=len(intersect_ids)
busstop_density=np.mean([Util.Geodist(self.geoitems[x].orilat, self.geoitems[x].orilng, self.geoitems[x].destlat, self.geoitems[x].destlng)/(1000*self.geoitems[x].fare) for x in union_ids if self.geoitems[x].fare>0])
# np.mean of an empty list is nan, and "not nan > 0" is True, so this also covers the empty case
if not busstop_density > 0:
    busstop_density = 0
smartcard_balance=np.mean([self.geoitems[x].balance for x in union_ids if self.geoitems[x].balance>0])
# same guard as above: resets nan (empty input) to 0
if not smartcard_balance > 0:
    smartcard_balance = 0
Many thanks for the help.
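One caveat about the revised version: converting the id lists to sets removes duplicate ids, so leaving_volume, arriving_volume and transition_volume can come out smaller than in the original list comprehensions whenever an id appears more than once. If the duplicate counts matter, a minimal sketch like the one below keeps them while still using fast set lookups. It also avoids calling np.mean on an empty list, which returns nan and emits a RuntimeWarning. The helper names volumes and mean_or_zero and the GeoItem stand-in are hypothetical, made up for illustration; they are not part of the original code.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects stored in self.geoitems
GeoItem = namedtuple("GeoItem", ["fare", "balance"])


def volumes(pickup_ids, dropoff_ids):
    """Return (leaving, arriving, transition) counts.

    Membership is tested against sets (O(1) on average), but the loops
    still run over the original lists, so duplicate ids are counted
    exactly as in the original list comprehensions.
    """
    pickup_set = set(pickup_ids)
    dropoff_set = set(dropoff_ids)
    leaving = sum(1 for x in pickup_ids if x not in dropoff_set)
    arriving = sum(1 for x in dropoff_ids if x not in pickup_set)
    transition = sum(1 for x in dropoff_ids if x in pickup_set)
    return leaving, arriving, transition


def mean_or_zero(values):
    """Arithmetic mean of values, or 0 when there are none."""
    values = list(values)
    return sum(values) / len(values) if values else 0


if __name__ == "__main__":
    pickup_ids = [1, 2, 2, 3]
    dropoff_ids = [3, 4, 4]
    print(volumes(pickup_ids, dropoff_ids))  # (3, 2, 1)

    # Made-up sample data, only to exercise mean_or_zero
    geoitems = {
        1: GeoItem(fare=2.0, balance=10.0),
        2: GeoItem(fare=0.0, balance=0.0),
        3: GeoItem(fare=4.0, balance=30.0),
        4: GeoItem(fare=0.0, balance=0.0),
    }
    union_ids = set(pickup_ids) | set(dropoff_ids)
    smartcard_balance = mean_or_zero(
        geoitems[x].balance for x in union_ids if geoitems[x].balance > 0
    )
    print(smartcard_balance)  # 20.0
```

The same mean_or_zero helper could wrap the busstop_density expression; it is left out here because it depends on Util.Geodist, which is not shown in this post.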