
I get values from an order book as a list like this:

list1 = [...,'ethbtc', '0.077666', '10', '0.077680', '15',...]
------------------------^symbol-----^value-----^quantity--

There are around 100 symbols in this list and 40 values for each symbol. They are always in the same order.
I would like to find out the maximum price my system would pay right now if I spend, say, 100% of my balance.

So if I want to buy 11 ETH at 0.077666, the real price would be 0.077680, because only 10 ETH are available at the first price.
I don't want the average price, because that would be too much at the moment.

My code has a nested for loop and loops through 2 lists:

  1. symbollist, where all 100 symbols are listed like this: symbollist = ['ethbtc', 'eoseth',...]
  2. a list of offsets called a, because the values and quantities are always at the same positions
    a = [1, 3, 5, ...]

My Code:

for symbolnow in symbollist:
    sumlist = []
    for i in a:
        quantity = float(list1[list1.index(symbolnow) + (i+1)] if symbolnow in list1 else 0)
        sumlist.append(quantity)
        if sum(sumlist) > mycurrentbalance:
            maxvalue = float(list1[list1.index(symbolnow) + i] if symbolnow in list1 else -1)
            break
        else:
            maxvalue = -1

So what does this code do:
1) loop through every symbol in the symbollist
2) for every found symbol i look for the available quantity
3) if my balance (i.e. 10 ETH) is smaller than qty the loop breaks
4) if not keeps searching and summarizing every qty in a sum list until there is enough.

The code works as intended, but it is not very fast. As expected, list1.index takes a long time to execute.

Question
What would faster code look like? Is a list comprehension better in this scenario, or even regex? Is my code very ugly?

Thank you in advance!

EDIT:
to clarify the input and desired output, a sample:

list1 = [...,'ethbtc', '0.077666', '1', '0.077680', '1.5', '0.077710', '3', '0.078200', '4',...]
mycurrentbalance = 5.5 <-- balance is in ETH
every second entry after the symbol is the quantity in ETH, so for this sample the quantities are ['1', '1.5', '3', '4']

So if I want to sell all of my ETH (in this case 5.5), the max value would be '0.077710'.

list1 contains 100 symbols, so before and after 'ethbtc' there are other symbols, values, and quantities.

1
  • Can you post a few sample lines of input data and the corresponding desired output? Commented May 25, 2018 at 12:25

3 Answers

3

Preprocess list1 and store it in a dict. This means you only iterate over list1 once instead of every time your inner loop runs.

price_dict = {'ethbtc': ['0.077666', '10', '0.077680', '15'], 'btceth': [...], ...}

Instead of iterating over a, iterate over a range (Python 3) or xrange (Python 2). Both produce the indices lazily instead of building a list, and they make your code more flexible because the length comes from the data itself.

range(0, len(price_dict[symbol]), 2)
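As a rough sketch of this preprocessing step, assuming every symbol in list1 is followed by alternating price and quantity strings until the next symbol. The helper name build_price_dict and the 'eoseth' numbers are made up for illustration and are not part of the original code:

```python
def build_price_dict(flat_list, symbols):
    """Group a flat [symbol, price, qty, price, qty, ...] list by symbol.

    Assumption: every entry that is not a known symbol belongs to the
    most recently seen symbol.
    """
    symbol_set = set(symbols)  # O(1) membership tests
    price_dict = {}
    current = None
    for entry in flat_list:
        if entry in symbol_set:
            current = price_dict.setdefault(entry, [])
        elif current is not None:
            current.append(entry)
    return price_dict


# Illustrative data; the 'eoseth' values are invented for this sketch.
list1 = ['ethbtc', '0.077666', '10', '0.077680', '15',
         'eoseth', '0.021000', '200']
price_dict = build_price_dict(list1, ['ethbtc', 'eoseth'])

levels = price_dict['ethbtc']
# Prices sit at even offsets, quantities at odd offsets.
prices = [float(levels[i]) for i in range(0, len(levels), 2)]
quantities = [float(levels[i]) for i in range(1, len(levels), 2)]
```

list1 is walked once here, so the cost no longer grows with the number of .index calls made per symbol.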

2 Comments

thank you, how can I get i.e. the value '10' in price_dict?
you can get the prices with range(0, len(price_dict[symbol]), 2) and the quantities with range(1, len(price_dict[symbol]), 2)
1

In your case I think a slice object would help with your 'a' loop, if there is a fixed interval. You can save a slice in an object, as shown below (along with one or two other tips). I agree with the answer above that if you have a chance to pre-process the input data, you really should. I would recommend the pandas library for that, because it is very fast, but dictionaries also give you hash-based lookups.

input_data = ['ethbtc', '0.077666', '10', '0.077680', '15']  # Give your variables meaningful names

length = 20 # a variable to store how long a list of values is for a particular symbol.

for symbol in symbollist:  # Use meaningful names in loops too
    start = input_data.index(symbol)  # break up longer lines
    # Some exception handling here
    # Quantities sit at start+2, start+4, ...; slice() takes commas, not colons
    indxs = slice(start + 2, start + 1 + length, 2)  # python lets you create slice objects
    quantities = [float(number) for number in input_data[indxs]]

    if sum(quantities) > mycurrentbalance:
        # Whatever code here
        ...
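A runnable sketch of the slice-object idea, using the sample numbers from the question's edit. Two things here are assumptions layered on top of the answer: the positions dict, which records where each symbol starts in a single pass so that .index is not called inside the loop, and the name values_per_symbol for the fixed number of entries after each symbol:

```python
input_data = ['ethbtc', '0.077666', '1', '0.077680', '1.5',
              '0.077710', '3', '0.078200', '4']
symbollist = ['ethbtc']
values_per_symbol = 8  # 4 price/quantity pairs in this sample

# One pass over the data instead of one .index() call per symbol.
symbol_set = set(symbollist)
positions = {entry: i for i, entry in enumerate(input_data)
             if entry in symbol_set}

for symbol in symbollist:
    start = positions[symbol]
    # Prices are at start+1, start+3, ...; quantities at start+2, start+4, ...
    price_idx = slice(start + 1, start + 1 + values_per_symbol, 2)
    quantity_idx = slice(start + 2, start + 1 + values_per_symbol, 2)
    prices = [float(x) for x in input_data[price_idx]]
    quantities = [float(x) for x in input_data[quantity_idx]]
```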


0

In addition to the answer from user3080953, you have to preprocess your data not only because that will be more efficient, but because it will help you to handle the complexity. Here, you are doing two things at once: decoding your list and using the data. First decode, then use.

The target format should be, in my opinion:

prices_and_quantities_by_symbol = {
    'ethbtc': {
        'prices':[0.077666, 0.077680, 0.077710, 0.078200], 
        'quantities':[1, 1.5, 3, 4]
    }, 
    'btceth': {
        ...
    }, 
...}

Now, you just have to do:

def symbols_max_values(prices_and_quantities_by_symbol, my_current_balance):
    # yield is only valid inside a function, so the loop lives in a generator
    for symbol, prices_and_quantities in prices_and_quantities_by_symbol.items(): # O(len(symbol_list))
        total = 0
        for p, q in zip(prices_and_quantities["prices"], prices_and_quantities["quantities"]): # O(len(quantities))
            total += q # the running sum
            if total >= my_current_balance:
                yield symbol, p # this will yield the symbol and the associated max_value
                break

How to get the data in the target format? Just iterate over the list and, if you find a symbol, begin to store the values and quantities until the next symbol:

prices_and_quantities_by_symbol = {}
symbol_set = set(symbol_list) # O(len(symbol_list))
for i, v in enumerate(list1): # O(len(list1))
    if v in symbol_set:  # amortized O(1) lookup
        current_prices = []
        current_quantities = []
        current_start = i+1
        prices_and_quantities_by_symbol[v] = {
            'prices':current_prices, 
            'quantities':current_quantities
        }
    else: # a value or a quantity
        (current_prices if (i-current_start)%2==0 else current_quantities).append(float(v))

There is a small but worthwhile optimization, especially if your lists of quantities/values are long. Don't store each quantity, store the running total of quantities:

prices_and_running_totals_by_symbol = {
    'ethbtc': {
        'prices':[0.077666, 0.077680, 0.077710, 0.078200], 
        'running_total':[1, 2.5, 5.5, 9.5]
    }, 
    'btceth': {
        ...
    }, 
...}
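If the per-symbol quantities are already available as a list, one way to derive the running totals is itertools.accumulate from the standard library. A small sketch with the sample quantities from the question:

```python
from itertools import accumulate

quantities = [1, 1.5, 3, 4]

# accumulate yields the partial sums 1, 1+1.5, 1+1.5+3, ...
running_total = list(accumulate(quantities))
```

Because quantities are never negative, the resulting list is sorted in ascending order, which is what a binary search needs.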

Now you can find your max_value very quickly using bisect. The code also becomes easier to understand, since bisect.bisect_left(rts, my_current_balance) returns the index of the first running total >= my_current_balance (or len(rts) if the balance exceeds every running total):

import bisect

def symbols_max_values(prices_and_running_totals_by_symbol, my_current_balance):
    for symbol, prices_and_running_totals in prices_and_running_totals_by_symbol.items(): # O(len(symbol_list))
        ps = prices_and_running_totals["prices"]
        rts = prices_and_running_totals["running_total"]
        i = bisect.bisect_left(rts, my_current_balance) # O(log(len(rts)))
        if i < len(rts): # skip symbols whose book cannot absorb the whole balance
            yield symbol, ps[i] # this will yield the symbol and the associated max_value
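To make the lookup concrete, here is a small sketch on the sample order book from the question. The helper max_price is a hypothetical name, and returning None when the balance is larger than the whole book is an assumption about the desired behaviour:

```python
import bisect

prices = [0.077666, 0.077680, 0.077710, 0.078200]
running_total = [1, 2.5, 5.5, 9.5]


def max_price(balance):
    """Return the price level at which `balance` is fully filled, or None."""
    i = bisect.bisect_left(running_total, balance)
    return prices[i] if i < len(running_total) else None


fits_exactly = max_price(5.5)  # 5.5 is reached exactly at the third level
spills_over = max_price(5.6)   # needs part of the fourth level
too_large = max_price(20)      # more than the whole book holds
```

bisect_left is used rather than bisect_right so that a balance exactly equal to a running total stops at that level instead of moving on to the next one.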

To build the running total, you have to handle differently the prices and the quantities:

# O(len(list1))
...
if v in symbol_set:  # amortized O(1) lookup
    ...
elif (i-current_start)%2==0:
    current_prices.append(float(v))
else:
    current_running_totals.append((current_running_totals[-1] if current_running_totals else 0.0) + float(v))

Put everything into functions (or better, methods of a class):

prices_and_running_totals_by_symbol = process_data(list1)
for symbol, max_value in symbols_max_values(prices_and_running_totals_by_symbol, my_current_balance):
    print(symbol, max_value)
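One possible shape for these two functions, as a sketch rather than a definitive implementation. It assumes the flat list layout from the question, and unlike the call above, process_data here takes the symbol list as a second parameter, which is an assumption of this sketch:

```python
import bisect


def process_data(flat_list, symbols):
    """Decode [symbol, price, qty, ...] into prices and running totals."""
    symbol_set = set(symbols)
    result = {}
    prices = running_totals = None
    is_price = True
    for entry in flat_list:
        if entry in symbol_set:
            # A new symbol starts a fresh pair of lists.
            prices, running_totals = [], []
            result[entry] = {'prices': prices, 'running_total': running_totals}
            is_price = True
        elif prices is not None:
            if is_price:
                prices.append(float(entry))
            else:
                previous = running_totals[-1] if running_totals else 0.0
                running_totals.append(previous + float(entry))
            is_price = not is_price  # entries alternate price, quantity
    return result


def symbols_max_values(data, balance):
    """Yield (symbol, price) for every symbol whose book can absorb `balance`."""
    for symbol, book in data.items():
        i = bisect.bisect_left(book['running_total'], balance)
        if i < len(book['running_total']):
            yield symbol, book['prices'][i]


list1 = ['ethbtc', '0.077666', '1', '0.077680', '1.5',
         '0.077710', '3', '0.078200', '4']
data = process_data(list1, ['ethbtc'])
result = list(symbols_max_values(data, 5.5))
```

Symbols whose total quantity is smaller than the balance are skipped rather than reported with a placeholder such as -1. That is a design choice of this sketch, not something the question prescribes.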

You can see how, by splitting the problem in two parts (decode and use), the code becomes faster and (in my opinion) easier to understand (I didn't put the comments, but they should be there).
