
I am reading an Excel table:


import pandas as pd

df = pd.read_excel('file.xlsx', usecols = 'A,B,C')
print(df)

Now I want to create a list containing every row of the table as a string. I also want to append an 'X' to the end of every string in the list:

keylist = []
list1, list2, list3 = df['A'].tolist(), df['B'].tolist(), df['C'].tolist()

for i in zip(list1, list2, list3):
    val = map(str, i)
    keylist.append('/'.join(val))
    keylist += 'X'

print(keylist)

Everything works except appending the 'X'. The result is:

['blue/a/a1', 'X', 'blue/a/a2', 'X', ...]

But what I want is:

['blue/a/a1/X', 'blue/a/a2/X', ...]

Thanks in advance.

  • keylist is a list, so doing += 'X' is the same as extending the list with 'X' as a new element. You want to do something closer to what you do with val. Commented Apr 3, 2018 at 17:08
  • Did you try val = map(str, i) keylist.append('/'.join(val+'X')) in your for loop? Commented Apr 3, 2018 at 17:09
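A minimal sketch of why the question's loop misbehaves: for a list, `+=` with any iterable behaves like `list.extend`, and a string is an iterable of its characters. So `'X'` ends up as a separate element instead of being joined onto the previous string.

```python
# Start with one already-joined key, as the question's loop would produce.
keylist = ['blue/a/a1']

# list += str extends the list with each character of the string,
# exactly like keylist.extend('X').
keylist += 'X'
print(keylist)  # ['blue/a/a1', 'X']

# A longer string is split into several one-character elements.
other = ['blue/a/a1']
other += '/X'
print(other)  # ['blue/a/a1', '/', 'X']
```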

6 Answers

8

I think a better approach is:

d = {'A': ['blue', 'blue', 'blue', 'red', 'red', 'red', 'yellow', 
           'yellow', 'green', 'green', 'green'],
     'B': ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'e', 'f', 'f', 'g'], 
     'C': ['a1', 'a2', 'b1', 'c1', 'c2', 'c3', 'd1', 'e1', 'f1', 'f2', 'g1']}
df = pd.DataFrame(d)
print (df)
         A  B   C
0     blue  a  a1
1     blue  a  a2
2     blue  b  b1
3      red  c  c1
4      red  c  c2
5      red  c  c3
6   yellow  d  d1
7   yellow  e  e1
8    green  f  f1
9    green  f  f2
10   green  g  g1

keylist = df.apply(lambda x: '/'.join(x), axis=1).add('/X').values.tolist()
print (keylist)

['blue/a/a1/X', 'blue/a/a2/X', 'blue/b/b1/X', 'red/c/c1/X', 'red/c/c2/X', 
 'red/c/c3/X', 'yellow/d/d1/X', 'yellow/e/e1/X', 
 'green/f/f1/X', 'green/f/f2/X', 'green/g/g1/X']

Or, if there are only a few columns:

keylist = (df['A'] + '/' + df['B'] + '/' + df['C'] + '/X').values.tolist()
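A self-contained sketch of this vectorised version. The `astype(str)` calls are my own addition, an assumption about the input: `+` between a Series and a string only concatenates when the column already holds strings, and columns read from Excel may contain numbers.

```python
import pandas as pd

# Small sample frame in the same shape as the question's table.
df = pd.DataFrame({'A': ['blue', 'blue', 'red'],
                   'B': ['a', 'a', 'c'],
                   'C': ['a1', 'a2', 'c1']})

# Element-wise string concatenation of whole columns; astype(str) guards
# against numeric cells, which cannot be concatenated with '/' directly.
keylist = (df['A'].astype(str) + '/' + df['B'].astype(str) + '/'
           + df['C'].astype(str) + '/X').tolist()
print(keylist)  # ['blue/a/a1/X', 'blue/a/a2/X', 'red/c/c1/X']
```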

Some timings:

#[110000 rows x 3 columns]
df = pd.concat([df] * 10000, ignore_index=True)

In [364]: %%timeit
     ...: (df['A'] + '/' + df['B'] + '/' + df['C'] + '/X').values.tolist()
     ...: 
60.2 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [365]: %%timeit
     ...: df.apply(lambda x: '/'.join(x), axis=1).add('/X').tolist()
     ...: 
2.48 s ± 39.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [366]: %%timeit
     ...: list1, list2, list3 = df['A'].tolist(), df['B'].tolist(), df['C'].tolist()
     ...: for i in zip(list1, list2, list3):
     ...:     val = map(str, i)
     ...:     keylist.append('/'.join(val))
     ...:     keylist[-1] += '/X'
     ...: 
192 ms ± 78.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [367]: %%timeit
     ...: df.iloc[:,0].str.cat([df[c] for c in df.columns[1:]],sep='/').tolist()
     ...: 
61.1 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [368]: %%timeit
     ...: df.assign(New='X').apply('/'.join,1).tolist()
     ...: 
2.51 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [369]: %%timeit
     ...: ['{0}/{1}/{2}/X'.format(i, j, k) for i, j, k in df.values.tolist()]
74.6 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
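To compare some of these approaches on your own data outside IPython, a small harness along these lines can be used with the standard-library `timeit` module. The sample frame, its size and the repeat count are arbitrary choices of mine, and absolute numbers depend on the machine and pandas version, so no results are shown here.

```python
import timeit

import pandas as pd

# Arbitrary sample data, repeated to make the timings measurable.
base = pd.DataFrame({'A': ['blue', 'red', 'green'],
                     'B': ['a', 'c', 'f'],
                     'C': ['a1', 'c1', 'f1']})
df = pd.concat([base] * 1000, ignore_index=True)


def vectorised():
    return (df['A'] + '/' + df['B'] + '/' + df['C'] + '/X').tolist()


def list_comprehension():
    return ['{0}/{1}/{2}/X'.format(i, j, k) for i, j, k in df.values.tolist()]


def row_apply():
    return df.apply(lambda row: '/'.join(row), axis=1).add('/X').tolist()


# All three must agree before timing them means anything.
assert vectorised() == list_comprehension() == row_apply()

for func in (vectorised, list_comprehension, row_apply):
    seconds = timeit.timeit(func, number=5)
    print(f'{func.__name__}: {seconds / 5 * 1000:.2f} ms per call')
```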

Comments

join works on an iterable; doing it your way will give blue/a/a2X. If you look at my answer, you can easily augment this by adding it to the val sequence, so that join works correctly.
@Fallenreaper ah, but yours will raise a syntax error because of a stray . ;)
Touché. Hahahaha
@Fallenreaper - hmmm, I guess you don't downvote, right?
Again ? sign ...:-(
1

Here is one way using a list comprehension with str.format:

res = ['{0}/{1}/{2}/X'.format(i, j, k) for i, j, k in df.values.tolist()]

# ['blue/a/a1/X', 'blue/a/a2/X', 'blue/b/b1/X', 'red/c/c1/X', ...]

With this solution there is no need to split the data into 3 lists and zip them.
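The comprehension above hard-codes three columns. A sketch that works for any number of columns (my own generalisation, not part of the original answer) joins each row instead; `map(str, ...)` converts any non-string cells first.

```python
import pandas as pd

df = pd.DataFrame({'A': ['blue', 'red'],
                   'B': ['a', 'c'],
                   'C': ['a1', 'c1']})

# itertuples(index=False) yields one namedtuple of cell values per row,
# so this works whatever the number of columns.
res = ['/'.join(map(str, row)) + '/X' for row in df.itertuples(index=False)]
print(res)  # ['blue/a/a1/X', 'red/c/c1/X']
```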


1

Based on pandas:

df.assign(New='X').apply('/'.join, axis=1).tolist()
Out[812]: ['blue/a/a1/X', 'blue/a/a2/X', 'blue/b/b1/X']
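A runnable sketch of this answer: `assign(New='X')` adds a constant column (the name `New` is arbitrary), so the row-wise `'/'.join` picks up the `X` as the last field. Passing `axis=1` by keyword is only a readability choice.

```python
import pandas as pd

df = pd.DataFrame({'A': ['blue', 'blue', 'blue'],
                   'B': ['a', 'a', 'b'],
                   'C': ['a1', 'a2', 'b1']})

# Add a constant last column, then join every row's values with '/'.
# '/'.join receives each row as a Series and iterates over its values.
res = df.assign(New='X').apply('/'.join, axis=1).tolist()
print(res)  # ['blue/a/a1/X', 'blue/a/a2/X', 'blue/b/b1/X']
```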


1

You are doing += on keylist, which adds 'X' to that list as a separate element; you need to add it to val instead.

for i in zip(list1, list2, list3):
    val = list(map(str, i))  # map() is lazy in Python 3, so make it a list first
    val.append('X')  # or combine both steps: val = list(map(str, i)) + ['X']
    keylist.append('/'.join(val))
print(keylist)
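In Python 3, `map()` returns a lazy iterator, which supports neither `+` nor `+=` with a string or a list. One way to append the extra field without building an intermediate list is `itertools.chain`. A sketch, with the question's three lists filled with sample values:

```python
from itertools import chain

# Sample stand-ins for the question's df['A'], df['B'], df['C'] lists.
list1, list2, list3 = ['blue', 'blue'], ['a', 'a'], ['a1', 'a2']

keylist = []
for i in zip(list1, list2, list3):
    # chain() yields the stringified cells first, then the extra 'X'.
    keylist.append('/'.join(chain(map(str, i), ['X'])))

print(keylist)  # ['blue/a/a1/X', 'blue/a/a2/X']
```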

Comments

why? val is a map, so when you use join, it recognizes the entry and adds it there?
It's all good. :) I am just using the variable names the OP uses; otherwise I'd have named them something a bit more human-readable. :)
Sorry, I didn't check it carefully.
0

You can use the str.cat string method to join the columns into a single Series with a specified sep argument, then simply convert the new Series into a list:

df
         A  B   C
0     blue  a  a1
1     blue  a  a2
2     blue  b  b1
3      red  c  c1
4      red  c  c2
5      red  c  c3
6   yellow  d  d1
7   yellow  e  e1
8    green  f  f1
9    green  f  f2
10   green  g  g1

df.iloc[:,0].str.cat([df[c] for c in df.columns[1:]],sep='/').tolist()

['blue/a/a1', 'blue/a/a2', 'blue/b/b1', 'red/c/c1', 'red/c/c2', 'red/c/c3', 'yellow/d/d1', 'yellow/e/e1', 'green/f/f1', 'green/f/f2', 'green/g/g1']
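The output above has no trailing `X`. One way to add it while staying with `str.cat` is to concatenate `'/X'` onto the resulting Series; this sketch uses a shortened sample frame.

```python
import pandas as pd

df = pd.DataFrame({'A': ['blue', 'red'],
                   'B': ['a', 'c'],
                   'C': ['a1', 'c1']})

# str.cat joins the first column with the remaining ones using sep;
# adding '/X' then appends the suffix to every element of the Series.
joined = df.iloc[:, 0].str.cat([df[c] for c in df.columns[1:]], sep='/')
keylist = (joined + '/X').tolist()
print(keylist)  # ['blue/a/a1/X', 'red/c/c1/X']
```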


0

You could append '/X' to the last item in the list on every iteration of the loop:

for i in zip(list1, list2, list3):
    val = map(str, i)
    keylist.append('/'.join(val))
    keylist[-1] += '/X'

# ['blue/a/a1/X', 'blue/a/a2/X',....]
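Because strings are immutable, `keylist[-1] += '/X'` builds a new string and rebinds the last list slot to it, so the list keeps exactly one element per row. A self-contained version of the question's loop with this fix, using sample data in place of the Excel file:

```python
import pandas as pd

# Sample data standing in for pd.read_excel('file.xlsx', usecols='A,B,C').
df = pd.DataFrame({'A': ['blue', 'blue'],
                   'B': ['a', 'a'],
                   'C': ['a1', 'a2']})

keylist = []
list1, list2, list3 = df['A'].tolist(), df['B'].tolist(), df['C'].tolist()

for i in zip(list1, list2, list3):
    val = map(str, i)
    keylist.append('/'.join(val))
    # Replace the just-appended string with a new one ending in '/X'.
    keylist[-1] += '/X'

print(keylist)  # ['blue/a/a1/X', 'blue/a/a2/X']
```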
