I have a table like this:

col1 col2
ben US-US-Uk
Man Uk-NL-DE
bee CA-CO-MX-MX

How can I remove the duplicate values within each entry of col2, so that the table looks like this?

col1 col2
ben US-Uk
Man Uk-NL-DE
bee CA-CO-MX

I have tried this:

a.cc.str.split('-').unique()

but I get the following error:

TypeError: unhashable type: 'list'

Does anybody know how to do this?

3 Answers

You can use apply to call a lambda function that splits the string and then joins the unique values. Note that set does not preserve the original order of the values:

In [10]:

df['col2'] = df['col2'].apply(lambda x: '-'.join(set(x.split('-'))))
df
Out[10]:
  col1      col2
0  ben     Uk-US
1  Man  Uk-NL-DE
2  bee  CA-CO-MX

Another method:

In [22]:

df['col2'].str.split('-').apply(lambda x: '-'.join(set(x)))

Out[22]:
0       Uk-US
1    Uk-NL-DE
2    CA-CO-MX
Name: col2, dtype: object

Timings:

In [24]:

%timeit df['col2'].str.split('-').apply(lambda x: '-'.join(set(x)))
%timeit df['col2'] = df['col2'].apply(lambda x: '-'.join(set(x.split('-'))))
1000 loops, best of 3: 418 µs per loop
1000 loops, best of 3: 246 µs per loop
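
One caveat with the set-based versions above: set iteration order is arbitrary, so the same row may come out as Uk-US on one run and US-Uk on another. If a stable result matters more than keeping the original order, sorting the unique values is one option. A minimal sketch, where dedupe_sorted is a hypothetical helper name and the sample strings are copied from the question:

```python
def dedupe_sorted(value, sep='-'):
    """Drop duplicate parts of a delimited string and return them sorted."""
    # set() removes the duplicates; sorted() makes the order deterministic
    return sep.join(sorted(set(value.split(sep))))

# Sample values taken from col2 in the question
samples = ['US-US-Uk', 'Uk-NL-DE', 'CA-CO-MX-MX']
deduped = [dedupe_sorted(s) for s in samples]
print(deduped)
```

With the DataFrame from the question, this could be applied as df['col2'].apply(dedupe_sorted). Sorting does change the original order (Uk-NL-DE becomes DE-NL-Uk), so it only helps when a reproducible order is enough.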

I like @EdChum's answer. But reordering the values is disconcerting. It can make both human visual inspections and mechanical comparisons more difficult.

Unfortunately, Python doesn't have a built-in ordered set, which would be the perfect tool here. So:

def unique(items):
    """
    Return unique items in a list, in the same order they were
    originally.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result

df.col2 = df.col2.apply(lambda x: '-'.join(unique(x.split('-'))))

An alternative way of creating an ordered set is with OrderedDict:

from collections import OrderedDict

def u2(items):
    od = OrderedDict.fromkeys(items)
    return list(od.keys())

You can then use u2 instead of unique. Either way, the results are:

  col1      col2
0  ben     US-Uk
1  Man  Uk-NL-DE
2  bee  CA-CO-MX
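
On Python 3.7 and later, plain dicts are guaranteed to keep insertion order, so the same idea can probably be sketched without OrderedDict. The helper name unique_in_order is hypothetical, and the sample strings are copied from the question:

```python
def unique_in_order(value, sep='-'):
    """Drop duplicate parts of a delimited string, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct part, in insertion order
    # (guaranteed for plain dicts from Python 3.7 onward)
    return sep.join(dict.fromkeys(value.split(sep)))

# Sample values taken from col2 in the question
samples = ['US-US-Uk', 'Uk-NL-DE', 'CA-CO-MX-MX']
ordered = [unique_in_order(s) for s in samples]
print(ordered)
```

With the DataFrame from the question, this could be applied as df['col2'].apply(unique_in_order).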

Try this:

col2 = 'CA-CO-MX-MX'
print('-'.join(set(col2.split('-'))))
