A minor problem is doing my head in. I have a dataframe similar to the following:

Number      Title
12345678    A
34567890-S  B
11111111    C
22222222-L  D

This is read from an Excel file using pandas in Python, and the index is then set to the first column:

db = db.set_index(['Number'])

I then lookup Title based on Number:

lookup = "12345678"
title = str(db.loc[lookup, 'Title'])

However, while anything postfixed with "-Something" is found, anything without a postfix is not (e.g. 12345678 finds nothing, but 34567890-S does). My only hunch is that it has to do with looking up strings versus ints, but I've tried a few things (converting the table to all strings, changing loc to iloc, ix, etc.) and so far no luck.

Any ideas? Thanks :)

UPDATE: So trying this from scratch doesn't exhibit the same behaviour (creating a test db presumably just sets everything as strings), however importing from CSV is resulting in the above, and...

Searching "12345678" (as a string) doesn't find it, but 12345678 as an int will. Likewise the opposite for the others. So the dataframe is only matching the pure numbers in the index with ints, but anything else with strings.

Also, I can't leave the postfix out of the search, as I have multiple rows that share a number and differ only in postfix, e.g. 34567890-S, 34567890-L, 34567890-X.

2
  • 1
    Are you sure you don't have blanks after the "numbers"? Commented Apr 17, 2019 at 11:42
  • 1
    This should work if you don't have spaces, as @vercelli mentioned: '12345678' will match but '12345678 ' won't. You can strip the index with df.index = df.index.str.strip() and then use loc[]. Commented Apr 17, 2019 at 11:49

2 Answers

4

If you want to cast all entries to one particular type, you can use pandas.Series.astype:

db["Number"] = db["Number"].astype(str)
db = db.set_index(['Number'])

lookup = "12345678"
title = db.loc[lookup, 'Title']
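To make the cause concrete, here is a minimal sketch with made-up data. It assumes the import left the plain numbers as ints and the suffixed values as strings, which matches the behaviour described in the question's update. The names mixed and fixed are only for illustration.

```python
import pandas as pd

# Hypothetical data: plain numbers arrive as ints, suffixed ones as strings,
# which is what an import of a mixed column can produce.
db = pd.DataFrame({
    "Number": [12345678, "34567890-S", 11111111, "22222222-L"],
    "Title": ["A", "B", "C", "D"],
})

# With a mixed-type index, the string key does not match the int label.
mixed = db.set_index("Number")
found_as_str_before = "12345678" in mixed.index
found_as_int_before = 12345678 in mixed.index

# Cast the column to str before indexing so every label is a string.
db["Number"] = db["Number"].astype(str)
fixed = db.set_index("Number")
title = fixed.loc["12345678", "Title"]
```

After the cast, plain and suffixed numbers are both looked up with string keys.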

Interestingly, this is actually slower than using pandas.Index.map:

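import numpy as np
import pandas as pd

# Inputs of increasing size (10 to 10,000 elements) for each approach
x1 = [pd.Series(np.arange(n)) for n in np.logspace(1, 4, dtype=int)]
x2 = [pd.Index(np.arange(n)) for n in np.logspace(1, 4, dtype=int)]

def series_astype(x1):
    # Convert a Series to strings with astype
    return x1.astype(str)

def index_map(x2):
    # Convert an Index to strings by mapping str over it
    return x2.map(str)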

[Image: timing comparison of series_astype and index_map]
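If you want to check the timing on your own machine, something like the following rough sketch should do. It uses the standard-library timeit module rather than whatever produced the plot, and the size n and the repeat count are arbitrary choices of mine. Results will vary with the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # arbitrary size, chosen only for illustration
series = pd.Series(np.arange(n))
index = pd.Index(np.arange(n))

# Total time for 20 conversions with each approach
t_astype = timeit.timeit(lambda: series.astype(str), number=20)
t_map = timeit.timeit(lambda: index.map(str), number=20)

print(f"Series.astype(str): {t_astype:.4f} s")
print(f"Index.map(str):     {t_map:.4f} s")

# Both approaches should yield the same string labels.
same = list(series.astype(str)) == list(index.map(str))
```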

2 Comments

Thanks, that works. This also worked (after setting the index): db.index = db.index.map(str)
@pepsi_max2k: Added a timing comparison between pandas.Series.astype and pandas.Index.map.
0

Treat all the index values as strings, since at least some of them are not numbers. If you want to look up an item that may have a postfix, you can match on the start of the string with .str.startswith:

lookup = db.index.str.startswith("34567890")
title = db.loc[lookup, "Title"]
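Note that a prefix match can return more than one row, so the result is a Series rather than a single value. A small sketch with made-up data, assuming the index already holds only strings:

```python
import pandas as pd

# Hypothetical data: several rows share a number and differ in postfix.
db = pd.DataFrame({
    "Number": ["12345678", "34567890-S", "34567890-L", "34567890-X"],
    "Title": ["A", "B", "C", "D"],
}).set_index("Number")

# Boolean mask marking every label that starts with the given number
mask = db.index.str.startswith("34567890")

# Selecting with the mask returns all matching titles as a Series
titles = db.loc[mask, "Title"]
titles_list = titles.tolist()
```

If you need exactly one row, look it up by the full label (including the postfix) with db.loc instead.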
