I am very new to Python and this is probably a simple question, but I cannot seem to find a solution.

I have several pandas data frames with names like: output_1, output_2, ..., output_n

I want to sum their lengths (as in the number of their rows) and I came up with something like this:

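total = 0
for num in range(1, n + 1):  # n + 1 so that output_n is included
    nameframe = "output_" + str(num)
    total += nameframe.shape[0]  # AttributeError: 'str' object has no attribute 'shape'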

The problem is that Python sees nameframe as a string, not as the name of a dataframe.

Looking around I found a potential solution:

total = 0
for num in range(1, n + 1):  # n + 1 so that output_n is included
    x = globals()["output_%s" % num]  # look the frame up by its name
    total += x.shape[0]

This seems to work, but the use of globals() seems to be strongly discouraged. What is the most Pythonic way to achieve this?

  • You're in a messed up situation that requires a lot of work because you didn't stick with standards to begin with. Instead of manually creating all your data frames and giving them df_id like names, create them in a loop and stick them into a list. Then you could iterate over that list. Commented Aug 25, 2014 at 13:05
  • You can use nameframe = eval("output_"+str(num)) in the loop, but I agree with @FooBar: you should store these in a list upon creation or, if you want to keep the names, in a dictionary. Commented Aug 25, 2014 at 14:08
  • OK, thank you. I indeed inserted the data frames in a list. It's much more tidy and easy to access. Commented Aug 26, 2014 at 9:22

1 Answer

The most Pythonic way would probably be to store your dataframes in a list. E.g.,

dfs = [output_1, output_2, ...]
df_length = sum(x.shape[0] for x in dfs)
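As a minimal sketch of the list approach, assuming the frames can be created in a loop rather than by hand: the three small frames below are hypothetical stand-ins for output_1 ... output_n, and the names total_rows and named are made up for illustration. A dictionary variant is included for the case where the names matter.

```python
import pandas as pd

# Hypothetical stand-ins for output_1 ... output_n: three small frames
# with 1, 2 and 3 rows, built in a loop and kept in a list.
dfs = [pd.DataFrame({"value": range(k)}) for k in range(1, 4)]

# len(df) is the number of rows, the same value as df.shape[0].
total_rows = sum(len(df) for df in dfs)

# If the names matter, a dict maps each name to its frame, so no
# lookup through globals() or eval() is needed.
named = {f"output_{i}": df for i, df in enumerate(dfs, start=1)}
total_rows_named = sum(len(df) for df in named.values())
```

Note that this only works if the built-in sum has not been overwritten by a variable called sum, as happens in the loops in the question.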

Alternatively, you could look at storing your data in a combined pandas data structure, assuming they are all related in some way. E.g., if each dataframe is a different group, you could set a MultiIndex on the combined frame, like

df = pd.concat([output_1, output_2, ...], keys=['group_a', 'group_b', ...])

Then you could just take the length of the combined frame.
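A small sketch of the combined-frame idea, assuming two made-up frames with a single value column (the data and the names combined and rows_per_group are illustrative only). The keys passed to pd.concat become the outer level of a MultiIndex, so the total and the per-group row counts both come from the one frame.

```python
import pandas as pd

# Hypothetical example frames standing in for output_1 and output_2.
output_1 = pd.DataFrame({"value": [10, 20]})
output_2 = pd.DataFrame({"value": [30, 40, 50]})

# keys become the outer level of a MultiIndex on the combined frame.
combined = pd.concat([output_1, output_2], keys=["group_a", "group_b"])

# Total number of rows across all groups.
total_rows = len(combined)

# Row count per original frame, grouped on the outer index level.
rows_per_group = combined.groupby(level=0).size()
```

Whether this is preferable to a plain list probably depends on whether the frames share the same columns; if they do not, the concatenated frame will contain missing values.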

2 Comments

Your second line can be simplified to df_length = sum(len(x) for x in dfs)
Thanks, as suggested throughout the comments, storing the data frames in a list is the best solution.
