
After several weeks of refining this, and thanks to some awesome folks on SO, I have the following code, which produces the dataframes I need. But I'm not sure how to concat the dataframes produced inside the loop into one final dataframe object. When I just assign the concat statement to a variable, I end up with only the last dataframe.
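A minimal sketch of that pitfall, with made-up data rather than the actual weather records: reassigning inside the loop throws away all but the last frame, while collecting the frames in a list and concatenating once at the end keeps everything.

```python
import pandas as pd

# Reassigning inside the loop: df is overwritten on every pass,
# so only the last chunk survives.
for chunk in ([1, 2], [3, 4]):
    df = pd.DataFrame({'val': chunk})

# Accumulate the per-chunk frames in a list and concat once at the end.
frames = [pd.DataFrame({'val': chunk}) for chunk in ([1, 2], [3, 4])]
combined = pd.concat(frames, ignore_index=True)  # all four rows
```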

{
"zipcode":"08989",
"current" {"canwc":null,"cig":4900,"class":"observation","clds":"OVC","day_ind":"D","dewpt":19,"expireTimeGMT":1385486700,"feels_like":34,"gust":null,"hi":37,"humidex":null,"icon_code":26,"icon_extd":2600,"max_temp":37,"wxMan":"wx1111"},
"triggers":[53,31,9,21,48,7,40,178,55,179,176,26,103,175,33,51,20,57,112,30,50,113]
}
{
"zipcode":"08990",
"current":{"canwc":null,"cig":4900,"class":"observation","clds":"OVC","day_ind":"D","dewpt":19,"expireTimeGMT":1385486700,"feels_like":34,"gust":null,"hi":37,"humidex":null,"icon_code":26,"icon_extd":2600,"max_temp":37, "wxMan":"wx1111"},
"triggers":[53,31,9,21,48,7,40,178,55,179,176,26,103,175,33,51,20,57,112,30,50,113]
}
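As an aside: since each file is a stream of back-to-back JSON objects rather than one JSON document, counting lines is fragile if the record layout ever changes. One alternative (a sketch on toy input, not the actual files) is to walk the buffer object-by-object with `json.JSONDecoder.raw_decode`, which parses one object and reports where it ended:

```python
import json

# Two concatenated JSON objects, as in the files above (trimmed for brevity).
stream = '{"zipcode": "08989"}\n{"zipcode": "08990"}\n'

decoder = json.JSONDecoder()
records, pos = [], 0
while pos < len(stream):
    # raw_decode parses one object starting at pos and returns (obj, end_offset)
    obj, pos = decoder.raw_decode(stream, pos)
    records.append(obj)
    while pos < len(stream) and stream[pos].isspace():
        pos += 1  # skip whitespace between objects
```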

import glob
import itertools
import json
from itertools import chain

import pandas as pd

def lines_per_n(f, n):
    # yield successive n-line chunks of the file as single strings
    for line in f:
        yield ''.join(chain([line], itertools.islice(f, n - 1)))

def series_chunk(chunk):
    # parse one chunk into a Series; chunks that fail to parse return
    # None, which pd.concat drops silently
    try:
        jfile = json.loads(chunk)
        return pd.Series([jfile['zipcode'],
                          jfile['current']['proc_time'],
                          jfile['triggers']])
    except ValueError:
        pass

for fin in glob.glob('*.txt'):
    with open(fin) as f:
        print(pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 5)],
                        axis=1).T)

Output from the above program, which I need to concat into one dataframe:

       0               1                                                  2
0  08988  20131126102946                                                 []
1  08989  20131126102946  [53, 31, 9, 21, 48, 7, 40, 178, 55, 179, 176, ...
       0               1                                                  2
0  08988  20131126102946                                                 []
1  08989  20131126102946  [53, 31, 9, 21, 48, 7, 40, 178, 55, 179, 176, ...

Finally wrestled this into submission. Here is the final code that does what I need:

dfs = []
for fin in glob.glob('*.txt'):
    with open(fin) as f:
        df = pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 7)],
                       axis=1).T
        dfs.append(df)

df = pd.concat(dfs, ignore_index=True)
  • see here: pandas.pydata.org/pandas-docs/dev/…; just append the df's to a list, then concat at the end, e.g. result = pd.concat(list_of_frames) Commented Dec 19, 2013 at 13:58
  • you might be able to do some of this directly via: pandas.pydata.org/pandas-docs/dev/io.html#json (there is also a Normalization section available in 0.13 for nested json) Commented Dec 19, 2013 at 14:22
  • @Jeff I tried doing this and got ValueError: Mixing dicts with non-Series may lead to ambiguous ordering. :S Commented Dec 20, 2013 at 1:34
  • @AndyHayden never used the normalization myself... Commented Dec 20, 2013 at 2:07
  • @Jeff I'd not seen it was implemented! Think there are some codes I can make less messy. Syntax looks magical. Commented Dec 20, 2013 at 2:14
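To illustrate the normalization the comments mention: flattening a nested record like the ones above can be done with `json_normalize`, which expands the nested "current" dict into dotted columns. (In the pandas 0.13 era of this thread it lived in `pandas.io.json`; modern pandas exposes it as `pd.json_normalize`. The record below is trimmed from the sample data for brevity.)

```python
import pandas as pd

# One record shaped like the question's data, trimmed for brevity.
record = {
    "zipcode": "08989",
    "current": {"dewpt": 19, "feels_like": 34, "clds": "OVC"},
    "triggers": [53, 31, 9],
}

# json_normalize flattens the nested "current" dict into columns
# like "current.dewpt"; the list under "triggers" stays a list.
flat = pd.json_normalize(record)
```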

1 Answer


Glad you got this sorted. IMO a slightly cleaner way to do this is as a list comprehension:

def dataframe_from_file(fin):
    with open(fin) as f:
        return pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 7)],
                            axis=1).T

df = pd.concat([dataframe_from_file(fin) for fin in glob.glob('*.txt')],
                  ignore_index=True)

Note: it could be that passing axis=1 to the final concat lets you avoid the earlier .T.
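In the same spirit, a small sketch (with made-up rows, not the real parsed records) of one way to skip the transpose altogether: passing a list of Series straight to the DataFrame constructor stacks them as rows, which matches what concat-along-columns-then-transpose produces.

```python
import pandas as pd

rows = [pd.Series(["08988", "20131126102946", []]),
        pd.Series(["08989", "20131126102946", [53, 31]])]

# concat along axis=1 makes each Series a column, so the result needs .T
via_concat = pd.concat(rows, axis=1).T

# passing the list straight to the DataFrame constructor stacks them as rows
via_ctor = pd.DataFrame(rows)
```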


1 Comment

Definitely cleaner. Thank you.
