Suppose I have a CSV file like this:

Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space
...

The contents of the file are evenly spaced as shown. I want to read each Month,Sales,Revenue portion in as its own DataFrame. I know I can do this manually:

df_Jack = pd.read_csv('./sales.csv', skiprows=3, nrows=3)
df_Jill = pd.read_csv('./sales.csv', skiprows=12, nrows=3)

I'm not too worried about naming the DataFrames, as I think I can handle that on my own. I just don't know how to iterate through the evenly spaced file, find each block of sales records, and store it as its own DataFrame.

Thanks for any help in advance!

2 Answers

Since the blocks repeat every 9 lines, you could do this with a list comprehension:

dfs = [pd.read_csv('./sales.csv', skiprows=i, nrows=3) for i in range(3, n, 9)]
# where n is your expected end line...
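
If you don't know n ahead of time, one option is to count the lines in the file first. The following is only a sketch under assumptions: the helper name `read_blocks` and its parameters are made up for illustration, the sample data is written to a temporary file so the snippet runs on its own, and the file is assumed to contain nothing but complete 9-line blocks (stray trailing lines could produce an extra, empty read).

```python
import os
import tempfile

import pandas as pd

# Sample data mirroring the layout in the question (real blank lines as separators).
SAMPLE = """Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000


Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600


"""


def read_blocks(path, first_header=3, stride=9, nrows=3):
    """Read one DataFrame per fixed-size block (hypothetical helper)."""
    with open(path) as f:
        n = sum(1 for _ in f)  # total number of physical lines in the file
    # Each header line sits `stride` lines after the previous one.
    return [pd.read_csv(path, skiprows=i, nrows=nrows)
            for i in range(first_header, n, stride)]


# Write the sample to a temporary file and read it back block by block.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")
    with open(path, "w") as f:
        f.write(SAMPLE)
    frames = read_blocks(path)

print(frames[1])
```

Because each block goes through `read_csv`, the Sales column is parsed as integers here, while Revenue stays as text because of the `$` prefix.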

But another way is to read the csv yourself and pass the data back to pandas:

import pandas as pd

dfs = {}  # maps each person's name to their DataFrame
with open('./sales.csv', 'r') as file:
    while True:
        line = file.readline()
        if not line.strip():
            break  # end of file, or no further record
        name = line.strip().replace('Name: ', '')
        for _ in range(2):  # skip the Place and Age lines
            file.readline()
        headers = file.readline().rstrip().split(',')
        data = [file.readline().rstrip().split(',') for _ in range(3)]
        dfs[name] = pd.DataFrame.from_records(data, columns=headers)
        for _ in range(2):  # skip the two blank separator lines
            file.readline()

I'll concede it's quite brutish and inelegant compared to the other answer... but it works. And it actually gives you the DataFrame by name within a dictionary:

>>> dfs['Jack']

  Month Sales Revenue
0   Jan    51   $1000
1   Feb    20   $1050
2   Mar   100  $10000
>>> dfs['Jill']

  Month Sales Revenue
0   Apr    11   $1000
1   May    55   $3000
2   Jun    23   $4600
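
One thing to keep in mind with this approach: because the values come from splitting each line by hand, every column holds strings, including Sales and Revenue. If you need numbers, a conversion along these lines might work. This is a minimal sketch that rebuilds Jack's frame inline so it runs on its own, and it assumes the only non-numeric character in Revenue is a leading `$`.

```python
import pandas as pd

# Rebuild one frame the way the manual reader does: every value is a string.
df = pd.DataFrame.from_records(
    [["Jan", "51", "$1000"], ["Feb", "20", "$1050"], ["Mar", "100", "$10000"]],
    columns=["Month", "Sales", "Revenue"],
)

# Convert the numeric columns; strip the leading "$" from Revenue first.
df["Sales"] = pd.to_numeric(df["Sales"])
df["Revenue"] = pd.to_numeric(df["Revenue"].str.lstrip("$"))

print(df.dtypes)
```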

How about creating a list of DataFrames?

import pandas as pd
from io import StringIO

csvfile = StringIO("""Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space""")

# error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) replaces it
df = pd.read_csv(csvfile, sep=',', on_bad_lines='skip', names=['Month','Sales','Revenue'])

df1 = df.dropna().loc[df.Month!='Month']

listofdf = [df1[i:i+3] for i in range(0,df1.shape[0],3)]

print(listofdf[0])

Output:

  Month Sales Revenue
4   Jan    51   $1000
5   Feb    20   $1050
6   Mar   100  $10000

print(listofdf[1])

Output:

   Month Sales Revenue
13   Apr    11   $1000
14   May    55   $3000
15   Jun    23   $4600
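
The chunks above keep their original row labels (4 to 6, 13 to 15) and are not tied to a name. If that matters, a possible variation is to pull the names out of the `Name:` rows and reset each chunk's index. Treat this as a sketch under assumptions: it uses real blank lines as separators, every block has exactly three data rows, and the variable names (`raw`, `records`, `chunks`) are just for illustration.

```python
from io import StringIO

import pandas as pd

raw = """Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000


Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
"""

# Short metadata lines get NaN in the Sales and Revenue columns.
df = pd.read_csv(StringIO(raw), names=["Month", "Sales", "Revenue"])

# Collect the names from the "Name: ..." rows, in file order.
is_name = df["Month"].str.startswith("Name:", na=False)
names = (df.loc[is_name, "Month"]
           .str.replace("Name:", "", regex=False)
           .str.strip()
           .tolist())

# Keep only real data rows: drop metadata rows (NaN) and repeated header rows.
records = df.dropna()
records = records[records["Month"] != "Month"]

# Slice into three-row chunks and give each a fresh 0..2 index.
chunks = [records.iloc[i:i + 3].reset_index(drop=True)
          for i in range(0, len(records), 3)]
dfs = dict(zip(names, chunks))

print(dfs["Jill"])
```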
