- I am new to pandas/python. Have used excel and stata pretty extensively.
- I get a .csv file with multiple tables in it from a supplier that will not change their format.
- The tables have headers and a blank row in between them.
- The number of rows in each table can vary
- The number of tables also seems to vary (i just discovered!)
- There are 23 possible tables that can come in the file
- I have managed to create one big data frame from the file
- I can't seem to group by the index=0
Here is the code i have so far:
%matplotlib inline
import csv
from pandas import Series, DataFrame
import pandas as pd # if len(row) == 0,new_table_coming_up = 1if len(row) > 0,if new_table_coming_up == 0
import numpy as np
import matplotlib.pyplot as plt
import io
df = pd.read_csv(r'C:\Users\file.csv',names=range(25))
table_names = ["WAREHOUSE","SUPPLIER","PRODUCT","BRAND","INVENTORY","CUSTOMER","CONTACT","CHAIN","ROUTE","INVOICE","INVOICETRANS","SURVEY","FORECAST","PURCHASE","PURCHASETRANS","PRICINGMARKET","PRICINGMARKETCUSTOMER","PRICINGLINE","PRICINGLINEPRODUCT","EMPLOYEE"]
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0,1]: g.iloc[0] for k,g in df.groupby(groups)}
here is a sample of the .csv file with the first 3 tables:
Record Identifier Sender ID Receiver ID Action Warehouse ID Warehouse Name System Close Date DBA Address Address 2 City State Postal Code Phone Fax Primary Contact Email FEIN DUNS GLN
WAREHOUSE COX SUPPLIERX Change 1 Richmond 20160127 Company 700 Court Anywhere CA 99999 5555555555 5555555555 na na 0 50682020
Record Identifier Sender ID Receiver ID Sender Supplier ID Supplier Name Supplier Family
SUPPLIER COX SUPPLIERX 16 SUPPLIERX SUPPLIERX
Record Identifier Sender ID Receiver ID Supplier Product Number Sender Product ID Product Name Sender Brand ID Active Cases Per Pallet Cases Per Layer Case GTIN Carrier GTIN Unit GTIN Package Name Case Weight Case Height Case Width Case Length Case Ounces Case Equivalents Retail Units Per Case Consumable Units Per Case Selling Unit Of Measure Container Material
PRODUCT COX SUPPLIERX 53030 LAG DOGTOWN PALE ALE 4/6/12OZ NR 217 Active 70 10 7.2383E+11 7.2383E+11 7.2383E+11 4/6/12oz NR 31.9 9.5 10.75 15.5 288 1 4 24 Case Aluminum
PRODUCT COX SUPPLIERX 53071 LAG DOGTOWN PALE ALE 1/2 KEG 217 Active 8 8 0 KEG-1/2 BBL 160.6 23.5 15.75 15.75 1984 6.888889 1 1 Each Aluminum
PRODUCT COX SUPPLIERX 2100008003 53122 LAG CAPPUCCINO STOUT 12/22OZ NR 221 Active 75 15 7.2383E+11 7.2383E+11 7.2383E+11 12/22oz NR 33.6 9.5 10.75 14.2083 264 0.916667 12 12 Case Aluminum
PRODUCT COX SUPPLIERX 53130 LAG SUCKS ALE 4/6/12OZ NR 1473 Active 70 10 7.23831E+11 7.2383E+11 7.2383E+11 4/6/12oz NR 31.9 9.5 10.75 15.5 288 1 4 24 Case Aluminum
PRODUCT COX SUPPLIERX 53132 LAG SUCKS ALE 12/32oz NR 1473 Active 50 10 7.23831E+11 7.2383E+11 7.2383E+11 12/32oz NR 38.2 9.5 10.75 20.6667 384 1.333333 12 12 Case Aluminum
PRODUCT COX SUPPLIERX 53170 LAG SUCKS ALE 1/4 KEG 1473 Inactive 1 1 0 1.11111E+11 KEG-1/4 BBL 87.2 11.75 17 17 992 3.444444 1 1 Each Aluminum
PRODUCT COX SUPPLIERX 53171 LAG FARMHOUSE SAISON 1/2 KEG 1478 Inactive 16 1 0 KEG-1/2 BBL 160.6 23.5 15.75 15.75 1984 6.888889 1 1 Each Aluminum
PRODUCT COX SUPPLIERX 53172 LAG SUCKS ALE 1/2 KEG 1473 Active 80 4 0 KEG-1/2 BBL 160.6 23.5 15.75 15.75 1984 6.888889 1 1 Each Aluminum
PRODUCT COX SUPPLIERX 53255 LAG FARMHOUSE HOP STOOPID ALE 12/22 222 Active 75 15 7.23831E+11 7.2383E+11 7.2383E+11 12/22oz NR 33.6 9.5 10.75 14.2083 264 0.916667 12 12 Case Aluminum
PRODUCT COX SUPPLIERX 53271 LAG FARMHOUSE HOP STOOPID 1/2 KEG 222 Active 8 8 0 KEG-1/2 BBL 160.6 23.5 15.75 15.75 1984 6.888889 1 1 Each Aluminum
PRODUCT COX SUPPLIERX 53330 LAG CENSORED ALE 4/6/12OZ NR 218 Active 70 10 7.23831E+11 7.2383E+11 7.2383E+11 4/6/12oz NR 31.9 9.5 10.75 15.5 288 1 4 24 Case Aluminum
PRODUCT COX SUPPLIERX 53331 LAG CENSORED ALE 2/12/12 OZ NR 218 Inactive 60 1 7.2383E+11 7.2383E+11 7.2383E+11 2/12/12oz NR 31.9 9.5 10.75 15.5 288 1 2 24 Case Aluminum
PRODUCT COX SUPPLIERX 53333 LAG CENSORED ALE 24/12 OZ NR 218 Inactive 70 1 7.2383E+11 24/12oz NR 31.9 9.5 10.75 15.5 288 1 1 24 Case Aluminum