I am having trouble saving the output of my Python script as a CSV. When I run the script, the file does not appear in the location where I need to access it. Any suggestions?

import pandas as pd
import os

folder_path = os.path.join("T:", "04. Testing","3. Wear Testing","TESTS","CKUW","180604 OP STRAPLESS","Survey Response Data")
mapping_path = os.path.join(folder_path, 'Survey_MappingTable Strapless.xlsx')

# Read mapping table
mapping = pd.ExcelFile(mapping_path)
mapping.sheet_names
# ['SurveyInfo', 'Question Mapping', 'Answer Mapping']
# Transform sheets to 3 tables (surveyinfo, Q_mapping, A_mapping)
surveyinfo = mapping.parse("SurveyInfo")
Q_mapping = mapping.parse("Question Mapping", skiprows = 2)
A_mapping = mapping.parse("Answer Mapping", skiprows = 3)

# Get input file name and read the data. Table name is df.
input_file_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Input File Name','Value'].to_string(index=False)

path = os.path.join(r'T:\04. Testing\3. Wear Testing\TESTS\CKUW\180604 OP STRAPLESS\Survey Response Data',input_file_name)
df = pd.read_csv(path,header=None,engine='python')
# ,encoding='utf-8'  Tried this as a way to fix but it didn't work
# Fill in blank column names using the preceding header
df.iloc[0] = df.iloc[0].ffill()

# Read the count of columns
n_col = len(df.iloc[0])
n_respondent = len(df)-2
c_name = []
for i in range(n_col):
# Multiple columns, each holding a different single answer; the question text names the category
# (e.g. support and comfort both fall under the satisfaction category).
# If it's a satisfaction question, concatenate the first row and the second row
    if "satisfaction" in df.iloc[0][i]: 
        c_name.append(df.iloc[0][i]+df.iloc[1][i])
    elif "functionality" in df.iloc[0][i]:
        c_name.append(df.iloc[0][i]+df.iloc[1][i])
    elif ("shape" in df.iloc[0][i]) and ("please specify" in df.iloc[1][i]):
        c_name.append(df.iloc[0][i]+df.iloc[1][i])
    elif ("room in the cup" in df.iloc[0][i]) and ("please specify" in df.iloc[1][i]):
        c_name.append(df.iloc[0][i]+df.iloc[1][i])       
# The second header row contains a "-" that separates the question part from the response part
    elif ("wire" in df.iloc[0][i]) and ("Response" not in df.iloc[1][i]):
        if "-" in df.iloc[1][i]:
            c_name.append(df.iloc[0][i]+df.iloc[1][i][df.iloc[1][i].find("-")+2:])
        else:
            c_name.append(df.iloc[0][i]+df.iloc[1][i])
        for j in range(n_respondent):
            if pd.notnull(df.iloc[j+2,i]) and "please specify" not in df.iloc[1,i]:
                df.iloc[j+2,i] = df.iloc[1,i][:df.iloc[1][i].find("-")-1]               
# Multiple columns, each holding a different single answer; the question text is not combined with the category.
# Used to combine band and cup size
    elif "size bra do you typically wear?" in df.iloc[0][i]:
        c_name.append(df.iloc[0][i])
        for j in range(n_respondent):
            if pd.notnull(df.iloc[j+2,i]):
                df.iloc[j+2,i] = df.iloc[1,i] + df.iloc[j+2,i]
# Single answer to the question; or multiple answers to the question but the answer is the same as the column header
    else:
        c_name.append(df.iloc[0][i])

# Make the column names as the first row
df.columns = c_name

# Drop the first and second rows
df2 = df.drop(df.index[[0,1]])

# Transform the wide dataset to a long dataset; 
r = list(range(10))+list(range(17,20))  # skipping "What size bra do you typically wear? (only select one size)"
df_long = pd.melt(df2,id_vars = list(df.columns[r]), var_name = 'Question', value_name = 'Answer')

# Delete rows with null value to answer
df_long_notnull = df_long[pd.notnull(df_long['Answer'])]

# Make typically wear as a column dimension
sizewear = df_long_notnull.loc[df_long_notnull['Question'] == 'What size bra do you typically wear? (Only select one size)']
sizewear2 = sizewear[['Respondent ID','Collector ID','Email Address','Answer']]
sizewear2.columns = ['Respondent ID','Collector ID','Email Address','What size bra do you typically wear?']
df_long_notnull2 = df_long_notnull[df_long_notnull['Question'] != 'What size bra do you typically wear? (Only select one size)']
df_final = pd.merge(df_long_notnull2, sizewear2, how='left', on=['Respondent ID','Collector ID','Email Address'])

# Join Answer description mapping table
df_full = pd.merge(df_final, A_mapping, how='left', left_on = ['Question','Answer'], right_on = ['Question','Answer Description'])
df_full.loc[df_full['Answer_y'].isnull(),'Answer_y'] = df_full['Answer_x']
df_full.loc[df_full['Answer Description'].isnull(),'Answer Description'] = df_full['Answer_x']
df_full = df_full.drop(labels = ['Answer_x'], axis=1)
df_full = df_full.rename(columns = {'Answer_y':'Answer','Answer Description':'Answer Desc'})

# Join Question Mapping table
df_full = pd.merge(df_full,Q_mapping, how='left', left_on = ['Question'], right_on = ['Raw Column Name'])
df_full = df_full.drop(labels = ['Raw Column Name'], axis=1)

# Get Survey Info
product_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Product Name','Value'].to_string(index=False)

if "," in surveyinfo.loc[surveyinfo['Parameter Name']=='Style Number','Value'].item():
    style_number = surveyinfo.loc[surveyinfo['Parameter Name']=='Style Number','Value'].to_string(index=False).split(',')
    style_number = [s.strip() for s in style_number]
else:
    style_number = surveyinfo.loc[surveyinfo['Parameter Name']=='Style Number','Value'].to_string(index=False)

if "," in surveyinfo.loc[surveyinfo['Parameter Name']=='Style Name','Value'].item():
    style_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Style Name','Value'].to_string(index=False).split(',')
    style_name = [s.strip() for s in style_name]
else: 
    style_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Style Name','Value'].to_string(index=False)

# get survey information
survey_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Survey Name','Value'].to_string(index=False)
survey_id = surveyinfo.loc[surveyinfo['Parameter Name']=='Survey ID','Value'].item()
survey_year = surveyinfo.loc[surveyinfo['Parameter Name']=='Survey Year','Value'].item()
survey_mo = surveyinfo.loc[surveyinfo['Parameter Name']=='Survey Month','Value'].item()
output_file_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Output File Name','Value'].to_string(index=False)

# adding columns for survey information
df_full['Product Name'] = product_name
df_full['Survey Name'] = survey_name
df_full['Survey ID'] = survey_id
df_full['Survey Year'] = survey_year
df_full['Survey Month'] = survey_mo

### create a table with style_number and style_name
if isinstance(style_name, list):
    style_t = pd.DataFrame(list(zip(style_name, style_number)), columns = ["Style_Name","Style_Number"])
    df_full = pd.merge(df_full, style_t, how='left', left_on = ['Which style did you receive?'], right_on = ['Style_Name'])
else: 
    df_full['Style Name'] = style_name
    df_full['Style Number'] = style_number


# Identify the path for saving output file
path_out = os.path.join("C:","Users","Sali3",output_file_name)

# Save as comma separated csv file 
df_full.to_csv(path_out, sep=',', index = False)

The last portion of this script is where I am having the problem. The file at path_out should be saved on my local C: drive as a CSV file. Please help.

  • What's the content of output_file_name? Commented Jun 29, 2018 at 12:57
  • Should be right under the "get survey information" as: output_file_name = surveyinfo.loc[surveyinfo['Parameter Name']=='Output File Name','Value'].to_string(index=False) Commented Jun 29, 2018 at 12:59
  • Should just be a file with the name of the survey underscore final. But can't figure out how to get to that point. Commented Jun 29, 2018 at 13:03
  • Can you print path_out? Commented Jun 29, 2018 at 13:06
  • I tried printing path_out, but I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'C:Users\\Sali3\\Strapless Bra Survey_clean_final.csv' Commented Jun 29, 2018 at 13:10

1 Answer

Assuming you are on Windows, the documentation on os.path.join says:

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
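
The behaviour described in that note can be checked with a small sketch. It uses ntpath (the Windows implementation behind os.path) so it runs on any operating system, and the file name is a made-up placeholder rather than the real output name:

```python
import ntpath  # Windows flavour of os.path, importable on any OS

# Hypothetical file name, used only for illustration.
file_name = "survey_final.csv"

# Drive letter only: the result is relative to the current directory on C:
drive_only = ntpath.join("C:", "Users", "Sali3", file_name)

# Drive letter plus root separator: the result is an absolute path
drive_and_root = ntpath.join("C:\\", "Users", "Sali3", file_name)

print(drive_only)       # C:Users\Sali3\survey_final.csv
print(drive_and_root)   # C:\Users\Sali3\survey_final.csv
print(ntpath.isabs(drive_only), ntpath.isabs(drive_and_root))  # False True
```

The first result has the same shape as the path in the FileNotFoundError from the comments: there is no separator after "C:".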

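Your path_out is therefore relative to the current directory on drive C: rather than to its root. Adding the root separator after the drive letter should fix your problem: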

path_out = os.path.join("C:\\","Users","Sali3",output_file_name)
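
If you prefer pathlib, the same path can be built as sketched below. This is only an illustration: the file name is a hypothetical stand-in for output_file_name, and PureWindowsPath is used so the snippet behaves the same on any operating system (on a real Windows machine, pathlib.Path would be the usual choice).

```python
from pathlib import PureWindowsPath

# Hypothetical stand-in for the value read from the mapping table.
output_file_name = "survey_final.csv"

# "C:/" carries both the drive and the root, so the result is absolute.
path_out = PureWindowsPath("C:/", "Users", "Sali3", output_file_name)

# Without the root, the path stays relative to the current directory on C:
relative_out = PureWindowsPath("C:", "Users", "Sali3", output_file_name)

print(path_out)                    # C:\Users\Sali3\survey_final.csv
print(path_out.is_absolute())      # True
print(relative_out.is_absolute())  # False
```

Passing str(path_out) to df_full.to_csv keeps the call independent of how a given pandas version handles path objects.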

2 Comments

You're a lifesaver! Thank you for all your help.
Upvote and mark as answered if this answers your question
