I'm working on a web scraping project. My code returns the JSON data in the format I want when I use the commented-out print statement below: it prints the expected 17 rows. But when I run the same code through a Pandas DataFrame and write it to CSV, I get only a single row of the data I'm looking for. Totally stumped! So grateful for anyone's help!

for item in response['body']:
    DepartureDate = item['legs'][0][0]['departDate']
    ReturnDate = item['legs'][1][0]['departDate']
    Airline = item['legs'][0][0]['airline']['code']
    Origin = item['legs'][0][0]['depart']
    Destination = item['legs'][0][0]['destination']
    OD = (Origin + Destination)
    TrueBaseFare = item['breakdown']['baseFareAmount']
    YQYR = item['breakdown']['fuelSurcharge']
    TAX = item['breakdown']['totalTax']
    TTL = item['breakdown']['totalFareAmount']
    MARKEDUPTTL = item['breakdown']['totalCalculatedFareAmount']
    MARKUP = ((MARKEDUPTTL - TTL) / (TTL)*100)
    FBC = item['fareBasisCode']

    #print(DepartureDate,ReturnDate,Airline,OD,TrueBaseFare,YQYR,TAX,TTL,MARKEDUPTTL,MARKUP,FBC)

MI = pd.DataFrame(
     {'Dept': [DepartureDate],
     'Ret': [ReturnDate],
     'AirlineCode': [Airline],
     'Routing': [OD],
     'RealFare': [TrueBaseFare],
     'Fuel': [YQYR],
     'Taxes': [TAX],
     'RealTotal': [TTL],
     'AgencyTotal': [MARKEDUPTTL],
     'Margin': [MARKUP],
     'FareBasis': [FBC],
    })

df = pd.DataFrame(MI)

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

df.to_csv('MITest7.csv')

1 Answer

If you print all your values after the loop, you will see that you get only the last values: each pass through the loop overwrites the variables, so the DataFrame built afterwards holds a single row. To fix this, create lists and append each item's values to them. Try this:

DepartureDate = []
ReturnDate = []
Airline = []
Origin = []
Destination = []
OD = []
TrueBaseFare = []
YQYR = []
TAX = []
TTL = []
MARKEDUPTTL = []
MARKUP = []
FBC = []

for item in response['body']:
    DepartureDate.append(item['legs'][0][0]['departDate'])
    ReturnDate.append(item['legs'][1][0]['departDate'])
    Airline.append(item['legs'][0][0]['airline']['code'])
    Origin.append(item['legs'][0][0]['depart'])
    Destination.append(item['legs'][0][0]['destination'])
    OD.append(Origin[-1] + Destination[-1])
    TrueBaseFare.append(item['breakdown']['baseFareAmount'])
    YQYR.append(item['breakdown']['fuelSurcharge'])
    TAX.append(item['breakdown']['totalTax'])
    TTL.append(item['breakdown']['totalFareAmount'])
    MARKEDUPTTL.append(item['breakdown']['totalCalculatedFareAmount'])
    MARKUP.append((MARKEDUPTTL[-1] - TTL[-1]) / TTL[-1] * 100)
    FBC.append(item['fareBasisCode'])
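
A variation on the same idea, offered as a minimal sketch rather than a definitive fix: collect one dict per item and build the DataFrame from the list of dicts, so thirteen parallel lists are not needed. The `response` payload below is a made-up two-item stand-in that only mirrors the keys used in the question, and the names `rows`, `outbound`, and `fares` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the scraped payload; the values are invented
# and only the keys used in the question are included.
response = {
    "body": [
        {
            "legs": [
                [{"departDate": "2024-05-01", "airline": {"code": "AA"},
                  "depart": "JFK", "destination": "LHR"}],
                [{"departDate": "2024-05-10"}],
            ],
            "breakdown": {"baseFareAmount": 400.0, "fuelSurcharge": 50.0,
                          "totalTax": 50.0, "totalFareAmount": 500.0,
                          "totalCalculatedFareAmount": 550.0},
            "fareBasisCode": "QLX7",
        },
        {
            "legs": [
                [{"departDate": "2024-06-02", "airline": {"code": "JL"},
                  "depart": "LAX", "destination": "NRT"}],
                [{"departDate": "2024-06-20"}],
            ],
            "breakdown": {"baseFareAmount": 650.0, "fuelSurcharge": 90.0,
                          "totalTax": 60.0, "totalFareAmount": 800.0,
                          "totalCalculatedFareAmount": 880.0},
            "fareBasisCode": "HLW3",
        },
    ]
}

rows = []
for item in response["body"]:
    outbound = item["legs"][0][0]
    fares = item["breakdown"]
    ttl = fares["totalFareAmount"]
    marked_up = fares["totalCalculatedFareAmount"]
    # One dict per item becomes one row in the DataFrame.
    rows.append({
        "Dept": outbound["departDate"],
        "Ret": item["legs"][1][0]["departDate"],
        "AirlineCode": outbound["airline"]["code"],
        "Routing": outbound["depart"] + outbound["destination"],
        "RealFare": fares["baseFareAmount"],
        "Fuel": fares["fuelSurcharge"],
        "Taxes": fares["totalTax"],
        "RealTotal": ttl,
        "AgencyTotal": marked_up,
        "Margin": (marked_up - ttl) / ttl * 100,
        "FareBasis": item["fareBasisCode"],
    })

df = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(df.shape)
```

Passing a filename such as `'MITest7.csv'` to `to_csv` writes the same content to disk, as in the question.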
1 Comment

Thank you!! That got all the expected data into the .csv file, but oddly it is putting all the results into the same row. It should show 11 columns and 17 rows of data. Right now it shows 11 columns and 1 row, with each cell containing all 17 values. Any thoughts?
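
The symptom described here is what pandas produces when a variable that already holds a list is wrapped in another pair of brackets, as in the question's `'Dept': [DepartureDate]`. That is a likely cause, not a confirmed one, since the updated code is not shown. A minimal sketch with made-up values and only two of the columns:

```python
import pandas as pd

# Invented sample values standing in for the lists built in the loop.
airline = ["AA", "BA", "DL"]
total = [500.0, 620.0, 480.0]

# Wrapping each list in extra brackets yields ONE row whose cells hold whole lists.
wrapped = pd.DataFrame({"AirlineCode": [airline], "RealTotal": [total]})

# Passing the lists directly yields one row per element.
flat = pd.DataFrame({"AirlineCode": airline, "RealTotal": total})

print(wrapped.shape)  # (1, 2)
print(flat.shape)     # (3, 2)
```

If that is the cause, dropping the inner brackets in the `pd.DataFrame({...})` call should give one row per scraped item.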
