I am extracting data from the Google Adwords Reporting API via Python. I can successfully pull the data and store it in a variable named data.

data = get_report_data_from_google()

type(data)
str

Here is a sample:

data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'

I need to process this data further and ultimately output a processed flat file (the Google Adwords API can return a CSV, but I need to pre-process the data before loading it into a database).

If I try to turn data into a csv object, and try to print each line, I get one character per line like:

c = csv.reader(data, delimiter=',')

for i in c:
    print(i)

['I']
['D']
['', '']
['L']
['a']
['b']
['e']
['l']
['s']
['', '']
['D']
['a']
['t']
['e']

So, my idea was to process each column of each line into a list, then add that to a csv object. Trying that:

for line in data.splitlines():
    print(line)

3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016

What I actually find is that inside of the str, there is a list: "[""SKWS"",""Exact""]"

This value is a "label" (see the documentation).

This list is formatted a bit oddly: the value contains numerous doubled quotation marks, so trying to use a quote char, like ", returns something like this: [ SKWS Exact ]. If I could get to [""SKWS"",""Exact""], that would be acceptable.

Is there a good way to extract a list object within a str? Is there a better way to process and output this data to a csv?

  • Generally webservices return JSON or XML for this exact reason because those formats can easily be converted to a Python dictionary. Have you tried parsing the API response as JSON? Do you need help with that? Commented May 18, 2016 at 21:33
  • Can you show how or where from exactly you get that data? Commented May 18, 2016 at 21:47

2 Answers

You need to split the string first. csv.reader expects something that provides a single line on each iteration, like a standard file object does. If you have a string with newlines in it, split it on the newline character with splitlines():

>>> import csv
>>> data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
>>> c = csv.reader(data.splitlines(), delimiter=',')
>>> for line in c:
...     print(line)
...
['ID', 'Labels', 'Date', 'Year']
['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']
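
As a variation on the splitlines() approach, the string can also be wrapped in a file-like object. The following is a minimal sketch, assuming Python 3 (where io.StringIO accepts a str). The Note column and its value are made up for illustration and are not part of the AdWords report; they only show that a quoted field containing a line break survives, which splitting the string up front may not preserve.

```python
import csv
import io

# Shortened sample; the Note column is hypothetical and holds a quoted line break.
data = 'ID,Labels,Note\n3179799191,"[""SKWS"",""Exact""]","first\nsecond"\n'

# newline='' keeps line endings untranslated so the csv module handles them itself.
buffer = io.StringIO(data, newline='')
rows = list(csv.reader(buffer))

for row in rows:
    print(row)
```

The reader pulls lines from the buffer on demand, so a record that spans two physical lines is still returned as a single row.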
1 Comment

And to add to this, it looks like you should then have labels = json.loads(line[1]).
0

This has to do with how csv.reader works.

According to the documentation:

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called

The issue here is that if you pass a string, it supports the iterator protocol, and returns a single character for each call to next. The csv reader will then consider each character as a line.
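
The difference is easy to see by iterating over the string directly. This is a small sketch using a shortened, made-up sample rather than the full report:

```python
# Shortened sample in the same shape as the report data.
data = 'ID,Labels\n1,"[""SKWS""]"\n'

# Iterating a str yields one character at a time.
first_chars = list(data)[:4]
print(first_chars)

# Iterating the list returned by splitlines() yields whole lines.
lines = data.splitlines()
print(lines)
```

csv.reader simply consumes whatever the iteration produces, which is why the string gives one-character "rows" and the list of lines gives real rows.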

You need to provide a list of lines, one for each line of your csv. For example:

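import csv

# splitlines() breaks only at line endings; split() would also break on spaces inside fields
c = csv.reader(data.splitlines(), delimiter=',')
for i in c:
    print(i)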

# ['ID', 'Labels', 'Date', 'Year']
# ['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
# ['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
# ['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']

Now, your list looks like a JSON list. You can use the json module to read it.
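
Putting the two steps together might look like the sketch below, assuming Python 3. The '|' separator used to flatten the labels and the in-memory output buffer are illustrative choices, not something the AdWords API prescribes; for a real file you would replace the buffer with something like open('processed.csv', 'w', newline='') (a hypothetical file name).

```python
import csv
import io
import json

data = (
    'ID,Labels,Date,Year\n'
    '3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n'
    '3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
)

reader = csv.reader(data.splitlines())
header = next(reader)  # first row holds the column names

out = io.StringIO()  # stand-in for a real output file
writer = csv.writer(out, lineterminator='\n')
writer.writerow(header)

parsed_labels = []
for row in reader:
    labels = json.loads(row[1])   # '["SKWS","Exact"]' becomes a Python list
    parsed_labels.append(labels)
    row[1] = '|'.join(labels)     # flatten the list; the separator is an assumption
    writer.writerow(row)

print(out.getvalue())
```

The csv module has already undone the doubled quotes by the time json.loads sees the field, so no manual quote stripping is needed.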
