
I have an HTML body with 4 divs with text inside the divs. I use Scrapy Selectors to extract the text and write it to CSV. However, if a div has no text, the selector skips it. This is bad because the results need to line up with the columns in the CSV. I need empty divs to return empty strings.

Desired result is:

blah,blah,,blah

Because of this requirement, this does not work:

csvfile.writerow(Selector(text=Z).xpath('//div/text()').extract())

giving:

blah,blah,blah

where Z is the HTML body.
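The skipping happens because //div/text() selects text nodes, and an empty div has no text node to select. A stdlib stand-in (ElementTree instead of Scrapy's Selector, with a hypothetical sample body for Z) shows the same behavior:

```python
import xml.etree.ElementTree as ET

# Stdlib stand-in for Scrapy's Selector (assumes the body is well-formed
# enough to parse as XML; Z here is a hypothetical sample body).
Z = "<body><div>blah</div><div>blah</div><div></div><div>blah</div></body>"
root = ET.fromstring(Z)

divs = root.findall(".//div")             # all four <div> elements
texts = [d.text for d in divs if d.text]  # only divs that have a text node,
                                          # mirroring //div/text()
print(len(divs), texts)  # 4 ['blah', 'blah', 'blah']
```

Four divs go in, only three text values come out, so the positions no longer match.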

Current code is:

jl = []
for sl in Selector(text=Z).xpath('//div'):
    g = sl.xpath('./text()').extract()
    jl.append(g)

csvfile.writerow(jl)

This almost works but I get a list of lists returned:

[[u'blah'], [u'blah'], [], [u'blah']]

instead of what's desired:

blah,blah,,blah

If I attempt to flatten the list:

csvfile.writerow(sum(jl,[]))

I'm back where I started: the empty entries are dropped from the list.

blah,blah,blah
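The flattening step itself is what loses the empties: sum() concatenates the inner lists, and an empty list contributes nothing to the result. A minimal sketch with the list of lists hard-coded:

```python
# The per-div extract() results: an empty div yields an empty list.
jl = [[u'blah'], [u'blah'], [], [u'blah']]

# sum() concatenates the inner lists, so the empty list simply vanishes.
flat = sum(jl, [])
print(flat)  # ['blah', 'blah', 'blah']
```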

1 Answer


That list of lists should be enough, with one more step:

>>> e  = [u'blah'],[u'blah'],[],[u'blah']
>>> [i[0] if i else '' for i in e]
['blah', 'blah', '', 'blah']

If you need all these elements in a single string:

>>> ','.join(i[0] if i else '' for i in e)
'blah,blah,,blah'

csv.writerow() expects a list, so I'm not sure you really want a single string here, but both options are shown above.
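Putting the answer together with the stdlib csv module (a minimal sketch; the list of lists is hard-coded here rather than produced by a live Selector):

```python
import csv
import io

# Simulated extract() results, one inner list per div (hypothetical data,
# standing in for the Selector loop in the question).
jl = [[u'blah'], [u'blah'], [], [u'blah']]

# One value per div; empty divs become empty strings.
row = [i[0] if i else '' for i in jl]

buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue().strip())  # blah,blah,,blah
```

The empty string survives the CSV round trip as an empty field, which keeps the columns aligned.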


3 Comments

Just put a ','.join(...) around that comprehension and you're good.
csv.writerow() takes a list... it will format the fields according to the appropriate dialect. Though you could use ''.join(i) inside the comprehension instead of the ternary operator.
Final code was: x = [i[0] if i else '' for i in jl]; csvfile.writerow(x). I have no idea what it's doing, but if it works, it works. Thanks!
