I am using the following code to scrape content from a webpage, with the end goal of writing it to a CSV file. This portion worked in my first iteration, but now that my data is formatted differently, it is written in a way that gets mangled when I view it in Excel.

If I use the code below, the "heading.text" data is correctly put into one cell when viewed in Excel, whereas the contents of "child.text" are packed into one cell rather than being split on the commas. You will see I have attempted to clean up the content of "child.text" to check whether that was my issue.

If I remove "heading.text" from "z" and try again, Excel shows one letter per cell. In the end, I would like each comma-separated value to appear in its own cell when viewed in Excel. I believe I am doing something (many things?) incorrectly in structuring "z" and/or in writing the row.

Any guidance would be greatly appreciated. Thank you.

    csvwriter = csv.writer(csvfile) 
    for heading in All_Heading:
        driver.execute_script("return arguments[0].scrollIntoView(true);", heading)
        print("------------- " + heading.text + " -------------")
        ChildElement = heading.find_elements_by_xpath("./../div/div")
        for child in ChildElement:
            driver.execute_script("return arguments[0].scrollIntoView(true);", child)
            #print(heading.text)
            #print(child.text)
            z = (heading.text, child.text)
            print (z)
            csvwriter.writerow(z)

When I print "z" I get the following:

('Flower', 'Afghani 3.5g Pre-Pack Details\nGREEN GOLD ORGANICS\nAfghani 3.5g Pre-Pack\nIndica\nTHC: 16.2%\n1/8 oz  -  \n$45.00')

When I print "z" with the older code that split the string on "\n" I get the following:

('Flower', "Cherry Limeade 3.5g Flower - BeWell Details', 'BE WELL', 'Cherry Limeade 3.5g Flower - BeWell', 'Hybrid', 'THC: 18.7 mg', '1/8 oz  -  ', '$56.67")
  • Why are you splitting child.text into lines? Could you provide an example of the text? Commented May 5, 2021 at 2:20
  • I was testing to see if the \n was at fault. I will update the code and add what it's printing as an example. Commented May 5, 2021 at 2:25

1 Answer

csvwriter.writerow() takes an iterable and writes each of its elements as a separate field, joined by the writer's delimiter, so each element ends up in its own cell.

First, let's look at what has been happening so far:

  1. (heading.text, child.text) has two elements, heading.text and child.text, so it produces two cells.
  2. (child.text) is simply child.text, a string (it would be a tuple only if written as (child.text,)). Iterating over a string yields its individual characters, hence each letter got its own cell.
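
The two cases above can be reproduced without Selenium. The following is a minimal sketch: `as_csv` is a hypothetical helper written for this illustration, and the two strings are stand-ins for `heading.text` and `child.text`.

```python
import csv
import io


def as_csv(row):
    """Return the CSV text that csv.writer produces for one row."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


heading_text = "Flower"             # stand-in for heading.text
child_text = "Indica\nTHC: 16.2%"   # stand-in for child.text

# Two elements -> two cells; the embedded newline stays inside one quoted cell.
two_cells = as_csv((heading_text, child_text))

# Parentheses alone do not make a tuple, so the string itself is iterated
# and every character becomes its own cell.
per_letter = as_csv((child_text))

# The trailing comma makes a one-element tuple -> one cell.
one_cell = as_csv((child_text,))

print(repr(two_cells))
print(repr(per_letter))
print(repr(one_cell))
```

Because the newline stays inside a single quoted field, Excel shows the whole of child.text in one cell, which matches what the question describes.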

To get different cells in a row we need separate elements in our iterable, so we want an iterable like [heading.text, child.text line 1, child.text line 2, ...]. You were right to split the text into lines, but the lines weren't being added to the row correctly. Since tuples are immutable, I'll use a list instead:

  1. We know heading.text should take a single cell, so we can start with the following:

    row = [heading.text]  # this is what your z is

  2. We want each line to be a separate element, so we split child.text:

    lines = child.text.split("\n")
    # The text doesn't start or end with a newline, so this should suffice

  3. Now we want each element to be added to the row separately, so we can use the extend() method on lists:

    row.extend(lines)
    # extend() modifies the list in place: extending [1, 2] with [3, 4, 5] gives [1, 2, 3, 4, 5]
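
As a small aside on the last step, here is a sketch of how extend() differs from append(), using made-up values in place of heading.text and the split lines. Note that extend() returns None, so its result should not be assigned back to the row.

```python
row = ["Flower"]                   # stand-in for [heading.text]
lines = ["Indica", "THC: 16.2%"]   # stand-in for child.text.split("\n")

# extend() adds each element of lines to row in place and returns None.
result = row.extend(lines)

# append() would instead add the whole list as a single nested element,
# which csv.writer would then write into one cell.
nested = ["Flower"]
nested.append(lines)

print(row)
print(nested)
```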

Putting it together:

    row = [heading.text]
    lines = child.text.split("\n")
    row.extend(lines)

Or, using unpacking, in a single line:

    row = [heading.text, *child.text.split("\n")]  # You can also use a tuple here
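
For reference, here is a self-contained sketch of the same idea applied to the sample text printed in the question. `build_row` is a hypothetical helper, not part of the original code; it uses splitlines() and strip() as optional tidying, and an in-memory buffer stands in for the real CSV file.

```python
import csv
import io


def build_row(heading_text, child_text):
    """Return [heading, line 1, line 2, ...] with each line as its own cell."""
    # splitlines() also copes with "\r\n"; strip() removes stray padding.
    return [heading_text, *(line.strip() for line in child_text.splitlines())]


# Sample text taken from the printed value of z in the question.
sample_child = (
    "Afghani 3.5g Pre-Pack Details\nGREEN GOLD ORGANICS\n"
    "Afghani 3.5g Pre-Pack\nIndica\nTHC: 16.2%\n1/8 oz  -  \n$45.00"
)
row = build_row("Flower", sample_child)

# In-memory stand-in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_text = buffer.getvalue()

print(row)
print(repr(csv_text))
```

When writing to a real file, the csv module documentation recommends opening it with newline="" so that the writer controls line endings itself.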

2 Comments

Thank you very much. One point to clarify, which may require its own question: in the write statement, is it possible to know the element name or path of what I am writing? Some items are missing data points and I would like to detect whether a value is "blank", but since a blank implies it's not even there, I am confused as to how I would check this.
I'm not sure what you mean by data points, but that sounds possible. Please post a new question for that and I'll check it out. It's best not to change the scope of the question here.
