Problem with converting XML to CSV 'NoneType' object has no attribute 'text'

Question

I'm using Python code that converts my XML to a CSV file and reads specific fields such as "full_name", "item_name", "price", and "in_stock". Unfortunately, I have a problem reading the EAN field. During conversion, I get the error: "AttributeError: 'NoneType' object has no attribute 'text'". When I remove the EAN code, everything works without any problems. How can I modify the code so that it reads the EAN as well? I would be grateful for the specific piece of code I need to add.

Below is a fragment of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<catalogue date="2022-08-23 15:58" GMT= "+1">
    <product>
        <id>14726</id>
        <manufacturer>Kieslect</manufacturer>
        <item_name>Kieslect Smart Tag Lite Pack (2 x Black and 1 x White) Black White</item_name>
        <sku>157003-126899-18495_HU03</sku>
        <warehouse>HU03</warehouse>
        <bar_code>157003-126899-18495</bar_code>
        <in_stock><![CDATA[&amp;lt;50]]></in_stock>
        <exp_delivery><![CDATA[0]]></exp_delivery>
        <delivery_date>0000-00-00</delivery_date>
        <price>20.00</price>
        <image>https://images.bluefinmobileshop.com/1637675528/large-full/kieslect-smart-tag-lite-pack-2-x-black-and-1-x-white-black-white.jpg</image>
        <properties>
            <full_name>Kieslect Smart Tag Lite (6974377570098)</full_name>
            <ean>6974377570098</ean>
        </properties>
        <category>accessory</category>
    </product>
</catalogue>

Here is my Python code:


# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
  
cols = ["full_name", "item_name", "price", "in_stock", "ean"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('in.xml')
root = xmlparse.getroot()

parameters = root.findall('.//product')
for product in parameters:
    item_name = product.find("item_name").text
    in_stock = product.find("in_stock").text
    price = product.find("price").text
    sku = product.find("sku").text
    for child in product.findall('.//properties'):
        full_name = child.find('full_name').text
        ean = child.find('ean').text
  
    rows.append({
        "full_name": full_name,
        "item_name": item_name,
        "price": price,
        "in_stock": in_stock,
        "ean": ean
        })
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('out.csv', index=False)

@Jacek Kupiec - does the XML in the post generate this error, or is it a different one?
– balderman, Commented Sep 10, 2022 at 12:51
Why use pandas, a large data analysis library, just to convert XML to CSV? Simply use the built-in csv module and pass the parsed dict of values to its DictWriter.
– Parfait, Commented Sep 10, 2022 at 16:08

Parfait · Accepted Answer · 2022-09-10 16:26:17Z

Most likely, the error comes from your larger XML file (not the sample posted), where one or more elements (not just <ean>) are missing from some product. In that case find() returns None, and None has no text attribute.
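
As a minimal sketch of how to locate the problem, the snippet below uses a hypothetical two-product sample (not the asker's real file) in which the second product has no <ean>. Instead of calling .text blindly, it records which product and which path are missing. The sample data and the `missing` list are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second product has no <ean> element.
SAMPLE = """
<catalogue>
    <product>
        <id>1</id>
        <properties><full_name>A</full_name><ean>111</ean></properties>
    </product>
    <product>
        <id>2</id>
        <properties><full_name>B</full_name></properties>
    </product>
</catalogue>
"""

root = ET.fromstring(SAMPLE)

# Collect (product id, missing path) pairs instead of crashing on .text.
missing = []
for product in root.findall(".//product"):
    for path in ("properties/full_name", "properties/ean"):
        # find() returns None when no element matches the path
        if product.find(path) is None:
            missing.append((product.findtext("id"), path))

print(missing)
```

Running a check like this against the full file should show which products trigger the AttributeError.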

For this reason, consider Element.findtext, which returns None by default when the element does not exist. Additionally, consider the built-in csv module with its DictWriter, which avoids the large pandas library.

# Importing the required libraries
import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file
doc = Xet.parse('in.xml')

# Initialize CSV file for writing
with open('out.csv', 'w', newline='') as csvfile:
    cols = ["full_name", "item_name", "price", "in_stock", "ean"]
    writer = csv.DictWriter(csvfile, fieldnames=cols)
    writer.writeheader()

    # Iterate through elements and write rows to CSV
    parameters = doc.findall('.//product')
    for product in parameters:
        item_name = product.findtext("item_name")
        in_stock = product.findtext("in_stock")
        price = product.findtext("price")
        sku = product.findtext("sku")
        full_name = product.findtext('properties/full_name')
        ean = product.findtext('properties/ean')
  
        writer.writerow({
            "full_name": full_name,
            "item_name": item_name,
            "price": price,
            "in_stock": in_stock,
            "ean": ean
        })
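
To illustrate how findtext behaves in the three cases that matter here, the following sketch uses a small made-up <product> element (an assumption, not taken from the real file). It covers an element with text, an element that exists but is empty, and an element that is missing entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical product: <price/> is empty and <properties> is absent.
product = ET.fromstring(
    "<product><item_name>Tag</item_name><price/></product>"
)

item_name = product.findtext("item_name")   # element present with text
price = product.findtext("price")           # element present but empty
ean = product.findtext("properties/ean")    # element missing

# An explicit default replaces None for missing elements.
ean_or_blank = product.findtext("properties/ean", default="")

print(repr(item_name), repr(price), repr(ean), repr(ean_or_blank))
```

Since the csv module writes None as an empty string, a missing element simply yields an empty cell in the output rather than an exception.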

Jacek Kupiec · Answer · 2022-09-12 08:35:59Z

@Parfait, thanks for your help. The code finally works! :)

I also have one last question, without starting a new thread:

I have this Python code:
# Importing the required libraries
import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file
doc = Xet.parse('in.xml')

# Initialize CSV file for writing
with open('out.csv', 'w', newline='') as csvfile:
    cols = ["Indeks", "Nazwa", "Ean", "Stan_mag", "Cena_zakupu_netto", "Link_do_zdjecia"]
    writer = csv.DictWriter(csvfile, fieldnames=cols)
    writer.writeheader()

    # Iterate through elements and write rows to CSV
    parameters = doc.findall('.//Produkt')
    for Produkt in parameters:
        Indeks = Produkt.findtext("Indeks")
        Nazwa = Produkt.findtext("Nazwa")
        Ean = Produkt.findtext("Ean")
        Stan_mag = Produkt.findtext("Stan_mag")
        Cena_zakupu_netto = Produkt.findtext('Cena_zakupu_netto')
        Link_do_zdjecia = Produkt.findtext('Linki_do_zdjec/Link_do_zdjecia')
  
        writer.writerow({
            "Indeks": Indeks,
            "Nazwa": Nazwa,
            "Ean": Ean,
            "Stan_mag": Stan_mag,
            "Cena_zakupu_netto": Cena_zakupu_netto,
            "Link_do_zdjecia": Link_do_zdjecia
        })

When I use it to convert an XML file with the structure below, it works, but the output file contains only the first link from <Linki_do_zdjec>. How can I make the output include the links to pictures 1, 2 and 3, not just the first <Link_do_zdjecia>? How do I handle several tags that share the same name?

 <Produkt>
  <Marka><![CDATA[HP]]></Marka>
  <Indeks>UK707A</Indeks>
  <Nazwa><![CDATA[Gwarancja HP Care Pack -rozszerzenie gwarancji do 3 lat D2D]]></Nazwa>
  <Ean>0884420301066</Ean>
  <Kategoria><![CDATA[Komputery i Monitory]]></Kategoria>
  <Stan_mag>4</Stan_mag>
  <Cena_zakupu_netto>55.00</Cena_zakupu_netto>
  <Vat>23</Vat>
  <Kod_PCN/>
  <Szt_dlugosc>21</Szt_dlugosc>
  <Szt_szerokosc>15</Szt_szerokosc>
  <Szt_wysokosc>1</Szt_wysokosc>
  <Szt_waga_netto>0.0800</Szt_waga_netto>
  <Szt_waga_brutto>0.1000</Szt_waga_brutto>
  <Linki_do_zdjec/>
  <Opis><![CDATA[]]></Opis>
 </Produkt>
 <Produkt>
  <Marka><![CDATA[HP]]></Marka>
  <Indeks>AAJ451AA#HP</Indeks>
  <Nazwa><![CDATA[HP ExpressCard Smart Card Reader]]></Nazwa>
  <Ean>0883585441587</Ean>
  <Kategoria><![CDATA[Akcesoria i peryferia]]></Kategoria>
  <Stan_mag>1017</Stan_mag>
  <Cena_zakupu_netto>5.00</Cena_zakupu_netto>
  <Vat>23</Vat>
  <Kod_PCN>85234110</Kod_PCN>
  <Szt_dlugosc>15</Szt_dlugosc>
  <Szt_szerokosc>22</Szt_szerokosc>
  <Szt_wysokosc>2</Szt_wysokosc>
  <Szt_waga_netto>0.0800</Szt_waga_netto>
  <Szt_waga_brutto>0.1000</Szt_waga_brutto>
  <Linki_do_zdjec>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_1.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_2.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_3.jpg]]></Link_do_zdjecia>
  </Linki_do_zdjec>
  <Opis><![CDATA[]]></Opis>
 </Produkt>

Normally on SO, you do not need to post a separate answer that largely echoes the solution from a different answer. Everyone understands that code must be adjusted to fit the actual use case.
To retrieve the other links in the child records, consider indexing in the path expression: Linki_do_zdjec/Link_do_zdjecia[1], Linki_do_zdjec/Link_do_zdjecia[2], ... Then assign them to separate dict keys and columns: Link_do_zdjecia_1, Link_do_zdjecia_2, ...
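
The indexing suggestion could be sketched roughly as follows. This is an illustration under stated assumptions, not a drop-in solution: the <Produkty> root element, the MAX_LINKS limit of 3, and writing to an in-memory buffer instead of out.csv are all hypothetical choices made to keep the example self-contained. ElementTree position predicates are 1-based, so [1] is the first link.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed sample; the <Produkty> root is an assumption (the post shows no root).
SAMPLE = """
<Produkty>
 <Produkt>
  <Indeks>UK707A</Indeks>
  <Linki_do_zdjec/>
 </Produkt>
 <Produkt>
  <Indeks>AAJ451AA#HP</Indeks>
  <Linki_do_zdjec>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_1.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_2.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_3.jpg]]></Link_do_zdjecia>
  </Linki_do_zdjec>
 </Produkt>
</Produkty>
"""

MAX_LINKS = 3  # assumed upper bound on links per product
link_cols = [f"Link_do_zdjecia_{i}" for i in range(1, MAX_LINKS + 1)]
cols = ["Indeks"] + link_cols

root = ET.fromstring(SAMPLE)

# In-memory buffer for the demo; use open('out.csv', 'w', newline='') for a file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=cols)
writer.writeheader()

for produkt in root.findall(".//Produkt"):
    row = {"Indeks": produkt.findtext("Indeks")}
    for i, col in enumerate(link_cols, start=1):
        # [i] selects the i-th <Link_do_zdjecia> child; None if it is absent
        row[col] = produkt.findtext(f"Linki_do_zdjec/Link_do_zdjecia[{i}]")
    writer.writerow(row)

# Read the CSV text back to inspect the result.
rows = list(csv.DictReader(io.StringIO(out.getvalue())))
print(rows[1]["Link_do_zdjecia_2"])
```

Products with fewer links than MAX_LINKS get empty cells in the remaining columns. If the number of links varies widely, an alternative is to collect them with findall and join them into a single column.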
