I'm using Python code that converts my XML file to CSV and reads specific fields such as "full_name", "item_name", "price" and "in_stock". Unfortunately, I have a problem reading the EAN field. During conversion I get the error: "AttributeError: 'NoneType' object has no attribute 'text'". When I remove the EAN lookup from the code, everything works without any problems. How can I modify the code so that it reads the EAN as well? I would be grateful for the specific piece of code that I need to add.

Below is a fragment of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<catalogue date="2022-08-23 15:58" GMT= "+1">
    <product>
        <id>14726</id>
        <manufacturer>Kieslect</manufacturer>
        <item_name>Kieslect Smart Tag Lite Pack (2 x Black and 1 x White) Black White</item_name>
        <sku>157003-126899-18495_HU03</sku>
        <warehouse>HU03</warehouse>
        <bar_code>157003-126899-18495</bar_code>
        <in_stock><![CDATA[&amp;lt;50]]></in_stock>
        <exp_delivery><![CDATA[0]]></exp_delivery>
        <delivery_date>0000-00-00</delivery_date>
        <price>20.00</price>
        <image>https://images.bluefinmobileshop.com/1637675528/large-full/kieslect-smart-tag-lite-pack-2-x-black-and-1-x-white-black-white.jpg</image>
        <properties>
            <full_name>Kieslect Smart Tag Lite (6974377570098)</full_name>
            <ean>6974377570098</ean>
        </properties>
        <category>accessory</category>
    </product>
</catalogue>

Here is my Python code:


# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
  
cols = ["full_name", "item_name", "price", "in_stock", "ean"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('in.xml')
root = xmlparse.getroot()

parameters = root.findall('.//product')
for product in parameters:
    item_name = product.find("item_name").text
    in_stock = product.find("in_stock").text
    price = product.find("price").text
    sku = product.find("sku").text
    for child in product.findall('.//properties'):
        full_name = child.find('full_name').text
        ean = child.find('ean').text
  
    rows.append({
        "full_name": full_name,
        "item_name": item_name,
        "price": price,
        "in_stock": in_stock,
        "ean": ean
        })
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('out.csv', index=False)

  • Not reproducible for me. The code is working. Commented Sep 9, 2022 at 20:41
  • Jacek Kupiec - does the XML in the post generate this error, or is it a different file? Commented Sep 10, 2022 at 12:51
  • Why use the large data analysis library, pandas, if converting XML to CSV? Simply use built-in csv and pass parsed dict of values with its DictWriter. Commented Sep 10, 2022 at 16:08

2 Answers

The error most likely comes from your full XML file (not the posted sample), where at least one product lacks one of these elements (not necessarily <ean>). For a missing element, find() returns None, and None has no text attribute.

For this reason, consider Element.findtext, which returns None by default instead of raising when the element is not found. Additionally, consider the built-in csv module with its DictWriter to avoid the large pandas dependency.

# Importing the required libraries
import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file
doc = Xet.parse('in.xml')

# Initialize CSV file for writing
with open('out.csv', 'w', newline='') as csvfile:
    cols = ["full_name", "item_name", "price", "in_stock", "ean"]
    writer = csv.DictWriter(csvfile, fieldnames=cols)
    writer.writeheader()

    # Iterate through elements and write rows to CSV
    parameters = doc.findall('.//product')
    for product in parameters:
        item_name = product.findtext("item_name")
        in_stock = product.findtext("in_stock")
        price = product.findtext("price")
        sku = product.findtext("sku")
        full_name = product.findtext('properties/full_name')
        ean = product.findtext('properties/ean')
  
        writer.writerow({
            "full_name": full_name,
            "item_name": item_name,
            "price": price,
            "in_stock": in_stock,
            "ean": ean
        })
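
To see the difference in isolation, here is a minimal sketch. The inline XML is a made-up sample (not the asker's real file) in which the second product has no <ean> element and the third has an empty one; the variable names are illustrative only.

```python
import xml.etree.ElementTree as Xet

# Hypothetical sample: product B has no <ean>, product C has an empty <ean/>.
xml_text = """
<catalogue>
    <product>
        <properties><full_name>A</full_name><ean>6974377570098</ean></properties>
    </product>
    <product>
        <properties><full_name>B</full_name></properties>
    </product>
    <product>
        <properties><full_name>C</full_name><ean/></properties>
    </product>
</catalogue>
"""
root = Xet.fromstring(xml_text)
products = root.findall(".//product")

# find() returns None when the element is missing, so accessing .text raises.
missing = products[1].find("properties/ean")
try:
    missing.text
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# findtext() returns None for a missing element and "" for an empty one.
eans = [p.findtext("properties/ean") for p in products]

# A default can be supplied so that missing elements also become "".
eans_with_default = [p.findtext("properties/ean", default="") for p in products]
```

Note that csv.DictWriter writes None as an empty field, so the default argument matters mainly when the values are processed further before being written.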

@Parfait, thanks for your help. The code finally works! :)

I also have one last question, which I'll ask here rather than opening a new thread.

I have this Python code:
# Importing the required libraries
import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file
doc = Xet.parse('in.xml')

# Initialize CSV file for writing
with open('out.csv', 'w', newline='') as csvfile:
    cols = ["Indeks", "Nazwa", "Ean", "Stan_mag", "Cena_zakupu_netto", "Link_do_zdjecia"]
    writer = csv.DictWriter(csvfile, fieldnames=cols)
    writer.writeheader()

    # Iterate through elements and write rows to CSV
    parameters = doc.findall('.//Produkt')
    for Produkt in parameters:
        Indeks = Produkt.findtext("Indeks")
        Nazwa = Produkt.findtext("Nazwa")
        Ean = Produkt.findtext("Ean")
        Stan_mag = Produkt.findtext("Stan_mag")
        Cena_zakupu_netto = Produkt.findtext('Cena_zakupu_netto')
        Link_do_zdjecia = Produkt.findtext('Linki_do_zdjec/Link_do_zdjecia')
  
        writer.writerow({
            "Indeks": Indeks,
            "Nazwa": Nazwa,
            "Ean": Ean,
            "Stan_mag": Stan_mag,
            "Cena_zakupu_netto": Cena_zakupu_netto,
            "Link_do_zdjecia": Link_do_zdjecia
        })

When I use it to convert an XML file with the structure below, it works, but the output file contains only the first link from <Linki_do_zdjec>. How can I make the output file include the links to pictures 1, 2 and 3, not just the first <Link_do_zdjecia>? How should I handle several child tags that share the same name?

 <Produkt>
  <Marka><![CDATA[HP]]></Marka>
  <Indeks>UK707A</Indeks>
  <Nazwa><![CDATA[Gwarancja HP Care Pack -rozszerzenie gwarancji do 3 lat D2D]]></Nazwa>
  <Ean>0884420301066</Ean>
  <Kategoria><![CDATA[Komputery i Monitory]]></Kategoria>
  <Stan_mag>4</Stan_mag>
  <Cena_zakupu_netto>55.00</Cena_zakupu_netto>
  <Vat>23</Vat>
  <Kod_PCN/>
  <Szt_dlugosc>21</Szt_dlugosc>
  <Szt_szerokosc>15</Szt_szerokosc>
  <Szt_wysokosc>1</Szt_wysokosc>
  <Szt_waga_netto>0.0800</Szt_waga_netto>
  <Szt_waga_brutto>0.1000</Szt_waga_brutto>
  <Linki_do_zdjec/>
  <Opis><![CDATA[]]></Opis>
 </Produkt>
 <Produkt>
  <Marka><![CDATA[HP]]></Marka>
  <Indeks>AAJ451AA#HP</Indeks>
  <Nazwa><![CDATA[HP ExpressCard Smart Card Reader]]></Nazwa>
  <Ean>0883585441587</Ean>
  <Kategoria><![CDATA[Akcesoria i peryferia]]></Kategoria>
  <Stan_mag>1017</Stan_mag>
  <Cena_zakupu_netto>5.00</Cena_zakupu_netto>
  <Vat>23</Vat>
  <Kod_PCN>85234110</Kod_PCN>
  <Szt_dlugosc>15</Szt_dlugosc>
  <Szt_szerokosc>22</Szt_szerokosc>
  <Szt_wysokosc>2</Szt_wysokosc>
  <Szt_waga_netto>0.0800</Szt_waga_netto>
  <Szt_waga_brutto>0.1000</Szt_waga_brutto>
  <Linki_do_zdjec>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_1.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_2.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_3.jpg]]></Link_do_zdjecia>
  </Linki_do_zdjec>
  <Opis><![CDATA[]]></Opis>
 </Produkt>

2 Comments

Normally on SO, you do not need to post a separate answer that largely echoes a solution from a different answer. Everyone understands code must be adjusted to fit actual use case.
To retrieve the other links in child records, consider a positional index in the path expression: Linki_do_zdjec/Link_do_zdjecia[1], Linki_do_zdjec/Link_do_zdjecia[2], ... Then assign the results to separate dict keys and columns: Link_do_zdjecia_1, Link_do_zdjecia_2, ...
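
The indexing suggestion can be sketched roughly as follows. This is only an illustration under a few assumptions: the root element name Produkty is invented (the post does not show the root), MAX_LINKS is a hypothetical cap of three links per product, only the Indeks column is kept for brevity, and the CSV is written to an in-memory buffer instead of out.csv.

```python
import csv
import io
import xml.etree.ElementTree as Xet

# Trimmed sample based on the posted XML; the <Produkty> root is an assumption.
xml_text = """
<Produkty>
 <Produkt>
  <Indeks>UK707A</Indeks>
  <Linki_do_zdjec/>
 </Produkt>
 <Produkt>
  <Indeks>AAJ451AA#HP</Indeks>
  <Linki_do_zdjec>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_1.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_2.jpg]]></Link_do_zdjecia>
   <Link_do_zdjecia><![CDATA[https://ckmediator.enovab2b.pl/gfx/content/products/ftp/3474/6958_3.jpg]]></Link_do_zdjecia>
  </Linki_do_zdjec>
 </Produkt>
</Produkty>
"""

MAX_LINKS = 3  # hypothetical upper bound on links per product
link_cols = [f"Link_do_zdjecia_{i}" for i in range(1, MAX_LINKS + 1)]
cols = ["Indeks"] + link_cols

root = Xet.fromstring(xml_text)
out = io.StringIO()  # stands in for open('out.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=cols, lineterminator="\n")
writer.writeheader()

rows = []
for produkt in root.findall(".//Produkt"):
    row = {"Indeks": produkt.findtext("Indeks")}
    for i, col in enumerate(link_cols, start=1):
        # Positional predicates are 1-based; a missing link falls back to "".
        path = f"Linki_do_zdjec/Link_do_zdjecia[{i}]"
        row[col] = produkt.findtext(path, default="")
    rows.append(row)
    writer.writerow(row)

csv_text = out.getvalue()
```

If the number of links per product is not known in advance, an alternative is to collect them with findall('Linki_do_zdjec/Link_do_zdjecia') and join the texts into a single column.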
