I have an XML file that I want to convert to CSV using Python. The program should be able to convert any XML data, including large files.

I have been trying for 6 weeks without any success. I have searched extensively online, but now I need your help.

The program writes a CSV file, but the file has no contents. Thank you for your help!

  • Please provide a sample of the code you've tried so far and also what the output should look like. Commented Aug 9, 2021 at 8:33
  • tag.string is still a bs4 object, try str(tag.string) Commented Aug 9, 2021 at 9:05
  • Could you add some sample data in your expected CSV file format and show us what it looks like? It isn't clear. Commented Aug 9, 2021 at 9:51
  • This "unicode" stuff seems like a distraction that has nothing to do with the actual problem of getting from XML to CSV. Just replace xml2csv(unicode(sys.argv[1])) with xml2csv("input.xml"). Please trim your code down as much as possible. You need to provide a proper minimal reproducible example. There are also indentation errors in the code in the question. Commented Aug 9, 2021 at 10:11
  • Please don't vandalise your posts. Commented Aug 10, 2021 at 12:13

1 Answer

It is not possible to write a generic script that handles all types of XML files; you will need to adapt the script to your own needs.

The following approach should get you started. It first finds all of the Metrics tags and then builds up CSV rows from them using a standard csv.DictWriter().

from bs4 import BeautifulSoup
import csv

headers = [
    "Scope", 
    "Project", 
    "Namespace", 
    "Type", 
    "Member",
    "MaintainabilityIndex",
    "CyclomaticComplexity",
    "ClassCoupling",
    "DepthOfInheritance",
    "SourceLines",
    "ExecutableLines",
]

namespace = ''

with open('input.xml') as f_input:
    soup = BeautifulSoup(f_input.read(), "xml")

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=headers)
    csv_output.writeheader()
    
    for metrics in soup.find_all('Metrics'):
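        parent = metrics.parent
        member = ''  # reset for every row so a Method name never carries over
        
        if parent.name == 'Namespace':
            # remember the namespace for the rows that follow it
            namespace = parent['Name']
        elif parent.name == 'Method':
            member = parent['Name']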
        
        row = {'Scope' : parent.name, 'Namespace' : namespace, 'Member' : member}
        
        for metric in metrics.find_all('Metric'):
            row[metric['Name']] = metric['Value']
        
        csv_output.writerow(row)

This would give you an output.csv CSV file starting like:

Scope,Project,Namespace,Type,Member,MaintainabilityIndex,CyclomaticComplexity,ClassCoupling,DepthOfInheritance,SourceLines,ExecutableLines
Assembly,,,,,88,17,35,3,100,22
Namespace,,TEST,,,84,7,26,1,63,15
NamedType,,TEST,,,91,2,8,1,14,3
Method,,TEST,,void Program.Main(string[] args),93,1,3,,4,1
Method,,TEST,,IHostBuilder Program.CreateHostBuilder(string[] args),84,1,6,,6,2
NamedType,,TEST,,,78,5,19,1,43,12
Method,,TEST,,Startup.Startup(IConfiguration configuration),96,1,1,,4,1
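
One thing to watch when adapting the script: by default csv.DictWriter raises a ValueError if a row contains a key that is not listed in fieldnames, so a metric name missing from headers would stop the conversion. Below is a minimal sketch of how extrasaction="ignore" and restval change that. The rows, the "UnknownMetric" name and the write_rows helper are made up for illustration, and an in-memory buffer stands in for output.csv.

```python
import csv
import io

headers = ["Scope", "Namespace", "Member", "MaintainabilityIndex"]

# Hypothetical rows for illustration; "UnknownMetric" is not in headers.
rows = [
    {"Scope": "Namespace", "Namespace": "TEST", "MaintainabilityIndex": "84"},
    {
        "Scope": "Method",
        "Namespace": "TEST",
        "Member": "void Program.Main(string[] args)",
        "MaintainabilityIndex": "93",
        "UnknownMetric": "1",
    },
]


def write_rows(rows, fieldnames):
    """Return CSV text for rows, dropping keys that are not in fieldnames."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip unknown keys instead of raising ValueError
        restval="",             # value written for columns a row does not provide
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_rows(rows, headers)
print(csv_text)
```

Whether silently dropping unknown metrics is acceptable depends on your data; leaving the default in place is the safer choice if you would rather be told that headers is incomplete.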

Hopefully this helps to get you started. I used the xml parser (which requires lxml to be installed) because, unlike the HTML parsers, it does not convert tag names to lowercase.
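
The question also mentions large files, and the script above reads the whole document into memory. A possible alternative, which swaps BeautifulSoup for the standard library's xml.etree.ElementTree.iterparse, is sketched below. The sample XML layout, the shortened header list and the stream_metrics_to_csv name are assumptions for illustration, since the real input file is not shown in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

HEADERS = ["Scope", "Namespace", "Member",
           "MaintainabilityIndex", "CyclomaticComplexity"]

# Hypothetical sample; the layout of the real report is an assumption.
SAMPLE_XML = b"""<Report>
  <Namespace Name="TEST">
    <Metrics>
      <Metric Name="MaintainabilityIndex" Value="84" />
      <Metric Name="CyclomaticComplexity" Value="7" />
    </Metrics>
    <Method Name="void Program.Main(string[] args)">
      <Metrics>
        <Metric Name="MaintainabilityIndex" Value="93" />
        <Metric Name="CyclomaticComplexity" Value="1" />
      </Metrics>
    </Method>
  </Namespace>
</Report>"""


def stream_metrics_to_csv(xml_source, csv_target):
    """Write one CSV row per Metrics element while parsing incrementally."""
    writer = csv.DictWriter(csv_target, fieldnames=HEADERS, extrasaction="ignore")
    writer.writeheader()
    stack = []       # elements that are currently open, outermost first
    namespace = ""
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            if elem.tag == "Namespace":
                namespace = elem.get("Name", "")
            continue
        stack.pop()  # elem has just been closed
        if elem.tag == "Metrics" and stack:
            parent = stack[-1]  # the element that encloses this Metrics block
            row = {
                "Scope": parent.tag,
                "Namespace": namespace,
                "Member": parent.get("Name", "") if parent.tag == "Method" else "",
            }
            for metric in elem.findall("Metric"):
                row[metric.get("Name")] = metric.get("Value")
            writer.writerow(row)
        if elem.tag != "Metric":
            # Metric elements are kept until their enclosing Metrics is written;
            # everything else is emptied once closed to limit memory use.
            elem.clear()


output = io.StringIO(newline="")
stream_metrics_to_csv(io.BytesIO(SAMPLE_XML), output)
csv_text = output.getvalue()
print(csv_text)
```

For a real file you would pass the path (for example "input.xml") as xml_source and a file opened with open("output.csv", "w", newline="") as csv_target. The element and attribute names would still need adapting to your own XML.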
