I have an XML file, shown below:

<?xml version="1.0" encoding="UTF-8"?>
<Workbook>
    <ExcelWorkbook
    xmlns="urn:schemas-microsoft-com:office:excel"/>
        <Worksheet ss:Name="Table 1">
            <Table>
                <Row ss:Index="7" ss:AutoFitHeight="0" ss:Height="12">
                <Cell ss:Index="1" ss:StyleID="s05">
                    <ss:Data ss:Type="String"
                        xmlns="http://www.w3.org/TR/REC-html40">
                        <Font html:Size="9" html:Face="Times New Roman" x:Family="Roman" html:Color="#000000">
                        ABCD
                        </Font>
                    </ss:Data>
                </Cell>
            </Row>

How do I extract the data ("ABCD" here) using SAX or XPath in Java?

EDIT 1:

This is the XML:

<Table>
<Row ss:Index="74" ss:AutoFitHeight="0" ss:Height="14">
    <Cell ss:Index="1" ss:MergeAcross="3" ss:StyleID="s29">
        <ss:Data ss:Type="Number" xmlns="http://www.w3.org/TR/REC-html40">
        0.00
        </ss:Data>
    </Cell>
    <Cell ss:Index="15" ss:MergeAcross="5" ss:StyleID="s29">
        <ss:Data ss:Type="Number" xmlns="http://www.w3.org/TR/REC-html40">
        4.57
        </ss:Data>
    </Cell>
</Row>
  • does it have to be SAX? XPATH is much better suited for searching in XML doc Commented Apr 10, 2016 at 12:19
  • @sharonbn XPATH would be alright, but I am not at all familiar with it. Can you please help me out? Commented Apr 10, 2016 at 12:21
  • @sharonbn I modified your code: String cellStringContent = "/*[@ss:Type='Number']/*[text()]/text()";. But it gives an error here: if (n.getNodeType() == Node.TEXT_NODE). Instead of TEXT_NODE I tried the other nodeType constants, but that didn't work. Please help. Commented Apr 11, 2016 at 14:50
  • what is the error? what is the value of getNodeType in this case? what does the xml look like in this case? Commented Apr 11, 2016 at 15:53
  • try String cellStringContent = "/*[@ss:Type='Number']/text()"; the reason is that there is no <Font> element Commented Apr 11, 2016 at 16:00
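
The point of the last comment (a Number cell has no <Font> child, so a pattern that expects one matches nothing) can be checked with a small self-contained sketch. It is not the answerer's code: the class name CellTextDemo, the method cellText, and the cut-down sample document (including its xmlns:ss declaration) are assumptions made for illustration. It uses local-name() so that no NamespaceContext is needed, and normalize-space() so that the same expression works whether or not the text is wrapped in <Font>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CellTextDemo {
    // Cut-down, well-formed stand-in for the question's XML.
    // Assumption: the real file declares the ss prefix somewhere above <Table>.
    static final String SAMPLE =
        "<Table xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">"
      + "<Row ss:Index=\"7\"><Cell ss:Index=\"1\">"
      + "<ss:Data ss:Type=\"String\" xmlns=\"http://www.w3.org/TR/REC-html40\">"
      + "<Font>\n ABCD\n </Font></ss:Data></Cell></Row>"
      + "<Row ss:Index=\"74\">"
      + "<Cell ss:Index=\"1\"><ss:Data ss:Type=\"Number\""
      + " xmlns=\"http://www.w3.org/TR/REC-html40\">\n 0.00\n </ss:Data></Cell>"
      + "<Cell ss:Index=\"15\"><ss:Data ss:Type=\"Number\""
      + " xmlns=\"http://www.w3.org/TR/REC-html40\">\n 4.57\n </ss:Data></Cell>"
      + "</Row></Table>";

    // Returns the trimmed text of the Data element in the given row and cell,
    // or an empty string when no such cell exists.
    static String cellText(String xml, int rowIdx, int colIdx) {
        // normalize-space() takes the string value of the first matching Data
        // element, i.e. all text below it, with or without a Font wrapper.
        String expr = String.format(
            "normalize-space(//*[local-name()='Row' and @*[local-name()='Index']='%d']"
          + "/*[local-name()='Cell' and @*[local-name()='Index']='%d']"
          + "/*[local-name()='Data'])", rowIdx, colIdx);
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for local-name() to strip prefixes
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cellText(SAMPLE, 74, 15)); // Number cell, no Font
        System.out.println(cellText(SAMPLE, 7, 1));   // String cell inside Font
    }
}
```

Because the expression returns a string rather than a node, a missing cell yields an empty string instead of a null Node, which avoids the kind of failure described in the comments.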

2 Answers

The solution assumes that the question is how to get the text for any cell based on row and column numbers.

It took me a while to get the solution because of the use of namespaces in the input document. Apparently, XPath cannot resolve qualified element and attribute names without a namespace context, and one has to implement the NamespaceContext interface for this purpose (there is no default?), so I found a map-based implementation here and used it.

So, assuming you have the class from the link in your source tree, the following code works. I broke the search pattern into several variables for the sake of clarity.

import java.io.FileReader;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public static String getCellValue(String filename, int rowIdx, int colIdx) {
    // search for Table element anywhere in the source
    String tableElementPattern = "//*[name()='Table']";
    // search for Row element with given number
    String rowPattern = String.format("/*[name()='Row' and @ss:Index='%d']", rowIdx);
    // search for Cell element with given column number
    String cellPattern = String.format("/*[name()='Cell' and @ss:Index='%d']", colIdx);
    // search for the element that has the ss:Type="String" attribute,
    // then for a child element that contains text, and take that text
    String cellStringContent = "/*[@ss:Type='String']/*[text()]/text()";
    String completePattern = tableElementPattern + rowPattern + cellPattern + cellStringContent;

    try (FileReader reader = new FileReader(filename)) {
        XPath xPath = getXpathProcessor();
        Node n = (Node) xPath.compile(completePattern)
                .evaluate(new InputSource(reader), XPathConstants.NODE);
        // evaluate() returns null when nothing matches (for example, a cell
        // whose text is not wrapped in a child element), so check before use
        if (n != null && n.getNodeType() == Node.TEXT_NODE) {
            return n.getNodeValue().trim();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

private static XPath getXpathProcessor() {
    // this is where the custom implementation of NamespaceContext is used
    NamespaceContext context = new NamespaceContextMap(
        "html", "http://www.w3.org/TR/REC-html40", 
        "xsl", "http://www.w3.org/1999/XSL/Transform",
        "o", "urn:schemas-microsoft-com:office:office",
        "x", "urn:schemas-microsoft-com:office:excel",
        "ss", "urn:schemas-microsoft-com:office:spreadsheet");
    XPath xpath =  XPathFactory.newInstance().newXPath();
    xpath.setNamespaceContext(context);
    return xpath;
}

calling:

System.out.println(getCellValue("C://Temp/xx.xml", 7, 1));

produces the desired output
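
For readers who do not have the linked class at hand, here is a minimal sketch of what a map-based NamespaceContext could look like. It is not the implementation the answer links to: the names SimpleNamespaceContext and NamespaceContextDemo, and the well-formed sample document with its namespace declarations on <Workbook>, are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the linked NamespaceContextMap class.
class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    // Takes alternating prefix and URI arguments, e.g. ("ss", "urn:...").
    SimpleNamespaceContext(String... prefixUriPairs) {
        if (prefixUriPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected prefix/URI pairs");
        }
        for (int i = 0; i < prefixUriPairs.length; i += 2) {
            prefixToUri.put(prefixUriPairs[i], prefixUriPairs[i + 1]);
        }
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                return e.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
            ? Collections.<String>emptyIterator()
            : Collections.singleton(prefix).iterator();
    }
}

class NamespaceContextDemo {
    // Assumed well-formed variant of the question's file; the xmlns
    // declarations on Workbook are not shown in the original excerpt.
    static final String SAMPLE =
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\""
      + " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\""
      + " xmlns:html=\"http://www.w3.org/TR/REC-html40\">"
      + "<Worksheet ss:Name=\"Table 1\"><Table><Row ss:Index=\"7\">"
      + "<Cell ss:Index=\"1\"><ss:Data ss:Type=\"String\""
      + " xmlns=\"http://www.w3.org/TR/REC-html40\">"
      + "<Font html:Size=\"9\">\n ABCD\n </Font></ss:Data></Cell>"
      + "</Row></Table></Worksheet></Workbook>";

    // Looks up row 7, cell 1 using prefixed names resolved by the context above.
    static String firstStringCell(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new SimpleNamespaceContext(
                "ss", "urn:schemas-microsoft-com:office:spreadsheet",
                "html", "http://www.w3.org/TR/REC-html40"));
            String expr = "//ss:Row[@ss:Index='7']/ss:Cell[@ss:Index='1']"
                        + "/ss:Data/html:Font/text()";
            return xpath.evaluate(expr, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstStringCell(SAMPLE));
    }
}
```

The prefixes used in the XPath expression only have to match the ones registered in the context; they need not be the prefixes used in the document, since matching is done on the namespace URI.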


Below is code to query your XML with vtd-xml:

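import com.ximpleware.*;

public class queryXML {

    public static void main(String[] s) throws VTDException {
        VTDGen vg = new VTDGen();
        vg.selectLcDepth(5);
        // parse with namespace awareness (second argument true) so that the
        // prefixed names in the XPath expression can be resolved
        if (!vg.parseFile("d:\\xml\\test11.xml", true))
            return;
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.declareNameSpace("ss", "urn:schemas-microsoft-com:office:spreadsheet");
        ap.declareNameSpace("html", "http://www.w3.org/TR/REC-html40");
        // Font inherits the default namespace declared on ss:Data, hence html:Font;
        // element names are case-sensitive (Data and Font, not data and font)
        ap.selectXPath("//ss:Data/html:Font/text()");
        int i;
        while ((i = ap.evalXPath()) != -1) {
            System.out.println(" data content ==>" + vn.toString(i).trim());
        }
    }
}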
