Here are two XML documents. I am trying to compare them and put the respective data in an Excel sheet.

I have a multidimensional array called provisions.

<xml>
<Item type="ItemHeader" name="Plan Features" id="id_1"></Item>
<Item type="Deductible" name="Deductible" id="a">Calendar Year
 <Item type="Text" name="Individual" id="b">5,000</Item>
 <Item type="Text" name="Family" id="c">10,000</Item>
 <Item type="Text" name="Family Out-of-Network" id="id_4">15,000</Item>
</Item>
<Item lock="|delete|" type="Empty" name="Out-of-Pocket Annual Maximum" id="id_2">
 <Item type="Text" name="Individual" id="d">5,000</Item>
 <Item type="Text" name="Family" id="e">10,000</Item>
</Item>
<Item type="Text" name="Life Time Maximum" id="u">Unlimited</Item>
<Item type="Text" name="Coinsurance" id="f"></Item>
<Item type="Text" name="Office Visits" id="g"></Item>
<Item type="Text" name="Routine Physicals" id="h"></Item>
<Item type="Text" name="Preventive Care" id="m"></Item>
<Item type="Text" name="Physician Services" id="i"></Item>
<Item type="Text" name="Emergency Room Services / Urgent Care" id="j"></Item>
<Item type="Text" name="Hospital Admission Services" id="k"></Item>
<Item type="Text" name="Chiropractic" id="n"></Item>
<Item type="Text" name="Prescription Drugs" id="l"></Item>
<Item type="Text" name="Specialty Drugs" id="o"></Item>
<Item type="Boolean" name="Pre Tax Reduction Available" id="t">false</Item>
<Item type="Boolean" name="Conversion Privilege" id="p">false</Item>
<Item type="ItemHeader" name="Plan Setup" id="id_3"></Item>
<Item type="Termination" name="Benefit Termination Date" id="q">Immediate</Item>
<Item type="Determination" name="Premium Redetermination Date" id="r">Not Applicable</Item>
<Item type="Participation" name="Participation Requirement" id="s"></Item>
</xml>

AND

<xml>
<Item type="ItemHeader" name="Plan Features" id="id_1"></Item>
<Item type="Deductible" name="Deductible" id="a">Calendar Year
 <Item type="Text" name="Individual" id="b">3,000</Item>
 <Item type="Text" name="Family" id="c">6,000</Item>
</Item>
<Item lock="|delete|" type="Empty" name="Out-of-Pocket Annual Maximum" id="id_2">
 <Item type="Text" name="Individual" id="d">5,000</Item>
 <Item type="Text" name="Family" id="e">10,000</Item>
</Item>
<Item type="Text" name="Life Time Maximum" id="u">Unlimited</Item>
<Item type="Text" name="Coinsurance" id="f"></Item>
<Item type="Text" name="Office Visits" id="g"></Item>
<Item type="Text" name="Routine Physicals" id="h"></Item>
<Item type="Text" name="Preventive Care" id="m"></Item>
<Item type="Text" name="Physician Services" id="i"></Item>
<Item type="Text" name="Emergency Room Services / Urgent Care" id="j"></Item>
<Item type="Text" name="Hospital Admission Services" id="k"></Item>
<Item type="Text" name="Chiropractic" id="n"></Item>
<Item type="Text" name="Prescription Drugs" id="l"></Item>
<Item type="Text" name="Specialty Drugs" id="o"></Item>
<Item type="Boolean" name="Pre Tax Reduction Available" id="t">false</Item>
<Item type="Boolean" name="Conversion Privilege" id="p">false</Item>
<Item type="ItemHeader" name="Plan Setup" id="id_3"></Item>
<Item type="Termination" name="Benefit Termination Date" id="q">Immediate</Item>
<Item type="Determination" name="Premium Redetermination Date" id="r">Not Applicable</Item>
<Item type="Participation" name="Participation Requirement" id="s"></Item>
</xml>

This XML data is for two plans, and my provisions array contains:

provisions = [[Plan Features, , ], [Deductible, , ], [Individual, , ], ...]

This is what I have done:

// Fill column j+1 with plan j's values, one row per tag id in that plan's list
for(int j = 0; j < plans.length; j++){
    Vector<String> vr = (Vector<String>) tagidPlan.get(plans[j].getId());
    for(int i = 0; i < vr.size(); i++){
        provisions[i][j+1] = getValues(plans[j], vr.get(i));
    }
}

The problem appears when the extra Family Out-of-Network node comes into the picture: from that row on, the second plan's values end up one row too high. This is my final array:

[[Plan Features:, Medical HMO, Medical PPO],
 [Deductible Year:, Calendar Year, Calendar Year],
 [Individual:, 5,000, 3,000],
 [Family:, 10,000, 6,000],
 [Family Out-of-Network:, 15,000, null],
 [Out-of-Pocket Annual Maximum:, null, 5,000],
 [Individual:, 5,000, 10,000],
 [Family:, 10,000, Unlimited],
 [Life Time Maximum:, Unlimited, ],
 [Coinsurance:, , ],
 [Office Visits:, , ],
 [Routine Physicals:, , ],
 [Preventive Care:, , ],
 [Physician Services:, , ],
 [Emergency Room Services / Urgent Care:, , ],
 [Hospital Admission Services:, , ],
 [Chiropractic:, , ],
 [Prescription Drugs:, , ],
 [Specialty Drugs:, , false],
 [Pre Tax Reduction Available:, false, false],
 [Conversion Privilege:, false, ],
 [Plan Setup:, , Immediate],
 [Benefit Termination Date:, Immediate, Not Applicable],
 [Premium Redetermination Date:, Not Applicable, ],
 [Participation Requirement:, , null]]

I want to get the right values into the corresponding array elements.

More code can be seen here: pastie.org/1308625

  • Have you thought about XSLT? It seems like it would be a good fit in this scenario. Commented Nov 11, 2010 at 17:53
  • Well, I am not using XSLT; the XML data is stored in the database as a string. Commented Nov 11, 2010 at 18:05
  • Avoid using Vector. It is old and uses a lot of memory. Use a Collection instead, for instance ArrayList. Commented Nov 11, 2010 at 18:22
  • @Shervin, isn't Vector a collection? Commented Nov 18, 2010 at 6:09
  • @yogsma: It would help to get a jump start if you could write a small test class for this and share the entire code. You can use tools like Pastie or Gist to share the code. Commented Nov 18, 2010 at 10:16

4 Answers

Don't use an array.

Use a Map<String, Map<String, String>> instead, so that:

  • the first String (key to the outer map) is the feature name (e.g. "Life Time Maximum")
  • the second String (key to the inner map) is the plan name (there don't seem to be any actual plan names in your XML documents so "Plan1" and "Plan2" could suffice)
  • the third String (value to the inner map) should be the value for that particular feature in that particular plan (e.g. "Unlimited" for "Life Time Maximum" in "Plan1")

You could have:

{ Life Time Maximum: { Plan1: Unlimited, Plan2: Unlimited } }
{ Family Out-of-Network: { Plan1: 15,000 } }

This works because, unlike an array, the number of entries for each feature doesn't have to be fixed: different features can have different numbers of entries.
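A minimal sketch of the nested-map idea, assuming each plan has already been reduced to feature/value pairs. The class and method names (`ProvisionTable`, `put`, `get`) are hypothetical, and using `LinkedHashMap` is my own choice: it keeps features in first-seen order, so rows come out in document order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: feature name -> (plan name -> value)
class ProvisionTable {
    private final Map<String, Map<String, String>> features = new LinkedHashMap<>();

    // Records one value, creating the inner plan map the first time a feature is seen
    void put(String feature, String plan, String value) {
        features.computeIfAbsent(feature, k -> new LinkedHashMap<>()).put(plan, value);
    }

    // Returns null when the feature or the plan has no entry
    String get(String feature, String plan) {
        Map<String, String> byPlan = features.get(feature);
        return byPlan == null ? null : byPlan.get(plan);
    }

    Map<String, Map<String, String>> asMap() {
        return features;
    }

    public static void main(String[] args) {
        ProvisionTable table = new ProvisionTable();
        table.put("Life Time Maximum", "Plan1", "Unlimited");
        table.put("Life Time Maximum", "Plan2", "Unlimited");
        table.put("Family Out-of-Network", "Plan1", "15,000");
        // prints {Life Time Maximum={Plan1=Unlimited, Plan2=Unlimited}, Family Out-of-Network={Plan1=15,000}}
        System.out.println(table.asMap());
    }
}
```

A missing value is simply an absent inner key, so `get("Family Out-of-Network", "Plan2")` returns null instead of picking up a neighbouring row's value.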

4 Comments

How would I use the same Plan1 as a key for different values?
Each feature name is mapped to its own separate map of plan names to feature values. Although "Plan1" and "Plan2" appear many times overall, each appears at most once in any individual map of plan names to feature values.
There is still a problem with this, since Individual and Family appear as duplicate features, which messes up the whole approach.
The simple solution is to rename them to include their parent: "Deductible - Individual", "Out-of-Pocket Annual Maximum - Individual", and so on.
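A sketch of that renaming, assuming (as mentioned in the question's comments) that each plan's XML is available as a string, and that nesting is at most one level deep as in the samples. It uses the JDK's built-in DOM parser; `PlanFlattener`, `flatten`, `qualifiedName`, and `ownText` are hypothetical names. Nested items are keyed as "Parent - Child", and only an element's own text (not its children's text) is used as its value.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one plan's XML string into qualified feature name -> value
class PlanFlattener {

    static Map<String, String> flatten(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Map<String, String> result = new LinkedHashMap<>();
        // getElementsByTagName returns all descendant Items in document order
        NodeList items = root.getElementsByTagName("Item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.put(qualifiedName(item), ownText(item));
        }
        return result;
    }

    // "Deductible - Individual" for a nested Item, plain name for a top-level one
    static String qualifiedName(Element item) {
        Node parent = item.getParentNode();
        if (parent instanceof Element && "Item".equals(((Element) parent).getTagName())) {
            return ((Element) parent).getAttribute("name") + " - " + item.getAttribute("name");
        }
        return item.getAttribute("name");
    }

    // Joins only the direct text children, so "Calendar Year" is not mixed with "5,000"
    static String ownText(Element item) {
        StringBuilder sb = new StringBuilder();
        NodeList children = item.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }
}
```

With unique keys like these, each plan's map can be fed straight into the feature-to-plan map described in the answer.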
Take a look at DiffX (Open Source Java API for XML Diffing). It provides the algorithms for comparing XML documents, and it gives you a nice summary of nodes/attributes/text that were added/deleted/changed (changes are indicated as a delete, followed by an insert). We're using it in a project that I'm currently involved in; it works really well.

Here is a simple algorithm:

  • Load the data into two DOM models
  • Iterate over all nodes in the first model, depth first (i.e. work on the children first and then on their parent nodes)
  • Try to find the same node in the second model. If you can't find one -> You found a node that only exists in document 1
  • Compare all attributes between the two nodes. Add any differences to your Excel sheet
  • Remove the node in the second document unless it has children
  • Iterate over all nodes in the second model, depth first
  • Try to find the same node in the first model. If you can't find one -> You found a node that only exists in document 2

Depending on the structure (i.e. if you are sure that the nodes are always the same and only the attributes/text children can be different), you can omit some steps.
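A simplified sketch of this two-way compare, assuming the structure is fixed (as the note above allows), so an `Item` can be matched across documents by its `id` attribute. It skips the depth-first ordering and does not delete nodes; instead, a set of matched ids plays the role of removing nodes from document 2, so whatever is left unmatched exists only there. `XmlCompare` and its methods are hypothetical names, and differences are collected as plain strings rather than written to Excel.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class: a simplified two-way compare that matches Items by their id attribute
class XmlCompare {

    // Indexes every Item element (at any depth) by id, in document order
    static Map<String, Element> itemsById(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        Map<String, Element> byId = new LinkedHashMap<>();
        NodeList items = root.getElementsByTagName("Item");
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            byId.put(e.getAttribute("id"), e);
        }
        return byId;
    }

    // Direct text of an element, ignoring text that belongs to nested Items
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Checks the union of attribute names; getAttribute returns "" for a missing attribute
    static void compareAttributes(Element a, Element b, List<String> diffs) {
        Set<String> names = new TreeSet<>();
        NamedNodeMap am = a.getAttributes();
        NamedNodeMap bm = b.getAttributes();
        for (int i = 0; i < am.getLength(); i++) names.add(am.item(i).getNodeName());
        for (int i = 0; i < bm.getLength(); i++) names.add(bm.item(i).getNodeName());
        for (String n : names) {
            if (!a.getAttribute(n).equals(b.getAttribute(n))) {
                diffs.add("attribute " + n + " of " + a.getAttribute("name") + ": "
                        + a.getAttribute(n) + " -> " + b.getAttribute(n));
            }
        }
    }

    static List<String> compare(String xml1, String xml2) throws Exception {
        Map<String, Element> doc1 = itemsById(xml1);
        Map<String, Element> doc2 = itemsById(xml2);
        Set<String> matched = new HashSet<>();
        List<String> diffs = new ArrayList<>();
        // Pass 1: each node of document 1 is either missing from document 2 or compared with its match
        for (Map.Entry<String, Element> entry : doc1.entrySet()) {
            Element a = entry.getValue();
            Element b = doc2.get(entry.getKey());
            if (b == null) {
                diffs.add("only in 1: " + a.getAttribute("name"));
                continue;
            }
            matched.add(entry.getKey());
            compareAttributes(a, b, diffs);
            if (!ownText(a).equals(ownText(b))) {
                diffs.add("changed: " + a.getAttribute("name") + " " + ownText(a) + " -> " + ownText(b));
            }
        }
        // Pass 2: ids never matched in pass 1 exist only in document 2
        for (Map.Entry<String, Element> entry : doc2.entrySet()) {
            if (!matched.contains(entry.getKey())) {
                diffs.add("only in 2: " + entry.getValue().getAttribute("name"));
            }
        }
        return diffs;
    }
}
```

Matching by id is only safe if ids are stable across documents, as they appear to be in the two samples; otherwise a different matching key (for example the qualified name) would be needed.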

2 Comments

Logically it is perfect; the problem is that I won't necessarily have just two DOMs, there can be any number of them.
An N-way compare works the same way but confuses the user. Try harder to reduce the problem to a 2-way compare. For example, define one document as the "root" and compare all the others against it.
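One way to apply the "root document" idea to the original provisions array, sketched under the assumption that each plan has already been flattened into an ordered map with unique feature keys (for example qualified names such as "Deductible - Individual"). Rows are keyed by feature name instead of by position, so a feature missing from one plan yields null in that column instead of shifting every later value. `PlanAligner` and `alignRows` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: aligns any number of flattened plans into provisions-style rows
class PlanAligner {

    // plans.get(0) is treated as the root; its key order drives the row order
    static String[][] alignRows(List<Map<String, String>> plans) {
        Set<String> keys = new LinkedHashSet<>();
        for (Map<String, String> plan : plans) {
            keys.addAll(plan.keySet()); // root keys first, then extras in first-seen order
        }
        String[][] rows = new String[keys.size()][plans.size() + 1];
        int r = 0;
        for (String key : keys) {
            rows[r][0] = key;
            for (int p = 0; p < plans.size(); p++) {
                rows[r][p + 1] = plans.get(p).get(key); // null when this plan lacks the feature
            }
            r++;
        }
        return rows;
    }
}
```

Column 0 holds the feature name and column p+1 holds plan p's value, matching the layout of the question's provisions array, and the same method works for two plans or for n.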
I think you are re-inventing the wheel. Look into these open source alternatives: Link

1 Comment

No, I am not looking for this. I want to compare those two XML documents and put the data in an Excel sheet for users.
