
My awk script produces badly formed XML output, and I suspect the record separator (RS = "\n\n") is the cause. Each group of data in the input is separated by an empty line. Any suggestion for obtaining the desired output would be very helpful.

The following is my input, which is stored in input.txt:

Alex
Marks300
SubjectScience

Robin
Marks200
SubjectChemistry

I am trying to get the following output:

<candidate>
<name>Alex</name>
<marks>Marks300</marks>
<subject>SubjectScience</subject>
</candidate>

<candidate>
<name>Robin</name>
<marks>Marks200</marks>
<subject>SubjectChemistry</subject>
</candidate>

I am trying to use the following code but it is not working:

awk 'BEGIN{FS = "\\n";RS = "\\n\\n";
print " "}
{ print "<candidate>" }
{ print "<name>"$1"</name>" }
{ print "<marks>"$2"</marks>" }
{ print "<subject>"$3"</subject>" }
{ print "</candidate>" }
{print " " }' input.txt > candiatefinaloutput.xml

With the above code, I get the following output:

<candidate>
<name>alex<\name>
<marks><\marks>
<subject><\subject>

<name>Marks300<\name>
<marks><\marks>
<subject><\subject>

<name>SubjectScience<\name>
<marks><\marks>
<subject><\subject>

<name>Robin<\name>
<marks><\marks>
<subject><\subject>

and so on.

3 Answers

You can try something like this:

awk 'BEGIN{FS = "\n";RS = "\n\n";
print " "}
{ print "<candidate>" }
{ print "<name>"$1"</name>" }
{ print "<marks>"$2"</marks>" }
{ print "<subject>"$3"</subject>" }
{ print "</candidate>" }
{print " " }' input.txt > candiatefinaloutput.xml

And here is what I get:

[romeo.romeo-PC] ➤ cat 3
Alex
Marks300
SubjectScience

Robin
Marks200
SubjectChemistry
[romeo.romeo-PC] ➤ awk 'BEGIN{FS = "\n";RS = "\n\n";
print " "}
{ print "<candidate>" }
{ print "<name>"$1"</name>" }
{ print "<marks>"$2"</marks>" }
{ print "<subject>"$3"</subject>" }
{ print "</candidate>" }
{print " " }' 3

<candidate>
<name>Alex</name>
<marks>Marks300</marks>
<subject>SubjectScience</subject>
</candidate>

<candidate>
<name>Robin</name>
<marks>Marks200</marks>
<subject>SubjectChemistry </subject>
</candidate>
  • Sorry, I did not understand the difference between my code and what you have provided. Both look alike. Is there a difference that I am missing? Commented May 7, 2015 at 4:50
  • Yes. For name I use the 1st token ($1), for marks the 2nd ($2), and for subject the 3rd ($3). You use the 1st token every time. Commented May 7, 2015 at 4:53
  • My mistake. I intended to write the tokens $2 and $3. I have now edited my question. I have also provided the output I am currently getting with the given code. Commented May 7, 2015 at 5:02
  • I see. I have edited my answer, please test it. And try GNU awk: gawk (if on a UNIX system). Commented May 7, 2015 at 5:19
  • Sure. I was able to mark the answer as useful, but it did not allow me to upvote since I do not have the required reputation yet. Commented May 7, 2015 at 6:38
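A note on the gawk suggestion in the comments above: as far as I know, POSIX leaves a multi-character RS unspecified, so RS = "\n\n" behaves as intended in gawk but may not in other awk implementations. A more portable sketch, assuming the input looks exactly like the sample in the question, is to set RS to the empty string ("paragraph mode"), which makes awk treat blank-line-separated blocks as records. The function name to_xml and the temporary file are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: wraps the awk program so it can be reused on any file.
to_xml() {
    # RS = ""   -> paragraph mode: records are separated by blank lines.
    # FS = "\n" -> each line of a record becomes one field ($1, $2, $3).
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        print "<candidate>"
        print "<name>" $1 "</name>"
        print "<marks>" $2 "</marks>"
        print "<subject>" $3 "</subject>"
        print "</candidate>"
        print ""
    }' "$1"
}

# Sample data copied from the question, written to a scratch file.
tmp=$(mktemp)
printf 'Alex\nMarks300\nSubjectScience\n\nRobin\nMarks200\nSubjectChemistry\n' > "$tmp"
to_xml "$tmp"
```

If the input file was created on Windows, the "empty" lines may contain a carriage return, in which case neither this nor the RS = "\n\n" version will see them as blank; that is an assumption worth checking before blaming awk.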

I hope the problem is not simply that you are missing the closing single quote for awk; hopefully that is just a typo! Also remember that you can embed "\n" in a single print instead of using lots of separate print commands (or even use semicolons to separate them on one line).

  • No, the quote was missing in my question, but it is present in my actual code. I have added it here as well now. Thanks for your tip on the \n :) Commented May 7, 2015 at 6:00
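The tip about embedding "\n" could look something like the sketch below. It assumes the same blank-line-separated input as in the question and uses RS = "" (paragraph mode) to split the records; the function name to_xml_single_print is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: same output as the multi-print version, one print per record.
to_xml_single_print() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        # One print statement; "\n" inside the string starts each new line.
        # The trailing "\n" plus the newline print adds gives the blank line
        # between candidates.
        print "<candidate>\n<name>" $1 "</name>\n<marks>" $2 "</marks>\n<subject>" $3 "</subject>\n</candidate>\n"
    }' "$1"
}

# Sample data copied from the question, written to a scratch file.
sample=$(mktemp)
printf 'Alex\nMarks300\nSubjectScience\n\nRobin\nMarks200\nSubjectChemistry\n' > "$sample"
to_xml_single_print "$sample"
```

Whether this is more readable than several print statements is a matter of taste; the separate prints are easier to edit when tags are added later.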

Please don't use awk for XML parsing. This is a bad idea, because XML supports things like line feeds, indentation, line wrapping within attributes, and unary (self-closing) tags, all of which mean that semantically identical XML can break a line-, field-, or regex-oriented approach.

So I would strongly suggest using an XML tool to build your XML - and as an example:

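use strict;
use warnings;
use XML::Twig;

# Build an empty document with a <root> element and indented output.
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' );
$twig->set_root( XML::Twig::Elt->new('root') );

open( my $input, "<", "input.txt" ) or die $!;

# Read the file one blank-line-separated block at a time.
local $/ = "\n\n";

while (<$input>) {
    # Capture the name, the digits after "Marks", and the word after "Subject".
    # Note that the "Marks" and "Subject" prefixes are dropped from the values.
    my ( $name, $marks, $subject ) = (m/(\w+)\nMarks(\d+)\nSubject(\w+)/s);
    # Append one <candidate> element with three children per block.
    my $candidate = $twig->root->insert_new_elt( 'last_child', 'candidate' );
    $candidate->insert_new_elt( 'last_child', 'name',    $name );
    $candidate->insert_new_elt( 'last_child', 'marks',   $marks );
    $candidate->insert_new_elt( 'last_child', 'subject', $subject );
}
close($input);
$twig->print;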

As a result, you can set your output format to whatever is neatest for displaying your content. In the interests of generating 'proper' valid XML, you probably also want to include:

$twig -> set_xml_version('1.0');
$twig -> set_encoding('utf-8'); 
