Hi :) I can't figure out what the error in this program is. Could you please help me with it? Thank you :)

The input file contains the following:

3.  भारत का इतिहास काफी समृद्ध एवं विस्तृत है।
57. जैसे आज के झारखंड प्रदेश से, उन दिनों, बहुत से लोग चाय बागानों में मजदूरी करने के उद्देश्य से असम आए।

(These are sample sentences. In the output, I need each Hindi word followed by its position in the sentence.)

For example, the output for the first sentence would look like this:

3.  भारत(1) का(2) इतिहास(3) काफी(4) समृद्ध(5) एवं(6) विस्तृत(7) है(8) ।(9)

I should get similar output for the sentences that follow.

The code looks like this:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# encoding: utf-8
separators = [u'।', ',', '.']
text = open("hinstest1.txt").read()
#This converts the encoded text to an internal unicode object, where
# all characters are properly recognized as an entity:
text = text.decode("UTF-8")
#this breaks the text on the white spaces, yielding a list of words:
words = text.split()

counter = 1

output = ""
#if the last char is a separator, and is joined to the word:
for word in words:
    if word[-1] in separators and len(word) > 1:
        #word up to the second to last char:
        output += word[:-1] + u'(%d) ' % counter
        counter += 1
        #last char
        output += word[-1] +  u'(%d) ' % counter
    else:
        output += word + u'(%d) ' % counter
        counter += 1

    print output

The error I am getting is:

  File "pyth_hinwp.py", line 22
    output += word[-1] +  u'(%d) ' % counter
                         ^
SyntaxError: invalid syntax

I know this question is similar to one I asked earlier, but I could not get some of the earlier answers to run, so I am restructuring the question around the point where I am currently stuck.

  • Cannot reproduce this error on Python 2.5.2! Commented Feb 20, 2010 at 6:17

2 Answers

What is posted here does not have the error. Note that what is posted has TWO space characters between the + and the u in output += word[-1] +  u'(%d) ' % counter. What is probably happening is that your file has a whitespace character other than a space in there. One possibility is NBSP (U+00A0), aka "no-break space". Whatever SO does to format your code is likely to scrub away such things.

Diagnosis: At the Python interactive prompt, type

open("pyth_hinwp.py").readlines()[22-1]

What do you see between the + and the u?
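If eyeballing the repr is awkward, the same check can be scripted. The following is only a sketch, written for Python 3; the helper name find_suspicious_chars is made up for illustration, and the sample string is a stand-in for the offending line rather than the contents of the real file. It reports non-ASCII space separators and invisible format characters together with their column.

```python
import unicodedata

def find_suspicious_chars(line):
    """Return (column, code point, name) for each non-ASCII space
    separator or invisible format character found in the line."""
    hits = []
    for col, ch in enumerate(line, start=1):
        if ord(ch) < 128:
            continue  # plain ASCII, including the ordinary space, is fine
        # Zs = space separators (NBSP is one), Cf = invisible format characters
        if unicodedata.category(ch) in ("Zs", "Cf"):
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((col, "U+%04X" % ord(ch), name))
    return hits

# Stand-in for the offending line, with an NBSP after the second "+"
sample = "output += word[-1] +\u00a0u'(%d) ' % counter"
print(find_suspicious_chars(sample))
```

To check a real file, read it with open(path, encoding="utf-8") and pass each line to the helper. Characters inside string literals (for example a zero-width joiner in Hindi text) may be legitimate, so treat the hits as candidates to inspect, not as certain errors.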

Fix: in your editor, delete both characters between the + and the u. Insert a single space.
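If the file has many such characters, the replacement could also be scripted instead of done by hand. This is a rough sketch for Python 3, under the assumption that every NBSP in the source is unwanted; the function name replace_nbsp is hypothetical. It works on the text of the file, not on the file itself, so reading and writing (with encoding="utf-8") is left to the caller.

```python
NBSP = "\u00a0"

def replace_nbsp(source_text):
    """Return (cleaned_text, count): every no-break space replaced by an
    ordinary space, plus how many replacements were made."""
    count = source_text.count(NBSP)
    return source_text.replace(NBSP, " "), count

# Stand-in for the broken line reported in the question
broken = "output += word[-1] +\u00a0u'(%d) ' % counter\n"
fixed, n = replace_nbsp(broken)
print(repr(fixed), n)
```

Note that this blindly replaces NBSPs inside string literals and comments too, so review the result before overwriting the original file.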

By the way, with a syntax error, the problem is entirely within the named SOURCE file; the code has not been run (because it couldn't be compiled) and so what is in your INPUT file has no bearing on the problem.
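The compile-before-run point can be seen with the built-in compile(), which parses source text without executing any of it. A small sketch, assuming Python 3; the helper name compiles and the two-line sample program are invented for the demonstration.

```python
def compiles(source_text):
    """Return True if the text parses as a Python module, else False.
    compile() only parses and compiles; nothing in the text is executed."""
    try:
        compile(source_text, "example.py", "exec")
    except SyntaxError:
        return False
    return True

# The first line is valid, but the NBSP on the second line makes the whole
# module fail to compile, so the print on the first line would never run.
bad_source = "print('this never runs')\nx = 'a' +\u00a0'b'\n"
good_source = bad_source.replace("\u00a0", " ")

print(compiles(bad_source), compiles(good_source))
```

Since the failure happens while the source is being parsed, no input file is ever opened, which is why its contents cannot be the cause.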

3 Comments

Thank you for your response :) I ran what you suggested at the interactive prompt. This is what I got: "\t\toutput += word[-1] +\xc2\xa0u'(%d) ' % counter\r\n" What do you think I can do to rectify this error?
'\xc2\xa0' is, as I guessed, an NBSP (U+00A0) encoded in UTF-8. Fix == rectify. Generalising what I wrote in my answer: use an editor to delete whatever is between the + and the u, and then insert a single space.
Also, do not use any "word processing" editor of any kind to produce Python code, ever. You must use the barest, simplest text-only editor. Spacing matters, and invisible characters (like a non-breaking space) are very hard to diagnose by eye. Use IDLE or Komodo Edit or BBEdit or some other programming tool. Do not use a word processor.

If you have a syntax error, your editor may flag it before you even run the code. In any case, why not try deleting the character at the position where the error is indicated? I am not able to replicate the problem after copying your code.
