I have a sentence containing both plain numbers and ordinal numbers, and I want to convert the ordinals to words, e.g. 2 nd to second and 56 th to fifty-sixth. I used the num2words library, and the code below works:

import re
import num2words

text = "ordinal numbers are like as 42 nd, 67 th, and 5 th and plain numbers such as 1, 2, 3."
numbers = re.findall(r'(\d+ )[st|nd|rd|th]', text)
numbers
for n in numbers:
    ordinalAsString = num2words.num2words(n, ordinal=True)
    print(ordinalAsString)
    #forty-second
    #sixty-seventh
    #fifth

Now I want to use a lambda function to make the same replacement inside the sentence itself:

sentence = "ordinal numbers are like as 42 nd, 67 th, and 5 th and plain numbers such as 1, 2, 3."
expected output: "ordinal numbers are like as forty-second, sixty-seventh, and fifth and plain numbers such as 1, 2, 3."

I wrote the function like this,

sentence = re.sub(r"(\d+ )[st|nd|rd|th]", lambda x: num2words.num2words(str(x), ordinal=True), sentence)

But that throws an error like,

InvalidOperation: [<class 'decimal.ConversionSyntax'>]

What is wrong with the code?

  • Your regular expressions are different, is that the problem? Commented Nov 3, 2021 at 17:54
  • No, the regexes are the same. @JacksonH Commented Nov 3, 2021 at 17:59
  • str(x) isn't going to give you the string you want; x.group(1) will. [st|nd|rd|th] also does not match what you think it does; it's equivalent to [st|ndrh], with the | acting as a literal character, not an alternation operator. Commented Nov 3, 2021 at 18:16
  • Yes, that's because the bracket expression isn't matching one of st, nd, rd, or th; it's matching one of s, t, |, n, d, r, or h. Commented Nov 3, 2021 at 18:25
  • Try (?:st|nd|rd|th). (st|nd|rd|th) would work as well, but no use making it a capture group if you don't need to use the captured suffix. Commented Nov 3, 2021 at 18:25
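
To see the difference these comments describe, here is a minimal sketch using only the standard re module. The shortened sample text and the variable names are illustrative assumptions, not taken from the question's code.

```python
import re

text = "42 nd, 67 th, and 5 th"

# Character class: matches ONE character from the set s, t, |, n, d, r, h.
char_class = re.compile(r"(\d+ )[st|nd|rd|th]")
# Non-capturing group: matches one of the full suffixes st, nd, rd, th.
alternation = re.compile(r"(\d+ )(?:st|nd|rd|th)")

class_matches = [m.group(0) for m in char_class.finditer(text)]
group_matches = [m.group(0) for m in alternation.finditer(text)]

print(class_matches)  # ['42 n', '67 t', '5 t']
print(group_matches)  # ['42 nd', '67 th', '5 th']

# Inside brackets the | is a literal character, so this also matches.
print(char_class.fullmatch("7 |") is not None)  # True
```

Because the character class consumes only the first letter of the suffix, a substitution built on it would leave the second letter behind in the sentence.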

1 Answer

There are two problems:

  1. Your regular expression doesn't match the whole suffix, only its first letter. [st|nd|rd|th] matches exactly one of the characters inside the brackets; duplicates are ignored, so it's equivalent to [st|ndrh], with the | treated as a literal character to match, just like each of the letters. Use r"(\d+ )(?:st|nd|rd|th)" instead; inside the non-capturing group (?:...), the | does act as alternation between the four patterns st, nd, rd, and th.

  2. The callable passed to re.sub takes a Match object as its argument, and str(x) returns that object's repr rather than the matched text. Use its group method to extract the captured string: lambda x: num2words.num2words(x.group(1), ordinal=True).
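
Putting both fixes together might look like the sketch below. The helper name replace_ordinals and its to_words parameter are hypothetical, introduced only for illustration, and the demo passes a stand-in converter so the block runs without num2words installed.

```python
import re

# Corrected pattern from the answer: the digits and the following space are
# captured; the suffix is matched by a non-capturing alternation.
ORDINAL_RE = re.compile(r"(\d+ )(?:st|nd|rd|th)")


def replace_ordinals(sentence, to_words):
    """Replace each '<digits> <suffix>' in sentence with to_words(<digits>).

    Both names are hypothetical; to_words is any callable taking a digit string.
    """
    # re.sub hands the callable a Match object, so pull out group(1) rather
    # than calling str() on the match. strip() drops the captured space.
    return ORDINAL_RE.sub(lambda m: to_words(m.group(1).strip()), sentence)


sentence = ("ordinal numbers are like as 42 nd, 67 th, and 5 th "
            "and plain numbers such as 1, 2, 3.")

# Stand-in converter (an assumption for this demo) that marks what it received.
result = replace_ordinals(sentence, lambda n: f"<{n}>")
print(result)
# ordinal numbers are like as <42>, <67>, and <5> and plain numbers such as 1, 2, 3.
```

With num2words installed, passing lambda n: num2words.num2words(n, ordinal=True) as to_words should give the forty-second, sixty-seventh, and fifth wording the question asks for, while the plain numbers 1, 2, 3 are left untouched because no suffix follows them.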

1 Comment

Yes, it worked. Thanks a ton for the explanation and the answer.
