1

I am trying to encode a piece of text that I am getting from an Excel document. It contains all sorts of awkward characters such as quotation marks, backslashes, and parentheses. What is the proper way to convert it to a Python-compatible string so I can process it and store it in a variable?

ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."

I tried str(ExampleText), but it fails.

Thank you for your help!

P.S. Here's the error that I get: UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')
P.P.S. I am on IronPython 2.7. I know, a bummer :-(

7
  • So you have an ExampleText object already? What type is it (print type(ExampleText))? Commented Mar 13, 2015 at 1:02
  • It's a string object. When I do ExampleText.GetType() it returns System.String. Commented Mar 13, 2015 at 1:08
  • Also, I get this error: UnicodeEncodeError: ('unknown', '\x00', 0, 1, '') Commented Mar 13, 2015 at 1:09
  • So the ExampleText object isn't in python, it's in like VBA or something -- but you want to use that value in a python script? What about just enclosing the entire string in single quotes: ExampleText = '"MINIMUM ... o.c."' Commented Mar 13, 2015 at 1:11
  • There are no single quotes in the string you listed, so wrapping that string in single quotes would be a start. Commented Mar 13, 2015 at 1:13

3 Answers

1

You can use the escape() function from the re package:

>>> import re
>>> re.escape(ExampleText)
    '\\"MINIMUM\\ TRACK\\ FASTENING\\ SHALL\\ BE\\ 0.145\\"\\ DIAMETER ...'
>>> ExampleText.decode('string_escape')
    '"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER ...'

In Python 2.7, the escape() function prefixes every non-alphanumeric character with a backslash; the interpreter's repr shows each of those backslashes doubled. This should handle your input string well.
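As a hedged illustration of what re.escape() is designed for, the sketch below uses it to search for a literal fragment inside the text. The names text and needle are made up for this example, the sample is shortened, and the exact set of characters that receive a backslash differs between Python versions.

```python
import re

# Shortened sample of the text from the question (hypothetical variable names).
text = 'SPACED ON 8" CENTERS FOR BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM'

# The fragment contains regex metacharacters: parentheses and dots.
needle = '(U.N.O.)'

# re.escape() backslash-escapes the metacharacters so the pattern
# matches the fragment literally instead of treating it as a group.
pattern = re.escape(needle)
match = re.search(pattern, text)

print(match.group(0))
```

The escaped pattern is meant for matching; the original text itself is left unchanged.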


1

If the given code precisely matches what you have, it's no wonder it's having problems. You're enclosing it with double quotes, but the string contains double quotes. Left as is, the string will end when the interpreter sees the next double quote, then there will be a bunch of terms it doesn't recognize (like DIAMETER and POWDER), then eventually another string will begin, and so on.

You need to either escape the string's double quotes with a backslash, or enclose the string with three quotes on each side.

ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145\" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8\" CENTERS FOR BEARING WALLS, AND AT 12\" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2\" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145\" x 1 1/2\" powder actuated fasteners spaced on 4\" centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8\" DIA. 2205 expansion anchors w/ 2 1/2\" min. embedment - OR-Simpson \"Titen\" screws  @ 6\" o.c."

or

ExampleText = """MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."""

SO's built-in syntax highlighting indicates that your sample consists of several strings, while mine is one continuous string.
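A quick way to check that the quoting styles are equivalent, sketched on a shortened fragment of the text (the variable names are made up for this example):

```python
# Three ways to write the same text, which contains double quotes.
escaped = "FASTENING SHALL BE 0.145\" DIAMETER, SPACED ON 8\" CENTERS"
triple = """FASTENING SHALL BE 0.145" DIAMETER, SPACED ON 8" CENTERS"""
single = 'FASTENING SHALL BE 0.145" DIAMETER, SPACED ON 8" CENTERS'

# All three literals evaluate to the same string value.
print(escaped == triple == single)

# The double quotes are ordinary characters inside the string.
print(escaped.count('"'))
```

One caveat: if the text itself ends with a double quote, the """ form needs that last quote escaped, or the ''' form can be used instead.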

Also, the string contains only forward slashes, no backslashes, so there's no problem there. If there were backslashes and you wanted them taken literally, you would precede the string with an r to denote a raw string: r'hello\nworld' prints as hello\nworld. The only thing a raw string can't do is end in an odd number of backslashes, most commonly a single trailing backslash. Solve that by adding the backslash afterward: r'C:\Users\jsmith' + '\\' or r'C:\Users\jsmith' '\\' (the + isn't strictly necessary when concatenating literal strings).
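The raw-string behaviour described above can be checked with a small sketch; the path is the illustrative one from the paragraph, not a real location, and the variable names are made up:

```python
# In a raw string, backslash-n stays as two separate characters.
raw = r'hello\nworld'
cooked = 'hello\nworld'

print(len(raw))     # 12 characters: the backslash and the n are kept
print(len(cooked))  # 11 characters: \n became a single newline

# A raw literal cannot end in a lone backslash, so append it separately.
path = r'C:\Users\jsmith' + '\\'
print(path)
```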

This is only necessary if you're writing the string into your source code. Strings from external sources such as raw_input() (input() in Python 3) or files are never parsed as Python literals, so they need no escaping.


0

From our conversation in the comments:

# -*- coding: utf-8 -*-

ExampleText = '"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."'

print(ExampleText)

In Python 2, the coding header line is required because the literal contains non-ASCII characters (the curly quotes ’ and ”).

You could also wrap the literal with ''' or """:

x = '''some string'''
x = """some string"""

Note that a better solution might be to read the string directly from the data, using a package like csv, instead of copying and pasting it into your code.
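A minimal sketch of that approach, assuming the sheet has been exported to CSV. The in-memory io.StringIO buffer stands in for a real file, and the column names and cell contents are a made-up, shortened sample. It is written for Python 3; under Python 2.7 or IronPython the csv module works on byte strings, so the details differ.

```python
import csv
import io

# Stand-in for an exported CSV file (hypothetical contents).
# Inside a quoted CSV field, a literal " is written as "".
data = io.StringIO('id,note\r\n1,"SPACED ON 8"" CENTERS (U.N.O.), 1 1/2"" MIN."\r\n')

reader = csv.reader(data)
header = next(reader)
row = next(reader)

# The csv module undoes the CSV quoting; no Python escaping is needed.
note = row[1]
print(note)
```

Because the text never passes through a Python literal, the quotes and parentheses in the cell arrive unchanged.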
