
I have a bash script that contains several variables holding UTF-8 strings. These variables are passed as parameters to a bash function in the script, which calls cp and a Python script with them.

This script runs properly on my machine, but does not work on another one. I tried to debug it with set -x and other tools, but I cannot find the root cause, only this difference.

Here is a minimal example (like a Plunker for JS ;)

  1. I have the following test.sh

    #!/bin/bash
    set -x
    
    function aaa() {
        echo "$1"
    }
    echo 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
    aaa 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
    
  2. I copy it to my two hosts

  3. The good host shows the following:

    + echo öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    + aaa öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    + echo öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    
  4. However, the bad host shows this:

    + echo $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    + aaa $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
    + echo $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
    öüóőúéáűíÖÜÓŐÚÉÁŰÍ
    

Here are some details for debugging:

The working machine is an Ubuntu Trusty with bash=4.2-2ubuntu2.6, and the failing machine is an Ubuntu Precise with bash=4.3-7ubuntu1.5.

The locales are identical on both machines:

    $ locale
    LANG=en_US.UTF-8
    LANGUAGE=
    LC_CTYPE=en_US.UTF-8
    LC_NUMERIC=en_US.UTF-8
    LC_TIME=en_US.UTF-8
    LC_COLLATE=en_US.UTF-8
    LC_MONETARY=en_US.UTF-8
    LC_MESSAGES=POSIX
    LC_PAPER=en_US.UTF-8
    LC_NAME=en_US.UTF-8
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LC_MEASUREMENT=en_US.UTF-8
    LC_IDENTIFICATION=en_US.UTF-8
    LC_ALL=

Updates

  • I was wrong about cp, sorry.
  • I thought the Python exception was not related to this case, because things already looked broken in bash. Can this backtrace help?

    + /tmp/callrecord-renamer.py --skip --contacts $'/var/datastore/T\303\274nci/Rendszer/DropboxClone/contacts.ini' $'/var/datastore/T\303\274nci/DropboxClone/H\303\215V\303\201SFELV\303\211TELEK'
    Traceback (most recent call last):
      File "/tmp/callrecord-renamer.py", line 316, in <module>
        main()
      File "/tmp/callrecord-renamer.py", line 312, in main
        FileManager(args.recording_path, contacts_path, args.no_change, args.skip_errors).update_files_in_directory()
      File "/tmp/callrecord-renamer.py", line 87, in update_files_in_directory
        self.contacts.load()
      File "/tmp/callrecord-renamer.py", line 56, in load
        self.database.read(self.file_path)
      File "/usr/lib/python3.2/configparser.py", line 689, in read
        self._read(fp, filename)
      File "/usr/lib/python3.2/configparser.py", line 994, in _read
        for lineno, line in enumerate(fp, start=1):
      File "/usr/lib/python3.2/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 3176: invalid start byte
    

For more details, you can check this file at: https://github.com/andras-tim/callrecord-renamer/blob/master/callrecord-renamer.py

Update2

I have checked: this error was caused independently of the bash code. The .ini file encoding was bad... Sorry to everyone who helped debug!
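
For anyone hitting the same UnicodeDecodeError: one way to check whether a file is valid UTF-8 before handing it to Python is to round-trip it through iconv. This is only a sketch; `is_valid_utf8` and the two demo file names are made up for illustration, and it assumes an `iconv` that exits non-zero on invalid input (GNU iconv does). Byte 0xfc is ü in ISO-8859-1 and ISO-8859-2, so a file saved in one of those encodings is a plausible way to get this exact error.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" >/dev/null 2>&1
}

# Demo with two throwaway files (names are illustrative only).
tmpdir=$(mktemp -d)
printf 'name=T\303\274nci\n' > "$tmpdir/good.ini"   # ü as UTF-8 (0xC3 0xBC)
printf 'name=T\374nci\n'     > "$tmpdir/bad.ini"    # ü as a single 0xFC byte

for f in "$tmpdir/good.ini" "$tmpdir/bad.ini"; do
    if is_valid_utf8 "$f"; then
        echo "${f##*/}: valid UTF-8"
    else
        echo "${f##*/}: NOT valid UTF-8"
    fi
done
rm -rf "$tmpdir"
```

If the check fails, the file needs to be re-saved or converted from whatever encoding it really has; that source encoding has to be known or guessed, the check alone does not reveal it.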

  • I'm not sure that you actually have a problem. The output is correct in both cases; you are just getting a different (but valid) representation in the debugging output on the "bad" host. Commented Oct 22, 2015 at 17:22
  • I have found this article stackoverflow.com/questions/11838597/… but it doesn't solve my problem... :( Commented Oct 22, 2015 at 17:23
  • @chepner cp cannot find the source path, even though it exists. Commented Oct 22, 2015 at 17:25
  • If you are having a problem with cp then show us the problem with cp and not some other problem entirely. Commented Oct 22, 2015 at 17:29
  • This doesn't appear to be a shell issue, but a problem with cp on the bad host in dealing with a UTF-8 encoded string. The bad host is just showing the raw UTF-8 stream, rather than displaying the encoded Unicode characters. The data is the same on both machines (\303\266, for example, is in octal; the two bytes are 0xC3 and 0xB6, which is the UTF-8 encoding for U+00F6, ö). Commented Oct 22, 2015 at 17:32
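
The octal-to-UTF-8 point in the last comment can be checked directly in bash. A small sketch, assuming a UTF-8 terminal for the final line; the variable name `escaped` is just for illustration:

```shell
#!/bin/bash
# The $'...' form that xtrace printed for the first character.
escaped=$'\303\266'

# Octal 303 and 266 are hex c3 and b6 (a leading 0 makes printf read octal).
printf '%x %x\n' 0303 0266

# Dump the actual bytes held in the variable.
printf '%s' "$escaped" | od -An -tx1

# Written to a UTF-8 terminal, those two bytes render as ö.
printf '%s\n' "$escaped"
```

So `$'\303\266'` and a literal `ö` in a UTF-8 encoded script are the same two bytes; only the way xtrace prints them differs.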

1 Answer


You are comparing the xtrace debugging output of set -x. You cannot and should not expect bash's xtrace output to be in a certain format. If you want a specific format, you need to produce it yourself.

If you look at the non-debug output of your script, it's identical on both machines.
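
As a rough illustration of "produce it yourself": a small debug helper that prints each argument together with its bytes in hex gives the same output no matter how a given bash build chooses to quote things under set -x. `dump_args` is a hypothetical name, not something bash provides, and the sketch assumes `od` and `tr` are available.

```shell
#!/bin/bash
# Hypothetical debug helper: print every argument as text plus its raw
# bytes in hex, so the format does not depend on bash's xtrace quoting.
dump_args() {
    local arg
    for arg in "$@"; do
        printf '%s | ' "$arg"
        # -v keeps od from collapsing repeated lines into "*".
        printf '%s' "$arg" | od -An -v -tx1 | tr -d ' \n'
        printf '\n'
    done
}

aaa() {
    dump_args "$@"    # call it wherever set -x output was being compared
    echo "$1"
}

aaa $'\303\266\303\274'
```

Comparing the hex column between the two hosts shows directly whether the function receives the same bytes on both.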

1 Comment

I have been debugging bash code for several years, but I had never seen this escaping from set -x until now. I was blaming the encoding because I have now checked on a third machine (also Precise), where this script worked.
