I have a bash script that contains several variables holding UTF-8 strings. These variables are passed as parameters to a bash function in the script, which calls cp and a Python script with these parameters.
This script runs properly on my machine, but does not work on another one. I tried to debug it with set -x and other things, but I cannot find the root cause, only this difference.
Here is a minimized example - like Plunker for JS ;)
I have the following test.sh:
#!/bin/bash
set -x

function aaa() {
    echo "$1"
}

echo 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
aaa 'öüóőúéáűíÖÜÓŐÚÉÁŰÍ'
I copy it to my two hosts.
The good one shows the following:
+ echo öüóőúéáűíÖÜÓŐÚÉÁŰÍ
öüóőúéáűíÖÜÓŐÚÉÁŰÍ
+ aaa öüóőúéáűíÖÜÓŐÚÉÁŰÍ
+ echo öüóőúéáűíÖÜÓŐÚÉÁŰÍ
öüóőúéáűíÖÜÓŐÚÉÁŰÍ
However, the bad one shows this:
+ echo $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
öüóőúéáűíÖÜÓŐÚÉÁŰÍ
+ aaa $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
+ echo $'\303\266\303\274\303\263\305\221\303\272\303\251\303\241\305\261\303\255\303\226\303\234\303\223\305\220\303\232\303\211\303\201\305\260\303\215'
öüóőúéáűíÖÜÓŐÚÉÁŰÍ
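To compare the two traces, it can help to turn the escaped form back into readable text. The following is a minimal Python sketch, not part of the original script: the function name decode_xtrace_escapes is made up for illustration, and it assumes the traced word contains only ASCII characters plus three-digit octal escapes (other escapes such as \n or \' are not handled).

```python
import re


def decode_xtrace_escapes(quoted: str) -> str:
    """Decode a bash $'...' word whose non-ASCII bytes appear as \\NNN octal escapes.

    Hypothetical helper: only three-digit octal escapes are handled, and the
    rest of the word is assumed to be plain ASCII.
    """
    # Drop the $'...' wrapper if it is present.
    if quoted.startswith("$'") and quoted.endswith("'"):
        quoted = quoted[2:-1]
    # Replace every \NNN with the character whose ordinal is that byte value.
    as_byte_chars = re.sub(
        r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), quoted
    )
    # Each character now stands for exactly one byte, so latin-1 gives the
    # raw bytes back, which are then decoded as UTF-8.
    return as_byte_chars.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    text = decode_xtrace_escapes(r"$'\303\266\303\274\303\263'")
    # Print code points instead of the characters, so the output does not
    # depend on the terminal encoding.
    print([hex(ord(ch)) for ch in text])
```

If the decoded text equals the string in test.sh, both hosts are passing the same bytes to the function and only the trace rendering differs.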
Here are some details for debugging:
The working machine is Ubuntu Trusty with bash=4.2-2ubuntu2.6, and the failing machine is Ubuntu Precise with bash=4.3-7ubuntu1.5.
The locales are identical on both machines:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=POSIX
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
Updates
- I was wrong about cp, sorry. I thought the Python exception was not related to this case, because it already looked broken in bash. Can this backtrace help?
+ /tmp/callrecord-renamer.py --skip --contacts $'/var/datastore/T\303\274nci/Rendszer/DropboxClone/contacts.ini' $'/var/datastore/T\303\274nci/DropboxClone/H\303\215V\303\201SFELV\303\211TELEK'
Traceback (most recent call last):
  File "/tmp/callrecord-renamer.py", line 316, in <module>
    main()
  File "/tmp/callrecord-renamer.py", line 312, in main
    FileManager(args.recording_path, contacts_path, args.no_change, args.skip_errors).update_files_in_directory()
  File "/tmp/callrecord-renamer.py", line 87, in update_files_in_directory
    self.contacts.load()
  File "/tmp/callrecord-renamer.py", line 56, in load
    self.database.read(self.file_path)
  File "/usr/lib/python3.2/configparser.py", line 689, in read
    self._read(fp, filename)
  File "/usr/lib/python3.2/configparser.py", line 994, in _read
    for lineno, line in enumerate(fp, start=1):
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 3176: invalid start byte
For more details, you can check this file on: https://github.com/andras-tim/callrecord-renamer/blob/master/callrecord-renamer.py
Update2
I have checked: this error was caused independently of the bash code. The .ini file's encoding was bad... Sorry to everyone who helped with debugging!
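For anyone hitting the same UnicodeDecodeError, here is a rough Python sketch of how such a file could be checked and converted. The function names and the sample line are hypothetical, and the source encoding is an assumption: byte 0xfc is "ü" in Latin-1 (and also in cp1250 and cp1252), but the real encoding of the .ini file was not recorded here.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the position reported in the exception message.
        return err.start
    return None


def reencode_to_utf8(data: bytes, assumed_encoding: str = "latin-1") -> bytes:
    """Re-encode data as UTF-8, assuming it is currently in assumed_encoding.

    The default is only a guess; pick the encoding the file was really saved in.
    """
    return data.decode(assumed_encoding).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical .ini line with a Latin-1 encoded "ü" (byte 0xfc).
    sample = b"name = T\xfcnci\n"
    print(first_invalid_utf8_offset(sample))
    print(first_invalid_utf8_offset(reencode_to_utf8(sample)))
```

On a real file, the bytes would come from open(path, "rb").read(); converting a copy first is safer than overwriting the original.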
cp can not find the source path, however this is existing.
cp then show us the problem with cp and not some other problem entirely.
cp on the bad host in dealing with a UTF-8 encoded string. The bad host is just showing the raw UTF-8 stream, rather than displaying the encoded Unicode characters. The data is the same on both machines (\303\266, for example, is in octal; the two bytes are 0xC3 and 0xB6, which is the UTF-8 encoding for U+00F6, ö).
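The arithmetic behind that last remark can be checked by hand. This is a small Python sketch of decoding a two-byte UTF-8 sequence bit by bit; the helper name decode_two_byte_utf8 is made up for illustration, and it deliberately covers only the two-byte case (code points U+0080 to U+07FF).

```python
def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) to a code point."""
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("lead byte is not of the form 110xxxxx")
    if cont & 0b11000000 != 0b10000000:
        raise ValueError("continuation byte is not of the form 10yyyyyy")
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation.
    return ((lead & 0b00011111) << 6) | (cont & 0b00111111)


if __name__ == "__main__":
    # Octal 303 and 266 are the same values as hex C3 and B6.
    assert (0o303, 0o266) == (0xC3, 0xB6)
    print(hex(decode_two_byte_utf8(0o303, 0o266)))  # 0xf6, i.e. U+00F6
```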