I have a problem with UTF-8 encoding. I have read some posts here, but it still does not work properly.
Here is my code:
#!/bin/env ruby
#encoding: utf-8
def determine
  file=File.open("/home/lala.txt")
  file.each do |line|
    puts(line)
    type = line.match(/DOG/)
    puts('aaaaa')
    if type != nil
      puts(type[0])
      break
    end
  end
end
These are the first 3 lines of my file:
;?lalalalal60000065535-1362490443-0000006334-0000018467-0000000041en-lalalalallalalalalalalalaln Cell Generation
text/lalalalala1.0.0.1515
text/lalalala�DOG
When I run this code, it raises an error exactly when reading the third line of the file (the one containing the word DOG):
;?lalalalal60000065535-1362490443-0000006334-0000018467-0000000041en-lalalalallalalalalalalalaln Cell Generation
aaaaa
text/lalalalala1.0.0.1515
aaaaa
text/lalalala�DOG
/home/kik/Desktop/determine2.rb:16:in `match': invalid byte sequence in UTF-8 (ArgumentError)
BUT if I run just a determine function with the following content:
#!/bin/env ruby
#encoding: utf-8
def determine
  type="text/lalalala�DOG".match(/DOG/)
  puts(type)
end
it works perfectly.
What is going wrong there? Thanks in advance!
EDIT: The third line in the file is:
text/lalalal»DOG
BUT when I print the third line of the file in Ruby, it shows up as:
text/lalalala�DOG
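A minimal sketch of what may be happening here, assuming the byte in front of DOG is a lone 0xBB (which is » in ISO-8859-1, but not a valid sequence on its own in UTF-8). The sample string is made up to resemble the third line; it is not read from the real file. It shows two possible workarounds: replacing the bad bytes with String#scrub, or matching against a binary copy of the line.

```ruby
# Assumption: the file contains the single byte 0xBB before "DOG".
# Tagging these bytes as UTF-8 mimics what File#each returns by default.
line = "text/lalalala\xBBDOG".dup.force_encoding(Encoding::UTF_8)

puts line.valid_encoding?   # a lone 0xBB is not valid UTF-8

begin
  line.match(/DOG/)
  puts "no error raised on this Ruby version"
rescue ArgumentError => e
  # This is the failure reported in the question.
  puts "match raised: #{e.message}"
end

# Workaround 1: replace invalid bytes before matching (String#scrub, Ruby 2.1+).
scrubbed = line.scrub("?")
puts scrubbed.match(/DOG/)[0]

# Workaround 2: treat the line as raw bytes; a binary string is never "invalid".
binary_match = line.b.match(/DOG/)
puts binary_match[0]
```

With a real file, the second workaround would correspond to opening it in binary mode, for example File.open("/home/lala.txt", "rb").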
EDIT2:
This format was also developed to support localization. Strings stored within the file are stored as 2 byte UNICODE characters. The format of the file is a binary file with data stored in network byte order (big-endian format).
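If the strings really are 2-byte big-endian characters, they could be decoded roughly like this. This is only a sketch: the sample bytes are hypothetical, and it assumes "2 byte UNICODE" means UTF-16BE, which the description above does not state explicitly. It also ignores whatever binary structure surrounds the strings in the real file.

```ruby
# Hypothetical sample: "DOG" stored as three 2-byte big-endian code units.
raw = "\x00D\x00O\x00G".b            # .b gives a binary (ASCII-8BIT) string

# Reinterpret the raw bytes as UTF-16BE, then transcode to UTF-8 for matching.
decoded = raw.dup.force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
puts decoded

# The same round trip for "»" (U+00BB) packed as one big-endian 16-bit value.
guillemet_raw = [0x00BB].pack("n")   # "n" = 16-bit unsigned, network byte order
guillemet = guillemet_raw.force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
puts guillemet

puts "found" if decoded.match(/DOG/)
```

For the actual file, the raw bytes would come from something like File.binread("/home/lala.txt"), and only the portions that hold strings should be decoded this way.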
If you have » in a file, in UTF-8 it encodes to the bytes [194, 187], not what you got. What you have is probably invalid.
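A quick way to check what » (U+00BB) encodes to is to ask Ruby for the bytes in each encoding. This is just an illustration with a string literal; it says nothing about which of these encodings the file in question actually uses.

```ruby
guillemet = "\u00BB"   # "»" as a UTF-8 string

utf8_bytes    = guillemet.bytes                                # two bytes in UTF-8
latin1_bytes  = guillemet.encode(Encoding::ISO_8859_1).bytes   # one byte in ISO-8859-1
utf16be_bytes = guillemet.encode(Encoding::UTF_16BE).bytes     # two bytes, big-endian

p utf8_bytes      # [194, 187]
p latin1_bytes    # [187]
p utf16be_bytes   # [0, 187]
```

Comparing these against line.bytes for the third line of the file should show which encoding, if any, the data matches.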