Ruby Invalid Byte Sequence in UTF-8

Question

I have the following code, which raises an "invalid byte sequence" error pointing to the scan call in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when the (.*) between the h1 tag and the closing > is removed.

#!/usr/bin/env ruby

class NewsParser

  def initialize
    Dir.glob("./**/index.htm") do |file|
      @file = IO.read file
      parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
      self.write(parsed)
    end
  end

  def write output
    @contents = output
    open('output.txt', 'a') do |f|
      f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n"
    end
  end

end

p = NewsParser.new

Edit: Here is the error message:

news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)
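
The error can be reproduced without any files. The sketch below assumes the input contains a byte such as 0xE9 ("é" in ISO-8859-1), which is not valid UTF-8 on its own; the sample bytes are hypothetical, not taken from the pages being parsed. Once such a string is tagged as UTF-8, a regexp match against it, including scan, raises ArgumentError. String#valid_encoding? is a quick way to check for this before matching.

```ruby
# Hypothetical sample data: "café" as ISO-8859-1 bytes, then tagged as UTF-8,
# which mimics reading a Latin-1 file on a system whose default encoding is UTF-8.
raw = [0x63, 0x61, 0x66, 0xE9].pack("C*").force_encoding("UTF-8")

# The lone 0xE9 byte is not a complete UTF-8 sequence.
puts raw.valid_encoding?   # => false

# Matching a regexp against the broken string raises ArgumentError.
error_message =
  begin
    raw.scan(/(.*)/)
    nil
  rescue ArgumentError => e
    e.message
  end

puts error_message
```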

SOLVED: The combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and a # encoding: UTF-8 magic comment solved the issue.

Thanks!

possible duplicate of ruby 1.9: invalid byte sequence in UTF-8
– AShelly, Commented Mar 7, 2012 at 19:17
@file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) ?
– fl00r, Commented Mar 7, 2012 at 19:21

redgem · Accepted Answer · 2012-03-08 02:18:12Z

41

The combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and a # encoding: UTF-8 magic comment solved the issue.
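
To unpack what that one-liner does, here is a minimal sketch on hypothetical sample bytes (the actual file contents are not shown in the question). force_encoding only changes the encoding label on the string and leaves the bytes alone; encode then transcodes those bytes to UTF-8. Because every byte value is a valid ISO-8859-1 character, the conversion never fails, which also means a file in some other legacy encoding would be converted silently into the wrong characters. The magic comment sets the encoding of the script's own string and regexp literals; Ruby 2.0 and later default to UTF-8, so it mainly matters on 1.9.

```ruby
# encoding: UTF-8

# Hypothetical sample: the bytes of "café" in ISO-8859-1 ("é" is the single byte 0xE9).
bytes = [0x63, 0x61, 0x66, 0xE9].pack("C*")

# Step 1: force_encoding relabels the string in place; no byte is changed.
latin1 = bytes.force_encoding("ISO-8859-1")

# Step 2: encode transcodes, rewriting 0xE9 as the two-byte UTF-8 sequence 0xC3 0xA9.
utf8 = latin1.encode("UTF-8")

puts utf8.valid_encoding?   # => true
puts utf8.bytes.inspect     # => [99, 97, 102, 195, 169]

# Regexp matching now works because the string is valid UTF-8.
puts utf8.scan(/caf(.)/).inspect
```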


1 Comment

BenKoshy Jan 7 at 5:03

Thanks for the answer. An explanation of what it means would be helpful nonetheless.

FriendFX · Answer · 2021-09-14 00:56:04Z

1

While this question already has an accepted answer, I found it while having the same problem with a different style of opening the file:

File.open(file_name).each_with_index do |line, index|
  line.gsub!(/[{}]/, "'")
  puts "#{index} #{line}"
end

I found that my input file was encoded in ISO-8859-1, so I changed it to the following to avoid the error:

File.open(file_name, 'r:ISO-8859-1:utf-8').each_with_index do |line, index|
  line.gsub!(/[{}]/, "'")
  puts "#{index} #{line}"
end

See the documentation for the optional mode argument of the File.open method for more details.
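
As a self-contained sketch of the same idea, the following writes a hypothetical ISO-8859-1 sample line to a temporary file and reads it back with the r:ISO-8859-1:utf-8 mode string: read mode, external encoding ISO-8859-1, internal encoding UTF-8. The helper name read_latin1_lines and the sample data are illustrative assumptions, not part of the code above. It uses the block form of File.open so the file is closed automatically.

```ruby
require "tempfile"

# Hypothetical helper: reads an ISO-8859-1 file, transcoding each line to
# UTF-8 as it is read, and replaces curly braces with single quotes.
def read_latin1_lines(path)
  File.open(path, "r:ISO-8859-1:utf-8") do |f|
    f.each_line.map { |line| line.gsub(/[{}]/, "'") }
  end
end

# Hypothetical input: the line "{café}" written as raw ISO-8859-1 bytes.
lines = Tempfile.create("latin1_demo") do |tmp|
  tmp.binmode
  tmp.write([0x7B, 0x63, 0x61, 0x66, 0xE9, 0x7D, 0x0A].pack("C*"))
  tmp.flush
  read_latin1_lines(tmp.path)
end

lines.each_with_index { |line, index| puts "#{index} #{line}" }
```

Without the encoding part of the mode string, the same gsub would be run against a string tagged as UTF-8 but holding a lone 0xE9 byte, which is the situation that triggers the error.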
