
I'm trying to read a .txt file in Ruby and split the text line by line.

Here is my code:

def file_read(filename)
  File.open(filename, 'r').read
end

puts f = file_read('alice_in_wonderland.txt')

This works perfectly. But when I add the method line_cutter like this:

def file_read(filename)
  File.open(filename, 'r').read
end

def line_cutter(file)
  file.scan(/\w/)
end

puts f = line_cutter(file_read('alice_in_wonderland.txt'))

I get an error:

`scan': invalid byte sequence in UTF-8 (ArgumentError)

I found a suggested fix on an untrusted website and tried to use it in my own code, but it's not working. How can I get rid of this error?

Link to the file: File
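For illustration, the error can be reproduced without the file. This is a minimal sketch using a hypothetical sample string: byte 0xE9 is "é" in ISO-8859-1 but is not a valid UTF-8 sequence on its own. File.read tags the data it returns with Encoding.default_external (UTF-8 here), so the bytes end up in a string that claims to be UTF-8, and scan refuses it.

```ruby
# Hypothetical sample: "caf\xE9" is "café" in ISO-8859-1 (0xE9 = é).
latin1_bytes = "caf\xE9".b            # raw bytes, tagged ASCII-8BIT

# Tag the same bytes as UTF-8, which is effectively what File.read does
# when the default external encoding is UTF-8.
as_utf8 = latin1_bytes.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?          # prints false

begin
  as_utf8.scan(/\w/)
rescue ArgumentError => e
  puts e.message                      # prints invalid byte sequence in UTF-8
end
```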

  • @sawa Sorry, I wanted to know how to remove this error. Commented Mar 18, 2016 at 14:55
  • The linked article was written in 2006; you are not using Ruby 1.8, are you? Commented Mar 18, 2016 at 14:59
  • @Stefan No, I'm using 2.2.1. Thanks. Commented Mar 18, 2016 at 15:00
  • It is better to use File.read(filename). That won't leave the file open. Commented Mar 18, 2016 at 15:01
  • @sawa Done. I'm still getting the same error, though. Commented Mar 18, 2016 at 15:08
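The File.read(filename) suggestion in the comments is about resource handling, not encoding. A minimal sketch of it next to the block form of File.open, both of which close the file; the sample path is hypothetical and the file is created first so the snippet runs on its own. As the asker found, neither form changes the encoding, so on its own this does not fix the ArgumentError.

```ruby
require 'tmpdir'

# Hypothetical sample file, created so the example is self-contained.
path = File.join(Dir.tmpdir, 'sample.txt')
File.write(path, "line one\nline two\n")

# File.read opens, reads and closes the file in one call.
text = File.read(path)

# The block form of File.open closes the file when the block returns,
# even if an exception is raised inside it.
same = File.open(path, 'r') { |f| f.read }

puts text == same   # prints true
```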

2 Answers

7

The linked text file contains the following line:

Character set encoding: ISO-8859-1

If converting the file isn't desired or possible, you have to tell Ruby that it is ISO-8859-1 encoded; otherwise Ruby assumes the default external encoding (UTF-8 in your case). One way to do that is:

s = File.read('alice_in_wonderland.txt', encoding: 'ISO-8859-1')
s.encoding  # => #<Encoding:ISO-8859-1>
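A self-contained sketch of the same fix applied to the question's scan, assuming a hypothetical two-line sample file written with ISO-8859-1 bytes. It also shows that scan(/\w/) returns single word characters rather than lines; String#lines (with chomp to drop each trailing newline) is what actually splits text line by line.

```ruby
require 'tmpdir'

# Hypothetical two-line sample stored as ISO-8859-1 bytes (0xE9 = é).
path = File.join(Dir.tmpdir, 'latin1_sample.txt')
File.binwrite(path, "caf\xE9\nthe end\n".b)

# Declare the file's real encoding when reading it.
s = File.read(path, encoding: 'ISO-8859-1')
puts s.encoding            # prints ISO-8859-1
puts s.valid_encoding?     # prints true

# The question's line_cutter logic no longer raises.
chars = s.scan(/\w/)

# scan(/\w/) yields single word characters; to split by line use String#lines.
lines = s.lines.map(&:chomp)
puts lines.size            # prints 2
```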

Or, if you prefer your string to be UTF-8 encoded (see utf8everywhere.org), have Ruby transcode it while reading:

s = File.read('alice_in_wonderland.txt', encoding: 'ISO-8859-1:UTF-8')
s.encoding  # => #<Encoding:UTF-8>
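A minimal sketch of what the 'ISO-8859-1:UTF-8' form does, using a hypothetical sample file: the part before the colon is the file's (external) encoding and the part after it is the encoding Ruby transcodes to while reading. The same result can be obtained afterwards with force_encoding plus String#encode if the raw bytes are already in memory.

```ruby
require 'tmpdir'

# Hypothetical sample file containing "café\n" as ISO-8859-1 bytes.
path = File.join(Dir.tmpdir, 'latin1_sample_utf8.txt')
File.binwrite(path, "caf\xE9\n".b)

# External encoding ISO-8859-1, internal UTF-8: transcoded while reading.
s = File.read(path, encoding: 'ISO-8859-1:UTF-8')
puts s.encoding        # prints UTF-8
puts s.bytesize        # prints 6 (é is now the two bytes C3 A9)

# Equivalent conversion after the fact: label the bytes, then transcode.
t = File.binread(path).force_encoding('ISO-8859-1').encode('UTF-8')
puts s == t            # prints true
```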

2

It seems to work if you read the file directly from the page. Maybe there's something funny about the local copy you have. Try this:

require 'net/http'

uri = 'http://www.ccs.neu.edu/home/vip/teach/Algorithms/7_hash_RBtree_simpleDS/hw_hash_RBtree/alice_in_wonderland.txt'
scanned = Net::HTTP.get_response(URI.parse(uri)).body.scan(/\w/)
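A likely explanation for why the downloaded copy does not raise: Net::HTTP returns the response body as a binary (ASCII-8BIT) string by default, and a binary string never contains an "invalid" byte sequence, so scan has nothing to reject. The sketch below simulates this offline with a hypothetical byte string instead of a real request, and assumes the body is ISO-8859-1 as the file's header says. It also shows the catch: the 0xE9 byte is silently skipped rather than treated as a letter.

```ruby
# Simulated HTTP body: raw bytes tagged ASCII-8BIT (binary).
# "caf\xE9" is "café" in ISO-8859-1 (hypothetical sample text).
body = "caf\xE9 au lait".b
puts body.encoding             # prints ASCII-8BIT

# No ArgumentError: binary strings are always valid ...
binary_chars = body.scan(/\w/)
# ... but the é byte (0xE9) is not matched as a word character.
puts binary_chars.join         # prints cafaulait

# Labelling the bytes with their real encoding keeps the letter intact.
text = body.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts text                      # prints café au lait
```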
