I'm using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?
-
1Possible dup: stackoverflow.com/questions/620367/…Adam Matan– Adam Matan2010-01-17 17:29:18 +00:00Commented Jan 17, 2010 at 17:29
30 Answers
If the file to read is big, and you don't want to read the whole file in memory at once:
fp = open("file")
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break
fp.close()
Note that i == n-1 for the nth line.
In Python 2.6 or later:
with open("file") as fp:
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break
9 Comments
linecache. Are you sure that enumerate(fp) doesn't do that?enumerate(x) uses x.next, so it doesn't need the entire file in memory.The quick answer:
f=open('filename')
lines=f.readlines()
print lines[25]
print lines[29]
or:
lines=[25, 29]
i=0
f=open('filename')
for line in f:
if i in lines:
print i
i+=1
There is a more elegant solution for extracting many lines: linecache (courtesy of "python: how to jump to a particular line in a huge text file?", a previous stackoverflow.com question).
Quoting the python documentation linked above:
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'
Change the 4 to your desired line number, and you're on. Note that 4 would bring the fifth line as the count is zero-based.
If the file might be very large, and cause problems when read into memory, it might be a good idea to take @Alok's advice and use enumerate().
To Conclude:
- Use
fileobject.readlines()orfor line in fileobjectas a quick solution for small files. - Use
linecachefor a more elegant solution, which will be quite fast for reading many files, possible repeatedly. - Take @Alok's advice and use
enumerate()for files which could be very large, and won't fit into memory. Note that using this method might slow because the file is read sequentially.
5 Comments
linecache module, and looks like it reads the whole file in memory. So, if random access is more important than size optimization, linecache is the best method.linecache now appears to only work for python source fileslinecache.getlines('/etc/passwd')[0:4] to read in the first, second, third and fourth lines.For the sake of offering another solution:
import linecache
linecache.getline('Sample.txt', Number_of_Line)
I hope this is quick and easy :)
5 Comments
''.join(file.readlines()).split('\n'))[5:10] gives you row 6 to 10 for example. Not recommended, as it reads the whole file into memory.A fast and compact approach could be:
def picklines(thefile, whatlines):
return [x for i, x in enumerate(thefile) if i in whatlines]
this accepts any open file-like object thefile (leaving up to the caller whether it should be opened from a disk file, or via e.g a socket, or other file-like stream) and a set of zero-based line indices whatlines, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator:
def yieldlines(thefile, whatlines):
return (x for i, x in enumerate(thefile) if i in whatlines)
which is basically only good for looping upon -- note that the only difference comes from using rounded rather than square parentheses in the return statement, making a list comprehension and a generator expression respectively.
Further note that despite the mention of "lines" and "file" these functions are much, much more general -- they'll work on any iterable, be it an open file or any other, returning a list (or generator) of items based on their progressive item-numbers. So, I'd suggest using more appropriately general names;-).
4 Comments
whatlines should be a set, because if i in whatlines will execute faster with a set rather than a (sorted) list. I didn't notice it first and instead devised my own ugly solution with sorted list (where I didn't have to scan a list each time, while if i in whatlines does just that), but difference in performance was negligible (with my data) and this solution is much more elegant.For the sake of completeness, here is one more option.
Let's start with a definition from python docs:
slice An object usually containing a portion of a sequence. A slice is created using the subscript notation, [] with colons between numbers when several are given, such as in variable_name[1:3:5]. The bracket (subscript) notation uses slice objects internally (or in older versions, __getslice__() and __setslice__()).
Though the slice notation is not directly applicable to iterators in general, the itertools package contains a replacement function:
from itertools import islice
# print the 100th line
with open('the_file') as lines:
for line in islice(lines, 99, 100):
print line
# print each third line until 100
with open('the_file') as lines:
for line in islice(lines, 0, 100, 3):
print line
The additional advantage of the function is that it does not read the iterator until the end. So you can do more complex things:
with open('the_file') as lines:
# print the first 100 lines
for line in islice(lines, 100):
print line
# then skip the next 5
for line in islice(lines, 5):
pass
# print the rest
for line in lines:
print line
And to answer the original question:
# how to read lines #26 and #30
In [365]: list(islice(xrange(1,100), 25, 30, 4))
Out[365]: [26, 30]
2 Comments
Reading files is incredible fast. Reading a 100MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the single lines.
What most answer here do is not wrong, but bad style. Opening files should always be done with with as it makes sure that the file is closed again.
So you should do it like this:
with open("path/to/file.txt") as f:
lines = f.readlines()
print(lines[26]) # or whatever you want to do with this line
print(lines[30]) # or whatever you want to do with this line
Huge files
If you happen to have a huge file and memory consumption is a concern, you can process it line by line:
with open("path/to/file.txt") as f:
for i, line in enumerate(f):
pass # process line i
5 Comments
if you want line 7
line = open("file.txt", "r").readlines()[7]
4 Comments
close() the file when opening it this way?with open("file.txt", "r") as file: line = file.readlines()[7]. But mind that this reads the whole file into memory.Some of these are lovely, but it can be done much more simply:
start = 0 # some starting index
end = 5000 # some ending index
filename = 'test.txt' # some file we want to use
with open(filename) as fh:
data = fin.readlines()[start:end]
print(data)
That will use simply list slicing, it loads the whole file, but most systems will minimise memory usage appropriately, it's faster than most of the methods given above, and works on my 10G+ data files. Good luck!
Comments
You can do a seek() call which positions your read head to a specified byte within the file. This won't help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X number of bytes?) or, you could count the number of characters yourself (remember to include invisible characters like line breaks) if you really want the speed boost.
Otherwise, you do have to read every line prior to the line you desire, as per one of the many solutions already proposed here.
Comments
with open("test.txt", "r") as fp:
lines = fp.readlines()
print(lines[3])
test.txt is filename
prints line number four in test.txt
1 Comment
def getitems(iterable, items):
items = list(items) # get a list from any iterable and make our own copy
# since we modify it
if items:
items.sort()
for n, v in enumerate(iterable):
if n == items[0]:
yield v
items.pop(0)
if not items:
break
print list(getitems(open("/usr/share/dict/words"), [25, 29]))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item
1 Comment
How about this:
>>> with open('a', 'r') as fin: lines = fin.readlines()
>>> for i, line in enumerate(lines):
if i > 30: break
if i == 26: dox()
if i == 30: doy()
1 Comment
If you don't mind importing then fileinput does exactly what you need (this is you can read the line number of the current line)
Comments
I prefer this approach because it's more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:
def read_specific_lines(file, lines_to_read):
"""file is any iterable; lines_to_read is an iterable containing int values"""
lines = set(lines_to_read)
last = max(lines)
for n, line in enumerate(file):
if n + 1 in lines:
yield line
if n + 1 > last:
return
>>> with open(r'c:\temp\words.txt') as f:
[s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']
Comments
Here's my little 2 cents, for what it's worth ;)
def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
fp = open(filename, "r")
src = fp.readlines()
data = [(index, line) for index, line in enumerate(src) if index in lines]
fp.close()
return data
# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
print "Line: %s\nData: %s\n" % (line[0], line[1])
Comments
Fairly quick and to the point.
To print certain lines in a text file. Create a "lines2print" list and then just print when the enumeration is "in" the lines2print list. To get rid of extra '\n' use line.strip() or line.strip('\n'). I just like "list comprehension" and try to use when I can. I like the "with" method to read text files in order to prevent leaving a file open for any reason.
lines2print = [26,30] # can be a big list and order doesn't matter.
with open("filepath", 'r') as fp:
[print(x.strip()) for ei,x in enumerate(fp) if ei in lines2print]
or if list is small just type in list as a list into the comprehension.
with open("filepath", 'r') as fp:
[print(x.strip()) for ei,x in enumerate(fp) if ei in [26,30]]
Comments
File objects have a .readlines() method which will give you a list of the contents of the file, one line per list item. After that, you can just use normal list slicing techniques.
Comments
file = '/path/to/file_to_be_read.txt'
with open(file) as f:
print f.readlines()[26]
print f.readlines()[30]
Using the with statement, this opens the file, prints lines 26 and 30, then closes the file. Simple!
1 Comment
readlines() the iterator will be exhausted and the second call will either return an empty list or throw an error (can't remember which)To print desired line. To print line above/below required line.
def dline(file,no,add_sub=0):
tf=open(file)
for sno,line in enumerate(tf):
if sno==no-1+add_sub:
print(line)
tf.close()
execute---->dline("D:\dummy.txt",6) i.e dline("file path", line_number, if you want upper line of the searched line give 1 for lower -1 this is optional default value will be taken 0)
Comments
Do Not Use readlines!
My solutition is:
with open(filename) as f:
specify = [26, 30]
results = list(
map(lambda line: line[1],
filter(lambda line: line[0] in specify,
enumerate(f))
)
)
Test as follow for a 6.5G file:
import time
filename = 'a.txt'
start = time.time()
with open(filename, 'w') as f:
for i in range(10_000_000):
f.write(f'{str(i)*100}\n')
end1 = time.time()
with open(filename) as f:
specify = [26, 30]
results = list(
map(lambda line: line[1],
filter(lambda line: line[0] in specify,
enumerate(f))
)
)
end2 = time.time()
print(f'write time: {end1-start}')
print(f'read time: {end2-end1}')
# write time: 14.38945460319519
# read time: 8.380386352539062
Comments
I think this would work
open_file1 = open("E:\\test.txt",'r')
read_it1 = open_file1.read()
myline1 = []
for line1 in read_it1.splitlines():
myline1.append(line1)
print myline1[0]
1 Comment
f = open(filename, 'r')
totalLines = len(f.readlines())
f.close()
f = open(filename, 'r')
lineno = 1
while lineno < totalLines:
line = f.readline()
if lineno == 26:
doLine26Commmand(line)
elif lineno == 30:
doLine30Commmand(line)
lineno += 1
f.close()
3 Comments
Reading from specific line:
n = 4 # for reading from 5th line
with open("write.txt",'r') as t:
for i,line in enumerate(t):
if i >= n: # i == n-1 for nth line
print(line)