I have a pretty large logfile (1.3G) which is more or less tabular with spaces padding the individual fields. I also have a list of indices in that file where I would like to overwrite a field's contents with some other string. For the sake of simplicity, assume that the field is previously empty and my replacement is not too long for the cell, so I only have to replace len(replacement) characters starting at index with replacement.
The number of replacements I need to perform in that file is around 10'000. How do I do this efficiently? Does Python have a data structure like a C array where I can just overwrite data?
bytearray, see stackoverflow.com/questions/10572624/mutable-strings-in-python (that would answer your literal question). However, it might be more "efficient" (depending on how you define that) to not load the entire file into memory but to read and write it line by line.AAAAA BBBBB CCCCC\nAAAAA - DDDDD\nAAAAA EEEEE - \nand so on. Put simply, now have a list of all indices of-in the string (in reality it's not a simple dash but a complex regular expression with lookarounds) and want to pasteXXnnnin the place of the dash, overwriting as many spaces that follow the dash as necessary.