
Questions tagged [text-processing]

Manipulating or examining text with programs, scripts, etc.

1 vote
2 answers
127 views

I have a folder with many subfolders full of various Quarto® files & in those files there are links that are located in varying positions in the file lines. UPDATE ON 3 November 2025 in ...
iembry's user avatar
  • 205
7 votes
9 answers
697 views

I've got a text file containing e.g. Success Something Anything Success Somebody Anybody Someone Success (line 8 is deliberately an empty line) and I would like to export every line between the nth ...
Chestal's user avatar
  • 73
4 votes
4 answers
452 views

I would like to be able to find all files in multiple directories whose file names start with the same string, but preferably not if that string is only one word or contains fewer than perhaps 5 ...
EmmaV's user avatar
  • 4,427
3 votes
2 answers
182 views

The Issue I've been parsing a file with sed trying to tweeze out the desired data. This has worked fine for most lines in the file but there appears to be some embedded special characters that are ...
Gandalf's user avatar
  • 33
4 votes
4 answers
478 views

How to remove comments and newline symbols without using two pipes. I have bookmarks.txt file with comments. https://cookies.com # recipes cookbook https://magicwands.com # shopping I can copy link ...
normal_max's user avatar
5 votes
5 answers
365 views

I'm working with several files which come in bundles of four; across groups the bundles have the same number of columns; see below for an example showing the first four rows with header: File1 has ...
Matteo's user avatar
  • 283
2 votes
1 answer
110 views

Today I connected to a long-running process in tmux over ssh for work, to find that the pane the process was running in seems to have started using the wrong character encoding for its output, leading ...
Patronics's user avatar
  • 125
3 votes
1 answer
378 views

I noticed a difference in behavior between an older pcre2grep version (10.22) and a more recent one (10.42), and I am wondering how I can get the old behavior back. Take the following file: aaa bbb ...
ChennyStar's user avatar
  • 2,019
2 votes
1 answer
92 views

System Info alinuxchap@libertus-desktop:/usr/share/X11/xkb $ uname -a Linux libertus-desktop 6.12.25+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux alinuxchap@...
Signor Pizza's user avatar
6 votes
7 answers
1k views

In Linux there is a file numbers.txt. It contains a few numbers, all separated with space. There are numbers like this: 5476089856 71788143 9999744134 114731731 3179237376 In this example only the ...
Banana's user avatar
  • 241
3 votes
5 answers
716 views

In a certain script that we run routinely we configure hostnames in environment variables. Since hostnames can change over time, we try to dynamically pick the current set of hosts using Linux's ...
y2k-shubham's user avatar
1 vote
7 answers
336 views

What is the correct way to extract paragraphs in this log file using awk: $ cat log.txt par1, line1 par1, line2 par1, line3 par1, line4 par1, line5 par1, last line par2, line1 ...
canupseq's user avatar
  • 1,974
2 votes
5 answers
146 views

I am trying to format and connect git log messages for later processing. I am using git log --pretty=format:'%H %s' to get commit hash and the complete message at the moment. I need commit messages to ...
xerxes's user avatar
  • 359
2 votes
3 answers
224 views

I would like to build a report coming from the output of certain commands. For instance, I have the output of such command: systemctl --type=service --state=running | grep -e cron -e apache2 -e ...
Nick's user avatar
  • 29
1 vote
3 answers
127 views

I have a PDB file (coordinates of atoms in a protein) on a Linux machine: ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00 ATOM 2 CA GLY A 1 -0.712 50....
Paolo Lorenzini's user avatar
0 votes
5 answers
130 views

MATCH1.MATCH2 { always same MATCH3 } All three MATCH(es) must match. input: foo.bar { always same bus } 1.2 { always same 3 } a.b { always same c } i.ii { always same iii } b.2 { ...
sloppy's user avatar
  • 171
5 votes
6 answers
916 views

Consider this input and output: foo bar baz bar baz How do you achieve with a single AWK? Please explain your approach too. These are a couple tries: $ awk '{ $1 = ""; print(substr($0, 2)) ...
mbigras's user avatar
  • 3,502
1 vote
5 answers
486 views

A huge txt file with 360k lines. Lines needed to be deleted are duplicated in both column 1 (id) and column 2 (nick), but differ in column 3 (category). There're only 2 lines for all duplicates in ...
user729388's user avatar
0 votes
2 answers
134 views

On my Linux computer there are many files called file1, file2, file3 ... in /dev/mapper/. Now I want an overview of which cipher is used how often across these files. I tried this for i in /dev/...
user447274's user avatar
9 votes
6 answers
723 views

Regarding this information below: 807:Lipstick:Cosmetics:50:250 808:MixerGrinder:Electronics:10:35000 809:MixerGrinder:Electronics:10:35000 I am expecting to display this information below: 808:...
Ismael Sanchez's user avatar
1 vote
3 answers
104 views

Can anyone help? I've exhausted my knowledge and troubleshooting skills trying to get this working. Here is the example data from "msg": date=2025-03-26 time=12:45:57 devname="this-is-...
user2008555's user avatar
1 vote
2 answers
122 views

Consider a command which takes arguments like this: cmd foo bar baz [arbitrary args...]. How do you build a filter of AND patterns based on those arguments? Something like this pipeline of greps: grep ...
mbigras's user avatar
  • 3,502
0 votes
1 answer
171 views

I'm trying to replace bobearl with jim in the following string "billy" "bobearl" and "johnny" I can do something like this: sed 's/bob/jim/' /tmp/text.txt "billy&...
goswell's user avatar
2 votes
5 answers
710 views

I have a file with a name list as shown below: Ishmael Mark Anton Rajesh Pete I am trying to print something like this: Iae 3 a 1 Ao 2 ae 2 ee 2 I developed this code: cat names.txt | grep -Eo '...
Ismael Sanchez's user avatar
0 votes
0 answers
109 views

Looking for advanced CLI tool/code to determine text Codepage/Language (besides enca). Goal: Automate as much as possible conversion of hundreds/thousands of 8-bit text files (including non-ASCII ...
strider's user avatar
  • 113
0 votes
2 answers
114 views

On Ubuntu 20 server, I have to replace all occurrences of the color #640000 with #06172A. I have tried the following commands to replace Go to folder where the relevant files reside: $ cd /path/to/the/...
Lars Bratt's user avatar
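
The replacement described in this excerpt could be sketched roughly as below. This is only an illustration under stated assumptions: GNU `find` and GNU `sed` (for `-i`), an exact-case match on the hex value, and a hypothetical `replace_colour` helper with a throwaway demo directory and `style.css` file that are not from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every occurrence of #640000 with #06172A
# in all regular files below the directory given as $1.
# Assumes GNU sed, whose -i option edits files in place.
replace_colour() {
  find "$1" -type f -exec sed -i 's/#640000/#06172A/g' {} +
}

# Made-up demo data; the real files and paths are not shown in the excerpt.
demo=$(mktemp -d)
printf '%s\n' 'a { color: #640000; }' 'b { border: 1px solid #640000; }' > "$demo/style.css"

replace_colour "$demo"
cat "$demo/style.css"
```

Running it on a copy of the data first is a sensible precaution, since `sed -i` overwrites the files.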
9 votes
5 answers
2k views

I have a CSV file and want to run a command for each line, using the fields of the file as separate arguments. For example given the following file: foo,42,red bar,13,blue baz,27,green I want to run ...
luator's user avatar
  • 312
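
One possible shape for the per-line invocation asked about here, as a sketch only: it assumes plain CSV with exactly three fields and no quoted or embedded commas, and `run_per_row` is a hypothetical helper name. The command actually wanted is cut off in the excerpt, so `printf` stands in for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per CSV row, passing the three
# fields as separate arguments. $1 is the CSV file; the rest is the command.
run_per_row() {
  local file=$1
  shift
  # Read the file on descriptor 3 so the command's own stdin is left alone.
  while IFS=, read -r first second third <&3; do
    "$@" "$first" "$second" "$third"
  done 3< "$file"
}

# Sample rows taken from the question text.
csv=$(mktemp)
printf '%s\n' 'foo,42,red' 'bar,13,blue' 'baz,27,green' > "$csv"

# printf is a stand-in for the real command.
run_per_row "$csv" printf '%s|%s|%s\n'
```

For CSV with quoting, a real CSV parser would be a safer choice than splitting on commas.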
4 votes
3 answers
246 views

I'm dealing with a series of bed files, which look like this: chr1 100 110 0.5 chr1 150 175 0.2 chr1 200 300 1.5 With the columns being chromosome, start, end, score. I have multiple different files ...
Whitehot's user avatar
  • 245
-4 votes
5 answers
196 views

From the script below I need to know the following: EmpNo#Email#Name#JobLevel#Experience 641357#Amrit_Mohanty#Amrit Mohanty#3#2 678522#Puneet_Mishra#Puneet Mishra#3#1 670242#Vikas_Bharti#Vikas Bharti#...
Ismael Sanchez's user avatar
1 vote
3 answers
162 views

Thanks in advance for any ideas you present. My current project has me trying to loop a file containing a list of 1000's of IP addresses through geoiplookup and piping it to sed to delete all lines ...
user avatar
3 votes
5 answers
706 views

A typical latex problem: \SomeStyle{\otherstyle{this is the \textit{nested part} some more text...}} Now I want to remove all \SomeStyle{...} but not the content. Content contains nested braces. The ...
Thierry Blanc's user avatar
6 votes
2 answers
700 views

I acknowledge there are superficially similar questions asked here before, but all of those I've seen are simpler than what I'm trying to achieve. Bash-only solutions are preferred. I have a variable ...
user648855's user avatar
2 votes
2 answers
1k views

On Kubuntu Linux, The Google Chrome browser adds a checksum to the file, preventing simply editing the file by hand. So I'm writing a script to add the checksum. $ cat .config/google-chrome/Default/...
dotancohen's user avatar
  • 16.5k
-2 votes
3 answers
183 views

I try to uncomment specific lines from a file with patterns in oracle linux 8.6 using bash. There are leading white spaces on certain lines where the comments are not removed. I tried to uncomment the ...
Kishan's user avatar
  • 113
-2 votes
3 answers
189 views

In a directory I have a bunch of text files. Some of the files contain double lines with a [tab] char only. I want to find and change these two "tabbed lines" into one line with a new line ...
ludvick's user avatar
  • 21
1 vote
2 answers
363 views

I have a reasonably large personal library with books in various formats. I have tried to organize their metadata, including a text field containing the tables of contents. At the moment I am using ...
Luke's user avatar
  • 13
6 votes
2 answers
389 views

I have a huge JSON object with an array of objects inside it. I have to add key:value pair to a specific object in the array. For example, let the input object be: { "a": { "b&...
Vlado B.'s user avatar
-5 votes
2 answers
103 views

I want to capture an error code (e.g. 502) from a log file. The log file is rolled over when it reaches 100 MB, like access.log_126427, access.log_197455, etc. There is no specific pattern of the ...
Ajit Kumar Panda's user avatar
0 votes
1 answer
235 views

I want to apply commands below to all files in a directory instead of one file. cat file.txt | sed -E "s/\@([0-9]+)\W+~(.*?)/\1 \2/g" | tr -d '~' cat file.txt | sed -E "s/\@([0-9]+).*\~...
user1002601's user avatar
1 vote
3 answers
116 views

I have a pretty basic file; 15 Chapter name some text and some more text some text and some more text I was trying to get something like this Book: 15 Chapter name some text and some more text some ...
learningregularexpressions's user avatar
0 votes
2 answers
121 views

My situation is simple : I have an HTML file with several lines containing only the indented <section> block tag, each line followed by an (also indented) <h3 id="YYYY">...</...
sylvansab's user avatar
  • 109
0 votes
2 answers
208 views

This question is closely related to: How to insert text before the first line of a file?. I deliberately made the title similar to that question to highlight this. Except the target file is UTF-8 with ...
Avenger's user avatar
  • 151
1 vote
1 answer
97 views

I have 2 files file1 00:00:00:00:00:01 file2 00:00:00:00:00:02 foo bar 00:00:00:00:00:01 something else What I want to do is compare the two files and remove 00:00:00:00:00:01 from file 2 so I end ...
Lurch's user avatar
  • 125
2 votes
3 answers
115 views

I have a pretty basic text file on a Linux machine that has stuff like Chapters, Dialogues and References. This is what it looks like Chapter: 1 One: Birds and Trees Birds are beautiful and trees ...
learningregularexpressions's user avatar
1 vote
8 answers
229 views

My input file: 1oo+457864227yexaloo+6784536pkp8907654 2oo+499004227yexaloo+69008908pkp8907654 3oo+648968976yexaloo+53589094pkp8907654 4oo+490764578yexaloo+6784536pkp8907654 I want to find out the ...
sre's user avatar
  • 11
2 votes
7 answers
823 views

We have a requirement to normalize the data ... Item field is comma delimited and irregular and it may have any items from 0 to max (let's say 100) Input: key1|desc field|item1,item2,item3,item4|extra ...
Sanjay Dubey's user avatar
3 votes
3 answers
486 views

I have a large file with the following format tab-separated: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT recombination chr1 586001 >63041388>63041391 G ...
Matteo's user avatar
  • 283
-1 votes
2 answers
87 views

In linux, how can we use grep command to print the contents that comes inside this tag? <errorPayload>XXXXXXXX</errorPayload> I tried grep -Po '<errorPayload>' abc.log, but it only ...
Prateek's user avatar
  • 101
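
A minimal sketch of one way to print only the text inside the tag, assuming GNU grep built with PCRE support (`-P`) and that each `<errorPayload>` element sits on a single line. The `extract_payload` helper and the sample log lines are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only what sits between <errorPayload> and
# </errorPayload>. The lookbehind and lookahead keep the tags themselves
# out of the match, and -o prints just the matched part.
extract_payload() {
  grep -oP '(?<=<errorPayload>).*?(?=</errorPayload>)' "$1"
}

# Made-up sample standing in for abc.log.
log=$(mktemp)
printf '%s\n' 'ts=1 <errorPayload>XXXXXXXX</errorPayload> tail' 'no tag here' > "$log"

extract_payload "$log"
```

If the payload can span several lines, or the file is full XML, an XML-aware tool would be more robust than a regular expression.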
0 votes
2 answers
139 views

Let's say I have a program blackbox, and a file with the following contents: in this file this line contains =TAG= so does =TAG= this one as =TAG= does this other line this line does ...
wobtax's user avatar
  • 1,193
1 vote
5 answers
123 views

I have an RTF file that contains a list of pdf file paths. Such as Category1: ./Folder1/Folder2/1.pdf:18 ./Folder3/2.pdf:18 ./Folder5/4.pdf:10 Category2: ./Folder3/2.pdf:18 ./Folder5/4.pdf:10 ...
Ronnie's user avatar
  • 29