Questions tagged [text-processing]
Manipulation or examination of text by programs, scripts, etc.
8,526 questions
1
vote
2
answers
127
views
find awk grep - search and replace & passing modified contents to awk to overwrite the existing file
I have a folder with many subfolders full of various Quarto® files, and those files contain links located at varying positions within their lines.
UPDATE ON 3 November 2025 in ...
7
votes
9
answers
697
views
How to get every line between the nth and (n+1)th match of grep in a text file
I've got a text file containing e.g.
Success
Something
Anything
Success
Somebody
Anybody
Someone
Success
(line 8 is deliberately an empty line) and I would like to export every line between the nth ...
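One way to approach this, as a minimal sketch: count the matching lines with awk and print only while the counter equals n. The helper name `between_matches` and its argument order are assumptions made for illustration. If there is no (n+1)th match, this version prints through to the end of the input.

```shell
# Print the lines strictly between the nth and (n+1)th line matching a pattern.
# Usage: between_matches N PATTERN [FILE...]   (hypothetical helper name)
between_matches() {
    n=$1
    pat=$2
    shift 2
    awk -v n="$n" -v pat="$pat" '
        $0 ~ pat {               # a matching line: bump the counter, do not print it
            count++
            if (count > n) exit  # reached the (n+1)th match, nothing left to do
            next
        }
        count == n { print }     # inside the wanted block
    ' "$@"
}
```

With the sample above, `between_matches 2 '^Success$' file.txt` would print the lines after the second `Success` up to the third.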
4
votes
4
answers
452
views
How can I find common prefixes in file names to group them?
I would like to be able to find all files in multiple directories whose file names start with the same string, but preferably not if that string is only one word or contains fewer than perhaps 5 ...
3
votes
2
answers
182
views
Embedded special characters skewing sed output
The Issue
I've been parsing a file with sed trying to tweeze out the desired data. This has worked fine for most lines in the file but there appear to be some embedded special characters that are ...
4
votes
4
answers
478
views
Remove new lines and everything after comment symbol with awk or sed
How to remove comments and newline symbols without using two pipes.
I have bookmarks.txt file with comments.
https://cookies.com # recipes cookbook
https://magicwands.com # shopping
I can copy link ...
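A possible single-process sketch, assuming the aim is to emit one bookmark's URL with neither the comment nor the trailing newline. The helper name `bookmark_url` and the idea of selecting by line number are assumptions. A comment is taken to start at blanks followed by `#`, so a `#fragment` inside a URL is left alone.

```shell
# Print the URL on line N of a bookmarks file, without its comment or a newline.
# Usage: bookmark_url N [FILE]   (hypothetical helper name)
bookmark_url() {
    n=$1
    shift
    awk -v n="$n" '
        NR == n {
            sub(/[ \t]+#.*$/, "")   # drop blanks followed by "#" and the rest
            printf "%s", $0         # printf adds no trailing newline
            exit
        }
    ' "$@"
}
```

For example, `bookmark_url 1 bookmarks.txt` would print `https://cookies.com` with no newline after it.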
5
votes
5
answers
365
views
Compare files and combine rows with matching values based on last column
I'm working with several files which come in bundles of four; across groups, the bundles have the same number of columns. See below for an example showing the first four rows with header:
File1 has ...
2
votes
1
answer
110
views
Tmux pane with long-running session using wrong character set?
Today I connected to a long-running process in tmux over ssh for work, to find that the pane the process was running in seems to have started using the wrong character encoding for its output, leading ...
3
votes
1
answer
378
views
How to do non-greedy multiline capture with recent versions of pcre2grep?
I noticed a difference in behavior between an older pcre2grep version (10.22) and a more recent one (10.42), and I am wondering how I can get the old behavior back.
Take the following file:
aaa
bbb
...
2
votes
1
answer
92
views
Redirect `rtf` output to file
System Info
alinuxchap@libertus-desktop:/usr/share/X11/xkb $ uname -a
Linux libertus-desktop 6.12.25+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
alinuxchap@...
6
votes
7
answers
1k
views
How to find numbers in a textfile that are not divisible by 4096, round them up and write new file?
In Linux there is a file numbers.txt.
It contains a few numbers, all separated with space.
There are numbers like this: 5476089856 71788143 9999744134 114731731 3179237376
In this example only the ...
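The rounding step can be sketched with plain shell arithmetic, assuming non-negative integers without leading zeros (a leading zero would be read as octal) and a shell with 64-bit arithmetic. The helper name `round_up_4096` is hypothetical. It reads the numbers on standard input and writes them back space-separated on one line.

```shell
# Round every number that is not a multiple of 4096 up to the next multiple.
# Usage: round_up_4096 < numbers.txt > rounded.txt   (hypothetical helper name)
round_up_4096() {
    out=
    for n in $(cat); do                 # split stdin on whitespace
        r=$(( n % 4096 ))
        if [ "$r" -ne 0 ]; then
            n=$(( n + 4096 - r ))       # bump to the next multiple of 4096
        fi
        out="${out:+$out }$n"           # join with single spaces
    done
    printf '%s\n' "$out"
}
```

Writing to a new file rather than redirecting onto `numbers.txt` itself avoids truncating the input before it is read.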
3
votes
5
answers
716
views
Randomly pick single line from multiple lines while assigning value to environment variable
In a certain script that we run routinely we configure hostnames in environment variables. Since hostnames can change over time, we try to dynamically pick the current set of hosts using Linux's ...
1
vote
7
answers
336
views
Extracting paragraphs with awk
What is the correct way to extract paragraphs in this log file using awk:
$ cat log.txt
par1, line1
par1, line2
par1, line3
par1, line4
par1, line5
par1, last line
par2, line1
...
2
votes
5
answers
146
views
formatting git log messages for later processing
I am trying to format and connect git log messages for later processing.
I am using git log --pretty=format:'%H %s' to get commit hash and the complete message at the moment.
I need commit messages to ...
2
votes
3
answers
224
views
How to extract specific fields from systemctl output for a custom report
I would like to build a report coming from the output of certain commands.
For instance, I have the output of such command:
systemctl --type=service --state=running |
grep -e cron -e apache2 -e ...
1
vote
3
answers
127
views
edit all the values in a specific column based on row numbers range
I have a PDB file (coordinates of atoms in a protein) on a Linux machine:
ATOM 1 N GLY A 1 0.535 51.766 5.682 1.00 0.00
ATOM 2 CA GLY A 1 -0.712 50....
0
votes
5
answers
130
views
Match multiple vars across two lines and delete entire entry
MATCH1.MATCH2 {
always same MATCH3
}
All three MATCH(es) must match.
input:
foo.bar {
always same bus
}
1.2 {
always same 3
}
a.b {
always same c
}
i.ii {
always same iii
}
b.2 {
...
5
votes
6
answers
916
views
Remove the first field (and leading spaces) with a single AWK
Consider this input and output:
foo bar baz
bar baz
How do you achieve this with a single AWK command? Please explain your approach too.
These are a couple tries:
$ awk '{ $1 = ""; print(substr($0, 2)) ...
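One possible approach, as a sketch: rather than assigning to `$1` (which makes awk rebuild the record with `OFS` and collapses the original spacing), remove the leading blanks, the first field, and the blanks after it with a single `sub()`. The helper name `drop_first_field` is an assumption, and fields are taken to be separated by spaces or tabs.

```shell
# Remove the first whitespace-separated field and the blanks around it.
# Usage: drop_first_field [FILE...]   (hypothetical helper name)
drop_first_field() {
    awk '{
        # leading blanks, then the first field, then the blanks that follow it
        sub(/^[ \t]*[^ \t]+[ \t]*/, "")
        print
    }' "$@"
}
```

Because `$0` is edited directly, spacing between the remaining fields is preserved.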
1
vote
5
answers
486
views
How to remove every first duplicate line in a column from mac terminal?
A huge txt file with 360k lines. Lines needed to be deleted are duplicated in both column 1 (id) and column 2 (nick), but differ in column 3 (category). There're only 2 lines for all duplicates in ...
0
votes
2
answers
134
views
List and count ciphers used by cryptsetup in /dev/mapper devices
On my Linux computer there are many files called file1, file2, file3 ... in /dev/mapper/.
Now I want an overview of which cipher is used, and how often, across these files.
I tried this
for i in /dev/...
9
votes
6
answers
723
views
How to display duplicate lines with different first field
Regarding this information below:
807:Lipstick:Cosmetics:50:250
808:MixerGrinder:Electronics:10:35000
809:MixerGrinder:Electronics:10:35000
I am expecting to display this information below:
808:...
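A minimal sketch of one way to do this, assuming "duplicate" means that everything after the first colon-separated field is identical, and that every line of such a group should be shown. The helper name `dups_ignoring_first_field` is hypothetical. The whole input is held in memory so the original order can be kept.

```shell
# Show lines whose content after the first ":" field occurs more than once.
# Usage: dups_ignoring_first_field [FILE...]   (hypothetical helper name)
dups_ignoring_first_field() {
    awk '
        {
            key = $0
            sub(/^[^:]*:/, "", key)   # key = the line minus its first field
            count[key]++
            line[NR] = $0
            keyof[NR] = key
        }
        END {
            for (i = 1; i <= NR; i++)  # replay the input in its original order
                if (count[keyof[i]] > 1) print line[i]
        }
    ' "$@"
}
```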
1
vote
3
answers
104
views
Extracting "devname" from log message with re_extract
Can anyone help? I've exhausted my knowledge and troubleshooting skills trying to get this working.
Here is the example data from "msg":
date=2025-03-26 time=12:45:57 devname="this-is-...
1
vote
2
answers
122
views
Filter for arbitrary AND patterns [duplicate]
Consider a command which takes arguments like this: cmd foo bar baz [arbitrary args...]. How do you build a filter of AND patterns based on those arguments?
Something like this pipeline of greps:
grep ...
0
votes
1
answer
171
views
Use sed to replace only part of a string
I'm trying to replace bobearl with jim in the following string
"billy" "bobearl" and "johnny"
I can do something like this:
sed 's/bob/jim/' /tmp/text.txt
"billy&...
2
votes
5
answers
710
views
How to display and count vowels in file
I have a file with a name list as shown below:
Ishmael
Mark
Anton
Rajesh
Pete
I am trying to print something like this:
Iae 3
a 1
Ao 2
ae 2
ee 2
I developed this code:
cat names.txt | grep -Eo '...
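The desired output can be sketched with a single awk call, assuming only the unaccented ASCII vowels a, e, i, o, u (in either case) count. The helper name `vowels_per_line` is an assumption. It deletes every non-vowel from a copy of the line and prints what is left together with its length.

```shell
# For each line, print its vowels in order followed by how many there are.
# Usage: vowels_per_line [FILE...]   (hypothetical helper name)
vowels_per_line() {
    awk '{
        v = $0
        gsub(/[^AEIOUaeiou]/, "", v)   # delete everything that is not a vowel
        print v, length(v)
    }' "$@"
}
```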
0
votes
0
answers
109
views
Advanced CLI tool/code to determine text encoding (besides enca)
Looking for advanced CLI tool/code to determine text Codepage/Language (besides enca).
Goal: Automate as much as possible conversion of hundreds/thousands of 8-bit text files (including non-ASCII ...
0
votes
2
answers
114
views
On Ubuntu 20 server, I must replace all occurrences of the color #640000 with #06172A
On Ubuntu 20 server, I have to replace all occurrences of the color #640000 with #06172A. I have tried the following commands to replace
Go to folder where the relevant files reside:
$ cd /path/to/the/...
9
votes
5
answers
2k
views
Run command on each line of CSV file, using fields in different places of the command
I have a CSV file and want to run a command for each line, using the fields of the file as separate arguments.
For example given the following file:
foo,42,red
bar,13,blue
baz,27,green
I want to run ...
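A common pattern for this, sketched under the assumption that the CSV is simple (no quoted fields, no embedded commas, a newline after the last row): let `read` split each line on commas into named variables, then place them wherever the command needs them. The command `cmd` and its `--name`, `--count` and `--color` options are placeholders, so the sketch only prints the command line it would run.

```shell
# Read simple three-column CSV on stdin and build one command line per row.
# Usage: run_per_row < file.csv   (hypothetical helper name)
run_per_row() {
    while IFS=, read -r first second third; do
        # "cmd" and its options are placeholders; swap in the real command
        printf 'cmd --name %s --count %s --color %s\n' "$first" "$second" "$third"
    done
}
```

Replacing the `printf` line with the real invocation, keeping the variables quoted, turns the dry run into the actual loop.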
4
votes
3
answers
246
views
Add columns from variable number of files to base file
I'm dealing with a series of bed files, which look like this:
chr1 100 110 0.5
chr1 150 175 0.2
chr1 200 300 1.5
With the columns being chromosome, start, end, score. I have multiple different files ...
-4
votes
5
answers
196
views
Command to display all the employees whose first name has more than 6 characters
From the script below I need to know the following:
EmpNo#Email#Name#JobLevel#Experience
641357#Amrit_Mohanty#Amrit Mohanty#3#2
678522#Puneet_Mishra#Puneet Mishra#3#1
670242#Vikas_Bharti#Vikas Bharti#...
1
vote
3
answers
162
views
Loop ip list through geoiplookup and delete lines that do not match criteria
Thanks in advance for any ideas you present.
My current project has me trying to loop a file containing a list of 1000's of IP addresses through geoiplookup and piping it to sed to delete all lines ...
3
votes
5
answers
706
views
removing braces statements containing nested braces inside
A typical LaTeX problem:
\SomeStyle{\otherstyle{this is the \textit{nested part} some more text...}}
Now I want to remove all \SomeStyle{...} but not the content. Content contains nested braces. The ...
6
votes
2
answers
700
views
How can I extract quoted strings within a variable?
I acknowledge there are superficially similar questions asked here before, but all of those I've seen are simpler than what I'm trying to achieve. Bash-only solutions are preferred.
I have a variable ...
2
votes
2
answers
1k
views
Why is the file changing before being written to?
On Kubuntu Linux, the Google Chrome browser adds a checksum to the file, preventing simply editing the file by hand. So I'm writing a script to add the checksum.
$ cat .config/google-chrome/Default/...
-2
votes
3
answers
183
views
Bash script to uncomment lines with leading spaces on a file with specific pattern
I am trying to uncomment specific lines matching certain patterns in a file on Oracle Linux 8.6 using bash. There are leading white spaces on certain lines where the comments are not removed. I tried to uncomment the ...
-2
votes
3
answers
189
views
How to replace two lines containing [tab] chars into one line with just [newline] char, using a bash script?
In a directory I have a bunch of text files. Some of the files contain double lines with a [tab] char only. I want to find and change these two "tabbed lines" into one line with a new line ...
1
vote
2
answers
363
views
Extracting table of contents from PDFs
I have a reasonably large personal library with books in various formats. I have tried to organize their metadata, including a text field containing the tables of contents. At the moment I am using ...
6
votes
2
answers
389
views
Update object inside array inside another JSON object
I have a huge JSON object with an array of objects inside it. I have to add a key:value pair to a specific object in the array. For example, let the input object be:
{
"a": {
"b&...
-5
votes
2
answers
103
views
How to count the number of occurrences of a particular string in the last 5 minutes of the latest log file on Linux [closed]
I want to capture an error code (e.g. 502) from a log file.
The log file is rolled over when it reaches 100 MB, giving names like access.log_126427, access.log_197455, etc. There is no specific pattern of the ...
0
votes
1
answer
235
views
Find all files in directory and apply commands to each of them
I want to apply the commands below to all files in a directory instead of one file.
cat file.txt | sed -E "s/\@([0-9]+)\W+~(.*?)/\1 \2/g" | tr -d '~'
cat file.txt | sed -E "s/\@([0-9]+).*\~...
1
vote
3
answers
116
views
How do I merge bottom line with previous line? [duplicate]
I have a pretty basic file;
15
Chapter name
some text and some more text
some text and some more text
I was trying to get something like this
Book: 15 Chapter name
some text and some more text
some ...
0
votes
2
answers
121
views
BSD sed/awk moving portion of line to line above (switching attribute in HTML file)
My situation is simple: I have an HTML file with several lines containing only the indented <section> block tag, each line followed by an (also indented) <h3 id="YYYY">...</...
0
votes
2
answers
208
views
How to insert text before the first line of an UTF-8 with BOM file
This question is closely related to: How to insert text before the first line of a file?. I deliberately made the title similar to that question to highlight this.
Except the target file is UTF-8 with ...
1
vote
1
answer
97
views
Delete lines containing partial string match
I have 2 files
file1
00:00:00:00:00:01
file2
00:00:00:00:00:02 foo bar
00:00:00:00:00:01 something else
What I want to do is compare the two files and remove 00:00:00:00:00:01 from file 2 so I end ...
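A minimal sketch using standard grep options: `-F` treats each line of file1 as a fixed string, `-f` reads those strings from the file, and `-v` keeps only the lines of file2 that contain none of them. The helper name `remove_listed` is an assumption. Note that the match can occur anywhere in the line, not only in the first column.

```shell
# Print the lines of FILE2 that contain none of the strings listed in FILE1.
# Usage: remove_listed FILE1 FILE2 > file2.new   (hypothetical helper name)
remove_listed() {
    grep -v -F -f "$1" "$2"
}
```

The result goes to standard output; redirect it to a new file and move that over file2 once it looks right.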
2
votes
3
answers
115
views
Printing a specific section every time search results are matched
I have a pretty basic text file on a Linux machine that has stuff like Chapters, Dialogues and References.
This is what it looks like
Chapter: 1 One: Birds and Trees
Birds are beautiful and trees ...
1
vote
8
answers
229
views
Linux shell script to remove one character from a particular field in a file of around 3,000 lines
My input file:
1oo+457864227yexaloo+6784536pkp8907654
2oo+499004227yexaloo+69008908pkp8907654
3oo+648968976yexaloo+53589094pkp8907654
4oo+490764578yexaloo+6784536pkp8907654
I want to find out the ...
2
votes
7
answers
823
views
Shell Script to Normalize the data
We have a requirement to normalize the data ... The item field is comma-delimited and irregular, and it may hold anywhere from 0 items up to some maximum (let's say 100)
Input:
key1|desc field|item1,item2,item3,item4|extra ...
3
votes
3
answers
486
views
duplicate columns with AWK and separate them by tab
I have a large file with the following format tab-separated:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT recombination
chr1 586001 >63041388>63041391 G ...
-1
votes
2
answers
87
views
How to print the content inside the tag using grep command? [closed]
On Linux, how can we use the grep command to print the content that sits inside this tag?
<errorPayload>XXXXXXXX</errorPayload>
I tried grep -Po '<errorPayload>' abc.log, but it only ...
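As a portable sketch that swaps grep for sed: capture whatever sits between the opening and closing tag and print only lines where the substitution succeeded. The helper name `error_payload` is an assumption, and so is the premise that each line holds at most one complete `<errorPayload>` element; this is not a general XML parser.

```shell
# Print the text between <errorPayload> and </errorPayload> on each line.
# Usage: error_payload [FILE...]   (hypothetical helper name)
error_payload() {
    # -n suppresses default output; the p flag prints only substituted lines
    sed -n 's:.*<errorPayload>\(.*\)</errorPayload>.*:\1:p' "$@"
}
```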
0
votes
2
answers
139
views
Run program only on matching lines
Let's say I have a program blackbox, and a file with the following contents:
in this file
this line contains =TAG=
so does =TAG= this one
as =TAG= does this other line
this line does ...
1
vote
5
answers
123
views
In a list of file paths in an RTF file, count and sort output based on number of occurrences of each file name
I have an RTF file that contains a list of pdf file paths. Such as
Category1:
./Folder1/Folder2/1.pdf:18
./Folder3/2.pdf:18
./Folder5/4.pdf:10
Category2:
./Folder3/2.pdf:18
./Folder5/4.pdf:10
...