Text manipulation
Sorting strings: sort
If you have a .txt file containing text strings, each on a new line, you can use the sort command to quickly put them in alphabetical order:
sort file.txt
Note that this does not save the sorted result; it only prints it to standard output. To save it, redirect the output to a file in the standard way:
sort file.txt > output.txt
Options
-r
- reverse sort
-c
- check whether the file is already sorted. If it is not, sort reports the first line that is out of order and exits with a non-zero status
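As a minimal sketch of both options, assuming a hypothetical file fruits.txt created in the current directory purely for illustration:

```shell
# fruits.txt is a hypothetical example file, deliberately out of order.
printf 'pear\napple\ncherry\n' > fruits.txt

# Reverse alphabetical order, saved to a new file.
sort -r fruits.txt > reversed.txt
cat reversed.txt

# Ordinary sort, saved so that -c has a sorted file to check.
sort fruits.txt > sorted.txt

# -c prints nothing and succeeds when the input is already sorted.
if sort -c sorted.txt; then
    echo "sorted.txt is in order"
fi

# On unsorted input, -c reports the first out-of-order line on
# standard error (hidden here) and exits with a non-zero status.
if ! sort -c fruits.txt 2>/dev/null; then
    echo "fruits.txt is not in order"
fi
```

Because -c communicates through its exit status, it is convenient as a guard in scripts.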
Find and replace: sed
The sed programme can be used to implement find and replace procedures. In sed, find and replace is handled by the substitute command, s:
sed 's/word/replacement word/' file.txt
This, however, only changes the first instance of word on each line. To replace every instance, add the global flag, g, after the final slash of the expression, as in s/word/replacement word/g.
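The following sketch shows the difference the g flag makes when it is written after the final slash of the expression. The file notes.txt and its contents are hypothetical:

```shell
# notes.txt is a hypothetical example file.
printf 'cat and cat\nanother cat\n' > notes.txt

# Without g, only the first match on each line is replaced.
sed 's/cat/dog/' notes.txt > first-only.txt
cat first-only.txt

# With g, every match on each line is replaced.
sed 's/cat/dog/g' notes.txt > all.txt
cat all.txt
```

The first command leaves the second "cat" on the first line untouched, whereas the second command replaces both.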
As sed is a stream editor, any changes you make with it appear only in standard output; they are not saved to the file. To save them, redirect the output to a new file (using > output.txt) in addition to naming the original file. This has the benefit of leaving the original file untouched whilst ensuring the desired outcome is stored permanently.
Alternatively, you can use the -i option, which applies the changes directly to the source file instead of printing them to standard output.
Note that this overwrites the original version of the file, which cannot then be recovered. If this is an issue, it is recommended to attach a backup suffix to the option, like so:
sed -i.bak 's/word/replacement word/' file.txt
This will create the file file.txt.bak alongside the original. It contains the file as it was before the replacement was carried out.
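A short sketch of an in-place edit with a backup, using a hypothetical file greeting.txt:

```shell
# greeting.txt is a hypothetical example file.
printf 'hello world\n' > greeting.txt

# Edit the file in place, keeping the original as greeting.txt.bak.
sed -i.bak 's/world/there/' greeting.txt

cat greeting.txt       # the edited content
cat greeting.txt.bak   # the original content
```

If the edit turns out to be wrong, copying greeting.txt.bak back over greeting.txt restores the original.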
Remove duplicates
The sort -u command can be used to remove duplicates:
sort -u file.txt
There is no need to sort the file beforehand: the -u flag works on the basis of identical strings being adjacent, and sort -u puts the lines in order before discarding the repeats.
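For illustration, a small sketch in which the duplicates are not adjacent in the input. The file dupes.txt is hypothetical:

```shell
# dupes.txt is a hypothetical example file with scattered duplicates.
printf 'b\na\nb\nc\na\n' > dupes.txt

# Sort and drop repeated lines in one step.
sort -u dupes.txt > unique.txt
cat unique.txt
```

Each distinct line appears once in unique.txt, in sorted order, while dupes.txt itself is left unchanged.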
Split a large file into multiple smaller files: split
Suppose you have a file containing 1000 lines. You want to break the file up into five separate files, each containing two hundred lines. You can use split
to accomplish this, like so:
split -l 200 big-file.txt new-file-
split takes the final argument as a prefix and appends a two-letter suffix to it, so the resulting five files are named as follows:
- new-file-aa
- new-file-ab
- new-file-ac
- new-file-ad
- new-file-ae
If you would rather have numeric suffixes, use the option -d. You can also split a file by size in bytes, using the option -b and specifying the size of each output file.
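The line-based split can be tried end to end with the sketch below. It assumes the seq command is available to generate a hypothetical 1000-line file, and it uses the prefix new-file- (with a trailing hyphen) so that the pieces are named new-file-aa to new-file-ae:

```shell
# Build a hypothetical 1000-line file, one number per line.
seq 1 1000 > big-file.txt

# Split it into pieces of 200 lines each.
split -l 200 big-file.txt new-file-

# List the pieces and count the lines in the first one.
ls new-file-*
wc -l < new-file-aa

# Concatenating the pieces in name order reproduces the original.
cat new-file-* > rejoined.txt
cmp big-file.txt rejoined.txt && echo "identical"
```

The final cmp is a quick way to confirm that no lines were lost or reordered by the split.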
Merge multiple files into one with cat
We can use cat to read multiple files at once and then add a redirect to save their combined contents to a single file:
cat file_a.txt file_b.txt file_c.txt > merged-file.txt
Count lines, words, etc: wc
To count the lines, words and bytes in a file:
wc file.txt
When we use the command, three numbers are output, in order: lines, words, bytes.
You can use options to get just one of the numbers: -l (lines), -w (words), -c (bytes).
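A brief sketch of the individual counts, using a hypothetical file words.txt. Reading the file through a redirect keeps the file name out of the output, and some implementations pad the number with leading spaces:

```shell
# words.txt is a hypothetical example file: 2 lines, 5 words, 24 bytes.
printf 'one two\nthree four five\n' > words.txt

wc words.txt        # lines, words and bytes, followed by the file name
wc -l < words.txt   # lines: 2
wc -w < words.txt   # words: 5
wc -c < words.txt   # bytes: 24
```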