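Text Processing on Linux

Linux Text Processing Tools

Overview:

Using various text tools to view, analyze, and summarize text files;

grep: searching content by keyword;

Regular expressions

Extended regular expressions

Sed

Viewing parts of a file: head and tail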

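1. cat: viewing files

cat [OPTION]... [FILE]...

-E: display $ at the end of each line;

-n: number all output lines;

-A: show all control characters (same as -vET);

-b: number non-empty lines only;

-s: squeeze consecutive empty lines into a single empty line;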
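A minimal sketch of the options above, run against a throwaway file. The file contents are made up for illustration, and the ^I and $ notation of -A is what GNU cat prints:

```shell
# Throwaway sample file with a tab and a run of blank lines (made-up content)
sample=$(mktemp)
printf 'first\tline\n\n\n\nsecond line\n' > "$sample"

cat -n "$sample"   # number every line, blank lines included
cat -b "$sample"   # number only the non-blank lines
cat -s "$sample"   # squeeze the three blank lines into one
cat -A "$sample"   # show the tab as ^I and each line end as $

rm -f "$sample"
```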

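2. Viewing file contents page by page:

more: page through a file:

more [-dlfpcsu] [-num] [+/pattern] [+linenum] [file ...]

-d: display a prompt telling the user to press Space to continue and q to quit;

Press Enter to scroll down one line;

Press Space to scroll down one screen;

Press b to scroll back one screen;

Press q to quit and return to the shell;

!command runs a shell command without leaving more;

less: view a file or standard input page by page

Press Enter to scroll down one line;

Press Space to scroll down one screen;

Press b to scroll back one screen;

Press q to quit and return to the shell;

!command runs a shell command without leaving less;

/pattern searches the file; n jumps to the next match, N to the previous one;

Note: on most Linux systems, man displays manual pages through less;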

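3. Showing the first or last lines of a file:

head: show the first lines of a file;

head [OPTION]... [FILE]...

-c#: print the first # bytes;

-n#: print the first # lines;

-#: same as -n#;

Prints the first 10 lines by default;

tail: show the last lines of a file:

-c#: print the last # bytes;

-n#: print the last # lines;

-#: same as -n#;

Prints the last 10 lines by default;

-f: follow the file, printing new lines as they are appended (commonly used to watch log files);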
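A short sketch of head and tail on a generated 20-line file (the numbers come from seq and are only sample data). Chaining the two commands prints a single line:

```shell
# Throwaway sample file holding the numbers 1-20, one per line
nums=$(mktemp)
seq 1 20 > "$nums"

head "$nums"              # lines 1-10 (the default)
head -n 3 "$nums"         # lines 1-3
head -c 5 "$nums"; echo   # first 5 bytes: 1, newline, 2, newline, 3
tail -n 3 "$nums"         # lines 18-20
tail -n +18 "$nums"       # everything from line 18 on (the same lines here)

# Print only line 5 by chaining the two
head -n 5 "$nums" | tail -n 1

rm -f "$nums"
```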

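logger: write an entry to the system log:

Example:

logger "this is a test log" (this generates a log entry)

To watch only the newest log entries without tying up the terminal, run tail in the background: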

[root@centos6 ~]# tail -n 0 -f /var/log/messages &

[1] 4236

[root@centos6 ~]# logger "this is a test log"

Aug 7 12:19:09 centos6 root: this is a test log

[root@centos6 ~]#

fg: bring a background job back to the foreground; press Ctrl+C to terminate it;

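4. cut: extracting columns of text:

cut OPTION... [FILE]...

-d: specify the delimiter (default: tab);

-c: select by character position;

-f: specify the fields (columns) to extract:

#: field #;

#,#,#: several separate fields, e.g. 1,3,6;

#-#: a range of fields, e.g. 1-6;

Mixed: 1-3,5,7;

--output-delimiter=STRING: use STRING as the output delimiter;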
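A sketch of the cut options on two made-up passwd-style lines; --output-delimiter is a GNU cut option and may be missing from other implementations:

```shell
# Two passwd-style sample lines (made-up data)
sample='root:x:0:0:root:/root:/bin/bash
tcpdump:x:72:72::/:/sbin/nologin'

echo "$sample" | cut -d: -f1,7                              # user name and shell
echo "$sample" | cut -d: -f1-3,7                            # a range plus a single field
echo "$sample" | cut -c1-4                                  # first four characters of each line
echo "$sample" | cut -d: -f1,7 --output-delimiter=' -> '    # GNU cut only
```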

Example:

[root@centos6 testdir]# cat passwd | tail -n 3 | cut -d: -f 1,7

pulse:/sbin/nologin

sshd:/sbin/nologin

tcpdump:/sbin/nologin

[root@centos6 testdir]#

Cutting by character position:

[root@centos6 testdir]# cat passwd | tail -n 1 | cut -c 1-7

tcpdump

[root@centos6 testdir]#

Extracting the disk usage column by character position (the column offsets depend on how wide df's output is):

[root@centos6 testdir]# df | cut -c 44-46

Use

[root@centos6 testdir]#

Or, by field:

[root@centos6 testdir]# df | tr -s " " | tr -t " " ":"|cut -d: -f 5 |tr -d "%"

Use

[root@centos6 testdir]#

Extracting the IP address from ifconfig output:

[root@centos6 testdir]# ifconfig |head -2 | cut -d: -f 2 | tr -d "[[:alpha:]]" | tail -1

192.168.3.3

[root@centos6 testdir]#

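5. paste: merging lines with the same line number from several files into one line:

paste [OPTION]... [FILE]...

-d: specify the delimiter (default: tab);

-s: paste one file at a time, joining all lines of each file into a single line;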
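A sketch of paste on two small made-up files. The last command shows a common trick that goes beyond the original text: -s with a + delimiter turns a column of numbers into an arithmetic expression the shell can evaluate:

```shell
# Two small sample files (made-up data)
a=$(mktemp); b=$(mktemp)
printf 'alice\nbob\n' > "$a"
printf '30\n25\n' > "$b"

paste "$a" "$b"        # alice<TAB>30 and bob<TAB>25
paste -d: "$a" "$b"    # alice:30 and bob:25
paste -s "$a" "$b"     # one line per file: alice<TAB>bob and 30<TAB>25
paste -sd, "$a"        # alice,bob

# seq 1 10 | paste -sd+ produces 1+2+...+10; $(( )) evaluates it
echo $(( $(seq 1 10 | paste -sd+) ))   # 55

rm -f "$a" "$b"
```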

Example:

paste with default options:

[root@centos6 testdir]# paste aa f1

CentOS release 6.8 (Final) CentOS release 6.8 (Final)

Kernel \r on an \m Kernel \r on an \m

[root@centos6 testdir]#

paste with the -d option:

[root@centos6 testdir]# paste -d: aa f1

CentOS release 6.8 (Final):CentOS release 6.8 (Final)

Kernel \r on an \m:Kernel \r on an \m

[root@centos6 testdir]#

paste with the -s option:

[root@centos6 testdir]# paste -s aa f1

CentOS release 6.8 (Final) Kernel \r on an \m

[root@centos6 testdir]#

6. wc: counting text:

wc [OPTION]... [FILE]...

wc [OPTION]... --files0-from=F

-l: count lines;

-w: count words;

-c: count bytes;

-m: count characters;

Note: with no options, wc prints the line, word, and byte counts;
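A sketch of wc on a made-up two-line file. Reading from standard input with < keeps the file name out of the output, which is handy in scripts:

```shell
# Sample file: 2 lines, 3 words, 14 bytes ("one two\n" is 8 bytes, "three\n" is 6)
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

wc "$sample"         # lines, words, bytes, then the file name
wc -l "$sample"      # 2 followed by the file name
wc -l < "$sample"    # only the count: 2
wc -w < "$sample"    # 3
wc -c < "$sample"    # 14

rm -f "$sample"
```

Note that wc -l counts newline characters, so a last line without a trailing newline is not counted.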

Example:

[root@centos6 testdir]# cat aa

CentOS release 6.8 (Final)

Kernel \r on an \m

[root@centos6 testdir]# wc -l aa

3 aa

[root@centos6 testdir]# wc -w aa

9 aa

[root@centos6 testdir]# wc -c aa

47 aa

[root@centos6 testdir]# wc -m aa

47 aa

[root@centos6 testdir]#

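7. sort: sorting text:

sort [OPTION]... [FILE]...

-r: sort in reverse order;

-n: sort numerically;

-f: ignore case;

-u: remove duplicate lines from the output;

-t: specify the field separator used for sorting;

-k: specify the field (key) to sort on;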
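A sketch of these options on made-up passwd-style lines. It uses -k3,3 (the key starts and ends at field 3), a slightly stricter form of the -k3 used in the examples that follow:

```shell
# Made-up passwd-style lines: name:x:uid
sample='pulse:x:497
sshd:x:74
tcpdump:x:72
root:x:0'

echo "$sample" | sort -t: -k3,3n    # by UID, ascending: root, tcpdump, sshd, pulse
echo "$sample" | sort -t: -k3,3nr   # by UID, descending
echo "$sample" | sort               # plain text order (depends on the locale)

# -u with -f: lines differing only in case count as duplicates, one is kept
printf 'aaaaaa\nAAAAAA\nbbb\n' | sort -uf
```

-k3 alone means "from field 3 to the end of the line". For a numeric key the difference rarely matters, but for text keys -k3,3 keeps the rest of the line out of the comparison.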

Example:

sort -n: numeric sort in ascending order:

[root@centos6 testdir]# cat passwd | sort -t: -k3 -n

aaaaaa

AAAAAA

tcpdump:x:72:72::/:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

[root@centos6 testdir]#

sort -r: sort in reverse order:

[root@centos6 testdir]# cat passwd | sort -t: -k3 -nr

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

tcpdump:x:72:72::/:/sbin/nologin

AAAAAA

aaaaaa

[root@centos6 testdir]#

sort -u: remove duplicate lines from the output:

[root@centos6 testdir]# cat passwd | sort -u

aaaaaa

AAAAAA

tcpdump:x:72:72::/:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

[root@centos6 testdir]#

sort -f: ignore case (combined with -u, aaaaaa and AAAAAA count as duplicates):

[root@centos6 testdir]# cat passwd | sort -uf

aaaaaa

tcpdump:x:72:72::/:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

pulse:x:497:495:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin

[root@centos6 testdir]#

Extracting all IPv4 addresses from ifconfig output:

[root@centos6 testdir]# ifconfig | tr -c "[[:digit:]]." "\n"| sort -u -t. -k3|tail -5

255.0.0.0

127.0.0.1

255.255.255.0

192.168.3.255

192.168.3.3

[root@centos6 testdir]#

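8. uniq: removing adjacent duplicate lines from the output

uniq [OPTION]... [INPUT [OUTPUT]]

-c: prefix each line with the number of times it occurs;

-d: print only lines that are repeated;

-u: print only lines that are not repeated;

Note: uniq is usually combined with sort, because it only detects duplicates that are next to each other;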
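Because uniq only compares each line with the one right before it, unsorted input can leave duplicates behind. A sketch with made-up input:

```shell
# Made-up input where the two "b" lines are NOT adjacent
data='b
a
b
c
c'

echo "$data" | uniq                        # b a b c: only the adjacent c lines collapse
echo "$data" | sort | uniq                 # a b c
echo "$data" | sort | uniq -c              # counts: 1 a, 2 b, 2 c
echo "$data" | sort | uniq -d              # repeated lines: b c
echo "$data" | sort | uniq -u              # lines that occur once: a
echo "$data" | sort | uniq -c | sort -nr   # most frequent first
```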

Example:

Finding the most frequently repeated words in /etc/init.d/functions:

...

75 pid

77 then

83 if

333

[root@centos6 testdir]#
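The command behind this output is not shown. One possible pipeline, assuming a "word" is a run of letters; the function name top_words is hypothetical, and its counts will not necessarily match the output above:

```shell
# top_words N: print the N most frequent words read from stdin (hypothetical helper)
top_words() {
    tr -cs '[:alpha:]' '\n' |   # turn every run of non-letters into a single newline
    grep -v '^$' |              # drop the empty line left when input starts with a non-letter
    sort |                      # bring identical words together
    uniq -c |                   # count each word
    sort -n |                   # least frequent first
    tail -n "${1:-10}"          # keep the N most frequent (default 10)
}

# Usage on the file from the example:
#   top_words 3 < /etc/init.d/functions
printf 'if then if\nfi if\n' | top_words 1
```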

Finding the IP addresses that connect to this host most often:

1 96.7.54.187

4 192.168.3.4

[root@centos6 testdir]#
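The command for this example is not shown either. A sketch under the assumption that active connections are listed with ss -nt, whose fifth column is the peer address:port; IPv6 peers are not handled, and the helper name count_peers and the sample lines are hypothetical:

```shell
# count_peers: count connections per peer IPv4 address in `ss -nt`-style input (hypothetical helper)
count_peers() {
    tail -n +2 |      # skip the header line
    tr -s ' ' |       # squeeze runs of spaces so cut can split on a single space
    cut -d' ' -f5 |   # fifth column: peer address:port
    cut -d: -f1 |     # keep the address part (IPv4 only)
    sort | uniq -c | sort -n
}

# Real use would be: ss -nt | count_peers
printf '%s\n' \
    'State Recv-Q Send-Q Local:Port Peer:Port' \
    'ESTAB 0 0 192.168.3.3:22 192.168.3.4:50512' \
    'ESTAB 0 0 192.168.3.3:22 192.168.3.4:50513' \
    'ESTAB 0 0 192.168.3.3:22 96.7.54.187:40000' | count_peers
```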

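9. diff: comparing two files:

diff [OPTION]... FILES

-u: print the differences in unified format, with surrounding context lines (the format patch expects);

Example: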

[root@centos6 testdir]# diff f1 f2

2c2,3

< wwww

---

> ddddddddddd

> dddddddddddss

3a5

> dddddddddddssfcccccf

[root@centos6 testdir]#

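patch: apply the changes recorded by diff to another file

patch [options] [originalfile [patchfile]]

-b: back up the file before changing it;

Example:

Suppose we deleted f1 by mistake; patch can recover it from f2 and the diff output;

There is a catch with this approach: patch writes the recovered f1 content into f2 (answer y when it asks whether to assume -R), and the original f2 is saved as f2.orig; rename the files with mv if needed;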

[root@centos6 testdir]# echo aaaaaaaaaaa > f1

[root@centos6 testdir]# echo aaaaaaaaaaa > f2

[root@centos6 testdir]# echo bbbbbbbbbbb >> f2

[root@centos6 testdir]# diff -u f1 f2 > diff.log

[root@centos6 testdir]# rm -rf f1

[root@centos6 testdir]# patch -b f2 diff.log

patching file f2

Reversed (or previously applied) patch detected! Assume -R? [n] y

[root@centos6 testdir]# ll

total 12

-rw-r--r--. 1 root root 126 Aug 7 23:13 diff.log

-rw-r--r--. 1 root root 12 Aug 7 23:13 f2

-rw-r--r--. 1 root root 24 Aug 7 23:13 f2.orig

[root@centos6 testdir]# cat f2

aaaaaaaaaaa

[root@centos6 testdir]# cat f2.orig

aaaaaaaaaaa

bbbbbbbbbbb

[root@centos6 testdir]#
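An alternative sketch, not the method used above: GNU patch's -R (reverse) and -o (output file) options can rebuild f1 as a new file and leave f2 untouched. It runs in a throwaway directory:

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo aaaaaaaaaaa > f1
printf 'aaaaaaaaaaa\nbbbbbbbbbbb\n' > f2

# diff exits with status 1 when the files differ, which is expected here
diff -u f1 f2 > diff.log || true

rm -f f1                       # simulate losing f1
patch -R -o f1 f2 < diff.log   # undo the f1 -> f2 change, writing the result to f1
cat f1                         # aaaaaaaaaaa
cat f2                         # f2 still holds both lines
```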

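10. The "three musketeers" of text processing on Linux:

grep: text filtering tool (grep, egrep, fgrep);

sed: stream editor, a text editing tool;

awk: text report generator and formatter;

grep: text search tool; it checks the target text line by line against a user-specified "pattern" and prints the matching lines;

grep [OPTIONS] PATTERN [FILE...]

--color=auto: highlight the matched text;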

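-v: invert the match, printing lines that do not match;

-i: ignore case;

-n: show the line numbers of matching lines;

-c: count the matching lines;

-o: print only the matched text;

-q: quiet mode, print nothing (the exit status tells whether anything matched);

-A#: print each matching line plus the # lines after it;

-B#: print each matching line plus the # lines before it;

-C#: print each matching line plus # lines before and after it;

-e: specify a pattern; repeat it to match any of several patterns (logical OR);

-w: match whole words only;

-E: same as egrep (extended regular expressions);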
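A sketch exercising several of these options on three made-up passwd-style lines:

```shell
# Made-up passwd-style lines
sample='root:x:0:0:root:/root:/bin/bash
Rooter:x:1001:1001::/home/Rooter:/bin/sh
chroot:x:1002:1002::/home/chroot:/sbin/nologin'

echo "$sample" | grep -n root              # with line numbers: lines 1 and 3 (case matters)
echo "$sample" | grep -ic root             # 3: case-insensitive count
echo "$sample" | grep -w root              # whole word only: just the first line
echo "$sample" | grep -v nologin           # lines that do NOT contain nologin
echo "$sample" | grep -o '/bin/[a-z]*'     # only the matched text: /bin/bash, /bin/sh
echo "$sample" | grep -e bash -e nologin   # bash OR nologin: lines 1 and 3
```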

Variables and command substitution can also be used in the pattern:

[root@centos6 Desktop]# grep "$USER" /etc/passwd

[root@centos6 Desktop]# grep `whoami` /etc/passwd

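Character matching:

.: any single character;

[]: any single character in the specified set;

[^]: any single character outside the specified set;

[:digit:]: all digits;

[:lower:]: all lowercase letters;

[:upper:]: all uppercase letters;

[:alpha:]: all letters (both cases);

[:alnum:]: all letters and digits;

[:punct:]: all punctuation characters;

[:space:]: all whitespace characters;

Note: these classes are used inside a bracket expression, e.g. [[:digit:]] or [^[:space:]];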

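Repetition:

A repetition operator follows the character it applies to and specifies how many times that preceding character must occur;

*: the preceding character any number of times, including zero;

Greedy matching: match as long a string as possible;

.*: any string of any length;

\?: the preceding character 0 or 1 time;

\+: the preceding character at least once;

\{m\}: the preceding character exactly m times;

\{m,n\}: the preceding character at least m and at most n times;

\{,n\}: the preceding character at most n times;

\{m,\}: the preceding character at least m times;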
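A sketch of these basic-regex repetition operators on made-up input. \? and \+ are GNU grep extensions to POSIX basic regular expressions:

```shell
# Made-up sample lines
sample='ab
aab
aaab
a1b'

echo "$sample" | grep 'a\{2,3\}b'     # aab, aaab: two or three a's followed by b
echo "$sample" | grep '^a\{2\}b$'     # aab: exactly two a's on the whole line
echo "$sample" | grep '^a\+b$'        # ab, aab, aaab: one or more a's
echo "$sample" | grep '^a\?b$'        # ab: zero or one a
echo "$sample" | grep '[[:digit:]]'   # a1b: the class sits inside [ ]

# Greedy matching: .* extends to the LAST b on the line
echo xaabyaab | grep -o 'a.*b'        # aabyaab
```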

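Anchors: where the match must occur

^: start-of-line anchor, used at the far left of the pattern;

$: end-of-line anchor, used at the far right of the pattern;

^PATTERN$: the pattern must match the whole line;

^$: an empty line;

^[[:space:]]*$: a blank line (empty or whitespace only);

\< or \b: start-of-word anchor, used to the left of a word pattern;

\> or \b: end-of-word anchor, used to the right of a word pattern;

\<PATTERN\>: match a whole word;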
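A sketch of the anchors on made-up input; \< and \> are GNU extensions:

```shell
# Made-up sample: a word, an empty line, a whitespace-only line, and variants of "root"
sample=$(printf 'root\n\n   \nchroot\nroots\n')

printf '%s\n' "$sample" | grep -c '^$'               # 1: the empty line only
printf '%s\n' "$sample" | grep -c '^[[:space:]]*$'   # 2: empty or whitespace-only
printf '%s\n' "$sample" | grep '^root$'              # root: the whole line must match
printf '%s\n' "$sample" | grep '\<root\>'            # root: chroot and roots are not whole words
printf '%s\n' "$sample" | grep 'root\>'              # root, chroot: a word ends right after "root"
```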

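Grouping:

Grouping binds one or more characters together so they are treated as a single unit, e.g. \(ab\)\+ matches ab, abab, ...; a backreference refers to the text that the pattern inside a group actually matched, not to the pattern itself;

\(\)

The text matched by the pattern inside each pair of grouping parentheses is recorded by the regex engine in internal variables named \1, \2, \3, ...;

Example:

Filtering lines in /etc/passwd where the user name is the same as the shell name:

[root@centos7 ~]# grep "^\([[:alnum:]]\+\)\>.*\1$" /etc/passwd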

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

[root@centos7 ~]#
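Backreferences are useful beyond /etc/passwd. A small sketch that flags doubled words such as "the the" (the sample text is made up; \< and \> are GNU extensions):

```shell
# Made-up text with two doubled words
text='this is the the end
no repeats here
it it works'

# \([[:alpha:]]\+\) captures a word; " \1" requires the same word right after a space
echo "$text" | grep '\<\([[:alpha:]]\+\) \1\>'   # lines 1 and 3
```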

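egrep: extended regular expressions:

egrep = grep -E

grep -E [OPTIONS] PATTERN [FILE...]

Metacharacters of extended regular expressions:

Character matching:

.: any single character;

[]: any single character in the specified set;

[^]: any single character outside the specified set;

[:digit:]: all digits;

[:lower:]: all lowercase letters;

[:upper:]: all uppercase letters;

[:alpha:]: all letters (both cases);

[:alnum:]: all letters and digits;

[:punct:]: all punctuation characters;

[:space:]: all whitespace characters;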

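Repetition:

*: the preceding character any number of times, including zero;

?: the preceding character 0 or 1 time;

+: the preceding character at least once;

{m}: the preceding character exactly m times;

{n,m}: the preceding character at least n and at most m times;

{0,m}: the preceding character at most m times;

{n,}: the preceding character at least n times;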
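For comparison, a sketch of the same kind of repetition in basic syntax (plain grep) and extended syntax (grep -E), on made-up input:

```shell
# Same made-up lines for every command
printf 'ab\naab\naaab\n' | grep    'a\{2\}b'     # BRE: braces need backslashes (aab, aaab)
printf 'ab\naab\naaab\n' | grep -E 'a{2}b'       # ERE: same result, no backslashes
printf 'ab\naab\naaab\n' | grep -E '^a{2,}b$'    # at least two a's: aab, aaab
printf 'ab\naab\naaab\n' | grep -E '^a?b$'       # zero or one a: ab
printf 'ab\naab\naaab\n' | grep -E '^(aa)+b$'    # one or more "aa" groups: aab
```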

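Anchors:

^: anchor at the start of a line;

$: anchor at the end of a line;

^PATTERN$: the pattern must match the whole line;

^$: an empty line;

^[[:space:]]*$: a blank line (empty or whitespace only);

\< or \b: start-of-word anchor, used to the left of a word pattern;

\> or \b: end-of-word anchor, used to the right of a word pattern;

\<PATTERN\>: match a whole word;

Grouping:

(): unlike basic regular expressions, the parentheses need no backslashes; backreferences are still written \1, \2, ...;

Example:

Filtering lines in /etc/passwd where the user name is the same as the shell name: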

[root@centos7 ~]# grep -E "^([[:alpha:]]*)\>.*\1$" /etc/passwd

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

[root@centos7 ~]#

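Alternation (or):

^S|s: lines that start with an uppercase S, or that contain a lowercase s anywhere (| has the lowest precedence);

Example: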

[root@centos7 ~]# grep -E "^S|s" /proc/meminfo

Buffers: 792 kB

SwapCached: 0 kB

Shmem: 10188 kB

PageTables: 21652 kB

NFS_Unstable: 0 kB

[root@centos7 ~]#

^(S|s): lines that start with an uppercase S or a lowercase s;

Example:

[root@centos7 ~]# grep -E "^(S|s)" /proc/meminfo

SwapCached: 0 kB

SwapTotal: 1023996 kB

SwapFree: 1023996 kB

Shmem: 10188 kB

Slab: 119596 kB

[root@centos7 ~]#
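A sketch of the precedence difference on made-up lines modeled on /proc/meminfo:

```shell
# Made-up lines in the style of /proc/meminfo field names
sample='SwapTotal
Buffers
slab
MemFree'

echo "$sample" | grep -E '^S|s'     # (^S) or (s anywhere): SwapTotal, Buffers, slab
echo "$sample" | grep -E '^(S|s)'   # starts with S or s: SwapTotal, slab
echo "$sample" | grep '^[Ss]'       # the same, written as a bracket expression
```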

Example:

Extracting the IPv4 address from ifconfig output:

[root@centos7 ~]# ifconfig | grep "inet\b" | tr -s " "|cut -d " " -f 3| grep -v "127.0.0.1"

192.168.3.2

[root@centos7 ~]#

Finding two- or three-digit numbers in /etc/passwd:

[root@centos7 ~]# cat /etc/passwd | grep -E -o "\b[[:digit:]]{2,3}\b"

...

992

990

...

[root@centos7 ~]#

Showing lines in /etc/grub2.cfg that start with at least one whitespace character followed by a non-whitespace character:

[root@centos7 ~]# cat /etc/grub2.cfg | grep -E "^[[:space:]]+[^[:space:]]"

...

initrd16 /initramfs-0-rescue-27cebe594b5a45138a2e15e32a1cf607.img

source ${config_directory}/custom.cfg

source $prefix/custom.cfg;

[root@centos7 ~]#

Showing lines in /proc/meminfo that start with an uppercase or lowercase s (two ways):

[root@centos7 ~]# cat /proc/meminfo | grep "^[Ss]"

[root@centos7 ~]# cat /proc/meminfo | grep -E "^(S|s)"

Using an extended regular expression to express 0-9, 10-99, 100-199, 200-249, and 250-255 (the values of one IPv4 octet):

([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
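A sketch that plugs the octet expression into a full address check. The helper name is_ipv4 is hypothetical, and the pattern deliberately rejects leading zeros such as 010:

```shell
# One octet: 0-9 | 10-99 | 100-199 | 200-249 | 250-255
octet='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'

# is_ipv4 ADDR: succeed if ADDR is a dotted-quad IPv4 address (hypothetical helper)
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq "^($octet\.){3}$octet$"
}

for addr in 192.168.3.2 255.255.255.0 256.1.1.1 1.2.3 010.1.1.1; do
    if is_ipv4 "$addr"; then echo "$addr: valid"; else echo "$addr: invalid"; fi
done
```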

Add the users bash, testbash, basher, and nologin (the last one with the shell /sbin/nologin), then find the lines in /etc/passwd where the user name is the same as the shell name:

[root@centos7 ~]# cat /etc/passwd | grep -E "(^[[:alnum:]]+)\b.*\1$"

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

nologin:x:1004:1004::/home/nologin:/sbin/nologin

[root@centos7 ~]#

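Showing all IPv4 addresses on this host (the last two grep commands filter out netmasks, broadcast addresses, and the 127.0.0.1 loopback address):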

[root@centos7 profile.d]# ifconfig | grep -E -o "(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"| grep -v -E -e "^255\>" -e "\<255$"| grep -v "^127\b"

192.168.3.2

[root@centos7 profile.d]#

Using echo to print /etc/sysconfig and egrep to extract the base name:

[root@centos7 ~]# echo /etc/sysconfig | grep -E -o "[^/]+/?$"

sysconfig

[root@centos7 ~]#

Using echo to print /etc/sysconfig and egrep to extract the directory name:

[root@centos7 ~]# echo /etc/sysconfig | grep -E -o "^[/][[:alpha:]]+/?"

/etc/

[root@centos7 ~]#
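The directory-name expression above only works for a single directory level. A sketch of a greedy alternative, alongside the basename and dirname commands and shell parameter expansion, which do the same job:

```shell
path=/etc/sysconfig

echo "$path" | grep -E -o '[^/]+/?$'      # sysconfig (base name)
echo "$path" | grep -E -o '^.*/'          # /etc/ (greedy: everything up to the last /)
echo /usr/local/bin | grep -E -o '^.*/'   # /usr/local/ (works at any depth)

# The dedicated commands and parameter expansion give the same answers
basename "$path"     # sysconfig
dirname "$path"      # /etc
echo "${path##*/}"   # sysconfig: strip the longest prefix ending in /
echo "${path%/*}"    # /etc: strip the shortest suffix starting with /
```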

Extracting the usage percentage of the /dev/sda partitions:

[root@centos7 profile.d]#
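The command for this exercise is missing. One possible approach, assuming df's standard column layout in which Use% is the only field ending in %; the helper name sda_usage and the sample df lines are made up:

```shell
# sda_usage: print the Use% number for each /dev/sda partition in df-style input (hypothetical helper)
sda_usage() {
    grep '^/dev/sda' |       # keep only /dev/sda partitions
    grep -E -o '[0-9]+%' |   # the Use% field, e.g. 13%
    tr -d '%'                # drop the percent sign
}

# Real use: df -P | sda_usage   (-P keeps each filesystem on one line)
printf '%s\n' \
    'Filesystem     1K-blocks    Used Available Use% Mounted on' \
    '/dev/sda2       41152736 5012344  34027228  13% /' \
    '/dev/sda1         508588  165660    342928  33% /boot' \
    'tmpfs             931120       0    931120   0% /dev/shm' | sda_usage
```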

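sed: a text processing tool:

sed is a stream editor that processes one line at a time. It stores the current line in a temporary buffer called the "pattern space", applies the sed commands to the contents of the pattern space, and when it is done sends the pattern space to the screen; it then processes the next line, repeating until the end of the file. The file itself is not changed unless you redirect the output (or use the -i option). sed is mainly used to edit one or more files automatically, to simplify repetitive operations on files, and to write conversion programs;

sed [OPTION]... {script-only-if-no-other-script} [input-file]...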

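-n: do not print the pattern space automatically (only commands such as p produce output);

-e: add an editing script; use it several times to apply several edits;

-f: read the editing script from the specified file;

-r: use extended regular expressions;

-i: edit the original file in place;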

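Addresses:

No address: process every line;

Single address: #, a specific line number;

/pattern/: every line matched by the pattern;

Address ranges:

#,#: from line # through line #;

#,+#: line # plus the # lines after it;

#,/pattern/: from line # through the first following line matched by the pattern;

/pattern/,/pattern/: from a line matched by the first pattern through the next line matched by the second;

Step:

1~2: all odd-numbered lines;

2~2: all even-numbered lines;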
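A sketch of each address form, using seq output as sample input; the first~step form is a GNU sed extension:

```shell
seq 10 | sed -n '3p'         # 3
seq 10 | sed -n '3,5p'       # 3 4 5
seq 10 | sed -n '3,+2p'      # 3 4 5: line 3 plus the next 2 lines
seq 10 | sed -n '2,/5/p'     # 2 3 4 5: up to the first later line matching /5/
seq 10 | sed -n '/4/,/6/p'   # 4 5 6
seq 10 | sed -n '1~2p'       # 1 3 5 7 9
seq 10 | sed -n '2~2p'       # 2 4 6 8 10
```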

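Editing commands:

d: delete;

p: print the contents of the pattern space;

a \text: append text after the matched line; \n can be used to append several lines;

i \text: insert text before the matched line; \n can be used to insert several lines;

c \text: replace the matched line with the specified text;

w /path/to/somefile: write the matched lines in the pattern space to the specified file;

r /path/from/somefile: read the contents of the specified file in after the matched line; useful for merging files;

=: print the line number of the matched line;

!: negate the address (run the command on the lines that do not match);

Example:

[root@centos7 testdir]# cat /etc/fstab | sed '1,8d'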

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/a \new line"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/i \new line"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/c \new line"

[root@centos7 testdir]# cat /etc/fstab | sed "3r /etc/issue"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/w /testdir/f2"

[root@centos7 testdir]# cat /etc/fstab | sed "/^UUID/="

[root@centos7 testdir]# cat /etc/fstab | sed '/^#/!d'
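A sketch that runs a few of these commands on a throwaway fstab-like file instead of the real /etc/fstab, including -i with a backup suffix (GNU sed syntax):

```shell
# Throwaway fstab-like file (made-up content)
f=$(mktemp)
printf '%s\n' '# comment' 'UUID=1234 / xfs defaults 0 0' '' \
    '/dev/sdb1 /data ext4 defaults 0 0' > "$f"

sed -n '/^UUID/=' "$f"         # 2: line number of the UUID line
sed '/^UUID/a new line' "$f"   # append "new line" after it (GNU one-line form)
sed '/^#/d;/^$/d' "$f"         # drop comments and empty lines, on stdout only

# -i rewrites the file itself; with a suffix (-i.bak) the original is kept
sed -i.bak '/^#/d;/^$/d' "$f"
cat "$f"                       # only the UUID and /dev/sdb1 lines remain
rm -f "$f" "$f.bak"
```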

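s///: search and replace; the delimiter can be any character, commonly s@@@ or s###;

Replacement flags:

g: replace every match on the line (global);

w /path/to/somefile: save the lines where a replacement was made to the specified file;

p: print the lines where a replacement was made;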
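A sketch of s/// with a custom delimiter, the g and p flags, and -r groups with backreferences, on made-up input:

```shell
line='tcpdump:x:72:72::/:/sbin/nologin'

echo "$line" | sed 's@/sbin/nologin@/bin/false@'   # @ as delimiter, so / needs no escaping
echo 'a-b-c' | sed 's/-/+/'                        # a+b-c: first match only
echo 'a-b-c' | sed 's/-/+/g'                       # a+b+c: g replaces every match
printf 'x1\ny2\n' | sed -n 's/1/one/p'             # xone: -n plus p prints changed lines only

# -r enables ERE, so ( ) group without backslashes; \1 and \2 refer back to them
echo "$line" | sed -r 's/^([^:]+):.*:([^:]+)$/\1 uses \2/'   # tcpdump uses /sbin/nologin
```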

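Advanced editing commands:

h: copy the pattern space to the hold space, overwriting it;

H: append the pattern space to the hold space;

g: copy the hold space to the pattern space, overwriting it;

G: append the hold space to the pattern space;

x: exchange the pattern space and the hold space;

n: read the next input line into the pattern space, replacing its contents (unless -n is used, the current line is printed first);

N: append the next input line to the pattern space;

d: delete the pattern space and start the next cycle;

D: delete the pattern space up to the first newline; if text remains, restart the cycle with it without reading a new line;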
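A sketch of three classic one-liners built from these commands (GNU sed assumed): n;d prints every other line, N joins pairs of lines, and 1!G;h;$p reverses the input the way tac does:

```shell
seq 5 | sed 'n;d'           # 1 3 5: print a line, read the next into the pattern space, delete it
seq 6 | sed 'N;s/\n/ /'     # "1 2", "3 4", "5 6": N appends the next line, s joins the two
seq 3 | sed -n '1!G;h;$p'   # 3 2 1: G appends the hold space, h saves the result, $p prints at the end
```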

Original article by zhengyibo. If you repost it, please credit the source: http://www.178linux.com/35745
