Common Text-Processing Tools and Regular Expressions

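1. The cat command

cat [option]… [file]…

-A equivalent to -vET

-b number non-blank lines only

-E display $ at the end of each line

-n number all lines

-s squeeze consecutive blank lines into a single blank line

-T display tab characters as ^I

Example: display file a with line numbers and all control characters

cat -nA a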
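
To see what these options actually print, here is a small sketch on a throwaway file. The sample content and variable names are made up for illustration, and the flags assume GNU cat.

```shell
# Hypothetical sample: a tab inside line 1, followed by two blank lines.
sample=$(mktemp)
printf 'one\ttwo\n\n\nthree\n' > "$sample"

# -n numbers every line; -A marks tabs as ^I and line ends as $.
marked=$(cat -nA "$sample")
printf '%s\n' "$marked"

# -s collapses the two blank lines into one, leaving three lines in total.
squeezed_count=$(cat -s "$sample" | wc -l)
echo "lines after -s: $squeezed_count"

rm -f "$sample"
```

The first line is printed with its number followed by `one^Itwo$`, which makes an otherwise invisible tab easy to spot.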

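2. (1) The head command

head -n x  show the first x lines

head -c x  show the first x bytes

(2) The tail command

tail -n x  show the last x lines

tail -c x  show the last x bytes

tail -f  follow the file and print newly appended content (useful for monitoring log files)

Example: show lines 6-8 of /etc/man.config

head -n 8 /etc/man.config | tail -n 3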
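
Lines 6 through 8 are three lines (8 - 6 + 1), so the tail stage has to keep 3 lines. A quick check on generated input, with seq standing in for the real file (an assumption made only so the result is easy to verify):

```shell
# seq prints 1..10, one number per line, so line N contains the number N.
picked=$(seq 1 10 | head -n 8 | tail -n 3)
printf '%s\n' "$picked"

# sed can select the same range directly, which avoids the arithmetic.
picked_sed=$(seq 1 10 | sed -n '6,8p')
printf '%s\n' "$picked_sed"
```

Both pipelines print 6, 7 and 8.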

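3. The cut command

cut -d 'delimiter' -f fields  (extract selected columns from each line)

cut -c range  extract a fixed range of characters, counted character by character

Character ranges: -n  characters 1 through n

n-m  characters n through m

m-  character m through the end of the line

--output-delimiter=string  set the output delimiter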
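
A minimal sketch of the field and character forms, run against a made-up passwd-style record. The record and variable names are illustrative, and --output-delimiter assumes GNU cut.

```shell
# Hypothetical passwd-style record.
record='root:x:0:0:root:/root:/bin/bash'

# Fields 1 and 7, split on ":".
name_shell=$(printf '%s\n' "$record" | cut -d ':' -f 1,7)

# Same fields, joined with a space on output (GNU cut).
spaced=$(printf '%s\n' "$record" | cut -d ':' -f 1,7 --output-delimiter=' ')

# Characters 1 through 4, and character 17 through the end of the line.
first4=$(printf '%s\n' "$record" | cut -c -4)
tail_part=$(printf '%s\n' "$record" | cut -c 17-)

printf '%s\n' "$name_shell" "$spaced" "$first4" "$tail_part"
```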

4. The paste command

-d specify the delimiter

-s merge all lines of each file into a single line
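
The two options behave quite differently, so a small sketch may help. The file names and contents here are invented for the example.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/letters"
printf '1\n2\n' > "$workdir/numbers"

# Merge the files line by line, joined with ":" instead of the default tab.
side_by_side=$(paste -d ':' "$workdir/letters" "$workdir/numbers")

# -s joins all the lines of one file into a single line.
one_line=$(paste -s -d ',' "$workdir/letters")

printf '%s\n' "$side_by_side" "$one_line"
rm -rf "$workdir"
```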

5. The wc command

wc -l  count lines

wc -w  count words (whitespace-separated)

wc -m  count characters
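
As a quick illustration, the three counters applied to the same two-line input (the text is arbitrary sample data):

```shell
text='hello world
foo'

lines=$(printf '%s\n' "$text" | wc -l)
words=$(printf '%s\n' "$text" | wc -w)
# -m counts characters, including the two newline characters.
chars=$(printf '%s\n' "$text" | wc -m)

echo "lines=$lines words=$words chars=$chars"
```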

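6. Using diff and patch

diff -u old new > patch    -u produces the unified format, which is the best fit for patch files

patch -b old patch    -b automatically backs up the file being changed (the last argument is the patch file created above)

Note: keep the commands and the order of their arguments straight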
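
The round trip can be sketched as follows, assuming GNU diff and patch are installed. The file contents and the name change.patch are hypothetical; the patch file gets its own name here to keep it apart from the command.

```shell
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/old"
printf 'line one\nline 2\n' > "$workdir/new"

# diff exits with status 1 when the files differ, which is expected here.
diff -u "$workdir/old" "$workdir/new" > "$workdir/change.patch" || true

# Argument order: file to patch first, patch file second.
# -b keeps the previous version next to it as old.orig.
patch -b "$workdir/old" "$workdir/change.patch"

patched=$(cat "$workdir/old")
backup=$(cat "$workdir/old.orig")
rm -rf "$workdir"
```

After the patch is applied, old has the content of new and old.orig holds what old contained before.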

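7. The sort command

sort [option] file|stdin

-b ignore leading blanks

-f ignore case (fold lower case to upper case characters)

-M compare as months: jan < … < dec

-r reverse the sort order

-t field separator (by default, fields are separated by the transition from non-blank to blank)

-k sort by the given field (key)

-u output only unique lines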
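
A short sketch of -t and -k working together, on invented data. The n suffix on the key (numeric comparison) is not in the list above; it is added here because sorting "10" and "2" as text would give the wrong order.

```shell
data='c:3
a:10
b:2
b:2'

# Field 2 is the key, compared numerically; ":" is the separator.
by_number=$(printf '%s\n' "$data" | sort -t ':' -k 2,2n)

# Whole-line sort, reversed, with duplicate lines dropped.
unique_reverse=$(printf '%s\n' "$data" | sort -r -u)

printf '%s\n' "$by_number" '---' "$unique_reverse"
```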

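8. Regular expressions

A regular expression is a way of processing strings. It works line by line and, with the help of special symbols, lets the user easily find, delete, or replace a particular string.

(1) grep (works line by line)

grep [options] 'str' filename

-E use extended regular expressions

-P use Perl-compatible regular expressions (--perl-regexp)

-c print the number of lines that contain a match (--count)

-e specify a pattern; several -e options are combined with logical OR

-o print only the matched part of each line

-q quiet mode, print nothing

-w match whole words only

-i ignore case

-n also print line numbers

-v invert the match (select non-matching lines)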
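
The options above can be tried on a few lines of made-up log text. The input and variable names are illustrative only.

```shell
log='Error: disk full
warning: cat is here
concatenate files
error again'

# -c with -i: number of lines containing "error" in any case.
count_error=$(printf '%s\n' "$log" | grep -ci 'error')

# -w: "cat" as a whole word, so "concatenate" does not match.
whole_word=$(printf '%s\n' "$log" | grep -w 'cat')

# Two -e patterns are ORed; -n prefixes each line with its number.
either=$(printf '%s\n' "$log" | grep -n -e 'disk' -e 'again')

# -o prints only the matched text; -v keeps the lines that do not match.
only_match=$(printf '%s\n' "$log" | grep -o 'dis[a-z]')
not_error=$(printf '%s\n' "$log" | grep -vi 'error')

# -q prints nothing; only the exit status says whether a match exists.
if printf '%s\n' "$log" | grep -q 'disk'; then found=yes; else found=no; fi
```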

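(2) Character matching

".": matches any single character

"[ ]": matches any single character in the given set; [a-z] here means just a, b, c, d, …, z, not what it means in shell wildcards

"[^ ]": matches any single character outside the given set

[:alnum:]: any letter or digit (class names go inside a bracket expression, e.g. [[:alnum:]])

[:punct:]: punctuation symbols

[:space:]: whitespace characters such as tab and space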
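
A brief sketch of the classes in use. Note the double brackets: the class name sits inside a bracket expression. The sample line is arbitrary.

```shell
line='abc 123, x-y!'

# Every punctuation character, one per output line.
puncts=$(printf '%s\n' "$line" | grep -o '[[:punct:]]')

# Runs of letters and digits.
tokens=$(printf '%s\n' "$line" | grep -o '[[:alnum:]][[:alnum:]]*')

# Negated set: anything that is neither alphanumeric nor whitespace.
others=$(printf '%s\n' "$line" | grep -o '[^[:alnum:][:space:]]')

# "." stands for exactly one character of any kind.
dot=$(printf '%s\n' "$line" | grep -o 'x.y')
```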

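(3) Repetition (get into the habit of quoting the pattern)

*: matches the preceding character zero or more times

\?: matches the preceding character zero or one time

\+: matches the preceding character at least once

\{n\}: matches exactly n times

\{m,n\}: at least m times, at most n times

\{n,\}: at least n times

\{,n\}: at most n times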
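
Each quantifier can be checked by counting how many of four test words it accepts. This is a sketch assuming GNU grep (\?, \+ and \{,n\} are GNU extensions to basic regular expressions); count is a hypothetical helper. Every pattern is single-quoted so the shell cannot expand * or strip the backslashes.

```shell
words='ac
abc
abbc
abbbc'

# Hypothetical helper: how many of the words match the pattern in $1?
count() { printf '%s\n' "$words" | grep -c "$1"; }

star=$(count 'ab*c')          # b zero or more times: all four
optional=$(count 'ab\?c')     # b zero or one time: ac, abc
plus=$(count 'ab\+c')         # b at least once: abc, abbc, abbbc
exact=$(count 'ab\{2\}c')     # exactly two: abbc
atleast=$(count 'ab\{2,\}c')  # two or more: abbc, abbbc
range=$(count 'ab\{1,2\}c')   # one or two: abc, abbc
atmost=$(count 'ab\{,2\}c')   # at most two: ac, abc, abbc

echo "$star $optional $plus $exact $atleast $range $atmost"
```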

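(4) Position anchors

^: start-of-line anchor, placed at the far left of the pattern

$: end-of-line anchor, placed at the far right of the pattern

^[[:space:]]*$: blank line

\<: start-of-word anchor, placed to the left of the word pattern

\>: end-of-word anchor, placed to the right of the word pattern

\<pattern\>: matches a whole word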
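
The anchors are easiest to tell apart by counting matches on the same input. A sketch assuming GNU grep, with invented sample lines (one of them holds only a space and a tab, one is empty):

```shell
sample=$(printf 'root\n \t\nrooter\nthe root user\n\nchroot')

# Hypothetical helper: number of sample lines matching the pattern in $1.
count() { printf '%s\n' "$sample" | grep -c "$1"; }

line_start=$(count '^root')        # root, rooter
line_end=$(count 'root$')          # root, chroot
blank=$(count '^[[:space:]]*$')    # the whitespace-only line and the empty line
word_start=$(count '\<root')       # root, rooter, the root user
word_end=$(count 'root\>')         # root, the root user, chroot
whole_word=$(count '\<root\>')     # root, the root user
```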

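(5) Grouping

\( \): binds one or more characters together so they are treated as a unit

What the pattern inside a group matches is recorded by the regex engine in internal variables named \1, \2, …

\1: the content matched by the pattern between the first left parenthesis and its matching right parenthesis

Example: \(str1\+\(str2\)*\)

\1: str1\+\(str2\)*

\2: str2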
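
Two small sketches of back references, assuming GNU grep and GNU sed. The sentences are sample data, and the sed replacement exists only to make the captured text visible.

```shell
pairs='He loves his lover
He likes his lover
She likes her liker
She loves her liker'

# \1 must repeat exactly the text that \(l..e\) matched earlier on the line.
same_word=$(printf '%s\n' "$pairs" | grep '\(l..e\).*\1')
printf '%s\n' "$same_word"

# Nested groups are numbered by the position of their opening parenthesis:
# \1 is the outer group, \2 the inner one.
nested=$(printf 'str1str2\n' | sed 's/\(str1\+\(str2\)*\)/[\1|\2]/')
printf '%s\n' "$nested"
```

Only the lines where the same four-letter word occurs twice are kept, and the sed call prints `[str1str2|str2]`.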

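9. Extended regular expressions

(1) Character matching

. matches any single character

[ ] matches any single character in the given set

[^ ] matches any single character outside the given set

(2) Repetition

* matches the preceding character zero or more times

? matches the preceding character zero or one time

+ matches the preceding character one or more times

{n} matches the preceding character exactly n times

{n,m} matches the preceding character n to m times

(3) Position anchors

^ start of line

$ end of line

\< start of word

\> end of word

\<pattern\> matches a whole word

(4) Grouping

( )

Back references: \1 \2

(5) Alternation

a|b  a or b

C|cat  C or cat

(C|c)at  Cat or cat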
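
The difference between C|cat and (C|c)at is easy to miss, so here is a sketch with grep -E on a few sample words. GNU grep is assumed for the back reference in extended mode.

```shell
animals='Cat
cat
C
bat
caat'

# Without grouping, | splits the whole pattern: "C" or "cat".
either=$(printf '%s\n' "$animals" | grep -Ec 'C|cat')

# With grouping, only the first letter varies.
grouped=$(printf '%s\n' "$animals" | grep -E '(C|c)at')

# No backslashes are needed before + or { } in extended syntax.
one_or_more=$(printf '%s\n' "$animals" | grep -Ec '^ca+t$')
two_to_three=$(printf '%s\n' "$animals" | grep -E '^ca{2,3}t$')

# Back reference to a group.
doubled=$(printf 'abab\nabba\n' | grep -E '(ab)\1')
```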

10. Homework 1

Answers:

1.
[root@localhost ~]# grep -i "^s" /proc/meminfo
SwapCached:            0 kB
SwapTotal:       2047996 kB
SwapFree:        2047996 kB
Shmem:               264 kB
Slab:              85832 kB
SReclaimable:      24532 kB
SUnreclaim:        61300 kB
2.
[root@localhost ~]# grep -v "/bin/bash$" /etc/passwd
3.
[root@localhost ~]# grep "^rpc\>" /etc/passwd | cut -d ':' -f 1,7
rpc:/sbin/nologin
[root@localhost ~]# 
4.
[root@localhost ~]# grep -w "[1-9][0-9]\{1,2\}" /etc/passwd
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/cache/rpcbind:/sbin/nologin
5.
6.
[root@localhost ~]# netstat -tan | grep "LISTEN[[:space:]]*$"
tcp        0      0 0.0.0.0:33993               0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      
tcp        0      0 :::44200                    :::*                        LISTEN      
tcp        0      0 :::111                      :::*                        LISTEN      
tcp        0      0 :::22                       :::*                        LISTEN      
tcp        0      0 ::1:631                     :::*                        LISTEN      
tcp        0      0 ::1:25                      :::*                        LISTEN      
7.
[root@localhost ~]# grep "\(^.*\>\).*/\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
bash:x:1507:1508::/home/bash:/bin/bash

Homework 2

Answers:

1.[root@localhost ~]# egrep "^(mage|wang|root)\>" /etc/passwd | cut -d ":" -f 1,3,7
root:0:/bin/bash
mage:1508:/bin/bash
wang:1509:/bin/bash
2.[root@localhost ~]# grep -E  "^[[:alpha:]]|_\(\).*" /etc/rc.d/init.d/functions 
TEXTDOMAIN=initscripts
umask 022
PATH="/sbin:/usr/sbin:/bin:/usr/bin"
export PATH
if [ -f /etc/sysconfig/i18n -a -z "${NOLOCALE:-}" -a -z "${LANGSH_SOURCED:-}" ] ; then
fi
if [ -z "${BOOTUP:-}" ]; then
fi
fstab_decode_str() {
checkpid() {
daemon() {
killproc() {
3.[root@localhost ~]# grep -E  "^[[:alpha:]_]+\(\)" /etc/rc.d/init.d/functions
fstab_decode_str() {
checkpid() {
__readlink() {
__fgrep() {
__kill_pids_term_kill_checkpids() {
__kill_pids_term_kill() {
__umount_loop() {
__source_netdevs_fstab() {
__source_netdevs_mtab() {
__umount_loopback_loop() {
__pids_var_run() {
__pids_pidof() {
daemon() {
killproc() {
pidfileofproc() {
4.[root@localhost ~]# echo "/etc/rc.d/init.d/functions" | grep -E "[^/$]" | grep -Eo\
>  "/.*/"
/etc/rc.d/init.d/
5.[root@localhost ~]# last | grep -E "^root\>.*[[:digit:]\.]{3}[[:digit:]]" | tr -s ' ' 'f' | cut -d "f" -f 3 | sort | uniq -c
     43 10.1.250.50
6.[0-9] [1-9][0-9] 1[0-9][0-9] 2[0-4][0-9] 25[0-5]
7.[root@localhost ~]# ifconfig | grep -E -o "(([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
10.1.253.19
10.1.255.255
255.255.0.0
127.0.0.1
255.0.0.0
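
The octet pattern can also be used to validate a whole string by anchoring it at both ends. This is a minimal sketch: is_ipv4 and the variable names are hypothetical, and the test addresses are arbitrary.

```shell
# One octet: 0-9, 10-99, 100-199, 200-249, 250-255.
octet='([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])'

# Three "octet." groups plus a final octet, anchored at both ends.
ipv4="^(${octet}\.){3}${octet}\$"

# Hypothetical helper: succeeds only if $1 is a dotted quad in range.
is_ipv4() { printf '%s\n' "$1" | grep -Eq "$ipv4"; }

for candidate in 192.168.1.1 255.255.255.255 256.1.1.1 1.2.3; do
  if is_ipv4 "$candidate"; then echo "$candidate: valid"; else echo "$candidate: invalid"; fi
done

# Without the anchors, -o still finds a valid-looking substring.
partial=$(printf '256.1.1.1\n' | grep -Eo "(${octet}\.){3}${octet}")
echo "$partial"
```

The last command prints 56.1.1.1. An unanchored grep -o therefore extracts addresses from surrounding text but does not reject out-of-range input.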

Homework 3

1. Extract the usage percentage of each partition

[root@localhost ~]# df | tr -s " " ":" | cut -d ":" -f 1,5
Filesystem:Use%
/dev/sda2:17%
tmpfs:0%
/dev/sda1:19%
/dev/sda3:1%
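
If only the bare numbers are wanted, the header line and the % sign can be dropped as well. The sketch below runs on hypothetical df output, modelled on the listing above, so the result is reproducible. On a real system, replace the printf with df.

```shell
# Hypothetical df output; the block counts are made up.
df_sample='Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 50264772 8012345 39692427 17% /
tmpfs 502068 0 502068 0% /dev/shm'

# tail -n +2 skips the header; tr -s squeezes each run of spaces into one ":";
# cut takes field 5 (Use%); tr -d removes the percent sign.
usage=$(printf '%s\n' "$df_sample" | tail -n +2 | tr -s ' ' ':' | cut -d ':' -f 5 | tr -d '%')
printf '%s\n' "$usage"
```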

2. Count how often each word appears in /etc/init.d/functions and list the words from most to least frequent

[root@localhost ~]# cat /etc/init.d/functions | tr -cs "[[:alpha:]]" "\n" | sort | uniq -c | sort -nr 
    168 [
    161 ]
     83 if
     77 then
     75 pid
     73 echo
     72 fi
     61 return
     57 dev
     54 file
     50 n
     46 local
     42 kill
     39 z
     36 base
     35 remaining
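
The `[` and `]` entries at the top of that listing are a side effect of writing the set as "[[:alpha:]]". tr takes the class name bare, so the extra outer brackets are treated as two literal characters to keep. A sketch of the corrected pipeline on one made-up line of shell code:

```shell
script_line='if [ -f "$file" ]; then echo file; fi'

# '[:alpha:]' alone is the class; -c complements it and -s squeezes the
# resulting runs of newlines, leaving one word per line.
freq=$(printf '%s\n' "$script_line" | tr -cs '[:alpha:]' '\n' | sort | uniq -c | sort -nr)
printf '%s\n' "$freq"

# The most frequent word is on the first line, in the second column.
top_word=$(printf '%s\n' "$freq" | head -n 1 | awk '{print $2}')
echo "$top_word"
```

With the bare class, brackets count as separators, so only real words are tallied.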

3. Extract the directory name from /etc/rc.d/init.d/functions or /etc/rc.d/init.d/functions/

[root@localhost ~]# echo "/etc/rc.d/init.d/functions" | grep -E -o "/.*[^/$]" | grep -Eo "/.*/"
/etc/rc.d/init.d/
[root@localhost ~]#
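
The transcript only tests the path without a trailing slash. The sketch below wraps a slightly tightened version of the same two-step idea in a hypothetical helper, dir_of, and checks both forms. The first pattern ends in [^/] rather than [^/$], because inside a bracket expression $ is a literal character, not an anchor.

```shell
dir_of() {
  # Step 1: keep everything up to the last non-slash character,
  #         which drops any trailing slashes.
  # Step 2: keep everything up to and including the last remaining slash.
  printf '%s\n' "$1" | grep -Eo '^/.*[^/]' | grep -Eo '^/.*/'
}

plain=$(dir_of '/etc/rc.d/init.d/functions')
trailing=$(dir_of '/etc/rc.d/init.d/functions/')
printf '%s\n' "$plain" "$trailing"
```

Both calls print /etc/rc.d/init.d/. The standard dirname utility gives the same directory without the final slash.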

Original article by mengzhiqian. If you repost it, please credit the source: http://www.178linux.com/30093
