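As an ops engineer, you often need to pull key pieces of information, such as IP addresses, out of log files.
Case 1: Extract the IP addresses
The log looks like this: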
[root@C67-X64-A1 hanghang]# cat test.txt
Jul 13 08:13:09 localhost sshd[14678]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:09 localhost sshd[14679]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 user=root
Jul 13 08:13:11 localhost sshd[14691]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=admin
Jul 13 08:13:11 localhost sshd[14692]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143
Jul 13 08:13:14 localhost sshd[14707]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:14 localhost sshd[14711]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143 user=root
Jul 13 08:13:17 localhost sshd[14722]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:17 localhost sshd[14724]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=222.73.173.143
Jul 13 08:13:20 localhost sshd[14739]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=root
Jul 13 08:13:23 localhost sshd[14753]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=root
Jul 13 08:13:26 localhost sshd[14767]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:29 localhost sshd[14781]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:32 localhost sshd[14795]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:35 localhost sshd[14809]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:38 localhost sshd[14823]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:41 localhost sshd[14837]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172 user=apache
Jul 13 08:13:44 localhost sshd[14851]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:47 localhost sshd[14865]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:49 localhost sshd[14876]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Jul 13 08:13:53 localhost sshd[14895]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=61.152.95.172
Method 1: Filter with awk
[root@C67-X64-A1 hanghang]# awk -F "rhost=" '{print $NF}' test.txt | awk '{print $1}' | sort -r | uniq
61.152.95.172
222.73.173.143
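To make the two awk steps easier to follow, here is a minimal Python sketch of the same field splitting. The function name rhost_field and the two sample lines are illustrative assumptions, not part of the original commands. Like the awk pipeline, it relies on every line containing "rhost="; a line without that field would yield its first word instead.

```python
def rhost_field(line):
    """Mimic: awk -F "rhost=" '{print $NF}' | awk '{print $1}'."""
    # $NF with -F "rhost=": the text after the last "rhost="
    tail = line.split("rhost=")[-1]
    # $1 of the second awk: the first whitespace-separated field
    fields = tail.split()
    return fields[0] if fields else ""


# Two lines shortened from the sample log (illustrative input)
sample = [
    "Jul 13 08:13:09 localhost sshd[14678]: tty=ssh ruser= rhost=61.152.95.172",
    "Jul 13 08:13:09 localhost sshd[14679]: tty=ssh ruser= rhost=222.73.173.143 user=root",
]

# sort -r | uniq: reverse-sorted, de-duplicated values
ips = sorted({rhost_field(line) for line in sample}, reverse=True)
print("\n".join(ips))
```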
Method 2: Filter with egrep, the extended-regex form of grep (equivalent to grep -E)
[root@C67-X64-A1 hanghang]# egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' test.txt | sort -r | uniq
61.152.95.172
222.73.173.143
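The pattern above only checks the shape of an address, so a string such as 999.1.1.1 would also match. The following sketch, assuming you want to reject out-of-range octets, applies the same pattern in Python and adds that check. The names IPV4_LIKE, is_valid_ipv4, and extract_ips are hypothetical helpers introduced for illustration.

```python
import re

# Same shape as the egrep pattern: four groups of 1 to 3 digits
IPV4_LIKE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def is_valid_ipv4(text):
    """Return True if every octet is a number in the range 0..255."""
    parts = text.split(".")
    return len(parts) == 4 and all(p.isdigit() and int(p) <= 255 for p in parts)


def extract_ips(lines):
    """Collect distinct, range-checked addresses, reverse-sorted like sort -r."""
    found = set()
    for line in lines:
        for candidate in IPV4_LIKE.findall(line):
            if is_valid_ipv4(candidate):
                found.add(candidate)
    return sorted(found, reverse=True)


if __name__ == "__main__":
    demo = [
        "ruser= rhost=61.152.95.172 user=root",
        "ruser= rhost=222.73.173.143",
        "ruser= rhost=999.1.1.1",  # right shape, invalid octet
    ]
    print(extract_ips(demo))
```

Whether the extra check is worth it depends on the log source; for sshd entries like the ones above, the plain pattern is usually enough.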
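Method 3: Filter with sed
[root@C67-X64-A1 hanghang]# sed -nr 's/.*[^0-9](([0-9]+\.){3}[0-9]+).*/\1/p' test.txt | sort -r | uniq
61.152.95.172
222.73.173.143
The expression above requires a non-digit character in front of the address, so it does not match an address at the very start of a line. The following form also accepts the start of the line:
[root@C67-X64-A1 hanghang]# sed -nr 's/(^|.*[^0-9])(([0-9]+\.){3}[0-9]+).*/\2/p' test.txt | sort -r | uniq
61.152.95.172
222.73.173.143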
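The [^0-9] in front of the capture group matters because the leading .* is greedy and would otherwise swallow the first digit of the address. The sketch below illustrates this with Python's re module as a stand-in for sed. That the two engines agree on these patterns is an assumption worth checking on your own system, and the names NO_GUARD, GUARD, GUARD_BOL, and captured are hypothetical.

```python
import re

# Without a guard, the greedy ".*" eats into the first octet
NO_GUARD = re.compile(r".*(([0-9]+\.){3}[0-9]+).*")
# The first sed expression: a non-digit must precede the address
GUARD = re.compile(r".*[^0-9](([0-9]+\.){3}[0-9]+).*")
# The second sed expression: start of line or a non-digit may precede it
GUARD_BOL = re.compile(r"(^|.*[^0-9])(([0-9]+\.){3}[0-9]+).*")


def captured(pattern, group, line):
    """Return the requested capture group, or None if the line does not match."""
    m = pattern.match(line)
    return m.group(group) if m else None


mid_line = "tty=ssh ruser= rhost=61.152.95.172 user=root"
start_line = '20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1'

print(captured(NO_GUARD, 1, mid_line))     # 1.152.95.172 (first digit lost)
print(captured(GUARD, 1, mid_line))        # 61.152.95.172
print(captured(GUARD, 1, start_line))      # None (no non-digit before the address)
print(captured(GUARD_BOL, 2, start_line))  # 20.20.20.20
```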
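Case 2: Filter log entries according to a requirement
Requirement:
I recently needed to process some website logs.
For example:
A: user 1.1.1.1 has log entries for both index.html and a.jpg.
B: user 20.20.20.20 has log entries for index.html only, with no entries for any other file.
The goal is to extract B's IP address but not A's. (A and B only illustrate the rule; the same addresses behave differently in the sample log that follows.)
The log looks like this: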
[root@C67-X64-A1 hanghang]# cat files
1.1.1.1 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
10.10.10.10 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
10.10.10.10 - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg HTTP/1.1
10.10.10.10 - - [19/Jul/2013:15:01:39 +0800] "GET /a.js HTTP/1.1
3.3.3.3 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg HTTP/1.1
20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /a.js HTTP/1.1
30.30.30.30 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
30.30.30.30 - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg HTTP/1.1
30.30.30.30 - - [19/Jul/2013:15:01:39 +0800] "GET /a.js HTTP/1.1
4.4.4.4 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
5.5.5.5 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
1.1.1.1 - - [20/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
2.2.2.2 - - [21/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1
3.3.3.3 - - [21/Jul/2013:15:01:55 +0800] "GET /index.html HTTP/1.1
4.4.4.4 - - [21/Jul/2013:16:01:55 +0800] "GET /index.html HTTP/1.1
5.5.5.5 - - [21/Jul/2013:17:02:55 +0800] "GET /index.html HTTP/1.1
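Shell script implementation:
#!/bin/bash
# author: molewan
# Print each IP in "files" whose requests were all for /index.html.

# Step 1: collect the IPs that requested anything other than /index.html.
# -F treats the pattern as a fixed string, so "." is not a wildcard.
grep -vF "/index.html" files | awk '{print $1}' | sort -u > tmp_title

# Step 2: list every IP once and drop the ones collected in step 1.
# -x matches whole lines only, so 1.1.1.1 does not also exclude 11.1.1.1;
# -f reads the patterns to exclude from tmp_title.
awk '{print $1}' files | sort -u | grep -vxFf tmp_title

rm -f tmp_title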
Python script implementation:
Assume the log is stored in the file log.dat:
#!/usr/bin/env python3
import re

# Map each client IP to the list of distinct resources it requested.
Dip_reso = {}
pattern = re.compile(r'(\d+\.\d+\.\d+\.\d+).*GET /(.*) .*')

with open('log.dat') as f:
    for line in f:
        resource = pattern.match(line)
        if resource is None:
            continue  # skip lines that are not in the expected format
        key = resource.group(1)
        value = resource.group(2)
        if key not in Dip_reso:
            Dip_reso[key] = []
        if value not in Dip_reso[key]:
            Dip_reso[key].append(value)

# Print the IPs whose only requested resource is index.html.
for k in Dip_reso:
    if len(Dip_reso[k]) == 1 and Dip_reso[k][0] == 'index.html':
        print(k)
# To collect the IPs into a list instead of printing them, you can write:
# ip_data = [ip for ip in Dip_reso if len(Dip_reso[ip]) == 1 and Dip_reso[ip][0] == 'index.html']
ip_data then holds all the matching IPs.
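The same selection can also be written as a set difference, which mirrors the idea behind the shell script: find the IPs that requested anything other than /index.html, then exclude them from the full set. This is a minimal sketch under the assumption that each line starts with the client IP and contains a "GET request; the names LINE and index_only_ips are hypothetical.

```python
import re

# Client IP at the start of the line, then the path that follows "GET
LINE = re.compile(r'(\d+\.\d+\.\d+\.\d+) .*"GET (\S+)')


def index_only_ips(lines):
    """Return the sorted IPs whose requests were all for /index.html."""
    seen = set()    # every IP that appears in the log
    others = set()  # IPs that requested something besides /index.html
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        ip, path = m.groups()
        seen.add(ip)
        if path != "/index.html":
            others.add(ip)
    return sorted(seen - others)


if __name__ == "__main__":
    demo = [
        '1.1.1.1 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1',
        '20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /index.html HTTP/1.1',
        '20.20.20.20 - - [19/Jul/2013:15:01:39 +0800] "GET /logo.jpg HTTP/1.1',
    ]
    print(index_only_ips(demo))
```

Compared with the dictionary version, this keeps only two sets in memory and never stores the individual resource names.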
Original article by Net21-冰冻vs西瓜. If you repost it, please credit the source: http://www.178linux.com/24749