Capturing process crash information on Linux (core_pattern)
2019-08-26
All articles on this blog are licensed under Free-to-repost / Non-commercial / No-derivatives / Keep-attribution; please credit the source when reposting. Thanks.
Ubuntu core dump (core_pattern) control
A previous article covered how to capture a crash dump when the kernel itself crashes. The other case is a process crash.
On Linux, when a process crashes, the kernel catches the event and writes the process's core dump to a file. By default this file is named core, but the name can be configured, for example by editing the contents of /proc/sys/kernel/core_pattern.
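As a sketch (run as root; the pattern itself is only an example), the default name can be replaced with a pattern built from %-specifiers such as %e (executable name), %p (pid), %s (signal number) and %t (dump time):

```shell
# Example only: write a naming pattern into core_pattern (requires root).
# %e = executable name, %p = pid, %s = signal that caused the dump, %t = unix time of dump
echo 'core.%e.%p.sig%s.%t' > /proc/sys/kernel/core_pattern
cat /proc/sys/kernel/core_pattern   # verify the new pattern
```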
core_pattern
Since Linux kernel 2.6.19, core_pattern can hold more than a file name for the core dump: it can also be a pipe to a user-space program or script. If the first character of core_pattern is the pipe symbol |, then when the kernel catches a process crash it runs the program or script after the pipe symbol with root privileges and streams the crash information to it. This also provides a way to hide a system backdoor: a script placed after the pipe symbol could, for example, spawn a reverse shell under specific conditions.
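A minimal sketch of such a pipe handler (the path /usr/local/bin/dump-core and the /var/log/cores directory are hypothetical; the kernel runs the handler as root with the core image on its stdin):

```shell
# Hypothetical handler: save the core streamed on stdin, named from the argv
# that the kernel builds from the %-specifiers (%e %s %p here). Run as root.
cat > /usr/local/bin/dump-core <<'EOF'
#!/bin/sh
exe="$1"; sig="$2"; pid="$3"
mkdir -p /var/log/cores
# stdin carries the raw core image
cat > "/var/log/cores/core.${exe}.sig${sig}.${pid}"
EOF
chmod +x /usr/local/bin/dump-core
# the leading | tells the kernel to pipe the core into the program
echo '|/usr/local/bin/dump-core %e %s %p' > /proc/sys/kernel/core_pattern
```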
The configuration file is located at:
less /etc/sysctl.d/30-ezs3.conf
kernel.panic = 10
kernel.printk = 3 4 1 3
vm.swappiness = 5
vm.max_map_count = 1048576
kernel.core_pattern = |/usr/local/bin/ezs3-coredump %e %s %p
kernel.pid_max = 1048576
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.default.arp_announce = 1
net.ipv4.conf.all.arp_ignore = 2
net.ipv4.conf.default.arp_ignore = 2
net.core.rmem_max = 104857600
net.core.wmem_max = 104857600
net.core.optmem_max = 104857600
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_rmem = 65536 20971520 104857600
net.ipv4.tcp_wmem = 65536 20971520 104857600
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_congestion_control = htcp
Here, the kernel.core_pattern = |/usr/local/bin/ezs3-coredump %e %s %p entry defines the script executed on a core dump. The earlier version of this script was not very smart; it has since been upgraded (by bean).
Check whether the core_pattern setting took effect:
cat /proc/sys/kernel/core_pattern
Test whether core dumping works:
sleep 200 &
Then kill the process above with signal 6 (SIGABRT):
kill -6 pid
Normally, entries will be appended to the following log file:
/var/log/ezcloudstor# tailf ezs3-coredump.log
If no log entries are produced, run the handler manually:
root@node1:/var/log/ezcloudstor# /usr/local/bin/ezs3-coredump
and check for obvious errors. When doing this on the 5.5 environment, it turned out the psutil package was missing.
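The test steps above can be reproduced in one snippet; the expected exit status is a general shell fact (128 + signal number), independent of the coredump handler:

```shell
# Start a victim process, abort it with signal 6 (SIGABRT), and check how it died.
sleep 200 &
pid=$!
kill -6 "$pid"
wait "$pid"
status=$?
echo "exit status: $status"   # 128 + 6 = 134 means the process died from SIGABRT
```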
Notes:
When customizing the core dump path, make sure the path's permissions are configured correctly.
For a setuid program run by an ordinary user to produce a core dump, suid_dumpable must be set to 2.
To let all users generate core dumps under a given path, configure the following:
- Create a core dump directory with mode 777.
- Point /proc/sys/kernel/core_pattern at that core dump path.
- Set /proc/sys/fs/suid_dumpable to 2 so that setuid programs can dump as well.
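The three steps might look like this (the directory /var/cores is an example):

```shell
# 1. world-writable core dump directory
mkdir -p /var/cores && chmod 777 /var/cores
# 2. point core_pattern at that directory (requires root)
echo '/var/cores/core.%e.%p.%s' > /proc/sys/kernel/core_pattern
# 3. let setuid programs dump as well
echo 2 > /proc/sys/fs/suid_dumpable
# also make sure the per-process core size limit is not zero
ulimit -c unlimited
```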
Core dump script (bash)
AsiaInfo (亚信) 5.5 release, C cluster
root@Storage-c2:~# cat /usr/local/bin/coredump_gen
#!/bin/bash
# Install with: echo '|/usr/local/bin/coredump_gen %e %p %s' > /proc/sys/kernel/core_pattern
MAX_CORE_FILES=3
MAX_CUR_CORE_FILES=$((MAX_CORE_FILES-1))
CORE_PATH="/var/log"
DEBUG_LOG=/var/log/debug_core.log

# keep at most MAX_CORE_FILES cores per program, deleting the oldest ones
rotate_same_core()
{
    local dest_prefix="core\.${1}\."
    local core_num=`ls $CORE_PATH --sort=time | grep ${dest_prefix} | wc -l`
    if [ $core_num -gt $MAX_CUR_CORE_FILES ]; then
        cores_delete=`ls $CORE_PATH --sort=time | grep ${dest_prefix} | tail -n $(($core_num-MAX_CUR_CORE_FILES))`
        for core in ${cores_delete}
        do
            echo `date` " DELETE $CORE_PATH/${core}" >> $DEBUG_LOG
            rm -f $CORE_PATH/${core}
        done
    fi
}

# argv comes from the core_pattern specifiers "%e %p %s": program name, pid, signal
prog_name=$1
pid_num=$2
sig_num=$3
exit 0  # NOTE: this early exit disables everything below; the core on stdin is silently discarded
dest_filename="$CORE_PATH/core."${prog_name}.${pid_num}.sig${sig_num}
#rotate_same_core ${prog_name}
#cat <&0 >$dest_filename
echo `date` " GENERATE $dest_filename" >> $DEBUG_LOG
root@Storage-c2:~#
The 5.5 release cannot use the 7.0 core dump script directly; attempting it produced the errors below (the kernel.* "command not found" lines come from sourcing the sysctl file as if it were a shell script, which is not how sysctl files are applied):
root@Storage-b6:~# source /etc/sysctl.d/30-ezs3.conf
kernel.panic: command not found
kernel.printk: command not found
vm.swappiness: command not found
Traceback (most recent call last):
File "/usr/local/bin/ezs3-coredump", line 7, in <module>
import psutil
ImportError: No module named psutil
kernel.core_pattern: command not found
net.ipv4.conf.all.rp_filter: command not found
net.ipv4.conf.default.rp_filter: command not found
net.core.rmem_max: command not found
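The `source` failure above is expected: a sysctl conf file is not a shell script. Assuming a standard procps `sysctl`, the file is applied as follows, and the Python ImportError is fixed by installing psutil:

```shell
# apply a single sysctl file (requires root)
sysctl -p /etc/sysctl.d/30-ezs3.conf
# or reload every file under /etc/sysctl.d, /run/sysctl.d, etc.
sysctl --system
# install the missing python dependency for ezs3-coredump
pip install psutil
```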
Core dump script (latest 7.0 release, Python)
root@converger-124:~# cat /usr/local/bin/ezs3-coredump
#! /usr/bin/python
import os
import re
import sys
import time
import glob
import psutil
import tarfile
import logging

from ezs3.log import EZLog
from ezs3.command import do_cmd

EZLog.init_handler(logging.INFO, '/var/log/ezcloudstor/ezs3-coredump.log')
logger = EZLog.get_logger('ezs3-coredump')

MB = 1024 * 1024
# if os is created in a small partition, skip core dump
TINY_OS_PARTITION_THRESHOLD = 32 * 1024 * 1024 * 1024
# if a core is generated by the same program and same signal within
# SAME_CORE_RECENT_THRESHOLD seconds, we will skip the second core
SAME_CORE_RECENT_THRESHOLD = 10 * 60
MAX_DUPLICATED_CORE = 3
# preserve 30% os space at least
OS_SPACE_TO_RESERVED = 30


def purge_extra_cores(core_path, name, sig):
    logger.info('purging extra cores start')
    if not re.search(r'_[0-9]+$', name):
        prefix = 'ezcore.{}.sig{}.*'.format(name, sig)
    else:
        name = '_'.join(name.split('_')[:-1]) + '_[0-9]*'
        prefix = 'ezcore.{}.sig{}.*'.format(name, sig)
    old_cores = glob.glob(os.path.join(core_path, prefix))
    if len(old_cores) >= MAX_DUPLICATED_CORE:
        old_cores.sort(key=os.path.getmtime, reverse=True)
        try:
            for core in old_cores[MAX_DUPLICATED_CORE - 1:]:
                logger.info("purge {}".format(core))
                os.unlink(core)
        except Exception as e:
            logger.exception("Exception happened when purge extra core {} ({})"
                             .format(core, str(e)))
    logger.info('purging extra cores done')


def dump_core(core_path, name, sig, pid):
    core_file = os.path.join(core_path, 'ezcore.{}.sig{}.{}'.format(name, sig, pid))
    try:
        logger.info('start dumping %s', core_file)
        loop = 0
        with open(core_file, 'wb+') as f:
            while True:
                # re-check free space every 100 MB written
                if loop % 100 == 0:
                    du = psutil.disk_usage(core_file)
                    if du.free * 100.0 / du.total < OS_SPACE_TO_RESERVED:
                        raise RuntimeError('not enough disk space, skip core dumping. du={} to_preserve={}'
                                           .format(du, OS_SPACE_TO_RESERVED))
                loop += 1
                data = sys.stdin.read(MB)
                if data:
                    f.write(data)
                else:
                    logger.info('finish dumping %s', core_file)
                    os.chdir(core_path)
                    old_file_name = 'ezcore.{}.sig{}.{}'.format(name, sig, pid)
                    new_file_name = 'ezcore.{}.sig{}.{}.tar'.format(name, sig, pid)
                    # compress asynchronously via at(1) so the handler exits quickly
                    do_cmd("echo 'tar -zcf %s %s && rm %s'|at now" % (new_file_name, old_file_name, old_file_name))
                    return
    except Exception:
        logger.exception('unable to dump %s', core_file)
        if os.path.isfile(core_file):
            os.remove(core_file)


def same_core_exists_recently(core_path, name, sig):
    now = time.time()
    if not re.search(r'_[0-9]+$', name):
        prefix = 'ezcore.{}.sig{}.*'.format(name, sig)
    else:
        name = '_'.join(name.split('_')[:-1]) + '_[0-9]*'
        prefix = 'ezcore.{}.sig{}.*'.format(name, sig)
    old_cores = glob.glob(os.path.join(core_path, prefix))
    for old_core in old_cores:
        interval = now - os.stat(old_core).st_mtime
        if interval < SAME_CORE_RECENT_THRESHOLD:
            logger.info("same core {} has existed recently ({} < {})"
                        .format(old_core, interval, SAME_CORE_RECENT_THRESHOLD))
            return True
    return False


def os_partition_too_small():
    du = psutil.disk_usage('/')
    if du.total < TINY_OS_PARTITION_THRESHOLD:
        logger.info("will not store the core file, because of the tiny os partition")
        return True
    return False


def main(name, sig, pid):
    logger.info("program {} signal {} pid {}".format(name, sig, pid))
    if os_partition_too_small():
        return
    core_path = "/var/log/cores"
    if not os.path.isdir(core_path):
        os.makedirs(core_path)
    if same_core_exists_recently(core_path, name, sig):
        return
    purge_extra_cores(core_path, name, sig)
    dump_core(core_path, name, sig, pid)


if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2], sys.argv[3])
root@converger-124:~#
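A hypothetical way to exercise the handler by hand is to stream fake core bytes into it (assuming it is installed as above; the program name, signal and pid are made up):

```shell
# feed 1 MiB of random bytes as a pretend core for program "myprog", signal 6, pid 12345
head -c 1048576 /dev/urandom | /usr/local/bin/ezs3-coredump myprog 6 12345
ls -l /var/log/cores/           # an ezcore.myprog.sig6.12345[.tar] file should appear
tail /var/log/ezcloudstor/ezs3-coredump.log
```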
References
https://www.jianshu.com/p/20d7326cc07a
The following link describes a backdoor program built on core_pattern; quite interesting.