Notes on 《Web安全之机器学习入门》: graph algorithms and knowledge graphs
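A webshell has many distinctive access patterns. The ones related to directed graphs are:
In-degree and out-degree both 0
- an isolated page
In-degree and out-degree both 1, with the page pointing to itself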
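The two features above can be checked without a graph database. The following is a minimal sketch, assuming the site's pages and the observed referer -> path links are already available as Python lists; the function name `suspicious_pages` and the toy URLs are made up for illustration. Each distinct link is counted once, which is a choice of this sketch and not something the notes specify.

```python
from collections import defaultdict

def suspicious_pages(pages, edges):
    """Return pages that are isolated or that only link to themselves.

    pages: iterable of page URLs (needed so isolated pages are seen at all)
    edges: iterable of (source, target) pairs, one per observed link
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    self_loops = set()
    for src, dst in set(edges):  # count each distinct link once (assumption)
        out_deg[src] += 1
        in_deg[dst] += 1
        if src == dst:
            self_loops.add(src)

    result = []
    for page in pages:
        isolated = in_deg[page] == 0 and out_deg[page] == 0
        only_self = (in_deg[page] == 1 and out_deg[page] == 1
                     and page in self_loops)
        if isolated or only_self:
            result.append(page)
    return result

# Hypothetical toy site: one normal link, one self-referencing page, one orphan
pages = ["/index.php", "/about.php", "/shell.php", "/orphan.php"]
edges = [
    ("/index.php", "/about.php"),
    ("/shell.php", "/shell.php"),
    ("/shell.php", "/shell.php"),
]
print(suspicious_pages(pages, edges))  # ['/shell.php', '/orphan.php']
```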
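0. Processing flow:
- 1. Raw log data
- 2. Extract the request and referer fields (enable a custom log format)
- 3. Import into the graph database
- 4. Query nodes whose in-degree and out-degree are both 0 or 1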
1. Raw log data:
2. Extract the request and referer fields:
Processed log data:
referer -> path
- -> http://180.76.190.79/wordpress/wp-admin/1.php
- -> http://180.76.190.79/wordpress/wp-admin/admin-ajax.php
- -> http://180.76.190.79/wordpress/wp-admin/customize.php
- -> http://180.76.190.79/wordpress/wp-admin/load-styles.php
- -> http://180.76.190.79/wordpress/wp-admin/post-new.php
- -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/edit-comments.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/profile.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/xmlrpc.php
http://180.76.190.79/wordpress/wp-admin/ -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/wp-admin/1.php is the webshell.
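The extraction step could look roughly like the sketch below. It assumes the access log is in the common "combined" format (request line, status, size, then quoted referer and user agent); the notes do not show the raw log, so the sample line, the regular expression, and the helper name `to_pair` are illustrative assumptions. Dropping query strings is also an assumption of this sketch.

```python
import re

# Assumed combined log format:
# ... "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def to_pair(line, base_url):
    """Turn one access-log line into a 'referer -> path' string, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    # Drop query strings so the same page always maps to the same node
    path = m.group("path").split("?", 1)[0]
    referer = m.group("referer").split("?", 1)[0] or "-"
    return "%s -> %s" % (referer, base_url + path)

# Made-up sample line (client IP and timestamp are placeholders)
sample = ('203.0.113.5 - - [10/Oct/2017:13:55:36 +0800] '
          '"GET /wordpress/wp-login.php HTTP/1.1" 200 2326 '
          '"http://180.76.190.79/wordpress/" "Mozilla/5.0"')
print(to_pair(sample, "http://180.76.190.79"))
```

A request without a referer is logged as "-", which is why "-" appears on the left-hand side of several lines in the processed data.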
3. Import into the graph database
neo4j database script operations
- Delete all Page nodes:
MATCH (n:Page) DETACH DELETE n
- Query suspected webshell links:
match (n:Page) where (n.in=1 and n.out=0) or (n.in=1 and n.out=1) return n.url
Read the file line by line, creating nodes and relationships.
The original code had to be modified before it would run:
import re
from neo4j import GraphDatabase

# Placeholders, not from the original notes: adjust URI, credentials and file name
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
session = driver.session()
file_object = open("processed_log.txt")

nodes = {}   # url -> node name of every Page node created so far
index = 0    # running counter used for node names and ids

for line in file_object:
    matchObj = re.match(r'(\S+) -> (\S+)', line)
    if matchObj:
        ref = matchObj.group(1)
        path = matchObj.group(2)
        # Create a Page node the first time a URL is seen (target first, then source);
        # in-degree and out-degree start at 0
        for url in (path, ref):
            if url not in nodes:
                nodes[url] = "Page%d" % index
                sql = "create (n:Page {url:$url, id:$id, in:0, out:0})"
                session.run(sql, url=url, id=str(index))
                print(sql, url)
                index = index + 1
        # Update the in-degree and out-degree properties
        sql = "match (n:Page {url:$url}) SET n.out=n.out+1"   # source page: out-degree + 1
        session.run(sql, url=ref)
        print(sql, ref)
        sql = "match (n:Page {url:$url}) SET n.in=n.in+1"     # target page: in-degree + 1
        session.run(sql, url=path)
        print(sql, path)
        # Insert the edge (relationship) from the referer to the requested page
        sql = '''
            MATCH (a:Page),(b:Page)
            WHERE a.url = $path AND b.url = $ref
            CREATE (b)-[r:Point]->(a)
            '''
        session.run(sql, path=path, ref=ref)
        print(sql, ref, path)

file_object.close()
session.close()
driver.close()
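4. Query nodes whose in-degree and out-degree are both 0 or 1:
Visualization of the page link relationships:
Query nodes whose in-degree and out-degree are both 0 or 1, i.e. suspected webshell links. Because the empty referer "-" is imported as a node as well, a page that is only ever requested directly shows up with in=1 and out=0, which is why the query tests for in=1: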
match (n:Page) where (n.in=1 and n.out=0) or (n.in=1 and n.out=1) return n.url
http://180.76.190.79/wordpress/wp-admin/1.php is the webshell.
The other results are false positives.
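To see where the false positives come from, the same filter can be reproduced in memory on the processed log shown in step 2. This is a sketch that mirrors the Cypher condition (in=1 and out=0, or in=1 and out=1) and counts one degree increment per log line, as the import script does; the function name `query_candidates` is made up for illustration.

```python
import re
from collections import Counter

# The processed "referer -> path" lines from step 2
PAIRS = """\
- -> http://180.76.190.79/wordpress/wp-admin/1.php
- -> http://180.76.190.79/wordpress/wp-admin/admin-ajax.php
- -> http://180.76.190.79/wordpress/wp-admin/customize.php
- -> http://180.76.190.79/wordpress/wp-admin/load-styles.php
- -> http://180.76.190.79/wordpress/wp-admin/post-new.php
- -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/edit-comments.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-admin/profile.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/wp-login.php
http://180.76.190.79/wordpress/ -> http://180.76.190.79/wordpress/xmlrpc.php
http://180.76.190.79/wordpress/wp-admin/ -> http://180.76.190.79/wordpress/wp-login.php
"""

def query_candidates(text):
    """Return URLs with in-degree 1 and out-degree 0 or 1, sorted."""
    in_deg, out_deg = Counter(), Counter()
    for line in text.splitlines():
        m = re.match(r'(\S+) -> (\S+)', line)
        if m:
            ref, path = m.groups()
            out_deg[ref] += 1   # one increment per log line, like the import script
            in_deg[path] += 1
    urls = set(in_deg) | set(out_deg)
    return sorted(u for u in urls if in_deg[u] == 1 and out_deg[u] in (0, 1))

for url in query_candidates(PAIRS):
    print(url)
```

On this sample the filter returns eight URLs: 1.php plus seven ordinary WordPress pages that happened to be requested exactly once. wp-login.php is excluded because it was reached from three different sources.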
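Common false positives include:
- The home page and the various index pages
- Operations consoles such as Phpmyadmin and Zabbix
- Consoles of open-source software such as Hadoop and ELK
- API endpoints
The hard part is the effect scanners have on the results; this interference has to be removed using scanner fingerprints or human-vs-bot detection algorithms.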
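One simple form of scanner fingerprinting is to drop requests whose User-Agent contains a known scanner marker before building the graph. The sketch below assumes each log record has already been parsed into a (user_agent, referer, path) tuple; the marker list and the function names are illustrative only and far from complete, and a scanner that spoofs a browser User-Agent will not be caught this way.

```python
# Illustrative, incomplete list of substrings found in default scanner User-Agents
SCANNER_MARKERS = ("sqlmap", "nikto", "nmap", "acunetix", "nessus")

def is_scanner(user_agent):
    """True if the User-Agent contains one of the known scanner markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SCANNER_MARKERS)

def drop_scanner_hits(records):
    """Keep (referer, path) pairs only for requests not made by a known scanner.

    records: iterable of (user_agent, referer, path) tuples
    """
    return [(ref, path) for ua, ref, path in records if not is_scanner(ua)]

# Made-up records for illustration
records = [
    ("Mozilla/5.0 (X11; Linux x86_64)", "-", "/wordpress/wp-login.php"),
    ("sqlmap/1.1 (http://sqlmap.org)", "-", "/wordpress/wp-admin/1.php"),
]
print(drop_scanner_hits(records))  # [('-', '/wordpress/wp-login.php')]
```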
References:
- neo4j tutorial
- 《Web安全之机器学习入门》