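Recently I needed to use mitmproxy as a scraping proxy for work and ran into a fairly typical problem: while scraping, the upstream (second-level) proxy IP has to be swapped out periodically (if you know, you know =。=).
Everything I could find online dates from 2018 and no longer works. I found an up-to-date solution on mitmproxy's official GitHub and am sharing it here for anyone who needs it.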
from mitmproxy import http
from mitmproxy.connection import Server
from mitmproxy.net.server_spec import ServerSpec


def request(flow: http.HTTPFlow) -> None:
    # proxy_address() is your own function returning the (host, port) of the proxy to use.
    address = proxy_address(flow)
    is_proxy_change = address != flow.server_conn.via.address
    server_connection_already_open = flow.server_conn.timestamp_start is not None
    if is_proxy_change and server_connection_already_open:
        # server_conn already refers to an existing connection (which cannot be modified),
        # so we need to replace it with a new server connection object.
        flow.server_conn = Server(flow.server_conn.address)
    flow.server_conn.via = ServerSpec("http", address)
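The snippet above calls proxy_address(flow) without showing it. As a minimal sketch of one way to rotate proxies on a timer, the helper below cycles through a fixed pool and switches after a set interval. The ProxyRotator class, the pool addresses, and the 60-second interval are all assumptions for illustration, not part of the original solution.

```python
import itertools
import time
from typing import Callable, Iterator, Sequence, Tuple

Address = Tuple[str, int]


class ProxyRotator:
    """Cycle through upstream proxies, switching every `interval` seconds."""

    def __init__(self, proxies: Sequence[Address], interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle: Iterator[Address] = itertools.cycle(proxies)
        self._interval = interval
        self._clock = clock
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self) -> Address:
        # Move to the next proxy once the current one has been in use long enough.
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current


# Hypothetical pool; replace with the addresses from your proxy provider.
rotator = ProxyRotator([("127.0.0.1", 8081), ("127.0.0.1", 8082)], interval=60.0)


def proxy_address(flow) -> Address:
    # `flow` is unused here; it is accepted so the signature matches the call above.
    return rotator.current()
```

The clock is injectable only so the rotation can be checked without waiting; in the addon itself the default time.monotonic is enough.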
PS: To enable the upstream (second-level) proxy, add the following options when starting mitmproxy:
# Usage: mitmdump
# -s change_upstream_proxy.py
# --mode upstream:http://default-upstream-proxy:8080/
# --set connection_strategy=lazy
# --set upstream_cert=false
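If mitmdump is started from a Python script rather than typed by hand, the same options can be assembled as an argument list. This is only a sketch: the function name mitmdump_command is made up for illustration, while the flags themselves are exactly the ones from the usage comment above.

```python
from typing import List


def mitmdump_command(script: str, default_upstream: str) -> List[str]:
    """Build the mitmdump argument list from the usage comment above."""
    return [
        "mitmdump",
        "-s", script,
        # Start in upstream mode with a default proxy; the addon overrides it per request.
        "--mode", f"upstream:{default_upstream}",
        # Do not connect to the server until the request is seen,
        # so the addon can still choose the proxy.
        "--set", "connection_strategy=lazy",
        # Do not contact the upstream server just to look up certificate details.
        "--set", "upstream_cert=false",
    ]


if __name__ == "__main__":
    print(" ".join(mitmdump_command("change_upstream_proxy.py",
                                    "http://default-upstream-proxy:8080/")))
```

The list can be passed directly to subprocess.run without going through a shell.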
Source: https://github.com/mitmproxy/mitmproxy/discussions/5173
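There is one more problem. The approach above only changes the upstream proxy for the current request; it does not update the configuration of the running mitmproxy instance (the mode option given at startup). As a result, every request still goes to the old proxy first, which leads to slow requests, 502 errors, and similar problems.
After reading the source code, I added a step that also updates the running instance's options, which solves this: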
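from mitmproxy import ctx

# Body of request(flow). proxy_address is the new (host, port) tuple and proxyinfo is
# presumably the matching proxy URL string; both are set earlier (not shown).
old_address = flow.server_conn.via.address
is_proxy_change = proxy_address != old_address
server_connection_already_open = flow.server_conn.timestamp_start is not None
if is_proxy_change and server_connection_already_open:
    # server_conn already refers to an existing connection (which cannot be modified),
    # so we need to replace it with a new server connection object.
    flow.server_conn = Server(flow.server_conn.address)
if is_proxy_change:
    # Print old_address rather than flow.server_conn.via: a freshly created Server has no via yet.
    print("old proxy " + str(old_address) + " | new proxy " + str(proxy_address))
    flow.server_conn.via = ServerSpec('http', proxy_address)
    mode_option = {'mode': 'upstream:' + proxyinfo}
    server = getServer()  # the author's own helper (not shown); its result is unused below
    # Update the proxy setting of the running instance
    print("mode before update: " + str(ctx.master.options.mode))
    ctx.master.options.update(**mode_option)
    print("mode after update: " + str(ctx.master.options.mode))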
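The fragment above relies on two values that must agree: the proxy_address tuple used for ServerSpec and the proxyinfo string used for the mode option. One way to keep them in sync is to derive both from a single proxy URL. The helper below is a sketch under that assumption; the name upstream_settings and the http(s)://host:port input format are illustrative, not taken from the original code.

```python
from typing import Tuple
from urllib.parse import urlsplit


def upstream_settings(proxyinfo: str) -> Tuple[Tuple[str, int], str]:
    """Return ((host, port), mode_string) for an upstream proxy URL."""
    parts = urlsplit(proxyinfo)
    if parts.scheme not in ("http", "https") or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected http(s)://host:port, got {proxyinfo!r}")
    # The tuple feeds ServerSpec, the string feeds the mode option.
    return (parts.hostname, parts.port), "upstream:" + proxyinfo


if __name__ == "__main__":
    address, mode = upstream_settings("http://10.0.0.5:3128")
    print(address, mode)
```

With this, proxy_address and mode_option can both be filled from the same call, so the per-request proxy and the instance-wide mode never point at different hosts.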