Adapting the Distributed Engine Testing Platform FishTest for Chinese Chess and Offline Use

Preface

Recently I wanted to run game-playing tuning for an engine I wrote myself, and needed an offline testing platform. I found FishTest, written in Python, which is quite good, but it requires substantial changes: as published it runs games online and is tied to GitHub, so the code has to be modified.

There are quite a few pitfalls, and some are not fully written up here.
The engine must have a Tune module, since tuning works by modifying parameters.
Using the platform requires familiarity with SPRT, SPSA, and related statistics.
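Since the platform's stopping rule comes up repeatedly below, here is a minimal sketch of how SPRT works: it tracks a log-likelihood ratio between two Elo hypotheses and stops when the ratio crosses one of two thresholds. The function names and the normal-approximation formula are my own illustration, not fishtest's exact implementation:

```python
import math

def llr(w, d, l, elo0, elo1):
    # Log-likelihood ratio for H1 (Elo = elo1) vs H0 (Elo = elo0),
    # using a normal approximation over win/draw/loss counts.
    n = w + d + l
    if n == 0:
        return 0.0
    score = (w + 0.5 * d) / n       # mean score per game
    m2 = (w + 0.25 * d) / n         # mean squared score per game
    var = m2 - score * score        # per-game variance
    if var <= 0:
        return 0.0
    s0 = 1 / (1 + 10 ** (-elo0 / 400))  # expected score under H0
    s1 = 1 / (1 + 10 ** (-elo1 / 400))  # expected score under H1
    return (s1 - s0) * (2 * score - s0 - s1) / (2 * var / n)

def sprt_state(w, d, l, elo0=0, elo1=5, alpha=0.05, beta=0.05):
    lower = math.log(beta / (1 - alpha))   # accept H0 below this
    upper = math.log((1 - beta) / alpha)   # accept H1 above this
    v = llr(w, d, l, elo0, elo1)
    if v >= upper:
        return "accept H1"
    if v <= lower:
        return "accept H0"
    return "continue"

print(sprt_state(2000, 1000, 1500))  # accept H1
```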

Deployment

Server

  1. Use Ubuntu 18.04 as the operating system (the Server edition is recommended).

  2. Copy the script setup-fishtest.sh

    • Change the user_pwd variable to your password.
    • Change the server_name variable to your domain name (optional, if you want HTTPS).
  3. Run the script with the following command:

sudo bash setup-fishtest.sh 2>&1 | tee setup-fishtest.sh.log

setup-fishtest.sh

#!/bin/bash
# 201025
# to setup a fishtest server on Ubuntu 18.04 (bionic), simply run:
# sudo bash setup-fishtest.sh 2>&1 | tee setup-fishtest.sh.log
#
# to use fishtest connect a browser to:
# http://<ip_address> or http://<fully_qualified_domain_name>

user_name='fishtest'
user_pwd='<your_password>'
server_name=$(hostname --all-ip-addresses)
# or use a fully qualified domain name (http/https)
# server_name='<fully_qualified_domain_name>'

git_user_name='your_name'
git_user_email='[email protected]'

# create user for fishtest
useradd -m -s /bin/bash ${user_name}
echo ${user_name}:${user_pwd} | chpasswd
usermod -aG sudo ${user_name}
sudo -i -u ${user_name} << EOF
mkdir .ssh
chmod 700 .ssh
touch .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
EOF

# get the user $HOME
user_home=$(sudo -i -u ${user_name} << 'EOF'
echo ${HOME}
EOF
)

# add some bash variables
sudo -i -u ${user_name} << 'EOF'
cat << 'EOF0' >> .profile

export FISHTEST_HOST=127.0.0.1
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export VENV="$HOME/fishtest/server/env"
EOF0
EOF

# set secrets
sudo -i -u ${user_name} << EOF
echo '' > fishtest.secret
echo '' > fishtest.captcha.secret
echo '' > fishtest.upload

cat << EOF0 > .netrc
# GitHub authentication to raise API rate limit
# create a <personal-access-token> https://github.com/settings/tokens
#machine api.github.com
#login <personal-access-token>
#password x-oauth-basic
EOF0
chmod 600 .netrc
EOF

# install required packages
apt update && apt full-upgrade -y && apt autoremove -y && apt clean
apt purge -y apache2 apache2-data apache2-doc apache2-utils apache2-bin
apt install -y ufw git bash-completion nginx mutt curl procps

# configure ufw
ufw allow ssh
ufw allow http
ufw allow https
ufw allow 6542
ufw --force enable
ufw status verbose

# configure nginx
cat << EOF > /etc/nginx/sites-available/fishtest.conf

upstream backend_tests {
    server 127.0.0.1:6543;
}

upstream backend_all {
    server 127.0.0.1:6544;
}

server {
    listen 80;
    listen [::]:80;

    server_name ${server_name};

    location ~ ^/(css|html|img|js|favicon.ico|robots.txt) {
        root        ${user_home}/fishtest/server/fishtest/static;
        expires     1y;
        add_header  Cache-Control public;
        access_log  off;
    }

    location / {
        proxy_pass http://backend_all;

        proxy_set_header X-Forwarded-Proto  \$scheme;
        proxy_set_header X-Forwarded-For    \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host   \$host:\$server_port;
        proxy_set_header X-Forwarded-Port   \$server_port;

        client_max_body_size        100m;
        client_body_buffer_size     128k;
        proxy_connect_timeout       60s;
        proxy_send_timeout          90s;
        proxy_read_timeout          90s;
        proxy_buffering             off;
        proxy_temp_file_write_size  64k;
        proxy_redirect              off;

        location ~ ^/api/(active_runs|download_pgn|download_pgn_100|request_version|upload_pgn) {
            proxy_pass http://backend_all;
        }

        location /api/ {
            proxy_pass http://backend_tests;
        }

        location ~ ^/tests/(finished|user/) {
            proxy_pass http://backend_all;
        }

        location /tests {
            proxy_pass http://backend_tests;
        }
    }
}
EOF

unlink /etc/nginx/sites-enabled/default
ln -sf /etc/nginx/sites-available/fishtest.conf /etc/nginx/sites-enabled/fishtest.conf
systemctl enable nginx.service
systemctl restart nginx.service

# setup pyenv and install the latest python version
# https://github.com/pyenv/pyenv
apt update
apt install -y --no-install-recommends make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev

sudo -i -u ${user_name} << 'EOF'
cat << 'EOF0' >> .profile

# pyenv: keep at the end of the file
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
if command -v pyenv &>/dev/null; then
  eval "$(pyenv init -)"
fi
EOF0
EOF

sudo -i -u ${user_name} << 'EOF'
python_ver="3.8.6"
git clone https://github.com/pyenv/pyenv.git "${PYENV_ROOT}"
pyenv install ${python_ver}
pyenv global ${python_ver}
EOF

# install mongodb community edition for Ubuntu 18.04 (bionic), change for other releases
wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
apt update
apt install -y mongodb-org

# set the cache size in /etc/mongod.conf
#  wiredTiger:
#    engineConfig:
#      cacheSizeGB: 1.75
cp /etc/mongod.conf mongod.conf.bkp
sed -i 's/^#  wiredTiger:/  wiredTiger:\n    engineConfig:\n      cacheSizeGB: 1.75/' /etc/mongod.conf

# setup logrotate for mongodb
sed -i '/^  logAppend: true/a\  logRotate: reopen' /etc/mongod.conf

cat << 'EOF' > /etc/logrotate.d/mongod
/var/log/mongodb/mongod.log
{
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0600 mongodb mongodb
    sharedscripts
    postrotate
        /bin/kill -SIGUSR1 $(pgrep mongod 2>/dev/null) 2>/dev/null || true
    endscript
}
EOF

# download fishtest
sudo -i -u ${user_name} << EOF
git clone --single-branch --branch master https://github.com/glinscott/fishtest.git
cd fishtest
git config user.email "${git_user_email}"
git config user.name "${git_user_name}"
EOF

# setup fishtest
sudo -i -u ${user_name} << 'EOF'
python3 -m venv ${VENV}
${VENV}/bin/python3 -m pip install --upgrade pip setuptools wheel
cd ${HOME}/fishtest/server
${VENV}/bin/python3 -m pip install -e .
EOF

# install fishtest as systemd service
cat << EOF > /etc/systemd/system/[email protected]
[Unit]
Description=Fishtest Server port %i
After=network.target mongod.service

[Service]
Type=simple
ExecStart=${user_home}/fishtest/server/env/bin/pserve production.ini http_port=%i
Restart=on-failure
RestartSec=3
User=${user_name}
WorkingDirectory=${user_home}/fishtest/server

[Install]
WantedBy=multi-user.target
EOF

# install also fishtest debug as systemd service
cat << EOF > /etc/systemd/system/fishtest_dbg.service
[Unit]
Description=Fishtest Server Debug port 6542
After=network.target mongod.service

[Service]
Type=simple
ExecStart=${user_home}/fishtest/server/env/bin/pserve development.ini --reload
User=${user_name}
WorkingDirectory=${user_home}/fishtest/server

[Install]
WantedBy=multi-user.target
EOF

# enable the autostart for mongod.service and [email protected]
# check the log with: sudo journalctl -u [email protected]
systemctl daemon-reload
systemctl enable mongod.service
systemctl enable fishtest@{6543..6544}.service

# start fishtest server
systemctl start mongod.service
systemctl start fishtest@{6543..6544}.service

# add mongodb indexes
sudo -i -u ${user_name} << 'EOF'
${VENV}/bin/python3 ${HOME}/fishtest/server/utils/create_indexes.py actions flag_cache pgns runs users
EOF

# add some default users:
# "user00" (with password "user00"), as approver
# "user01" (with password "user01"), as normal user
sudo -i -u ${user_name} << 'EOF'
${VENV}/bin/python3 << EOF0
from fishtest.rundb import RunDb
rdb = RunDb()
rdb.userdb.create_user('user00', 'user00', '[email protected]')
rdb.userdb.add_user_group('user00', 'group:approvers')
user = rdb.userdb.get_user('user00')
user['blocked'] = False
user['machine_limit'] = 100
rdb.userdb.save_user(user)
rdb.userdb.create_user('user01', 'user01', '[email protected]')
user = rdb.userdb.get_user('user01')
user['blocked'] = False
user['machine_limit'] = 100
rdb.userdb.save_user(user)
EOF0
EOF

sudo -i -u ${user_name} << 'EOF'
(crontab -l; cat << EOF0
VENV=${HOME}/fishtest/server/env
UPATH=${HOME}/fishtest/server/utils

# Backup mongodb database and upload to s3
# keep disabled on dev server
# 3 */6 * * * /usr/bin/cpulimit -l 50 -f -m -- sh ${UPATH}/backup.sh

# Update the users table
1,16,31,46 * * * * \${VENV}/bin/python3 \${UPATH}/delta_update_users.py

# Purge old pgn files
33 3 * * * \${VENV}/bin/python3 \${UPATH}/purge_pgn.py

# Clean up old mail (more than 9 days old)
33 5 * * * screen -D -m mutt -e 'push D~d>9d<enter>qy<enter>'

EOF0
) | crontab -
EOF

cat << EOF
connect a browser to:
http://${server_name}
EOF

Screenshot of the result, with one completed test:

Home page

Client

The server repository actually includes the client code, but it takes some effort to set up (runtime and build environments, etc.); a portable Windows build made by someone else is available.

The game runner has to be implemented yourself, which is another pitfall requiring a lot of C++ changes; it can be adapted from the open-source project cutechess. I will omit those ten thousand lines of code here.

Client directory structure


Converting to Offline Use

Differences in Logic

Online mode

When creating a task: it queries GitHub for information such as the Bench value, the commit log, and so on.
When running a task: it downloads the source code, opening book, network weights, game runner, and so on, compiles locally, then runs.

Offline mode

When creating a task: generate the required information (the sha1) yourself and ignore the Bench value.
When running a task: manually copy in the engines to be tested, the game runner, and so on; nothing is compiled, everything runs directly.

Overall, the change amounts to removing all of the online code, such as anything that accesses GitHub.

Drawback: it seems only one task can run at a time. There is a workaround: copy several engines into the game-runner directory, compute their sha1 values in advance (renaming each with its sha1 as a suffix), and deploy several separate clients.
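The sha1-and-rename step can be scripted. A sketch, assuming the worker looks for engines named chameleon_&lt;sha1&gt; as in the games.py excerpts in this article (install_engine and its prefix argument are my own names):

```python
import hashlib
from pathlib import Path

def sha1_of(path):
    # Hash the engine binary in chunks so large files stay cheap.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def install_engine(src, testing_dir, prefix="chameleon_"):
    # Copy an engine into the runner directory under the
    # <prefix><sha1> name the worker expects, keeping the suffix.
    src = Path(src)
    digest = sha1_of(src)
    dest = Path(testing_dir) / (prefix + digest + src.suffix)
    dest.write_bytes(src.read_bytes())
    return digest, dest
```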

Code Changes

Server

Modify the front end first: download the dependency files it references and serve them offline, otherwise the pages will hang.

I downloaded three files in total: bootstrap.min.js, bootstrap-combined.min.css, and jquery-3.5.1.min.js. Place them in the following paths, organized by file type.

File path structure

There is no need to query IP address details over the network, so comment out the following code.

api.py

    def get_flag(self):
        # ip = self.request.remote_addr
        # if ip in flag_cache:
        #     return flag_cache.get(ip, None)  # Handle race condition on "del"
        # # concurrent invocations get None, race condition is not an issue
        # flag_cache[ip] = None
        # result = self.request.userdb.flag_cache.find_one({"ip": ip})
        # if result:
        #     flag_cache[ip] = result["country_code"]
        #     return result["country_code"]
        # try:
        #     # Get country flag from worker IP address
        #     FLAG_HOST = "https://freegeoip.app/json/"
        #     r = requests.get(FLAG_HOST + self.request.remote_addr, timeout=1.0)
        #     if r.status_code == 200:
        #         country_code = r.json()["country_code"]
        #         self.request.userdb.flag_cache.insert_one(
        #             {
        #                 "ip": ip,
        #                 "country_code": country_code,
        #                 "geoip_checked_at": datetime.utcnow(),
        #             }
        #         )
        #         flag_cache[ip] = country_code
        #         return country_code
        #     raise Error("flag server failed")
        # except:
        #     del flag_cache[ip]
        #     print("Failed GeoIP check for {}".format(ip))
        return None

When a new test task is created, the server queries GitHub for information; this has to be removed. The changes are shown below.

worker.py

def worker(worker_info, password, remote):
    global ALIVE, FLEET

    payload = {"worker_info": worker_info, "password": password}

    try:
        print("Fetch task...")
        # if not get_rate():
        #     raise Exception("Near API limit")

Comment out the two lines starting at `if not get_rate():`; they check whether GitHub API usage has reached its limit.

games.py

    # create new engines
    sha_new = run["args"]["resolved_new"]
    sha_base = run["args"]["resolved_base"]
    new_engine_name = "chameleon_" + sha_new
    base_engine_name = "chameleon_" + sha_base

    new_engine = os.path.join(testing_dir, new_engine_name + EXE_SUFFIX)
    base_engine = os.path.join(testing_dir, base_engine_name + EXE_SUFFIX)
    sylvan = os.path.join(testing_dir, "sylvan-cli" + EXE_SUFFIX)

    print("new_engine_name " + str(new_engine_name))
    print("base_engine_name " + str(base_engine_name))

    # Build from sources new and base engines as needed
    # if not os.path.exists(new_engine):
    #     setup_engine(
    #         new_engine,
    #         worker_dir,
    #         testing_dir,
    #         remote,
    #         sha_new,
    #         repo_url,
    #         worker_info["concurrency"],
    #     )
    # if not os.path.exists(base_engine):
    #     setup_engine(
    #         base_engine,
    #         worker_dir,
    #         testing_dir,
    #         remote,
    #         sha_base,
    #         repo_url,
    #         worker_info["concurrency"],
    #     )

    os.chdir(testing_dir)

    # Download book if not already existing
    # if (
    #     not os.path.exists(os.path.join(testing_dir, book))
    #     or os.stat(os.path.join(testing_dir, book)).st_size == 0
    # ):
    #     zipball = book + ".zip"
    #     setup(zipball, testing_dir)
    #     zip_file = ZipFile(zipball)
    #     zip_file.extractall()
    #     zip_file.close()
    #     os.remove(zipball)

    # Download sylvan if not already existing
    # if not os.path.exists(sylvan):
    #     if len(EXE_SUFFIX) > 0:
    #         zipball = "sylvan-cli-win.zip"
    #     else:
    #         zipball = "sylvan-cli-linux-{}.zip".format(platform.architecture()[0])
    #     setup(zipball, testing_dir)
    #     zip_file = ZipFile(zipball)
    #     zip_file.extractall()
    #     zip_file.close()
    #     os.remove(zipball)
    #     os.chmod(sylvan, os.stat(sylvan).st_mode | stat.S_IEXEC)

    # verify that an available sylvan matches the required minimum version
    # verify_required_sylvan(sylvan)

    # clean up old networks (keeping the 10 most recent)
    networks = glob.glob(os.path.join(testing_dir, "nn-*.nnue"))
    if len(networks) > 10:
        networks.sort(key=os.path.getmtime)
        for old_net in networks[:-10]:
            try:
                os.remove(old_net)
            except:
                print("Failed to remove an old network " + str(old_net))

    # Add EvalFile with full path to sylvan options, and download networks if not already existing
    # net_base = required_net(base_engine)
    # if net_base:
    #     base_options = base_options + [
    #         "option.EvalFile={}".format(os.path.join(testing_dir, net_base))
    #     ]
    # net_new = required_net(new_engine)
    # if net_new:
    #     new_options = new_options + [
    #         "option.EvalFile={}".format(os.path.join(testing_dir, net_new))
    #     ]

    # for net in [net_base, net_new]:
    #     if net:
    #         if not os.path.exists(os.path.join(testing_dir, net)) or not validate_net(
    #             testing_dir, net
    #         ):
    #             download_net(remote, testing_dir, net)
    #             if not validate_net(testing_dir, net):
    #                 raise Exception("Failed to validate the network: {}".format(net))

    # pgn output setup
    pgn_name = "results-" + worker_info["unique_key"] + ".pgn"
    if os.path.exists(pgn_name):
        os.remove(pgn_name)
    pgnfile = os.path.join(testing_dir, pgn_name)

    # Verify signatures are correct
    verify_signature(
        new_engine,
        run["args"]["new_signature"],
        remote,
        result,
        games_concurrency * threads,
    )
    base_nps = verify_signature(
        base_engine,
        run["args"]["base_signature"],
        remote,
        result,
        games_concurrency * threads,
    )

Comment out the following:

  • The code that downloads source code online.
  • The opening-book download (there is no book for now, and it is not needed).
  • All engine-verification code.
        # Limit worker Github API calls
        # if "rate" in worker_info:
        #     rate = worker_info["rate"]
        #     limit = rate["remaining"] <= 2 * math.sqrt(rate["limit"])
        # else:
        limit = False

Comment out that limit, shown above; otherwise an error is raised.
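For clarity, the rule being disabled is simple: the worker considers itself near the GitHub API limit when fewer than 2*sqrt(limit) calls remain. A standalone sketch of that check (near_api_limit is my own name):

```python
import math

def near_api_limit(worker_info):
    # Mirrors the commented-out check: near the GitHub API limit when
    # fewer than 2 * sqrt(limit) calls remain in the current window.
    if "rate" in worker_info:
        rate = worker_info["rate"]
        return rate["remaining"] <= 2 * math.sqrt(rate["limit"])
    return False

# 5000 requests/hour is GitHub's authenticated limit; 2*sqrt(5000) ~ 141
print(near_api_limit({"rate": {"remaining": 100, "limit": 5000}}))   # True
print(near_api_limit({"rate": {"remaining": 1000, "limit": 5000}}))  # False
```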

Client

worker.py

    if cpu_count <= 0:
        sys.stderr.write("Not enough CPUs to run fishtest (it requires at least two)\n")
        worker_exit()

    """ try:
        gcc_version()
    except Exception as e:
        print(e, file=sys.stderr)
        worker_exit() """

    with open(config_file, "w") as f:
        config.write(f)
    if options.only_config == "True":
        worker_exit(0)

Comment out the parts that require the gcc compiler. As mentioned earlier, offline use involves no compilation, and setting up a compiler is too much hassle.
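If you prefer a softer change than commenting the check out, the probe can be made non-fatal instead of exiting the worker. A sketch under that assumption (gcc_available is my own name, not fishtest's):

```python
import shutil
import subprocess

def gcc_available():
    # Report whether gcc exists instead of aborting the worker;
    # offline workers never compile, so a missing compiler is fine.
    if shutil.which("gcc") is None:
        return False
    result = subprocess.run(["gcc", "--version"],
                            capture_output=True, text=True)
    return result.returncode == 0
```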

worker.py

def worker(worker_info, password, remote):
    global ALIVE, FLEET

    payload = {"worker_info": worker_info, "password": password}

    try:
        print("Fetch task...")

        # if not get_rate():
        #     raise Exception("Near API limit")

Comment out the two lines starting at `if not get_rate():`; they check whether GitHub API usage has reached its limit.

games.py

        # if int(bench_sig) != int(signature):
        #     message = "Wrong bench in {} Expected: {} Got: {}".format(
        #         os.path.basename(engine),
        #         signature,
        #         bench_sig,
        #     )
        #     payload["message"] = message
        #     send_api_post_request(remote + "/api/stop_run", payload)
        #     raise Exception(message)

Comment out the block starting at `if int(bench_sig) != int(signature):`; offline use does not need to verify the engine's bench.
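For context on what is being skipped: Stockfish-style engines print a node count from their bench command, and fishtest compares it against the signature recorded with the test. A minimal parser sketch, with hypothetical engine output:

```python
import re

# Hypothetical bench output; the node count is the signature.
sample_output = """
Total time (ms) : 1530
Nodes searched  : 4279886
Nodes/second    : 2797311
"""

def parse_bench_signature(text):
    # Extract the "Nodes searched" value, or None if absent.
    m = re.search(r"Nodes searched\s*:\s*(\d+)", text)
    return int(m.group(1)) if m else None

print(parse_bench_signature(sample_output))  # 4279886
```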

games.py

    print("new_engine_name " + str(new_engine_name))
    print("base_engine_name " + str(base_engine_name))

    # Build from sources new and base engines as needed
    # if not os.path.exists(new_engine):
    #     setup_engine(
    #         new_engine,
    #         worker_dir,
    #         testing_dir,
    #         remote,
    #         sha_new,
    #         repo_url,
    #         worker_info["concurrency"],
    #     )
    # if not os.path.exists(base_engine):
    #     setup_engine(
    #         base_engine,
    #         worker_dir,
    #         testing_dir,
    #         remote,
    #         sha_base,
    #         repo_url,
    #         worker_info["concurrency"],
    #     )

    os.chdir(testing_dir)

    # Download book if not already existing
    # if (
    #     not os.path.exists(os.path.join(testing_dir, book))
    #     or os.stat(os.path.join(testing_dir, book)).st_size == 0
    # ):
    #     zipball = book + ".zip"
    #     setup(zipball, testing_dir)
    #     zip_file = ZipFile(zipball)
    #     zip_file.extractall()
    #     zip_file.close()
    #     os.remove(zipball)

    # Download sylvan if not already existing
    # if not os.path.exists(sylvan):
    #     if len(EXE_SUFFIX) > 0:
    #         zipball = "sylvan-cli-win.zip"
    #     else:
    #         zipball = "sylvan-cli-linux-{}.zip".format(platform.architecture()[0])
    #     setup(zipball, testing_dir)
    #     zip_file = ZipFile(zipball)
    #     zip_file.extractall()
    #     zip_file.close()
    #     os.remove(zipball)
    #     os.chmod(sylvan, os.stat(sylvan).st_mode | stat.S_IEXEC)

    # verify that an available sylvan matches the required minimum version
    # verify_required_sylvan(sylvan)

    # clean up old networks (keeping the 10 most recent)
    networks = glob.glob(os.path.join(testing_dir, "nn-*.nnue"))
    if len(networks) > 10:
        networks.sort(key=os.path.getmtime)
        for old_net in networks[:-10]:
            try:
                os.remove(old_net)
            except:
                print("Failed to remove an old network " + str(old_net))

    # Add EvalFile with full path to sylvan options, and download networks if not already existing
    # net_base = required_net(base_engine)
    # if net_base:
    #     base_options = base_options + [
    #         "option.EvalFile={}".format(os.path.join(testing_dir, net_base))
    #     ]
    # net_new = required_net(new_engine)
    # if net_new:
    #     new_options = new_options + [
    #         "option.EvalFile={}".format(os.path.join(testing_dir, net_new))
    #     ]

    # for net in [net_base, net_new]:
    #     if net:
    #         if not os.path.exists(os.path.join(testing_dir, net)) or not validate_net(
    #             testing_dir, net
    #         ):
    #             download_net(remote, testing_dir, net)
    #             if not validate_net(testing_dir, net):
    #                 raise Exception("Failed to validate the network: {}".format(net))

The two print statements are added to make it easier to rename engines with the sha1-suffix scheme.

Comment out the following:

  • The code that downloads source code online.
  • The opening-book download (there is no book for now, and it is not needed).
  • All engine-verification code.

Pitfalls

  1. Offline clients cannot connect and an error is reported.

Opening http://192.168.90.128/api/request_version shows the following:

Internal Server Error

The server encountered an unexpected internal server error

(generated by waitress)

At first I assumed my offline changes had broken something, so I went straight to the code: api.py was raising the error, which led into rundb.py. My first instinct was to blame the upstream code.

As mentioned above, commenting out that rate limit solves it; if the error persists, open port 6542 (the debug server) to see the cause.

  2. Accounts mysteriously become corrupted or disappear.

I have not investigated this in depth; the tasks are all still there, so to keep working I simply recreate the accounts.

A few handy scripts:

db.users.update({'username':'user00'}, {$set: {'password':'user00'}}, {multi:true})
db.users.update({'username':'user01'}, {$set: {'password':'user01'}}, {multi:true})
db.getCollection("users").insert( {
    username: "user00",
    password: "user00",
    "registration_time": ISODate("2020-02-15T06:47:50.853Z"),
    blocked: false,
    email: "[email protected]",
    groups: [
        "group:approvers"
    ],
    "tests_repo": "",
    "machine_limit": NumberInt("100")
} );

db.users.update({"_id": ObjectId("5ea7fc06b8fbd777f6352fc3")}, {$set: {"password": "user00"}})

db.users.remove({"_id": ObjectId("5ea2dcacf911ef92007f2e1d")})

Change the `_id` values to your own.
The accounts were created with 5-character passwords, but the front end requires at least 8, so passwords can only be changed with these commands.

  3. The parameter trend graph hangs at "Loading graph…".

The graph requires a connection to Google's servers, and Google explicitly forbids offline (self-hosted) use of the loader, so there is no workaround for now (it is not very useful anyway). My approach was to remove the widget from the front end.

  4. Tuning is too slow.
    See: practical guidelines for SPSA tuning.
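As background for why SPSA needs so many games: it estimates the whole gradient from just two evaluations per iteration by perturbing every parameter at once, so each step is cheap but noisy. A toy sketch on an exact loss (spsa_step and the gains a, c are my own illustration, not fishtest's schedule):

```python
import random

def spsa_step(theta, loss, a=0.1, c=0.05):
    # One SPSA iteration: perturb all parameters with a random +/-1
    # vector and estimate the gradient from two loss evaluations.
    delta = [random.choice((-1, 1)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    yp, ym = loss(plus), loss(minus)
    ghat = [(yp - ym) / (2 * c * d) for d in delta]
    return [t - a * g for t, g in zip(theta, ghat)]

# Toy example: two parameters, true optimum at (3, -2). In real tuning
# the loss would be the (noisy) match result, not a closed-form function.
random.seed(1)
f = lambda x: (x[0] - 3) ** 2 + (x[1] + 2) ** 2
theta = [0.0, 0.0]
for _ in range(300):
    theta = spsa_step(theta, f)
```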


Reprinted from blog.csdn.net/ad44275783/article/details/115165736