Rasa_NLU_Chi学习笔记(一):顺藤摸瓜

本文记录通过批量测试和配置环境来逐渐掌握该项目的过程。

通过进行以下实验,可以学会:如何利用py.test进行批量测试;如何在临时文件夹中写文件;如何获取项目根目录路径;如何在github上Commit and push自己的代码;安装缺失包的一种方法;从spaCy1.8.2的源码中可以看到对哪些语言进行过支持;了解如何将.pyx文件转化为可以import的.pyd文件。

下面只重点列出测试会报错的几个文件。

test_config.py

直接运行test_config.py文件会出现路径错误,需要进行以下修改:

defaults = utils.read_yaml_file(ProjectUtil.project_root_path() + CONFIG_DEFAULTS_PATH)

在tests/base目录下(cd E:\workspace-python\Rasa_NLU_Chi\tests\base)运行py.test -q test_config.py,报错如下:

________________________________________ ERROR at setup of test_default_config ________________________________________

    @pytest.fixture(scope="session")
    def default_config():
>       return config.load(CONFIG_DEFAULTS_PATH)

..\conftest.py:41:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\..\rasa_nlu\config.py:44: in load
    file_config = utils.read_yaml_file(filename)
..\..\rasa_nlu\utils\__init__.py:236: in read_yaml_file
    return yaml.load(read_file(filename, "utf-8"))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

filename = 'sample_configs/config_defaults.yml', encoding = 'utf-8'

    def read_file(filename, encoding="utf-8-sig"):
        """Read text from a file."""
>       with io.open(filename, encoding=encoding) as f:
E       FileNotFoundError: [Errno 2] No such file or directory: 'sample_configs/config_defaults.yml'

..\..\rasa_nlu\utils\__init__.py:202: FileNotFoundError
1 failed, 4 passed, 108 warnings, 2 errors in 4.30s

感觉路径修改位置不对,直接对conftest.py中的CONFIG_DEFAULTS_PATH进行修改。修改后运行,passed数量和errors数量发生了变化。

2 failed, 5 passed, 109 warnings in 4.13s

不过依然存在错误:

============================================================== FAILURES ==============================================================
______________________________________________________ test_invalid_config_json ______________________________________________________

    def test_invalid_config_json():
        file_config = """pipeline: [spacy_sklearn"""  # invalid yaml
        with tempfile.NamedTemporaryFile("w+", suffix="_tmp_config_file.json") as f:
            f.write(file_config)
            f.flush()
            with pytest.raises(rasa_nlu.config.InvalidConfigError):
>               config.load(f.name)

test_config.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\..\rasa_nlu\config.py:44: in load
    file_config = utils.read_yaml_file(filename)
..\..\rasa_nlu\utils\__init__.py:236: in read_yaml_file
    return yaml.load(read_file(filename, "utf-8"))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

filename = 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmp126n5gab_tmp_config_file.json', encoding = 'utf-8'

    def read_file(filename, encoding="utf-8-sig"):
        """Read text from a file."""
>       with io.open(filename, encoding=encoding) as f:
E       PermissionError: [Errno 13] Permission denied: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmp126n5gab_tmp_config_file.json'

..\..\rasa_nlu\utils\__init__.py:202: PermissionError
_

进到对应的临时文件目录一看,文件太多,看得眼花缭乱,删掉4个多G的临时文件后重新运行,依然找不到对应的json文件。

应该是没有获得写文件的权限,不过用管理员模式打开cmd运行还是不对。经过一番查找找到了原因:在命名的临时文件仍处于打开状态时,是否可以使用该名称第二次打开文件,会因平台而异(可以在Unix上使用;在Windows NT或更高版本上不能使用)。如果delete为true(默认设置),则在关闭文件后立即将其删除。解决方法:将delete设为False,再次运行failed数量进一步减少。

1 failed, 6 passed, 110 warnings in 3.92s
====================================================== FAILURES =======================================================
_____________________________________________ test_set_attr_on_component ______________________________________________

default_config = <rasa_nlu.config.RasaNLUModelConfig object at 0x0000026CDD30A190>

    def test_set_attr_on_component(default_config):
>       cfg = config.load("sample_configs/config_spacy.yml")

test_config.py:65:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\..\rasa_nlu\config.py:44: in load
    file_config = utils.read_yaml_file(filename)
..\..\rasa_nlu\utils\__init__.py:236: in read_yaml_file
    return yaml.load(read_file(filename, "utf-8"))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

filename = 'sample_configs/config_spacy.yml', encoding = 'utf-8'

    def read_file(filename, encoding="utf-8-sig"):
        """Read text from a file."""
>       with io.open(filename, encoding=encoding) as f:
E       FileNotFoundError: [Errno 2] No such file or directory: 'sample_configs/config_spacy.yml'

..\..\rasa_nlu\utils\__init__.py:202: FileNotFoundError

依然是项目路径没有配置正确,简单修改即可解决。运行结果如下:

7 passed, 112 warnings in 3.36s

test_evaluation.py

测试该文件报错:

======================================================= ERRORS ========================================================
____________________________________ ERROR at setup of test_get_entity_extractors _____________________________________

component_builder = <rasa_nlu.components.ComponentBuilder object at 0x000001A31BDEDB50>
tmpdir_factory = TempdirFactory(_tmppath_factory=TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x000001A31BC64BB0>, _basetemp=WindowsPath('C:/Users/Administrator/AppData/Local/Temp/pytest-of-Administrator/pytest-0')))

    @pytest.fixture(scope="session")
    def duckling_interpreter(component_builder, tmpdir_factory):
        conf = RasaNLUModelConfig({"pipeline": [{"name": "ner_duckling"}]})
>       return utilities.interpreter_for(
                component_builder,
                data="./data/examples/rasa/demo-rasa.json",
                path=tmpdir_factory.mktemp("projects").strpath,
                config=conf)

test_evaluation.py:32:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\utilities.py:38: in interpreter_for
    (trained, _, path) = do_train(config, data, path,
..\..\rasa_nlu\train.py:143: in do_train
    trainer = Trainer(cfg, component_builder)
..\..\rasa_nlu\model.py:146: in __init__
    components.validate_requirements(cfg.component_names)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

component_names = ['ner_duckling']

    def validate_requirements(component_names):
        # type: (List[Text], Text) -> None
        """Ensures that all required python packages are installed to
        instantiate and used the passed components."""
        from rasa_nlu import registry

        # Validate that all required packages are installed
        failed_imports = set()
        for component_name in component_names:
            component_class = registry.get_component_class(component_name)
            failed_imports.update(find_unavailable_packages(
                    component_class.required_packages()))
        if failed_imports:  # pragma: no cover
            # if available, use the development file to figure out the correct
            # version numbers for each requirement
>           raise Exception("Not all required packages are installed. " +
                            "To use this pipeline, you need to install the "
                            "missing dependencies. " +
                            "Please install {}".format(", ".join(failed_imports)))
E           Exception: Not all required packages are installed. To use this pipeline, you need to install the missing dependencies. Please install duckling

..\..\rasa_nlu\components.py:60: Exception
------------------------------------------------ Captured stderr setup ------------------------------------------------
2021-03-02 22:00:36.034038: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-03-02 22:00:36.035369: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
------------------------------------------------- Captured log setup --------------------------------------------------
DEBUG    tensorflow:tpu_cluster_resolver.py:34 Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2 failed, 8 passed, 104 warnings, 4 errors in 3.90s

原因是缺少duckling包,没有利用python setup.py install来安装所有缺失的包,这样做速度会非常慢。感觉还是利用国内镜像逐个安装比较靠谱。安装完以后报错内容发生了变化:

======================================================= ERRORS ========================================================
____________________________________ ERROR at setup of test_get_entity_extractors _____________________________________

component_builder = <rasa_nlu.components.ComponentBuilder object at 0x0000029CFF6F1AF0>
tmpdir_factory = TempdirFactory(_tmppath_factory=TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x0000029CFF575BB0>, _basetemp=WindowsPath('C:/Users/Administrator/AppData/Local/Temp/pytest-of-Administrator/pytest-2')))

    @pytest.fixture(scope="session")
    def duckling_interpreter(component_builder, tmpdir_factory):
        conf = RasaNLUModelConfig({"pipeline": [{"name": "ner_duckling"}]})
>       return utilities.interpreter_for(
                component_builder,
                data="./data/examples/rasa/demo-rasa.json",
                path=tmpdir_factory.mktemp("projects").strpath,
                config=conf)

test_evaluation.py:32:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\utilities.py:38: in interpreter_for
    (trained, _, path) = do_train(config, data, path,
..\..\rasa_nlu\train.py:148: in do_train
    training_data = load_data(data, cfg.language)
..\..\rasa_nlu\training_data\loading.py:53: in load_data
    files = utils.list_files(resource_name)
..\..\rasa_nlu\utils\__init__.py:113: in list_files
    return [fn for fn in list_directory(path) if os.path.isfile(fn)]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = './data/examples/rasa/demo-rasa.json'

    def list_directory(path):
        # type: (Text) -> List[Text]
        """Returns all files and folders excluding hidden files.

        If the path points to a file, returns the file. This is a recursive
        implementation returning files in any depth of the path."""

        if not isinstance(path, six.string_types):
            raise ValueError("Resourcename must be a string type")

        if os.path.isfile(path):
            return [path]
        elif os.path.isdir(path):
            results = []
            for base, dirs, files in os.walk(path):
                # remove hidden files
                goodfiles = filter(lambda x: not x.startswith('.'), files)
                results.extend(os.path.join(base, f) for f in goodfiles)
            return results
        else:
>           raise ValueError("Could not locate the resource '{}'."
                             "".format(os.path.abspath(path)))
E           ValueError: Could not locate the resource 'E:\workspace-python\Rasa_NLU_Chi\tests\base\data\examples\rasa\demo-rasa.json'.

..\..\rasa_nlu\utils\__init__.py:103: ValueError
------------------------------------------------ Captured stderr setup ------------------------------------------------
2021-03-02 22:17:01.936974: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-03-02 22:17:01.939095: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
------------------------------------------------- Captured log setup --------------------------------------------------
DEBUG    tensorflow:tpu_cluster_resolver.py:34 Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2 failed, 8 passed, 104 warnings, 4 errors in 120.10s (0:02:00)

错误变成了无法找到文件:demo-rasa.json。需要修改的代码块如下:

重新运行后passed数量增加了:

2 failed, 12 passed, 104 warnings in 167.14s (0:02:47)

如法炮制,修改test_drop_intents_below_freq和test_run_cv_evaluation函数中的对应路径,报错情况继续得到改善:

1 failed, 13 passed, 105 warnings in 132.11s (0:02:12)

继续运行,发现缺少 sklearn_crfsuite和spacy。

2021/3/2 spacy的版本已经升级到了3.03,而Rasa_NLU_Chi用的spacy版本为1.8.2,导致函数参数发生了变化:

nlp = spacy.load(spacy_model_name, parser=False)

并且对于空模型的加载方法也发生了变化,对应代码段应该修改为:

其中,需要使用命令python -m spacy download en单独下载安装whl文件以解决spacy无法load('en')的问题。继续运行报错: ModuleNotFoundError: No module named 'spacy.gold'

发现spacy3.03里面找不到GoldParse方法。为了解决这个问题,我在https://github.com/explosion/spaCy/releases/tag/v1.8.2网址中下载了1.8.2来一探究竟,发现gold是pyx和pxd文件,于是准备将它拷到Rasa_NLU_Chi项目里试试看能不能调用。从spaCy1.8.2的源码中可以看到它能够处理的语言比较多,包括en, de, zh, es, it, hu, fr, pt, nl, sv, fi, bn, he。参考http://forum.digitser.cn/thread-2227-1-1.html,.pyx 文件必须先被编译成 .c 文件,再编译成 .pyd (Windows 平台) 或 .so (Linux 平台) 文件,才可作为模块 import 导入使用。感觉有点麻烦,还是先将spacy从3.03降到1.8.2比较省事。利用pycharm直接降级版本会失败,在cmd中先将spacy卸载再安装指定版本也会报错。

ERROR: Command errored out with exit status 1: 'd:\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Administrator\\AppData\\Local\\Temp\\pip-install-1clcgu3o\\cytoolz_a1730a4647454b7ab52f89e55a0845fc\\setup.py'"'"'; __file__='"'"'C:\\Users\\Administrator\\AppData\\Local\\Temp\\pip-install-1clcgu3o\\cytoolz_a1730a4647454b7ab52f89e55a0845fc\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Administrator\AppData\Local\Temp\pip-record-xz5pohfm\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\programs\python\python38\Include\cytoolz' Check the logs for full command output.

更快捷的方法是跳过测试该问题函数,刚开始不太明白cv是什么意思,还以为是opencv,仔细看了看发现是交叉验证(cross validation)的缩写。

@unittest.skip("spacy 1.8.2中的gold模块在3.0.3中去掉了,所以该方法在新的版本中无法正常运行。")
def test_run_cv_evaluation():

再次运行测试,OK。

13 passed, 1 skipped, 104 warnings in 171.17s (0:02:51)

test_extractors.py

第一遍检测报错:

4 failed, 3 passed, 110 warnings in 154.94s (0:02:34)

错误千奇百怪,暂时不通过这个思路继续排错了。

猜你喜欢

转载自blog.csdn.net/dragon_T1985/article/details/114293389