Using Rasa_NLU/Rasa_NLU_Chi in Python to turn Chinese natural language into structured data (Chinese natural language understanding)

Song • May 7, 2019

I. Files you should have:

  • data/total_word_feature_extractor_zh.dat

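This file is trained from a Chinese corpus with the MITIE wordrep tool (training takes 2-3 days).

To train it yourself, build the MITIE wordrep tool. Note that the Chinese corpus must be tokenized (word-segmented) before it is fed into the training tool. A corpus from a domain close to your use case works best.

A model trained on a Chinese Wikipedia dump and Baidu Baike can be downloaded via the project's Chinese blog.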

  • data/examples/rasa/demo-rasa_zh.json: add as many examples as possible to this file.
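
As a rough illustration of what one entry in data/examples/rasa/demo-rasa_zh.json looks like, the sketch below builds a single example in the Rasa NLU JSON training format (rasa_nlu_data / common_examples) and computes the entity character offsets. The helper names make_example and make_training_data are hypothetical; the intent and entity labels are borrowed from the sample server output later in this post.

```python
import json


def make_example(text, intent, entities):
    """Build one training example; `entities` maps entity value -> entity label.

    Offsets are character offsets into `text`, with `end` exclusive.
    Only the first occurrence of each value is annotated.
    """
    annotated = []
    for value, label in entities.items():
        start = text.find(value)
        if start == -1:
            raise ValueError("entity value %r not found in %r" % (value, text))
        annotated.append({
            "start": start,
            "end": start + len(value),
            "value": value,
            "entity": label,
        })
    return {"text": text, "intent": intent, "entities": annotated}


def make_training_data(examples):
    """Wrap examples in the top-level structure of the Rasa NLU JSON format."""
    return {"rasa_nlu_data": {"common_examples": examples}}


if __name__ == "__main__":
    example = make_example("我发烧了该吃什么药?", "medical", {"发烧": "disease"})
    data = make_training_data([example])
    # ensure_ascii=False keeps the Chinese text readable in the output.
    print(json.dumps(data, ensure_ascii=False, indent=2))
```

Computing the offsets in code rather than by hand avoids off-by-one mistakes, which are easy to make with Chinese text where there are no spaces to count from.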

II. Usage:

1. Clone this project, then run the install:

python setup.py install

2. Modify the configuration.

There are currently two pipelines for Chinese:

  • a. MITIE + Jieba (sample_configs/config_jieba_mitie.yml):
language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_classifier_mitie"
  • b. Recommended: MITIE + Jieba + sklearn (sample_configs/config_jieba_mitie_sklearn.yml):
language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"

3. (Optional) Use a Jieba user-defined dictionary or switch the Jieba default dictionary:

You can set user_dicts to either a file path or a directory path (sample_configs/config_jieba_mitie_sklearn_plus_dict_path.yml):

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
  default_dict: "./default_dict.big"
  user_dicts: "./jieba_userdict"
#  user_dicts: "./jieba_userdict/jieba_userdict.txt"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
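
The user_dicts value can point at a single file or at a directory. How the tokenizer_jieba component resolves it internally is not shown in this post, so the following is only a sketch of one plausible interpretation: a hypothetical helper, resolve_user_dicts, that expands the value into a sorted list of dictionary files. It can be handy for checking a path before training.

```python
import os


def resolve_user_dicts(path):
    """Return the dictionary files that `path` refers to.

    Hypothetical helper: a file path yields that one file, a directory
    path yields every regular file directly inside it (sorted by name).
    """
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        candidates = [os.path.join(path, name) for name in sorted(os.listdir(path))]
        return [p for p in candidates if os.path.isfile(p)]
    raise FileNotFoundError("user_dicts path does not exist: %s" % path)


if __name__ == "__main__":
    # "./jieba_userdict" is the directory used in the sample config.
    try:
        for dict_file in resolve_user_dicts("./jieba_userdict"):
            print(dict_file)
    except FileNotFoundError as err:
        print(err)
```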

4. Train the model:

If you specify a project name in the configuration file, the model is saved under /models/your_project_name; otherwise it is saved under /models/default.

python -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models
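
The server request later in this post refers to a model by a timestamped name such as model_20171019-095732. Assuming models are written as timestamped directories under models/default (an assumption based on the paths and names in this post, not checked against the code), a small hypothetical helper can pick the most recent one:

```python
import os
import re

# Matches names like "model_20171019-095732" (date, then time).
MODEL_NAME = re.compile(r"^model_\d{8}-\d{6}$")


def latest_model(project_dir):
    """Return the newest model directory name in `project_dir`, or None.

    Because the timestamp is zero-padded, the lexicographically largest
    name is also the most recent one.
    """
    names = [
        name for name in os.listdir(project_dir)
        if MODEL_NAME.match(name) and os.path.isdir(os.path.join(project_dir, name))
    ]
    return max(names) if names else None


if __name__ == "__main__":
    project_dir = os.path.join("models", "default")  # assumed layout
    if os.path.isdir(project_dir):
        print(latest_model(project_dir))
    else:
        print("no such directory: %s" % project_dir)
```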

5. Run the rasa_nlu server:

python -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models

6. Test with data

Open a new terminal; you can now curl results from the server, for example:

$ curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "model": "model_20171019-095732"}' | python -mjson.tool

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   652    0   552  100   100    157     28  0:00:03  0:00:03 --:--:--   157
{
    "entities": [
        {
            "end": 3,
            "entity": "disease",
            "extractor": "ner_mitie",
            "start": 1,
            "value": "发烧"
        }
    ],
    "intent": {
        "confidence": 0.5397186422631861,
        "name": "medical"
    },
    "intent_ranking": [
        {
            "confidence": 0.5397186422631861,
            "name": "medical"
        },
        {
            "confidence": 0.16206323981749196,
            "name": "restaurant_search"
        },
        {
            "confidence": 0.1212448457737397,
            "name": "affirm"
        },
        {
            "confidence": 0.10333600028547868,
            "name": "goodbye"
        },
        {
            "confidence": 0.07363727186010374,
            "name": "greet"
        }
    ],
    "text": "我发烧了该吃什么药?"
}
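
The same request can also be sent from Python. The sketch below uses only the standard library; the endpoint, the q, model and project fields and the response shape are taken from the curl examples in this post, while the function names (build_payload, parse, top_intent) are hypothetical. It assumes the server from step 5 is listening on localhost:5000.

```python
import json
import urllib.request


def build_payload(text, model=None, project=None):
    """Build the JSON body for /parse; optional fields are omitted when None."""
    payload = {"q": text}
    if model is not None:
        payload["model"] = model
    if project is not None:
        payload["project"] = project
    return payload


def parse(text, model=None, project=None, url="http://localhost:5000/parse"):
    """POST the text to the server and return the decoded JSON response."""
    body = json.dumps(build_payload(text, model, project)).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def top_intent(result):
    """Return (intent name, confidence) from a parse result."""
    intent = result["intent"]
    return intent["name"], intent["confidence"]


if __name__ == "__main__":
    try:
        result = parse("我发烧了该吃什么药?", model="model_20171019-095732")
    except OSError as err:
        # URLError and connection errors are OSError subclasses.
        print("server not reachable: %s" % err)
    else:
        print(top_intent(result))
        for entity in result["entities"]:
            print(entity["entity"], entity["value"], entity["start"], entity["end"])
```

Leaving project as None sends no project field at all, which matches the curl call above.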

III. Problems encountered during installation:

1. Exception: Unable to load total_word_feature_extractor detector from data/total_word_feature_extractor_zh.dat

Solution: check that the data/total_word_feature_extractor_zh.dat path in sample_configs/config_jieba_mitie_sklearn.yml is correct. For details, see "About the total_word_feature_extractor_zh.dat file".

2. No project found with name 'rasa_nlu_test'

$ curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "rasa_nlu_test", "model": "model_20170921-170911"}' | python -mjson.tool

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   160    0    60  100   100   6093  10155 --:--:-- --:--:-- --:--:-- 11111
{
    "error": "No project found with name 'rasa_nlu_test'."
}

[1]+  Stopped                 curl -XPOST localhost:5000/parse -d '{"q":"我发烧了该吃什么药?", "project": "rasa_nlu_test", "model": "model_20170921-170911"}' | python -mjson.tool

Solution: remove the project parameter from the request.

3. bad value(s) in fds_to_keep

This is caused by a newer sklearn version; downgrade it:

pip3 install scikit-learn==0.19.2

Then retrain the NLU model. See also: "error": "bad value(s) in fds_to_keep".

4. ValueError: bad input shape (1, 5)

If running the Python API raises ValueError: bad input shape (1, 5), see the reference titled "ValueError: bad input shape (1, 5)".


For more, see also: Rasa_NLU_Chi 中文处理实践 (Rasa_NLU_Chi Chinese processing in practice)

Project repository: crownpku/Rasa_NLU_Chi


Original article, published on pytorch中文网; please credit the source when reposting.
Original source: https://ptorch.com/news/243.html