Solr API: cluster management and reading/writing data

1, Managing Solr with the REST API

1.1, Managing configSets

Reference: https://lucene.apache.org/solr/guide/7_7/configsets-api.html

#1, List the default configSets shipped with CDH Solr
[root@test-c6 ~]# solrctl instancedir --list
managedTemplate
managedTemplateSecure
predefinedTemplate
predefinedTemplateSecure
schemalessTemplate
schemalessTemplateSecure

[root@test-c6 ~]# find  /opt/cloudera/parcels/CDH/lib/solr/   -iname "*template*"
/opt/cloudera/parcels/CDH/lib/solr/predefinedTemplate
/opt/cloudera/parcels/CDH/lib/solr/managedTemplateSecure
/opt/cloudera/parcels/CDH/lib/solr/predefinedTemplateSecure
/opt/cloudera/parcels/CDH/lib/solr/managedTemplate
/opt/cloudera/parcels/CDH/lib/solr/schemalessTemplateSecure
/opt/cloudera/parcels/CDH/lib/solr/coreconfig-template
/opt/cloudera/parcels/CDH/lib/solr/schemalessTemplate
/opt/cloudera/parcels/CDH/lib/solr/coreconfig-schemaless-template

# Upload configSet files --- does not work on the CDH build of Solr
#(cd /opt/cloudera/parcels/CDH/lib/solr/predefinedTemplate && zip -r - *) > /tmp/demo.myconfigset.zip
#
#[root@test-c6 tmp]# ll /tmp/demo.myconfigset.zip -h
#-rw-r--r-- 1 root root 145K Dec  4 17:32 /tmp/demo.myconfigset.zip
#
#curl -X POST --header "Content-Type:application/octet-stream" -d @'/tmp/demo.myconfigset.zip'  "http://test-c6:8983/solr/admin/configs?action=upload&name=demo"

#2, List the existing configSets
curl 'http://localhost:8983/solr/admin/configs?action=LIST&omitHeader=true&wt=json'

#3, Create a configSet with a custom name, based on one of the default configs
curl 'http://localhost:8983/solr/admin/configs?action=CREATE&name=demo4&baseConfigSet=predefinedTemplate&configSetProp.immutable=false&wt=json'

#4, Delete a configSet
curl 'http://localhost:8983/solr/admin/configs?action=DELETE&name=demo2'

1.2, Modifying schema.xml

Reference: https://lucene.apache.org/solr/guide/7_7/schema-api.html

  • add-field, replace-field, delete-field
  • add-dynamic-field, replace-dynamic-field, delete-dynamic-field
#  Add a field ----- not applicable to CDH Solr
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
     "name":"sell_by",
     "type":"pdate",
     "stored":true }
}' http://localhost:8983/solr/gettingstarted/schema
  • add-field-type, delete-field-type, replace-field-type
#  Add a field type ----- not applicable to CDH Solr
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
     "name":"myNewTxtField",
     "class":"solr.TextField",
     "positionIncrementGap":"100",
     "analyzer" : {
        "charFilters":[{
           "class":"solr.PatternReplaceCharFilterFactory",
           "replacement":"$1$1",
           "pattern":"([a-zA-Z])\\\\1+" }],
        "tokenizer":{
           "class":"solr.WhitespaceTokenizerFactory" },
        "filters":[{
           "class":"solr.WordDelimiterFilterFactory",
           "preserveOriginal":"0" }]}}
}' http://localhost:8983/solr/gettingstarted/schema

1.3, Modifying solrconfig.xml

curl http://localhost:8983/solr/test1/config -H 'Content-type:application/json'  -d '{
  "add-requesthandler" : {
    "name": "/dataimport",
    "class":"solr.DataImportHandler",
    "defaults":{ "config":"/opt/cloudera/parcels/CDH/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/example-DIH/solr/db/conf/db-data-config.xml" }
  }
}'

1.4, Managing collections

Reference: https://lucene.apache.org/solr/guide/7_7/collections-api.html

Action categories:

  • collection management: CREATE, LIST, DELETE, MODIFYCOLLECTION, RELOAD
  • shard management: SPLITSHARD, CREATESHARD, DELETESHARD
  • replica management: DELETEREPLICA, ADDREPLICA
# Create a collection:
#curl 'http://test-c6:8983/solr/admin/collections' -d 'action=CREATE&name=demo&numShards=1&replicationFactor=1&maxShardsPerNode=2&collection.configName=managedTemplate&wt=json'
curl 'http://test-c6:8983/solr/admin/collections?action=CREATE&name=demo&numShards=1&replicationFactor=1&maxShardsPerNode=2&collection.configName=demo&wt=json'

# List all collections
curl 'http://test-c6:8983/solr/admin/collections?action=LIST'

# Delete a given collection
curl 'http://test-c6:8983/solr/admin/collections?action=DELETE&name=demo'  

# View cluster status: OVERSEERSTATUS, CLUSTERSTATUS
curl 'http://test-c6:8983/solr/admin/collections?action=CLUSTERSTATUS'
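
The admin calls above all share one URL shape: base URL, API name, an action, then extra parameters. Below is a minimal Java sketch that assembles such URLs using only the JDK. AdminUrl and its method are hypothetical helpers, not part of SolrJ, and the host and parameter values are just the ones used in the curl commands above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): builds Collections/ConfigSets admin URLs.
final class AdminUrl {
    /** api is "collections" or "configs"; params are appended in insertion order. */
    static String build(String baseUrl, String api, String action, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl)
                .append("/admin/").append(api)
                .append("?action=").append(encode(action));
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append('&').append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // URL-encodes one key or value so that special characters survive the query string.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "demo");
        params.put("numShards", "1");
        params.put("replicationFactor", "1");
        params.put("collection.configName", "demo");
        System.out.println(build("http://test-c6:8983/solr", "collections", "CREATE", params));
    }
}
```

The same method covers the configSet calls by passing "configs" as the api argument.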

1.2.1, Viewing shard hash ranges

  • curl "http://test-c6:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test32&indent=true&wt=json"
start (hex)                         end (hex)
0                                   2^31-1 = 2147483647 = 0x7fffffff
2^31 = 2147483648 = 0x80000000      2^32-1 = 4294967295 = 0xffffffff
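
The table reads the 32-bit hash space as unsigned numbers. Below is a minimal Java sketch of dividing that space evenly among N shards. It assumes a plain even split; Solr's own partitioning may round boundaries differently, so the CLUSTERSTATUS output remains the authority. HashRanges is a hypothetical name. Bounds are printed as Java renders signed ints in hex, so the range starting at 80000000 is listed first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: even split of the signed 32-bit hash space (not Solr's exact algorithm).
final class HashRanges {
    /** Returns n contiguous ranges as hex "start-end" strings. */
    static List<String> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        long total = 1L << 32;            // number of distinct 32-bit hash values
        long start = Integer.MIN_VALUE;   // 0x80000000 when printed as hex
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // The last range always ends at 0x7fffffff so no hash value is left out.
            long end = (i == n - 1)
                    ? Integer.MAX_VALUE
                    : Integer.MIN_VALUE + total * (i + 1) / n - 1;
            out.add(Integer.toHexString((int) start) + "-" + Integer.toHexString((int) end));
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split(2)); // [80000000-ffffffff, 0-7fffffff]
        System.out.println(split(4));
    }
}
```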


1.2.2, Querying specific shards

[root@test-c6 ~]# curl "http://test-c6:8983/solr/solr_hs2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards=shard1" -s |grep numFound
  "response":{"numFound":389922,"start":0,"maxScore":1.0,"docs":[
  
[root@test-c6 ~]# curl "http://test-c6:8983/solr/solr_hs2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards=shard1,shard2" -s |grep numFound
  "response":{"numFound":779446,"start":0,"maxScore":1.0,"docs":[

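1.2.3, Splitting a shard (router: compositeId)

  • SPLITSHARD: explicit hash ranges can be given for the split. The shard is divided into two pieces that are written to disk as two new shards; the original shard keeps its data unchanged, but it starts re-routing requests to the new shards.
  • curl "http://test-c6:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test32&indent=true&wt=json"
# Parameters
# ranges: comma-separated list of hexadecimal hash ranges, e.g. ranges=0-1f4,1f5-3e8,3e9-5dc
# numSubShards: number of sub-shards to split the parent shard into. Allowed values are 2-8, default 2. This parameter can only be used when neither ranges nor split.key is specified
# splitMethod: rewrite (the default) rebuilds the sub-indexes; link creates copies of the original index files using filesystem-level hard links, so the resulting sub-indexes are still as large as the original because they still contain data from documents that do not belong to their partition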
curl "http://test-c6:8983/solr/admin/collections?action=SPLITSHARD&collection=test1&shard=shard1&numSubShards=2&wt=xml"
curl "http://test-c6:8983/solr/admin/collections?action=DELETESHARD&shard=shard1&collection=test1"
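
Before passing a hand-written ranges value to SPLITSHARD, it can help to check that the sub-ranges are contiguous and together span the parent shard's range. The Java sketch below does that check with only the JDK. RangeCheck is a hypothetical helper, and the premise that the ranges should exactly cover the parent range is my assumption here, not something stated in these notes.

```java
// Hypothetical helper: sanity-checks a SPLITSHARD "ranges" value against the parent range.
final class RangeCheck {
    /** Parses one hex bound as a signed 32-bit value, e.g. "80000000" becomes Integer.MIN_VALUE. */
    static long bound(String hex) {
        return (int) Long.parseLong(hex, 16);
    }

    /** True if the comma-separated sub-ranges are contiguous and span exactly the parent range. */
    static boolean covers(String parent, String ranges) {
        String[] p = parent.split("-");
        long expectedStart = bound(p[0]);
        long parentEnd = bound(p[1]);
        String[] parts = ranges.split(",");
        if (parts.length < 2) {
            return false; // a split needs at least two pieces
        }
        for (String part : parts) {
            String[] r = part.split("-");
            long start = bound(r[0]);
            long end = bound(r[1]);
            if (start != expectedStart || end < start) {
                return false; // gap, overlap, or inverted range
            }
            expectedStart = end + 1;
        }
        return expectedStart == parentEnd + 1;
    }

    public static void main(String[] args) {
        System.out.println(covers("0-5dc", "0-1f4,1f5-3e8,3e9-5dc"));
        System.out.println(covers("0-5dc", "0-1f4,1f6-5dc"));
    }
}
```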

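1.2.4, Creating/deleting shards (router: implicit)

# Router: implicit --> CREATESHARD
# Way 1 to spread data evenly:
#	add the field <field name="_route_" type="string"/> to schema.xml
#	choose the target shard for each doc: doc.addField("_route_", "shard_X");
# Way 2 to spread data evenly (the field must exist and hold the name of an existing shard, otherwise the insert fails):
#	router.field=shard_id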

curl "http://test-c6:8983/solr/admin/collections?action=CREATE&name=test32&router.name=implicit&shards=shard1,shard2&replicationFactor=1&maxShardsPerNode=1&collection.configName=test"
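
With the implicit router the client decides where each doc goes. The Java sketch below shows one possible way to spread docs evenly: derive the shard name from the doc id's hash code and put it in the _route_ field. ImplicitRoute and its methods are hypothetical, the hashing rule is only an illustration, and the plain field map stands in for a SolrInputDocument so the sketch runs without SolrJ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: picks a shard name for each doc of an implicit-router collection.
final class ImplicitRoute {
    /** Maps an id onto one of the shard names using the id's hash code. */
    static String shardFor(String docId, List<String> shardNames) {
        return shardNames.get(Math.floorMod(docId.hashCode(), shardNames.size()));
    }

    /** Field map for one doc; with SolrJ each entry would become a doc.addField(name, value) call. */
    static Map<String, Object> docFields(String id, List<String> shardNames) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("_route_", shardFor(id, shardNames));
        return fields;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2");
        for (String id : new String[] {"1", "2", "3"}) {
            System.out.println(docFields(id, shards));
        }
    }
}
```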

# View the collection state and shard distribution
curl "http://test-c6:8983/solr/admin/collections?action=clusterstatus&collection=test32&indent=true&wt=json"
Response:
{
  "responseHeader":{
    "status":0,
    "QTime":20},
  "cluster":{
    "collections":{
      "test32":{
        "routerSpec":{"name":"implicit"},
        "replicationFactor":"1",
        "shards":{
          "shard1":{
            "range":null,
            "state":"active",
            "replicas":{"core_node1":{
                "core":"test32_shard1_replica1",
                "base_url":"http://test-c6:8983/solr",
                "node_name":"test-c6:8983_solr",
                "state":"active",
                "leader":"true"}}},
          "shard2":{
            "range":null,
            "state":"active",
            "replicas":{"core_node2":{
                "core":"test32_shard2_replica1",
                "base_url":"http://test-c62:8983/solr",
                "node_name":"test-c62:8983_solr",
                "state":"active",
                "leader":"true"}}}},
        "router":"implicit",
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false"}},
    "properties":{"urlScheme":"http"},
    "live_nodes":["test-c6:8983_solr",
      "test-c62:8983_solr"]}}

# Delete a shard: only shards that are inactive, or that have no range (custom sharding), can be deleted
curl "http://test-c6:8983/solr/admin/collections?action=DELETESHARD&shard=shard2&collection=test32"

# Router: implicit --> create a shard with CREATESHARD
curl "http://test-c6:8983/solr/admin/collections?action=CREATESHARD&shard=shard2&collection=test32"

# Querying specific shards: check how many docs each shard holds (total = shard1 + shard2)
[root@test-c6 ~]# curl "http://test-c6:8983/solr/test32/select?q=price%3A%5B++20000+TO+*+%5D&wt=json&indent=true&rows=0"  -s |grep response\"
  "response":{"numFound":9999,"start":0,"maxScore":1.0,"docs":[]
[root@test-c6 ~]# curl "http://test-c6:8983/solr/test32/select?q=price%3A%5B++20000+TO+*+%5D&wt=json&indent=true&rows=0&shards=shard1"  -s |grep response\"
  "response":{"numFound":0,"start":0,"docs":[]
[root@test-c6 ~]# curl "http://test-c6:8983/solr/test32/select?q=price%3A%5B++20000+TO+*+%5D&wt=json&indent=true&rows=0&shards=shard2"  -s |grep response\"
  "response":{"numFound":9999,"start":0,"maxScore":1.0,"docs":[]

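1.2.5, Collection aliases / renaming a core

  • Use case: an old collection has only one shard, and growing traffic requires scaling out to 3 shards:
    create a new collection, import the data, stop the old collection, then create an alias so that the old interfaces keep working
# SolrCloud: collection alias (test1 --> test1, test1_ali)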
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test1_ali&collections=test1"

# Standalone Solr: rename a core (test --> test2)
curl "http://localhost:8089/solr/admin/cores?action=RENAME&core=test&other=test2"

2, Reading and writing data with the API

2.1, Read/write flow

Image source: https://blog.csdn.net/hxpjava1/article/details/78134251

2.2, Reading/writing data and atomic updates in the Solr Admin UI

Reference: https://lucene.apache.org/solr/guide/7_7/uploading-data-with-index-handlers.html#uploading-data-with-index-handlers
Atomic updates: https://cwiki.apache.org/confluence/display/solr/UpdateXmlMessages#UpdateXmlMessages-Optionalattributesfor%22add%22

# Add a doc (overwrite)
<add>
  <doc>
     <field name="id">11</field>
     <field name="manu">Patrick Eagar</field>
  </doc>
</add>

# Modify a doc (atomically add/update/remove fields)
<add>
  <doc>
      <field name="id">1</field>
      <field name="title" update="add">addmutivalue_or_addfiled</field>
      <field name="name" update="set">add_or_update_name</field>   
      <field name="subject" update="set" null="true" />
  </doc>
</add>
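
Atomic-update messages like the one above are easy to get wrong when values contain XML special characters. The Java sketch below renders such a message with escaping, using only the JDK. AtomicXml and its methods are hypothetical helpers, and the sample values are made up for illustration.

```java
// Hypothetical helper: renders an atomic-update <add> message like the one above.
final class AtomicXml {
    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** One field with an update modifier such as "add" or "set"; a null value emits null="true". */
    static String field(String name, String modifier, String value) {
        String open = "<field name=\"" + escape(name) + "\" update=\"" + escape(modifier) + "\"";
        return value == null
                ? open + " null=\"true\"/>"
                : open + ">" + escape(value) + "</field>";
    }

    /** Wraps the id field and the modifier fields in <add><doc>...</doc></add>. */
    static String addMessage(String id, String... fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append("<field name=\"id\">").append(escape(id)).append("</field>");
        for (String f : fields) {
            sb.append(f);
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        System.out.println(addMessage("1",
                field("title", "add", "another title"),
                field("name", "set", "Tom & Jerry"),
                field("subject", "set", null)));
    }
}
```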

# Delete docs
<delete>
    <id>11</id>
    <query>manu:"Patrick Eagar"</query>
</delete>

# Commit immediately
<commit waitSearcher="false"/>
<commit waitSearcher="false" expungeDeletes="true"/>
<optimize waitSearcher="false"/>

Highlighting

  • By default highlighted text is shown in italics: hl.simple.pre/post: <em> xxx </em>
  • To show highlighted text in red: hl.simple.pre/post can be changed to <font color="red"> xxx </font>
  "response": {
    "numFound": 2,
    "start": 0,
    "maxScore": 0.6043929,
    "docs": [
      {
        "id": "id111",
        "title": [
          "t111xx"
        ],
        "_version_": 1686024315139522600
      },
      {
        "id": "id5",
        "title": [
          "123555 test test123 xasdfasdf"
        ],
        "_version_": 1686025007598141400
      }
    ]
  },
  "highlighting": {
    "id111": {
      "title": [
        "<em>t111xx</em>"
      ],
      "id": [
        "<em>id111</em>"
      ]
    },
    "id5": {
      "title": [
        "<em>123555</em> test <em>test123</em> xasdfasdf"
      ]
    }
  }
}
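
When the highlight markers are set through a URL instead of the Admin UI, the markup has to be URL-encoded. The Java sketch below builds such a /select URL with the hl, hl.fl, hl.simple.pre and hl.simple.post parameters, using only the JDK. HighlightUrl is a hypothetical helper, and the core URL, query and field name are assumptions chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a /select URL that requests highlighting with custom markers.
final class HighlightUrl {
    static String build(String coreUrl, String query, String field, String pre, String post) {
        return coreUrl + "/select?q=" + encode(query)
                + "&hl=true"
                + "&hl.fl=" + encode(field)
                + "&hl.simple.pre=" + encode(pre)
                + "&hl.simple.post=" + encode(post);
    }

    // URL-encodes a value; characters such as < > " = / become %XX escapes, spaces become +.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/test1", "title:test", "title",
                "<font color=\"red\">", "</font>"));
    }
}
```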

2.3, Importing text data with post.jar

# Location of the jar
[root@test-c6 solr]# find /opt/cloudera/parcels/ -name post.jar
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/post.jar
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/exampledocs/post.jar
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hue/apps/search/examples/bin/post.jar

# Usage of the jar
[root@test-c6 solr]# java -jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/post.jar   -h
SimplePostTool version 1.5
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
....
Examples:
  java -jar post.jar *.xml
  java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -jar post.jar < hd.xml
  java -Ddata=web -jar post.jar http://example.com/
  java -Dtype=text/csv -jar post.jar *.csv
  java -Dtype=application/json -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
  java -Dauto -jar post.jar *
  java -Dauto -Drecursive -jar post.jar afolder
  java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder

# Import the test data
[root@test-c6 solr]# find /opt/cloudera/parcels/ -name *.xml |grep solr |grep exampledocs
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/multicore/exampledocs/ipod_other.xml
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/multicore/exampledocs/ipod_video.xml

[root@test-c6 solr]# java  -Durl=http://localhost:8983/solr/jcse_shard1_replica1/update \
	-jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/post.jar \
	/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/exampledocs/*.xml
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/jcse_shard1_replica1/update using content-type application/xml..
POSTing file gb18030-example.xml
POSTing file hd.xml
POSTing file ipod_other.xml
POSTing file ipod_video.xml
POSTing file manufacturers.xml
POSTing file mem.xml
POSTing file money.xml
POSTing file monitor2.xml
POSTing file monitor.xml
POSTing file mp500.xml
POSTing file sd500.xml
POSTing file solr.xml
POSTing file utf8-example.xml
POSTing file vidcard.xml
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcse_shard1_replica1/update..
Time spent: 0:00:00.384

2.4, Reading and writing data with SolrJ: performance comparison

Class                        Write time (10,000 docs)                        Read time (10,000 docs)
CloudSolrServer              58,572ms (HttpSolrServer write: 102,330ms)      1,254ms (HttpSolrServer read: 640ms)
ConcurrentUpdateSolrServer   3,787ms (CloudSolrServer write: 58,572ms)
// Add the Maven dependency
 <dependency>
     <groupId>org.apache.solr</groupId>
     <artifactId>solr-solrj</artifactId>
     <version>4.10.3</version>
 </dependency>


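import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

    // Note: HttpSolrClient and its Builder come from later SolrJ releases (6.x and up);
    // with the 4.10.3 dependency above, the corresponding class is HttpSolrServer.
    public static void batchadd() throws IOException, SolrServerException {
        String solrServerUrl = "http://192.168.56.161:8983/solr/test2";
        HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
        // Log in with a username and password
        //CredentialsProvider provider = new BasicCredentialsProvider();
        //UsernamePasswordCredentials credentials = new UsernamePasswordCredentials("user", "passwd");
        //provider.setCredentials(AuthScope.ANY, credentials);
        //httpClientBuilder = httpClientBuilder.addInterceptorFirst(new PreemptiveAuthInterceptor()).setDefaultCredentialsProvider(provider);
        HttpClient httpClient = httpClientBuilder.build();
        HttpSolrClient solr = new HttpSolrClient.Builder().withBaseSolrUrl(solrServerUrl)
                .withHttpClient(httpClient)
                .build();

        List<SolrInputDocument> collect = new ArrayList<>();
        for (int id = 0; id < 9999; id++) {
            SolrInputDocument document = new SolrInputDocument();
            document.addField("id", "" + id);
            document.addField("name", "book3中文" + id);
            document.addField("price", "" + id);
            collect.add(document);
            // Send and commit once 2000 docs have accumulated
            // (the earlier check id % 2000 == 0 also fired at id 0 with a single doc)
            if (collect.size() >= 2000) {
                solr.add(collect);
                solr.commit();
                collect.clear();
            }
        }
        // Send and commit the remaining docs
        if (!collect.isEmpty()) {
            solr.add(collect);
            solr.commit();
            collect.clear();
        }
        solr.close();
    }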
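
The batching step in batchadd() can be isolated and checked without a Solr server. The Java sketch below splits a list into fixed-size batches using only the JDK. Batches is a hypothetical helper; each returned batch would then be passed to an add call followed by a commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive batches of at most `size` items.
final class Batches {
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            // Copy the sub-list so each batch is independent of the source list.
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < 9999; id++) {
            ids.add(id);
        }
        for (List<Integer> batch : partition(ids, 2000)) {
            System.out.println("batch of " + batch.size() + " docs, first id " + batch.get(0));
        }
    }
}
```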



import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

import java.util.Set;

    public static void query() throws SolrServerException, IOException {
        String solrServerUrl = "http://192.168.56.161:8983/solr/test2";
        HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
        // Log in with a username and password
        //CredentialsProvider provider = new BasicCredentialsProvider();
        //UsernamePasswordCredentials credentials = new UsernamePasswordCredentials("user", "passwd");
        //provider.setCredentials(AuthScope.ANY, credentials);
        //httpClientBuilder = httpClientBuilder.addInterceptorFirst(new PreemptiveAuthInterceptor()).setDefaultCredentialsProvider(provider);
        HttpClient httpClient = httpClientBuilder.build();
        HttpSolrClient solr = new HttpSolrClient.Builder().withBaseSolrUrl(solrServerUrl)
                .withHttpClient(httpClient)
                .build();

        SolrQuery solrQuery = new SolrQuery();
        solrQuery.setTimeAllowed(-1);
        solrQuery.setQuery("*:*");
//        solrQuery.setFields("*");
//        solrQuery.setParam("fl", "id, DATETIME, DATE_BIRTH");
//        solrQuery.setParam("sort", "id asc");
//        solrQuery.addSort("id", SolrQuery.ORDER.asc); // Pay attention to this line
//        solrQuery.setRows(0);

        QueryRequest req = new QueryRequest(solrQuery);
        QueryResponse response = req.process(solr);
        long nDocuments = response.getResults().getNumFound();
        System.out.println("Found " + nDocuments + " documents");
        System.out.println(response);
        SolrDocumentList list = response.getResults();
        //long numFound = list.getNumFound();
        // Print every field of every returned doc
        for (SolrDocument doc : list) {
            Set<String> keys = doc.keySet();
            for (String k : keys) {
                Object value = doc.get(k);
                System.out.println(k + "=>" + value);
            }
            System.out.println("---------");
        }
        solr.close();
    }

Atomic updates: adding a field to a doc / changing a field's value

Reference: https://cwiki.apache.org/confluence/display/solr/UpdateXmlMessages#UpdateXmlMessages-Optionalattributesfor%22add%22

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Test2 {
    public static void main(String[] args) throws IOException, SolrServerException {
        String zk = "192.168.56.161:2181";
        String root = "/solr";
        CloudSolrServer solrClient = new CloudSolrServer(zk + root);
        solrClient.setDefaultCollection("test1");
        SolrInputDocument doc = new SolrInputDocument();

        // "add": add a new field, or append a value to a multiValued field (note: duplicating the field raises an error)
        Map<String, String> map1 = new HashMap<String, String>();
        map1.put("add", "title3");
        doc.addField("title", map1);

        // "set": update the field's value if it exists, otherwise add the field
        Map<String, String> map2 = new HashMap<String, String>();
        map2.put("set", "sub2");
        doc.addField("subject", map2);

        // "inc": increment the field's value by the given amount
        Map<String, String> map3 = new HashMap<String, String>();
        map3.put("inc", "10");
        doc.addField("price", map3);

        doc.addField("id", "1");
        solrClient.add(doc);
        solrClient.shutdown();
    }
}


Reposted from blog.csdn.net/eyeofeagle/article/details/110646900