1, Managing Solr with the REST API
1.1, Managing configSets
Reference: https://lucene.apache.org/solr/guide/7_7/configsets-api.html
#1, List the default configSets shipped with CDH Solr
[root@test-c6 ~]# solrctl instancedir --list
managedTemplate
managedTemplateSecure
predefinedTemplate
predefinedTemplateSecure
schemalessTemplate
schemalessTemplateSecure
[root@test-c6 ~]# find /opt/cloudera/parcels/CDH/lib/solr/ -iname "*template*"
/opt/cloudera/parcels/CDH/lib/solr/predefinedTemplate
/opt/cloudera/parcels/CDH/lib/solr/managedTemplateSecure
/opt/cloudera/parcels/CDH/lib/solr/predefinedTemplateSecure
/opt/cloudera/parcels/CDH/lib/solr/managedTemplate
/opt/cloudera/parcels/CDH/lib/solr/schemalessTemplateSecure
/opt/cloudera/parcels/CDH/lib/solr/coreconfig-template
/opt/cloudera/parcels/CDH/lib/solr/schemalessTemplate
/opt/cloudera/parcels/CDH/lib/solr/coreconfig-schemaless-template
#Upload a configSet as a zip file --- does not work on the CDH build of Solr
#(cd /opt/cloudera/parcels/CDH/lib/solr/predefinedTemplate && zip -r - *) > /tmp/demo.myconfigset.zip
#
#[root@test-c6 tmp]# ll /tmp/demo.myconfigset.zip -h
#-rw-r--r-- 1 root root 145K Dec 4 17:32 /tmp/demo.myconfigset.zip
#
#curl -X POST --header "Content-Type:application/octet-stream" -d @'/tmp/demo.myconfigset.zip' "http://test-c6:8983/solr/admin/configs?action=upload&name=demo"
#2, List the existing configSets
curl 'http://localhost:8983/solr/admin/configs?action=LIST&omitHeader=true&wt=json'
#3, Create a configSet with a custom name, based on one of the default configSets
curl 'http://localhost:8983/solr/admin/configs?action=CREATE&name=demo4&baseConfigSet=predefinedTemplate&configSetProp.immutable=false&wt=json'
#4, Delete a configSet
curl 'http://localhost:8983/solr/admin/configs?action=DELETE&name=demo2'
1.2, Modifying schema.xml
Reference: https://lucene.apache.org/solr/guide/7_7/schema-api.html
- add-field, replace-field, delete-field
- add-dynamic-field, replace-dynamic-field, delete-dynamic-field
# Add a field ----- not applicable to CDH Solr
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
"name":"sell_by",
"type":"pdate",
"stored":true }
}' http://localhost:8983/solr/gettingstarted/schema
- add-field-type, delete-field-type, replace-field-type
# Add a field type ----- not applicable to CDH Solr
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field-type" : {
"name":"myNewTxtField",
"class":"solr.TextField",
"positionIncrementGap":"100",
"analyzer" : {
"charFilters":[{
"class":"solr.PatternReplaceCharFilterFactory",
"replacement":"$1$1",
"pattern":"([a-zA-Z])\\\\1+" }],
"tokenizer":{
"class":"solr.WhitespaceTokenizerFactory" },
"filters":[{
"class":"solr.WordDelimiterFilterFactory",
"preserveOriginal":"0" }]}}
}' http://localhost:8983/solr/gettingstarted/schema
1.3, Modifying solrconfig.xml
curl http://localhost:8983/solr/test1/config -H 'Content-type:application/json' -d '{
"add-requesthandler" : {
"name": "/dataimport",
"class":"solr.DataImportHandler",
"defaults":{ "config":"/opt/cloudera/parcels/CDH/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/example-DIH/solr/db/conf/db-data-config.xml" }
}
}'
1.4, Managing collections
Reference: https://lucene.apache.org/solr/guide/7_7/collections-api.html
Actions by category:
- collection management: CREATE, LIST, DELETE, MODIFYCOLLECTION, RELOAD
- shard management: SPLITSHARD, CREATESHARD, DELETESHARD
- replica management: DELETEREPLICA, ADDREPLICA
#Create a collection:
#curl 'http://test-c6:8983/solr/admin/collections' -d 'action=CREATE&name=demo&numShards=1&replicationFactor=1&maxShardsPerNode=2&collection.configName=managedTemplate&wt=json'
curl 'http://test-c6:8983/solr/admin/collections?action=CREATE&name=demo&numShards=1&replicationFactor=1&maxShardsPerNode=2&collection.configName=demo&wt=json'
#List all collections
curl 'http://test-c6:8983/solr/admin/collections?action=LIST'
#Delete the specified collection
curl 'http://test-c6:8983/solr/admin/collections?action=DELETE&name=demo'
#View the cluster status: OVERSEERSTATUS, CLUSTERSTATUS
curl 'http://test-c6:8983/solr/admin/collections?action=CLUSTERSTATUS'
1.2.1, Viewing shard hash ranges
curl "http://test-c6:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test32&indent=true&wt=json"
start (hex) | end (hex) |
---|---|
0 | 2^31-1 = 2147483647 = 0x7fffffff |
2^31 = 2147483648 = 0x80000000 | 2^32-1 = 4294967295 = 0xffffffff |
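The two ranges in the table can be reproduced by cutting the 32-bit hash space into equal slices. The sketch below is only an illustration, not Solr's own partitioning code; the class name `HashRanges` and the method `partition` are made up for this example. It assumes the hash is treated as a signed 32-bit integer, which is why the slice starting at 80000000 (negative values) comes first.

```java
import java.util.ArrayList;
import java.util.List;

class HashRanges {
    // Splits the signed 32-bit hash space [0x80000000, 0x7fffffff] into n contiguous
    // ranges and returns them as "start-end" hex strings.
    static List<String> partition(int n) {
        List<String> ranges = new ArrayList<>();
        long total = 1L << 32;            // number of distinct 32-bit hash values
        long start = Integer.MIN_VALUE;   // 0x80000000 when printed as hex
        for (int i = 0; i < n; i++) {
            // the last range absorbs any remainder so that the whole space is covered
            long end = (i == n - 1) ? Integer.MAX_VALUE : start + total / n - 1;
            ranges.add(Integer.toHexString((int) start) + "-" + Integer.toHexString((int) end));
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints the ranges for a hypothetical 2-shard collection
        System.out.println(partition(2));
    }
}
```

With two shards this yields the same two halves as the table, only listed in signed order. The result can be compared with the "range" values that CLUSTERSTATUS reports for a compositeId collection.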
1.2.2, Querying specific shards
[root@test-c6 ~]# curl "http://test-c6:8983/solr/solr_hs2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards=shard1" -s |grep numFound
"response":{
"numFound":389922,"start":0,"maxScore":1.0,"docs":[
[root@test-c6 ~]# curl "http://test-c6:8983/solr/solr_hs2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards=shard1,shard2" -s |grep numFound
"response":{
"numFound":779446,"start":0,"maxScore":1.0,"docs":[
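1.2.3, Splitting a shard (router: compositeId)
- SPLITSHARD: can be given explicit hash ranges (it splits the shard into two parts and writes them to disk as two new shards. The original shard keeps its data unchanged, but requests start to be rerouted to the new shards)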
curl "http://test-c6:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test32&indent=true&wt=json"
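#Parameters
#ranges: comma-separated list of hexadecimal hash ranges, e.g. ranges=0-1f4,1f5-3e8,3e9-5dc
#numSubShards: number of sub-shards to split the parent shard into. Allowed values are 2 to 8, default 2. This parameter can only be used when neither ranges nor split.key is specified
#splitMethod: rewrite (default) recreates the sub-indexes; link creates copies of the original index files with filesystem-level hard links, so the resulting sub-indexes are still as large as the original index because they still contain data from documents that do not belong to the partition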
curl "http://test-c6:8983/solr/admin/collections?action=SPLITSHARD&collection=test1&shard=shard1&numSubShards=2&wt=xml"
- DELETESHARD
Only non-active slices can be deleted: unload the core with the admin web UI (the red button under Core Admin)
https://solr.apache.org/guide/6_6/coreadmin-api.html: unload core
curl "http://test-c6:8983/solr/admin/collections?action=DELETESHARD&shard=shard1&collection=test1"
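1.2.4, Creating/deleting a shard (router: implicit)
#Router: implicit --> CREATESHARD
# Way 1 to distribute documents evenly:
# add the field <field name="_route_" type="string"/> to schema.xml
# set the shard each doc goes to: doc.addField("_route_", "shard_X");
# Way 2 to distribute documents evenly (the field must exist and hold the name of an existing shard, otherwise the insert fails):
# router.field=shard_id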
curl "http://test-c6:8983/solr/admin/collections?action=CREATE&name=test32&router.name=implicit&shards=shard1,shard2&replicationFactor=1&maxShardsPerNode=1&collection.configName=test"
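With the implicit router the client decides which shard a document goes to, so "even" distribution has to be done by the caller. A minimal round-robin sketch follows, assuming documents are numbered from 0 and that the shard names match those passed in `shards=` at creation time; the class `RouteAssigner` and the method `shardFor` are hypothetical names. The returned value is what would be put into the `_route_` field (way 1) or into the field named by `router.field` (way 2).

```java
import java.util.Arrays;
import java.util.List;

class RouteAssigner {
    // Round-robin: doc 0 goes to the first shard, doc 1 to the second, and so on.
    static String shardFor(long docIndex, List<String> shards) {
        return shards.get((int) Math.floorMod(docIndex, (long) shards.size()));
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2");
        for (long i = 0; i < 4; i++) {
            // with SolrJ this would become doc.addField("_route_", shardFor(i, shards))
            System.out.println("doc " + i + " -> _route_=" + shardFor(i, shards));
        }
    }
}
```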
#View the collection state and shard distribution
curl "http://test-c6:8983/solr/admin/collections?action=clusterstatus&collection=test32&indent=true&wt=json"
Response: {
"responseHeader":{
"status":0,
"QTime":20},
"cluster":{
"collections":{
"test32":{
"routerSpec":{
"name":"implicit"},
"replicationFactor":"1",
"shards":{
"shard1":{
"range":null,
"state":"active",
"replicas":{
"core_node1":{
"core":"test32_shard1_replica1",
"base_url":"http://test-c6:8983/solr",
"node_name":"test-c6:8983_solr",
"state":"active",
"leader":"true"}}},
"shard2":{
"range":null,
"state":"active",
"replicas":{
"core_node2":{
"core":"test32_shard2_replica1",
"base_url":"http://test-c62:8983/solr",
"node_name":"test-c62:8983_solr",
"state":"active",
"leader":"true"}}}},
"router":"implicit",
"maxShardsPerNode":"1",
"autoAddReplicas":"false"}},
"properties":{
"urlScheme":"http"},
"live_nodes":["test-c6:8983_solr",
"test-c62:8983_solr"]}}
#Delete a shard: only shards that are inactive, or that have no range (custom sharding), can be deleted
curl "http://test-c6:8983/solr/admin/collections?action=DELETESHARD&shard=shard2&collection=test32"
#Router: implicit --> CREATESHARD creates a shard
curl "http://test-c6:8983/solr/admin/collections?action=CREATESHARD&shard=shard2&collection=test32"
#Query specific shards: check the number of docs held by each shard (total = shard1 + shard2)
[root@test-c6 ~]# curl "http://test-c6:8983/solr/test32/select?q=price%3A%5B++20000+TO+*+%5D&wt=json&indent=true&rows=0" -s |grep response\"
"response":{
"numFound":9999,"start":0,"maxScore":1.0,"docs":[]
[root@test-c6 ~]# curl "http://test-c6:8983/solr/test32/select?q=price%3A%5B++20000+TO+*+%5D&wt=json&indent=true&rows=0&shards=shard1" -s |grep response\"
"response":{
"numFound":0,"start":0,"docs":[]
[root@test-c6 ~]# curl "http://test-c6:8983/solr/test32/select?q=price%3A%5B++20000+TO+*+%5D&wt=json&indent=true&rows=0&shards=shard2" -s |grep response\"
"response":{
"numFound":9999,"start":0,"maxScore":1.0,"docs":[]
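1.2.5, Collection aliases / renaming a core
- Use case: an old collection has only one shard and, as the business grows, has to be expanded to 3 shards:
create a new collection, import the data, stop the old collection, then create an alias so that the old interface keeps working
#SolrCloud: collection alias (test1 --> test1, test1_ali)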
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test1_ali&collections=test1"
#standalone Solr: rename a core (test --> test2)
curl "http://localhost:8089/solr/admin/cores?action=RENAME&core=test&other=test2"
2, Reading and writing data with the API
2.1, Read/write flow
Image source: https://blog.csdn.net/hxpjava1/article/details/78134251
2.2, Reading/writing data and atomic updates in the Solr admin UI
Reference: https://lucene.apache.org/solr/guide/7_7/uploading-data-with-index-handlers.html#uploading-data-with-index-handlers
Atomic updates: https://cwiki.apache.org/confluence/display/solr/UpdateXmlMessages#UpdateXmlMessages-Optionalattributesfor%22add%22
#Add a doc (overwrites any existing doc with the same id)
<add>
<doc>
<field name="id">11</field>
<field name="manu">Patrick Eagar</field>
</doc>
</add>
#Modify a doc (atomically add/update/remove fields)
<add>
<doc>
<field name="id">1</field>
<field name="title" update="add">addmutivalue_or_addfiled</field>
<field name="name" update="set">add_or_update_name</field>
<field name="subject" update="set" null="true" />
</doc>
</add>
#Delete docs
<delete>
<id>11</id>
<query>manu:"Patrick Eagar"</query>
</delete>
#Commit immediately
<commit waitSearcher="false"/>
<commit waitSearcher="false" expungeDeletes="true"/>
<optimize waitSearcher="false"/>
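When such update messages are generated by a program rather than typed into the admin UI, field values have to be XML-escaped. The following sketch assembles an atomic-update `<add>` message as a string; it is only an illustration, the class `AtomicUpdateXml` and its methods are hypothetical names, and sending the result to the `/update` handler is left out.

```java
class AtomicUpdateXml {
    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Renders one <field>; op is "add", "set" or "inc", or null for a plain value.
    static String field(String name, String op, String value) {
        String attr = (op == null) ? "" : " update=\"" + escape(op) + "\"";
        return "<field name=\"" + escape(name) + "\"" + attr + ">" + escape(value) + "</field>";
    }

    // Wraps the id field and the already rendered fields into <add><doc>...</doc></add>.
    static String update(String id, String... fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append(field("id", null, id));
        for (String f : fields) {
            sb.append(f);
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        System.out.println(update("1", field("name", "set", "a & b")));
    }
}
```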
Highlighting
- By default the highlighted text is wrapped in <em> (shown in italics): hl.simple.pre/post: <em> xxx </em>
- To show the highlighted text in red, change hl.simple.pre/post to <font color="red"> xxx </font>
"response": {
"numFound": 2,
"start": 0,
"maxScore": 0.6043929,
"docs": [
{
"id": "id111",
"title": [
"t111xx"
],
"_version_": 1686024315139522600
},
{
"id": "id5",
"title": [
"123555 test test123 xasdfasdf"
],
"_version_": 1686025007598141400
}
]
},
"highlighting": {
"id111": {
"title": [
"<em>t111xx</em>"
],
"id": [
"<em>id111</em>"
]
},
"id5": {
"title": [
"<em>123555</em> test <em>test123</em> xasdfasdf"
]
}
}
}
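A response like the one above comes from a select request with highlighting switched on. The sketch below builds such a request URL with `hl`, `hl.fl`, `hl.simple.pre` and `hl.simple.post`; the class `HighlightQuery`, the core URL and the query string are assumptions made for illustration, and the URL is only built, not sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class HighlightQuery {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds <coreUrl>/select?... with highlighting enabled on one field.
    static String url(String coreUrl, String query, String field, String pre, String post) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");
        params.put("hl.fl", field);
        params.put("hl.simple.pre", pre);
        params.put("hl.simple.post", post);
        params.put("wt", "json");
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "=" + encode(e.getValue()));
        }
        return coreUrl + "/select?" + joined;
    }

    public static void main(String[] args) {
        // hypothetical core and query; red font instead of the default <em>
        System.out.println(url("http://localhost:8983/solr/test1", "title:test",
                "title", "<font color=\"red\">", "</font>"));
    }
}
```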
2.3, Importing text data with post.jar
#Location of the jar
[root@test-c6 solr]# find /opt/cloudera/parcels/ -name post.jar
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/post.jar
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/exampledocs/post.jar
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hue/apps/search/examples/bin/post.jar
#Usage of the jar
[root@test-c6 solr]# java -jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/post.jar -h
SimplePostTool version 1.5
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
....
Examples:
java -jar post.jar *.xml
java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'
java -Ddata=stdin -jar post.jar < hd.xml
java -Ddata=web -jar post.jar http://example.com/
java -Dtype=text/csv -jar post.jar *.csv
java -Dtype=application/json -jar post.jar *.json
java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
java -Dauto -jar post.jar *
java -Dauto -Drecursive -jar post.jar afolder
java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder
#Import the test data
[root@test-c6 solr]# find /opt/cloudera/parcels/ -name *.xml |grep solr |grep exampledocs
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/multicore/exampledocs/ipod_other.xml
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/multicore/exampledocs/ipod_video.xml
[root@test-c6 solr]# java -Durl=http://localhost:8983/solr/jcse_shard1_replica1/update \
-jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/jars/post.jar \
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/share/doc/solr-doc-4.10.3+cdh5.12.0+513/example/exampledocs/*.xml
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/jcse_shard1_replica1/update using content-type application/xml..
POSTing file gb18030-example.xml
POSTing file hd.xml
POSTing file ipod_other.xml
POSTing file ipod_video.xml
POSTing file manufacturers.xml
POSTing file mem.xml
POSTing file money.xml
POSTing file monitor2.xml
POSTing file monitor.xml
POSTing file mp500.xml
POSTing file sd500.xml
POSTing file solr.xml
POSTing file utf8-example.xml
POSTing file vidcard.xml
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcse_shard1_replica1/update..
Time spent: 0:00:00.384
2.4, Reading/writing data with SolrJ: performance comparison
Operation | Class | Write time (10,000 docs) | Read time (10,000 docs) |
---|---|---|---|
Read | CloudSolrServer | 58,572 ms (HttpSolrServer write: 102,330 ms) | 1,254 ms (HttpSolrServer read: 640 ms) |
Write | ConcurrentUpdateSolrServer | 3,787 ms (CloudSolrServer write: 58,572 ms) | - |
//Add the Maven dependency
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>4.10.3</version>
</dependency>
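//imports needed by batchadd() and query()
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
//note: HttpSolrClient does not exist in SolrJ 4.10.3 (that release provides HttpSolrServer);
//batchadd() and query() need a newer SolrJ release that has HttpSolrClient.Builder
public static void batchadd() throws IOException, SolrServerException {
    String solrServerUrl = "http://192.168.56.161:8983/solr/test2";
    HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
    //log in with username and password
    //CredentialsProvider provider = new BasicCredentialsProvider();
    //UsernamePasswordCredentials credentials = new UsernamePasswordCredentials("user", "passwd");
    //provider.setCredentials(AuthScope.ANY, credentials);
    //httpClientBuilder = httpClientBuilder.addInterceptorFirst(new PreemptiveAuthInterceptor()).setDefaultCredentialsProvider(provider);
    HttpClient httpClient = httpClientBuilder.build();
    HttpSolrClient solr = new HttpSolrClient.Builder().withBaseSolrUrl(solrServerUrl)
            .withHttpClient(httpClient)
            .build();
    List<SolrInputDocument> collect = new ArrayList<>();
    for (int id = 0; id < 9999; id++) {
        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", "" + id);
        document.addField("name", "book3中文" + id);
        document.addField("price", "" + id);
        collect.add(document);
        //flush and commit once 2000 docs have accumulated
        //(the previous check, id % 2000 == 0, already fired at id 0 with a single doc)
        if (collect.size() >= 2000) {
            solr.add(collect);
            solr.commit();
            collect.clear();
        }
    }
    //add and commit the remaining docs
    if (!collect.isEmpty()) {
        solr.add(collect);
        solr.commit();
    }
    solr.close();
}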
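The batching used in batchadd() can be looked at separately from Solr. Here is a small sketch of splitting a list into fixed-size batches, so that each batch could be handed to one add() call followed by one commit; the class `Batcher` and the method `batches` are hypothetical names, not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.List;

class Batcher {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 9999; i++) {
            ids.add(i);
        }
        for (List<Integer> batch : batches(ids, 2000)) {
            // a real client would call add(...) and commit() once per batch here
            System.out.println("batch of " + batch.size());
        }
    }
}
```

For 9999 documents and a batch size of 2000 this gives four full batches and a last one of 1999 documents.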
public static void query() throws SolrServerException, IOException {
    String solrServerUrl = "http://192.168.56.161:8983/solr/test2";
    HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
    //log in with username and password
    //CredentialsProvider provider = new BasicCredentialsProvider();
    //UsernamePasswordCredentials credentials = new UsernamePasswordCredentials("user", "passwd");
    //provider.setCredentials(AuthScope.ANY, credentials);
    //httpClientBuilder = httpClientBuilder.addInterceptorFirst(new PreemptiveAuthInterceptor()).setDefaultCredentialsProvider(provider);
    HttpClient httpClient = httpClientBuilder.build();
    HttpSolrClient solr = new HttpSolrClient.Builder().withBaseSolrUrl(solrServerUrl)
            .withHttpClient(httpClient)
            .build();
    SolrQuery solrQuery = new SolrQuery();
    solrQuery.setTimeAllowed(-1);
    solrQuery.setQuery("*:*");
    // solrQuery.setFields("*");
    // solrQuery.setParam("fl", "id, DATETIME, DATE_BIRTH");
    // solrQuery.setParam("sort", "id asc");
    // solrQuery.addSort("id", SolrQuery.ORDER.asc); // Pay attention to this line
    // solrQuery.setRows(0);
    QueryRequest req = new QueryRequest(solrQuery);
    QueryResponse response = req.process(solr);
    long nDocuments = response.getResults().getNumFound();
    System.out.println("Found " + nDocuments + " documents");
    System.out.println(response);
    SolrDocumentList list = response.getResults();
    //long numFound = list.getNumFound();
    //print every field of every returned doc
    for (SolrDocument doc : list) {
        Set<String> keys = doc.keySet();
        for (String k : keys) {
            Object value = doc.get(k);
            System.out.println(k + "=>" + value);
        }
        System.out.println("---------");
    }
    solr.close();
}
Atomic update: add a field to a doc / modify the value of a field
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class Test2 {
    public static void main(String[] args) throws IOException, SolrServerException {
        String zk = "192.168.56.161:2181";
        String root = "/solr";
        CloudSolrServer solrClient = new CloudSolrServer(zk + root);
        solrClient.setDefaultCollection("test1");
        SolrInputDocument doc = new SolrInputDocument();
        //add a new field, or append a value to a multiValued field; note that a duplicate field raises an error
        Map<String, String> map1 = new HashMap<String, String>();
        map1.put("add", "title3");
        doc.addField("title", map1);
        //update the value of a field: updated if the field exists, otherwise the field is added
        Map<String, String> map2 = new HashMap<String, String>();
        map2.put("set", "sub2");
        doc.addField("subject", map2);
        //increment the value of a field by a number
        Map<String, String> map3 = new HashMap<String, String>();
        map3.put("inc", "10");
        doc.addField("price", map3);
        //the id selects the doc to update
        doc.addField("id", "1");
        solrClient.add(doc);
        //commit so that the update becomes visible to searches
        solrClient.commit();
        solrClient.shutdown();
    }
}