I had already integrated carrot2 with solr-4.x successfully, using my customized Chinese tokenizer.
But I ran into some errors when following my series of blog posts http://ylzhj02.iteye.com/blog/2152348 to adapt carrot2 to solr-5.1.0.
The error is:
org.carrot2.util.factory.FallbackFactory; Tokenizer for Chinese Simplified (zh_cn) is not available. This may degrade clustering quality of Chinese Simplified content. Cause: java.lang.NoSuchMethodError: org.apache.lucene.analysis.Tokenizer.<init>(Ljava/io/Reader;)V
The reason is that solr-5.1.0 uses lucene 5.1.0, whereas carrot2-3.10.0 uses lucene 4.6.0, so the jars are incompatible: the Tokenizer(Reader) constructor that the carrot2-3.10.0 code calls no longer exists in lucene 5.x.
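The descriptor <init>(Ljava/io/Reader;)V in the error message is the JVM notation for a constructor that takes a single java.io.Reader. A quick way to confirm this kind of mismatch before deploying is to probe the classpath by reflection. The following is only a minimal sketch: the class name ConstructorProbe and the helper hasConstructor are made up for illustration, and it assumes the lucene jars you want to check are on the classpath when you run it.

```java
import java.io.Reader;

public class ConstructorProbe {
    /**
     * Returns true if the named class can be loaded and declares a constructor
     * (of any visibility) with exactly the given parameter types.
     * Hypothetical helper, not part of carrot2, lucene or solr.
     */
    public static boolean hasConstructor(String className, Class<?>... parameterTypes) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> clazz = Class.forName(className, false,
                    ConstructorProbe.class.getClassLoader());
            // getDeclaredConstructor also finds protected constructors,
            // which matters for abstract base classes
            clazz.getDeclaredConstructor(parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message
        String name = args.length > 0 ? args[0] : "org.apache.lucene.analysis.Tokenizer";
        System.out.println(name + "(java.io.Reader) available: "
                + hasConstructor(name, Reader.class));
    }
}
```

Run it with the same jars that solr loads on the classpath. If it reports false for org.apache.lucene.analysis.Tokenizer, any tokenizer adapter compiled against the old constructor will fail in the way shown above.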
So, the solution is to download the latest version of carrot2 (3.11.0), in which the lucene version is now 5.1.0:
#git clone git://github.com/carrot2/carrot2.git
#cd carrot2
step 1:
#vi core/carrot2-util-text/src/org/carrot2/text/linguistic/DefaultTokenizerFactory.java
add
import org.carrot2.text.linguistic.lucene.InokChineseTokenizerAdapter;
change
map.put(LanguageCode.CHINESE_SIMPLIFIED,
    new NewClassInstanceFactory<ITokenizer>(ChineseTokenizerAdapter.class));
to
map.put(LanguageCode.CHINESE_SIMPLIFIED,
    new NewClassInstanceFactory<ITokenizer>(InokChineseTokenizerAdapter.class));
step 2:
#vi InokChineseTokenizerAdapter.java
#cp chineseTokenizer/InokChineseTokenizerAdapter.java ./core/carrot2-util-text/src/org/carrot2/text/linguistic/lucene/
step 3:
#mkdir lib/org.lionsoul.jcseg
├── build.properties
├── jcseg-core-1.9.6.jar
├── jcseg.LICENSE
└── META-INF
    └── MANIFEST.MF
The contents of the files are as follows.
build.properties
bin.includes = META-INF/,\
               jcseg-core-1.9.6.jar,\
               jcseg.LICENSE
META-INF/MANIFEST.MF
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Jcseg Tokenizer
Bundle-SymbolicName: org.lionsoul.jcseg
Bundle-Version: 1.9.6
Bundle-ClassPath: jcseg-core-1.9.6.jar
Bundle-Vendor: INokNok Inc.
Bundle-RequiredExecutionEnvironment: JavaSE-1.6
step 4: modify build.xml so that the jcseg jar is included
<patternset id="lib.test">
  <include name="core/**/*.jar" />
  <include name="lib/**/*.jar" />
  <include name="lib/org.lionsoul.jcseg/*.jar" />
  <exclude name="lib/org.slf4j/slf4j-nop*" />
  <include name="applications/carrot2-dcs/**/*.jar" />
  <include name="applications/carrot2-webapp/lib/*.jar" />
  <include name="applications/carrot2-benchmarks/lib/*.jar" />
</patternset>

<patternset id="lib.core">
  <include name="lib/**/*.jar" />
  <include name="lib/org.lionsoul.jcseg/*.jar" />
  <include name="core/carrot2-util-matrix/lib/*.jar" />
  <patternset refid="lib.core.excludes" />
</patternset>

<patternset id="lib.core.mini">
  <include name="lib/**/mahout-*.jar" />
  <include name="lib/**/jcseg*.jar" />
  <include name="lib/**/mahout.LICENSE" />
  <include name="lib/**/colt.LICENSE" />
  <include name="lib/**/commons-lang*" />
  <include name="lib/**/guava*" />
  <include name="lib/**/jackson*" />
  <include name="lib/**/lucene-snowball*" />
  <include name="lib/**/lucene.LICENSE" />
  <include name="lib/**/hppc-*.jar" />
  <include name="lib/**/hppc*.LICENSE" />

  <include name="lib/**/slf4j-api*.jar" />
  <include name="lib/**/slf4j-nop*.jar" />
  <include name="lib/**/slf4j.LICENSE" />

  <include name="lib/**/attributes-binder-*.jar" />
</patternset>

<target name="core" depends="jar, jar.src, lib-no-jar.flattened" description="Builds Carrot2 Java API JAR with dependencies">
  <delete dir="${api.dir}" failonerror="false" />
  <mkdir dir="${api.dir}" />
  <mkdir dir="${api.dir}/lib" />
  <mkdir dir="${api.dir}/examples" />
  <mkdir dir="${api.dir}/resources" />

  <patternset id="carrot2.required">
    <include name="**/jcseg*" />
    <include name="**/commons-lang*" />
step 6:
#ant jar
#scp tmp/jar/carrot2-core-3.11.0-SNAPSHOT.jar [email protected]:/opt/solr/contrib/clustering/lib
carrot2-core-3.11.0-SNAPSHOT.jar
Restart the solr server to test clustering.
-----------------------------
An error happens:
org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/carrotsearch/hppc/ObjectHashSet
Solution:
#scp lib/com.carrotsearch.hppc/hppc-0.7.1.jar [email protected]:/opt/solr/contrib/clustering/lib/
hppc-0.7.1.jar
#rm -f /opt/solr/contrib/clustering/lib/hppc-0.5.2.jar
------
Another error is:
java.lang.RuntimeException: java.lang.IllegalAccessError: class com.carrotsearch.hppc.ObjectHashSet cannot access its superclass com.carrotsearch.hppc.AbstractObjectCollection
The reason is that there is an old hppc-0.5.2.jar in /opt/solr/server/webapps/solr.war, so the solution is to replace it and repack the war:
#cd /opt/solr/server/solr-webapp/webapp
#rm -f WEB-INF/lib/hppc-0.5.2.jar
#cp hppc-0.7.1.jar WEB-INF/lib
#jar cf solr.war ./
#mv solr.war /opt/solr/server/webapps
After restarting solr, the error disappears.
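Both hppc errors above come from two versions of the same library sitting in different lib directories. To spot such conflicts up front, something like the following sketch can scan a set of directories and report artifacts that appear with more than one version. It is only an illustration: the class DuplicateJarFinder and its methods are hypothetical names, the version detection is a simple file-name heuristic (name-version.jar), and it assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class DuplicateJarFinder {
    // Heuristic: "hppc-0.7.1.jar" -> artifact "hppc", version "0.7.1"
    private static final Pattern JAR_NAME =
            Pattern.compile("^(.+?)-(\\d[\\w.\\-]*)\\.jar$");

    /** Returns the artifact part of a jar file name, or null if no version suffix is found. */
    public static String artifactName(String fileName) {
        Matcher m = JAR_NAME.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    /** Maps each artifact that occurs under more than one file name to the paths found. */
    public static Map<String, List<Path>> findDuplicates(List<Path> roots) throws IOException {
        Map<String, List<Path>> byName = new TreeMap<>();
        for (Path root : roots) {
            // Walk each root recursively and group versioned jars by artifact name
            try (Stream<Path> files = Files.walk(root)) {
                files.filter(Files::isRegularFile).forEach(p -> {
                    String name = artifactName(p.getFileName().toString());
                    if (name != null) {
                        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(p);
                    }
                });
            }
        }
        // Drop artifacts whose jars all share one file name (same version everywhere)
        byName.values().removeIf(paths ->
                paths.stream().map(p -> p.getFileName().toString()).distinct().count() < 2);
        return byName;
    }

    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        findDuplicates(roots).forEach((name, paths) ->
                System.out.println(name + " -> " + paths));
    }
}
```

For the setup described here it could be run with /opt/solr/contrib/clustering/lib and /opt/solr/server/solr-webapp/webapp/WEB-INF/lib as arguments. Any artifact it lists, such as hppc, is a candidate for the kind of cleanup done above.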