Netty package version conflict
Error output
java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()I
at org.apache.spark.network.util.NettyUtils.createPooledByteBufAllocator(NettyUtils.java:113)
at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:106)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.spark2CreateContext(BaseSparkScalaInterpreter.scala:263)
at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.createSparkContext(BaseSparkScalaInterpreter.scala:182)
at org.apache.zeppelin.spark.SparkScala211Interpreter.open(SparkScala211Interpreter.scala:90)
at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:102)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:616)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
For the problem above, searching Google for "Zeppelin java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()" returns https://stackoverflow.com/questions/50388919/spark-2-3-java-lang-nosuchmethoderror-io-netty-buffer-pooledbytebufallocator-me as the first result, and the other answers give the same fix: replace Zeppelin's netty jar with Spark's. First check which netty versions Spark and Zeppelin each ship:
# Run in the Spark installation directory
$ ls jars | grep netty
netty-3.9.9.Final.jar
netty-all-4.1.17.Final.jar
# Run in the Zeppelin installation directory
$ ls lib | grep netty
netty-all-4.0.23.Final.jar
Spark's netty-all jar is version 4.1.17 and Zeppelin's is 4.0.23, so replace Zeppelin's copy:
# Run in the Zeppelin installation directory
$ rm lib/netty-all-4.0.23.Final.jar
$ cp ~/spark-2.4.4-bin-hadoop2.7/jars/netty-all-4.1.17.Final.jar lib/
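Then click the gear button in the upper-right corner of the Notebook page.
In the menu that pops up, click the restart button in front of spark.
Click OK in the restart confirmation dialog, click Save once the restart completes, and then rerun the first two lines of code.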
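Note: restarting the spark interpreter is useful in many situations, for example after changing configuration or when a Spark job fails. It is convenient because Zeppelin itself does not have to be restarted.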
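The manual `ls | grep` comparison above can be wrapped in a small helper so the same check is easy to repeat for other jars. This is a minimal sketch, not part of the original procedure: the function names `jar_versions` and `compare_jar` are hypothetical, and it assumes jars follow the `<artifact>-<version>.jar` naming seen in the listings above.

```shell
# Print the version(s) of one artifact found in a directory of jars.
# Usage: jar_versions <dir> <artifact>
jar_versions() {
  local dir="$1" artifact="$2" f base
  # "-[0-9]" keeps "netty" from also matching "netty-all-..." jars.
  for f in "$dir"/"$artifact"-[0-9]*.jar; do
    [ -e "$f" ] || continue          # glob matched nothing
    base=${f##*/}                    # strip the directory part
    base=${base#"$artifact"-}        # strip "<artifact>-"
    echo "${base%.jar}"              # strip ".jar"
  done
}

# Report whether two directories carry the same version of an artifact.
# Usage: compare_jar <artifact> <dir_a> <dir_b>   (returns 1 on mismatch)
compare_jar() {
  local artifact="$1" a b
  a=$(jar_versions "$2" "$artifact")
  b=$(jar_versions "$3" "$artifact")
  if [ "$a" = "$b" ]; then
    echo "$artifact: same ($a)"
  else
    echo "$artifact: MISMATCH ($a vs $b)"
    return 1
  fi
}
```

From the Zeppelin installation directory, something like `compare_jar netty-all ~/spark-2.4.4-bin-hadoop2.7/jars lib` would flag the difference found by hand above.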
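zlib library not available
Searching Google for "zipimport.ZipImportError: can't decompress data; zlib not available" suggests that the cause is a Python build compiled without zlib support, in which case zlib has to be installed and Python recompiled. I kept that as a fallback. Because my system has both python2 and python3 installed, I first changed the spark interpreter's zeppelin.pyspark.python property to python3, in the same way master was changed earlier. After restarting the interpreter and rerunning the code, the error changed again: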
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
: java.lang.ExceptionInInitializerError
at org.apache.spark.SparkContext.withScope(SparkContext.scala:699)
at org.apache.spark.SparkContext.parallelize(SparkContext.scala:716)
at org.apache.spark.api.python.PythonRDD$.readRDDFromInputStream(PythonRDD.scala:195)
at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:175)
at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.8.11-1
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:747)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
... 16 more
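The next section resolves this error. Note, however, that if the zlib error persists after switching to python3, you may need to recompile Python or install a build that includes the zlib library.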
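Before deciding whether to recompile, it may help to check which installed interpreter can actually import zlib. The following is a minimal sketch; the helper name `has_zlib` is hypothetical, and the interpreter names `python2` and `python3` are assumed to be on `PATH` as described above.

```shell
# Return 0 if the given interpreter can import zlib, non-zero otherwise
# (including when the interpreter itself cannot be found).
# Usage: has_zlib <python-executable>
has_zlib() {
  "$1" -c 'import zlib' >/dev/null 2>&1
}

# Check each interpreter that zeppelin.pyspark.python might point to.
for py in python2 python3; do
  if has_zlib "$py"; then
    echo "$py: zlib available"
  else
    echo "$py: zlib missing or interpreter not found"
  fi
done
```

An interpreter reported as missing zlib is a candidate for the reinstall or recompile route.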
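jackson-databind package version conflict
The error above is clear: the versions are incompatible. As before, check which jackson versions Spark and Zeppelin each ship:
# Run in the Spark installation directory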
$ ls jars | grep jackson-databind
jackson-databind-2.6.7.1.jar
# Run in the Zeppelin installation directory
$ ls lib | grep jackson
jackson-databind-2.8.11.1.jar
Sure enough, the versions differ. As before, replace Zeppelin's jar with Spark's.
After restarting the interpreter, the error is finally gone.
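The original text does not show the commands for the jackson replacement. A sketch along the lines of the netty step might look like the following; the function name `swap_jar` and the default backup location (`<dst_dir>.bak`, next to the target directory) are assumptions for illustration. Unlike the `rm` used earlier, it moves the old jar aside so the change can be undone.

```shell
# Replace <artifact> jars in <dst_dir> with the ones from <src_dir>.
# Old jars are moved to a backup directory outside <dst_dir> instead of deleted.
# Usage: swap_jar <artifact> <src_dir> <dst_dir> [backup_dir]
swap_jar() {
  local artifact="$1" src="$2" dst="$3" backup="${4:-$3.bak}" f
  # Collect the source jars; refuse to touch anything if there are none.
  set -- "$src"/"$artifact"-[0-9]*.jar
  if [ ! -e "$1" ]; then
    echo "no $artifact jar found in $src" >&2
    return 1
  fi
  mkdir -p "$backup" || return 1
  # Move the existing copies out of the target directory.
  for f in "$dst"/"$artifact"-[0-9]*.jar; do
    if [ -e "$f" ]; then
      mv "$f" "$backup"/
    fi
  done
  # Copy in the source version(s).
  cp "$@" "$dst"/
}
```

Run from the Zeppelin installation directory, this would be called as `swap_jar jackson-databind ~/spark-2.4.4-bin-hadoop2.7/jars lib`, followed by the interpreter restart described earlier. Whether other jackson jars also need swapping is not covered in the original text.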