Reading MySQL data with PySpark:
Sample code 1:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
dataframe_mysql = sqlContext.read.format("jdbc").options(url="jdbc:mysql://127.0.0.1:3306/spark_db", driver="com.mysql.jdbc.Driver", dbtable="spark_table", user="root", password="root").load()
dataframe_mysql.show()
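The long `.options(...)` call above can get hard to read; the same parameters can be kept in a plain dict and unpacked with `.options(**opts)`. A minimal sketch (the helper name `mysql_jdbc_options` is my own; host, database, and credentials are the placeholder values from the example above):

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Assemble the option dict expected by sqlContext.read.format("jdbc").options(**opts)."""
    return {
        "url": "jdbc:mysql://{}:{}/{}".format(host, port, database),
        "driver": "com.mysql.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }

opts = mysql_jdbc_options("127.0.0.1", 3306, "spark_db", "spark_table", "root", "root")
# With a live SQLContext this would be used as:
# dataframe_mysql = sqlContext.read.format("jdbc").options(**opts).load()
```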
Sample code 2:
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("spark://train01:7077","LDASample")
sqlContext=SQLContext(sc)
jdbcDf=sqlContext.read.format("jdbc").options(url="jdbc:mysql://10.10.10.10:3306/adl",driver="com.mysql.jdbc.Driver",dbtable="(SELECT code,title,description FROM project) tmp",user="mouren",password="mouren").load()
jdbcDf.select('description').show(2)
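Sample code 2 relies on a JDBC detail worth calling out: the `dbtable` option accepts either a table name or a parenthesized subquery followed by an alias, so the column pruning in `SELECT code,title,description FROM project` happens on the MySQL side. A small helper to build that form (the name `wrap_query` is my own, not part of any API):

```python
def wrap_query(sql, alias="tmp"):
    """Wrap a SELECT statement into the '(query) alias' form the JDBC dbtable option requires."""
    return "({}) {}".format(sql.strip(), alias)

dbtable = wrap_query("SELECT code, title, description FROM project")
# -> "(SELECT code, title, description FROM project) tmp"
```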
Prerequisite: in the config file /etc/spark/conf/spark-env.sh, add:
+export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar
This configuration sometimes produces a warning (SPARK_CLASSPATH is deprecated in Spark 1.0+):
WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to ':/opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar' as a work-around.
Solution:
Remove the configuration above and edit spark-defaults.conf instead:
+spark.executor.extraClassPath /opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar
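Note that spark.executor.extraClassPath only covers the executors; for a JDBC read the driver process typically needs the connector on its classpath as well, which spark.driver.extraClassPath sets the same way. A sketch of the relevant spark-defaults.conf lines, reusing the same jar path:

```
spark.driver.extraClassPath   /opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar
spark.executor.extraClassPath /opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar
```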