Ziyu Big Data, Introduction to Spark tutorial (Python edition): http://dblab.xmu.edu.cn/blog/1709-2/
Ziyu Big Data, Introduction to Spark tutorial (Scala edition): http://dblab.xmu.edu.cn/blog/spark/
https://blog.csdn.net/FlySky1991
PySpark pandas UDF: https://www.imooc.com/article/269724
Quickly adapting pandas code with Pandas_UDF: https://www.cnblogs.com/wkang/p/10255043.html
Running xgboost models in parallel in PySpark: https://www.jianshu.com/p/3930039d298a
PySpark study series: https://blog.csdn.net/suzyu12345/article/category/6653162
PySpark Study Notes (1): https://blog.csdn.net/FlySky1991/article/details/79493830
PySpark Study Notes (2) - Basic RDD operations: https://blog.csdn.net/FlySky1991/article/details/79556131
PySpark Study Notes (3) - Basic DataFrame operations: https://blog.csdn.net/FlySky1991/article/details/79569846
PySpark Study Notes (4) - Introduction to MLlib and ML: https://blog.csdn.net/FlySky1991/article/details/79671106
PySpark Study Notes (5) - Text feature processing: https://blog.csdn.net/FlySky1991/article/details/79761506
PySpark Study Notes (6) - Data processing: https://blog.csdn.net/FlySky1991/article/details/79897334
PySpark Study Notes (7) - Data cleaning: https://blog.csdn.net/FlySky1991/article/details/81239851
PySpark Machine Learning (1) - Random forest: https://blog.csdn.net/FlySky1991/article/details/80054421
PySpark Machine Learning (2) - GBDT: https://blog.csdn.net/FlySky1991/article/details/80080673
PySpark Machine Learning (3) - LR and SVM: https://blog.csdn.net/FlySky1991/article/details/80182501
PySpark Machine Learning (4) - KMeans and GMM: https://blog.csdn.net/FlySky1991/article/details/80226373
Python Machine Learning (1) - Outlier detection: https://blog.csdn.net/FlySky1991/article/details/80526257
Sending email with Python: https://blog.csdn.net/FlySky1991/article/details/80396296
Finding optimal parameters with grid search in PySpark: https://www.cnblogs.com/cymx66688/p/10699018.html
Learning PySpark:
Data: http://www.tomdrabas.com/data/LearningPySpark/
Git: https://github.com/drabastomek/learningPySpark
Python Spark API: https://blog.csdn.net/qingqing7/article/details/79251264
Big data learning resources (Spark, Hive, Kafka): https://www.iteblog.com
Practical notes on big data ETL with pandas and PySpark: https://cloud.tencent.com/developer/article/1384456
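Cluster Python environment:
import os
# Set these before the SparkSession is created; the paths are specific to this cluster.
os.environ['SPARK_HOME'] = '/usr/local/workspace/spark-2.1.0-bin-hadoop2.7'
# Interpreter for the workers (and for the driver unless overridden below).
os.environ['PYSPARK_PYTHON'] = '/usr/local/bin/python3.5'
# Driver interpreter; should resolve to the same minor version as PYSPARK_PYTHON.
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
from pyspark.sql import SparkSession
# A standalone master URL needs the spark:// scheme in front of host:port.
spark = SparkSession\
    .builder\
    .enableHiveSupport()\
    .master("spark://xxx.xxx.xxx.xxx:7077")\
    .appName("my_first_app_name")\
    .getOrCreate()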
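A quick pre-flight check can catch the two most common mistakes in a setup like the one above: a missing environment variable, and a standalone master URL that lacks the spark://host:port form. The sketch below is only an illustration using the standard library; the helper names missing_env_vars and master_url_problem are hypothetical (not part of PySpark), and it deliberately ignores other valid masters such as local[*] or yarn.

```python
import os

# Variables the setup above relies on.
REQUIRED_VARS = ("SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def master_url_problem(master_url):
    """Return a message if the URL is not of the form spark://host:port, else None.

    Assumption: only Spark standalone masters are checked here.
    """
    prefix = "spark://"
    if not master_url.startswith(prefix):
        return "master URL should start with spark://"
    host, sep, port = master_url[len(prefix):].rpartition(":")
    if not sep or not host or not port.isdigit():
        return "master URL should end with host:port"
    return None


if __name__ == "__main__":
    print("missing variables:", missing_env_vars())
    print("master URL problem:", master_url_problem("spark://xxx.xxx.xxx.xxx:7077"))
```

Running the checks before calling getOrCreate() gives a readable message instead of a long JVM stack trace when something is misconfigured.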
Spark Python API official documentation, Chinese edition: pyspark.sql (part 1): https://www.bbsmax.com/A/QW5YW89Bzm/
Spark Python API official documentation, Chinese edition: pyspark.sql (part 2): https://www.bbsmax.com/A/gVdnYnDD5W/
Official pyspark.sql documentation: http://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html
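Connecting to Spark and a Spark cluster from Python 3: https://www.168seo.cn/pyspark/24754.html
PySpark explained in detail, plus adding xgboost support: http://haihome.top/2018/06/16/dist-xgb.html
Setting up a machine learning environment and developing models with PySpark: https://www.jianshu.com/p/5a5fc30a7a70
XGBoost parameters explained in detail: https://blog.csdn.net/iyuanshuo/article/details/80142730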