Creating a file stream in a standalone Spark program

cd /usr/local/spark/mycode/streaming/logfile
vim FileStreaming.py

#!/usr/bin/env python3

from pyspark import SparkContext,SparkConf
from pyspark.streaming import StreamingContext

conf = SparkConf()
conf.setAppName('TestStream').setMaster('local[2]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)  # batch interval: one batch every 10 seconds
# Monitor the directory; only files that appear after the stream starts are read
lines = ssc.textFileStream('file:///usr/local/spark/mycode/streaming/logfile')
words = lines.flatMap(lambda x: x.split(' '))  # split each line into words on spaces
wordsCounts = words.map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)
wordsCounts.pprint()
ssc.start()
ssc.awaitTermination()
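
The stream only reacts to files that show up in the monitored directory while the program is running, so testing it needs something that drops new files there. Below is a minimal sketch of such a feeder, not part of the original post: the names write_log_file and feed, the log-N.txt file names, and the sample text are all made up for illustration. It writes each file under a dot-prefixed temporary name first and then renames it, on the assumption that the stream skips dot-prefixed files and should never see a half-written one.

```python
import os
import time


def write_log_file(directory, text, name):
    """Create `name` inside `directory` with `text` and return its full path.

    The text is written to a dot-prefixed temporary file first and then
    renamed, so the final file appears in the directory in one step.
    """
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, name)
    tmp_path = os.path.join(directory, "." + name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(text)
    # Rename within the same directory; replaces any existing file of that name.
    os.replace(tmp_path, final_path)
    return final_path


def feed(directory, lines, interval=5.0):
    """Write one new file per entry in `lines`, pausing `interval` seconds between files."""
    paths = []
    for i, line in enumerate(lines):
        # Hypothetical naming scheme: log-0.txt, log-1.txt, ...
        paths.append(write_log_file(directory, line + "\n", "log-%d.txt" % i))
        time.sleep(interval)
    return paths
```

With FileStreaming.py running in one terminal, calling feed('/usr/local/spark/mycode/streaming/logfile', ['hello spark', 'hello streaming']) from a second Python session should make word counts show up in the batches that follow. Files that already existed before the stream started are not counted.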

Reposted from blog.csdn.net/qq_45371603/article/details/104615794