How to Get Row Numbers When Processing Data with Spark

Because Spark processes data in parallel, you cannot simply keep a counter in your driver program to track which element is currently being processed. Spark provides zipWithIndex to attach an index to every element; these indices are globally ordered and unique. From the API documentation:

public RDD<scala.Tuple2<T,Object>> zipWithIndex()
Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index.
This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a spark job when this RDD contains more than one partition.

Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The index assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.
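To make the index assignment concrete, here is a minimal pure-Python sketch (not Spark code) of the scheme the documentation describes: Spark first counts the elements in each partition, then gives each partition's items consecutive indices starting at the cumulative count of all earlier partitions. The partition layout below is made up for illustration.

```python
from itertools import accumulate

def zip_with_index(partitions):
    """Assign global consecutive indices, partition by partition,
    the way RDD.zipWithIndex is documented to behave."""
    # Starting index of each partition = total element count of all
    # earlier partitions (this per-partition count is why Spark must
    # run a job first when there is more than one partition).
    starts = [0] + list(accumulate(len(p) for p in partitions))
    return [[(x, start + i) for i, x in enumerate(part)]
            for part, start in zip(partitions, starts)]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
print(zip_with_index(parts))
# → [[('a', 0), ('b', 1)], [('c', 2)], [('d', 3), ('e', 4), ('f', 5)]]
```

The first item of the first partition gets index 0 and the last item of the last partition gets the largest index, matching the quoted documentation.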

Another function that gives you a unique identifier is zipWithUniqueId. Unlike zipWithIndex, zipWithUniqueId only guarantees uniqueness, not consecutiveness: there may be gaps between the ids. In exchange, it does not need to trigger a Spark job.
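The Spark documentation for zipWithUniqueId states that with n partitions, the items in the kth partition get ids k, n+k, 2n+k, and so on, which is why no cross-partition counting (and hence no job) is needed. A pure-Python sketch of that formula, again with a made-up partition layout:

```python
def zip_with_unique_id(partitions):
    """Assign unique (but not consecutive) ids: item i of partition k
    gets id i * n + k, where n is the number of partitions."""
    n = len(partitions)
    return [[(x, i * n + k) for i, x in enumerate(part)]
            for k, part in enumerate(partitions)]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
print(zip_with_unique_id(parts))
# → [[('a', 0), ('b', 3)], [('c', 1)], [('d', 2), ('e', 5), ('f', 8)]]
```

Note the ids 4, 6 and 7 never appear: every id is unique, but the sequence has gaps whenever the partitions are not all the same size.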

Original post: http://blog.csdn.net/hongchangfirst/article/details/80175839

Author: hongchangfirst

hongchangfirst's homepage: http://blog.csdn.net/hongchangfirst
