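1. Background

A Spark Structured Streaming job reads from Kafka, applies some transformations, and writes the result back to Kafka. The error below appeared while the job was running. After the checkpoint directory was cleared, the job ran normally again.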
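For orientation, a job of this shape could look roughly like the sketch below. It is not the original job: the broker address, topic names, the uppercase transformation, and the checkpoint path are all placeholders invented for illustration. The point is only to show where the checkpoint location is configured, since that directory holds the offset log the error refers to.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}

object KafkaToKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-kafka-sketch") // hypothetical application name
      .getOrCreate()

    // Read from Kafka (broker address and topic are assumptions).
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .load()

    // Placeholder transformation: cast key/value to strings and uppercase the value.
    val transformed = input.select(
      col("key").cast("string").as("key"),
      upper(col("value").cast("string")).as("value"))

    // Write back to Kafka. The checkpoint directory stores the offset log,
    // so each running query should have its own location.
    val query = transformed.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "output-topic")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-to-kafka-sketch")
      .start()

    query.awaitTermination()
  }
}
```

Running this requires the spark-sql-kafka-0-10 connector on the classpath and a reachable Kafka broker.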

Error message:

assertion failed: Concurrent update to the log. Multiple streaming jobs detected for 4

2. Locating the source code

Spark version 2.3

org.apache.spark.sql.execution.streaming.MicroBatchExecution#constructNextBatch

      updateStatusMessage("Writing offsets to log")
      reportTimeTaken("walCommit") {
        assert(offsetLog.add(
          currentBatchId,
          availableOffsets.toOffsetSeq(sources, offsetSeqMetadata)),
          s"Concurrent update to the log. Multiple streaming jobs detected for $currentBatchId")
        logInfo(s"Committed offsets for batch $currentBatchId. " +
          s"Metadata ${offsetSeqMetadata.toString}")

        // NOTE: The following code is correct because runStream() processes exactly one
        // batch at a time. If we add pipeline parallelism (multiple batches in flight at
        // the same time), this cleanup logic will need to change.

        // Now that we've updated the scheduler's persistent checkpoint, it is safe for the
        // sources to discard data from the previous batch.
        if (currentBatchId != 0) {
          val prevBatchOff = offsetLog.get(currentBatchId - 1)
          if (prevBatchOff.isDefined) {
            prevBatchOff.get.toStreamProgress(sources).foreach {
              case (src: Source, off) => src.commit(off)
              case (reader: MicroBatchReader, off) =>
                reader.commit(reader.deserializeOffset(off.json))
            }
          } else {
            throw new IllegalStateException(s"batch $currentBatchId doesn't exist")
          }
        }
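
The assertion in the excerpt above fails when offsetLog.add returns false, which happens when an entry for currentBatchId already exists in the offset log. One way to reach that state is two queries writing to the same checkpoint directory. The sketch below models this "first writer wins" behaviour with an in-memory map. InMemoryOffsetLog, OffsetLogDemo, and commitBatch are hypothetical names made up for this illustration and are not Spark classes; the real log is file-based.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the offset log. It only models the
// rule that a batch ID can be written once, not Spark's file-based storage.
final class InMemoryOffsetLog {
  private val entries = mutable.Map.empty[Long, String]

  // Stores the offsets and returns true, or returns false if the batch already exists.
  def add(batchId: Long, offsets: String): Boolean = synchronized {
    if (entries.contains(batchId)) false
    else {
      entries.put(batchId, offsets)
      true
    }
  }

  def get(batchId: Long): Option[String] = synchronized { entries.get(batchId) }
}

object OffsetLogDemo {
  // Mirrors the shape of the assert in constructNextBatch, but returns the
  // failure message as a Left instead of throwing.
  def commitBatch(log: InMemoryOffsetLog, batchId: Long, offsets: String): Either[String, Unit] =
    if (log.add(batchId, offsets)) Right(())
    else Left(s"Concurrent update to the log. Multiple streaming jobs detected for $batchId")

  def main(args: Array[String]): Unit = {
    val sharedLog = new InMemoryOffsetLog // plays the role of a shared checkpoint directory

    // First writer commits batch 4 successfully.
    println(commitBatch(sharedLog, 4L, "offsets-from-job-A"))
    // Second writer finds batch 4 already present and gets the error message.
    println(commitBatch(sharedLog, 4L, "offsets-from-job-B"))
  }
}
```

In this model the second commit for batch 4 is rejected and the first writer's offsets stay in the log. That is consistent with the workaround from the background section: clearing the checkpoint directory removes the existing entry, so the next write for that batch succeeds.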

3. Solution

None yet.
