Storm Stream Grouping

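In Storm, a developer can specify which task(s) of a downstream bolt will process the tuples emitted by an upstream spout or bolt. Storm calls this choice the grouping of the stream, or stream grouping. The six main groupings are: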

  • Shuffle Grouping or None Grouping
  • Fields Grouping
  • All Grouping
  • Global Grouping
  • LocalOrShuffle Grouping
  • Direct Grouping

1. Shuffle Grouping or None Grouping

1.1 Definition

    Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.

    None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).
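
To make the "random but equal" guarantee concrete, here is a minimal sketch of one way to achieve it: shuffle the list of target tasks once, then walk it round-robin. The class name ShuffleGroupingSketch and the helper countPerTask are hypothetical, and the strategy is an assumption for illustration rather than a description of Storm's internals. In a real topology the grouping is declared with shuffleGrouping(...) (or noneGrouping(...)) on the bolt returned by TopologyBuilder.setBolt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative stand-in, not Storm's own class.
class ShuffleGroupingSketch {
    private final List<Integer> order; // target task ids in a random order
    private int next = 0;              // position of the next task to use

    ShuffleGroupingSketch(List<Integer> targetTasks, long seed) {
        this.order = new ArrayList<>(targetTasks);
        Collections.shuffle(this.order, new Random(seed));
    }

    // Pick the task for the next tuple: random order, round-robin traversal.
    int chooseTask() {
        int task = order.get(next);
        next = (next + 1) % order.size();
        return task;
    }

    // Route numTuples tuples over tasks 0..numTasks-1 and count per task.
    static int[] countPerTask(int numTasks, int numTuples, long seed) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(tasks, seed);
        int[] counts = new int[numTasks];
        for (int i = 0; i < numTuples; i++) {
            counts[grouping.chooseTask()]++;
        }
        return counts;
    }
}
```

With 4 tasks and 1000 tuples, every task ends up with 250 tuples regardless of the seed, because the round-robin walk visits each task equally often.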





2. Fields Grouping

2.1 Definition

    The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.
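
The "same value, same task" property can be sketched as a hash of the grouping field values taken modulo the number of tasks. The class FieldsGroupingSketch is hypothetical and the hash-mod rule is an assumption for illustration; the definition above only promises the property, not the exact function. In a real topology the grouping is declared with fieldsGrouping(componentId, new Fields("user-id")).

```java
import java.util.List;

// Illustrative stand-in, not Storm's own class.
class FieldsGroupingSketch {
    // groupingValues holds only the values of the fields named in the grouping,
    // e.g. the single "user-id" value of a tuple.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }
}
```

Two tuples carrying the same "user-id" produce the same hash and therefore the same task index. Tuples with different values may or may not collide on one task, which matches the "may go to different tasks" wording.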






3. All Grouping

3.1 Definition

    The stream is replicated across all the bolt's tasks. Use this grouping with care.
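
The reason for the "use with care" warning is easy to see in numbers: every tuple is delivered once per task, so traffic grows with the bolt's parallelism. The sketch below uses a hypothetical class AllGroupingSketch to show this; in a real topology the grouping is declared with allGrouping(componentId).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in, not Storm's own class.
class AllGroupingSketch {
    // Every target task receives a copy of every tuple.
    static List<Integer> chooseTasks(List<Integer> targetTasks) {
        return new ArrayList<>(targetTasks);
    }

    // Total deliveries for a stream of the given size.
    static long deliveries(long tuples, int numTasks) {
        return tuples * numTasks;
    }
}
```

For example, 1000 tuples sent to a bolt with 8 tasks result in 8000 deliveries.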





4. Global Grouping

4.1 Definition

    The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.
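
The selection rule stated above, "the task with the lowest id", can be sketched in one line. The class GlobalGroupingSketch is hypothetical; in a real topology the grouping is declared with globalGrouping(componentId).

```java
import java.util.Collections;
import java.util.List;

// Illustrative stand-in, not Storm's own class.
class GlobalGroupingSketch {
    // Every tuple goes to the single task with the lowest id.
    static int chooseTask(List<Integer> targetTasks) {
        return Collections.min(targetTasks);
    }
}
```

Because the result depends only on the task list, every upstream task picks the same destination, which is what makes this grouping suitable for a final aggregation step.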




5. LocalOrShuffle Grouping

5.1 Definition

    If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.


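In other words, if some tasks of the downstream bolt run in the same worker process as some tasks of the upstream spout/bolt, then all tuples emitted by those upstream tasks are processed by the downstream bolt's tasks in that same process; otherwise, this grouping is equivalent to shuffle grouping.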
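
The two-step decision can be sketched as: first narrow the candidates to the tasks in the local worker, and fall back to the full task list only if none are local. The class LocalOrShuffleSketch and its parameters are hypothetical, and a plain random pick stands in for the shuffle step. In a real topology the grouping is declared with localOrShuffleGrouping(componentId).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative stand-in, not Storm's own class.
class LocalOrShuffleSketch {
    // targetTasks: all tasks of the downstream bolt.
    // localTasks: ids of all tasks running in the emitter's worker process.
    static List<Integer> candidates(List<Integer> targetTasks, Set<Integer> localTasks) {
        List<Integer> local = new ArrayList<>();
        for (int task : targetTasks) {
            if (localTasks.contains(task)) {
                local.add(task);
            }
        }
        // No in-process target: behave like a normal shuffle over all tasks.
        return local.isEmpty() ? new ArrayList<>(targetTasks) : local;
    }

    static int chooseTask(List<Integer> targetTasks, Set<Integer> localTasks, Random rnd) {
        List<Integer> c = candidates(targetTasks, localTasks);
        return c.get(rnd.nextInt(c.size()));
    }
}
```

Keeping tuples inside the worker process avoids network transfer between workers, at the cost of a possibly uneven load if the local tasks are few.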


6. Direct Grouping

6.1 Definition

    This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the emitDirect methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext or by keeping track of the output of the emit method in OutputCollector (which returns the task ids that the tuple was sent to).
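
What sets direct grouping apart is that the routing rule lives in the producer's code rather than in the grouping. The sketch below simulates this without Storm: the hypothetical class DirectGroupingSketch holds the consumer's task ids (in Storm these would come from the TopologyContext), and its own emitDirect method merely mimics the idea of the collector's emitDirect by appending to an in-memory inbox. The "route by shard number" rule is an assumption chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not Storm's own class.
class DirectGroupingSketch {
    private final List<Integer> consumerTasks;                       // known consumer task ids
    private final Map<Integer, List<String>> inbox = new HashMap<>(); // simulated deliveries

    DirectGroupingSketch(List<Integer> consumerTasks) {
        this.consumerTasks = new ArrayList<>(consumerTasks);
        for (int task : consumerTasks) {
            inbox.put(task, new ArrayList<>());
        }
    }

    // Deliver a tuple to exactly the task the producer named.
    void emitDirect(int taskId, String tuple) {
        if (!inbox.containsKey(taskId)) {
            throw new IllegalArgumentException("unknown consumer task " + taskId);
        }
        inbox.get(taskId).add(tuple);
    }

    // Producer-side rule (hypothetical): map a shard number onto a consumer task.
    void route(int shard, String tuple) {
        int taskId = consumerTasks.get(Math.floorMod(shard, consumerTasks.size()));
        emitDirect(taskId, tuple);
    }

    List<String> received(int taskId) {
        return inbox.get(taskId);
    }
}
```

With consumer tasks 11, 12 and 13, shards 0 and 3 both map to task 11 and shard 4 maps to task 12. In a real topology the consumer would additionally subscribe with directGrouping(componentId), and the producer would declare its output stream as a direct stream.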






