Next, let's look at consumer performance testing.
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --help
Missing required argument "[topic]"
Option Description
------ -----------
--batch-size <Integer: size> Number of messages to write in a
single batch. (default: 200)
--broker-list <String: host> REQUIRED (unless old consumer is
used): A broker list to use for
connecting if using the new consumer.
--compression-codec <Integer: If set, messages are sent compressed
supported codec: NoCompressionCodec (default: 0)
as 0, GZIPCompressionCodec as 1,
SnappyCompressionCodec as 2,
LZ4CompressionCodec as 3>
--consumer.config <String: config file> Consumer config properties file.
--date-format <String: date format> The date format to use for formatting
the time field. See java.text.
SimpleDateFormat for options.
(default: yyyy-MM-dd HH:mm:ss:SSS)
--fetch-size <Integer: size> The amount of data to fetch in a
single request. (default: 1048576)
--from-latest If the consumer does not already have
an established offset to consume
from, start with the latest message
present in the log rather than the
earliest message.
--group <String: gid> The group id to consume on. (default:
perf-consumer-26926)
--help Print usage.
--hide-header If set, skips printing the header for
the stats
--message-size <Integer: size> The size of each message. (default:
100)
--messages <Long: count> REQUIRED: The number of messages to
send or consume
--new-consumer Use the new consumer implementation.
This is the default.
--num-fetch-threads <Integer: count> Number of fetcher threads. (default: 1)
--reporting-interval <Integer: Interval in milliseconds at which to
interval_ms> print progress info. (default: 5000)
--show-detailed-stats If set, stats are reported for each
reporting interval as configured by
reporting-interval
--socket-buffer-size <Integer: size> The size of the tcp RECV size.
(default: 2097152)
--threads <Integer: count> Number of processing threads.
(default: 10)
--topic <String: topic> REQUIRED: The topic to consume from.
--zookeeper <String: urls> REQUIRED (only when using old
consumer): The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over. This
option is only used with the old
consumer.
Those are the parameter descriptions; now let's start testing.
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --num-fetch-threads 1 --reporting-interval 5000 --threads 10 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:14:02:794, 2018-07-02 15:14:08:885, 976.9717, 160.3959, 1000419, 164245.4441
With 1 fetch thread and 10 processing threads consuming 1,000,000 messages, the run averaged 164245.4441 msg/s and 160.3959 MB/s. Let's increase the fetch threads and continue.
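As a sanity check, the MB.sec and nMsg.sec columns can be reproduced from the start/end timestamps and the consumed totals. A minimal sketch in Python, using the numbers from the run above:

```python
from datetime import datetime

# Timestamps use the tool's default date format: yyyy-MM-dd HH:mm:ss:SSS
FMT = "%Y-%m-%d %H:%M:%S:%f"
start = datetime.strptime("2018-07-02 15:14:02:794", FMT)
end = datetime.strptime("2018-07-02 15:14:08:885", FMT)
elapsed = (end - start).total_seconds()  # 6.091 seconds

messages = 1000419   # data.consumed.in.nMsg
mb = 976.9717        # data.consumed.in.MB

print(messages / elapsed)  # ~= reported nMsg.sec of 164245.4441
print(mb / elapsed)        # ~= reported MB.sec of 160.3959
```

The averages in every row of the output are simply totals divided by wall-clock elapsed time.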
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --num-fetch-threads 10 --reporting-interval 5000 --threads 10 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:17:45:956, 2018-07-02 15:17:51:506, 976.9717, 176.0309, 1000419, 180255.6757
After increasing the fetch threads to 10 as well, the averages rose somewhat, to 180255.6757 msg/s and 176.0309 MB/s.
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --num-fetch-threads 10 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:19:24:626, 2018-07-02 15:19:30:463, 976.9805, 167.3772, 1000428, 171394.2094
Increasing the processing threads to 20 brought no improvement, so the bottleneck is not in the processing threads. Let's keep tuning other parameters.
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --num-fetch-threads 20 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:21:32:301, 2018-07-02 15:21:36:755, 976.9717, 219.3470, 1000419, 224611.3606
After raising the fetch threads to 20, performance improved markedly: the roughly 6-second consumption time of the earlier runs dropped to just over 4 seconds, reaching 224611.3606 msg/s and 219.3470 MB/s. Clearly, adding more fetchers helps consumer throughput a great deal; just keep your topic's partition count in mind.
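The partition caveat exists because, within a consumer group, each partition is consumed by at most one consumer, so threads beyond the partition count sit idle. A toy sketch of that constraint (the partition and thread counts below are hypothetical, not test003's actual configuration):

```python
def assign_partitions(num_partitions, num_consumers):
    """Round-robin partitions across consumers, Kafka-group style:
    each partition goes to exactly one consumer, and any consumer
    beyond the partition count receives nothing."""
    assignment = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment

# Hypothetical: a 6-partition topic read by 20 consumer threads.
a = assign_partitions(6, 20)
idle = [c for c, parts in a.items() if not parts]
print(len(idle))  # 14 of the 20 threads have no partition to read from
```

So scaling fetch threads past the partition count buys nothing; the partition count is the upper bound on consumer parallelism.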
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --batch-size 400 --num-fetch-threads 20 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:24:15:713, 2018-07-02 15:24:22:712, 976.9805, 139.5886, 1000428, 142938.705
Next we raised batch-size (the number of messages per batch) to 400. Performance dropped rather than improved, so this parameter is not a bigger-is-better knob; it needs a sensible value.
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --num-fetch-threads 20 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:21:32:301, 2018-07-02 15:21:36:755, 976.9717, 219.3470, 1000419, 224611.3606
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --fetch-size 2000000 --num-fetch-threads 20 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:27:48:063, 2018-07-02 15:27:52:589, 976.8652, 215.8341, 1000310, 221014.1405
Building on the previous settings, we raised fetch-size from the default 1048576 to 2,000,000. Performance barely changed, which suggests the three machines may already be near their limit.
Next, test the impact of message size.
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --num-fetch-threads 20 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:35:29:318, 2018-07-02 15:35:34:346, 976.9805, 194.3080, 1000428, 198971.3604
[root@hadoop-sh1-core1 bin]# ./kafka-consumer-perf-test.sh --broker-list hadoop-sh1-master1:9092,hadoop-sh1-master2:9092,hadoop-sh1-core1:9092 --message-size 25 --num-fetch-threads 20 --reporting-interval 5000 --threads 20 --topic test003 --messages 1000000
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2018-07-02 15:34:44:607, 2018-07-02 15:34:49:050, 976.9717, 219.8901, 1000419, 225167.4544
The default message size is 100; lowering it to 25 produced a clear improvement in throughput, so message size is also a contributing factor.
From these tests we can conclude that num-fetch-threads (fetch threads) and threads (processing threads) have the biggest impact, while message-size, batch-size, and fetch-size have some effect.
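Pulling the runs above together, the relative gains are easy to see when tabulated. A small sketch using only the figures measured in this post:

```python
# (fetch threads, processing threads, extra option) -> measured nMsg.sec
runs = {
    ("fetch=1",  "threads=10", "-"):                  164245.4441,
    ("fetch=10", "threads=10", "-"):                  180255.6757,
    ("fetch=10", "threads=20", "-"):                  171394.2094,
    ("fetch=20", "threads=20", "-"):                  224611.3606,
    ("fetch=20", "threads=20", "batch-size=400"):     142938.705,
    ("fetch=20", "threads=20", "fetch-size=2000000"): 221014.1405,
    ("fetch=20", "threads=20", "message-size=25"):    225167.4544,
}

baseline = runs[("fetch=1", "threads=10", "-")]
for cfg, rate in sorted(runs.items(), key=lambda kv: -kv[1]):
    print(f"{' '.join(cfg):45s} {rate:12.1f} msg/s  x{rate / baseline:.2f}")
```

Sorting by throughput makes the pattern explicit: the fetch-thread increases account for most of the gain over the baseline, while batch-size=400 is the only change that fell below it.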