Setup
Single-node cluster on my local laptop: 8 cores, JVM heap Xms=8G, Xmx=8G
Indexing performance (Single index):
10 million payments, each about 5 KB, indexed with a batch size of 10,000. Each batch takes roughly 2.5 s to 4 s; the total time to index all 10 million payments is around 50 min.
Indexing performance (Multiple indices):
10 million payments in total, spread across 20 separate indices. Indexing is slightly faster than in the single-index case: each batch takes roughly 1.7 s to 3.8 s, and the total time to index all 10 million payments is around 38 min.
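To make the two runs easier to compare, the reported totals can be turned into per-batch averages and document rates. The following is a minimal sketch that only does arithmetic on the figures above (10 million documents, batch size 10,000, 50 min and 38 min); the function names are made up for illustration and nothing here talks to Elasticsearch.

```kotlin
import kotlin.math.roundToInt

// Figures taken from the measurements above.
const val TOTAL_DOCS = 10_000_000
const val BATCH_SIZE = 10_000

/** Number of bulk requests needed to send all documents (ceiling division). */
fun batchCount(totalDocs: Int, batchSize: Int): Int =
    (totalDocs + batchSize - 1) / batchSize

/** Average seconds per batch implied by a total run time in minutes. */
fun avgSecondsPerBatch(totalMinutes: Double, batches: Int): Double =
    totalMinutes * 60.0 / batches

/** Average indexing rate in documents per second. */
fun docsPerSecond(totalDocs: Int, totalMinutes: Double): Double =
    totalDocs / (totalMinutes * 60.0)

fun main() {
    val batches = batchCount(TOTAL_DOCS, BATCH_SIZE)
    for ((label, minutes) in listOf("single index" to 50.0, "20 indices" to 38.0)) {
        val avg = avgSecondsPerBatch(minutes, batches)
        val rate = docsPerSecond(TOTAL_DOCS, minutes).roundToInt()
        println("$label: $batches batches, avg $avg s/batch, about $rate docs/s")
    }
}
```

With these inputs the run consists of 1,000 batches. The single-index case averages 3.0 s per batch (about 3,333 docs/s), and the 20-index case averages 2.28 s per batch (about 4,386 docs/s). Both averages fall inside the per-batch ranges reported above.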
Parameters required for bulk load operation
Elasticsearch config: http.max_content_length: 500mb
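A rough size check shows why this limit matters for large batches. The sketch below estimates the bulk request body as batch size times document size and compares it with the limit. It assumes 1 KB = 1024 bytes and exactly 5 KB per payment, and it ignores the per-document action line of the bulk format, so the real body is somewhat larger. The function names are illustrative only.

```kotlin
// Rough size check for a bulk request body against http.max_content_length.
// Assumption: 5 KB per payment, 1 KB = 1024 bytes, action lines ignored.

const val DOC_SIZE_BYTES = 5L * 1024            // "about 5KB" per payment
const val LIMIT_BYTES = 500L * 1024 * 1024      // http.max_content_length: 500mb

/** Estimated body size of one bulk request, in bytes. */
fun bulkBodyBytes(batchSize: Int, docSizeBytes: Long = DOC_SIZE_BYTES): Long =
    batchSize * docSizeBytes

/** True if the estimated body fits under the given limit. */
fun fitsLimit(batchSize: Int, limitBytes: Long = LIMIT_BYTES): Boolean =
    bulkBodyBytes(batchSize) <= limitBytes

fun main() {
    for (batchSize in listOf(10_000, 100_000)) {
        val mib = bulkBodyBytes(batchSize) / (1024.0 * 1024.0)
        println("batch=$batchSize -> ~${mib.toInt()} MiB, fits 500mb: ${fitsLimit(batchSize)}")
    }
}
```

Under these assumptions a batch of 10,000 payments is roughly 49 MiB, which would also fit under the default limit of 100mb. A batch of 100,000 payments is roughly 488 MiB, which only fits once the limit is raised to 500mb, and only with a small margin.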
Client timeout adjustment:
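import org.apache.http.HttpHost
import org.elasticsearch.client.RestClient

val builder = RestClient.builder(HttpHost("localhost", 9200))
    .setRequestConfigCallback {
        it.apply {
            this.setConnectTimeout(5000)   // connect timeout, in ms
            this.setSocketTimeout(60000)   // socket (read) timeout, in ms
        }
    }
    // Overall retry timeout, in ms; this setter exists only in the 6.x and earlier low-level REST client.
    .setMaxRetryTimeoutMillis(60000)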
Initially the batch size was set to 100,000. The Elasticsearch server became unstable, with frequent GC occupying a large share of CPU time. A larger batch size therefore does not always mean higher performance.
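One way to experiment with different batch sizes is to keep the batching logic separate from the call that sends the request, and to time each batch. The following is a minimal sketch, not the code used for the measurements above. `indexInBatches` and `sendBulk` are hypothetical names, and `sendBulk` stands in for whatever bulk request call the client exposes.

```kotlin
import kotlin.system.measureTimeMillis

/**
 * Splits [docs] into batches of [batchSize], hands each batch to [sendBulk],
 * and returns the elapsed milliseconds for each batch.
 * [sendBulk] is a hypothetical placeholder for the real bulk request call.
 */
fun <T> indexInBatches(
    docs: Sequence<T>,
    batchSize: Int,
    sendBulk: (List<T>) -> Unit
): List<Long> {
    require(batchSize > 0) { "batchSize must be positive" }
    return docs.chunked(batchSize)
        .map { batch -> measureTimeMillis { sendBulk(batch) } }
        .toList()
}

fun main() {
    // Stand-in documents and a counting sender so the sketch runs without a cluster.
    val docs = (1..25_000).asSequence().map { "payment-$it" }
    var sent = 0
    val timings = indexInBatches(docs, batchSize = 10_000) { batch -> sent += batch.size }
    println("batches=${timings.size}, docs sent=$sent")  // batches=3, docs sent=25000
}
```

Because the input is a `Sequence`, only one batch is materialised at a time, so client-side memory stays bounded by the batch size. Rerunning with a different `batchSize` and comparing the returned timings gives a simple way to find the point where larger batches stop helping.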