Java's HashMap: if you know it will hold exactly 100 elements, what is the best value to pass to new HashMap(?), and why?

Source of the question, the Meituan CRM backend team: https://www.iteye.com/topic/1134016

A newcomer who has just started Java takes a look: only 100 elements, so in the spirit of thrift, just pass 100, right?

A rookie with one year of Java takes a look: it's a trap... the load factor is 0.75, I think, so it should be 100/0.75.

A junior with two years of Java takes a look: trying to trick me~ there's tableSizeFor, so the best answer should be 128~

A coder with three years of Java takes a look: still too young~

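What the question is really about is HashMap's initialization strategy; it probes how well you understand HashMap's underlying implementation.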

On to the source.
Step one:


    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

Now look at the constructor that the last line delegates to:

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

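Pay close attention to tableSizeFor(initialCapacity): its result is what becomes the real capacity. Note that the constructor only parks that value in threshold; the table itself has not been allocated yet.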


    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

This step returns the smallest power of two that is not less than the argument. In short: pass 1000 and you get 1024; pass 1024 and you still get 1024; pass 1025 and you get 2048.
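
To check those numbers yourself, here is a minimal sketch that copies the tableSizeFor logic quoted above into a standalone class. The class name TableSizeForDemo is made up for this illustration; MAXIMUM_CAPACITY is set to 1 << 30, the same value HashMap uses.

```java
// Standalone copy of the JDK 8 tableSizeFor logic, for experimentation only.
class TableSizeForDemo {
    // Same value as HashMap.MAXIMUM_CAPACITY.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two maps to itself
        n |= n >>> 1;      // smear the highest set bit into all lower bits
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // n is now of the form 0b0...0111...1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {100, 128, 1000, 1024, 1025}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

Running main prints 100 -> 128, 128 -> 128, 1000 -> 1024, 1024 -> 1024 and 1025 -> 2048.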

For details, see: Reading the HashMap source: tableSizeFor

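Having read this far, things are roughly clear: the initial size is not necessarily the value you pass in, but the value after tableSizeFor. So to play along with the underlying implementation, just set it to 128 and we're done, right?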

128 is indeed a nice number, and it satisfies tableSizeFor. But take another look at the putVal implementation and you will find:

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length; // key point
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
         ……
         if (++size > threshold)
            resize(); // key point
        afterNodeInsertion(evict);
        return null;

Above is an excerpt of putVal. Two places are marked as key points, and both call resize().

Director, cut! Switch the source over to resize():

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        ……

What happens when the first value is put? Let's read through the source:

        Node<K,V>[] oldTab = table; // = null
        int oldCap = (oldTab == null) ? 0 : oldTab.length; // = 0
        int oldThr = threshold; // the constructor set this.threshold = tableSizeFor(initialCapacity), so = 128
        int newCap, newThr = 0; // newThr = 0; newCap is assigned in the branches below

Continuing:

        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr; // = 128, as established above
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) { // this branch runs
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE); // newThr = 128 * 0.75 = 96
        }
        threshold = newThr; // threshold = 96

So the code above tells us one important thing: when the initial capacity is 128, the resize threshold ends up as threshold = 96.
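
The arithmetic in that branch is easy to reproduce in isolation. The sketch below mirrors only the `if (newThr == 0)` part of resize() for a freshly allocated table; the class and method names (FirstResizeThreshold, thresholdFor) are hypothetical and do not exist in the JDK.

```java
// Reproduces the threshold computation from the `newThr == 0` branch of resize().
class FirstResizeThreshold {
    // Same value as HashMap.MAXIMUM_CAPACITY.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // newCap is the table size (already a power of two); returns the resize threshold.
    static int thresholdFor(int newCap, float loadFactor) {
        float ft = (float) newCap * loadFactor;
        return (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                ? (int) ft
                : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(thresholdFor(128, 0.75f)); // 96
        System.out.println(thresholdFor(256, 0.75f)); // 192
    }
}
```

With the default load factor, a 128-slot table can take 96 entries before growing, and a 256-slot table can take 192.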

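In other words, because the check is ++size > threshold, putting the 97th element triggers resize() again, and resize() is expensive: it allocates a new table and redistributes every existing entry.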

In our scenario, that resize() would run just to fit 4 more elements, a cost that is clearly not worth paying.
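
To see exactly when the extra resize fires, here is a small simulation of the bookkeeping only. It is a simplified model and not the real HashMap: it assumes every put adds a new distinct key, uses the default load factor 0.75, and ignores the initial table allocation on the first put. ResizeSimulation and resizePoints are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of HashMap growth: tracks only size, capacity and threshold.
class ResizeSimulation {
    static final float LOAD_FACTOR = 0.75f;

    // Returns the 1-based insertion numbers at which a growing resize would happen.
    static List<Integer> resizePoints(int tableSize, int elements) {
        List<Integer> points = new ArrayList<>();
        int cap = tableSize;
        int threshold = (int) (cap * LOAD_FACTOR);
        int size = 0;
        for (int i = 1; i <= elements; i++) {
            if (++size > threshold) {              // same check as at the end of putVal
                cap <<= 1;                         // table doubles
                threshold = (int) (cap * LOAD_FACTOR);
                points.add(i);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println("128 -> " + resizePoints(128, 100));
        System.out.println("256 -> " + resizePoints(256, 100));
    }
}
```

Under these assumptions, a 128-slot table grows once, on insertion 97, while a 256-slot table holds all 100 elements without growing.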

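So, all things considered, I personally think the best initial value is 256 (any argument from 129 to 256 yields the same 256-slot table, with a threshold of 192).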

Hardware is cheap enough these days; trading space for time is worth it~
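
The same reasoning can be turned into a small helper that works for any expected size. This is only a sketch under the assumptions used throughout this post (JDK 8 behaviour, power-of-two tables, resize when size exceeds capacity * loadFactor); InitialCapacityFor, initialCapacityFor and nextPowerOfTwo are hypothetical names, and nextPowerOfTwo is a plain loop standing in for tableSizeFor for positive inputs well below 2^30.

```java
// Computes a constructor argument that avoids any growing resize for expectedSize entries.
class InitialCapacityFor {
    // Smallest argument whose table has threshold >= expectedSize.
    static int initialCapacityFor(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize / (double) loadFactor);
    }

    // Stand-in for tableSizeFor: smallest power of two >= cap (for small positive cap).
    static int nextPowerOfTwo(int cap) {
        int table = 1;
        while (table < cap) {
            table <<= 1;
        }
        return table;
    }

    public static void main(String[] args) {
        int arg = initialCapacityFor(100, 0.75f);   // 134
        int table = nextPowerOfTwo(arg);            // 256
        int threshold = (int) (table * 0.75f);      // 192
        System.out.println(arg + " -> table " + table + ", threshold " + threshold);
    }
}
```

For 100 elements the helper returns 134, which lands on the same 256-slot table as passing 256 directly, so both avoid the resize on the 97th put.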


Reposted from blog.csdn.net/leisure_life/article/details/102898866