Java 7 HashMap Source Code Analysis

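These are my notes from reading the Java 7 HashMap source. The Java 7 and Java 8 versions differ quite a lot, mainly because Java 8 introduced red-black trees for storing long bucket chains. This post only covers the get() and put() methods.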

Before analyzing the two methods, let's get to know a few key member variables of HashMap:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
The default initial capacity, 1 << 4 = 16. Why it is 16 rather than 15 or 17 is explained later.
static final int MAXIMUM_CAPACITY = 1 << 30;
The maximum allowed capacity, 1 << 30.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
The load factor, which determines when HashMap resizes.
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
The array that holds the key-value pairs. The "capacity" we talk about is the length of the table array. It initially points to an empty array.
transient int size;
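The number of key-value pairs currently stored in the map. Note that this is not the number of occupied buckets (a bucket is table[i]); one bucket can hold several entries.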
int threshold;
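The threshold, the key to HashMap resizing. Once the table has been allocated, its value is (load factor * capacity); before that it holds the requested initial capacity, which is why put() passes it to inflateTable(). When size >= threshold and a new key-value pair is about to go into a non-empty bucket, HashMap resizes.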
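To make the relationship between capacity, load factor and threshold concrete, here is a minimal sketch. The class name ThresholdDemo and the method thresholdFor are made up for illustration; only the expression inside is taken from the inflateTable() and resize() code shown later in this post.

```java
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression that inflateTable() and resize() use to set the threshold.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        // Default capacity 16 with the default load factor 0.75
        System.out.println(thresholdFor(16, 0.75f)); // 12
        // After one doubling
        System.out.println(thresholdFor(32, 0.75f)); // 24
    }
}
```

So with the defaults, the first resize can be triggered once the map already holds 12 entries.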

put()

public V put(K key, V value) {
    // On the first insertion, initialize the table
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // HashMap allows a null key, which is stored in table[0]
    if (key == null)
        return putForNullKey(value);
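    // Compute the hash of the key
    int hash = hash(key);
    // Map the hash to an array index
    int i = indexFor(hash, table.length);
    // Walk the entry chain; if an entry has the same hash and an equal key, replace its value and return the old one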
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this); // a no-op hook in HashMap; LinkedHashMap overrides it to maintain access order
            return oldValue;
        }
    }
    // modCount counts structural modifications and backs the fail-fast behavior of HashMap's iterators
    modCount++;
    // Add the new key-value pair
    addEntry(hash, key, value, i);
    return null;
}
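The return values described in the comments above can be observed from the outside with the public java.util.HashMap API. This is just a small usage sketch; the class name PutDemo and the method describe are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    static String describe() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);    // no previous mapping, so null
        Integer second = map.put("a", 2);   // existing key: value replaced, old value 1 returned
        Integer nullKey = map.put(null, 0); // a null key is allowed
        return first + "," + second + "," + nullKey + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // null,1,null,2
    }
}
```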

Next, let's analyze the related methods:

// Initialize the table
private void inflateTable(int toSize) {
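    // Make sure the table capacity is a power of two (2^n); see roundUpToPowerOf2 if you are curious about the algorithm
    int capacity = roundUpToPowerOf2(toSize);
    // Initialize the threshold to capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Allocate the table array
    table = new Entry[capacity];
    // Initializes the hash seed (used for alternative String hashing) if needed; not covered here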
    initHashSeedAsNeeded(capacity);
}

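The table capacity must be 2^n because of how the array index is derived from the hash, which in turn affects how often keys collide. We will come back to this below.
putForNullKey(value) walks the chain at table[0]. If it finds an entry whose key is null, it replaces the value and returns the old one; otherwise it adds a new entry at index 0 and returns null.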
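For those curious about roundUpToPowerOf2, rounding up to a power of two can be sketched as below. This follows my recollection of the JDK 7 helper, which is built on Integer.highestOneBit; treat it as an illustration rather than a verbatim copy of the source. The class name PowerOfTwoDemo is made up.

```java
class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // (number - 1) << 1 has its highest set bit exactly at the target power of two.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(15)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```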

// Compute the hash of a key
final int hash(Object k) {
    int h = hashSeed; // hash seed, not covered here
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode(); // the key's own hashCode(); Object's default is native, but classes usually override it

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

// Map a hash to an array index
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

indexFor(int h, int length) is simple: it ANDs the hash with (length - 1), and the result is the array index. Earlier we asked why the capacity has to be 2^n. Let's look at an example:

// Two hash values: 1011 and 1010
// If the capacity is 15, the indices are 0~14 and length - 1 = 14 = 1110 in binary. AND results:
1011 & 1110 = 1010 = 10
1010 & 1110 = 1010 = 10

// If the capacity is 16, the indices are 0~15 and length - 1 = 15 = 1111 in binary. AND results:
1011 & 1111 = 1011 = 11
1010 & 1111 = 1010 = 10

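As you can see, if (length - 1) is not all 1s in binary, i.e. the capacity is not 2^n, hash collisions become more frequent. In the capacity-15 case the lowest bit of the index is always 0, so the odd-numbered buckets can never be used. There is still one problem, though: indexFor only uses the low bits of the hash. If two hashes differ in their high bits but share the same low bits, they get the same index, so the collision probability is still fairly high. hash(Object k) was introduced to address this.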
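The effect of the capacity on the usable buckets can be checked with a few lines of code. This is a minimal sketch: indexFor is copied from the source above, while the class name IndexForDemo and the helper distinctIndexes are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class IndexForDemo {
    // Same as HashMap.indexFor in the source above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts how many different bucket indexes the hashes 0..hashCount-1 land in.
    static int distinctIndexes(int length, int hashCount) {
        Set<Integer> seen = new HashSet<>();
        for (int h = 0; h < hashCount; h++) {
            seen.add(indexFor(h, length));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctIndexes(16, 64)); // 16: every bucket is reachable
        System.out.println(distinctIndexes(15, 64)); // 8: only the even indexes 0, 2, ..., 14
    }
}
```

With a power-of-two length, h & (length - 1) also gives the same result as h % length for non-negative h, just with a cheaper operation.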

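For hash(Object k), the official comment says that the function ensures that hash codes which differ only by constant multiples at each bit position have a bounded number of collisions (approximately 8 at the default load factor). That is rather abstract. Online resources describe it as a perturbation function: through a series of shifts and XORs, the high bits of the hash are folded into the low bits, so a change in the high bits usually changes the resulting index as well. This greatly reduces the collisions caused by hashes that differ in their high bits but share the same low bits.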
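A small experiment makes the perturbation visible. The sketch below copies the two shift-and-XOR lines from hash(Object k) into a method I call spread (a made-up name), assumes hashSeed is 0, and compares the bucket index with and without it for two hash codes that differ only in their high bits.

```java
class SpreadDemo {
    // The bit-mixing part of Java 7's hash(Object k), with hashSeed assumed to be 0.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // the low 16 bits are all zero
        int b = 0x20000; // differs from a only in the high bits
        // Without mixing, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16)); // 0 0
        // With mixing, the high bits reach the low bits and the buckets differ.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```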

Now let's look at the addEntry method:

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the capacity on each resize, which keeps it a power of two (2^n)
        resize(2 * table.length);
        // After resizing, recompute the hash and the bucket index
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    // The new entry is inserted at the head of the entry chain
    table[bucketIndex] = new Entry<>(hash, key, value, e);

    size++;
}
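The head insertion in createEntry is easier to see in isolation. Below is a toy sketch: Node is a hypothetical stand-in for HashMap.Entry that keeps only the key and the next pointer, and chainAfterInserting is a made-up helper that builds one bucket chain the same way createEntry does.

```java
class HeadInsertDemo {
    // Hypothetical stand-in for HashMap.Entry, reduced to what the demo needs.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts the keys one by one at the head of a single chain and prints the chain.
    static String chainAfterInserting(String... keys) {
        Node head = null;
        for (String k : keys) {
            head = new Node(k, head); // same shape as: table[i] = new Entry<>(hash, key, value, e)
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chainAfterInserting("a", "b", "c")); // c->b->a
    }
}
```

The most recently inserted entry ends up first in the chain, so no traversal to the tail is needed on insertion.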

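addEntry adds a new entry for the given key, value and hash code to the given bucket (table[bucketIndex]), and it is also responsible for resizing. The logic is easy to follow: if size has reached the threshold and the target bucket is not empty, the table is resized. Let's look at the resize method: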

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    // If the maximum allowed capacity has been reached, stop resizing
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    // Move all entries into the new table
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    // Update the threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

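The transfer method walks every entry, recomputes its index and inserts it into the new table. As you can imagine, resize is expensive, so it pays to choose the initial capacity and load factor sensibly, and to use a well-distributed hash function, in order to reduce how often resize happens.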
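Two things from this paragraph can be sketched in code. First, because the capacity always doubles, an entry's recomputed index is either unchanged or moves by exactly the old capacity (assuming its hash stays the same). Second, if the number of entries is known up front, the map can be presized so that no resize happens at all. The class name ResizeDemo and both helper methods are made up for illustration; only indexFor comes from the source above.

```java
class ResizeDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Checks that after doubling, every hash in [0, limit) either keeps its index
    // or moves by exactly oldCapacity.
    static boolean staysOrMovesByOldCapacity(int oldCapacity, int limit) {
        for (int h = 0; h < limit; h++) {
            int oldIndex = indexFor(h, oldCapacity);
            int newIndex = indexFor(h, oldCapacity * 2);
            if (newIndex != oldIndex && newIndex != oldIndex + oldCapacity) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: an initial capacity whose threshold is at least expectedEntries.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(staysOrMovesByOldCapacity(16, 1000)); // true
        System.out.println(initialCapacityFor(100, 0.75f));      // 134
    }
}
```

Passing the result to new HashMap<>(initialCapacity) lets the constructor round it up to the next power of two, 256 in the example above, whose threshold of 192 comfortably covers 100 entries.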

get()

The get method is much easier by comparison. Here is the source:

public V get(Object key) {
    // If key is null, search the chain at table[0]
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

/**
 * Returns the entry associated with the specified key in the
 * HashMap.  Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    // The map holds no mappings at all, so return null
    if (size == 0) {
        return null;
    }
    // Compute the hash
    int hash = (key == null) ? 0 : hash(key);
    // Walk the entry chain of the target bucket
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

The official comment is clear: getEntry returns the entry associated with the given key, or null if the HashMap contains no mapping for that key.
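One practical consequence is that get() returning null does not tell you whether the key is missing or mapped to null. A short usage sketch with the public java.util.HashMap API (the class name GetDemo and the method describe are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    static String describe() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // stored in the table[0] chain
        map.put("k", null);                  // a key explicitly mapped to null

        return map.get(null)                 // the value stored under the null key
                + "|" + map.get("k")         // null: the key exists, its value is null
                + "|" + map.get("missing")   // null: there is no such key
                + "|" + map.containsKey("k")
                + "|" + map.containsKey("missing");
    }

    public static void main(String[] args) {
        System.out.println(describe()); // value for null key|null|null|true|false
    }
}
```

When the difference matters, containsKey() is the way to tell the two cases apart.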