Java Collections 框架中的接口,有很多方法被标注为可选的(optional),这意味着允许具体的实现类不实现这些方法,被调用到的时候直接抛出一个 UnsupportedOperationException 异常。
同时 Java 也提供了一系列的通用实现(general purpose implementations),这些实现适用于所有通用的场景,它们实现了对应接口的所有方法,下面这个表格总结的所有的通用实现:
接口
|
HashTable
|
Resiable Array
|
Balanced Tree
|
Linked List
|
Hash Table + LinkedList
|
Set
|
HashSet
|
|
TreeSet |
|
LinkedHashSet
|
List
|
|
ArrayList
|
|
LinkedList
|
|
Deque
|
|
ArrayDeque
|
|
LinkedList
|
|
Map
|
HashMap
|
|
TreeMap |
|
LinkedHashMap
|
下面逐个分析一下各个通用实现中比较巧妙的技术细节。
ArrayList
ArrayList 使用可伸缩数组实现 List 接口。除了不支持同步之外,跟 Vector 非常类似。
内部数据存储结构:数组。
数组容量增长规则:默认增长到原来容量的 1.5 倍,如果所需的量比这个值大(addAll 一个 Collections 的时候可能会出现这样的情况),则按需增长。
private void grow(int minCapacity) { // overflow-conscious code int oldCapacity = elementData.length; int newCapacity = oldCapacity + (oldCapacity >> 1); if (newCapacity - minCapacity < 0) newCapacity = minCapacity; if (newCapacity - MAX_ARRAY_SIZE > 0) newCapacity = hugeCapacity(minCapacity); // minCapacity is usually close to size, so this is a win: elementData = Arrays.copyOf(elementData, newCapacity); }
ArrayDeque
ArrayDeque 使用可伸缩数组实现双端数组接口,是非线程安全的,在栈的场景下它的性能优于 Stack,在队列的场景下优于 LinkedList。
内部采用循环数组结构,并保持数组的大小是 2 的 n 次幂。
容量增长的规则:增长到原来容量的 2 倍。
初始容量分配算法
/** * Allocates empty array to hold the given number of elements. * * @param numElements the number of elements to hold */ private void allocateElements(int numElements) { int initialCapacity = MIN_INITIAL_CAPACITY; // Find the best power of two to hold elements. // Tests "<=" because arrays aren't kept full. if (numElements >= initialCapacity) { initialCapacity = numElements; initialCapacity |= (initialCapacity >>> 1); initialCapacity |= (initialCapacity >>> 2); initialCapacity |= (initialCapacity >>> 4); initialCapacity |= (initialCapacity >>> 8); initialCapacity |= (initialCapacity >>> 16); initialCapacity++; if (initialCapacity < 0) // Too many elements, must back off initialCapacity >>>= 1;// Good luck allocating 2 ^ 30 elements } elements = new Object[initialCapacity]; }
扩容算法
/** * Doubles the capacity of this deque. Call only when full, i.e., * when head and tail have wrapped around to become equal. */ private void doubleCapacity() { assert head == tail; int p = head; int n = elements.length; int r = n - p; // number of elements to the right of p int newCapacity = n << 1; if (newCapacity < 0) throw new IllegalStateException("Sorry, deque too big"); Object[] a = new Object[newCapacity]; System.arraycopy(elements, p, a, 0, r); System.arraycopy(elements, 0, a, r, p); elements = a; head = 0; tail = n; }
插入节点定位方式:
/** * Inserts the specified element at the front of this deque. * * @param e the element to add * @throws NullPointerException if the specified element is null */ public void addFirst(E e) { if (e == null) throw new NullPointerException(); elements[head = (head - 1) & (elements.length - 1)] = e; if (head == tail) doubleCapacity(); }
HashMap
HashMap 是基于哈希表的 Map 接口实现,它跟 Hashtable 很像,不同点是它不是线程安全的,另外它支持 key 或 value 为空。
哈希桶是以 2 的 n 次幂数组为存储。
哈希方案是 key 的 hashCode 高低位重合。
解决冲突的方案是冲突节点少于 8 个采用链表,否则采用红黑树。
扩容的方案是每次扩容哈希桶数量为原来的二倍,扩从时间点多数在新增一个节点之后。
扩容时机:可以指定一个负载因子(
loadFactor
),默认是 0.75,当元素数量超过 capacity *
loadFactor 时,就开始扩容。
hash 算法:
static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }
查找节点方法:
/** * Implements Map.get and related methods * * @param hash hash for key * @param key the key * @return the node, or null if none */ final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) { // 先找哈希桶的第一个节点 if (first.hash == hash && // always check first node ((k = first.key) == key || (key != null && key.equals(k)))) return first; if ((e = first.next) != null) { if (first instanceof TreeNode) // 如果是树型节点,遍历树查找节点 return ((TreeNode<K,V>)first).getTreeNode(hash, key); do { // 链表结构的节点,依次遍历链表查找 if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null); } } return null; }
具体扩容算法:
/** * Initializes or doubles table size. If null, allocates in * accord with initial capacity target held in field threshold. * Otherwise, because we are using power-of-two expansion, the * elements from each bin must either stay at same index, or move * with a power of two offset in the new table. * * @return the table */ final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; if (oldCap > 0) { if (oldCap >= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // double threshold } else if (oldThr > 0) // initial capacity was placed in threshold newCap = oldThr; else { // zero initial threshold signifies using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) { float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; if (oldTab != null) { for (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) != null) { oldTab[j] = null; if (e.next == null) newTab[e.hash & (newCap - 1)] = e; else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); // 对于树型节点,分裂之,具体算法跟下面的类似 else { // preserve order Node<K,V> loHead = null, loTail = null; // 因为扩容算法是容量翻倍,因此扩容前的一个桶对应扩容后的两个桶 Node<K,V> hiHead = null, hiTail = null; // loHead 和 hiHead 分别对应扩容后的两个桶 Node<K,V> next; do { next = e.next; if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) != null); if (loTail != null) { loTail.next = null; newTab[j] = loHead; } if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; } } } } } return newTab; }
HashSet
HashSet 整体借用 HashMap 来实现,实际上就是内部引用了一个 HashMap 实例。
public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, java.io.Serializable { static final long serialVersionUID = -5024744406713321676L; private transient HashMap<E,Object> map; // 内部引用了一个 HashMap 实例 // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); /** * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). */ public HashSet() { map = new HashMap<>(); }
添加元素的算法:
public boolean add(E e) { return map.put(e, PRESENT)==null; // 仅仅对 HashMap 实例的简单调用 }
TreeMap
TreeMap 基于红黑树实现
NavigableMap 接口,它是非线程安全的。
TreeSet
内部默认借用 TreeMap 实现。
LinkedList
LinkedList 内部封装了一个双向链表,此外没什么特别的。
一个优化点是根据下标访问元素的时候是先判断元素是在链表中的前部还是后部,从而决定是先序遍历还是后序遍历。
LinkedHashMap
内部封装了一个 HashMap 加一个双向链表。HashMap 通过继承的方式复用,双向链表用户维护迭代器的元素访问顺序,默认按照 key 的插入顺序迭代,也可以按照 key 的访问顺序迭代。
按照 key 的访问顺序迭代的方式适用于实现基于 LRU 策略的缓存系统,实现思路是继承 LinkedHashMap 并覆写 removeEldestEntry 方法,具体可以参考这个帖子:
LinkedHashSet
LinkedHashSet 是要支持按序迭代访问,它内部聚合了一个 LinkedHashMap 来实现它的功能。
实现上有个巧妙的地方可以学习:
它对 LinkedHashMap 的聚合是通过继承 HashSet 来实现的,具体代码是给 HashSet 新增了一个内部构造方法:
/** * Constructs a new, empty linked hash set. (This package private * constructor is only used by LinkedHashSet.) The backing * HashMap instance is a LinkedHashMap with the specified initial * capacity and the specified load factor. * * @param initialCapacity the initial capacity of the hash map * @param loadFactor the load factor of the hash map * @param dummy ignored (distinguishes this * constructor from other int, float constructor.) * @throws IllegalArgumentException if the initial capacity is less * than zero, or if the load factor is nonpositive */ HashSet(int initialCapacity, float loadFactor, boolean dummy) { map = new LinkedHashMap<>(initialCapacity, loadFactor); }
这个方法有两点值得注意,一个是方法的可见域是默认,另一个是加了一个哑参数用于跟其他构造函数做区分。