Java Source Code Reading Notes (1): String

Reading the underlying source code to strengthen my Java fundamentals.

Note 1:

public final class String

The String class is declared final. A class marked final cannot be subclassed.
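Because a subclass of String would simply fail to compile, the final modifier is easier to observe through reflection. The following is a minimal sketch; the class name FinalClassCheck and its helper method are made up for illustration.

```java
import java.lang.reflect.Modifier;

class FinalClassCheck {
    // Returns true when the given class is declared final.
    static boolean isFinal(Class<?> type) {
        return Modifier.isFinal(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(isFinal(String.class)); // true: String cannot be extended
        System.out.println(isFinal(Object.class)); // false: Object can be extended
    }
}
```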

Note 2: Fields

/** The value is used for character storage. */
private final char value[];

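This character array holds the string's contents, so in the version of the JDK quoted here String is built on top of a char array (JDK 9 and later store a byte[] instead). Note that Java allows two ways to declare an array.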

int[] a1
int a1[]
Both forms do the same thing: they declare an int array named a1.
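The two forms only behave differently when several variables share one declaration. A small sketch of that detail follows; the class and method names are hypothetical.

```java
class ArrayDeclarationDemo {
    // Both declaration styles produce the same runtime type, int[].
    static boolean sameType() {
        int[] a1 = new int[3];
        int a2[] = new int[3];
        return a1.getClass() == a2.getClass();
    }

    // With the postfix form the brackets bind only to the variable they follow:
    // "arr" is an int[], while "plain" is just an int.
    static int mixedDeclaration() {
        int arr[] = {1, 2, 3}, plain = 4;
        return arr.length + plain;
    }

    public static void main(String[] args) {
        System.out.println(sameType());         // true
        System.out.println(mixedDeclaration()); // 7
    }
}
```

The prefix form (int[] a1) is the one most Java style guides prefer, since the type reads as a whole.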

/** Cache the hash code for the string */
    private int hash; // Default to 0

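The hash field defaults to 0 and caches the hash code of the String instance. Strings are hashed very often, for example as keys in a HashMap, and recomputing the hash code on every call would be wasteful.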

public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();

This static constant is simply a comparator that compares two strings while ignoring case.
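A typical use of this constant is sorting. The sketch below assumes a hypothetical helper, sortIgnoringCase, that sorts a copy of its input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CaseInsensitiveSortDemo {
    // Returns a sorted copy of the input, ordering strings without regard to case.
    static List<String> sortIgnoringCase(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");

        List<String> natural = new ArrayList<>(words);
        natural.sort(null); // natural ordering: uppercase letters sort first
        System.out.println(natural);                 // [Cherry, apple, banana]
        System.out.println(sortIgnoringCase(words)); // [apple, banana, Cherry]
    }
}
```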

Note 3: Constructors

public String() {
        this.value = "".value;
    }

Creates an empty character sequence. This constructor is rarely useful, because String is immutable.

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

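Creates a String object from another String. The new object shares the original's value array and cached hash instead of copying them, which is safe because strings are immutable.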

 public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

Creates a String object from a character array. The array is copied, so later changes to it do not affect the new string.
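The effect of that copy can be checked directly. A minimal sketch, with a made-up class and method name:

```java
class DefensiveCopyDemo {
    // Builds a String from a char array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'X'; // changes the caller's array only
        return s;       // the string kept its own copy
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
    }
}
```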

public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

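Creates a String object from a portion of the given character array.
String has many more constructors that take parameters; they accept String, char[], byte[], StringBuffer and other argument types, so I will not list them all. In essence, each one turns its argument into the contents of the value[] field. A byte array can also be passed to a constructor, but bytes have to be decoded with a character encoding: either one you specify or, if you specify none, the platform default.
Differences between char and byte in Java

A byte occupies one byte; a char occupies two bytes.
byte is a signed type and can hold negative values; char is unsigned and cannot.
For English (ASCII) characters, the two can be converted to each other directly.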
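To make the byte/char distinction concrete, here is a small sketch that encodes and decodes with an explicit charset. The class name and the choice of UTF-8 are assumptions for illustration; other encodings give different byte counts.

```java
import java.nio.charset.StandardCharsets;

class ByteCharDemo {
    // Encodes a string into bytes using UTF-8.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("A").length);      // 1: ASCII fits in one byte
        System.out.println(encode("\u4e2d").length); // 3: this CJK character needs three bytes in UTF-8
        System.out.println(decode(encode("A")));     // A

        System.out.println(Byte.MIN_VALUE);            // -128: byte is signed
        System.out.println((int) Character.MIN_VALUE); // 0: char is unsigned
        System.out.println((int) Character.MAX_VALUE); // 65535
    }
}
```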

Note 4: Nested class

The String class has only one nested class, a private static one:

private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }
        // ... (rest of the class omitted)
    }

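Since compareTo() already exists, why is this class needed? It is there for code reuse. Unlike compareTo(), its compare() method ignores case. In addition, String's compareToIgnoreCase() method is implemented by calling this class's compare() method.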
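The difference between the two comparisons shows up as soon as the inputs differ only in case. A short sketch with hypothetical wrapper methods:

```java
class IgnoreCaseCompareDemo {
    static int caseSensitive(String a, String b) {
        return a.compareTo(b);
    }

    static int caseInsensitive(String a, String b) {
        return a.compareToIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(caseSensitive("hello", "HELLO"));   // 32, i.e. 'h' - 'H'
        System.out.println(caseInsensitive("hello", "HELLO")); // 0
        // The public comparator constant gives the same answer.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("hello", "HELLO")); // 0
    }
}
```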

Note 5: Common methods

// Returns the length of the string (the number of chars)
 public int length() {
        return value.length;
    }

--------------------------------------
// Checks whether the string is empty
public boolean isEmpty() {
        return value.length == 0;
    }

--------------------------------------
// Returns the char at the given index of the character array
public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }
--------------------------------------
// Compares this string with another string, ignoring case
public boolean equalsIgnoreCase(String anotherString) {
        return (this == anotherString) ? true
                : (anotherString != null)
                && (anotherString.value.length == value.length)
                && regionMatches(true, 0, anotherString, 0, value.length);
    }
--------------------------------------
// Checks whether the substring starting at index toffset begins with prefix
public boolean startsWith(String prefix, int toffset) {
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }

--------------------------------------
// Checks whether the string ends with suffix
 public boolean endsWith(String suffix) {
        return startsWith(suffix, value.length - suffix.value.length);
    }

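Once you know that String is backed by a character array, it is clear that the methods above simply operate on that array.
The two overloads below both end up calling System.arraycopy(). A lot of JDK source follows this pattern: what look like many overloads all delegate to the same function and differ only in the default values they supply.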

// Copies the string's characters into the char array dst
 void getChars(char dst[], int dstBegin) {
        System.arraycopy(value, 0, dst, dstBegin, value.length);
    }

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
        if (srcBegin < 0) {
            throw new StringIndexOutOfBoundsException(srcBegin);
        }
        if (srcEnd > value.length) {
            throw new StringIndexOutOfBoundsException(srcEnd);
        }
        if (srcBegin > srcEnd) {
            throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
        }
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }
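Only the four-argument getChars() is public, so that is the one callers use. A minimal usage sketch, where copyRange is a made-up helper:

```java
class GetCharsDemo {
    // Copies the chars in [begin, end) of s into a freshly allocated array.
    static char[] copyRange(String s, int begin, int end) {
        char[] dst = new char[end - begin];
        s.getChars(begin, end, dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        char[] word = copyRange("hello world", 6, 11);
        System.out.println(new String(word)); // world
    }
}
```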

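Besides the equals() method discussed later, there is also contentEquals(), which compares content only. It is mainly used to check whether a String, StringBuffer, or StringBuilder hold the same content. Its parameter is a CharSequence, which tells us that StringBuffer and StringBuilder implement CharSequence as well. The source first checks which type the argument is and then picks a strategy accordingly, but every path ultimately compares the characters one by one in a loop.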

 public boolean contentEquals(CharSequence cs) {
        // Argument is a StringBuffer, StringBuilder
        if (cs instanceof AbstractStringBuilder) {
            if (cs instanceof StringBuffer) {
                synchronized(cs) {
                   return nonSyncContentEquals((AbstractStringBuilder)cs);
                }
            } else {
                return nonSyncContentEquals((AbstractStringBuilder)cs);
            }
        }
        // Argument is a String
        if (cs instanceof String) {
            return equals(cs);
        }
        // Argument is a generic CharSequence
        char v1[] = value;
        int n = v1.length;
        if (n != cs.length()) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            if (v1[i] != cs.charAt(i)) {
                return false;
            }
        }
        return true;
    }
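The practical difference from equals() is that equals() rejects anything that is not a String, while contentEquals() looks only at the characters. A short sketch with a hypothetical wrapper method:

```java
class ContentEqualsDemo {
    // Compares a String against any CharSequence by content.
    static boolean sameContent(String s, CharSequence cs) {
        return s.contentEquals(cs);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        System.out.println("abc".equals(sb));       // false: sb is not a String
        System.out.println(sameContent("abc", sb)); // true: same characters
        System.out.println(sameContent("abc", new StringBuffer("abd"))); // false
    }
}
```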

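The method below returns the index of the given character. A Java char is 16 bits wide, so it can only hold values below Character.MIN_SUPPLEMENTARY_CODE_POINT (0x010000). A code point at or above that value is a supplementary character, which is stored as two chars and is handled by a separate method.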

public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }

        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }
    
// The reverse of the method above: searches backward from fromIndex
public int lastIndexOf(int ch, int fromIndex) {
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            int i = Math.min(fromIndex, value.length - 1);
            for (; i >= 0; i--) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return lastIndexOfSupplementary(ch, fromIndex);
        }
    }
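This is why indexOf() takes an int rather than a char. The sketch below uses U+1F600 as an arbitrary example of a supplementary code point; the class and constant names are made up.

```java
class SupplementaryIndexDemo {
    // A code point above U+FFFF, so it occupies two chars (a surrogate pair).
    static final int GRINNING_FACE = 0x1F600;

    // Builds "ab", then the supplementary character, then "c".
    static String sample() {
        return "ab" + new String(Character.toChars(GRINNING_FACE)) + "c";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                      // 5 chars
        System.out.println(s.codePointCount(0, s.length())); // 4 code points
        System.out.println(s.indexOf(GRINNING_FACE));        // 2: index of the first surrogate
        System.out.println(s.indexOf('c'));                  // 4: pushed back by the pair
        System.out.println(s.lastIndexOf(GRINNING_FACE));    // 2
    }
}
```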

Appends the given string to the end of this string.

public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }
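Two behaviors of concat() are visible in the code above: an empty argument returns the receiver itself, and anything else produces a new object. A minimal sketch with hypothetical method names:

```java
class ConcatDemo {
    // True when concatenating the empty string hands back the very same object.
    static boolean emptyArgumentReturnsSameObject(String s) {
        return s.concat("") == s;
    }

    static String join(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        String base = "foo";
        System.out.println(emptyArgumentReturnsSameObject(base)); // true
        System.out.println(join(base, "bar"));                    // foobar
        System.out.println(base);                                 // foo: the original is untouched
    }
}
```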

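Note 6: Important methods

1. The equals method

Before discussing equals, it helps to understand ==. For primitive types, == compares values; for reference types, == compares the memory addresses of the objects, that is, whether both references point to the same object. The equals method compares whether two strings have the same content. A String can serve as the key in a Map<K, V> because equals() (together with hashCode()) is based on content.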

 public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                // I find this loop rather clever
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
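The contrast between == and equals() is easiest to see with two distinct String objects that hold the same characters. In this sketch new String(...) is used on purpose to force separate objects; the class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

class EqualsDemo {
    // Reference comparison: true only when both variables point to one object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Stores a value under one key object and looks it up with a different,
    // but equal, key object.
    static Integer lookupWithEqualKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put(new String("java"), 1);
        return map.get(new String("java"));
    }

    public static void main(String[] args) {
        String a = new String("java");
        String b = new String("java");
        System.out.println(sameReference(a, b));  // false: two separate objects
        System.out.println(a.equals(b));          // true: same content
        System.out.println(lookupWithEqualKey()); // 1
    }
}
```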

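2. The compareTo method

It walks both character arrays in step and compares them. At the first position where the characters differ, it returns the difference between the two characters. If every character of the shorter array matches, it returns the difference between the two strings' lengths.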

public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }
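Both return paths can be exercised with two small inputs. A sketch, with a hypothetical wrapper method:

```java
class CompareToDemo {
    static int compare(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // First difference is at index 2: 'c' - 'd'
        System.out.println(compare("abc", "abd"));   // -1
        // One string is a prefix of the other: length difference 3 - 5
        System.out.println(compare("abc", "abcde")); // -2
        System.out.println(compare("abc", "abc"));   // 0
    }
}
```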

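3. The hashCode method

I am not sure why 31 was chosen. The commonly cited explanation is that 31 is an odd prime and that 31 * h can be computed cheaply as (h << 5) - h.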

public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
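The loop above is short enough to reproduce by hand and compare with the built-in result. The sketch below is a re-implementation for illustration only (manualHash and shiftForm are made-up names), and it also checks the shift identity often mentioned in connection with the multiplier 31.

```java
class HashCodeDemo {
    // Recomputes the polynomial hash h = 31 * h + c over every char of s.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31 * h written as a shift and a subtraction.
    static int shiftForm(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hello") == "hello".hashCode()); // true
        System.out.println(shiftForm(12345) == 31 * 12345);            // true
        System.out.println("".hashCode());                             // 0
    }
}
```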

4. The startsWith() method

Checks whether the string begins with prefix at a given position.

public boolean startsWith(String prefix, int toffset) {
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }

// Overload that checks whether the string starts with prefix
public boolean startsWith(String prefix) {
        return startsWith(prefix, 0);
    }

// The same check, just anchored at the end
public boolean endsWith(String suffix) {
        return startsWith(suffix, value.length - suffix.value.length);
    }
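One detail worth noticing in the bounds check is that an out-of-range offset returns false instead of throwing. A small sketch with hypothetical wrapper methods:

```java
class StartsWithDemo {
    static boolean startsAt(String s, String prefix, int offset) {
        return s.startsWith(prefix, offset);
    }

    static boolean endsWith(String s, String suffix) {
        return s.endsWith(suffix);
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(startsAt(s, "world", 6));  // true
        System.out.println(startsAt(s, "world", 7));  // false: would run past the end
        System.out.println(startsAt(s, "hello", -1)); // false: negative offset, no exception
        System.out.println(endsWith(s, "world"));     // true
        System.out.println(endsWith("ab", "abc"));    // false: suffix longer than the string
    }
}
```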

5. The substring() method

Extracts the part of the string between the given indexes, using one of the String constructors.

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
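A brief usage sketch follows, covering a normal slice and the exception raised when endIndex is smaller than beginIndex. The helper names slice and rejects are made up.

```java
class SubstringDemo {
    static String slice(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    // True when substring() refuses the given index pair.
    static boolean rejects(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // StringIndexOutOfBoundsException is a subclass of this type.
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(slice(s, 6, 11));  // world
        System.out.println(slice(s, 0, 5));   // hello
        System.out.println(rejects(s, 2, 1)); // true: negative length
        System.out.println(s);                // hello world: the original is unchanged
    }
}
```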

6. The replace() method

Nicely written code. It replaces every occurrence of one character in the string with another.

public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }
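The code above only allocates a new array once it has found a first match; otherwise it returns this. Both cases can be seen in a short sketch (the class and method names are hypothetical):

```java
class ReplaceDemo {
    static String swap(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar);
    }

    public static void main(String[] args) {
        String s = "banana";
        System.out.println(swap(s, 'a', 'o'));      // bonono: every 'a' is replaced
        System.out.println(swap(s, 'x', 'y') == s); // true: no match, same object returned
        System.out.println(s);                      // banana: the original is unchanged
    }
}
```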

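7. The trim() method

Removes whitespace from both ends of the string. It scans from each end to find the first and last characters greater than ' ' (so tabs, newlines, and other control characters are stripped too), then cuts that range out of the original string.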

public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }
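Because the test is val[i] <= ' ', what counts as "whitespace" here is any char up to U+0020, and nothing above it. The sketch below uses the ideographic space U+3000 as an example of a character trim() leaves alone; the class and method names are made up.

```java
class TrimDemo {
    static String trimmed(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println(trimmed(" \t hello \n"));              // hello
        // U+3000 is greater than ' ', so it is not removed.
        System.out.println(trimmed("\u3000hi\u3000").length());   // 4
        String clean = "abc";
        System.out.println(trimmed(clean) == clean);              // true: nothing to trim
    }
}
```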

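Summary: that covers most of the String source. Some rarely used methods are not included here.

Once created, a string cannot be changed.
String is backed by a character array, and many of its methods simply operate on that array.
Operations on a string, such as substring() and concat(), return a new string rather than modifying the original.
Some of this source is very well written; it is worth studying closely and working into your own coding style.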

Time__Lc
