Here is the source code first:
public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= codePoints.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }

    final int end = offset + count;

    // Pass 1: Compute precise size of char[]
    int n = count;
    for (int i = offset; i < end; i++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            continue;
        else if (Character.isValidCodePoint(c))
            n++;
        else throw new IllegalArgumentException(Integer.toString(c));
    }

    // Pass 2: Allocate and fill in char[]
    final char[] v = new char[n];

    for (int i = offset, j = 0; i < end; i++, j++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            v[j] = (char)c;
        else
            Character.toSurrogates(c, v, j++);
    }

    this.value = v;
}
The result of running this constructor:
public static void main(String[] args) {
    int[] a = {100, 2312, 12313, 54545, 23432, 22, 65, 78, 99};
    String b = new String(a, 0, a.length);
    System.out.println(b);
}
// This is the result
dई〙픑守ANc
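Every value in the sample above is a BMP code point, so each int becomes exactly one char. The following is a minimal sketch of what changes once a supplementary code point is present. The class name SupplementaryLengthDemo, the helper build, and the choice of U+1F600 are illustrative assumptions, not part of the original post.

```java
class SupplementaryLengthDemo {
    // Hypothetical helper: builds a String through the constructor discussed above.
    static String build(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String bmpOnly = build(100, 65, 99);     // three BMP code points
        String mixed = build(100, 0x1F600, 99);  // a supplementary code point in the middle

        // length() counts chars, codePointCount() counts code points.
        System.out.println(bmpOnly.length());                         // 3
        System.out.println(mixed.length());                           // 4
        System.out.println(mixed.codePointCount(0, mixed.length()));  // 3
    }
}
```

The extra char in the second string is what Pass 1 accounts for with n++, so that Pass 2 can allocate the array at its exact size.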
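- The valid range of Unicode code points has been extended to 0x0000-0x10FFFF, which needs 21 bits. In binary, 0x10FFFF is 1 0000 1111 1111 1111 1111.
- A Java char is two bytes, that is, 16 bits. Its maximum value is 0xFFFF, which is 1111 1111 1111 1111 in binary.
- In Unicode, the range 0x0000-0xFFFF is called the BMP (Basic Multilingual Plane). A char can only represent the BMP.
- Characters with a value greater than 0xFFFF are called supplementary characters.
- So a char covers only the BMP, while the range of int goes far beyond the valid Unicode range. This is why the constructor has to check every element of the int array.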
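The ranges listed above can be checked against the constants in java.lang.Character. This is a small sketch for illustration; the class name UnicodeRangeDemo and the helper bitsNeeded are made up for this example.

```java
class UnicodeRangeDemo {
    // Hypothetical helper: number of bits needed to write a non-negative value in binary.
    static int bitsNeeded(int value) {
        return Integer.toBinaryString(value).length();
    }

    public static void main(String[] args) {
        // Largest char value and largest valid code point, printed in hex.
        System.out.println(Integer.toHexString(Character.MAX_VALUE));       // ffff
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));  // 10ffff

        System.out.println(bitsNeeded(Character.MAX_VALUE));       // 16
        System.out.println(bitsNeeded(Character.MAX_CODE_POINT));  // 21

        // charCount reports how many chars a code point occupies in a String.
        System.out.println(Character.charCount(0x5B88));   // 1 (BMP)
        System.out.println(Character.charCount(0x10000));  // 2 (supplementary)
    }
}
```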
Character.isBmpCodePoint(c)
public static boolean isBmpCodePoint(int codePoint) {
    return codePoint >>> 16 == 0;
    // Optimized form of:
    //     codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
    // We consistently use logical shift (>>>) to facilitate
    // additional runtime optimizations.
}
This checks whether the code point is in the BMP. If it is, a single char can hold it, so no extra space is needed.
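To see why the shift is equivalent to the range check mentioned in the source comment, the two forms can be compared on a few boundary values, including negative ints. This is only a sketch; BmpCheckDemo, viaShift and viaRange are names invented here.

```java
class BmpCheckDemo {
    // The shift form: true only when all bits above the low 16 are zero.
    static boolean viaShift(int cp) {
        return cp >>> 16 == 0;
    }

    // The plain range form, with the char limits written as int literals.
    static boolean viaRange(int cp) {
        return cp >= 0x0000 && cp <= 0xFFFF;
    }

    public static void main(String[] args) {
        int[] samples = {0, 0x41, 0xFFFF, 0x10000, 0x10FFFF, -1, Integer.MAX_VALUE};
        for (int cp : samples) {
            // A negative int has its sign bit set, so >>> 16 leaves a non-zero value.
            System.out.println(Integer.toHexString(cp)
                    + " shift=" + viaShift(cp)
                    + " range=" + viaRange(cp)
                    + " jdk=" + Character.isBmpCodePoint(cp));
        }
    }
}
```

Because >>> is a logical shift, negative values are rejected without a separate lower-bound comparison.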
Character.isValidCodePoint(c)
public static boolean isValidCodePoint(int codePoint) {
    // Optimized form of:
    //     codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
    int plane = codePoint >>> 16;
    return plane < ((MAX_CODE_POINT + 1) >>> 16);
}
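This checks whether the code point lies in the valid Unicode range. In the constructor it is only reached after the BMP check has failed, so a valid value here means one char cannot hold the code point and one more char is reserved (n++). If the value is outside the valid range, an exception is thrown.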
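The plane comparison can be worked out by hand: (0x10FFFF + 1) >>> 16 is 0x110000 >>> 16, which is 17, so a code point is valid when its plane number is 0 to 16. A small sketch under that reading follows; PlaneCheckDemo and its method names are hypothetical.

```java
class PlaneCheckDemo {
    // Plane number of a code point: everything above the low 16 bits.
    static int plane(int codePoint) {
        return codePoint >>> 16;
    }

    // Same comparison as in the source above, with MAX_CODE_POINT written out as 0x10FFFF.
    static boolean isValidViaPlane(int codePoint) {
        return plane(codePoint) < ((0x10FFFF + 1) >>> 16); // right-hand side is 17
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xFFFF, 0x10000, 0x10FFFF, 0x110000, -1};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp)
                    + " plane=" + plane(cp)
                    + " valid=" + isValidViaPlane(cp)
                    + " jdk=" + Character.isValidCodePoint(cp));
        }
    }
}
```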
- This example goes out of range, so an exception is thrown:
public static void main(String[] args) {
    // 0xfffffff is 268435455, far above the largest valid code point 0x10ffff
    int[] a = {100, 99, 0xfffffff};
    String b = new String(a, 0, a.length);
    System.out.println(b);
}
// Exception in thread "main" java.lang.IllegalArgumentException: 268435455
at java.lang.String.<init>(String.java:266)
at main.java.Test.main(Test.java:11)
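When the int array comes from untrusted input, the exception can be caught instead of letting it end the program. The sketch below is one possible way to do that; InvalidCodePointDemo and tryBuild are hypothetical names, and the exact message text is taken from the source shown earlier and may differ in other JDK versions.

```java
class InvalidCodePointDemo {
    // Hypothetical helper: returns the exception message,
    // or null when the whole array converts cleanly.
    static String tryBuild(int[] codePoints) {
        try {
            new String(codePoints, 0, codePoints.length);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 0x10FFFF is the last valid code point, 0x110000 is the first invalid one.
        System.out.println(tryBuild(new int[] {100, 99, 0x10FFFF}));
        System.out.println(tryBuild(new int[] {100, 99, 0x110000}));
        // Negative values are rejected as well.
        System.out.println(tryBuild(new int[] {100, 99, -1}));
    }
}
```

Since the check happens in Pass 1, before any char[] is allocated, a single bad element rejects the whole array.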
Character.toSurrogates(c, v, j++);
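This converts an int that lies above the BMP but within the valid Unicode range into two chars: a high surrogate and a low surrogate. The Character class has matching methods to test whether a char is a surrogate (isSurrogate), a high surrogate (isHighSurrogate), or a low surrogate (isLowSurrogate), to test whether two chars form a surrogate pair (isSurrogatePair), and to convert a surrogate pair back into one code point (toCodePoint).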
static void toSurrogates(int codePoint, char[] dst, int index) {
    // We write elements "backwards" to guarantee all-or-nothing
    dst[index+1] = lowSurrogate(codePoint);
    dst[index] = highSurrogate(codePoint);
}

public static char lowSurrogate(int codePoint) {
    return (char) ((codePoint & 0x3ff) + MIN_LOW_SURROGATE);
}

public static char highSurrogate(int codePoint) {
    return (char) ((codePoint >>> 10)
        + (MIN_HIGH_SURROGATE - (MIN_SUPPLEMENTARY_CODE_POINT >>> 10)));
}