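Introduction
Today a classmate sent me a snippet about String and asked me to predict its output. Here is the code; try to work out the result before reading on.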
<span style="font-family:SimSun;">public class InternTest {
    public static void main(String[] args) {
        String s = new String("1");
        s.intern();
        String s2 = "1";
        System.out.println(s == s2);
        System.out.println(s.equals(s2));

        String s3 = new String("1") + new String("1");
        s3.intern();
        String s4 = "11";
        System.out.println(s3 == s4);
        System.out.println(s3.equals(s4));
    }
}</span>
What is a String
The Java Language Specification (Java SE 8) describes the String type as follows:
1. Instances of class String represent sequences of Unicode code points.
2. A String object has a constant (unchanging) value.
3. String literals are references to instances of class String.
4. The string concatenation operator + implicitly creates a new String object when the result is not a constant expression.
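With these definitions from the language specification, the constraints on using String become reasonably clear. In general, a String variable is defined in one of three ways:
1. With the new keyword, e.g. String str = new String("spring");
2. With a literal, e.g. String str = "spring";
3. By concatenation, e.g. String str = "spr" + new String("ing");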
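To make the difference between the three forms concrete, here is a minimal sketch. The class name StringCreationDemo and the helper concatAtRuntime are made up for illustration; the expected results follow from the specification statements quoted above (literals and constant expressions share one instance, while new and run-time concatenation create new objects).

```java
class StringCreationDemo {
    // Hypothetical helper: the operands are parameters, so the result is
    // not a constant expression and a new String is created at run time.
    static String concatAtRuntime(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        String literal = "spring";                         // literal
        String viaNew = new String("spring");              // new keyword
        String viaConcat = concatAtRuntime("spr", "ing");  // run-time concatenation
        String folded = "spr" + "ing";                     // constant expression, folded by javac

        System.out.println(literal == viaNew);     // false: new always creates a distinct object
        System.out.println(literal == viaConcat);  // false: run-time + creates a new object
        System.out.println(literal == folded);     // true: constant expressions share the literal's instance
        System.out.println(literal.equals(viaNew) && literal.equals(viaConcat)); // true: same content
    }
}
```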
== & equals
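As we know, Java offers both == and equals for comparing two objects. equals is first implemented in Object:
<span style="font-family:SimSun;">public boolean equals(Object obj) {
    return (this == obj); // here equals is equivalent to ==
}</span>
In practice, though, when we define a class we usually override Object's hashCode and equals methods. String is no exception: its overridden equals reports whether the two strings consist of the same characters.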
<span style="font-family:SimSun;">public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}</span>
To summarize: == compares the values held directly in variables (the JVM stack holds primitive values and references to objects), so for objects it tests whether two references point to the same object. String's equals, by contrast, tests whether the contents of the objects on the heap are equal.
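The same distinction applies to any class that overrides equals. The sketch below uses a hypothetical Point class (not from the original post) to show identity comparison with == next to content comparison with an overridden equals, alongside the default Object.equals behaviour.

```java
import java.util.Objects;

class EqualsDemo {
    // Hypothetical value class that overrides equals and hashCode together.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;               // same object
            if (!(o instanceof Point)) return false;  // different type or null
            Point p = (Point) o;
            return x == p.x && y == p.y;              // compare content
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a == b);       // false: two distinct objects
        System.out.println(a.equals(b));  // true: same content

        Object o1 = new Object();
        Object o2 = new Object();
        System.out.println(o1.equals(o2)); // false: Object.equals is plain identity
    }
}
```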
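The String intern() method
public native String intern(); returns a canonical representation of the string object. The pool of strings is initially empty and is maintained privately by the String class. When intern() is invoked, if the pool already contains a string equal to this String object, as determined by equals(Object), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.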
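A small check of this invariant, as a sketch: the class InternInvariantDemo and its helper sameCanonical are illustrative names, and the only library calls used are String.intern() and StringBuilder.

```java
class InternInvariantDemo {
    // True when both strings map to the same canonical (pooled) object.
    static boolean sameCanonical(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String a = new String("hello");
        String b = new StringBuilder("hel").append("lo").toString();

        System.out.println(a == b);                     // false: two separate heap objects
        System.out.println(a.equals(b));                // true: same characters
        System.out.println(sameCanonical(a, b));        // true: equal strings share one pooled object
        System.out.println(sameCanonical(a, "world"));  // false: unequal strings never do
    }
}
```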
String's intern() is a native method. It calls through JNI into the underlying C++ library; the implementation source is as follows:
<span style="font-family:SimSun;">JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  JvmtiVMObjectAllocEventCollector oam;
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END</span>
Next, let us look at StringTable::intern(string, CHECK_NULL):
<span style="font-family:SimSun;">oop StringTable::intern(oop string, TRAPS) {
  if (string == NULL) return NULL;
  ResourceMark rm(THREAD);
  int length;
  Handle h_string (THREAD, string);
  jchar* chars = java_lang_String::as_unicode_string(string, length);
  oop result = intern(h_string, chars, length, CHECK_NULL);
  return result;
}</span>
<span style="font-family:SimSun;">oop StringTable::intern(Handle string_or_null, jchar* name, int len, TRAPS) {
  unsigned int hashValue = hash_string(name, len);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, name, len, hashValue); // calls lookup()

  // Found
  if (found_string != NULL) return found_string;

  debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
  assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
         "proposed name of symbol must be stable");

  Handle string;
  // try to reuse the string if possible
  if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
    string = string_or_null;
  } else {
    string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
  }

  // Grab the StringTable_lock before getting the_table() because it could
  // change at safepoint.
  MutexLocker ml(StringTable_lock, THREAD);

  // Otherwise, add to symbol to table
  return the_table()->basic_add(index, string, name, len, hashValue, CHECK_NULL);
}</span>
<span style="font-family:SimSun;">Symbol* SymbolTable::lookup(int index, const char* name, int len, unsigned int hash) {
  int count = 0;
  for (HashtableEntry<Symbol*, mtSymbol>* e = bucket(index); e != NULL; e = e->next()) {
    count++;  // count all entries in this bucket, not just ones with same hash
    if (e->hash() == hash) {
      Symbol* sym = e->literal();
      if (sym->equals(name, len)) {  // as described above, compared with equals
        // something is referencing this symbol now.
        sym->increment_refcount();
        return sym;
      }
    }
  }
  // If the bucket size is too deep check if this hash code is insufficient.
  if (count >= BasicHashtable<mtSymbol>::rehash_count && !needs_rehashing()) {
    _needs_rehashing = check_rehash_table(count);
  }
  return NULL;
}</span>
Below is the StringTable data structure. Note that StringTable is not the constant pool.
<span style="font-family:SimSun;">class StringTable : public Hashtable<oop, mtSymbol> {
  friend class VMStructs;

private:
  // The string table
  static StringTable* _the_table;

  // Set if one bucket is out of balance due to hash algorithm deficiency
  static bool _needs_rehashing;

  // Claimed high water mark for parallel chunked scanning
  static volatile int _parallel_claimed_idx;

  static oop intern(Handle string_or_null, jchar* chars, int length, TRAPS);
  oop basic_add(int index, Handle string_or_null, jchar* name, int len,
                unsigned int hashValue, TRAPS);

  oop lookup(int index, jchar* chars, int length, unsigned int hashValue);

  // Apply the give oop closure to the entries to the buckets
  // in the range [start_idx, end_idx).
  static void buckets_do(OopClosure* f, int start_idx, int end_idx);

  StringTable() : Hashtable<oop, mtSymbol>((int)StringTableSize,
                              sizeof (HashtableEntry<oop, mtSymbol>)) {}
  ....
}</span>
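StringTable is a hash table much like the Hashtable we use in Java: it first computes the string's hash code, uses it to locate a bucket in the array, and then walks the linked list in that bucket, comparing the characters of each entry until it finds an equal string. When the table holds many entries, lookups slow down. On entering a safepoint, the JVM checks whether a rehash is needed. In the HotSpot version shown here, the bucket array is sized once from StringTableSize, and, as the comment on _needs_rehashing indicates, the rehash is a response to a bucket becoming unbalanced because of a weak hash, so it redistributes the entries rather than enlarging the array.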
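The lookup-then-add flow described above can be modelled in a few lines of Java. This is only a toy sketch under simplifying assumptions (a fixed bucket count, String.hashCode() as the hash, no rehashing, a single coarse lock); ToyStringTable and its members are invented names, not HotSpot's actual implementation.

```java
class ToyStringTable {
    // One node of a bucket's singly linked chain.
    private static final class Entry {
        final int hash;
        final String literal;
        final Entry next;

        Entry(int hash, String literal, Entry next) {
            this.hash = hash;
            this.literal = literal;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    ToyStringTable(int size) {
        buckets = new Entry[size];
    }

    // Map a hash code to a bucket index (sign bit masked off).
    private int hashToIndex(int hash) {
        return (hash & 0x7fffffff) % buckets.length;
    }

    // Walk the chain; compare the cheap hash first, then the characters.
    private String lookup(int index, String s, int hash) {
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && e.literal.equals(s)) {
                return e.literal;
            }
        }
        return null;
    }

    // Return the pooled instance if one is equal; otherwise pool s itself.
    synchronized String intern(String s) {
        int hash = s.hashCode();
        int index = hashToIndex(hash);
        String found = lookup(index, s, hash);
        if (found != null) {
            return found;
        }
        buckets[index] = new Entry(hash, s, buckets[index]);
        return s;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(4);
        String first = new String("11");
        String second = new String("11");
        System.out.println(table.intern(first) == first);   // true: first becomes the pooled instance
        System.out.println(table.intern(second) == first);  // true: the equal string resolves to it
    }
}
```

Note that the toy pool stores the caller's own object on a miss, which mirrors the "reuse the string if possible" branch in StringTable::intern above.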
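Analyzing the opening example in detail
1. On the command line, change to the directory containing the class and compile it: javac InternTest.java
2. Inspect the compiled bytecode: javap -verbose InternTest
First, the constant pool: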
<span style="font-family:SimSun;">public class InternTest
  SourceFile: "InternTest.java"
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref     #16.#29   // java/lang/Object."<init>":()V
   #2 = Class         #30       // java/lang/String
   #3 = String        #31       // 1
   #4 = Methodref     #2.#32    // java/lang/String."<init>":(Ljava/lang/String;)V
   #5 = Methodref     #2.#33    // java/lang/String.intern:()Ljava/lang/String;
   #6 = Fieldref      #34.#35   // java/lang/System.out:Ljava/io/PrintStream;
   #7 = Methodref     #36.#37   // java/io/PrintStream.println:(Z)V
   #8 = Methodref     #2.#38    // java/lang/String.equals:(Ljava/lang/Object;)Z
   #9 = Methodref     #36.#39   // java/io/PrintStream.println:()V
  #10 = Class         #40       // java/lang/StringBuilder
  #11 = Methodref     #10.#29   // java/lang/StringBuilder."<init>":()V
  #12 = Methodref     #10.#41   // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #13 = Methodref     #10.#42   // java/lang/StringBuilder.toString:()Ljava/lang/String;
  #14 = String        #43       // 11
  #15 = Class         #44       // InternTest
  #16 = Class         #45       // java/lang/Object
  #17 = Utf8          <init>
  #18 = Utf8          ()V
  #19 = Utf8          Code
  #20 = Utf8          LineNumberTable
  #21 = Utf8          main
  #22 = Utf8          ([Ljava/lang/String;)V
  #23 = Utf8          StackMapTable
  #24 = Class         #46       // "[Ljava/lang/String;"
  #25 = Class         #30       // java/lang/String
  #26 = Class         #47       // java/io/PrintStream
  #27 = Utf8          SourceFile
  #28 = Utf8          InternTest.java
  #29 = NameAndType   #17:#18   // "<init>":()V
  #30 = Utf8          java/lang/String
  #31 = Utf8          1
  #32 = NameAndType   #17:#48   // "<init>":(Ljava/lang/String;)V
  #33 = NameAndType   #49:#50   // intern:()Ljava/lang/String;
  #34 = Class         #51       // java/lang/System
  #35 = NameAndType   #52:#53   // out:Ljava/io/PrintStream;
  #36 = Class         #47       // java/io/PrintStream
  #37 = NameAndType   #54:#55   // println:(Z)V
  #38 = NameAndType   #56:#57   // equals:(Ljava/lang/Object;)Z
  #39 = NameAndType   #54:#18   // println:()V
  #40 = Utf8          java/lang/StringBuilder
  #41 = NameAndType   #58:#59   // append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  #42 = NameAndType   #60:#50   // toString:()Ljava/lang/String;
  #43 = Utf8          11
  #44 = Utf8          InternTest
  #45 = Utf8          java/lang/Object
  #46 = Utf8          [Ljava/lang/String;
  #47 = Utf8          java/io/PrintStream
  #48 = Utf8          (Ljava/lang/String;)V
  #49 = Utf8          intern
  #50 = Utf8          ()Ljava/lang/String;
  #51 = Utf8          java/lang/System
  #52 = Utf8          out
  #53 = Utf8          Ljava/io/PrintStream;
  #54 = Utf8          println
  #55 = Utf8          (Z)V
  #56 = Utf8          equals
  #57 = Utf8          (Ljava/lang/Object;)Z
  #58 = Utf8          append
  #59 = Utf8          (Ljava/lang/String;)Ljava/lang/StringBuilder;
  #60 = Utf8          toString</span>
Now let us look at our main method:
<span style="font-family:SimSun;">public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=4, locals=5, args_size=1   // <span style="color:#ff0000;">operand stack depth 4, 5 local variable slots, one argument</span>
       0: new           #2    // class java/lang/String
       3: dup                 // duplicate the top stack value and push the copy
       4: ldc           #3    // String 1
       6: invokespecial #4    // Method java/lang/String."<init>":(Ljava/lang/String;)V  // initialize the new String object for s
       9: astore_1            // store the reference to the new String in slot 1, i.e. variable s
      10: aload_1
      11: invokevirtual #5    // Method java/lang/String.intern:()Ljava/lang/String;
      14: pop
      <span style="color:#ff0000;">15: ldc           #3</span>    // String 1
      <span style="color:#ff0000;">17: astore_2</span>
      18: getstatic     #6    // Field java/lang/System.out:Ljava/io/PrintStream;
      21: aload_1
      22: aload_2
      23: if_acmpne     30
      26: iconst_1
      27: goto          31
      30: iconst_0
      31: invokevirtual #7    // Method java/io/PrintStream.println:(Z)V
      34: getstatic     #6    // Field java/lang/System.out:Ljava/io/PrintStream;
      37: aload_1
      38: aload_2
      39: invokevirtual #8    // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      42: invokevirtual #7    // Method java/io/PrintStream.println:(Z)V
      45: getstatic     #6    // Field java/lang/System.out:Ljava/io/PrintStream;
      48: invokevirtual #9    // Method java/io/PrintStream.println:()V
      51: new           #10   // class java/lang/StringBuilder
      54: dup
      55: invokespecial #11   // Method java/lang/StringBuilder."<init>":()V
      58: new           #2    // class java/lang/String
      61: dup
      62: ldc           #3    // String 1
      64: invokespecial #4    // Method java/lang/String."<init>":(Ljava/lang/String;)V
      67: invokevirtual #12   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      70: new           #2    // class java/lang/String
      73: dup
      74: ldc           #3    // String 1
      76: invokespecial #4    // Method java/lang/String."<init>":(Ljava/lang/String;)V
      79: invokevirtual #12   // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      82: invokevirtual #13   // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      85: astore_3
      86: aload_3
      87: invokevirtual #5    // Method java/lang/String.intern:()Ljava/lang/String;
      90: pop
      91: ldc           #14   // String 11
      93: astore        4
      95: getstatic     #6    // Field java/lang/System.out:Ljava/io/PrintStream;
      98: aload_3
      99: aload         4
     101: if_acmpne     108
     104: iconst_1
     105: goto          109
     108: iconst_0
     109: invokevirtual #7    // Method java/io/PrintStream.println:(Z)V
     112: getstatic     #6    // Field java/lang/System.out:Ljava/io/PrintStream;
     115: aload_3
     116: aload         4
     118: invokevirtual #8    // Method java/lang/String.equals:(Ljava/lang/Object;)Z
     121: invokevirtual #7    // Method java/io/PrintStream.println:(Z)V
     124: return</span>
Here the statement String s2 = "1"; corresponds to the bytecode ldc #3, astore_2. ldc pushes an int, float, or String constant from the constant pool onto the top of the operand stack. In interpreterRuntime.cpp we can see how ldc is executed:
<span style="font-family:SimSun;">IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))
  // access constant pool
  constantPoolOop pool = method(thread)->constants();
  int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) : get_index_u1(thread, Bytecodes::_ldc);
  constantTag tag = pool->tag_at(index);

  if (tag.is_unresolved_klass() || tag.is_klass()) {
    klassOop klass = pool->klass_at(index, CHECK);
    oop java_class = klass->java_mirror();
    thread->set_vm_result(java_class);
  } else {
#ifdef ASSERT
    // If we entered this runtime routine, we believed the tag contained
    // an unresolved string, an unresolved class or a resolved class.
    // However, another thread could have resolved the unresolved string
    // or class by the time we go there.
    assert(tag.is_unresolved_string()|| tag.is_string(), "expected string");
#endif
    oop s_oop = pool->string_at(index, CHECK);
    thread->set_vm_result(s_oop);
  }
IRT_END</span>
Because this is a string constant, the code calls pool->string_at(index, CHECK), which ultimately calls string_at_impl:
<span style="font-family:SimSun;">oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {
  oop str = NULL;
  CPSlot entry = this_oop->slot_at(which);
  if (entry.is_metadata()) {
    ObjectLocker ol(this_oop, THREAD);
    if (this_oop->tag_at(which).is_unresolved_string()) {
      // Intern string
      Symbol* sym = this_oop->unresolved_string_at(which);
      str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));
      this_oop->string_at_put(which, str);
    } else {
      // Another thread beat us and interned string, read string from constant pool
      str = this_oop->resolved_string_at(which);
    }
  } else {
    str = entry.get_oop();
  }
  assert(java_lang_String::is_instance(str), "must be string");
  return str;
}</span>
The code shows that before ldc has executed, the string constant is represented by a Symbol. Once ldc runs, StringTable::intern produces the String reference, which is then stored in the constant pool. If ldc is executed again for the same entry, the String reference is read straight from the constant pool by index (this_oop->resolved_string_at(which)), avoiding another lookup in the StringTable.
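The resolve-once behaviour can be illustrated with a toy model in Java. Everything here is an assumption made for illustration: ToyConstantPool, its ldc method, the char[] stand-in for a Symbol, and the HashMap stand-in for the StringTable are not HotSpot structures. The counter tableLookups exists only to show that a resolved entry is not looked up again.

```java
import java.util.HashMap;
import java.util.Map;

class ToyConstantPool {
    // Stand-in for the StringTable: maps content to its canonical instance.
    private final Map<String, String> stringTable = new HashMap<>();
    // Stand-in for unresolved Symbol entries: raw characters only.
    private final char[][] unresolved;
    // Resolved String references, filled in lazily by ldc.
    private final String[] resolved;
    // Number of times the string table had to be consulted.
    int tableLookups = 0;

    ToyConstantPool(String... literals) {
        unresolved = new char[literals.length][];
        resolved = new String[literals.length];
        for (int i = 0; i < literals.length; i++) {
            unresolved[i] = literals[i].toCharArray();
        }
    }

    // First call for an index interns the characters and caches the result;
    // later calls return the cached reference without touching the table.
    synchronized String ldc(int index) {
        if (resolved[index] == null) {
            tableLookups++;
            String candidate = new String(unresolved[index]);
            String existing = stringTable.putIfAbsent(candidate, candidate);
            resolved[index] = (existing != null) ? existing : candidate;
        }
        return resolved[index];
    }

    public static void main(String[] args) {
        ToyConstantPool pool = new ToyConstantPool("1", "11", "1");
        System.out.println(pool.ldc(0) == pool.ldc(0)); // true: the second call hits the cached entry
        System.out.println(pool.ldc(0) == pool.ldc(2)); // true: equal literals share one pooled object
        System.out.println(pool.tableLookups);          // 2: one lookup per distinct index resolved
    }
}
```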
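With this in hand, we can analyze the example step by step.
1. new creates a String object on the heap, and s holds a reference to it; the literal "1" also produces a "1" object in the string pool.
2. s.intern() ends up in StringTable::intern(), which tries to add the string referenced by s to the pool and finds that an equal string is already there.
3. s2 = "1" looks for "1" in the pool, finds it, and stores the reference to the pooled "1" in s2.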
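4. So s == s2 is false: s refers to the heap object created by new, while s2 refers to the pooled object.
5. s3 = new String("1") + new String("1"); first creates String objects on the heap, with "1" in the pool as before. As the bytecode shows, javac compiles the + concatenation into StringBuilder calls. After the statement completes, there is a String "11" object on the heap, but no "11" in the pool.
6. s3.intern() adds it to the pool. Starting with JDK 7, intern no longer copies the string; the pool simply stores a reference to the same heap object that s3 refers to.
7. s4 = "11" executes ldc, which looks up the pool, finds the entry, and returns that reference. So s3 == s4 is true (on JDK 6 and earlier, where intern copied the string into the permanent generation, it is false).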
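One detail worth making explicit is that the example discards the return value of intern(). The following sketch (class and method names are illustrative) keeps that return value, which is what actually yields the pooled reference in the first half of the example.

```java
class InternReturnValueDemo {
    // True when the value returned by intern() is the literal's pooled object.
    static boolean internedEqualsLiteral() {
        String s = new String("1");
        return s.intern() == "1";
    }

    public static void main(String[] args) {
        String s = new String("1");
        String pooled = s.intern(); // keep the return value this time
        String s2 = "1";

        System.out.println(s == s2);       // false: s is still the heap object created by new
        System.out.println(pooled == s2);  // true: intern() returned the pooled "1"
        System.out.println(s.equals(s2));  // true: same content
    }
}
```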
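That is roughly how it works. Searching afterwards, I found that my classmate had also come across this example in a blog post, 深入解析String#intern, which explains it in great detail and which I recommend; this article draws on it as well. I also consulted Java (JDK7)中的String常量和String.intern的实现. Because String's intern() is backed by a hash table, a large number of interned strings produces many hash collisions, and walking long chains is slow, so performance problems can arise. I will not go into that here; the posts above analyze it, and you are welcome to read them and explore further.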