JDK 9 Study Notes - (2) The Flexible String
Background
String is one of the most central data types in the JDK, so it deserves dedicated study. These four files are the ones to focus on:
- jdk/src/java.base/share/native/libjava/String.c
- jdk/src/java.base/share/classes/java/lang/String.java
- jdk/src/java.base/share/classes/java/lang/StringLatin1.java
- jdk/src/java.base/share/classes/java/lang/StringUTF16.java
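Storage
Whatever the language or implementation, a String is ultimately a sequence of bytes. All the characters that can be represented together form a character set, and assigning each character in that set a number gives a character encoding. The most widely used is Unicode, which covers nearly every script in common use. Unicode has three main encoding forms, UTF-8, UTF-16, and UTF-32, and on the web UTF-8 is overwhelmingly dominant.
Java 9's String introduces a compaction feature similar to the one in Python's str. The principle is simple: if a String contains only Latin1 characters, one byte per character is enough; if it contains anything outside Latin1, such as Chinese, it is stored in a different encoding (UTF16), and a single field records which encoding is in use. Start with the three classes involved.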
    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {

        /**
         * The value is used for character storage.
         *
         * @implNote This field is trusted by the VM, and is a subject to
         * constant folding if String instance is constant. Overwriting this
         * field after construction will cause problems.
         *
         * Additionally, it is marked with {@link Stable} to trust the contents
         * of the array. No other facility in JDK provides this functionality (yet).
         * {@link Stable} is safe here, because value is never null.
         */
        @Stable
        private final byte[] value;

        /**
         * The identifier of the encoding used to encode the bytes in
         * {@code value}. The supported values in this implementation are
         *
         * LATIN1
         * UTF16
         *
         * @implNote This field is trusted by the VM, and is a subject to
         * constant folding if String instance is constant. Overwriting this
         * field after construction will cause problems.
         */
        private final byte coder;

        /** Cache the hash code for the string */
        private int hash; // Default to 0
    }

    final class StringLatin1 {
    }

    final class StringUTF16 {
    }
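The character sequence is stored in the byte array value, and the one-byte coder field records which encoding those bytes use. That is the basic makeup of a String. StringLatin1 then provides a set of static methods for handling strings that contain only Latin1 characters.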
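To make the value plus coder layout concrete, here is a minimal sketch. It is not the JDK's code: `CompactText` and every member of it are hypothetical names invented for illustration, and the big-endian byte order used for the two-byte case is an assumption chosen for simplicity. The sketch stores one byte per character when every char fits in Latin1 (at most 0xFF) and two bytes per character otherwise, then dispatches on the coder when reading a character back.

```java
// Hypothetical miniature of the "byte[] value + byte coder" layout.
// Illustration only; this is not how java.lang.String is actually written.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char

    private final byte[] value;
    private final byte coder;

    private CompactText(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }

    // Choose the narrowest representation that can hold every char of s.
    static CompactText of(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            byte[] bytes = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            return new CompactText(bytes, LATIN1);
        }
        byte[] bytes = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            bytes[2 * i] = (byte) (c >> 8); // high byte first (assumed order)
            bytes[2 * i + 1] = (byte) c;    // low byte
        }
        return new CompactText(bytes, UTF16);
    }

    byte coder() {
        return coder;
    }

    // Number of bytes actually used for storage.
    int byteLength() {
        return value.length;
    }

    // Number of chars: shifting by the coder divides by 1 or by 2.
    int length() {
        return value.length >> coder;
    }

    // Read one char back, branching on the coder.
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        CompactText ascii = CompactText.of("hello");
        // "hi" followed by two Chinese characters, written as escapes
        CompactText mixed = CompactText.of("hi\u4e2d\u6587");
        System.out.println(ascii.coder() + " " + ascii.byteLength()); // prints "0 5"
        System.out.println(mixed.coder() + " " + mixed.byteLength()); // prints "1 8"
    }
}
```

In this sketch a single non-Latin1 character is enough to switch the whole string to two bytes per char, which is why the second string needs 8 bytes for 4 chars while the first needs only 5 bytes for 5 chars.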