An Introduction to String Memory Pitfalls

2019-11-14 23:05:55

Source: reposted

Contributed by: a site user

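Using String methods for text analysis or for processing large numbers of strings can hurt memory performance. It may lead to excessive memory usage or even an OutOfMemoryError (OOM).

I. Memory footprint of a String object

In general, a Java object is laid out in the virtual machine as follows (the sizes used here correspond to a 32-bit JVM):
• Object header: 8 bytes (holds the object's class information, ID, and its state in the VM)
• Java primitive data: values of types such as int, float, and char
• References: 4 bytes each
• Padding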

Definition of String:

JDK6:
private final char value[];
private final int offset;
private final int count;
private int hash;

An empty string in JDK6 occupies 40 bytes.

JDK7:
private final char value[];
private int hash;
private transient int hash32;

An empty string in JDK7 also occupies 40 bytes.

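How the JDK6 footprint is calculated: first work out the space taken by an empty char array. Arrays are objects in Java, so an array also has an object header. An array therefore occupies its object header plus the array length field, i.e. 8 + 4 = 12 bytes, which becomes 16 bytes after padding.

An empty String therefore occupies:

object header (8 bytes) + char array (16 bytes) + 3 ints (3 × 4 = 12 bytes) + 1 reference to the char array (4 bytes) = 40 bytes.

The space taken by an actual String can thus be computed as:

8 * (int) (((8 + 12 + 2*n + 4 + 12) + 7) / 8) = 8 * (int) ((2*n + 43) / 8)

where n is the length of the string. Adding 7 before the integer division rounds the total up to the next multiple of 8 bytes.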
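The formula is easy to check by plugging in a few lengths. The following is only a sketch of that arithmetic, assuming the JDK6 layout described above; the class and method names are made up for illustration, and the result is an estimate rather than a measurement.

```java
public class StringSizeEstimate {
    // Estimated footprint in bytes of a JDK6 String holding n chars,
    // using the formula from the text: 8 * (int) ((2*n + 43) / 8).
    // long is used so that very large n does not overflow.
    static long estimateBytes(long n) {
        return 8 * ((2 * n + 43) / 8);
    }

    public static void main(String[] args) {
        long[] lengths = {0, 1, 3, 10};
        for (long n : lengths) {
            System.out.println(n + " chars -> " + estimateBytes(n) + " bytes");
        }
    }
}
```

For n = 0 this gives the 40 bytes derived above for the empty String.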

II. Examples

1. substring

package demo;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class TestBigString
{
    private String strsub;
    private String strempty = new String();

    public static void main(String[] args) throws Exception
    {
        TestBigString obj = new TestBigString();
        // Keep only the first character of the large string.
        obj.strsub = obj.readString().substring(0,1);
        // Keep the process alive so that the heap can be inspected.
        Thread.sleep(30*60*1000);
    }

    // Reads the whole file into one String (line breaks are dropped).
    private String readString() throws Exception
    {
        BufferedReader bis = null;
        try
        {
            bis = new BufferedReader(new InputStreamReader(new FileInputStream(new File("d://teststring.txt"))));
            StringBuilder sb = new StringBuilder();
            String line = null;
            while((line = bis.readLine()) != null)
            {
                sb.append(line);
            }
            System.out.println(sb.length());
            return sb.toString();
        }
        finally
        {
            if (bis != null)
            {
                bis.close();
            }
        }
    }
}

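The file "d://teststring.txt" contains 33,475,740 characters and is 35 MB in size.

Running the code above on JDK6 shows that although strsub is only substring(0,1) and its count is indeed 1, it retains close to 67 MB of memory (33,475,740 chars × 2 bytes per char).

Running the same code on JDK7, however, the strsub object takes only 40 bytes.

Why?

Let's look at the JDK source:

JDK6: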

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

JDK7:

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

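So the cause is that the String returned by String.substring() in JDK6 still references the char array of the original String. The original data therefore cannot be released, which leads to the unexpectedly large memory consumption.

The JDK6 design was actually meant to save memory: these Strings all reuse the original String's data, and the new String produced by substring is identified only by int values such as offset and count.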
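The difference between the two designs can be imitated without an old JDK. The sketch below is not the JDK source; Slice is a hypothetical stand-in for a String that is a view over a char array, with one method that shares the array (JDK6 style) and one that copies the needed range (JDK7 style).

```java
import java.util.Arrays;

public class SharedSliceDemo {
    // Hypothetical stand-in for a String: a view over a char[].
    static final class Slice {
        final char[] value;
        final int offset;
        final int count;

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // JDK6 style: the result shares the whole backing array.
        Slice sharedSub(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        // JDK7 style: the result owns a copy of just its own characters.
        Slice copiedSub(int begin, int end) {
            char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
            return new Slice(copy, 0, end - begin);
        }

        // Number of chars this slice keeps reachable.
        int retainedChars() {
            return value.length;
        }
    }

    public static void main(String[] args) {
        Slice big = new Slice(new char[1000000], 0, 1000000);
        System.out.println(big.sharedSub(0, 1).retainedChars()); // 1000000
        System.out.println(big.copiedSub(0, 1).retainedChars()); // 1
    }
}
```

Both results have a count of 1, but the shared one keeps the entire million-char array alive for as long as it is referenced.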

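For the example above, however, where a few small Strings are cut out of a huge String and kept for later use, this design retains a large amount of redundant data. The conclusions for extracting Strings with String.split() or String.substring() are therefore:

• For applications that extract a small number of strings from a large text, String.substring() wastes a great deal of memory.
• When a fair number of strings are extracted from ordinary text and their combined length is close to the length of the original text, the existing String.substring() design shares the original text and so saves memory.

Since the root cause of the large memory footprint is that the result of String.substring() carries a large amount of the original String's data, one way to reduce the waste is to drop that data. For example, construct a new String that contains only the extracted characters, which can be done via String.toCharArray():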

String newString = new String(smallString.toCharArray());

2. Likewise, look at the split method

public class TestBigString
{
    private String strsub;
    private String strempty = new String();
    private String[] strSplit;

    public static void main(String[] args) throws Exception
    {
        TestBigString obj = new TestBigString();
        obj.strsub = obj.readString().substring(0,1);
        // Split the large string into at most 5 pieces.
        obj.strSplit = obj.readString().split("Address:",5);
        Thread.sleep(30*60*1000);
    }

    // readString() is the same as in the previous listing.
}

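In JDK6, every String element of the resulting array retains the memory of the original string (67 MB).

In JDK7, every String element of the resulting array takes only its actual size.

The reason:

JDK6 source code: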

// String.split
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

// Pattern.split (excerpt)
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();
    Matcher m = matcher(input);
    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            // ...

// String.subSequence, which delegates to substring
public CharSequence subSequence(int beginIndex, int endIndex) {
    return this.substring(beginIndex, endIndex);
}
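Because split ends up in substring, the toCharArray() workaround shown for substring applies to split results as well. The helper below is a sketch of that idea; splitCompact is a hypothetical name, not a JDK method, and on JDK7 and later the extra copy is unnecessary because substring already copies.

```java
public class CompactSplit {
    // Splits the text, then copies every piece into its own char[] so that
    // no piece keeps the original text's backing array alive (relevant on JDK6).
    static String[] splitCompact(String text, String regex, int limit) {
        String[] parts = text.split(regex, limit);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i].toCharArray());
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Address:Beijing Address:Shanghai";
        for (String part : splitCompact(text, "Address:", 5)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The sample text is invented for the demonstration; the pieces are equal in content to what split returns, only their backing storage differs.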

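III. Other points

1. Literals versus new String:

String a1 = "Hello"; // constant string; the JVM interns these into the constant pool by default
String a2 = new String("Hello"); // creates a brand-new string every time

When a string is created from a literal, the JVM checks whether an identical string already exists in its internal pool. If it does, no constructor is called and the existing instance is returned directly; if it does not, new memory is allocated for the new string.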
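The pooling behaviour can be observed by comparing references with ==. This is a small sketch for illustration only; the class name and the extra variable a3 are not from the original text.

```java
public class InternDemo {
    public static void main(String[] args) {
        String a1 = "Hello";
        String a2 = new String("Hello");
        String a3 = "Hello";

        System.out.println(a1 == a3);          // true: both literals refer to the pooled instance
        System.out.println(a1 == a2);          // false: new String always creates a separate object
        System.out.println(a1.equals(a2));     // true: the contents are equal
        System.out.println(a1 == a2.intern()); // true: intern() returns the pooled instance
    }
}
```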

2. When concatenating static strings, prefer +, because the compiler usually optimizes it.

public String constractStr()
{
    return "str1" + "str2" + "str3";
}

Corresponding bytecode:

Code:

0: ldc #24; //String str1str2str3 -- pushes the string constant onto the top of the stack

2: areturn
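The effect of this compile-time folding is visible from Java code as well, because a folded constant is interned like any other literal. The sketch below contrasts it with runtime concatenation; the class and method names are illustrative only.

```java
public class ConstantFoldingDemo {
    // All operands are literals, so javac folds this into one constant.
    static String constant() {
        return "str1" + "str2" + "str3";
    }

    // Operands are only known at run time, so a new String is built on each call.
    static String dynamic(String a, String b, String c) {
        return a + b + c;
    }

    public static void main(String[] args) {
        String expected = "str1str2str3";
        System.out.println(constant() == expected);                          // true
        System.out.println(dynamic("str1", "str2", "str3") == expected);     // false
        System.out.println(dynamic("str1", "str2", "str3").equals(expected)); // true
    }
}
```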

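3. When concatenating dynamic strings, prefer the append method of StringBuffer or StringBuilder, which avoids constructing too many temporary String objects (the javac compiler also optimizes String concatenation automatically):

public String constractStr(String str1, String str2, String str3)
{
    return str1 + str2 + str3;
}

Corresponding bytecode (since JDK 1.5 this is converted into calls to StringBuilder.append):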

Code:

0:   new     #24; //class java/lang/StringBuilder
3:   dup
4:   aload_1
5:   invokestatic    #26; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
8:   invokespecial   #32; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
11:  aload_2
12:  invokevirtual   #35; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15:  aload_3
16:  invokevirtual   #35; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;  -- calls StringBuilder's append method
19:  invokevirtual   #39; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22:  areturn     -- returns the reference
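The automatic rewrite covers a single expression, but it does not carry a builder across loop iterations, which is where an explicit StringBuilder matters most. The following sketch compares the two styles; the method names are made up, and the comment on += describes the javac versions that emit the StringBuilder bytecode shown above.

```java
public class JoinDemo {
    // With +=, each iteration builds a fresh temporary builder and a new
    // String, so the intermediate results become garbage.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One StringBuilder is reused for the whole loop; only the final
    // toString() creates a String.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"str1", "str2", "str3"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

Both methods return the same text; the difference is only in how many temporary objects are created along the way.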
