Analyzing the Root Cause of Garbled Text in Java

2019-11-18 11:59:57


Source: reposted

Contributed by: a reader

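I recently needed to compress strings using GZIPInputStream and GZIPOutputStream, and once again ran into a problem I had not seen for a long time: garbled Chinese text.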
　　
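After reading some related articles, I think that most of the garbled-text problems we run into come down to the same thing: somewhere in our code there is an implicit byte-to-char conversion, and that conversion is performed with a charset that does not match the data. Often that charset is ISO-8859-1 (the default a servlet container falls back to when no request encoding is specified); in other places it is the platform default.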
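To make the idea of an implicit conversion concrete, here is a minimal sketch of what happens when GB2312 bytes are decoded as ISO-8859-1. The class and method names (`ImplicitDecodeDemo`, `decodeAsLatin1`) are hypothetical and exist only for illustration, and the sketch assumes the running JDK ships the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ImplicitDecodeDemo {
    /** Decodes bytes as ISO-8859-1, standing in for an implicit conversion. */
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");
        String original = "\u4e2d\u6587"; // the two characters of "Chinese text"

        // GB2312 encodes each of these characters as two bytes.
        byte[] raw = original.getBytes(gb2312);

        // ISO-8859-1 turns every single byte into one char, so the
        // two-character string becomes four unrelated Latin-1 characters.
        String garbled = decodeAsLatin1(raw);

        System.out.println(raw.length);               // 4
        System.out.println(garbled.length());         // 4
        System.out.println(garbled.equals(original)); // false
    }
}
```

Nothing in the call site signals that a charset was chosen, which is why this kind of mismatch is easy to overlook.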
　　
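Take passing Chinese text from a JSP page as an example. Suppose the client uses GB2312. When a form containing Chinese text is submitted, the text is first encoded into a byte stream according to GB2312. Once it reaches the server, if we simply call request.getParameter(String name) or a similar method in a servlet, the method returns a String, so a byte-to-char conversion must have taken place inside it. This is where the error is introduced: if that conversion used ISO-8859-1, the result is of course garbled.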
　　
    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class Login extends HttpServlet {
        private static final String CONTENT_TYPE = "text/html; charset=UTF-8";
        // .....

        // Initialize global variables
        public void init() throws ServletException {
        }

        // Process the HTTP GET request
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String name = request.getParameter("userid"); // implicit byte-to-char conversion

            if (name != null) { // getParameter returns null when the parameter is absent
                // Recover the original bytes, then decode them again as GB2312
                name = new String(name.getBytes("ISO-8859-1"), "GB2312");
            }

            response.setContentType(CONTENT_TYPE);
            PrintWriter out = response.getWriter();
            out.println("<html>");
            out.println("<head><title>Login</title></head>");
            out.println("<body bgcolor=\"#ffffff\">");
            out.println("<p>The servlet has received a GET. This is the reply.</p>");
            out.println("</body>");
            out.println("</html>");
            out.close();
        }
    }
　　
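Fortunately, a conversion done with ISO-8859-1 neither loses nor adds bytes, because it maps every byte value to exactly one char. All we have to do is ask for the original byte array back using ISO-8859-1 and then redo the byte-to-char conversion using GB2312.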
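The round trip described here can be checked with a small sketch. The names `Latin1RoundTrip` and `redecode` are hypothetical, and the sketch assumes the running JDK provides the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    /**
     * Undoes a wrong ISO-8859-1 decode: recovers the raw bytes and decodes
     * them again with the charset the sender actually used.
     */
    static String redecode(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");
        String original = "\u4e2d\u6587";

        byte[] sent = original.getBytes(gb2312);
        // The wrong decode that an implicit conversion would perform.
        String misdecoded = new String(sent, StandardCharsets.ISO_8859_1);

        // ISO-8859-1 is a one-to-one byte/char mapping, so the bytes survive.
        byte[] recovered = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(sent, recovered));                // true
        System.out.println(redecode(misdecoded, gb2312).equals(original)); // true
    }
}
```

This repair relies on the wrong decode having been ISO-8859-1. If the implicit conversion used a charset in which some byte sequences are invalid (UTF-8, for example), the decoder substitutes replacement characters and the original bytes can no longer be recovered this way.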
　　
Now take compressed streams as another example (file streams behave the same way):
　　
    // Requires java.io.* and java.util.zip.GZIPInputStream
    public String uncompress(byte[] cmp) {
        String ret = "";
        int i;
        byte[] buf = new byte[512];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        // New approach: work with bytes all the way through, and only at the
        // very end assemble them into a String with a suitable charset.
        try (InputStream bis = new BufferedInputStream(
                new GZIPInputStream(new ByteArrayInputStream(cmp)))) {

            // Old approach: new InputStreamReader(...) performed an implicit
            // byte-to-char conversion, so everything read afterwards was garbled.
            // BufferedReader bis = new BufferedReader(new InputStreamReader(
            //         new GZIPInputStream(new ByteArrayInputStream(cmp))));

            // read() returns -1 at end of stream
            while ((i = bis.read(buf)) != -1) {
                baos.write(buf, 0, i);
            }

            // The original relied on the platform default charset, which was
            // GB2312 on the author's machine; naming it explicitly avoids
            // depending on where the code runs.
            ret = new String(baos.toByteArray(), "GB2312");
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        return ret;
    }
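For completeness, here is a hedged sketch of a matching compress and uncompress pair that stays byte-oriented and takes the charset as an explicit parameter. The class `GzipText` and its method signatures are assumptions made for illustration, not part of the original code; checked I/O exceptions are wrapped in `UncheckedIOException` only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipText {
    /** Encodes text with the given charset, then gzips the resulting bytes. */
    static byte[] compress(String text, Charset cs) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into baos.
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(text.getBytes(cs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    /** Gunzips to raw bytes first, and decodes to a String only at the end. */
    static String uncompress(byte[] cmp, Charset cs) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(cmp))) {
            int n;
            while ((n = in.read(buf)) != -1) { // -1 signals end of stream
                baos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), cs);
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");
        String original = "\u4e2d\u6587 gzip";
        String restored = uncompress(compress(original, gb2312), gb2312);
        System.out.println(restored.equals(original)); // true
    }
}
```

Passing the same charset to both sides is the point of the design: the only byte-to-char conversions are the two that are written out explicitly.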
　　
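A Reader works in terms of characters, while an InputStream works in terms of bytes. Whenever one is wrapped in the other, a byte-to-char conversion takes place, so we need to pay attention to the order in which we chain these calls.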
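When a Reader really is wanted, one alternative (not the approach taken in the code above) is to make the conversion explicit by passing a charset to `InputStreamReader`. A minimal sketch follows; `ExplicitReaderDemo` and `readAll` are hypothetical names, and GB2312 is assumed to be available in the running JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ExplicitReaderDemo {
    /** Reads the whole stream as text, decoding with the charset supplied. */
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        // The two-argument constructor names the charset, so the
        // byte-to-char conversion is no longer implicit.
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        String original = "\u4e2d\u6587";
        InputStream in = new ByteArrayInputStream(original.getBytes(gb2312));
        System.out.println(readAll(in, gb2312).equals(original)); // true
    }
}
```

The single-argument `InputStreamReader(InputStream)` constructor decodes with the default charset instead, which is the implicit conversion the text warns about.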
　　
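If we run into garbled text again in the future, the first thing to check is whether an implicit byte-to-char conversion is happening somewhere in our code.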
