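Android: converting Chinese characters to pinyin, with polyphonic-character (多音字) recognition. Shared here for reference; the details follow.
Source of the problem
While sorting place names by their first letter, I ran into this bug: 長沙 was converted to the pinyin zhangsha (it should be changsha), and 重慶 was converted to zhong qing (it should be chongqing). As a result, the sort order was wrong.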
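To see why one wrong reading is enough to break the ordering, here is a minimal sketch. The class `SortKeyDemo`, the method `sortCities`, and the three sample cities are hypothetical and only for illustration; they are not part of the original project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SortKeyDemo {
    /**
     * Sorts three sample city names by their pinyin.
     * The reading used for 長沙 is passed in so both cases can be compared.
     */
    static List<String> sortCities(String changshaReading) {
        final Map<String, String> pinyin = new HashMap<>();
        pinyin.put("北京", "beijing");
        pinyin.put("武漢", "wuhan");
        pinyin.put("長沙", changshaReading);

        List<String> cities = new ArrayList<>(Arrays.asList("長沙", "北京", "武漢"));
        // The pinyin string is the sort key, so a wrong first letter moves the entry.
        cities.sort(Comparator.comparing((String name) -> pinyin.get(name)));
        return cities;
    }

    public static void main(String[] args) {
        System.out.println(sortCities("zhangsha")); // [北京, 武漢, 長沙]
        System.out.println(sortCities("changsha")); // [北京, 長沙, 武漢]
    }
}
```

With the wrong reading, 長沙 is filed under Z and ends up last; with the correct reading it is filed under C.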
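Hanzi-to-pinyin library and polyphone recognition library
1. A vocabulary of words that contain polyphonic characters
2. A pinyin table keyed by each character's binary code value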
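As a rough illustration of the second item, the sketch below computes a code value from a character's GB2312 bytes and looks it up in a small table. Everything named here (`Gb2312CodeDemo`, `codeOf`, `pinyinOf`, `TABLE`) is hypothetical. The three table entries are an illustrative excerpt and not the project's actual data, and the sketch assumes the JVM provides the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.TreeMap;

final class Gb2312CodeDemo {
    // Assumption: the GB2312 charset is available in this JVM.
    private static final Charset GB2312 = Charset.forName("GB2312");

    // Illustrative, truncated table: first code value of a syllable range -> pinyin.
    private static final TreeMap<Integer, String> TABLE = new TreeMap<>();
    static {
        TABLE.put(-20319, "a");
        TABLE.put(-20317, "ai");
        TABLE.put(-20304, "an");
    }

    /** Returns the byte value for a one-byte character, or a negative offset code for a two-byte one. */
    static int codeOf(String ch) {
        byte[] bytes = ch.getBytes(GB2312);
        if (bytes.length == 1) {
            return bytes[0];
        }
        if (bytes.length == 2) {
            int high = bytes[0] & 0xFF; // unsigned value of the first byte
            int low = bytes[1] & 0xFF;  // unsigned value of the second byte
            return (high * 256 + low) - 65536;
        }
        throw new IllegalArgumentException("expected a single character: " + ch);
    }

    /** Looks up the syllable whose range contains the character's code (only valid within the excerpt). */
    static String pinyinOf(String ch) {
        int code = codeOf(ch);
        if (code >= 0) {
            return ch; // one-byte characters pass through unchanged
        }
        Map.Entry<Integer, String> entry = TABLE.floorEntry(code);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        System.out.println(codeOf("A"));    // 65
        System.out.println(codeOf("啊"));   // -20319
        System.out.println(pinyinOf("埃")); // ai
    }
}
```

The character 啊 is encoded as the bytes 0xB0 0xA1 in GB2312, so its code is 176 * 256 + 161 - 65536 = -20319. A table like this yields only one reading per character, which is why the separate polyphone vocabulary is needed.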
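Key code
1. First I convert the text to be transliterated into its "gb2312" encoding. In that encoding a Chinese character normally occupies two bytes. If the result is a single byte, that byte is returned as the character code; if it is two bytes, an offset code is computed from them. The code is as follows: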
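/**
 * Converts a single character to its code value.
 *
 * @param chs a string holding exactly one character
 * @return the byte value for a one-byte character, or a negative
 *         offset code for a two-byte GB2312 character
 */
private int getChsAscii(String chs) {
    int asc = 0;
    try {
        byte[] bytes = chs.getBytes("gb2312");
        if (bytes == null || bytes.length > 2 || bytes.length <= 0) {
            // Caught and logged by the catch block below; the method then returns 0.
            throw new RuntimeException("illegal resource string");
        }
        if (bytes.length == 1) {
            // One byte: use the byte value directly.
            asc = bytes[0];
        }
        if (bytes.length == 2) {
            // Both bytes of a two-byte GB2312 sequence have the high bit set,
            // so they are negative as Java bytes; adding 256 gives the unsigned value.
            int highByte = 256 + bytes[0];
            int lowByte = 256 + bytes[1];
            asc = (256 * highByte + lowByte) - 256 * 256;
        }
    } catch (Exception e) {
        System.out.println("ERROR:ChineseSpelling.class-getChsAscii(String chs)" + e);
    }
    return asc;
}

2. The pinyin obtained for each single character is then checked against the HashMap built from the polyphone vocabulary. The code is as follows: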
public String getSellingWithPolyphone(String chs) {
    // Load the dictionary on first use (the original check used "!= null &&",
    // which skipped loading when the map was null).
    if (polyphoneMap == null || polyphoneMap.isEmpty()) {
        polyphoneMap = initDictionary();
    }
    String key, value, resultPy = null;
    buffer = new StringBuilder();
    for (int i = 0; i < chs.length(); i++) {
        key = chs.substring(i, i + 1);
        if (key.getBytes().length >= 2) {
            // Multi-byte character: look up its default pinyin.
            value = (String) convert(key);
            if (value == null) {
                value = "unknown";
            }
        } else {
            // Single-byte character: keep it as-is.
            value = key;
        }
        resultPy = value;

        // Take one extra character to the left.
        // Note: this branch, like the three-character ones below, assigns value back to resultPy.
        String left = null;
        if (i >= 1 && i + 1 <= chs.length()) {
            left = chs.substring(i - 1, i + 1);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left)) {
                resultPy = value;
            }
        }

        // Take one extra character to the right, e.g. [長]沙.
        // This is the lookup that replaces the default reading with the dictionary one.
        String right = null;
        if (i <= chs.length() - 2) {
            right = chs.substring(i, i + 2);
            if (polyphoneMap.containsKey(right)) {
                resultPy = polyphoneMap.get(right);
            }
        }

        // Take one extra character on each side, e.g. 龍[爪]槐.
        String middle = null;
        if (i >= 1 && i + 2 <= chs.length()) {
            middle = chs.substring(i - 1, i + 2);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(middle)) {
                resultPy = value;
            }
        }

        // Take two extra characters to the left, e.g. 羋月[傳], 列車長.
        String left3 = null;
        if (i >= 2 && i + 1 <= chs.length()) {
            left3 = chs.substring(i - 2, i + 1);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left3)) {
                resultPy = value;
            }
        }

        // Take two extra characters to the right, e.g. [長]孫無忌.
        String right3 = null;
        if (i <= chs.length() - 3) {
            right3 = chs.substring(i, i + 3);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(right3)) {
                resultPy = value;
            }
        }
        buffer.append(resultPy);
    }
    return buffer.toString();
}

3. Parse the contents of the assets file into a HashMap.
public HashMap<String, String> initDictionary() {
    String fileName = "py4j.dic";
    InputStreamReader inputReader = null;
    BufferedReader bufferedReader = null;
    HashMap<String, String> polyphoneMap = new HashMap<String, String>();
    try {
        inputReader = new InputStreamReader(
                MyApplication.mContext.getResources().getAssets().open(fileName), "UTF-8");
        bufferedReader = new BufferedReader(inputReader);
        String line = null;
        while ((line = bufferedReader.readLine()) != null) {
            // Each line holds a pinyin, then the words that use that reading.
            String[] arr = line.split(PINYIN_SEPARATOR);
            // Skip lines without a word list instead of aborting the whole load.
            if (arr.length > 1 && isNotEmpty(arr[1])) {
                String[] dyzs = arr[1].split(WORD_SEPARATOR);
                for (String dyz : dyzs) {
                    if (isNotEmpty(dyz)) {
                        // Map each word to its pinyin.
                        polyphoneMap.put(dyz.trim(), arr[0]);
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (inputReader != null) {
            try {
                inputReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (bufferedReader != null) {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return polyphoneMap;
}

GitHub source download: https://github.com/loveburce/ChinesePolyphone.git
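The core idea of the code above (a word-to-pinyin map built from dictionary lines, then a two-character lookup that overrides the default reading) can be tried outside Android with the sketch below. It is a simplified stand-in and not the project's implementation: the separator values "#" and "/", the sample dictionary lines, the per-character default readings, and all names in `PolyphoneSketch` are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class PolyphoneSketch {
    // Assumed separators; the article does not show the real values.
    static final String PINYIN_SEPARATOR = "#";
    static final String WORD_SEPARATOR = "/";

    /** Parses one line in the assumed format "pinyin#word/word" into word -> pinyin entries. */
    static void addLine(Map<String, String> phraseMap, String line) {
        String[] arr = line.split(PINYIN_SEPARATOR);
        if (arr.length < 2) {
            return; // skip malformed lines
        }
        for (String word : arr[1].split(WORD_SEPARATOR)) {
            if (!word.trim().isEmpty()) {
                phraseMap.put(word.trim(), arr[0].trim());
            }
        }
    }

    /** Uses the default reading per character unless a two-character word starting there is in the map. */
    static String toPinyin(String text, Map<String, String> phraseMap,
                           Map<Character, String> defaults) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            String reading = defaults.get(text.charAt(i));
            if (reading == null) {
                reading = String.valueOf(text.charAt(i)); // unknown characters pass through
            }
            if (i + 2 <= text.length()) {
                String override = phraseMap.get(text.substring(i, i + 2));
                if (override != null) {
                    reading = override; // the word-specific reading wins
                }
            }
            out.append(reading);
        }
        return out.toString();
    }

    /** Runs toPinyin against a tiny hypothetical dictionary and default table. */
    static String demo(String text) {
        Map<String, String> phraseMap = new HashMap<>();
        addLine(phraseMap, "chang#長沙/長江");
        addLine(phraseMap, "chong#重慶");

        Map<Character, String> defaults = new HashMap<>();
        defaults.put('長', "zhang");
        defaults.put('沙', "sha");
        defaults.put('重', "zhong");
        defaults.put('慶', "qing");
        defaults.put('成', "cheng");
        return toPinyin(text, phraseMap, defaults);
    }

    public static void main(String[] args) {
        System.out.println(demo("長沙")); // changsha
        System.out.println(demo("重慶")); // chongqing
        System.out.println(demo("成長")); // chengzhang
    }
}
```

This sketch only handles a polyphonic character at the start of a two-character word; words where it appears in the middle or at the end (such as 列車長) would need additional lookups keyed on longer or left-extended substrings.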
That is all for this article. I hope it is helpful for your studies.