Python String Formatting: Study Notes

2019-11-14 17:32:46

Source: reposted

Contributed by: a site user

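In Python, formatted string output uses the % operator. The general form is

•format string % values to output
The "format string" on the left can follow exactly the same conventions as in C. If the "values" on the right consist of two or more values, they must be enclosed in parentheses and separated by commas, that is, passed as a tuple. Let's focus on the left part. Its simplest form is:

•%code
There are many possible values for code, but because everything in Python can be converted to a string, you can simply use '%s' everywhere unless you have a special requirement. For example: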

•'%s %s %s' % (1, 2.3, ['one', 'two', 'three'])
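Its output is '1 2.3 ['one', 'two', 'three']', following the markers on the left of %. Although the first and second values are not strings, this causes no problem: for each value paired with %s that is not already a string, Python calls str() on it and substitutes the result. There is also the repr() function; to use it instead, mark the position with %r. Besides %s, there are many similar codes: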
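As a small illustration of a few of those codes, here is a minimal sketch. It is written for Python 3, and the variable names are hypothetical and chosen only for this example.

```python
# A few common conversion codes used with the % operator.
values = (1, 2.3, ['one', 'two', 'three'])

# %s calls str() on each value, so any object can be formatted.
as_str = '%s %s %s' % values

# %r calls repr() instead; for a string this keeps the quotes.
as_repr = '%r and %s' % ('text', 'text')

# %d formats an integer, %.2f a float with two decimals, %x hexadecimal.
numbers = '%d %.2f %x' % (42, 3.14159, 255)

# %5d pads to a width of 5; %% produces a literal percent sign.
padded = '[%5d] 100%%' % 7

print(as_str)   # 1 2.3 ['one', 'two', 'three']
print(as_repr)  # 'text' and text
print(numbers)  # 42 3.14 ff
print(padded)   # [    7] 100%
```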

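String formatting:

The code is as follows:
format = "hello %s, %s enough for ya?"
values = ('world', 'hot')
print format % values
Result: hello world, hot enough for ya?

Note: this works in Python 2.7 but not in 3.0.

In 3.0, print is a function, so write print(format % values) with the argument in parentheses.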
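One way to sidestep the difference is to build the string first and then print a single parenthesized expression, which runs under both 2.7 and 3.x. This is only a sketch; the name template is an assumption used here to avoid shadowing the built-in format() function.

```python
# Build the message first, then print it.
# print(message) with one argument behaves the same in Python 2.7 and 3.x.
template = "hello %s, %s enough for ya?"
values = ('world', 'hot')

message = template % values
print(message)  # hello world, hot enough for ya?
```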

Operations that resemble PHP but whose function or method names differ:

php explode => python split
php trim => python strip
php implode => python join
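The Python side of these three pairs can be sketched as follows. The sample strings are made up for illustration.

```python
# explode => split: break a string into a list on a separator.
parts = 'a,b,c'.split(',')

# trim => strip: remove leading and trailing whitespace.
trimmed = '  hello \n'.strip()

# implode => join: note that join is a method of the separator string
# and takes the list as its argument.
joined = ', '.join(parts)

print(parts)    # ['a', 'b', 'c']
print(trimmed)  # hello
print(joined)   # a, b, c
```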

While formatting strings at work I ran into a UnicodeDecodeError exception, so I looked into string formatting and am sharing what I found.

The code is as follows:
C:\Users\zhuangyan>python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win
32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = '你好世界'
>>> print 'Say this: %s' % a
Say this: 你好世界
>>> print 'Say this: %s and say that: %s' % (a, 'hello world')
Say this: 你好世界 and say that: hello world
>>> print 'Say this: %s and say that: %s' % (a, u'hello world')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 10: ordinal
not in range(128)

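Did you notice that the statement print 'Say this: %s and say that: %s' % (a, u'hello world') raised a UnicodeDecodeError? The only difference from the previous statement is that 'hello world' became u'hello world', so a str object became a unicode object. But 'hello world' is plain English text with no characters outside ASCII, so how could it fail to decode? Looking more closely at the message attached to the exception, it mentions 0xc4, which clearly does not come from 'hello world', so the Chinese string is the only suspect.

>>> a
'\xc4\xe3\xba\xc3\xca\xc0\xbd\xe7'

Printing its byte sequence confirms it: the first byte is 0xc4.

So it appears that during string formatting Python tries to decode a into a unicode object, and that it decodes with the default ASCII codec rather than the string's actual encoding (in this Windows console session the bytes appear to be GBK rather than UTF-8). Why does this happen? Let's continue the experiment: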

The code is as follows:
>>> 'Say this: %s' % 'hello'
'Say this: hello'
>>> 'Say this: %s' % u'hello'
u'Say this: hello'
>>>

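Look closely: 'hello' is an ordinary string and the result is also a string (a str object); with u'hello', a unicode object, the formatted result becomes unicode as well (note the leading u in the result).

Here is what the Python documentation says: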

If format is a Unicode object, or if any of the objects being converted using the %s conversion are Unicode objects, the result will also be a Unicode object.

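When code mixes str and unicode, this problem appears easily. In my colleague's code, the Chinese string was user input that had been encoded correctly and was a UTF-8 encoded str object. The troublesome unicode object, although its content was all ASCII, came from a sqlite3 database query, and the sqlite API returns all strings as unicode objects, which led to this strange result.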
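A common way to avoid the mix is to decode byte strings to text explicitly at the boundary and format only text. The sketch below uses Python 3, where the implicit ASCII decode of Python 2 does not happen, so the failing step is reproduced by hand. Treating the bytes from the console session above as GBK is an assumption inferred from their values, and the variable names are hypothetical.

```python
# The byte sequence shown for a in the console session above.
raw = b'\xc4\xe3\xba\xc3\xca\xc0\xbd\xe7'

# Reproduce by hand what Python 2 attempted implicitly:
# decoding the bytes with the ASCII codec fails on the first byte.
try:
    raw.decode('ascii')
    failed_byte = None
except UnicodeDecodeError as exc:
    failed_byte = exc.object[exc.start]

# Decode explicitly with the encoding the bytes were written in
# (assumed here to be GBK), then format text with text.
text = raw.decode('gbk')
message = 'Say this: %s and say that: %s' % (text, 'hello world')

print(hex(failed_byte))  # 0xc4
print(message)
```

In Python 2 the same idea would be a.decode(...) with the real encoding before formatting, so that every operand of % is already a unicode object.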

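Finally, I tested formatting with the format method, and it does not raise the exception above! Here the format string is a str, so the result stays a str: the unicode argument is encoded instead of a being decoded, which succeeds in this case because u'hello world' is pure ASCII.

The code is as follows: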
>>> print 'Say this:{0} and say that:{1}'.format(a,u'hello world')
Say this:你好世界 and say that:hello world

Next, let's look at the basic usage of format.

The code is as follows:
>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc') # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad') # arguments' indices can be repeated
'abracadabra'
>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'
>>> coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
>>> 'Coordinates: {latitude}, {longitude}'.format(**coord)
'Coordinates: 37.24N, -115.81W'
>>> coord = (3, 5)
>>> 'X: {0[0]}; Y: {0[1]}'.format(coord)
'X: 3; Y: 5'
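Beyond choosing which argument goes where, a replacement field can also carry a format specification after a colon. The following sketch, written for Python 3 with made-up sample values, shows a few common cases.

```python
# Precision: two digits after the decimal point.
two_decimals = '{0:.2f}'.format(3.14159)

# Alignment within a field of fixed width: right, left, centered.
right = '{0:>8}'.format('abc')
left = '{0:<8}|'.format('abc')
centered = '{0:^9}'.format('abc')

# Zero padding combined with width and precision.
zero_padded = '{0:08.3f}'.format(3.14159)

# Integer presentation types: x gives lowercase hexadecimal.
as_hex = '{0:x}'.format(255)

print(two_decimals)  # 3.14
print(repr(right))   # '     abc'
print(left)          # abc     |
print(repr(centered))  # '   abc   '
print(zero_padded)   # 0003.142
print(as_hex)        # ff
```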