A Brief Introduction to Google Protocol Buffers

2019-11-14 23:22:05


Source: reposted

Contributed by: a site user


The following is compiled mainly from the official documentation.

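Why Use Protocol Buffers
The .proto File
- Protocol Buffers Syntax
Compiling the .proto File
Protocol Buffers API
Enums and Nested Classes
Builders vs. Messages
Parsing and Serialization
- Writing A Message
- Reading A Message
Extending the Protocol
Encoding
Comparison with XML and JSON
- Data Size
- Serialization Performance
- Parsing Performance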

Why Use Protocol Buffers

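What are the usual ways to serialize and parse structured data?

Use Java's built-in serialization. The drawbacks are obvious: poor performance and poor cross-language support.
Encode the data into a custom string format. Simple and efficient, but only suitable for fairly simple data.
Use XML serialization. This is a common approach with clear advantages: it is human-readable, highly extensible, and self-describing. However, XML is comparatively verbose, and parsing it is complex and not very fast.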

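Protocol Buffers is a more flexible, efficient, and automated solution. You describe the data structures you want in a .proto file, and the compiler automatically generates classes (Java classes, in this article) that parse those structures and provide an efficient API for reading and writing them in a binary format. Most importantly, Protocol Buffers offers strong extensibility and compatibility: following just a few rules is enough to keep both forward and backward compatibility.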

The .proto File

// required/optional labels below are proto2 syntax
syntax = "proto2";

package tutorial;

option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}

Protocol Buffers Syntax

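The syntax of a .proto file is quite similar to Java: message corresponds to a class, and enum is an enumeration type. The basic scalar types include bool, int32, float, double, and string (there are others as well, such as int64, uint32, sint32, and bytes). Each field is preceded by one of these modifiers:

required: the field must be set
optional: the field may be omitted
repeated: the field may occur any number of times (including zero)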

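NOTE 1: For historical reasons, repeated fields of numeric scalar types should preferably be declared with [packed=true], which gives a more compact encoding: repeated int32 samples = 4 [packed=true]; (This applies to proto2; in proto3, repeated numeric scalar fields are packed by default.)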
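A quick way to see what [packed=true] buys is to count bytes. The sketch below is a minimal size calculator under stated assumptions: it follows the wire-format rules from the official encoding documentation (every field is preceded by a key, (field_number << 3) | wire_type, written as a varint; a packed field uses wire type 2 with a single length prefix), treats negative int32 values as 10-byte varints, and uses class and method names invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: encoded size of a repeated int32 field, unpacked vs packed.
class PackedSizeDemo {

    // Bytes needed for a value as a varint (7 payload bits per byte).
    // A negative int32 is sign-extended to 64 bits and always takes 10 bytes.
    static int varintSize(long value) {
        if (value < 0) {
            return 10;
        }
        int size = 1;
        while (value >= 0x80) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // The key (field_number << 3) | wire_type is itself written as a varint.
    static int keySize(int fieldNumber, int wireType) {
        return varintSize(((long) fieldNumber << 3) | wireType);
    }

    // Unpacked: every element carries its own key (wire type 0 = varint).
    static int unpackedSize(int fieldNumber, List<Integer> values) {
        int size = 0;
        for (int v : values) {
            size += keySize(fieldNumber, 0) + varintSize(v);
        }
        return size;
    }

    // Packed: one key (wire type 2 = length-delimited), one length, then the values.
    static int packedSize(int fieldNumber, List<Integer> values) {
        if (values.isEmpty()) {
            return 0;  // an empty repeated field is not written at all
        }
        int payload = 0;
        for (int v : values) {
            payload += varintSize(v);
        }
        return keySize(fieldNumber, 2) + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        List<Integer> samples = Arrays.asList(1, 2, 300);
        System.out.println("unpacked: " + unpackedSize(4, samples));
        System.out.println("packed:   " + packedSize(4, samples));
    }
}
```

For the values 1, 2, 300 in field 4, the unpacked form needs 7 bytes and the packed form 6; the saving grows with the number of elements, because the unpacked form repeats the key for every element.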

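NOTE 2: Protocol Buffers versions before 3.0 do not support maps; if you need one, you have to emulate it with two repeated fields, keys and values. (Since version 3.0, the language provides a built-in map<key_type, value_type> field type.)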
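With the keys/values workaround, the reading side has to zip the two lists back together. A minimal sketch, assuming the two lists come from hypothetical generated getters such as getKeysList() and getValuesList() (illustrative names, not a real API); it rejects lists of different lengths, because nothing in the schema forces the two fields to stay aligned.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: rebuild a Map from two parallel repeated fields ("keys" and "values").
class ParallelMapDemo {

    static <K, V> Map<K, V> zipToMap(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException(
                "keys and values differ in length: " + keys.size() + " vs " + values.size());
        }
        Map<K, V> map = new LinkedHashMap<>();  // preserves the order the entries were written in
        for (int i = 0; i < keys.size(); i++) {
            map.put(keys.get(i), values.get(i));  // a later duplicate key overwrites an earlier one
        }
        return map;
    }

    public static void main(String[] args) {
        // These lists stand in for getKeysList()/getValuesList() of a hypothetical message.
        Map<String, Integer> m = zipToMap(Arrays.asList("a", "b"), Arrays.asList(1, 2));
        System.out.println(m);
    }
}
```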

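The numbers 1, 2, 3, ... after the fields are their field numbers (tag numbers). Note that a field's number must never be changed when the protocol is extended later. [default = HOME] specifies a default value. To avoid naming conflicts, every .proto file should declare a package; packages work much as in Java, and import is also supported: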
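Why must the number never change? On the wire a field is identified only by its number: each value is preceded by a key computed as (field_number << 3) | wire_type, and the field name is never transmitted. The sketch below computes these keys for two Person fields; the helper names are made up for illustration, and only the two wire types used here are listed.

```java
// Sketch: how a field number becomes the key that precedes each value on the wire.
class FieldKeyDemo {

    // Wire types from the protobuf encoding documentation (subset used here).
    static final int VARINT = 0;            // int32, int64, bool, enum, ...
    static final int LENGTH_DELIMITED = 2;  // string, bytes, embedded messages, packed repeated

    static int makeKey(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    static int fieldNumberOf(int key) {
        return key >>> 3;   // the upper bits hold the field number
    }

    static int wireTypeOf(int key) {
        return key & 0x7;   // the low 3 bits hold the wire type
    }

    public static void main(String[] args) {
        // Person.name = 1 (string) and Person.id = 2 (int32) from addressbook.proto
        System.out.printf("name key: 0x%02X%n", makeKey(1, LENGTH_DELIMITED));
        System.out.printf("id key:   0x%02X%n", makeKey(2, VARINT));
    }
}
```

This prints 0x0A and 0x10. If name were renumbered, data written with key 0x0A would be skipped or read as a different field by the new code, which is why the number is part of the contract.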

import "myproject/other_protos.proto";

Extensions

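Although PB syntax resembles Java, it has no inheritance mechanism. Instead it has so-called Extensions, which are quite different from the object-oriented, JavaBeans-style protocol design we are used to.

An extension works like this: when defining a message, you reserve a range of field numbers so that third parties can add fields to it.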

message Foo {
  required int32 a = 1;
  extensions 100 to 199;
}

message Bar {
  optional string name = 1;
  optional Foo foo = 2;
}

extend Foo {
  optional int32 bar = 102;
}

The extension can also be declared inside another message:

message Bar {
  extend Foo {
    optional int32 bar = 102;
  }
  optional string name = 1;
  optional Foo foo = 2;
}

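Setting the extension field in Java (this uses the nested declaration, so the extension identifier is BarProto.Bar.bar):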

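// BarProto and FooProto are the outer classes generated from the two .proto files
BarProto.Bar.Builder bar = BarProto.Bar.newBuilder();
bar.setName("zjd");

FooProto.Foo.Builder foo = FooProto.Foo.newBuilder();
foo.setA(1);
// Extension fields are set through setExtension() with the generated identifier
foo.setExtension(BarProto.Bar.bar, 12);

bar.setFoo(foo.build());
System.out.println(bar.getFoo().getExtension(BarProto.Bar.bar));  // prints 12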

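Personally, I find this rather inconvenient to use.

For a detailed description of PB syntax, see the official documentation. The syntax is fairly simple, yet because messages can be nested, it can describe very complex data structures, enough for essentially all of our needs.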

Compiling the .proto File

You can compile .proto files with protoc, the compiler provided by Google; on Windows, download protoc.exe. Basic usage:

protoc.exe -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto

The java_package and java_outer_classname options in the .proto file set the package name and class name of the generated Java class.

Protocol Buffers API

In AddressBookProtos.java, a nested class is generated for each message in the .proto file: AddressBook and Person. Each of them has its own nested Builder class for creating instances. Messages have only read-only getters; builders have both getters and setters.

Person

// required string name = 1;
public boolean hasName();
public String getName();

// required int32 id = 2;
public boolean hasId();
public int getId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();

// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);

Person.Builder

// required string name = 1;
public boolean hasName();
public java.lang.String getName();
public Builder setName(String value);
public Builder clearName();

// required int32 id = 2;
public boolean hasId();
public int getId();
public Builder setId(int value);
public Builder clearId();

// optional string email = 3;
public boolean hasEmail();
public String getEmail();
public Builder setEmail(String value);
public Builder clearEmail();

// repeated .tutorial.Person.PhoneNumber phone = 4;
public List<PhoneNumber> getPhoneList();
public int getPhoneCount();
public PhoneNumber getPhone(int index);
public Builder setPhone(int index, PhoneNumber value);
public Builder addPhone(PhoneNumber value);
public Builder addAllPhone(Iterable<PhoneNumber> value);
public Builder clearPhone();

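Besides the JavaBeans-style getters and setters, some additional accessors are generated:

has_: every non-repeated field has a hasXxx() method that tells whether the field was explicitly set or is just returning its default value.
clear_: every field has a clearXxx() method that resets it to its unset (default) state.
_Count: getXxxCount() returns the number of elements in a repeated field.
addAll_: addAllXxx() adds a whole collection of values to a repeated field.
Repeated fields also have methods for getting and setting an element by index.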
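To make the has/clear/count semantics concrete, here is a hand-written stand-in for a few of the generated Person.Builder accessors. It is only an illustration: protoc generates different (and more complete) code, phone numbers are simplified to plain strings, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written stand-in for a few Person.Builder accessors (NOT protoc output).
class PersonBuilderSketch {
    private String email = "";          // default for an optional string without [default = ...]
    private boolean hasEmail = false;   // "was it set?" is tracked separately from the value
    private final List<String> phones = new ArrayList<>();

    boolean hasEmail() { return hasEmail; }

    String getEmail() { return email; }  // yields the default while unset

    PersonBuilderSketch setEmail(String value) {
        email = value;
        hasEmail = true;
        return this;  // returning the builder allows chained calls
    }

    PersonBuilderSketch clearEmail() {
        email = "";        // back to the default value
        hasEmail = false;  // and no longer reported as set
        return this;
    }

    // Repeated fields have no has-method: only a count plus indexed access.
    int getPhoneCount() { return phones.size(); }

    String getPhone(int index) { return phones.get(index); }

    PersonBuilderSketch addPhone(String value) {
        phones.add(value);
        return this;
    }

    PersonBuilderSketch addAllPhone(Iterable<String> values) {
        for (String v : values) {
            phones.add(v);
        }
        return this;
    }

    PersonBuilderSketch clearPhone() {
        phones.clear();
        return this;
    }

    public static void main(String[] args) {
        PersonBuilderSketch b = new PersonBuilderSketch();
        System.out.println(b.hasEmail() + " [" + b.getEmail() + "]");
        b.setEmail("jdoe@example.com");
        System.out.println(b.hasEmail() + " [" + b.getEmail() + "]");
        b.clearEmail();
        System.out.println(b.hasEmail() + " [" + b.getEmail() + "]");
        b.addPhone("555-4321").addAllPhone(Arrays.asList("555-0000", "555-1111"));
        System.out.println(b.getPhoneCount());
    }
}
```

The point of the separate flag: after clearEmail(), getEmail() and a never-set field both return "", and only hasEmail() can tell "unset" apart from "explicitly set to the default".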

Enums and Nested Classes

A message nested inside another message generates a nested class, and an enum generates a Java 5 enum type:

public static enum PhoneType {
  MOBILE(0, 0),
  HOME(1, 1),
  WORK(2, 2),
  ;
  ...
}

Builders vs. Messages

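All generated message classes are immutable, just like Java's String. To create a message you must first create a builder, and a message can only be modified through the builder's setters. Each setter returns the builder itself, so all fields can be set in a single chained expression: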

Person john =
  Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .addPhone(
      Person.PhoneNumber.newBuilder()
        .setNumber("555-4321")
        .setType(Person.PhoneType.HOME))
    .build();

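Every message and builder provides the following methods:

isInitialized(): checks whether all required fields have been set;
toString(): returns a human-readable representation, which is very useful for debugging;
mergeFrom(Message other): builders only; merges another message into this one: singular fields that are set in other overwrite the current values, embedded message fields are merged recursively, and repeated fields are concatenated;
clear(): builders only; resets all fields to their empty state.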
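The isInitialized() and mergeFrom() rules can be sketched on a hand-written stand-in (a hypothetical class, not generated code): null marks an unset field, singular fields set in the other message overwrite, and repeated fields are concatenated. Recursive merging of embedded message fields is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a Person builder (NOT protoc output).
class MergeSketch {
    String name;   // required; null means "not set"
    Integer id;    // required
    String email;  // optional
    final List<String> phones = new ArrayList<>();  // repeated (simplified to strings)

    // True only when every required field has been set.
    boolean isInitialized() {
        return name != null && id != null;
    }

    // Singular fields set in `other` overwrite ours; unset ones leave ours alone.
    // Repeated fields are concatenated (ours first, then other's).
    MergeSketch mergeFrom(MergeSketch other) {
        if (other.name != null) name = other.name;
        if (other.id != null) id = other.id;
        if (other.email != null) email = other.email;
        phones.addAll(other.phones);
        return this;
    }

    public static void main(String[] args) {
        MergeSketch a = new MergeSketch();
        a.name = "John";
        a.id = 1;
        a.phones.add("555-1111");

        MergeSketch b = new MergeSketch();
        b.name = "John Doe";
        b.email = "jdoe@example.com";
        b.phones.add("555-2222");

        a.mergeFrom(b);
        System.out.println(a.name + ", " + a.id + ", " + a.email + ", " + a.phones);
        System.out.println(a.isInitialized());
        System.out.println(new MergeSketch().isInitialized());
    }
}
```

After the merge, a keeps id 1 (b never set it), takes b's name and email, and holds both phone numbers.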

Parsing and Serialization

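Every message has the following methods for reading and writing the binary protocol buffer format. For the binary format itself, see the encoding page of the official documentation.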

byte[] toByteArray(); serializes the message to a byte[].
static Person parseFrom(byte[] data); parses a message from a byte[].
void writeTo(OutputStream output); serializes the message and writes it to an OutputStream.
static Person parseFrom(InputStream input); reads and parses a message from an InputStream.

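Protocol buffer classes only provide basic operations on binary data and do not make good object-oriented domain classes. If you need richer behavior, or cannot modify the .proto file, it is best to wrap the generated class in a class of your own.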

Writing A Message

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

class AddPerson {
  // This function fills in a Person message based on user input.
  static Person PromptForAddress(BufferedReader stdin,
                                 PrintStream stdout) throws IOException {
    Person.Builder person = Person.newBuilder();

    stdout.print("Enter person ID: ");
    person.setId(Integer.valueOf(stdin.readLine()));

    stdout.print("Enter name: ");
    person.setName(stdin.readLine());

    stdout.print("Enter email address (blank for none): ");
    String email = stdin.readLine();
    if (email.length() > 0) {
      person.setEmail(email);
    }

    while (true) {
      stdout.print("Enter a phone number (or leave blank to finish): ");
      String number = stdin.readLine();
      if (number.length() == 0) {
        break;
      }

      Person.PhoneNumber.Builder phoneNumber =
        Person.PhoneNumber.newBuilder().setNumber(number);

      stdout.print("Is this a mobile, home, or work phone? ");
      String type = stdin.readLine();
      if (type.equals("mobile")) {
        phoneNumber.setType(Person.PhoneType.MOBILE);
      } else if (type.equals("home")) {
        phoneNumber.setType(Person.PhoneType.HOME);
      } else if (type.equals("work")) {
        phoneNumber.setType(Person.PhoneType.WORK);
      } else {
        stdout.println("Unknown phone type.  Using default.");
      }

      person.addPhone(phoneNumber);
    }

    return person.build();
  }

  // Main function:  Reads the entire address book from a file,
  //   adds one person based on user input, then writes it back out to the same
  //   file.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage:  AddPerson ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    AddressBook.Builder addressBook = AddressBook.newBuilder();

    // Read the existing address book.
    try {
      addressBook.mergeFrom(new FileInputStream(args[0]));
    } catch (FileNotFoundException e) {
      System.out.println(args[0] + ": File not found.  Creating a new file.");
    }

    // Add an address.
    addressBook.addPerson(
      PromptForAddress(new BufferedReader(new InputStreamReader(System.in)),
                       System.out));

    // Write the new address book back to disk.
    FileOutputStream output = new FileOutputStream(args[0]);
    addressBook.build().writeTo(output);
    output.close();
  }
}

Reading A Message

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;

class ListPeople {
  // Iterates through all people in the AddressBook and prints info about them.
  static void Print(AddressBook addressBook) {
    for (Person person: addressBook.getPersonList()) {
      System.out.println("Person ID: " + person.getId());
      System.out.println("  Name: " + person.getName());
      if (person.hasEmail()) {
        System.out.println("  E-mail address: " + person.getEmail());
      }

      for (Person.PhoneNumber phoneNumber : person.getPhoneList()) {
        switch (phoneNumber.getType()) {
          case MOBILE:
            System.out.print("  Mobile phone #: ");
            break;
          case HOME:
            System.out.print("  Home phone #: ");
            break;
          case WORK:
            System.out.print("  Work phone #: ");
            break;
        }
        System.out.println(phoneNumber.getNumber());
      }
    }
  }

  // Main function:  Reads the entire address book from a file and prints all
  //   the information inside.
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage:  ListPeople ADDRESS_BOOK_FILE");
      System.exit(-1);
    }

    // Read the existing address book.
    AddressBook addressBook =
      AddressBook.parseFrom(new FileInputStream(args[0]));

    Print(addressBook);
  }
}

Extending the Protocol

In practice, .proto files often need to be extended, and extending a protocol raises compatibility concerns. Protocol Buffers extends well, as long as you follow a few rules:

Do not change the tag number of any existing field;
Do not add or remove required fields;
You may remove optional and repeated fields;
You may add optional and repeated fields, but they must use new tag numbers.
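These four rules are mechanical enough to check automatically. The sketch below is an assumption-laden illustration, not a protoc feature: it models a schema as a map from field name to (tag number, label), matches fields by name (so a pure rename is reported as a removal plus a reused tag), and returns the list of rule violations.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: checks a schema change against the compatibility rules above.
class CompatCheck {
    enum Label { REQUIRED, OPTIONAL, REPEATED }

    static final class Field {
        final int tag;
        final Label label;
        Field(int tag, Label label) { this.tag = tag; this.label = label; }
    }

    static List<String> check(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        List<String> problems = new ArrayList<>();
        Set<Integer> oldTags = new HashSet<>();
        for (Field f : oldSchema.values()) {
            oldTags.add(f.tag);
        }

        // Rules 1-3: existing fields keep their numbers; only required fields may not be removed.
        for (Map.Entry<String, Field> e : oldSchema.entrySet()) {
            Field after = newSchema.get(e.getKey());
            if (after == null) {
                if (e.getValue().label == Label.REQUIRED) {
                    problems.add("required field removed: " + e.getKey());
                }
            } else if (after.tag != e.getValue().tag) {
                problems.add("tag number changed: " + e.getKey());
            }
        }

        // Rules 2 and 4: new fields must not be required and must use fresh numbers.
        for (Map.Entry<String, Field> e : newSchema.entrySet()) {
            if (oldSchema.containsKey(e.getKey())) {
                continue;  // existing field, already checked
            }
            if (e.getValue().label == Label.REQUIRED) {
                problems.add("required field added: " + e.getKey());
            }
            if (oldTags.contains(e.getValue().tag)) {
                problems.add("new field reuses an old tag number: " + e.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Field> v1 = new LinkedHashMap<>();
        v1.put("name", new Field(1, Label.REQUIRED));
        v1.put("id", new Field(2, Label.REQUIRED));
        v1.put("email", new Field(3, Label.OPTIONAL));

        Map<String, Field> v2 = new LinkedHashMap<>();
        v2.put("name", new Field(1, Label.REQUIRED));
        v2.put("id", new Field(2, Label.REQUIRED));
        v2.put("phone", new Field(4, Label.REPEATED));  // email dropped, phone added: allowed

        System.out.println(check(v1, v2));
    }
}
```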

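Forward compatibility (old code reading new messages): old code ignores the new fields; deleted optional fields take their default values, and deleted repeated fields are empty.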

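Backward compatibility (new code reading old messages): new code can handle old messages transparently, but keep in mind that fields added later are absent from old messages. Either check explicitly with the has_ methods, or give the new fields sensible default values in the new .proto. For an optional field without an explicit default, the type's default is used: the empty string for strings, 0 for numeric types, and false for booleans.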
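Both directions of compatibility come down to how a parser treats fields it does not know and fields that are absent. The sketch below is a hand-written "old" reader (illustrative only; generated code does this for you) that knows fields 1 to 3 of Person, skips any other field by looking at the wire type in its key, and leaves missing fields at their defaults. It assumes well-formed input in which the known fields use their expected wire types.

```java
import java.nio.charset.StandardCharsets;

// Sketch: an old reader that skips unknown fields and keeps defaults for missing ones.
class OldReaderSketch {
    String name = "";
    int id = 0;
    String email = "";
    boolean hasEmail = false;

    private final byte[] buf;
    private int pos = 0;

    OldReaderSketch(byte[] data) {
        this.buf = data;
        parse();
    }

    private long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;  // 7 payload bits per byte
            if ((b & 0x80) == 0) {
                return result;                     // high bit clear: last byte
            }
            shift += 7;
        }
    }

    private String readString() {
        int len = (int) readVarint();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    private void parse() {
        while (pos < buf.length) {
            int key = (int) readVarint();
            int field = key >>> 3, wireType = key & 0x7;
            switch (field) {
                case 1: name = readString(); break;
                case 2: id = (int) readVarint(); break;
                case 3: email = readString(); hasEmail = true; break;
                default: skip(wireType);  // unknown field, e.g. added by a newer writer
            }
        }
    }

    private void skip(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;   // varint
            case 1: pos += 8; break;       // 64-bit
            case 2: {                      // length-delimited: read the length first
                int len = (int) readVarint();
                pos += len;
                break;
            }
            case 5: pos += 4; break;       // 32-bit
            default: throw new IllegalStateException("unsupported wire type " + wireType);
        }
    }

    public static void main(String[] args) {
        byte[] fromNewerWriter = {
            0x0A, 0x03, 'A', 'n', 'n',  // field 1 (name) = "Ann"
            0x10, 0x07,                 // field 2 (id) = 7
            0x20, (byte) 0x96, 0x01,    // field 4: unknown varint (150), skipped
            0x2A, 0x01, 'x'             // field 5: unknown string, skipped
        };
        OldReaderSketch p = new OldReaderSketch(fromNewerWriter);
        System.out.println(p.name + " " + p.id + " hasEmail=" + p.hasEmail);
    }
}
```

The message comes from a newer writer with extra fields 4 and 5; the old reader silently skips them and, since field 3 is absent, reports hasEmail=false with the default empty string.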

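Note that a newly added repeated field has no has_ method, so when it is empty you cannot tell whether new code left it empty or the message was produced by old code.

It is recommended to make all fields optional, which gives the greatest extensibility.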

Encoding

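If your English is good, you can read the official documentation directly, but I find this article on cnblogs explains it more clearly.

Overall, the Protocol Buffers encoding is very compact and efficient: it takes little space and parses quickly, which makes it well suited to mobile clients. The downside is that the encoded data carries no field names or full type information, so it is not self-describing (although some tricks can make it so), and parsing depends on the .proto definition.

Google calls this encoding format the wire format.

Much of PB's compactness comes from Varint, a variable-length integer encoding.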
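A minimal varint encoder/decoder, assuming non-negative values: each byte carries 7 bits of the number, least significant group first, and the high bit (0x80) marks that more bytes follow, so small numbers take fewer bytes. The class name is illustrative. Keep in mind that a negative int32 is always encoded in 10 bytes; the sint32/sint64 types use ZigZag encoding to avoid that.

```java
import java.io.ByteArrayOutputStream;

// Sketch: base-128 varint encoding and decoding for non-negative values.
class VarintSketch {

    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) ((value & 0x7F) | 0x80));  // low 7 bits + continuation bit
            value >>>= 7;
        }
        out.write((int) value);  // last byte, high bit clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (long v : new long[] {1, 150, 300, 1234}) {
            System.out.println(v + " -> " + hex(encode(v)));
        }
    }
}
```

For example, 150 encodes as 96 01 and 300 as AC 02: two bytes instead of the four a fixed-width int32 would need.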

(Figure reproduced from http://m.survivalescaperooms.com/shitouer/archive/2013/04/12/google-protocol-buffers-encoding.html)

Comparison with XML and JSON

Data Size

Let's do a simple comparison of Protocol Buffers with XML and JSON.

.proto

message Request {
  repeated string str = 1;
  repeated int32 a = 2;
}

JavaBean

public class Request {
    public List<String> strList;
    public List<Integer> iList;
}

First, let's compare the size of the generated data. The test code is simple:

public static void main(String[] args) throws Exception {
    int n = 5;
    String str = "testtesttesttesttesttesttesttest";
    int val = 100;
    for (int i = 1; i <= n; i++) {
        // str doubles i times per round, so its length grows very quickly
        for (int j = 0; j < i; j++) {
            str += str;
        }
        // note: for i = 5, 100^5 exceeds the int range and the cast yields Integer.MAX_VALUE
        protobuf(i, (int) Math.pow(val, i), str);
        serialize(i, (int) Math.pow(val, i), str);
        System.out.println();
    }
}

public static void protobuf(int n, int in, String str) {
    RequestProto.Request.Builder req = RequestProto.Request.newBuilder();

    List<Integer> alist = new ArrayList<Integer>();
    for (int i = 0; i < n; i++) {
        alist.add(in);
    }
    req.addAllA(alist);

    List<String> strList = new ArrayList<String>();
    for (int i = 0; i < n; i++) {
        strList.add(str);
    }
    req.addAllStr(strList);

    // System.out.println(req.build());
    byte[] data = req.build().toByteArray();
    System.out.println("protobuf size:" + data.length);
}

public static void serialize(int n, int in, String str) throws Exception {
    Request req = new Request();

    List<String> strList = new ArrayList<String>();
    for (int i = 0; i < n; i++) {
        strList.add(str);
    }
    req.strList = strList;

    List<Integer> iList = new ArrayList<Integer>();
    for (int i = 0; i < n; i++) {
        iList.add(in);
    }
    req.iList = iList;

    // SerializationInstance is an XML/JSON helper that is not shown in this article
    String xml = SerializationInstance.sharedInstance().simpleToXml(req);
    // System.out.println(xml);
    System.out.println("xml size:" + xml.getBytes().length);

    String json = SerializationInstance.sharedInstance().fastToJson(req);
    // System.out.println(json);
    System.out.println("json size:" + json.getBytes().length);
}
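Because the Request fields are proto2 repeated fields without [packed=true], the size printed by protobuf() can be predicted from the wire format: each string costs a 1-byte key, a varint length and the bytes themselves, and each int32 costs a 1-byte key plus a varint. A sketch under those assumptions (ASCII strings, so one byte per character in UTF-8; the class and method names are hypothetical), with a main that mirrors the test loop without actually building the large strings:

```java
// Sketch: predicted protobuf size of Request for n strings and n ints (unpacked).
class RequestSizeEstimate {

    static int varintSize(long v) {
        if (v < 0) {
            return 10;  // a negative int32 is sign-extended to 10 bytes
        }
        int size = 1;
        while (v >= 0x80) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // n copies of a strLen-byte string in field 1, n copies of intValue in field 2.
    static long estimate(int n, int strLen, int intValue) {
        long strField = 1 + varintSize(strLen) + strLen;  // key 0x0A + length + bytes
        long intField = 1 + varintSize(intValue);         // key 0x10 + varint
        return n * (strField + intField);
    }

    public static void main(String[] args) {
        int len = 32;  // length of "testtesttesttesttesttesttesttest"
        for (int i = 1; i <= 5; i++) {
            len <<= i;  // same effect as doubling str i times
            System.out.println(i + ": " + estimate(i, len, (int) Math.pow(100, i)));
        }
    }
}
```

For i = 1 (one 64-character string and the value 100) this gives 68 bytes, which can be compared against the "protobuf size" line of the test.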


As n increases, the int values get larger and the strings get longer. First, we set str to empty: