Google Datastore - Blob or Text

Two possible ways to store large strings in the Google Datastore are the Text and Blob data types.

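From a storage consumption perspective, which of the two is recommended? And the same question from the perspective of protobuf serialization and deserialization.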

2010-06-04 Keyur

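There is no significant performance difference between the two - just use whichever best fits your data. BlobProperty should be used to store binary data (e.g., str objects), while TextProperty should be used to store any textual data (e.g., unicode or str objects). Note that if you store a str in a TextProperty, it must contain only ASCII bytes (less than hex 80 or decimal 128), unlike a BlobProperty.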
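The ASCII restriction on str values can be illustrated without the App Engine SDK. The following is a minimal sketch in plain Python 3, where `bytes` plays the role of the Python 2 `str`. The helper `coerce_to_text` is a hypothetical stand-in for the implicit conversion, assumed here to decode with the ASCII codec; it is not the SDK's actual code.

```python
# Plain Python 3 sketch; no App Engine imports. `coerce_to_text` is a
# hypothetical stand-in for the implicit conversion, not the SDK's code.

def coerce_to_text(value):
    """Return text, decoding byte strings with the ASCII codec."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if any byte is 0x80 or higher.
        return value.decode("ascii")
    return value

ascii_bytes = bytes(range(128))          # every byte below 0x80
utf8_bytes = "\u0080".encode("utf-8")    # b'\xc2\x80', contains non-ASCII bytes

# ASCII-only bytes convert cleanly.
accepted = coerce_to_text(ascii_bytes)

# Bytes at or above 0x80 are rejected by the ASCII codec.
try:
    coerce_to_text(utf8_bytes)
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Decoding explicitly with the correct codec avoids the error.
decoded = utf8_bytes.decode("utf-8")
```

In the Python 2 setting of this answer, the analogous step would be to decode the str yourself (for example with `unicode(s, 'utf-8')`) before assigning it to the text property.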

Both properties derive from UnindexedProperty, as can be seen in the source.

Here is a sample application which demonstrates that there is no difference in storage overhead for these ASCII or UTF-8 strings:

# Python 2, legacy App Engine db API
import struct

from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class TestB(db.Model):
    v = db.BlobProperty(required=False)

class TestT(db.Model):
    v = db.TextProperty(required=False)

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'

        # try simple ASCII data and a bytestring with non-ASCII bytes
        ascii_str = ''.join([struct.pack('>B', i) for i in xrange(128)])
        arbitrary_str = ''.join([struct.pack('>2B', 0xC2, 0x80+i) for i in xrange(64)])
        u = unicode(arbitrary_str, 'utf-8')

        t = [TestT(v=ascii_str), TestT(v=ascii_str*1000), TestT(v=u*1000)]
        b = [TestB(v=ascii_str), TestB(v=ascii_str*1000), TestB(v=arbitrary_str*1000)]

        # demonstrate error cases
        try:
            err = TestT(v=arbitrary_str)
            assert False, "should have caused an error: can't store non-ascii bytes in a Text"
        except UnicodeDecodeError:
            pass
        try:
            err = TestB(v=u)
            assert False, "should have caused an error: can't store unicode in a Blob"
        except db.BadValueError:
            pass

        # determine the serialized size of each model (note: no keys assigned)
        fEncodedSz = lambda o: len(db.model_to_protobuf(o).Encode())
        sz_t = tuple([fEncodedSz(x) for x in t])
        sz_b = tuple([fEncodedSz(x) for x in b])

        # output the results
        self.response.out.write("text: 1=>%dB 2=>%dB 3=>%dB\n" % sz_t)
        self.response.out.write("blob: 1=>%dB 2=>%dB 3=>%dB\n" % sz_b)

application = webapp.WSGIApplication([('/', MainPage)])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()

Here is the output:

text: 1=>172B 2=>128047B 3=>128047B 
blob: 1=>172B 2=>128047B 3=>128047B
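The matching sizes line up with the raw payload lengths of the test values. The sketch below, in plain Python 3, recomputes only those payload lengths and subtracts them from the reported totals. It assumes the text values are serialized as UTF-8, which is consistent with the numbers above but is not verified against the SDK, and it does not call `db.model_to_protobuf`.

```python
# Plain Python 3 sketch of the payload sizes only; it does not call
# db.model_to_protobuf, so the per-entity overhead is taken from the
# reported output rather than reproduced.

ascii_bytes = bytes(range(128))
arbitrary_bytes = b"".join(bytes([0xC2, 0x80 + i]) for i in range(64))
text = arbitrary_bytes.decode("utf-8")   # 64 characters, 2 bytes each in UTF-8

# Payload length of each test value (assumption: text is stored as UTF-8).
payload_sizes = {
    "1": len(ascii_bytes),
    "2": len(ascii_bytes * 1000),
    "3 (text)": len((text * 1000).encode("utf-8")),
    "3 (blob)": len(arbitrary_bytes * 1000),
}

# Totals copied from the output above.
reported = {"1": 172, "2": 128047, "3 (text)": 128047, "3 (blob)": 128047}

# Whatever remains is serialization overhead, identical for Text and Blob.
overhead = {name: reported[name] - payload_sizes[name] for name in reported}
```

Under that assumption the remaining overhead is a few dozen bytes per entity and is the same for both property types.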


2010-06-04 20:59:37

I didn't know that Text properties can only contain ASCII bytes. That realization answers my question. Thanks. – Keyur 2010-06-04 21:48:11

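That's not true - text properties store unicode. However, if you assign a byte ('raw') string (type 'str') to a text property, it will try to convert it to unicode using the system default encoding (ASCII). If you don't want that, you need to decode the string explicitly. – 2010-06-05 22:49:36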

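Thanks, Nick. I was trying to say that 'TextProperty' cannot store str objects containing non-ASCII bytes, but (as you pointed out) my comment didn't make that clear, so I deleted it. – 2010-06-06 01:11:13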
