Storing long text in Datastore

Question

Is Datastore suitable to store really long text, e.g. profile descriptions and articles?

If not, what's the Google Cloud alternative?

If yes, what would be the ideal way to store it so as to preserve formatting such as line breaks and Markdown syntax? Should I simply store it as a string, or convert it to bytes? And should I be worried about dirty user input?

I need it for a Go project (I don't think the language is relevant, but maybe Go has some useful features for this).

Answer 1

Score: 4

Yes, it's suitable if you're OK with certain limitations.

These limitations are:

  • the overall entity size (properties + indices) must not exceed 1 MB (this should be OK for profiles and most articles)
  • strings longer than a certain limit (currently 1500 bytes of UTF-8, not 1500 characters) cannot be indexed: the entity may still store a longer string, but you won't be able to use it in query filters or sort orders; don't forget to tag such fields with "noindex"

As for the type, you may simply use string, e.g.:

```go
type Post struct {
	UserID  int64  `datastore:"uid"`
	Content string `datastore:"content,noindex"`
}
```

A string value is stored exactly as given, so all formatting is preserved: newlines, HTML, Markdown and any other markup.

"Dirty user input?" That is a concern of rendering / presenting the data. The datastore will not try to interpret the text, act on its content, or transform it in any way. So from the Datastore point of view you have nothing to worry about (you never build GQL query strings by concatenating user-supplied text, right?!).

Also note that if you store large texts in your entities, those texts will be fetched whenever you load or query such entities, and you must also send them whenever you modify and (re)save such an entity.

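Tip #1: If certain queries don't need the full texts, use projection queries to fetch only the properties you need, avoiding the movement of "big" data (and so ultimately speeding up those queries). Projection queries can only return indexed properties, so an unindexed long text is excluded from them anyway.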

Tip #2: To "ease" the burden of not being able to index large texts, you may add duplicate properties such as a short summary or the title of the large text, because string values of up to 1500 bytes can be indexed.
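A summary copied from the long text must itself stay within the 1500-byte limit, and cutting a UTF-8 string at an arbitrary byte offset can split a multi-byte character in half. A minimal sketch that truncates on a character boundary; the `Article` struct, its property names and the `summarize` helper are hypothetical:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Article is a hypothetical entity: Summary is short enough to be indexed,
// while Body holds the full text and is excluded from indexes.
type Article struct {
	Summary string `datastore:"summary"`
	Body    string `datastore:"body,noindex"`
}

// summarize returns the longest prefix of s that is at most maxBytes long
// and does not end in the middle of a UTF-8 encoded character.
func summarize(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// Step back while s[cut] is a continuation byte rather than a rune start.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	body := "数据存储" // 4 characters, 12 bytes
	a := Article{Summary: summarize(body, 7), Body: body}
	fmt.Println(a.Summary, len(a.Summary), utf8.ValidString(a.Summary))
}
```

When saving, you would call something like `summarize(body, 1500)` to fill the indexed property.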

Tip #3: If you need to store texts whose uncompressed size would exceed the 1 MB entity size limit, or you just want to decrease your datastore size usage in general, you may opt to store large texts compressed inside entities. Since they are long, you can't search / filter on them anyway, and they compress very well (often below 40% of the original size). So if you have many long texts, you can shrink your datastore size to roughly a third just by storing all texts compressed. Of course this adds to entity save / load time (as you have to compress / decompress the texts), but it is often still worth it.

huangapple
  • Posted on April 3, 2017 19:22:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/43183353.html