英文:
Cassandra data modeling for server time series metrics
问题
最好的数据模型是什么?
我应该将每种类型的指标存储在单独的表中还是全部存储在一个表中?
英文:
I gonna collect server metrics, such as OS metrics, Java processes info etc, every second.
For example JSON:
{
"localhost": {
"os": {
"cpu": 4,
"memory": 16
},
"java": {
"jvm": {
"vendor": "Oracle"
},
"heap": 4,
"version": 1.8
}
}
}
What is the best data model for such kind of data?
Should I store every type of metrics in separate table or all in one?
答案1
得分: 1
一种选择是将每个指标单独翻译成点分字符串,以便您的JSON:
{
"localhost": {
"os": {
"cpu": 4,
"memory": 16
},
"java": {
"jvm": {
"vendor": "Oracle"
},
"heap": 4,
"version": 1.8
}
}
}
转化为:
Host Key Value
localhost os.cpu 4
localhost os.memory 16
localhost java.jvm.vendor Oracle
localhost java.heap 4
localhost java.version 1.8
上面未显示时间戳列。主键将是主机+键+时间戳。如果您不需要能够按独立主机查询,您可以将主机名合并到键中,例如key=localhost.os.cpu。
您的查询需求的具体细节将严重影响是否选择这种方式。
英文:
One option would be to translate each individual metric into a dotted string, so that your JSON:
{
"localhost": {
"os": {
"cpu": 4,
"memory": 16
},
"java": {
"jvm": {
"vendor": "Oracle"
},
"heap": 4,
"version": 1.8
}
}
}
Turns into this:
Host Key Value
localhost os.cpu 4
localhost os.memory 16
localhost java.jvm.vendor Oracle
localhost java.heap 4
localhost java.version 1.8
Not shown above is a timestamp column. The primary key would be host+key+timestamp. If you don't need to be able to query by individual host, you could lump the hostname into the key, i.e. key=localhost.os.cpu.
The precise details of your querying needs weigh heavily into whether this is the right choice.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论