What is the best way to retrieve all values from a Cassandra database table when all partition keys are unique?
Question
For my application I have a table like the following:
create table companies(id uuid, name text, ...., primary key((id)));
Now, for my admin panel and for background jobs, I need to be able to retrieve all of the companies and loop through all the rows in my code. What would be the best approach for this?
I know I can just run a select query without restricting the partition (primary) key, but this is bad because it will contact all nodes, since the rows are spread across random nodes in the datacenter.
select * from companies;
Now, one thing I could do is create a dummy key that is the same for every row, so that all the rows are stored in the same partition, but this is also very bad since the table will grow and could reach more than 1k rows.
create table companies(fake_key text, id uuid, name text, ... primary key((fake_key), id));
insert into companies(fake_key, id, name) values ('app', uuid(), 'company_a');
insert into companies(fake_key, id, name) values ('app', uuid(), 'company_b');
Should I create a table in, for example, a MySQL database and insert a new row there every time I create a new companies row in Cassandra?
Answer 1
Score: 2
Unfortunately, there isn't going to be a one-size-fits-all solution since you have a special use case.
If performance matters, then store the data in a single partition with clustered rows. As you already pointed out, this can be problematic since the solution won't scale if the partition grows unbounded. However, 1-2K rows isn't going to be so bad since you're only really storing the company names.
Storing the data in another relational DB isn't going to be much of a benefit since it'll just increase the level of complexity in your app, plus you'd need to contend with the challenges of managing another piece of infrastructure. Cheers!
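As a rough illustration of reading the single-partition layout from application code, here is a minimal Python sketch. It assumes the `fake_key` table from the question and a session object that behaves like a cassandra-driver `Session`, where `execute(query, parameters)` returns an iterable of rows. The column list, the `iter_companies` helper, and the `FakeSession` stand-in are hypothetical and exist only so the sketch runs without a cluster.

```python
from collections import namedtuple
from typing import Any, Iterator

# Assumed column list: the table definition in the question elides the
# remaining columns, so only id and name are selected here.
SELECT_ALL_COMPANIES = "SELECT id, name FROM companies WHERE fake_key = %s"


def iter_companies(session: Any, fake_key: str = "app") -> Iterator[Any]:
    """Yield every row stored under the single shared partition key.

    `session` is assumed to expose execute(query, parameters) and to
    return an iterable of rows, as a cassandra-driver Session does.
    """
    for row in session.execute(SELECT_ALL_COMPANIES, (fake_key,)):
        yield row


# Hypothetical stand-in for a real session, used only for demonstration.
Company = namedtuple("Company", ["id", "name"])


class FakeSession:
    def __init__(self, rows_by_key):
        self.rows_by_key = rows_by_key
        self.calls = []  # records every (query, parameters) pair

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))
        return list(self.rows_by_key.get(parameters[0], []))


if __name__ == "__main__":
    session = FakeSession(
        {"app": [Company(1, "company_a"), Company(2, "company_b")]}
    )
    for company in iter_companies(session):
        print(company.id, company.name)
```

Because the query restricts on the partition key, it targets one partition rather than scanning the whole table, which is the trade-off this answer describes. With a real session you would pass the connected `Session` in place of `FakeSession`.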