使用LINQ的Join时丢失记录

huangapple go评论72阅读模式
英文:

Losing records when using Join with LINQ

问题

使用C#和LINQ,我正在尝试连接从两个不同表格中搜索的列表。假设我有两个表格,我正在为不同的项目提取不同的详细信息。

请注意,这是概括性的信息,我提取了更多字段,并且对每个列表集合应用了许多特定条件的过滤器。

当我尝试连接这两个列表时,我会丢失一些记录,我认为这是因为映射与 .Join() 的工作方式有关。

因此,它的设置如下:

var _joinedLists = _listOne.Join(_listTwo,
                                 item => item.Name,
                                 details => details.Name,
                                 (item, details) => new
                                 {
                                   itemName = item.Name,
                                   ID = item.ID,
                                   leaderName = details.Leader,
                                   score = details.Score
                                 }).OrderByDescending(x => x.ID).ToArray();

假设某条记录名为“A”在_tableOne中,但在_tableTwo中没有,它不会将“A”连接到_joinedLists列表中,而是将其排除。

主要目标是,我希望在_joinedLists中保留_listOne的所有记录,即使某些相关记录在_listTwo中不存在。

这是否是 .Join() 应该工作的方式,是否有一种方法可以解决这个问题?我尝试过使用 Union,但似乎必须具有相同的字段(或我想返回的类型)才能使用它,而对于我的目的,这是行不通的。我尝试在网上搜索此问题,但找不到其他解决方法,除了 Union/Concat(但正如我所说,对于我的目的,这不起作用)。

谢谢。

英文:

Using C# and LINQ, I am trying to join lists gained from searching two seperate tables. Let's say I have two tables that I am pulling different details for different items.

var _listOne = await _tableOne.Select( x => new
{
   Name = x.Name,
   ID = x.ID,
}).ToArrayAsync();

var _listTwo = await _tableTwo.Select( x => new
{
   Name = x.Name,
   Leader = x.Leader,
   Score = x.Score
}).ToArrayAsync();

Note this is generalized information, I have much more fields I grab and a lot of filters for each set of the list pertaining to certain conditions.

When I try and join these two lists, I lose a few records and I think it's because of how the mapping works with .Join().

So, it's setup like:

var _joinedLists = _listOne.Join(_listTwo,
                                 item => item.Name,
                                 details => details.Name,
                                 (item, details) => new
                                 {
                                   itemName = item.Name,
                                   ID = item.ID,
                                   leaderName = details.Leader,
                                   score = details.Score
                                 }).OrderByDescending(x => x.ID).ToArray();

Let's say some record named "A" is in _tableOne, but "A" is not in _tableTwo, it does not join "A" in the list of _joinedLists, it leaves it out.

The main goal is I want to keep all records of _listOne even if some related record is not in _listTwo in my _joinedLists.

Is this how .Join() is suppose to work, and is there a way around this? I've looked at Union, but it seems to do that you have to have the same fields (or I suppose return types) for each of them and for my purpose that will not work. Tried googling around for this issue, but could not find anything other than Union/Concat (but like I said it does not work for my purpose).

Thanks.

答案1

得分: 1

你要找的是左连接,获取所有左侧源数据并将其与其他源数据连接,无论它们是否匹配。如果没有连接链接(正如你所说),右侧源数据字段将为NULL。

在C#中,左连接的方法是GroupJoin,我会向你展示两种方法。

第一种选择

第一种选择是按照你已经做过的来做,但这种方法的问题在于你将所有来自tableOne的数据都加载到内存中,将这些数据存储在一个数组中,然后获取来自tableTwo的所有数据,并将其也保存在内存中,第三个数组是这些数据的连接。正如你所见,我们有两个可以包含成千上万条数据并且没有用处的集合。所以我已经说过第二种选择是最好的。

var listOne = await _tableOne.Select( x => new
{
   Name = x.Name,
   ID = x.ID,
}).ToArrayAsync();

var listTwo = await _tableTwo.Select( x => new
{
   Name = x.Name,
   Leader = x.Leader,
   Score = x.Score
}).ToArrayAsync();

var joinedLists = listOne.GroupJoin(listTwo,
                                    item => item.Name,
                                    details => details.Name,
                                    (item, details) => new
                                    {
                                        ItemName = item.Name,
                                        item.ID,
                                        LeaderName = details.DefaultIfEmpty().FirstOrDefault()?.Leader,
                                        Score = details.DefaultIfEmpty().FirstOrDefault()?.Score
                                    })
                          .OrderByDescending(x => x.ID)
                          .ToArray();

第二种选择

正如我之前提到的,我认为第二种选择更好,因为我们只从数据库中获取一个已经完成了连接的单一列表。所以我们节省了内存和少了一些处理。

var joinedLists = _tableOne.GroupJoin(_tableTwo,
                                      item => item.Name,
                                      details => details.Name,
                                      (item, details) => new
                                      {
                                          ItemName = item.Name,
                                          ID = item.ID,
                                          LeaderName = details.DefaultIfEmpty().FirstOrDefault()?.Leader,
                                          Score = details.DefaultIfEmpty().FirstOrDefault()?.Score
                                      })
                            .OrderByDescending(x => x.ID)
                            .ToArray();

顺便说一下,我更改了你的变量名。按照惯例(Microsoft),不要对方法范围的变量使用下划线。

继续编码吧!

英文:

What you are looking for is a LEFT JOIN, get all the left source data and join it to other source data regardless if they match or not. If there is no join link (as you said), the RIGHT source data fields will be NULL.

The left join for C# is the method GroupJoin and I'll show you two ways to do it.

First option

The first one is following what you already did, but the problem with this approach is that you fetch all the data from tableOne into memory, store this data in an array, then get all the data from tableTwo and save it in memory as well and the third array is the join of this data. As you can see, we have 2 collections that can have thousands of data and are not used for anything. So I already say that the second option is the best.

var listOne = await _tableOne.Select( x => new
{
   Name = x.Name,
   ID = x.ID,
}).ToArrayAsync();

var listTwo = await _tableTwo.Select( x => new
{
   Name = x.Name,
   Leader = x.Leader,
   Score = x.Score
}).ToArrayAsync();


var joinedLists = listOne.GroupJoin(listTwo,
                                    item => item.Name,
                                    details => details.Name,
                                    (item, details) => new
                                    {
                                        ItemName = item.Name,
                                        item.ID,
                                        LeaderName = details.DefaultIfEmpty().FirstOrDefault()?.Leader,
                                        Score = details.DefaultIfEmpty().FirstOrDefault()?.Score
                                    })
                          .OrderByDescending(x => x.ID)
                          .ToArray();

Second option

As I mentioned before, I think this second option is better, since we fetch from the database only a single list with the join already done. So we save memory and less processing.

var joinedLists = _tableOne.GroupJoin(_tableTwo,
                                      item => item.Name,
                                      details => details.Name,
                                      (item, details) => new
                                      {
                                          ItemName = item.Name,
                                          ID = item.ID,
                                          LeaderName = details.DefaultIfEmpty().FirstOrDefault()?.Leader,
                                          Score = details.DefaultIfEmpty().FirstOrDefault()?.Score
                                      })
                            .OrderByDescending(x => x.ID)
                            .ToArray();

By the way, I changed the name of your variables. By convention (Microsoft), do not use underscore for method scope variables.

Keep coding!

huangapple
  • 本文由 发表于 2023年8月4日 05:31:38
  • 转载请务必保留本文链接:https://go.coder-hub.com/76831715.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定