2023-03-04 04:02:47

Create a list of single-entry dictionaries where each group by a given column contributes a value from a 2nd column for all but 1st row which is key

Question

I have a pandas dataframe that looks like this:

header1	header2
First	row1
Second	row2
Third	row1
Fourth	row2
Fifth	row1

I want to create a list of dictionaries where, for all rows with matching value in the header2 column (except the first such row), a dictionary is added to the list using the first row's header1 column value as the lone dict key, and every other row's header1 column value as the lone dict value.

Expected output:

[{"First": "Third"}, {"Second": "Fourth"}, {"First": "Fifth"}]

or even

{"First":"Third","Second":"Fourth"} (This output doesn't handle multiple matches in header2)

Ideally the solution isn't going to be computationally intensive as I am able to accomplish this with nested for loops already.

Edit based on something brought up in comments: if several header1 values share the same header2 value, assume the first occurrence is the key and each later occurrence is a value. For example: [{"First": "Third"}, {"Second": "Fourth"}, {"First": "Fifth"}]. In other words, the header1 value in the first matching row is a repeating key, with one single-entry dict added to the result list for each subsequent matching row.

Thank you

Answer 1

Score: 1

Here's a way to do what your question asks:

out = []
# For each header2 group, pair the first header1 value (the key) with every later one.
df.groupby('header2')['header1'].apply(lambda x: out.extend([{x.iloc[0]:x.iloc[i]} for i in range(1, len(x))]) if len(x) > 1 else None)
# Map each header1 value to its original row position, then restore the dataframe's row order.
idxByHeader1 = df.reset_index(drop=False).set_index('header1')['index']
out = sorted(out, key=lambda x: idxByHeader1[list(x.values())[0]])

Output:

[{'First': 'Third'}, {'Second': 'Fourth'}, {'First': 'Fifth'}]
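As a hedged alternative sketch (not part of the original answer), the same list can be built without a side-effecting lambda or a final sort. It assumes pandas is installed and rebuilds the question's dataframe as `df`; `keys` and `is_repeat` are illustrative names. `transform("first")` broadcasts each group's first header1 value to every row, and `duplicated()` marks every row after the first one in its header2 group.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "header1": ["First", "Second", "Third", "Fourth", "Fifth"],
    "header2": ["row1", "row2", "row1", "row2", "row1"],
})

# Key for each row: the first (non-null) header1 value in that row's header2 group.
keys = df.groupby("header2")["header1"].transform("first")

# True for every row whose header2 value has already appeared in an earlier row.
is_repeat = df["header2"].duplicated()

# One single-entry dict per repeated row, in original row order.
out = [{k: v} for k, v in zip(keys[is_repeat], df.loc[is_repeat, "header1"])]
print(out)
```

Because rows are filtered in place rather than collected group by group, the result already follows the dataframe's row order, so no sort step is needed.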

UPDATE:

Here is a slightly more robust answer. Assuming values in the header1 column can be duplicated across different header2 values, this updated answer will ensure that the dictionaries in the result list preserve the order found in the original dataframe.

out = []
# Carry each row as a (header1, header2) tuple so duplicate header1 values stay distinguishable.
df.assign(dup=df.apply(tuple, axis=1)).groupby('header2')['dup'].apply(
    lambda x: out.extend([{x.iloc[0][0]:x.iloc[i]}
    for i in range(1, len(x))]) if len(x) > 1 else None)
# Look up each value tuple's original row position and sort by it.
idx = df.reset_index(drop=False).set_index(['header1','header2'])['index']
out = sorted(out, key=lambda x: idx[list(x.values())[0]])
# Reduce each value tuple back to its header1 string.
out = [{key:val[0]} for item in out for key, val in item.items()]
print(out)

Sample Input: (note the duplication of Fifth, for key Second and again for key First):

  header1 header2
0   First    row1
1  Second    row2
2   Third    row1
3   Fifth    row2
4   Fifth    row1

Output: (note that for the two dicts with Fifth as value, the dict with Second as key appears before the dict with First as key, which is identical to the sequencing in the original dataframe, since the first Fifth encountered had header2 value matching Second):

[{'First': 'Third'}, {'Second': 'Fifth'}, {'First': 'Fifth'}]
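
A short sketch of why the update indexes on both columns, assuming pandas is installed and using the sample input above. The names `positions`, `by_header1` and `by_pair` are illustrative, not from the answer. With Fifth appearing twice, a lookup keyed on header1 alone returns more than one row position, so it cannot serve as a sort key; the (header1, header2) pair identifies a single row as long as those pairs are unique.

```python
import pandas as pd

# Sample input from the update: "Fifth" appears under both row2 and row1.
df = pd.DataFrame({
    "header1": ["First", "Second", "Third", "Fifth", "Fifth"],
    "header2": ["row1", "row2", "row1", "row2", "row1"],
})
positions = df.reset_index(drop=False)

# Lookup used by the first version: header1 -> original row position.
by_header1 = positions.set_index("header1")["index"]
print(by_header1["Fifth"].tolist())  # more than one position: ambiguous sort key

# Lookup used by the update: (header1, header2) -> original row position.
by_pair = positions.set_index(["header1", "header2"])["index"]
print(by_pair.loc[("Fifth", "row2")])  # a single position
```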
