March 15, 2023 20:07:32


Pandas DataFrame Created from Dictionary vs Created from List

Question

Is there a line or two of code that would make the DataFrame created from lists behave like the one created from a dictionary?

# DataFrame created from a dictionary; this works:
import pandas as pd
data = {'Salary': [30000, 40000, 50000, 85000, 75000],
        'Exp': [1, 3, 5, 10, 25],
        'Gender': ['M', 'F', 'M', 'F', 'M']}
df = pd.DataFrame(data)
print(df), print()
new_df1 = df[df['Salary'] >= 50000]
print(new_df1), print()
new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])
print(new_df2)
# This doesn't work with the df functions, sorting and conditionals:
data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
df = pd.DataFrame(data)
print(df), print()
new_df1 = df[df['Salary'] >= 50000]  # doesn't work: KeyError: 'Salary'
print(new_df1), print()
new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])  # ditto: KeyError: 'Exp'
print(new_df2)

Answer 1

Score: 1

In your second snippet, the first sublist is being used as a data row rather than as the column names.

Instead, pass the first sublist as the columns parameter of the DataFrame constructor:

df = pd.DataFrame(data[1:], columns=data[0])

Output:

   Salary  Exp Gender
0   30000    1      M
1   40000    3      F
2   50000    5      M
3   85000   10      F
4   75000   25      M
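As a quick sanity check (a small sketch, assuming pandas is installed; only the constructor call comes from the answer), the header now comes from data[0] and the numeric columns are inferred as integers. That is why filtering and sorting then behave as they do for the dictionary-built frame:

```python
import pandas as pd

# Same list-of-lists data as in the question: the first sublist holds the column names
data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# data[0] becomes the header; data[1:] are the actual rows
df = pd.DataFrame(data[1:], columns=data[0])

print(list(df.columns))  # ['Salary', 'Exp', 'Gender']
print(df.dtypes)         # Salary and Exp are inferred as integer columns
```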

Why your code failed

Your code was treating the first sublist as data:

pd.DataFrame(data)
        0    1       2   # incorrect header
0  Salary  Exp  Gender   # this shouldn't be a data row
1   30000    1       M
2   40000    3       F
3   50000    5       M
4   85000   10       F
5   75000   25       M
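The failure can be reproduced directly. In this sketch (the name bad is only illustrative), the default column labels are 0, 1 and 2, the header strings sit in row 0, and the columns that mix strings with numbers fall back to object dtype, so looking up 'Salary' raises a KeyError:

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# Without columns=, pandas labels the columns 0, 1, 2
bad = pd.DataFrame(data)
print(list(bad.columns))     # [0, 1, 2]
print(bad.iloc[0].tolist())  # ['Salary', 'Exp', 'Gender'] ended up as a data row
print(bad.dtypes)            # columns 0 and 1 are object: they mix str and int

try:
    bad['Salary']            # there is no column labelled 'Salary'
except KeyError as exc:
    print('KeyError:', exc)
```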

Full code:

df = pd.DataFrame(data[1:], columns=data[0])
print(df), print()
new_df1 = df[df['Salary'] >= 50000]  # works now
print(new_df1), print()
new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])  # works now too
print(new_df2)

Output:

   Salary  Exp Gender
0   30000    1      M
1   40000    3      F
2   50000    5      M
3   85000   10      F
4   75000   25      M
   Salary  Exp Gender
2   50000    5      M
3   85000   10      F
4   75000   25      M
   Salary  Exp Gender
4   75000   25      M
3   85000   10      F
2   50000    5      M
1   40000    3      F
0   30000    1      M
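To see that the two constructions are equivalent, here is a sketch that converts the list-of-lists data into the dict-of-lists shape from the question and compares the result with the columns= approach. The names header, rows and as_dict are illustrative, and the comparison goes through to_dict('list') so that it checks values rather than exact dtypes:

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
header, rows = data[0], data[1:]

# zip(*rows) transposes rows into columns; pair each column with its header name
as_dict = {col: list(values) for col, values in zip(header, zip(*rows))}

df_from_dict = pd.DataFrame(as_dict)                # the question's working form
df_from_lists = pd.DataFrame(rows, columns=header)  # the answer's fix

print(as_dict['Salary'])  # [30000, 40000, 50000, 85000, 75000]
print(df_from_dict.to_dict('list') == df_from_lists.to_dict('list'))  # True
```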

Answer 2

Score: 1

You need to create the DataFrame from all rows except the first, and pass the first row as the columns parameter:

# The question's list-of-lists data; data[0] holds the column names
data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
df = pd.DataFrame(data[1:], columns=data[0])
print(df), print()
   Salary  Exp Gender
0   30000    1      M
1   40000    3      F
2   50000    5      M
3   85000   10      F
4   75000   25      M
new_df1 = df[df['Salary'] >= 50000]  # working well
print(new_df1), print()
   Salary  Exp Gender
2   50000    5      M
3   85000   10      F
4   75000   25      M
new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])  # ditto
print(new_df2)
   Salary  Exp Gender
4   75000   25      M
3   85000   10      F
2   50000    5      M
1   40000    3      F
0   30000    1      M
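If the DataFrame has already been built without columns=, one possible alternative (a sketch, not what either answer uses) is to promote row 0 to the header afterwards. The infer_objects() call matters here: the columns were created as object dtype while they still held the header strings, and it re-infers integer dtypes once those strings are gone:

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
df = pd.DataFrame(data)  # header row ended up as row 0

# Use row 0 as the column labels (tolist() avoids carrying over the row's name)
df.columns = df.iloc[0].tolist()
# Drop the header row, renumber the index from 0, and re-infer column dtypes
df = df.iloc[1:].reset_index(drop=True).infer_objects()

print(df[df['Salary'] >= 50000])
print(df.sort_values('Exp', ascending=False))
```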
