Pandas DataFrame Created from Dictionary vs Created from List

Question

Is there a line or two of code that would make a DataFrame created from a list of lists behave like one created from a dictionary?

    # DataFrame created from a dictionary; this works:
    import pandas as pd

    data = {'Salary': [30000, 40000, 50000, 85000, 75000],
            'Exp': [1, 3, 5, 10, 25],
            'Gender': ['M', 'F', 'M', 'F', 'M']}
    df = pd.DataFrame(data)
    print(df)
    print()
    new_df1 = df[df['Salary'] >= 50000]
    print(new_df1)
    print()
    new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])
    print(new_df2)

    # DataFrame created from a list of lists; filtering and sorting fail:
    data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
            [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
    df = pd.DataFrame(data)
    print(df)
    print()
    new_df1 = df[df['Salary'] >= 50000]  # doesn't work
    print(new_df1)
    print()
    new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])  # ditto
    print(new_df2)

Answer 1

Score: 1

In your second snippet, the first sublist is being treated as a row of data rather than as the column names.

Instead, pass the first sublist as the columns parameter of the DataFrame constructor and the remaining sublists as the data:

    df = pd.DataFrame(data[1:], columns=data[0])

Output:

       Salary  Exp Gender
    0   30000    1      M
    1   40000    3      F
    2   50000    5      M
    3   85000   10      F
    4   75000   25      M

Why your code failed

Your code was incorrectly treating the first sublist as data, so the columns got the default integer labels:

    pd.DataFrame(data)

            0    1       2    # incorrect header
    0  Salary  Exp  Gender    # this shouldn't be a data row
    1   30000    1       M
    2   40000    3       F
    3   50000    5       M
    4   85000   10       F
    5   75000   25       M
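
To make the failure concrete, here is a minimal sketch using the `data` list from the question. It only inspects the frame that `pd.DataFrame(data)` builds; the variable names `bad_df` and `lookup_failed` are illustrative and not part of the original code.

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# Without a columns argument, pandas labels the columns 0, 1, 2.
bad_df = pd.DataFrame(data)
print(bad_df.columns.tolist())

# Each column mixes a header string with the values below it,
# so the first column is stored as generic Python objects.
print(bad_df[0].dtype)

# There is no column labelled 'Salary', so the lookup raises KeyError.
lookup_failed = False
try:
    bad_df[bad_df['Salary'] >= 50000]
except KeyError as exc:
    lookup_failed = True
    print('KeyError:', exc)
```

Both the filter and the `sort_values(['Exp'], ...)` call in the question fail for this same reason: the labels they refer to do not exist as column names.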

Full code:

    df = pd.DataFrame(data[1:], columns=data[0])
    print(df)
    print()
    new_df1 = df[df['Salary'] >= 50000]  # now works
    print(new_df1)
    print()
    new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])  # works too
    print(new_df2)

Output:

       Salary  Exp Gender
    0   30000    1      M
    1   40000    3      F
    2   50000    5      M
    3   85000   10      F
    4   75000   25      M

       Salary  Exp Gender
    2   50000    5      M
    3   85000   10      F
    4   75000   25      M

       Salary  Exp Gender
    4   75000   25      M
    3   85000   10      F
    2   50000    5      M
    1   40000    3      F
    0   30000    1      M
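
If several header-first lists need the same treatment, the one-liner can be wrapped in a small function. This is only a sketch: `frame_from_rows` is a hypothetical helper name, not a pandas API, and it assumes the first sublist always holds the column names.

```python
import pandas as pd


def frame_from_rows(rows):
    """Hypothetical helper: use rows[0] as the header and the rest as data."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

df_from_rows = frame_from_rows(data)

# The dictionary-based frame from the question, for comparison.
df_from_dict = pd.DataFrame({'Salary': [30000, 40000, 50000, 85000, 75000],
                             'Exp': [1, 3, 5, 10, 25],
                             'Gender': ['M', 'F', 'M', 'F', 'M']})

# Check whether the two frames hold the same data.
print(df_from_rows.equals(df_from_dict))

# The filter and the sort from the question now run on the list-based frame.
high_paid = df_from_rows[df_from_rows['Salary'] >= 50000]
by_exp = df_from_rows.sort_values('Exp', ascending=False)
print(high_paid)
print(by_exp)
```

For a single column, `sort_values('Exp', ascending=False)` is a shorter spelling of the `sort_values(['Exp'], axis=0, ascending=[False])` call used above.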

Answer 2

Score: 1

You need to build the DataFrame from every sublist except the first, and pass the first sublist as the columns parameter:

    import pandas as pd

    data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
            [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
    df = pd.DataFrame(data[1:], columns=data[0])
    print(df)
    #    Salary  Exp Gender
    # 0   30000    1      M
    # 1   40000    3      F
    # 2   50000    5      M
    # 3   85000   10      F
    # 4   75000   25      M

    new_df1 = df[df['Salary'] >= 50000]  # works
    print(new_df1)
    #    Salary  Exp Gender
    # 2   50000    5      M
    # 3   85000   10      F
    # 4   75000   25      M

    new_df2 = df.sort_values(['Exp'], axis=0, ascending=[False])  # works too
    print(new_df2)
    #    Salary  Exp Gender
    # 4   75000   25      M
    # 3   85000   10      F
    # 2   50000    5      M
    # 1   40000    3      F
    # 0   30000    1      M
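
Both answers rebuild the frame from the original list. If only the already-constructed DataFrame is available, one possible repair is to promote its first row to the header. The sketch below is an alternative that does not come from either answer; the names `raw` and `fixed` are illustrative.

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# The problematic frame: the header row ended up as data row 0.
raw = pd.DataFrame(data)

# Drop the first row, renumber the index, and reuse that row as column names.
fixed = raw.iloc[1:].reset_index(drop=True)
fixed.columns = raw.iloc[0].tolist()

# The columns were stored as generic objects because they mixed strings
# and numbers; infer_objects() asks pandas to pick better dtypes.
fixed = fixed.infer_objects()

print(fixed.dtypes)
print(fixed[fixed['Salary'] >= 50000])
print(fixed.sort_values('Exp', ascending=False))
```

Passing `columns=data[0]` at construction time, as in the answers, is simpler whenever the source list is still at hand.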

huangapple
  • Published on March 15, 2023, 20:07:32
  • If you repost this article, please keep the link: https://go.coder-hub.com/75744477.html