CSV数据清洗使用Python/Pandas

huangapple go评论137阅读模式
英文:

CSV Data Cleaning with Python/Pandas

问题

以下是您提供的代码部分的翻译:

  1. import pandas as pd
  2. file_path = r"C:\Users\abcd\OneDrive\Documents"
  3. messy_data_file_name = 'messy_data.csv'
  4. cleaned_data_file_name = 'cleaned_Data.csv'
  5. messy_df1 = pd.read_csv(file_path + '\\' + messy_data_file_name, skiprows=0)
  6. messy_df1.to_dict()
  7. {
  8. 'Company Name ': {0: 'Report Month', 1: 'Report Year', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan},
  9. 'Toyota': {0: 'Jan ', 1: '2023', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan},
  10. 'Unnamed: 2': {0: nan, 1: nan, 2: nan, 3: nan, 4: 'Total Inventory Cost', 5: 'Sold Inventory Cost', 6: 'Total Profit Incurred', 7: 'Total Manpower Expense',8: 'Total Infra Expense',9: 'Total Transaction Expense'},
  11. 'Unnamed: 3': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan}, 'Unnamed: 4': {0: nan, 1: nan, 2: 'Sales in', 3: 'Ohio Showroom', 4: '344469', 5: '300690', 6: '43779', 7: '15000', 8: '500',9: '110'},
  12. 'Unnamed: 5': {0: nan, 1: nan, 2: 'Sales in', 3: 'Wincosin Showroom ', 4: '11261', 5: '9050', 6: '2211', 7: '1000', 8: '200', 9: '55'}, 'Unnamed: 6': {0: nan, 1: nan, 2: 'Service in', 3: 'Ohio Showroom', 4: '659923', 5: '612231', 6: '47692', 7: '12000', 8: '400', 9: '110'},
  13. 'Unnamed: 7': {0: nan, 1: nan, 2: 'Service in', 3: 'Wincosin Showroom ', 4: '15656', 5: '12812', 6: '2844', 7: '1200', 8: '250', 9: '45'}
  14. }
  15. ###################################################
  16. cleaned_df1 = pd.read_csv(file_path + '\\' + cleaned_data_file_name, skiprows=0)
  17. cleaned_df1.to_dict()
  18. {
  19. 'Company Name': {0: 'Toyota ', 1: 'Toyota ', 2: 'Toyota ', 3: 'Toyota ', 4: 'Toyota ', 5: 'Toyota '},
  20. 'Report Month': {0: 'January', 1: 'January', 2: 'January', 3: 'January', 4: 'January', 5: 'January'},
  21. 'Report Year ': {0: 2023, 1: 2023, 2: 2023, 3: 2023, 4: 2023, 5: 2023},
  22. 'Parameter': {0: 'Total Inventory Cost', 1: 'Sold Inventory Cost', 2: 'Total Profit Incurred', 3: 'Total Manpower Expense', 4: 'Total Infra Expense', 5: 'Total Transaction Expense'},
  23. 'Sales in Ohio Showroom': {0: 344469, 1: 300690, 2: 43779, 3: 15000, 4: 500, 5: 110},
  24. 'Sales in Wincosin Showroom ': {0: 11261, 1: 9050, 2: 2211, 3: 1000, 4: 200, 5: 55},
  25. 'Service in Ohio Showroom': {0: 659923, 1: 612231, 2: 47692, 3: 12000, 4: 400, 5: 110},
  26. 'Service in Wincosin Showroom ': {0: 15656, 1: 12812, 2: 2844, 3: 1200, 4: 250, 5: 45}
  27. }

希望这些翻译对您有帮助。如果您需要进一步的帮助,请随时提问。

英文:

Need help with the below scenario:

Excel data in csv source file is like:

CSV数据清洗使用Python/Pandas

I want to clean and rearrange it so that it looks like:

CSV数据清洗使用Python/Pandas

Adding the messy and cleaned dicts as requested

  1. import pandas as pd
  2. file_path = r"C:\Users\abcd\OneDrive\Documents"
  3. messy_data_file_name = 'messy_data.csv'
  4. cleaned_data_file_name = 'cleaned_Data.csv'
  5. messy_df1 = pd.read_csv(file_path + '\\' + messy_data_file_name, skiprows=0)
  6. messy_df1.to_dict()
  7. {
  8. 'Company Name ': {0: 'Report Month', 1: 'Report Year', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan}
  9. 'Toyota': {0: 'Jan ', 1: '2023', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan},
  10. 'Unnamed: 2': {0: nan, 1: nan, 2: nan, 3: nan, 4: 'Total Inventory Cost', 5: 'Sold Inventory Cost', 6: 'Total Profit Incurred', 7: 'Total Manpower Expense',8: 'Total Infra Expense',9: 'Total Transaction Expense'},
  11. 'Unnamed: 3': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan}, 'Unnamed: 4': {0: nan, 1: nan, 2: 'Sales in', 3: 'Ohio Showroom', 4: '344469', 5: '300690', 6: '43779', 7: '15000', 8: '500',9: '110'},
  12. 'Unnamed: 5': {0: nan, 1: nan, 2: 'Sales in', 3: 'Wincosin Showroom ', 4: '11261', 5: '9050', 6: '2211', 7: '1000', 8: '200', 9: '55'}, 'Unnamed: 6': {0: nan, 1: nan, 2: 'Service in', 3: 'Ohio Showroom', 4: '659923', 5: '612231', 6: '47692', 7: '12000', 8: '400', 9: '110'},
  13. 'Unnamed: 7': {0: nan, 1: nan, 2: 'Service in', 3: 'Wincosin Showroom ', 4: '15656', 5: '12812', 6: '2844', 7: '1200', 8: '250', 9: '45'}
  14. }
  15. ###################################################
  16. cleaned_df1 = pd.read_csv(file_path + '\\' + cleaned_data_file_name, skiprows=0)
  17. cleaned_df1.to_dict()
  18. {
  19. 'Company Name': {0: 'Toyota ', 1: 'Toyota ', 2: 'Toyota ', 3: 'Toyota ', 4: 'Toyota ', 5: 'Toyota '},
  20. 'Report Month': {0: 'January', 1: 'January', 2: 'January', 3: 'January', 4: 'January', 5: 'January'},
  21. 'Report Year ': {0: 2023, 1: 2023, 2: 2023, 3: 2023, 4: 2023, 5: 2023},
  22. 'Parameter': {0: 'Total Inventory Cost', 1: 'Sold Inventory Cost', 2: 'Total Profit Incurred', 3: 'Total Manpower Expense', 4: 'Total Infra Expense', 5: 'Total Transaction Expense'},
  23. 'Sales in Ohio Showroom': {0: 344469, 1: 300690, 2: 43779, 3: 15000, 4: 500, 5: 110},
  24. 'Sales in Wincosin Showroom': {0: 11261, 1: 9050, 2: 2211, 3: 1000, 4: 200, 5: 55},
  25. 'Service in Ohio Showroom': {0: 659923, 1: 612231, 2: 47692, 3: 12000, 4: 400, 5: 110},
  26. 'Service in Wincosin Showroom ': {0: 15656, 1: 12812, 2: 2844, 3: 1200, 4: 250, 5: 45}
  27. }

答案1

得分: 1

你可以这样做
从你的CSV文件中读取数据

  1. df=pd.read_csv(r"C:\Users\Aparna\Downloads\messydata.csv",skiprows=5,header=None)
  2. # 更改文件的位置路径
  3. df1=df.dropna(axis=1)
  4. # 由于你的列名没有正确对齐,尝试使用以下方式创建新的列名
  5. df1.columns = ["Parameter", "Sales in Ohio Showroom", "Sales in Wisconsin Showroom", "Service in Ohio Showroom","Service in Wisconsin Showroom"]
  6. df1["Company"]="Toyota"
  7. df1["Report Month"]="January"
  8. df1["Report Year"]=2023
  9. df1 = df1.reindex(columns=["Company","Report Month","Report Year","Parameter", "Sales in Ohio Showroom", "Sales in Wisconsin Showroom", "Service in Ohio Showroom","Service in Wisconsin Showroom"])
  10. # 最后将数据框导出到CSV文件
  11. df1.to_csv(r"C:\Users\abcd\OneDrive\Documents\file_path.csv")

输出:

CSV数据清洗使用Python/Pandas

英文:

you can do like this
read data from your csv file

  1. df=pd.read_csv(r"C:\Users\Aparna\Downloads\messydata.csv",skiprows=5,header=None)
  2. # change location path of your file
  3. df1=df.dropna(axis=1)
  4. # since your column names are not alligned correctly try to make new column names like this
  5. df1.columns = ["Parameter", "Sales in Ohio Showroom", "Sales in Wincosin Showroom", "Service in Ohio Showroom","Service in Wincosin Showroom"]
  6. df1["Company"]="Toyota"
  7. df1["Report Month"]="January"
  8. df1["Report Year"]=2023
  9. df1 = df1.reindex(columns=["Company","Report Month","Report Year","Parameter", "Sales in Ohio Showroom", "Sales in Wincosin Showroom", "Service in Ohio Showroom","Service in Wincosin Showroom"])
  10. # finally export dataframe in to csv file
  11. df1.to_csv(r"C:\Users\abcd\OneDrive\Documents\file_path.csv")
  12. output:[![enter image description here][1]][1]
  13. [1]: https://i.stack.imgur.com/1hM5P.png
  14. </details>

huangapple
  • 本文由 发表于 2023年8月4日 03:40:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/76831180.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定