Posted: 2023-08-04 03:40:17

CSV Data Cleaning with Python/Pandas

Question

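I need help with the following scenario.

The Excel data in the source CSV file looks like this:

(image: screenshot of the messy source data, not preserved)

I want to clean and rearrange it so that it looks like this:

(image: screenshot of the cleaned target data, not preserved)

Adding the messy and cleaned DataFrames as dicts, as requested: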

import pandas as pd
file_path = r"C:\Users\abcd\OneDrive\Documents"
messy_data_file_name = 'messy_data.csv'
cleaned_data_file_name = 'cleaned_Data.csv'
messy_df1 = pd.read_csv(file_path + '\\' + messy_data_file_name, skiprows=0)
messy_df1.to_dict()
{
 'Company Name ': {0: 'Report Month',  1: 'Report Year',  2: nan,  3: nan,  4: nan,  5: nan,  6: nan,  7: nan,  8: nan,  9: nan},
 'Toyota': {0: 'Jan ',  1: '2023',  2: nan,  3: nan,  4: nan,  5: nan,  6: nan,  7: nan,  8: nan,  9: nan},
 'Unnamed: 2': {0: nan,  1: nan,  2: nan,  3: nan,  4: 'Total Inventory Cost',  5: 'Sold Inventory Cost',  6: 'Total Profit Incurred',  7: 'Total Manpower Expense',  8: 'Total Infra Expense',  9: 'Total Transaction Expense'},
 'Unnamed: 3': {0: nan,  1: nan,  2: nan,  3: nan,  4: nan,  5: nan,  6: nan,  7: nan,  8: nan,  9: nan},
 'Unnamed: 4': {0: nan,  1: nan,  2: 'Sales in',  3: 'Ohio Showroom',  4: '344469',  5: '300690',  6: '43779',  7: '15000',  8: '500',  9: '110'},
 'Unnamed: 5': {0: nan,  1: nan,  2: 'Sales in',  3: 'Wincosin Showroom ',  4: '11261',  5: '9050',  6: '2211',  7: '1000',  8: '200',  9: '55'},
 'Unnamed: 6': {0: nan,  1: nan,  2: 'Service in',  3: 'Ohio Showroom',  4: '659923',  5: '612231',  6: '47692',  7: '12000',  8: '400',  9: '110'},
 'Unnamed: 7': {0: nan,  1: nan,  2: 'Service in',  3: 'Wincosin Showroom ',  4: '15656',  5: '12812',  6: '2844',  7: '1200',  8: '250',  9: '45'}
}
###################################################
cleaned_df1 = pd.read_csv(file_path + '\\' + cleaned_data_file_name, skiprows=0)
cleaned_df1.to_dict()
{
 'Company Name': {0: 'Toyota ',  1: 'Toyota ',  2: 'Toyota ',  3: 'Toyota ',  4: 'Toyota ',  5: 'Toyota '},
 'Report Month': {0: 'January',  1: 'January',  2: 'January',  3: 'January',  4: 'January',  5: 'January'},
 'Report Year ': {0: 2023, 1: 2023, 2: 2023, 3: 2023, 4: 2023, 5: 2023},
 'Parameter': {0: 'Total Inventory Cost',  1: 'Sold Inventory Cost',  2: 'Total Profit Incurred',  3: 'Total Manpower Expense',  4: 'Total Infra Expense',  5: 'Total Transaction Expense'},
 'Sales in Ohio Showroom': {0: 344469,  1: 300690,  2: 43779,  3: 15000,  4: 500,  5: 110},
 'Sales in Wincosin Showroom': {0: 11261,  1: 9050,  2: 2211,  3: 1000,  4: 200,  5: 55},
 'Service in Ohio Showroom': {0: 659923,  1: 612231,  2: 47692,  3: 12000,  4: 400,  5: 110},
 'Service in Wincosin Showroom ': {0: 15656,  1: 12812,  2: 2844,  3: 1200,  4: 250,  5: 45}
}

Answer 1

Score: 1

You can do it like this. Read the data from your CSV file, starting at the first data row:

import pandas as pd

# Change the path to the location of your file.
# skiprows=5 skips the metadata and heading lines; header=None keeps the first data row as data.
df = pd.read_csv(r"C:\Users\Aparna\Downloads\messydata.csv", skiprows=5, header=None)
# Drop the columns that contain NaN (the empty spacer columns).
df1 = df.dropna(axis=1)
# The column names in the file are not aligned correctly, so assign new ones explicitly.
df1.columns = ["Parameter", "Sales in Ohio Showroom", "Sales in Wincosin Showroom", "Service in Ohio Showroom", "Service in Wincosin Showroom"]
df1["Company"] = "Toyota"
df1["Report Month"] = "January"
df1["Report Year"] = 2023
# Put the columns in the requested order.
df1 = df1.reindex(columns=["Company", "Report Month", "Report Year", "Parameter", "Sales in Ohio Showroom", "Sales in Wincosin Showroom", "Service in Ohio Showroom", "Service in Wincosin Showroom"])
# Finally, export the DataFrame to a CSV file.
# index=False leaves the row index out, matching the cleaned file in the question.
df1.to_csv(r"C:\Users\abcd\OneDrive\Documents\file_path.csv", index=False)

Output (screenshot): https://i.stack.imgur.com/1hM5P.png
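
A less hardcoded variant of the same idea might look like the sketch below. It reads the company, month and year from the first rows and builds the column names from the two heading rows instead of typing them in. This is only a sketch under stated assumptions: the `MESSY_CSV` text is reconstructed from the dict in the question (the real file may differ), the names `tidy_report` and `MONTHS` are made up for illustration, and the layout (three metadata rows, two heading rows, then data) is assumed to be fixed.

```python
import io

import pandas as pd

# Reconstructed from the dict posted in the question (an assumption: the real
# messy_data.csv may differ in quoting or stray spaces).
MESSY_CSV = """\
Company Name ,Toyota,,,,,,
Report Month,Jan ,,,,,,
Report Year,2023,,,,,,
,,,,Sales in,Sales in,Service in,Service in
,,,,Ohio Showroom,Wincosin Showroom ,Ohio Showroom,Wincosin Showroom
,,Total Inventory Cost,,344469,11261,659923,15656
,,Sold Inventory Cost,,300690,9050,612231,12812
,,Total Profit Incurred,,43779,2211,47692,2844
,,Total Manpower Expense,,15000,1000,12000,1200
,,Total Infra Expense,,500,200,400,250
,,Total Transaction Expense,,110,55,110,45
"""

# English month names, used to expand an abbreviation such as "Jan".
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def tidy_report(source):
    """Hypothetical helper: reshape the messy report into one tidy table."""
    # Read every cell as text, with no header row, so pandas guesses nothing.
    raw = pd.read_csv(source, header=None, dtype=str)

    # Rows 0-2 hold "key, value" metadata in the first two columns.
    meta = dict(zip(raw.loc[:2, 0].str.strip(), raw.loc[:2, 1].str.strip()))

    # Rows 5 onward are data; drop the columns that are entirely empty there.
    body = raw.iloc[5:].dropna(axis=1, how="all")
    value_cols = list(body.columns[1:])  # first kept column is the parameter name

    # Rows 3 and 4 together form each heading, e.g. "Sales in" + "Ohio Showroom".
    top = raw.loc[3, value_cols].str.strip()
    bottom = raw.loc[4, value_cols].str.strip()
    names = (top + " " + bottom).tolist()

    tidy = body.reset_index(drop=True)
    tidy.columns = ["Parameter"] + names
    tidy = tidy.astype({name: int for name in names})

    # Expand the month abbreviation and prepend the metadata columns.
    month = next(m for m in MONTHS if m.startswith(meta["Report Month"]))
    tidy.insert(0, "Company Name", meta["Company Name"])
    tidy.insert(1, "Report Month", month)
    tidy.insert(2, "Report Year", int(meta["Report Year"]))
    return tidy


tidy = tidy_report(io.StringIO(MESSY_CSV))
print(tidy.to_string(index=False))
```

Passing the path of the real file to `tidy_report` and then calling `to_csv(..., index=False)` on the result should produce the requested layout, provided the file really follows the assumed row positions. Stripping whitespace from every label also removes the trailing spaces seen in the posted column names.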
