2023年6月16日 02:07:32go评论70阅读模式

英文:

Analyzing restaurant data using python. Need help merging two datasets on both check # and date

问题

我正在进行差异分析，研究了一段时间之前餐厅菜单格式的变化对业务的影响。我们改变了用于小型聚会的菜单，而用于大型聚会的菜单在整个分析期间保持不变。我基本上是将大型聚会用作比较的基准，以查看小型聚会菜单的变化是否降低了我们的人均收入。

我已经创建了每个时间段和不同菜单的数据框如下：

dfLargePre = 菜单变更前的大型聚会
dfSmallPre = 菜单变更前的小型聚会
dfLargePost = 菜单变更后的大型聚会
dfSmallPost = 菜单变更后的小型聚会

这些数据框具有"日期"、"账单号"和"总支付"列。

然而，我只想分析食品销售，不包括酒水销售（这在每个账单中占了很大比例）。由于我们的销售报告系统的工作方式，我不得不创建一个名为dfFood的新数据框，它具有"日期"、"账单号"和"总食品"列。这个数据框非常庞大，包含比我需要分析的数据更多的账单，但我需要"总食品"列在每个不同聚会大小和时间段的数据框中。

"账单号"在任何给定日期内是唯一的，但在分析的多个月时间范围内，有多个重复的账单号。因此，我需要基于"日期"和"账单号"进行合并。

有人知道我应该如何最好地完成这些合并吗？

df_merged = pd.merge(dfLargePre, dfFood, how='left', on=['Check_number', 'Date'])

我尝试了上面的代码行，但收到一个错误，说我需要使用pd.concat，但我不确定这是否是最佳行动方案，也不熟悉pd.concat的编码程序。

英文:

I am running a diff in diff analysis for my restaurant that changed the format of their menus a while back. The menus we changed are only used for smaller parties and the menus we use for larger parties have remained the same for the entire period of analysis. I am essentially using the larger parties as a point of comparison to see if the change in the smaller party menu has reduced our per person revenue.

I have made a df for each of the time frames and distinct menus used as follows:

dfLargePre = Large party in the period before menus were changed
dfSmallPre = Small party in the period before menus were changed
dfLargePost = Large party after menus were changed
dfSmallPost = Small party after menus were changed

These dfs have columns for 'Date', 'Check_number', and 'Total_payment'

However, I only want to analyze the food sales and no alcohol (which makes up a great deal of each check). Because of how our sales reporting system works, I had to create a new df called dfFood that has columns 'Date', 'Check_number', and 'Total_food'. This data frame is huge and has way more checks than the ones I need to analyze but I need the 'Total_food' column to be in each of the party size and time distinguished data frames.

'Check_number' is unique within any given day but there are several repeat check numbers within the many months time frame being analyzed. Therefore, I need to merge based on 'Date' and 'Check_number'.

Does anyone know how I can best complete these merges?

df_merged = pd.merge(dfLargePre, dfFood, how=&#39;left&#39;, on=[&#39;Check_Number&#39;, &#39;Date&#39;])

I tried the above line and I got an error saying I need to use pd.concat but I am unsure if that is the best plan of action and I am not familiar with the pd.concat coding procedures.

############ EDIT ############# (Adding traceback for attempted merge)

Traceback Screenshot

答案1

得分: 1

你正在合并一对数据框架。
每个都有Check_Number和Date列。

现在看看每个数据框架的.dtype。
日期应该是datetime64[ns]类型。
可能Check_Number应该是整数，
或者如果你将其建模为字符串，则可能是"object"类型。

两个数据框架中的Check_Number类型必须相同。
两个数据框架中的Date类型必须相同。
如果它们不匹配，你的.merge() 将无法成功。

英文:

You are merging a pair of dataframes.
Each has Check_Number and Date columns.

Now look at the .dtype of each dataframe.
The Date should be of datetime64[ns].
Presumably Check_Number should be an integer,
or perhaps "object" if you're modeling that as a string.

The Check_Number type in both dataframes must be identical.
The Date type in both dataframes must be identical.
If they don't match, your .merge() won't succeed.

答案2

得分: 0

谢谢大家。我找到了错误。我在加载其中一个csv文件时没有使用parse_dates。我修复了csv上传，原始的代码行就可以完美运行了。

英文:

Thank you all. I found the error. I had not used parse_dates when loading one of my csv files. I fixed the csv upload and the original line of code worked perfectly.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Analyzing restaurant data using python. Need help merging two datasets on both check # and date

问题

答案1

答案2

Pyinstaller error Failed to execute script 'pyiboot01_bootstrap' due to unhandled exception! when I tried to make an app from .py file

Echo State Network – AttributeError: ‘NoneType’ object has no attribute ‘lower’

生成KivyMD应用程序的.exe文件而不出现控制台窗口的方法？

Crispy forms label inside of field

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论