July 28, 2023 05:15:45

Why are there so many duplicate columns when I draw histograms from .csv data using R?

Question

I read the .csv file into R, and my code is below.

ggplot(data = data) +
  geom_bar(mapping = aes(x = platform)) +
  geom_text(aes(x = platform, label = ..count..), stat = "count", vjust = -0.5)

Why are there duplicate columns (e.g. multiple "chat rooms" and "tip" bars)?

![enter image description here](https://i.stack.imgur.com/KKg6G.png)

I tried copying one "tip" cell in the Excel sheet, pasting it over all the existing "tip" cells, and then reloading the data in R. The duplicate columns are still there. I want to combine those duplicates in the histogram. Any ideas how to do so?

Answer 1

Score: 0

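Looking closely at the strings and their alignment under the ticks, it appears that the repeated words are aligned differently. For example, notice how the tick on the left is slightly off-center-left of the r, while the tick on the right is centered on the r, suggesting that the word on the right is centered a little further left than the one on the left. That kind of offset is what leading or trailing whitespace in some of the values would produce, and such whitespace would also make R treat otherwise identical labels as distinct categories.

The other repeated labels show similar offsets.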

Run this:

data$platform <- trimws(data$platform)

and then plot again.
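To check whether stray whitespace really is the cause, it can help to look at the distinct values before and after trimming. The sketch below uses a made-up data frame as a stand-in for the real `data` read from the .csv, so the values in it are assumptions for illustration only.

```r
# Hypothetical stand-in for the asker's data: some values carry stray whitespace
data <- data.frame(
  platform = c("tip", "tip ", " tip", "chat rooms", "chat rooms "),
  stringsAsFactors = FALSE
)

# Before trimming, each whitespace variant counts as its own category
n_before <- length(unique(data$platform))

# Printing the values with quotes makes leading/trailing whitespace visible
print(unique(data$platform), quote = TRUE)

# Strip leading and trailing whitespace, then count the categories again
data$platform <- trimws(data$platform)
n_after <- length(unique(data$platform))

print(c(before = n_before, after = n_after))
```

If the number of distinct values drops after `trimws()`, the extra bars in the plot were most likely whitespace variants of the same label.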

If that doesn't do it, it might be that some spaces are repeated or similar (though the picture doesn't really suggest that). If it's safe to reduce "1 or more blanks" to a single space, then also do this:

data$platform <- gsub("[[:space:]]+", " ", data$platform)

and then plot again. The [[:space:]] class matches space, tab, newline, carriage return, form feed, vertical tab, and possibly other locale-dependent characters; the + means "1 or more". The " " is the replacement value. The match is greedy, so if there are two or more whitespace characters (any of those just listed) in a row, they will be replaced with a single " ". Note that both pairs of brackets are needed: a bare [:space:] is not interpreted as the whitespace class and will not match what you intend. (Again, I don't think this is the culprit, but it's more ammo for you to work with.)
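The two steps can be combined into one small helper. This is only a sketch: `normalize_ws` is a hypothetical name, not a base R function, and the input values are made up to show doubled spaces, a tab, and padding at the ends.

```r
# Hypothetical helper: collapse runs of whitespace, then trim both ends
normalize_ws <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)  # any run of whitespace becomes one space
  trimws(x)                          # drop leading and trailing whitespace
}

# Made-up values that look alike on an axis but differ as strings
x <- c("chat rooms", "chat  rooms", "chat\trooms", " chat rooms ")

cleaned <- normalize_ws(x)
print(unique(cleaned))
```

With the real data, the equivalent call would be `data$platform <- normalize_ws(data$platform)` before plotting.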
