Why are there so many duplicate columns when I draw histograms from .csv data using R?

Question

I read the .csv file into R and my code is below.

ggplot(data = data) +
  geom_bar(mapping = aes(x = platform)) +
  geom_text(aes(x = platform, label = ..count..), stat = "count", vjust = -0.5)

Why are there duplicate columns (e.g. multiple "chat rooms" and "tip", etc.)?

![enter image description here](https://i.stack.imgur.com/KKg6G.png)

I tried copying one "tip" cell in the Excel sheet, pasting it into all the existing "tip" cells, and then reloading the data in R. There are still duplicate columns. I want to combine those duplicates in the histogram. Any ideas how to do so?

Answer 1

Score: 0

Looking closely at the strings and their alignment under the ticks in the plot, it appears that the repeated words are aligned differently. For example, for one pair of duplicate labels, the tick on the left is slightly off-center-left of the r, while the tick on the right is centered on the r, suggesting that the word on the right is centered a little further left than the one on the left. The other repeated labels show similar offsets. That is consistent with leading or trailing whitespace in some of the values: strings that differ only by a blank are still different strings, so each variant gets its own bar.
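One way to check this before changing anything is to print the distinct values with their quotes, so stray blanks become visible. This is a minimal sketch; the `data` frame below is a made-up stand-in, since the asker's actual values are not shown.

```r
# Hypothetical stand-in for the asker's data; the real values are unknown.
data <- data.frame(
  platform = c("tip", "tip ", " tip", "chat rooms", "chat rooms "),
  stringsAsFactors = FALSE
)

# dput() prints each distinct value inside quotes, which exposes
# leading or trailing blanks that are invisible on the plot axis.
distinct_values <- unique(data$platform)
dput(distinct_values)

# Labels that look alike but differ in length are likely candidates.
label_lengths <- nchar(distinct_values)
names(label_lengths) <- distinct_values
print(label_lengths)

# Number of distinct categories (bars) before any cleaning.
n_before <- length(distinct_values)
```

If look-alike labels show up with different lengths, whitespace is the likely cause of the extra bars.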

Run this:

data$platform <- trimws(data$platform)

and then plot again.
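To see the effect of `trimws()` without plotting, the counts per label can be compared before and after. This is a sketch on made-up values (the `data` frame here is hypothetical, not the asker's file).

```r
# Hypothetical stand-in for the asker's data; the real values are unknown.
data <- data.frame(
  platform = c("tip", "tip ", " tip", "chat rooms", "chat rooms "),
  stringsAsFactors = FALSE
)

# Before trimming: every whitespace variant is its own category.
counts_before <- table(data$platform)
print(counts_before)

# trimws() removes leading and trailing whitespace from each string.
data$platform <- trimws(data$platform)

# After trimming: the variants collapse into one category per label.
counts_after <- table(data$platform)
print(counts_after)
```

By default `trimws()` strips only spaces, tabs, carriage returns, and newlines. If the file came from Excel and contains other blanks such as non-breaking spaces, its help page suggests passing `whitespace = "[\\h\\v]"`; whether that applies here is a guess.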

If that doesn't do it, it might be that some spaces are repeated or similar (though the pictures don't really suggest that). If it's safe to reduce "1 or more blanks" to a single space, then also do this:

data$platform <- gsub("[[:space:]]+", " ", data$platform)

and then plot again. The [[:space:]] class matches space, tab, newline, carriage return, form feed, vertical tab, and perhaps other not-so-obvious characters depending on the locale; the + means "1 or more". Note the double brackets: the class name [:space:] must itself sit inside a bracket expression, otherwise the pattern is read as a plain set of the characters ":", "s", "p", "a", "c", and "e". The " " is the replacement value. The match is greedy, so if there are two or more whitespace characters (any of those just listed) in a row, they will be replaced with a single " ". (Again, I don't think this is the culprit, but it's more ammo for you to work with.)
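As a quick illustration of collapsing runs of whitespace, here is a sketch on made-up strings (the vector `x` is hypothetical and only stands in for values of `data$platform`).

```r
# Hypothetical labels that differ only in their internal whitespace.
x <- c("chat  rooms", "chat\trooms", "chat rooms")

# Replace every run of one or more whitespace characters with one space.
collapsed <- gsub("[[:space:]]+", " ", x)

# All three variants now map to the same label.
print(unique(collapsed))
```

Combining this with `trimws()` handles both internal runs and leading or trailing blanks.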

huangapple
  • Posted on 2023-07-28 05:15:45
  • Please keep this link when reposting: https://go.coder-hub.com/76783452.html