How do I split a column into distinct strings?

Question

I wish to split the 'genres' column into multiple columns, where each new column contains only one particular genre (for example: action | adventure | horror | Comedy, etc.). There are 20 different genres, so I should end up with 20 new columns.

I tried this:

new_column = df['genres'].str.split(pat='|', expand=True)

But it only created 5 new columns, and the genres were not separated into one column per genre.
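That result is how str.split(expand=True) is designed to behave: it lays the pieces out by position, so the number of columns equals the largest number of genres found in any single row (5 columns means the longest row has 5 genres), and column 0 simply holds whichever genre came first in each row. A minimal sketch on a hypothetical three-row frame (not the asker's data) makes this visible:

```python
import pandas as pd

# Hypothetical sample data; the real 'genres' column is not shown in the question
df = pd.DataFrame({"genres": ["Action|Adventure", "Comedy", "Adventure|Comedy|Horror"]})

# expand=True creates one column per *position* in the string, not one per genre
positional = df["genres"].str.split(pat="|", expand=True)

print(positional.shape)        # (3, 3): three rows, at most three genres in a row
print(positional[0].tolist())  # ['Action', 'Comedy', 'Adventure']: the "first genre" of each row
```

Shorter rows are padded with missing values, which is why the columns never line up with specific genres.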

Answer 1

Score: 1

You could use str.get_dummies to split the values and create one new column per genre.

Something like this should work:

import pandas as pd

# Creating dataframe
data = ["action | adventure| horror| Comedy | action1 | adventure1 | horror1 | Comedy1 | action2 | adventure2 | horror2 | Comedy2 "]
df = pd.DataFrame(data, columns=['genres'])

# Remove the spaces around each '|' so a genre is spelled the same way in every row
genres = df['genres'].str.replace(r'\s*\|\s*', '|', regex=True).str.strip()

# Split on '|' and create one 0/1 column per distinct genre
new_columns = genres.str.get_dummies(sep='|')

# Recreating the dataframe with the new structure
df = pd.concat([df, new_columns], axis=1)
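To see what str.get_dummies produces when there are several rows, here is a small usage sketch on hypothetical data (not from the question). Every distinct genre across the whole column becomes one column, and a row gets 1 where it has that genre and 0 otherwise, so 20 distinct genres would give 20 columns. The separator is normalized first, on the assumption that the real data has inconsistent spacing like the example above:

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around '|'
df = pd.DataFrame({"genres": ["Action | Adventure", "Comedy", "Adventure|Comedy |Horror"]})

# Normalize the separator so ' Comedy' and 'Comedy' end up as the same genre
clean = df["genres"].str.replace(r"\s*\|\s*", "|", regex=True).str.strip()
dummies = clean.str.get_dummies(sep="|")

print(list(dummies.columns))    # ['Action', 'Adventure', 'Comedy', 'Horror']
print(dummies.loc[1].tolist())  # [0, 0, 1, 0]: row 1 only has Comedy

df = pd.concat([df, dummies], axis=1)
```

Columns come out sorted alphabetically, independent of the order genres appear in each row.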

Answer 2

Score: 0

Like this:

# Each row becomes a Series that apply() spreads across columns (1st genre, 2nd genre, ...)
splt = lambda x: pd.Series(x.split('|'))
new_columns = df['genres'].apply(splt)

Then you can concat:

df_concat = pd.concat([df, new_columns], axis=1)
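The columns produced by this apply/split approach are positional (first genre, second genre, and so on), just like the str.split(expand=True) attempt in the question. If one column per genre is the goal, one possible sketch, assuming a similarly pipe-separated column (the sample data is hypothetical), is to keep the split lists, explode them to one genre per row, and one-hot encode that:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"genres": ["Action|Adventure", "Comedy", "Adventure|Comedy|Horror"]})

# One list of genres per row, then one genre per line; the original index repeats
long = df["genres"].str.split("|").explode().str.strip()

# One-hot encode each genre, then collapse back to one row per original index
per_genre = pd.get_dummies(long).groupby(level=0).max().astype(int)

df_concat = pd.concat([df, per_genre], axis=1)
```

The groupby(level=0).max() step is what turns several (row, genre) lines back into a single row with a 1 in each of that row's genre columns.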

Answer 3

Score: 0

If you want to split the genres into their own columns, a better approach is one-hot encoding, for example with the str.get_dummies function in pandas.

This creates a new column for each unique genre and puts a 1 in that column for every row where the genre appears (and 0 elsewhere).

df = pd.concat([df, df['genres'].str.get_dummies(sep='|')], axis=1)
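Since the question expects exactly 20 new columns, it can help to cross-check the count. A hedged sketch on hypothetical data: the number of get_dummies columns should equal the number of distinct genre strings, and row sums show how many genres each row has. If the count comes out larger than expected, stray whitespace around '|' (producing both "Comedy" and " Comedy") is a common cause:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"genres": ["Action|Drama", "Comedy", "Drama|Horror|Action"]})

dummies = df["genres"].str.get_dummies(sep="|")

# Cross-check: the number of new columns should equal the number of distinct genres
distinct = {g for s in df["genres"] for g in s.split("|")}
print(dummies.shape[1], len(distinct))  # 4 4

# Row sums: how many genres each row has
print(dummies.sum(axis=1).tolist())  # [2, 1, 3]

df = pd.concat([df, dummies], axis=1)
```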

huangapple
  • Posted on June 9, 2023 at 02:27:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76434726.html