How do I split a column into distinct strings?
Question
I want to split the 'genres' column into multiple columns, where each new column contains only one particular genre (for example: action | adventure | horror | Comedy, etc.). There are 20 different genres, so I should end up with 20 new columns.
I tried this:
new_column = df['genres'].str.split(pat='|', expand=True)
But it only created 5 new columns, and the genres are not separated by column.
Answer 1
Score: 1
You could use str.get_dummies to split the values and create one new column per genre.
Something like this should work:
import pandas as pd
# Create a sample DataFrame
data = ["action | adventure| horror| Comedy | action1 | adventure1 | horror1 | Comedy1 | action2 | adventure2 | horror2 | Comedy2 "]
df = pd.DataFrame(data, columns=['genres'])
# Build one indicator (0/1) column per genre found in the 'genres' strings
new_columns = df['genres'].str.get_dummies(sep='|')
# Strip the stray whitespace from the new column names
new_columns.columns = new_columns.columns.str.strip()
# Append the indicator columns to the original DataFrame
df = pd.concat([df, new_columns], axis=1)
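One caveat with stripping the column names afterwards: if the spacing around '|' varies between rows, the same genre can show up under two raw names (say "action " and " action"), and stripping them yields two columns with the same name. A minimal sketch of one way around that, assuming it is acceptable to clean the strings before encoding (the sample rows below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with inconsistent spacing around the separator
df = pd.DataFrame(
    {"genres": ["action | adventure", "adventure|horror ", " Comedy | action"]}
)

# Remove whitespace around every '|' and at both ends of each string
cleaned = df["genres"].str.replace(r"\s*\|\s*", "|", regex=True).str.strip()

# One 0/1 indicator column per distinct genre
dummies = cleaned.str.get_dummies(sep="|")

# Append the indicator columns to the original DataFrame
result = pd.concat([df, dummies], axis=1)
print(result)
```

Because the whitespace is removed before get_dummies runs, each genre ends up in exactly one column regardless of how it was spaced in the source rows.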
Answer 2
Score: 0
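Like this:
import pandas as pd
# Split each row's string on '|' and spread the pieces across positional columns
new_columns = df['genres'].apply(lambda x: pd.Series(x.split('|')))
Then you can concatenate the result with the original DataFrame:
df_concat = pd.concat([df, new_columns], axis=1)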
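Note that this approach produces positional columns (0, 1, 2, ...), one per position in the split list, so a single column can hold different genres in different rows. That is the same shape the asker got from str.split with expand=True. A small sketch to illustrate, using made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data: rows with different numbers of genres
df = pd.DataFrame({"genres": ["action|horror", "Comedy", "horror|Comedy|action"]})

# Approach from this answer: one pd.Series per row, aligned by position
by_apply = df["genres"].apply(lambda x: pd.Series(x.split("|")))

# Approach from the question, for comparison
by_split = df["genres"].str.split("|", expand=True)

print(by_apply)
print(by_split)
```

Both results have as many columns as the longest row has genres, and shorter rows are padded with missing values.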
Answer 3
Score: 0
If you want to split the genres into their own columns, a better approach is one-hot encoding with the str.get_dummies method in pandas.
This creates a new column for each unique genre and places a 1 in that column for every row where the genre appears (and a 0 otherwise).
df = pd.concat([df, df['genres'].str.get_dummies(sep='|')], axis=1)
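To make the difference from a plain split concrete, here is a small worked sketch on made-up rows (it assumes the separators carry no surrounding spaces). The number of new columns equals the number of distinct genres in the data, not the largest number of genres found in any one row:

```python
import pandas as pd

# Hypothetical sample data: three distinct genres, at most three per row
df = pd.DataFrame({"genres": ["action|horror", "Comedy", "horror|Comedy|action"]})

# One 0/1 indicator column per distinct genre
dummies = df["genres"].str.get_dummies(sep="|")

# Append the indicator columns to the original DataFrame
df = pd.concat([df, dummies], axis=1)
print(df)
```

With 20 distinct genres in the real data, the same call should give 20 indicator columns, each named after its genre.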