How can I transpose a column up to a particular value?
Question
My data frame:
```python
import pandas as pd

column_data = [
    33.5, "W", "A to B, OK", "slinks down to hammer", "T c V b Rell 10 (82b 6x1) DW:84.14",
    33.4, "•", "A to B, no", "Tosses it uo",
    33.3, 2, "A to B, 2 R", "On a right way", "slinks down to hammer", "BAN: 185/4CRR: 5.60 ", "T 69 (80b 6x4)", "Mu 7 (17b)", "Mark 6-0-29-1", "George Dockrel", "Bet 31",
    33.2, 2, "A to T, 2 R", "slinks down to hammer",
    33.1, "2", "A to T, 2 r", "angling away, cuts it",
]
df = pd.DataFrame(column_data, columns=['Col1'])
```
I want it to look like the layout in my picture. I tried the following, but it does not give the result I want:
```python
import pandas as pd

# The same single 'Col1' column, read from an Excel sheet
data2 = pd.read_excel(r'C:\Users\nokla\Desktop\Book11.xlsx', sheet_name='Sheet6', usecols=['Col1'])

num_columns = 11
num_rows = len(data2) // num_columns

# Cut the column into 11 equally sized consecutive slices, one per output column
data_dict = {}
for i in range(1, num_columns + 1):
    column_name = 'Col{}'.format(i)
    column_values = data2.iloc[(i - 1) * num_rows: i * num_rows, 0].tolist()
    data_dict[column_name] = column_values

df5 = pd.DataFrame(data_dict)
df5
```
Answer 1
Score: 2
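If the first column is always the only one that contains floats, which appears to be the case in your sample data, then the following code works:
```python
# Each float marks the start of a new output row, so the running
# count of floats gives every item its row label.
rows = (df.Col1.map(type) == float).cumsum()

# Collect each row's items into a list, turn the list into a string,
# strip the surrounding brackets, and split on ',' into columns.
df1 = df.groupby(rows).agg(list)\
    .Col1.astype(str).str[1:-1]\
    .str.split(',', expand=True)\
    .add_prefix("col_")
```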
This groups the items by output row (found by locating the floats in the column) and collects each group into a list. Each list is converted to a string and split on "," into columns; this route is used because the lists are not of equal length. A prefix is added to match the columns in your picture (although the numbering starts at 0 rather than 1).
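To make the grouping key concrete, here is a minimal sketch on a shortened, made-up version of the column (the values in `sample` and the name `is_start` are assumptions for illustration). Every float starts a new group, so the cumulative sum gives the same label to all items that belong to one output row.

```python
import pandas as pd

# Hypothetical, shortened sample: each float marks the start of a new row.
sample = pd.DataFrame({"Col1": [33.5, "W", "A to B, OK", 33.4, "x", 33.3, 2]})

# True where the value is a Python float, False otherwise (the int 2 is not a float).
is_start = sample["Col1"].map(type) == float

# Running count of floats seen so far: one label per output row.
rows = is_start.cumsum()

print(rows.tolist())
```

Items that share a label end up in the same list when the frame is grouped by `rows`.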
If that assumption is wrong (the first column does not always hold a float, or floats also appear in other columns), you will need some other way of telling where each row starts in the list.
Your code does not work because it assumes that every row has the same number of items. That is not the case: empty dataframe cells have no None placeholder in the list, so the column cannot be split evenly into 11 columns to return the desired result.
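As an alternative sketch (not the answerer's code), the grouped lists can be passed straight to the `pd.DataFrame` constructor, which pads shorter rows with missing values. This avoids the string round trip, which would also split values that themselves contain a comma, such as "A to B, OK". The shortened `column_data` and the names `lists`, `lengths` and `wide` are assumptions for illustration.

```python
import pandas as pd

# Shortened, illustrative version of the question's data.
column_data = [33.5, "W", "A to B, OK",
               33.4, "•",
               33.3, 2, "A to B, 2 R", "On a right way"]
df = pd.DataFrame(column_data, columns=["Col1"])

# Row label for every item: running count of floats seen so far.
rows = (df["Col1"].map(type) == float).cumsum().rename("row")

# One list of items per output row; the lists have different lengths.
lists = df["Col1"].groupby(rows).agg(list)
lengths = lists.map(len).tolist()

# Building the frame from the lists pads the shorter rows with missing values.
wide = pd.DataFrame(lists.tolist()).add_prefix("col_")
print(wide)
```

The differing values in `lengths` show why a fixed width of 11 cannot be recovered from the total length alone: the row boundaries have to come from the data itself.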