January 9, 2023, 04:47:50

Add multiple columns and values to a pandas dataframe

Question

I will illustrate my question with a small dataframe, but the real one has thousands of rows.

ID  Name     #Required
123 New York  5
234 Boston    6
345 Miami     2
399 Dallas    7

I need the number of added columns to equal the maximum value in the '#Required' column, and I need them filled in like this:

ID  Name      #Required  J01      J02      J03      J04      J05      J06      J07
123 New York  5          123W001  123W002  123W003  123W004  123W005  "blank"  "blank"
234 Boston    6          234W001  234W002  234W003  234W004  234W005  234W006  "blank"
345 Miami     2          345W001  345W002  "blank"  "blank"  "blank"  "blank"  "blank"
399 Dallas    7          399W001  399W002  399W003  399W004  399W005  399W006  399W007

The number of values in each row depends on the number in the "#Required" column. Each value is the "ID", followed by a "W", followed by a zero-padded counter. I also need to know how to handle the values when the number in "#Required" is 10 or greater, because then the entry should be 123W010, with only one leading 0.

Answer 1

Score: 2

Interesting problem. Here is a pandas way of doing this. The explanation is in the second section -

import pandas as pd

# Your dummy dataset
df = pd.DataFrame({'ID': [123, 234, 345, 399],
                   'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
                   '#Required': [5, 6, 2, 7]})
# Creating list of the required values per row
new_lists = df.apply(lambda x: [str(x['ID'])+'W'+format(i, '03d') for i in range(1, x['#Required']+1)], axis=1)
# Converting the list to individual columns with nan values
new_cols = new_lists.apply(pd.Series)
# Creating column names
col_names = ['J'+format(i, '02d') for i in range(1, len(new_cols.columns)+1)]
# Adding new columns to original dataframe
df[col_names] = new_cols
# Replacing nan values with "blank"
df = df.fillna('"blank"')
df

Explanation

format(x, '03d') gives you a string of length 3, with the number padded by leading zeros, so 1 -> 001 and 10 -> 010.
You can form the required strings for each row using str(x['ID'])+'W'+format(i, '03d'), where i runs from 1 to the value in the #Required column.
[str(x['ID'])+'W'+format(i, '03d') for i in range(1,x['#Required']+1)] is a list comprehension that gives you the result as a list. So, if you print new_lists, it will look like this -

print(new_lists)
0        [123W001, 123W002, 123W003, 123W004, 123W005]
1    [234W001, 234W002, 234W003, 234W004, 234W005, ...
2                                   [345W001, 345W002]
3    [399W001, 399W002, 399W003, 399W004, 399W005, ...
dtype: object

Next, .apply(pd.Series) expands the lists into individual columns, and the shorter lists are padded with NaN values in the remaining columns. Printing new_cols at this stage results in this -

print(new_cols)
         0        1        2        3        4        5        6
0  123W001  123W002  123W003  123W004  123W005      NaN      NaN
1  234W001  234W002  234W003  234W004  234W005  234W006      NaN
2  345W001  345W002      NaN      NaN      NaN      NaN      NaN
3  399W001  399W002  399W003  399W004  399W005  399W006  399W007

Now, to create the column names, we again use a simple list comprehension over the number of columns in new_cols, reusing format, this time as format(i, '02d') for 2 digits.

print(col_names)
['J01', 'J02', 'J03', 'J04', 'J05', 'J06', 'J07']

Finally, you add new_cols as new columns to the original dataframe df by using df[col_names] = new_cols.
And, last but not least, you replace the NaN values with "blank", as your question shows, using a simple df.fillna('"blank"').
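The steps above can also be bundled into one reusable function. The following is only a sketch, not the answerer's code: the name add_job_columns and its parameters are hypothetical, and it builds the new columns directly from a list of lists instead of going through a row-wise apply, which may be more comfortable on a dataframe with thousands of rows.

```python
import pandas as pd


def add_job_columns(df, id_col="ID", count_col="#Required", blank='"blank"'):
    """Return a copy of df with J01..Jnn columns appended (hypothetical helper)."""
    # One list of codes per row, e.g. ['123W001', '123W002', ...]
    codes = [
        [f"{row_id}W{i:03d}" for i in range(1, int(n) + 1)]
        for row_id, n in zip(df[id_col], df[count_col])
    ]
    # Building a DataFrame from ragged lists pads the shorter rows with missing values
    new_cols = pd.DataFrame(codes, index=df.index)
    new_cols.columns = [f"J{i:02d}" for i in range(1, new_cols.shape[1] + 1)]
    # Fill the gaps with the placeholder, then attach to the original columns
    return df.join(new_cols.fillna(blank))


df = pd.DataFrame({"ID": [123, 234, 345, 399],
                   "Name": ["New York", "Boston", "Miami", "Dallas"],
                   "#Required": [5, 6, 2, 7]})
result = add_job_columns(df)
print(result)
```

The original df is left untouched, so the function can be called again with a different placeholder, for example add_job_columns(df, blank="").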

Bonus

Here is how the code works if you have two-digit integers in the #Required column, such as 10 or 12.

# Sample dataframe with 12 and 10 values in #Required
df = pd.DataFrame({'ID': [123, 234, 345, 399],
                   'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
                   '#Required': [5, 10, 2, 12]})
new_lists = df.apply(lambda x: [str(x['ID'])+'W'+format(i, '03d') for i in range(1, x['#Required']+1)], axis=1)
new_cols = new_lists.apply(pd.Series)
col_names = ['J'+format(i, '02d') for i in range(1, len(new_cols.columns)+1)]
df[col_names] = new_cols
df = df.fillna('"blank"')
df

Notice that the column names went from J01 to J12 and the values for the new columns for the rows with #Required values like 10 or 12 look like 234W010 or 399W012.
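The padding behaviour can be checked in isolation. This small sketch (the helper name pad is made up for illustration) shows three equivalent ways of zero-padding an integer. The width is a minimum, not a maximum, so a counter above 999 would simply produce a longer string rather than being truncated.

```python
def pad(n, width=3):
    """Zero-pad the integer n to at least `width` characters (hypothetical helper)."""
    return format(n, f"0{width}d")


for n in (1, 10, 12):
    # format(), f-strings and str.zfill() agree for non-negative integers
    assert pad(n) == f"{n:03d}" == str(n).zfill(3)
    print(n, "->", pad(n))
# prints:
# 1 -> 001
# 10 -> 010
# 12 -> 012
```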

EDIT

For 1-2 digit IDs, you can pad the ID itself with the same format logic as before.

# Sample dataframe with 2 digit IDs
df = pd.DataFrame({'ID': [123, 34, 345, 99],
                   'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
                   '#Required': [5, 3, 2, 4]})
new_lists = df.apply(lambda x: [format(x['ID'], '03d')+'W'+format(i, '03d') for i in range(1, x['#Required']+1)], axis=1)
new_cols = new_lists.apply(pd.Series)
col_names = ['J'+format(i, '02d') for i in range(1, len(new_cols.columns)+1)]
df[col_names] = new_cols
df = df.fillna('"blank"')
df
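One caveat with format(x['ID'], '03d') is that it only accepts integers. If the ID column is read in as text, format('34', '03d') raises a ValueError. Assuming that situation, which the question does not state, a sketch like the following handles both cases. The helper name make_code and its parameters are hypothetical.

```python
def make_code(row_id, i, id_width=3, seq_width=3):
    """Build e.g. '034W002' from an int or str ID (hypothetical helper)."""
    # str(...).zfill works for both 34 and '34'; format(..., '03d') only for ints
    padded_id = str(row_id).strip().zfill(id_width)
    return f"{padded_id}W{i:0{seq_width}d}"


print(make_code(34, 2))     # 034W002
print(make_code("99", 10))  # 099W010
print(make_code(123, 1))    # 123W001
```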

Answer 2

Score: 0

Proposed code

import pandas as pd
# Your dummy dataset
df = pd.DataFrame({'ID': [123, 234, 345, 399],
                   'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
                   '#Required': [5, 6, 2, 7]})
m = max(df['#Required']) # Maximum number in '#Required'
lm = len(str(m)) # length of the maximum number in '#Required'
how_to_fill = {'J%s'%str(n).zfill(1+lm):['%sW%s'%(i,str(n).zfill(2+lm)) if n <= r else 'Blank' for i,r in zip(df['ID'], df['#Required'])] for n in range(1, m+1)}
df = df.join(pd.DataFrame(how_to_fill)).set_index('ID')
print(df)

Result :

         Name  #Required      J01      J02  ...      J04      J05      J06      J07
ID                                          ...                                    
123  New York          5  123W001  123W002  ...  123W004  123W005    Blank    Blank
234    Boston          6  234W001  234W002  ...  234W004  234W005  234W006    Blank
345     Miami          2  345W001  345W002  ...    Blank    Blank    Blank    Blank
399    Dallas          7  399W001  399W002  ...  399W004  399W005  399W006  399W007

Note :

I suggest using np.nan (with import numpy as np) instead of "Blank".
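As a rough illustration of that note, here is the same dictionary-comprehension idea with np.nan in place of the "Blank" string. Two things differ from the proposed code and are my own assumptions: the widths are fixed at two digits for the column name and three for the counter, as the question asks (the zfill(1+lm) and zfill(2+lm) widths above would grow to J001 and 123W0001 once the maximum reaches 10), and the names filled, result and filled_per_row are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [123, 234, 345, 399],
                   "Name": ["New York", "Boston", "Miami", "Dallas"],
                   "#Required": [5, 6, 2, 7]})

m = int(df["#Required"].max())  # number of J columns to create

# One entry per new column; cells beyond a row's '#Required' stay NaN
filled = {
    f"J{n:02d}": [f"{i}W{n:03d}" if n <= r else np.nan
                  for i, r in zip(df["ID"], df["#Required"])]
    for n in range(1, m + 1)
}
result = df.join(pd.DataFrame(filled, index=df.index))

# Because the gaps are real missing values, pandas can count them directly
filled_per_row = result.filter(regex=r"^J\d+$").notna().sum(axis=1)
print(result)
print(filled_per_row.tolist())
```

Keeping NaN until the very end leaves the choice of placeholder to a single fillna call at display or export time.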
