在pandas数据框中添加多个列和数值。

huangapple go评论106阅读模式
英文:

Add multiple columns and values to a pandas dataframe

问题

ID Name #Required J01 J02 J03 J04 J05 J06 J07
123 纽约 5 123W001 123W002 123W003 123W004 123W005 "blank" "blank"
234 波士顿 6 234W001 234W002 234W003 234W005 234W005 234W006 "blank"
345 迈阿密 2 345W001 345W002 "blank" "blank" "blank" "blank" "blank"
399 达拉斯 7 399W001 399W002 399W003 399W004 399W005 399W006 399W007

英文:

I will ask my question with a small dataframe, but the real one is 1000s of lines.

  1. ID Name #Required
  2. 123 New York 5
  3. 234 Boston 6
  4. 345 Miami 2
  5. 345 Dallas 7

I need the to the number of columns equals to the max in the '#Required' column and fill them in like this:

  1. ID Name #Required J01 J02 J03 J04 J05 J06 J07
  2. 123 New York 5 123W001 123W002 123W003 123W004 123W005 "blank" "blank"
  3. 234 Boston 6 234W001 234W002 234W003 234W005 234W005 234W006 "blank"
  4. 345 Miami 2 345W001 345W002 "blank" "blank" "blank" "blank" "blank"
  5. 399 Dallas 7 399W001 399W002 399W003 399W004 399W005 399W006 399W007

The number of values for each row depends on the number in the "#Required" column. with a "W" after "ID" and also I need to know how to handle the values if the number in "#Required" is greater than 10, because then the entry should be 123W010, only 1 leading 0.

  1. ID Name #Required J01 J02 J03 J04 J05 J06 J07
  2. 123 New York 5 123W001 123W002 123W003 123W004 123W005 "blank" "blank"
  3. 234 Boston 6 234W001 234W002 234W003 234W005 234W005 234W006 "blank"
  4. 345 Miami 2 345W001 345W002 "blank" "blank" "blank" "blank" "blank"
  5. 399 Dallas 7 399W001 399W002 399W003 399W004 399W005 399W006 399W007

答案1

得分: 2

以下是代码的翻译部分:

  1. # Your dummy dataset
  2. df = pd.DataFrame({'ID': [123, 234, 345, 399],
  3. 'Name': ['纽约', '波士顿', '迈阿密', '达拉斯'],
  4. '#Required': [5, 6, 2, 7]})
  5. # 创建每行所需值的列表
  6. new_lists = df.apply(lambda x: [str(x['ID']) + 'W' + format(i, '03d') for i in range(1, x['#Required'] + 1)], 1)
  7. # 将列表转换为具有NaN值的单独列
  8. new_cols = new_lists.apply(pd.Series)
  9. # 创建列名
  10. col_names = ['J' + format(i, '02d') for i in range(1, len(new_cols.columns) + 1)]
  11. # 将新列添加到原始数据框
  12. df[col_names] = new_cols
  13. # 用“blank”替换NaN值
  14. df = df.fillna("blank")
  15. df

希望这有所帮助。如果您需要任何进一步的翻译或解释,请告诉我。

英文:

Interesting problem. Here is a pandas way of doing this. Explanation in the second section -

  1. # Your dummy dataset
  2. df = pd.DataFrame({'ID': [123, 234, 345, 399],
  3. 'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
  4. '#Required': [5, 6, 2, 7]})
  5. # Creating list of the required values per row
  6. new_lists = df.apply(lambda x: [str(x['ID'])+'W'+format(i, '03d') for i in range(1,x['#Required']+1)], 1)
  7. # Converting the list to individual columns with nan values
  8. new_cols = new_lists.apply(pd.Series)
  9. # Creating column names
  10. col_names = ['J'+format(i, '02d') for i in range(1,len(new_cols.columns)+1)]
  11. # Adding new columns to original dataframe
  12. df[col_names] = new_cols
  13. # Replacing nan values via "blank"
  14. df = df.fillna('"blank"')
  15. df

在pandas数据框中添加多个列和数值。


Explanation

  1. The format(x, '03d') gives you are 3 length string formed with a digit, so for 1 -> 001 and for 10 -> 010
  2. You can form the required strings for each row using str(x['ID'])+'W'+format(i, '03d') where i is the range of digits from 1 to the value in the #Required column.
  3. The [str(x['ID'])+'W'+format(i, '03d') for i in range(1,x['#Required']+1)] is a list comprehension that gives you the result as a list. So, if you print the new_lists, it will look like this -
  1. print(new_lists)
  2. 0 [123W001, 123W002, 123W003, 123W004, 123W005]
  3. 1 [234W001, 234W002, 234W003, 234W004, 234W005, ...
  4. 2 [345W001, 345W002]
  5. 3 [399W001, 399W002, 399W003, 399W004, 399W005, ...
  6. dtype: object
  1. Next, with the .apply(pd.Series) you get the lists, expanded to individual columns, and the smaller length lists fill up the rest of the columns with Nan values. Printing the new_cols at this stage results in this -
  1. print(new_cols)
  2. 0 1 2 3 4 5 6
  3. 0 123W001 123W002 123W003 123W004 123W005 NaN NaN
  4. 1 234W001 234W002 234W003 234W004 234W005 234W006 NaN
  5. 2 345W001 345W002 NaN NaN NaN NaN NaN
  6. 3 399W001 399W002 399W003 399W004 399W005 399W006 399W007
  1. Now to create the column names, we again use just a simple list comprehension, with the range of the number of columns in the new_cols and leverage the previously used format(x, '02d') but this time for 2 digits.
  1. print(col_names)
  2. ['J01', 'J02', 'J03', 'J04', 'J05', 'J06', 'J07']
  1. Finally you add the new_cols as new columns to the original dataframe df by using df[col_names] = new_cols

  2. And, last but not the least, you replace the nan values with "blank" as your question show, using a simple df.fillna('"blank"')


Bonus

Here is how the code works if you have double digit integers in #Required column, such as 10 or 12

  1. # Sample dataframe with 12 and 10 values in #Required
  2. df = df = pd.DataFrame({'ID': [123, 234, 345, 345],
  3. 'Name': ['New', 'Boston', 'Miami', 'Dallas'],
  4. '#Required': [5, 10, 2, 12]})
  5. new_lists = df.apply(lambda x: [str(x['ID'])+'W'+format(i, '03d') for i in range(1,x['#Required']+1)], 1)
  6. new_cols = new_lists.apply(pd.Series)
  7. col_names = ['J'+format(i, '02d') for i in range(1,len(new_cols.columns)+1)]
  8. df[col_names] = new_cols
  9. df = df.fillna('"blank"')
  10. df

在pandas数据框中添加多个列和数值。

Notice that the column names went from J01 to J12 and the values for the new columns for the rows with #Required values like 10 or 12 look like 234W010 or 399W012.


EDIT

For 1-2 digit ids you can modify the code with the same format logic as before.

  1. # Sample dataframe 2 digit IDs
  2. df = df = pd.DataFrame({'ID': [123, 34, 345, 99],
  3. 'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
  4. '#Required': [5, 3, 2, 4]})
  5. new_lists = df.apply(lambda x: [format(x['ID'], '03d')+'W'+format(i, '03d') for i in range(1,x['#Required']+1)], 1)
  6. new_cols = new_lists.apply(pd.Series)
  7. col_names = ['J'+format(i, '02d') for i in range(1,len(new_cols.columns)+1)]
  8. df[col_names] = new_cols
  9. df = df.fillna('"blank"')
  10. df

在pandas数据框中添加多个列和数值。

答案2

得分: 0

  1. import pandas as pd
  2. # Your dummy dataset
  3. df = pd.DataFrame({'ID': [123, 234, 345, 399],
  4. 'Name': ['New York', 'Boston', 'Miami', 'Dallas'],
  5. '#Required': [5, 6, 2, 7]})
  6. m = max(df['#Required']) # Maximum number in '#Required'
  7. lm = len(str(m)) # length of the maximum number in '#Required'
  8. how_to_fill = {'J%s' % str(n).zfill(1 + lm): ['%sW%s' % (i, str(n).zfill(2 + lm)) if n <= r else 'Blank' for i, r in zip(df['ID'], df['#Required'])] for n in range(1, m + 1)}
  9. df = df.join(pd.DataFrame(how_to_fill)).set_index('ID')
  10. print(df)

Result :

  1. Name Required J01 J02 J03 J04 J05 J06 J07
  2. ID
  3. 123 New York 5 123W001 123W002 123W003 123W004 123W005 Blank Blank
  4. 234 Boston 6 234W001 234W002 234W003 234W004 234W005 234W006 Blank
  5. 345 Miami 2 345W001 345W002 Blank Blank Blank Blank Blank
  6. 399 Dallas 7 399W001 399W002 399W003 399W004 399W005 399W006 399W007

Note :

建议使用import numpy as np代替"Blank",并使用np.nan

英文:

Proposed code

  1. import pandas as pd
  2. # Your dummy dataset
  3. df = pd.DataFrame({&#39;ID&#39;: [123, 234, 345, 399],
  4. &#39;Name&#39;: [&#39;New York&#39;, &#39;Boston&#39;, &#39;Miami&#39;, &#39;Dallas&#39;],
  5. &#39;#Required&#39;: [5, 6, 2, 7]})
  6. m = max(df[&#39;#Required&#39;]) # Maximum number in &#39;#Required&#39;
  7. lm = len(str(m)) # length of the maximum number in &#39;#Required&#39;
  8. how_to_fill = {&#39;J%s&#39;%str(n).zfill(1+lm):[&#39;%sW%s&#39;%(i,str(n).zfill(2+lm)) if n &lt;= r else &#39;Blank&#39; for i,r in zip(df[&#39;ID&#39;], df[&#39;#Required&#39;])] for n in range(1, m+1)}
  9. df = df.join(pd.DataFrame(how_to_fill)).set_index(&#39;ID&#39;)
  10. print(df)

Result :

  1. Name Required J01 J02 ... J04 J05 J06 J07
  2. ID ...
  3. 123 New York 5 123W001 123W002 ... 123W004 123W005 Blank Blank
  4. 234 Boston 6 234W001 234W002 ... 234W004 234W005 234W006 Blank
  5. 345 Miami 2 345W001 345W002 ... Blank Blank Blank Blank
  6. 399 Dallas 7 399W001 399W002 ... 399W004 399W005 399W006 399W007

Note :

I suggest you the use of np.nan with import numpy as np instead of "Blank"

huangapple
  • 本文由 发表于 2023年1月9日 04:47:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/75051169.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定