Multiline string containing numbers
Question
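You can try using apply with lambda expressions to process all of the 'price' columns, without creating new variables for each column. Here is an example:
import pandas as pd

# Define a function that processes one 'price' column
def process_price_column(column):
    # Use apply and a lambda to split each cell on the newline
    parts = column.apply(lambda x: str(x).split("\n"))
    # Strip the '****' padding from each part and sum the parts of each row;
    # single-line cells give a one-element list, so they pass through unchanged
    return parts.apply(lambda x: sum(int(p.replace("****", "")) for p in x))

# Apply the function to every column whose name starts with 'price'
price_cols = [col for col in df1.columns if col.startswith('price')]
df1[price_cols] = df1[price_cols].apply(process_price_column)

# Print the processed DataFrame
print(df1)
This overwrites every 'price' column in place, without extra variables or helper columns, so it keeps working as more 'price' columns are added.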
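If the multi-line columns do not share a name prefix, one hypothetical way to find them automatically is to check each column's contents for a newline. A minimal sketch, using the data1 frame from the question; the name multiline_cols is illustrative only:

```python
import pandas as pd

data1 = [{'price2022': "12014\n205****", 'company': "toyota", 'price2023': "10014\n180****"},
         {'price2022': "22018", 'company': "apple", 'price2023': "22018"},
         {'price2022': "32020", 'company': "general electric", 'price2023': "31020"},
         {'price2022': "80170", 'company': "alibaba", 'price2023': "83170"}]
df1 = pd.DataFrame(data1)

# Flag a column if any of its cells, rendered as text, contains a newline
multiline_cols = [
    col for col in df1.columns
    if df1[col].astype(str).str.contains("\n", regex=False).any()
]
print(multiline_cols)  # ['price2022', 'price2023']
```

Columns holding only single-line numbers are not flagged, so they would stay as strings; if those also need converting to integers, selecting by the 'price' prefix and parsing every cell avoids that gap.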
I have a dataframe that looks like:
import pandas as pd

data1 = [{'price2022': "12014\n205****", 'company': "toyota",'price2023': "10014\n180****"},
{'price2022': "22018", 'company': "apple",'price2023': "22018"},
{'price2022': "32020", 'company': "general electric",'price2023': "31020"},
{'price2022': "80170", 'company': "alibaba",'price2023': "83170"}
]
df1 = pd.DataFrame(data1)
The first value is a multiline string, which also contains the redundant string '****'. Instead of the multiline string "12014\n205", I would like to have a single number that is the sum of the two lines (12014 + 205 = 12219).
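For a single cell, the conversion described above amounts to dropping the '*' padding, splitting on the newline, and summing the pieces. A minimal sketch; parse_price is a hypothetical helper name, and it assumes '*' only ever appears as padding:

```python
def parse_price(cell: str) -> int:
    """Sum the numbers on each line of a price cell, ignoring '*' padding."""
    # Remove every '*' character, then split the cell into its lines
    lines = str(cell).replace("*", "").split("\n")
    # Convert each non-empty line to int and add them up
    return sum(int(line) for line in lines if line.strip())


print(parse_price("12014\n205****"))  # 12219
print(parse_price("22018"))           # 22018
```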
I could try something like this:
dfa = df1["price2022"].apply(lambda x: pd.Series(str(x).split("\n")))
dfa.columns = ['b', 'c']
# single-line rows have NaN in 'c', and str(NaN)[:-4] is '', hence the replace below
dfa['c'] = dfa['c'].map(lambda x: str(x)[:-4])  # gets rid of the ****, probably not the smartest method
dfa['b'] = dfa['b'].astype('int')
dfa['c'] = dfa['c'].replace('', 0)
dfa['c'] = dfa['c'].astype('int')
dfa['d'] = dfa['b'] + dfa['c']
However, this seems incredibly inefficient. Not to mention that I have several 'price' columns I need to run through. Creating new variables for each seems like a bad way to deal with this.
Is there a more efficient way to do this without creating multiple new columns? How would I extend this such that I don't have to go have a look which columns have these multi lines and which don't, but the code just runs through all?
Answer 1
Score: 1
You can loop over your columns and apply a function that splits and sums the prices:
import pandas as pd
data1 = [{'price2022': "12014\n205****", 'company': "toyota",'price2023': "10014\n180****"},
{'price2022': "22018", 'company': "apple",'price2023': "22018"},
{'price2022': "32020", 'company': "general electric",'price2023': "31020"},
{'price2022': "80170", 'company': "alibaba",'price2023': "83170"}
]
df1 = pd.DataFrame(data1)
for col in df1.columns:
    if col.startswith('price'):
        # strip('*') removes the trailing '****'; then split on the newline and sum the parts
        df1[f'{col}_sum'] = df1[col].apply(lambda x: sum(map(int, x.strip('*').split('\n'))))
print(df1)
Output:
        price2022           company       price2023  price2022_sum  price2023_sum
0  12014\n205****            toyota  10014\n180****          12219          10194
1           22018             apple           22018          22018          22018
2           32020  general electric           31020          32020          31020
3           80170           alibaba           83170          80170          83170
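The question also asks to avoid creating extra columns. A sketch of a variant that overwrites each 'price' column in place, using pandas' .str string methods instead of a per-cell lambda; like the loop above, it assumes the relevant columns start with 'price' and that '*' appears only as padding:

```python
import pandas as pd

data1 = [{'price2022': "12014\n205****", 'company': "toyota", 'price2023': "10014\n180****"},
         {'price2022': "22018", 'company': "apple", 'price2023': "22018"},
         {'price2022': "32020", 'company': "general electric", 'price2023': "31020"},
         {'price2022': "80170", 'company': "alibaba", 'price2023': "83170"}]
df1 = pd.DataFrame(data1)

for col in df1.columns:
    if col.startswith('price'):
        df1[col] = (
            df1[col]
            .astype(str)
            .str.replace("*", "", regex=False)  # drop the '****' padding
            .str.split("\n", expand=True)       # one column per line; missing lines are filled with missing values
            .apply(pd.to_numeric)               # each piece becomes a number, missing values become NaN
            .sum(axis=1)                        # add up the lines of each row (NaN is skipped)
            .astype(int)
        )

print(df1)
```

Cells without a second line just produce NaN in the second piece, which sum skips, so single-line and multi-line columns go through the same code path.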
Answer 2
Score: 0
You can try:
```python
df1["price2022"] = df1["price2022"].str.replace("\n", "+").apply(lambda row: eval(row.replace("****", "")))
# inspect the result
df1["price2022"].head()
```
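eval executes arbitrary Python, so if the data is not fully trusted, parsing the numbers explicitly is safer. A sketch that produces the same sums without eval, on a reduced frame holding only the price2022 column from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"price2022": ["12014\n205****", "22018", "32020", "80170"]})

# Same sums as the eval() version, but the numbers are parsed explicitly
df1["price2022"] = (
    df1["price2022"]
    .str.replace("****", "", regex=False)  # remove the padding
    .str.split("\n")                       # list of number strings per cell
    .apply(lambda parts: sum(int(p) for p in parts))
)
print(df1["price2022"].tolist())  # [12219, 22018, 32020, 80170]
```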