Multiline string containing numbers

Question

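You can try using apply with a lambda expression to process all of the 'price' columns, without creating a new variable for each column. Here is an example:

import pandas as pd

# Function that processes a single 'price' column
def process_price_column(column):
    # Remove the asterisks and split each value on the newline
    parts = column.apply(lambda x: str(x).replace("****", "").split("\n"))

    # Sum the integer parts of each row; single-line values give a one-element list
    return parts.apply(lambda x: sum(int(p) for p in x))

# Apply the function to the listed 'price' columns of the DataFrame from the question
df1[['price2022', 'price2023']] = df1[['price2022', 'price2023']].apply(process_price_column)

# Print the processed DataFrame
print(df1)

This handles the listed 'price' columns in a single call and overwrites them in place, so no intermediate columns are needed.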

I have a dataframe that looks like:

import pandas as pd

data1 = [
    {'price2022': "12014\n205****", 'company': "toyota", 'price2023': "10014\n180****"},
    {'price2022': "22018", 'company': "apple", 'price2023': "22018"},
    {'price2022': "32020", 'company': "general electric", 'price2023': "31020"},
    {'price2022': "80170", 'company': "alibaba", 'price2023': "83170"},
]
df1 = pd.DataFrame(data1)

The first value is a multiline string, which also contains the redundant string "****". Instead of the multiline string "12014\n205****", I would like to have a single number that is the sum of the two lines (12014 + 205 = 12219).
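To make the target concrete, here is a minimal sketch of that arithmetic for a single value. The helper name `sum_lines` is hypothetical and not part of the original post, and it assumes each value contains only digits, newlines and asterisks.

```python
def sum_lines(value: str) -> int:
    """Sum the integers found on each line of value, ignoring asterisks."""
    cleaned = value.replace("*", "")
    return sum(int(part) for part in cleaned.split("\n"))


print(sum_lines("12014\n205****"))  # 12219
print(sum_lines("22018"))           # 22018
```

Single-line values pass through unchanged because splitting them yields a one-element list.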

I could try something like this:

dfa = pd.DataFrame(index=df1.index)   # dfa is not defined in the post; an empty frame is assumed here
dfa[['b', 'c']] = df1["price2022"].apply(lambda x: pd.Series(str(x).split("\n")))
dfa['c'] = dfa['c'].map(lambda x: str(x)[:-4])   # gets rid of the ****, probably not the smartest method
dfa['b'] = dfa['b'].astype('int')
dfa['c'] = dfa['c'].replace('', 0)   # rows without a second line end up as '' here
dfa['c'] = dfa['c'].astype('int')
dfa['d'] = dfa['b'] + dfa['c']

However, this seems incredibly inefficient. Not to mention that I have several 'price' columns I need to run through. Creating new variables for each seems like a bad way to deal with this.
Is there a more efficient way to do this without creating multiple new columns? How would I extend this so that I don't have to check which columns contain these multiline values and which don't, and the code simply runs through all of them?

Answer 1

Score: 1

You can loop over your columns and apply a function that will split/sum the prices:

import pandas as pd

data1 = [{'price2022': "12014\n205****", 'company': "toyota",'price2023': "10014\n180****"}, 
    {'price2022': "22018", 'company': "apple",'price2023': "22018"}, 
    {'price2022': "32020", 'company': "general electric",'price2023': "31020"}, 
    {'price2022': "80170", 'company': "alibaba",'price2023': "83170"}
   ]
df1 = pd.DataFrame(data1)

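# Process every column whose name starts with 'price'
for col in df1.columns:
    if col.startswith('price'):
        # Strip the asterisks, split on the newline, and sum the integer parts
        df1[f'{col}_sum'] = df1[col].apply(lambda x: sum(map(int, x.strip('*').split('\n'))))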

print(df1)

Output:

        price2022           company       price2023  price2022_sum  price2023_sum
0  12014\n205****            toyota  10014\n180****          12219          10194
1           22018             apple           22018          22018          22018
2           32020  general electric           31020          32020          31020
3           80170           alibaba           83170          80170          83170
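
Since the question asks to avoid extra columns, a variant of the loop above could overwrite each price column in place using pandas string methods. This is a minimal sketch rather than the answerer's code. It assumes every price value is a string made only of digits, newlines and asterisks, and the name `price_cols` is introduced here for illustration.

```python
import pandas as pd

data1 = [
    {"price2022": "12014\n205****", "company": "toyota", "price2023": "10014\n180****"},
    {"price2022": "22018", "company": "apple", "price2023": "22018"},
    {"price2022": "32020", "company": "general electric", "price2023": "31020"},
    {"price2022": "80170", "company": "alibaba", "price2023": "83170"},
]
df1 = pd.DataFrame(data1)

# Select the price columns by name prefix, as in the loop above
price_cols = [col for col in df1.columns if col.startswith("price")]

for col in price_cols:
    # Remove the asterisks, then split each value into one column per line;
    # single-line values get a missing value in the second column
    parts = df1[col].str.replace("*", "", regex=False).str.split("\n", expand=True)
    # Convert every part to a number and add them row by row (missing parts are skipped)
    df1[col] = parts.apply(pd.to_numeric).sum(axis=1).astype(int)

print(df1)
```

Because the split produces as many columns as the longest value has lines, the same loop should also cope with columns that contain no multiline values at all.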

Answer 2

Score: 0

You can try:

```python
df["price2022"] = df["price2022"].str.replace("\n", "+").apply(lambda row: eval(row.replace("****", "")))
```

It gives:

```python
df["price2022"].head()
```

[Image: output of the call above]
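
The one-liner above works by turning each value into an arithmetic expression and evaluating it. Because `eval` runs whatever Python expression the string contains, a more cautious variant would sum the terms without it. Here is a minimal sketch on a single value, with variable names chosen for illustration only:

```python
raw = "12014\n205****"

# Same two string replacements as in the answer above
expression = raw.replace("\n", "+").replace("****", "")
print(expression)  # 12014+205

# Sum the terms directly instead of passing the string to eval
total = sum(int(term) for term in expression.split("+"))
print(total)  # 12219
```

The same summation could presumably be used inside the `apply` call in place of `eval`, since a value without a newline simply produces a single term.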

huangapple
  • Published on June 12, 2023 at 15:58:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76454605.html