June 12, 2023 15:58:35

English:

Multiline string containing numbers

Question

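You can try using apply with a lambda to process all of the 'price' columns without creating new variables for each one. Here is an example, which assumes df1 is the DataFrame defined in the question:

import pandas as pd

# Define a function that processes one 'price' column
def process_price_column(column):
    # Drop the trailing "****" marker, then split each value on newlines
    parts = column.apply(lambda x: str(x).replace("****", "").split("\n"))

    # Sum the numeric lines of each value; a single-line value yields just that number
    return parts.apply(lambda x: sum(int(p) for p in x if p))

# Apply the function to the 'price' columns of the DataFrame
df1[['price2022', 'price2023']] = df1[['price2022', 'price2023']].apply(process_price_column)

# Print the processed DataFrame
print(df1)

This handles the listed 'price' columns in place, without creating extra helper columns. To cover more columns, add their names to the list.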

English:

I have a dataframe that looks like:

data1 = [{'price2022': "12014\n205****", 'company': "toyota", 'price2023': "10014\n180****"},
    {'price2022': "22018", 'company': "apple", 'price2023': "22018"},
    {'price2022': "32020", 'company': "general electric", 'price2023': "31020"},
    {'price2022': "80170", 'company': "alibaba", 'price2023': "83170"}
]
df1 = pd.DataFrame(data1)

The first value is a multiline string, which also contains the redundant string '****'. Instead of the multiline string "12014\n205****", I would like to have a single number that is the sum of the two lines (12014 + 205 = 12219).

I could try something like this:

dfa[['b', 'c']] = df1["price2022"].apply(lambda x: pd.Series(str(x).split("\n")))
dfa['c'] = dfa['c'].map(lambda x: str(x)[:-4])   # gets rid of the ****, probably not the smartest method
dfa['b'] = dfa['b'].astype('int')
dfa['c'].replace('', 0, inplace=True)
dfa['c'] = dfa['c'].astype('int')
dfa['d'] = dfa['b'] + dfa['c']

However, this seems incredibly inefficient, and I have several 'price' columns I need to run through. Creating new variables for each one seems like a bad way to deal with this.
Is there a more efficient way to do this without creating multiple new columns? How would I extend this so that I don't have to check which columns contain these multiline values and which don't, and the code simply runs through all of them?

Answer 1

Score: 1

You can loop over your columns and apply a function that splits each value and sums the prices:

import pandas as pd

data1 = [{'price2022': "12014\n205****", 'company': "toyota",'price2023': "10014\n180****"}, 
    {'price2022': "22018", 'company': "apple",'price2023': "22018"}, 
    {'price2022': "32020", 'company': "general electric",'price2023': "31020"}, 
    {'price2022': "80170", 'company': "alibaba",'price2023': "83170"}
]
df1 = pd.DataFrame(data1)

for col in df1.columns:
    if col.startswith('price'):
        df1[f'{col}_sum'] = df1[col].apply(lambda x: sum(map(int, x.strip('****').split('\n'))))

print(df1)

Output:

        price2022           company       price2023  price2022_sum  price2023_sum
0  12014\n205****            toyota  10014\n180****          12219          10194
1           22018             apple           22018          22018          22018
2           32020  general electric           31020          32020          31020
3           80170           alibaba           83170          80170          83170
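The question also asks how to avoid creating extra columns. One way to adapt the loop above is to assign the result back to the original column instead of a new `_sum` column. This is a minimal sketch, not part of the original answer; `sum_lines` is a hypothetical helper name, and it assumes every cell holds only digits, newlines and `*` padding.

```python
import pandas as pd

data1 = [
    {'price2022': "12014\n205****", 'company': "toyota", 'price2023': "10014\n180****"},
    {'price2022': "22018", 'company': "apple", 'price2023': "22018"},
    {'price2022': "32020", 'company': "general electric", 'price2023': "31020"},
    {'price2022': "80170", 'company': "alibaba", 'price2023': "83170"},
]
df1 = pd.DataFrame(data1)


def sum_lines(value):
    """Sum the integers found on each line of a cell (hypothetical helper)."""
    # Remove the '*' padding, then split the cell into its individual lines.
    lines = str(value).replace("*", "").split("\n")
    # Skip empty lines so a trailing newline does not break int().
    return sum(int(line) for line in lines if line.strip())


# Overwrite each 'price' column in place; single-line cells simply become ints.
for col in df1.columns:
    if col.startswith("price"):
        df1[col] = df1[col].map(sum_lines)

print(df1)
```

Because the columns are selected by the `price` prefix, there is no need to know in advance which of them contain multiline values.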


Answer 2

Score: 0

You can try:

```python
df["price2022"] = df["price2022"].str.replace("\n", "+").apply(lambda row: eval(row.replace("****", "")))
```

It gives:

df["price2022"].head()
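Since `eval` executes whatever text a cell contains, it is best kept to trusted data. The same sum can be sketched without `eval` using pandas string methods. This is an alternative technique, not the answer's own method: it pulls every run of digits out of each cell with `Series.str.extractall` and adds the matches up per row. It assumes the frame is named `df1` as in the question, and that every cell contains at least one digit (a cell with none would end up as NaN).

```python
import pandas as pd

data1 = [
    {'price2022': "12014\n205****", 'company': "toyota", 'price2023': "10014\n180****"},
    {'price2022': "22018", 'company': "apple", 'price2023': "22018"},
    {'price2022': "32020", 'company': "general electric", 'price2023': "31020"},
    {'price2022': "80170", 'company': "alibaba", 'price2023': "83170"},
]
df1 = pd.DataFrame(data1)

for col in df1.columns:
    if col.startswith("price"):
        # One row per digit run, indexed by (original row, match number).
        matches = df1[col].str.extractall(r"(\d+)")[0]
        # Convert the matches to integers and add them up per original row.
        df1[col] = matches.astype(int).groupby(level=0).sum()

print(df1)
```

This processes every column whose name starts with `price`, so single-line and multiline columns are handled by the same code path.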
