Converted columns from str to float, but subtracting two columns raises a TypeError

Question

I imported data from a CSV file into a pandas DataFrame. I stripped the non-numeric part of each string, the "$" in front of every value, and then converted the columns to the float data type. Running print(df.dtypes) after the conversion shows every column as float64. After the print statement I try to subtract one column from another, but I get this error:

line 23, in <module>
    Price_Diff = df["HTB_Price" - "McMaster_Price"]
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Here is my code:

import pandas as pd
import matplotlib.pyplot as mp
import numpy as np

# Reads the csv and creates a dataframe titled "df"
df = pd.read_csv('Example Price Dataset.csv', sep='\s*,\s*', engine='python')

# Removes the "$" from all columns using a left strip
df['HTB_Price'] = df['HTB_Price'].map(lambda x: x.lstrip('$'))
df['McMaster_Price'] = df['McMaster_Price'].map(lambda x: x.lstrip('$'))
df['Motion_Price'] = df['Motion_Price'].map(lambda x: x.lstrip('$'))
df['MRO_Price'] = df['MRO_Price'].map(lambda x: x.lstrip('$'))

# Converts each column to a float datatype instead of a string
df["HTB_Price"] = df["HTB_Price"].astype(float)
df["McMaster_Price"] = df["McMaster_Price"].astype(float)
df["Motion_Price"] = df["Motion_Price"].astype(float)
df["MRO_Price"] = df["MRO_Price"].astype(float)
print(df.dtypes)


# Computes the price difference between two columns (the line that raises the error)
Price_Diff = df["HTB_Price" - "McMaster_Price"]


# Prints the dataframe
# print(df.dtypes)

The error is on the Price_Diff line, and I'm not sure why it complains about subtracting strings from each other when, right before that line, I check the data types and both columns are reported as floats.

I expect the values in one column to be subtracted from the other and the result stored in the variable Price_Diff.

Answer 1

Score: 2

You can indeed write Price_Diff = df["HTB_Price"] - df["McMaster_Price"], but pandas also offers a string-based interface:

df.eval("HTB_Price - McMaster_Price")

A similar interface exists for filtering:

df.query("HTB_Price < McMaster_Price")

You can even add columns to the original DataFrame in place:

>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
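
Applied to the column names from the question, the string-based interface could look like the sketch below. The DataFrame contents are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical prices; only the column names come from the question
df = pd.DataFrame({
    "HTB_Price": [10.5, 20.0, 7.25],
    "McMaster_Price": [9.5, 22.0, 7.25],
})

# eval() parses the expression string and returns the result as a Series
price_diff = df.eval("HTB_Price - McMaster_Price")

# query() keeps only the rows where the condition holds
cheaper_at_htb = df.query("HTB_Price < McMaster_Price")

# An assignment inside eval() returns a copy with the new column added
df_with_diff = df.eval("Price_Diff = HTB_Price - McMaster_Price")

print(price_diff.tolist())
print(cheaper_at_htb.index.tolist())
print(df_with_diff.columns.tolist())
```

Without inplace=True the original df is left untouched, which is usually the safer choice while experimenting.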

Answer 2

Score: 1

What you want to write instead is:

Price_Diff = df["HTB_Price"] - df["McMaster_Price"]

The expression inside the brackets is evaluated first, as ordinary Python, and only its result is used to index the dataframe. So here Python is telling you that it cannot subtract the string "McMaster_Price" from the string "HTB_Price"; the float columns are never involved.
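
To see why the message mentions strings even though the columns are float64, here is a minimal sketch that uses a small made-up DataFrame in place of the CSV (the values are hypothetical). The failure comes from plain Python string subtraction, before pandas is involved at all:

```python
import pandas as pd

# Hypothetical stand-in for the CSV data after the "$" has been stripped
df = pd.DataFrame({
    "HTB_Price": ["10.50", "20.00", "7.25"],
    "McMaster_Price": ["9.50", "22.00", "7.25"],
}).astype(float)

# Python evaluates the subtraction of the two string literals first,
# so the TypeError is raised before df[...] is ever called.
try:
    df["HTB_Price" - "McMaster_Price"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Index each column separately, then subtract the resulting Series.
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
print(Price_Diff.tolist())
```

The dtypes printed earlier are correct; the error simply never reaches the columns.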

huangapple
  • Posted on March 9, 2023 at 21:41:42
  • Please keep this link when reposting: https://go.coder-hub.com/75685399.html