2023-05-22 03:40:51

Pandas issue pulling a number from a string input from a CSV file

Question

I have a csv file with the contents:

StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50

I read it in with:

import pandas as pd

df = pd.read_csv('Strains_SI.csv', index_col=None, header=None)

Then this:

StrainMax = df.iat[0,0]
StrainMin = df.iat[1,0]
cycles_sec = df.iat[2,0]
time_percent = df.iat[3,0]
print(StrainMax)
print(StrainMin)
print(cycles_sec)
print(time_percent)

And what I get is this (I believe these are strings):

StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50

But what I want is this:

0.125 (I want a float here)
-0.125 (I want a float here)
10 (I want an integer here)
50 (I want an integer here)

Is this possible?

I have tried numerous pandas commands without success. I expect to get only the numbers, as floats and integers.

Answer 1

Score: 1

I notice there are no commas in your CSV file. If you check the documentation for read_csv, you'll see that it expects a comma separator by default. That is why each whole line ends up in a single string cell. If your file is not actually comma-separated, specifying the real separator saves effort later.

I saved your data to data.csv (data.txt would have worked just as well):

StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50

Here we can read in the file using a space as the separator:

import pandas as pd

df = pd.read_csv('data.csv', sep=' ', header=None)

Note that I also set header=None so that the first row of the file is treated as data and not as column names. In this case, pandas assigns the default column labels 0 and 1.

You can see that it separates the label from the numerical value right away, so you don't have to split each string manually. An extra convenience is that pandas now sees an entirely numeric column: df.dtypes reports the column labeled 1 as float64. Because the whole column shares one type, 10 and 50 are read as 10.0 and 50.0. The four values are then accessible by df[column][row], as in df[1][0], df[1][1], df[1][2], df[1][3].
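To see what the separator changes, here is a minimal sketch that reads the same four lines twice, once with the default comma separator and once with a space. It assumes the data is held in an in-memory string: the raw variable and io.StringIO are stand-ins for the file on disk and are not part of the original code.

```python
import io

import pandas as pd

# Stand-in for the file on disk: the same four lines as in the question.
raw = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"

# Default comma separator: each whole line lands in a single string cell.
df_comma = pd.read_csv(io.StringIO(raw), header=None)

# Space separator: labels go to column 0, numbers to column 1.
df_space = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

print(df_comma.shape)      # one column only
print(df_space.shape)      # two columns
print(df_space.dtypes)     # column 1 is numeric
print(df_space[1][0])      # first value, already a float
```

If a file might separate fields with several spaces or tabs, sep=r"\s+" is a common alternative to a single space.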

To be a little slicker in your usage, we can go a bit further. Instead of just telling pandas to not use the first row as headers, we can instead tell it what each column is and even set the name of the measurement as the index for convenience:

import pandas as pd

df = pd.read_csv('data.csv', sep=' ', names=['measurement', 'value'])
df.set_index('measurement', inplace=True)

Now you can get at your data quite nicely with df.loc['StrainMax', 'value'] to get 0.125 (which is indeed a float).
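The question also asked for 10 and 50 as integers, while a single pandas column holds one type (float64 here). One possible way around this, sketched below, is to pull the values out into a plain dict and convert the whole-number entries to int. The as_number helper and the values dict are hypothetical names introduced for this illustration, and the in-memory raw string again stands in for the file.

```python
import io

import pandas as pd

# Stand-in for the file on disk: the same four lines as in the question.
raw = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"

df = pd.read_csv(io.StringIO(raw), sep=" ", names=["measurement", "value"])
df = df.set_index("measurement")


def as_number(x):
    """Hypothetical helper: return an int when x has no fractional part, else a float."""
    x = float(x)
    return int(x) if x.is_integer() else x


# Build a plain dict so each entry can carry its own Python type.
values = {name: as_number(v) for name, v in df["value"].items()}

print(values["StrainMax"], type(values["StrainMax"]))
print(values["cycles_sec"], type(values["cycles_sec"]))
```

This treats any whole-number value as an integer, so a measurement that happens to be exactly 1.0 would also come back as 1; whether that is acceptable depends on how the values are used afterwards.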

PS. If you ever find that pandas didn't set the data type of a column correctly, you can force the type. In this example, it would be

df['value'] = df['value'].astype(float)
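As a rough illustration of forcing the type, the sketch below starts from a column that was read as strings. The sample values, including the "n/a" entry, are made up for this example. astype(float) works when every entry parses cleanly; pd.to_numeric with errors="coerce" is an alternative that turns unparseable entries into NaN instead of raising.

```python
import pandas as pd

# Made-up column that was read as strings; "n/a" is a deliberately bad entry.
clean = pd.Series(["0.125", "-0.125", "10", "50"])
messy = pd.Series(["0.125", "-0.125", "10", "n/a"])

# Every entry parses, so a direct cast is enough.
clean_float = clean.astype(float)

# A direct cast of `messy` would raise ValueError on "n/a";
# coercion replaces the bad entry with NaN instead.
messy_float = pd.to_numeric(messy, errors="coerce")

print(clean_float.dtype)
print(messy_float)
```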
