Pandas issue pulling a number from a string input from a CSV file

Question

I have a csv file with the contents:

StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50

I read it in with:

import pandas as pd

df = pd.read_csv('Strains_SI.csv', index_col=None, header=None)

Then this:

StrainMax = df.iat[0,0]
StrainMin = df.iat[1,0]
cycles_sec = df.iat[2,0]
time_percent = df.iat[3,0]
print(StrainMax)
print(StrainMin)
print(cycles_sec)
print(time_percent)

And what I get is this (I believe these are strings):

StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50

But what I want is this:

0.125 (I want a float here)
-0.125 (I want a float here)
10 (I want an integer here)
50 (I want an integer here)

Is this possible?

I have tried numerous pandas commands and can't get it to work. I expect only the numbers, as floats and integers.

Answer 1

Score: 1

I notice there are no commas in your CSV file. If you check the documentation for the .read_csv method, you'll see that it expects a comma separator by default. With no commas present, each whole line ends up as a single string in column 0, which is why df.iat[0,0] gives you StrainMax 0.125. If your file isn't actually comma-separated, specifying the real separator saves effort later. Here we can read the file using a space as the separator:

I saved your data to data.csv (data.txt would have worked just as well):

StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50

import pandas as pd

df = pd.read_csv('data.csv', sep=' ', header=None)

Note that I also set header=None so that the first row of the file would be treated as data and not column names. In this case, pandas makes up column names 0 and 1.
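A minimal sketch of why the separator matters, assuming the four lines from the question. An io.StringIO buffer (DATA is an illustrative name, not from the original) stands in for data.csv so the example runs on its own:

```python
import io

import pandas as pd

# The file contents from the question; StringIO stands in for data.csv.
DATA = """StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50
"""

# Default sep=',': there are no commas, so each whole line lands in column 0 as one string.
df_default = pd.read_csv(io.StringIO(DATA), header=None)
print(df_default.shape)             # (4, 1)
print(repr(df_default.iat[0, 0]))   # 'StrainMax 0.125'

# sep=' ': the label and the number are split into columns 0 and 1.
df_space = pd.read_csv(io.StringIO(DATA), sep=' ', header=None)
print(df_space.shape)               # (4, 2)
print(list(df_space.columns))       # [0, 1]
```

The first read reproduces the behaviour from the question; the second is the fix described above.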

You can see that it separates each label from its numerical value right away, so you don't have to split every string manually. An added convenience is that pandas now sees an entirely numeric column. Checking df.dtypes confirms this:

The column labeled 1 is now of type float64. Your four float values are then accessible as df[column][row], that is: df[1][0], df[1][1], df[1][2] and df[1][3].
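A short sketch of the dtype check and the element access just described, again assuming the data sits in an io.StringIO buffer instead of data.csv. It also shows df.at[row, column], a standard pandas accessor that reads one cell in a single step rather than chaining two lookups; using it here is a suggestion, not part of the original answer:

```python
import io

import pandas as pd

# Stand-in for data.csv (illustrative assumption).
DATA = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"
df = pd.read_csv(io.StringIO(DATA), sep=' ', header=None)

# Column 0 holds the text labels; column 1 holds the numbers as float64.
print(df.dtypes)

# Chained access as in the text: df[column][row]
strain_max = df[1][0]
# Single-step access: df.at[row, column]
strain_min = df.at[1, 1]
print(strain_max, strain_min)  # 0.125 -0.125
```

Note that 10 and 50 come back as 10.0 and 50.0, because a column holds a single dtype and float64 is the one that fits all four values.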


To make this a little slicker to use, we can go a step further. Instead of only telling pandas not to use the first row as headers, we can tell it what each column is and even set the measurement name as the index for convenience:

import pandas as pd

df = pd.read_csv('data.csv', sep=' ', names=['measurement', 'value'])
df.set_index('measurement', inplace=True)

Now you can get at your data neatly with df.loc['StrainMax']['value'] (or, in a single step, df.loc['StrainMax', 'value']) to get 0.125, which is indeed a float.
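Since the question asks for floats for the strains and integers for cycles_sec and time_percent, here is a minimal sketch that builds the indexed DataFrame described above (from an io.StringIO buffer standing in for data.csv) and then converts each value to a plain Python type. The int() calls are an assumption about how you want the whole-number settings handled: int() truncates, so it is only appropriate when the stored values really are whole numbers, as they are here.

```python
import io

import pandas as pd

# Stand-in for data.csv (illustrative assumption).
DATA = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"

# names= labels the columns (and implies header=None); measurement becomes the index.
df = pd.read_csv(io.StringIO(DATA), sep=' ', names=['measurement', 'value'])
df = df.set_index('measurement')

# Single-step label lookups on the float64 value column.
StrainMax = float(df.loc['StrainMax', 'value'])
StrainMin = float(df.loc['StrainMin', 'value'])
# The whole-number settings were parsed as 10.0 and 50.0; int() turns them into ints.
cycles_sec = int(df.loc['cycles_sec', 'value'])
time_percent = int(df.loc['time_percent', 'value'])

print(StrainMax, StrainMin, cycles_sec, time_percent)  # 0.125 -0.125 10 50
```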

PS: If you ever find that pandas did not set a column's data type correctly, you can force the type. In this example, it would be:

df['value'] = df['value'].astype(float)
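To illustrate the forced conversion, a small sketch that deliberately reads the value column as text (dtype=str is passed only to simulate a column pandas did not parse as numbers; the StringIO buffer again stands in for data.csv):

```python
import io

import pandas as pd

# Stand-in for data.csv (illustrative assumption).
DATA = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"

# dtype=str simulates a column that pandas read as text instead of numbers.
df = pd.read_csv(io.StringIO(DATA), sep=' ', names=['measurement', 'value'], dtype=str)
df = df.set_index('measurement')

# Force the column to float, as in the answer.
df['value'] = df['value'].astype(float)
print(df['value'].dtype)  # float64
```

astype(float) raises an error if any entry cannot be parsed as a number; pd.to_numeric(df['value'], errors='coerce') is an alternative that turns such entries into NaN instead.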

huangapple
  • Published on May 22, 2023 at 03:40:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76301629.html