Pandas issue pulling a number from a string input from a CSV file
Question
I have a csv file with the contents:
StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50
I read it in with:
df = pd.read_csv('Strains_SI.csv', index_col=None, header=None)
Then this:
StrainMax = df.iat[0,0]
StrainMin = df.iat[1,0]
cycles_sec = df.iat[2,0]
time_percent = df.iat[3,0]
print(StrainMax)
print(StrainMin)
print(cycles_sec)
print(time_percent)
And what I get is this (I believe these are strings):
StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50
But what I want is this:
0.125 (I want a float here)
-0.125 (I want a float here)
10 (I want an integer here)
50 (I want an integer here)
Is this possible?
I tried numerous Pandas commands and can't get it. I am expecting floats and integers of the numbers only.
Answer 1
Score: 1
I notice there are no commas in your CSV file. If you check the documentation for pd.read_csv, you'll see that it expects a comma separator by default. If your data isn't actually comma-separated, specify the real separator up front to save effort later. Here we can read the file using a space as the separator.
I saved your data to data.csv (data.txt would have worked just as well):
StrainMax 0.125
StrainMin -0.125
cycles_sec 10
time_percent 50
import pandas as pd
df = pd.read_csv('data.csv', sep=' ', header=None)
Note that I also set header=None so that the first row of the file is treated as data rather than as column names. In this case, pandas assigns the default integer column labels 0 and 1.
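To try this without creating a file, here is a minimal sketch that reads the same four lines from an in-memory string. Using io.StringIO in place of data.csv is an assumption made only to keep the example self-contained; the variable names are illustrative.

```python
import io
import pandas as pd

# The question's file contents, held in memory instead of on disk (assumption).
raw = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"

# Default comma separator: each line lands in a single string column.
df_comma = pd.read_csv(io.StringIO(raw), header=None)

# Space separator: labels go to column 0, numbers to column 1.
df_space = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

print(df_comma.shape)  # (4, 1)
print(df_space.shape)  # (4, 2)
print(df_space)
```

The one-column result is what the question ran into: with no commas in the file, each whole line is read as a single string.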
pandas separates the label from the numerical value right away, so you don't have to split each string manually. A further convenience is that pandas now sees an entirely numeric column: df.dtypes reports the column labeled 1 as float64, while column 0 holds the text labels. Your four values are then accessible by df[column][row], as in df[1][0], df[1][1], df[1][2], and df[1][3].
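Since the question asks for integers for cycles_sec and time_percent, note that a float64 column returns 10 and 50 as 10.0 and 50.0. A small sketch of pulling out the four scalars and converting the last two, again assuming an in-memory copy of the file:

```python
import io
import pandas as pd

# In-memory stand-in for data.csv (assumption for a self-contained example).
raw = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"
df = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

# .iat[row, column] returns a single scalar by position.
strain_max = float(df.iat[0, 1])
strain_min = float(df.iat[1, 1])

# Column 1 is float64, so 10 arrives as 10.0; int() converts it back.
cycles_sec = int(df.iat[2, 1])
time_percent = int(df.iat[3, 1])

print(strain_max, strain_min, cycles_sec, time_percent)  # 0.125 -0.125 10 50
```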
We can go a step further to make this more convenient. Rather than only telling pandas not to use the first row as headers, we can tell it what each column is called and set the measurement name as the index:
import pandas as pd
df = pd.read_csv('data.csv', sep=' ', names=['measurement', 'value'])
df.set_index('measurement', inplace=True)
Now you can get at your data quite nicely with df.loc['StrainMax']['value'] (or, as a single lookup, df.loc['StrainMax', 'value']) to get 0.125, which is indeed a float.
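If you look up several parameters by name, one option is to turn the indexed column into a plain dictionary. This is a sketch rather than part of the original approach; params is a hypothetical name, and the data is read from an in-memory string for self-containment.

```python
import io
import pandas as pd

# In-memory stand-in for data.csv (assumption for a self-contained example).
raw = "StrainMax 0.125\nStrainMin -0.125\ncycles_sec 10\ntime_percent 50\n"
df = pd.read_csv(io.StringIO(raw), sep=" ", names=["measurement", "value"])
df = df.set_index("measurement")

# Row label and column label in a single .loc lookup.
strain_max = df.loc["StrainMax", "value"]

# Plain dict mapping each measurement name to its float value.
params = df["value"].to_dict()

print(strain_max)            # 0.125
print(params["cycles_sec"])  # 10.0
```

The dictionary values are still floats, so integer parameters need an explicit int() where they are used.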
P.S. If you ever find that pandas did not infer a column's data type correctly, you can force the type. In this example, that would be:
df['value'] = df['value'].astype(float)
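A related sketch, under the assumption that the values arrived as strings: pd.to_numeric converts them, and a small helper (as_number, a hypothetical name not found in pandas) returns an int whenever a value has no fractional part, which matches the mix of floats and integers the question asks for.

```python
import pandas as pd

# Hypothetical column that was read in as strings.
values = pd.Series(["0.125", "-0.125", "10", "50"], name="value")

# Parse the strings; the mixed column becomes float64.
numeric = pd.to_numeric(values)

def as_number(x):
    """Return an int when x has no fractional part, otherwise a float."""
    x = float(x)
    return int(x) if x.is_integer() else x

converted = [as_number(v) for v in numeric]
print(converted)  # [0.125, -0.125, 10, 50]
```

A single pandas column has one dtype, so the mixed int/float result lives in a Python list here rather than back in the DataFrame.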