2023-05-26 17:05:15


How do I prevent a numpy ndarray column from being converted to string when saving a Pandas DataFrame to csv?

Question

I have a DataFrame with an "ID" column and a "Vector" column that holds arrays of shape (1, 500). I have to save the DataFrame as csv. When I load the saved csv back into a DataFrame, each array has become a string, and I can no longer pass it to my functions.

For example, before saving, the Vector column looks like this:

>> DataFrame_example["Vector"][0]
Out:
array([[-4.51561287e-02, -5.02060959e-03,  1.01038935e-02,
        -3.24810972e-03,  8.50208327e-02, -3.12430300e-02,
        -3.06447037e-02, -6.82420060e-02,  4.08798642e-02
             ...........................................
        -6.08731210e-02,  4.24617827e-02,  2.90670991e-02,
         1.87119041e-02,  5.67540973e-02,  4.65381369e-02,
         3.42479758e-02,  9.88676678e-03, -1.62497200e-02,
         1.46159781e-02, -6.39008060e-02]], dtype=float32)
>> type(DataFrame_example["Vector"][0])
Out: numpy.ndarray

But after saving as csv and reading it back, the same expression returns:

>> DataFrame_example["Vector"][0]
'[[-4.51561287e-02 -5.02060959e-03  1.01038935e-02 -3.24810972e-03\n   8.50208327e-02 -3.12430300e-02 -3.06447037e-02 -6.82420060e-02\n   4.08798642e-02  2.49120360e-03 -6.40684515e-02
 ............................................................................................
-5.22072986e-02\n   6.16791770e-02 -8.88353493e-03  1.65628344e-02 -5.95084354e-02\n  -8.45786110e-02 -8.65871832e-03  3.98499370e-02 -3.41838486e-02\n  -2.02250257e-02  5.18149361e-02 -5.80132604e-02  7.66506651e-03\n  -5.49656115e-02 -6.08731210e-02  4.24617827e-02  2.90670991e-02\n   1.87119041e-02  5.67540973e-02  4.65381369e-02  3.42479758e-02\n   9.88676678e-03 -1.62497200e-02  1.46159781e-02 -6.39008060e-02]]'

How can I keep the format? Any help would be appreciated.

I am saving the DF in csv format:

compression_opts = dict(method='zip',
                        archive_name=save_name+'.csv')
DataFrame_example.to_csv(save_name+'.zip', index=False,
                         compression=compression_opts)

I am reading it with:

DataFrame_example = read_csv("example.csv")

I have also tried reading it with delimiter="," or sep=",".

Answer 1

Score: 0

You need to strip the brackets from that string and split the result on whitespace. This gives you a list of strings that you can cast to floats.

bracket_strip = str.maketrans("", "", "[]")
new_column = []
for vector in DataFrame_example["Vector"]:
    # split() with no argument splits on any run of whitespace,
    # so the embedded newlines and double spaces produce no empty strings
    values = vector.translate(bracket_strip).split()
    new_column.append([float(val) for val in values])
DataFrame_example["Vector"] = new_column

Something like that should do the trick. I just changed the variable names to yours.
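To get back actual ndarrays with the original (1, 500) shape, rather than plain lists of floats, the same idea can be sketched as a full round trip. The helper name `parse_vector` and the random sample data are made up for illustration, and the sketch assumes every cell holds the `str()` form of a float array as shown in the question.

```python
import io

import numpy as np
import pandas as pd


def parse_vector(text):
    """Rebuild a (1, n) float32 array from its str() form (hypothetical helper)."""
    # Replace brackets with spaces, then split on any run of whitespace.
    tokens = text.replace("[", " ").replace("]", " ").split()
    return np.array([float(tok) for tok in tokens], dtype=np.float32).reshape(1, -1)


# Made-up stand-in for the asker's data: two rows of (1, 500) float32 vectors.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ID": [1, 2],
    "Vector": [rng.random((1, 500), dtype=np.float32) for _ in range(2)],
})

# Round-trip through CSV text held in memory instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# At this point every cell in "Vector" is a str; convert each one back.
restored["Vector"] = [parse_vector(cell) for cell in restored["Vector"]]

print(type(restored["Vector"][0]), restored["Vector"][0].shape)
```

Two caveats apply to this approach. The text form keeps only the printed digits, so the restored values are close to the originals but not guaranteed to be bit-identical. And with NumPy's default print options, arrays of more than 1000 elements are abbreviated with `...` when converted to a string, so for larger vectors the data would already be lost at write time.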

Answer 2

Score: 0

You can use pandas.DataFrame.to_pickle instead:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [np.array([5, 6, 7, 8]) for _ in range(4)]})
#   a             b
#0  1  [5, 6, 7, 8]
#1  2  [5, 6, 7, 8]
#2  3  [5, 6, 7, 8]
#3  4  [5, 6, 7, 8]
type(df.b[0])
#<class 'numpy.ndarray'>
df.to_pickle("out.txt")
new = pd.read_pickle("out.txt")
type(new.b[0])
#<class 'numpy.ndarray'>
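As a rough check that this also holds for data shaped like the question's, here is a small sketch with (1, 500) float32 vectors. The random data and the file name `vectors.pkl` are invented for the example; it compares the reloaded arrays to the originals element by element.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up data with the same layout as in the question.
rng = np.random.default_rng(42)
frame = pd.DataFrame({
    "ID": [1, 2, 3],
    "Vector": [rng.standard_normal((1, 500)).astype(np.float32) for _ in range(3)],
})

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.pkl")  # hypothetical file name
    frame.to_pickle(path)
    reloaded = pd.read_pickle(path)

first = reloaded["Vector"][0]
print(type(first).__name__, first.dtype, first.shape)

# Pickle stores the array objects themselves, so values should match exactly.
same = all(np.array_equal(a, b) for a, b in zip(frame["Vector"], reloaded["Vector"]))
print(same)
```

Keep in mind that unpickling can execute arbitrary code, so pickle files should only be loaded from sources you trust.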

Answer 3

Score: 0

If you want to use a .csv file, then I would suggest that you convert to the corresponding datatype on read using the dtype argument:

> dtype : Type name or dict of column -> type, optional
> Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
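For a column whose cells are whole arrays, the `converters` parameter mentioned at the end of that excerpt is probably the more direct tool, since a dtype describes individual scalar values and cannot by itself parse a bracketed string. A minimal sketch, assuming the csv was written by `to_csv` from float arrays of shape (1, n); the helper `to_array` and the sample data are hypothetical:

```python
import io

import numpy as np
import pandas as pd


def to_array(cell):
    # Hypothetical helper: drop the brackets, split on any whitespace,
    # and rebuild a (1, n) float32 array.
    numbers = cell.replace("[", "").replace("]", "").split()
    return np.array([float(x) for x in numbers], dtype=np.float32).reshape(1, -1)


# Made-up sample frame with two (1, 500) vectors.
original = pd.DataFrame({
    "ID": [10, 11],
    "Vector": [
        np.linspace(-1, 1, 500, dtype=np.float32).reshape(1, -1),
        np.linspace(0, 2, 500, dtype=np.float32).reshape(1, -1),
    ],
})

# Write the csv into an in-memory buffer, then read it back.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

# read_csv calls to_array on every raw string in the "Vector" column.
loaded = pd.read_csv(buffer, converters={"Vector": to_array})

print(type(loaded["Vector"][0]), loaded["Vector"][0].shape)
```

The conversion happens while the file is parsed, so no separate clean-up pass over the column is needed afterwards.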

Otherwise, you should save your data using another type of file (parquet, pickle...). This can be achieved using pandas: pandas to_parquet.

df.to_parquet('df.parquet.gzip',
              compression='gzip')
pd.read_parquet('df.parquet.gzip')

The latter is often a better option performance-wise!
