2023-04-11 00:45:39

Polars read_excel not equal to Pandas read_excel for columns with "mixed" types

Question

I'm trying to read some Excel data via pl.read_excel(), and the result is not identical to what pd.read_excel() returns for columns with mixed data. Here's an example to illustrate:

import pandas as pd
import polars as pl

# Create sample data and save it to Excel.
test = pd.DataFrame(
    {
        'nums': [1, 2, 3],
        'mixed': [1, 4, '6A'],   # integers and a string in the same column
        'factor': ['A', 'B', 'C'],
    }
)
test.to_excel('test.xlsx', index=False)

# read data using Pandas and Polars. Convert polars version to pandas.
test_pd = pd.read_excel('test.xlsx', engine='openpyxl')

test_pl = pl.read_excel('test.xlsx')
test_pl = test_pl.to_pandas()

# compare the two
print(test_pd)
print(test_pl)
print(test_pd == test_pl)

The output of print(test_pd) and print(test_pl) suggests the data is identical. However, print(test_pd == test_pl) returns the following:

   nums  mixed  factor
0  True  False    True
1  True  False    True
2  True   True    True

Is there something causing the data to not be identical? And is this a Polars (or Arrow) limitation when dealing with object variables? I want the pl.read_excel() / conversion to pandas approach to ultimately yield an identical DataFrame to pd.read_excel().

Thanks!

Answer 1

Score: 1

Polars has read the numbers in the mixed column as strings. Look at this cell:

test_pl.iloc[0,1]
'1'

Pandas, by contrast, keeps integers wherever possible. The same cell in pandas:

test_pd.iloc[0,1]
1

If you cast both tables to strings, all cells compare equal:

test_pd.astype('string') == test_pl.astype('string')

   nums  mixed  factor
0  True   True    True
1  True   True    True
2  True   True    True
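The difference can also be seen without an Excel file. The following is a minimal sketch, not the original poster's code: the two frames are built by hand as stand-ins for what pd.read_excel() and pl.read_excel().to_pandas() returned above, which is an assumption based on the output shown in this answer.

```python
import pandas as pd

# Hand-built stand-ins (assumption): what each reader returned in the question.
test_pd = pd.DataFrame(
    {"nums": [1, 2, 3], "mixed": [1, 4, "6A"], "factor": ["A", "B", "C"]}
)
test_pl = pd.DataFrame(
    {"nums": [1, 2, 3], "mixed": ["1", "4", "6A"], "factor": ["A", "B", "C"]}
)

# Inspect the Python type of every cell in the mixed column.
pd_types = test_pd["mixed"].map(type).tolist()
pl_types = test_pl["mixed"].map(type).tolist()
print(pd_types)
print(pl_types)

# Raw comparison fails because 1 != "1"; comparison after casting succeeds.
raw_equal = bool((test_pd == test_pl).all().all())
as_str_equal = bool((test_pd.astype(str) == test_pl.astype(str)).all().all())
print(raw_equal, as_str_equal)
```

Printing the per-cell types is usually quicker than comparing frames, because the printed frames look the same even when the underlying values differ.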

Answer 2

Score: 1

Polars and Arrow rely on strict data types, so ultimately, yes, it's a limitation. You can never have a column that is sometimes Utf8 and sometimes Floatxx.

Pandas, on the other hand, is happy to have a column of mixed data types, because such a column is basically just a list of Python objects.
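Because the mixed column can only exist on the pandas side, one possible workaround is to restore the integers after calling to_pandas(). The sketch below is only an illustration under assumptions: restore_mixed is a hypothetical helper (not part of pandas or Polars), and test_pl is built by hand to mimic the converted Polars result from the question.

```python
import pandas as pd

def restore_mixed(value):
    """Hypothetical helper: turn a string such as '4' back into the int 4.

    Strings that int() cannot parse, and non-string values, are returned unchanged.
    """
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value
    return value

# Hand-built stand-ins (assumption) for the two frames in the question.
test_pd = pd.DataFrame(
    {"nums": [1, 2, 3], "mixed": [1, 4, "6A"], "factor": ["A", "B", "C"]}
)
test_pl = pd.DataFrame(
    {"nums": [1, 2, 3], "mixed": ["1", "4", "6A"], "factor": ["A", "B", "C"]}
)

# Rebuild the mixed column as an object column of ints and strings.
restored = test_pl.copy()
restored["mixed"] = test_pl["mixed"].astype(object).map(restore_mixed)

all_equal = bool((test_pd == restored).all().all())
print(restored["mixed"].tolist())
print(all_equal)
```

This only handles integer-like strings; a column that mixes floats, dates, or numeric-looking identifiers that should stay as text would need its own rule, so it is worth applying such a helper only to columns known to be mixed.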
