2023年2月16日 04:06:36go评论96阅读模式

英文:

Detecting Excel column data types in Python Pandas

问题

我是你的中文翻译，以下是翻译好的内容：

新手学习Python和Pandas。我尝试从S3中读取一个Excel文件（使用boto3），并读取标题（电子表格的第一行），并确定每个标题的数据类型，_如果可能的话_。如果可能的话，我需要一个键-值对的映射，其中每个键是标题名称，值是它的数据类型。例如，如果我从S3获取的文件中包含以下数据：

Date,Name,Balance
02/01/2022,Jerry Jingleheimer,45.07
02/14/2022,Jane Jingleheimer,102.29


那么我将寻找一个如下的键-值对映射：
- 键 1: "Date"，值 1: "datetime"（或者是适当的日期类型）
- 键 2: "Name"，值 2: "string"（或者是适当的字符串类型）
- 键 3: "Balance"，值 3: "numeric"（或者是适当的数值类型）
到目前为止，我有以下代码：
```python
s3Client = Res.resource('s3')
obj = s3Client.get_object(Bucket="some-bucket", Key="some-key")
file_headers = pd.read_excel(io.BytesIO(obj['Body'].read()), engine="openpyxl").columns.tolist()

我只是不确定如何提取Pandas检测到的数据类型或如何生成这个映射。

请问有人能指点我正确的方向吗？


<details>
<summary>英文:</summary>
New to Python and Pandas here. I am trying to read an Excel file off of S3 (using boto3) and read the headers (first row of the spreadsheet) and determine what data type each header is, _if this is possible to do_. If it is, I need a map of key-value pairs where each key is the header name and value is its data type. So for example if the file I fetch from S3 has the following data in it:

Date,Name,Balance
02/01/2022,Jerry Jingleheimer,45.07
02/14/2022,Jane Jingleheimer,102.29


Then I would be looking for a map of KV pairs like so:
- Key 1: &quot;Date&quot;, Value 1: &quot;datetime&quot; (or whatever is the appropriate date type)
- Key 2: &quot;Name&quot;, Value 2: &quot;string&quot; (or whatever is the appropriate date type)
- Key 3: &quot;Balance&quot;, Value 3: &quot;numeric&quot; (or whatever is the appropriate date type)
So far I have:

s3Client = Res.resource('s3')
obj = s3Client.get_object(Bucket="some-bucket", Key="some-key")
file_headers = pd.read_excel(io.BytesIO(obj['Body'].read()), engine="openpyxl").columns.tolist()


I&#39;m just not sure about how to go about extracting the data types that Pandas has detected or how to generate the map.
Can anyone point me in the right direction please?
</details>
# 答案1
**得分**: 1
IIUC，您可以使用`dtypes`：
```python
>>> df.dtypes.to_dict()
{'Date': dtype('<M8[ns]'), 'Name': dtype('O'), 'Balance': dtype('float64')}
>>> {k: v.name for k, v in df.dtypes.to_dict().items()}
{'Date': 'datetime64[ns]', 'Name': 'object', 'Balance': 'float64'}

英文:

IIUC, you can use dtypes:

&gt;&gt;&gt; df.dtypes.to_dict()
{&#39;Date&#39;: dtype(&#39;&lt;M8[ns]&#39;), &#39;Name&#39;: dtype(&#39;O&#39;), &#39;Balance&#39;: dtype(&#39;float64&#39;)}
&gt;&gt;&gt; {k: v.name for k, v in df.dtypes.to_dict().items()}
{&#39;Date&#39;: &#39;datetime64[ns]&#39;, &#39;Name&#39;: &#39;object&#39;, &#39;Balance&#39;: &#39;float64&#39;}

答案2

得分: 0

我建议你查看这个pandas教程。

pandas.read_excel('my_file.xlsx').dtypes应该给你列的数据类型。

英文:

I suggest you to check this pandas tutorial.

The pandas.read_excel('my_file.xlsx').dtypes should give you the types of the columns.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在Python Pandas中检测Excel列的数据类型

问题

答案2

Converting aiohttp script to asyncio + requests (aiohttp not working on ubuntu while asyncio + requests works)

无法从pandas中获取字符串，而是获取到了数据类型。

Pytest覆盖夹具的参数默认值

使用小数范围

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。