在Python Pandas中检测Excel列的数据类型

huangapple go评论59阅读模式
英文:

Detecting Excel column data types in Python Pandas

问题

我是你的中文翻译,以下是翻译好的内容:

新手学习Python和Pandas我尝试从S3中读取一个Excel文件使用boto3),并读取标题电子表格的第一行),并确定每个标题的数据类型_如果可能的话_如果可能的话我需要一个键-值对的映射其中每个键是标题名称值是它的数据类型例如如果我从S3获取的文件中包含以下数据

Date,Name,Balance
02/01/2022,Jerry Jingleheimer,45.07
02/14/2022,Jane Jingleheimer,102.29


那么我将寻找一个如下的键-值对映射:
- 键 1: "Date",值 1: "datetime"(或者是适当的日期类型)
- 键 2: "Name",值 2: "string"(或者是适当的字符串类型)
- 键 3: "Balance",值 3: "numeric"(或者是适当的数值类型)

到目前为止,我有以下代码:
```python
s3Client = Res.resource('s3')
obj = s3Client.get_object(Bucket="some-bucket", Key="some-key")
file_headers = pd.read_excel(io.BytesIO(obj['Body'].read()), engine="openpyxl").columns.tolist()

我只是不确定如何提取Pandas检测到的数据类型或如何生成这个映射。

请问有人能指点我正确的方向吗?


<details>
<summary>英文:</summary>

New to Python and Pandas here. I am trying to read an Excel file off of S3 (using boto3) and read the headers (first row of the spreadsheet) and determine what data type each header is, _if this is possible to do_. If it is, I need a map of key-value pairs where each key is the header name and value is its data type. So for example if the file I fetch from S3 has the following data in it:

Date,Name,Balance
02/01/2022,Jerry Jingleheimer,45.07
02/14/2022,Jane Jingleheimer,102.29


Then I would be looking for a map of KV pairs like so:
- Key 1: &quot;Date&quot;, Value 1: &quot;datetime&quot; (or whatever is the appropriate date type)
- Key 2: &quot;Name&quot;, Value 2: &quot;string&quot; (or whatever is the appropriate date type)
- Key 3: &quot;Balance&quot;, Value 3: &quot;numeric&quot; (or whatever is the appropriate date type)

So far I have:

s3Client = Res.resource('s3')
obj = s3Client.get_object(Bucket="some-bucket", Key="some-key")
file_headers = pd.read_excel(io.BytesIO(obj['Body'].read()), engine="openpyxl").columns.tolist()


I&#39;m just not sure about how to go about extracting the data types that Pandas has detected or how to generate the map.

Can anyone point me in the right direction please?

</details>


# 答案1
**得分**: 1

IIUC,您可以使用`dtypes`:

```python
>>> df.dtypes.to_dict()
{'Date': dtype('<M8[ns]'), 'Name': dtype('O'), 'Balance': dtype('float64')}

>>> {k: v.name for k, v in df.dtypes.to_dict().items()}
{'Date': 'datetime64[ns]', 'Name': 'object', 'Balance': 'float64'}
英文:

IIUC, you can use dtypes:

&gt;&gt;&gt; df.dtypes.to_dict()
{&#39;Date&#39;: dtype(&#39;&lt;M8[ns]&#39;), &#39;Name&#39;: dtype(&#39;O&#39;), &#39;Balance&#39;: dtype(&#39;float64&#39;)}

&gt;&gt;&gt; {k: v.name for k, v in df.dtypes.to_dict().items()}
{&#39;Date&#39;: &#39;datetime64[ns]&#39;, &#39;Name&#39;: &#39;object&#39;, &#39;Balance&#39;: &#39;float64&#39;}

答案2

得分: 0

我建议你查看这个pandas教程

pandas.read_excel('my_file.xlsx').dtypes应该给你列的数据类型。

英文:

I suggest you to check this pandas tutorial.

The pandas.read_excel(&#39;my_file.xlsx&#39;).dtypes should give you the types of the columns.

huangapple
  • 本文由 发表于 2023年2月16日 04:06:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/75464957.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定