Detecting Excel column data types in Python Pandas
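# Question
New to Python and Pandas here. I am trying to read an Excel file from S3 (using boto3), read the headers (the first row of the spreadsheet), and determine the data type of each header's column, _if that is possible_. If it is, I need a map of key-value pairs where each key is the header name and each value is its data type. For example, if the file I fetch from S3 contains the following data: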
Date,Name,Balance
02/01/2022,Jerry Jingleheimer,45.07
02/14/2022,Jane Jingleheimer,102.29
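Then I would be looking for a map of key-value pairs like this:
- Key 1: "Date", Value 1: "datetime" (or whatever the appropriate date type is)
- Key 2: "Name", Value 2: "string" (or whatever the appropriate string type is)
- Key 3: "Balance", Value 3: "numeric" (or whatever the appropriate numeric type is)
So far I have:
```python
import io
import boto3
import pandas as pd
s3Client = boto3.client('s3')  # get_object is a client method, not a resource method
obj = s3Client.get_object(Bucket="some-bucket", Key="some-key")
file_headers = pd.read_excel(io.BytesIO(obj['Body'].read()), engine="openpyxl").columns.tolist()
```
I'm just not sure how to extract the data types that Pandas has detected or how to generate the map.
Can anyone point me in the right direction, please?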
# Answer 1
**Score**: 1
IIUC, you can use `dtypes`:
```python
>>> df.dtypes.to_dict()
{'Date': dtype('<M8[ns]'), 'Name': dtype('O'), 'Balance': dtype('float64')}
>>> {k: v.name for k, v in df.dtypes.to_dict().items()}
{'Date': 'datetime64[ns]', 'Name': 'object', 'Balance': 'float64'}
```
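If coarse labels such as "datetime", "string" and "numeric" are wanted instead of the raw dtype names, one possible approach is to classify each dtype with the helpers in `pandas.api.types`. The sketch below is only an illustration: `friendly_dtypes` and its label names are hypothetical, and the DataFrame is built in memory to stand in for the result of `pd.read_excel`.

```python
import pandas as pd
from pandas.api import types as ptypes


def friendly_dtypes(df: pd.DataFrame) -> dict:
    """Map each column name to a coarse type label (hypothetical helper)."""
    labels = {}
    for column, dtype in df.dtypes.items():
        if ptypes.is_datetime64_any_dtype(dtype):
            labels[column] = "datetime"
        elif ptypes.is_bool_dtype(dtype):
            # Checked before numeric, because pandas treats bool as numeric.
            labels[column] = "boolean"
        elif ptypes.is_numeric_dtype(dtype):
            labels[column] = "numeric"
        else:
            # Assumption: anything else is treated as text.
            labels[column] = "string"
    return labels


# Stand-in for the DataFrame returned by pd.read_excel(...).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["02/01/2022", "02/14/2022"], format="%m/%d/%Y"),
        "Name": ["Jerry Jingleheimer", "Jane Jingleheimer"],
        "Balance": [45.07, 102.29],
    }
)
print(friendly_dtypes(df))
# {'Date': 'datetime', 'Name': 'string', 'Balance': 'numeric'}
```

Treating every remaining dtype as "string" is a simplification; columns with mixed content would also land in that bucket.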
# Answer 2
**Score**: 0
I suggest you check this pandas tutorial.
`pandas.read_excel('my_file.xlsx').dtypes` should give you the data types of the columns.
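One caveat worth keeping in mind: the detected types depend on how the cells are stored in the workbook. If the dates were entered as text rather than as date cells, the column will probably come back as a generic text dtype instead of a datetime one. A minimal sketch of converting such a column explicitly, assuming the `%m/%d/%Y` format from the sample data and using an in-memory DataFrame in place of the real `read_excel` result:

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical frame standing in for read_excel output where the
# dates were stored as text in the workbook.
df = pd.DataFrame(
    {
        "Date": ["02/01/2022", "02/14/2022"],
        "Name": ["Jerry Jingleheimer", "Jane Jingleheimer"],
        "Balance": [45.07, 102.29],
    }
)

was_datetime = ptypes.is_datetime64_any_dtype(df["Date"])

# Parse the text column explicitly; the format string is an assumption
# based on the sample rows in the question.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

is_datetime = ptypes.is_datetime64_any_dtype(df["Date"])
print(was_datetime, is_datetime)
```

After the conversion, `df.dtypes` reports a datetime dtype for `Date`, so the mapping built from it reflects the intended type.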