英文:
Merge two or more tables checking the unique value using python
问题
以下是翻译好的部分:
我是Python的初学者。
我必须检查2或3个表格,看看是否有一些数据缺失。
这些数据是`pandas.core.frame.DataFrame`。
我有这些表格:
start end val accn fy fp form filed frame
0 2016-01-01 2016-12-31 90272000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 CY2016
6 2017-01-01 2017-12-31 110855000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
7 2017-01-01 2017-12-31 110855000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2017
18 2018-01-01 2018-12-31 136819000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
19 2018-01-01 2018-12-31 136819000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2018
30 2019-01-01 2019-12-31 161857000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2019
41 2020-01-01 2020-12-31 182527000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2020
52 2021-01-01 2021-12-31 257637000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2021
61 2022-01-01 2022-12-31 282836000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2022
和
start end val accn fy fp form filed frame
0 2006-12-31 2007-12-29 39474000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 CY2007
4 2007-12-30 2008-12-27 43251000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
5 2007-12-30 2008-12-27 43251000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
13 2008-12-28 2009-12-26 43232000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 CY2009
15 2008-12-28 2009-12-26 43232000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
16 2008-12-28 2009-12-26 43232000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
27 2009-12-27 2010-12-25 57838000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 CY2010
28 2009-12-27 2010-12-25 57838000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
30 2009-12-27 2010-12-25 57838000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
41 2010-12-26 2011-12-31 66504000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
42 2010-12-26 2011-12-31 66504000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
43 2010-12-26 2011-12-31 66504000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 CY2011
54 2012-01-01 2012-12-29 65492000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
56 2012-01-01 2012-12-29 65492000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
57 2012-01-01 2012-12-29 65492000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
68 2012-12-30 2013-12-28 66415000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 CY2013
70 2012-12-30 2013-12-28 66415000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
71 2012-12-30 2013-12-28 66415000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
82 2013-12-29 2014-12-27 66683000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 CY2014
83 2013-12-29 2014
<details>
<summary>英文:</summary>
I am a beginner with Python.
I have to check 2 or 3 tables and see if some data is missing.
The data are `pandas.core.frame.DataFrame`.
I have these tables:
start end val accn fy fp form filed frame
0 2016-01-01 2016-12-31 90272000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 CY2016
6 2017-01-01 2017-12-31 110855000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
7 2017-01-01 2017-12-31 110855000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2017
18 2018-01-01 2018-12-31 136819000000 0001652044-19-000004 2018 FY 10-K 2019-02-05 NaN
19 2018-01-01 2018-12-31 136819000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2018
30 2019-01-01 2019-12-31 161857000000 0001652044-20-000008 2019 FY 10-K 2020-02-04 CY2019
41 2020-01-01 2020-12-31 182527000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2020
52 2021-01-01 2021-12-31 257637000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2021
61 2022-01-01 2022-12-31 282836000000 0001652044-23-000016 2022 FY 10-K 2023-02-03 CY2022
and
start end val accn fy fp form filed frame
0 2006-12-31 2007-12-29 39474000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 CY2007
4 2007-12-30 2008-12-27 43251000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
5 2007-12-30 2008-12-27 43251000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
13 2008-12-28 2009-12-26 43232000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 CY2009
15 2008-12-28 2009-12-26 43232000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
16 2008-12-28 2009-12-26 43232000000 0001193125-10-036385 2009 FY 10-K 2010-02-22 NaN
27 2009-12-27 2010-12-25 57838000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 CY2010
28 2009-12-27 2010-12-25 57838000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
30 2009-12-27 2010-12-25 57838000000 0001193125-11-040427 2010 FY 10-K 2011-02-18 NaN
41 2010-12-26 2011-12-31 66504000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
42 2010-12-26 2011-12-31 66504000000 0001193125-12-081822 2011 FY 10-K 2012-02-27 NaN
43 2010-12-26 2011-12-31 66504000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 CY2011
54 2012-01-01 2012-12-29 65492000000 0001445305-13-000278 2012 FY 10-K 2013-02-21 NaN
56 2012-01-01 2012-12-29 65492000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
57 2012-01-01 2012-12-29 65492000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
68 2012-12-30 2013-12-28 66415000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 CY2013
70 2012-12-30 2013-12-28 66415000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
71 2012-12-30 2013-12-28 66415000000 0000077476-14-000007 2013 FY 10-K 2014-02-14 NaN
82 2013-12-29 2014-12-27 66683000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 CY2014
83 2013-12-29 2014-12-27 66683000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 NaN
85 2013-12-29 2014-12-27 66683000000 0000077476-15-000012 2014 FY 10-K 2015-02-12 NaN
96 2014-12-28 2015-12-26 63056000000 0000077476-18-000012 2017 FY 10-K 2018-02-13 CY2015
97 2014-12-28 2015-12-26 63056000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 NaN
98 2014-12-28 2015-12-26 63056000000 0000077476-16-000066 2015 FY 10-K 2016-02-11 NaN
109 2015-12-27 2016-12-31 62799000000 0000077476-18-000012 2017 FY 10-K 2018-02-13 CY2016
110 2015-12-27 2016-12-31 62799000000 0000077476-17-000010 2016 FY 10-K 2017-02-15 NaN
117 2017-01-01 2017-12-30 63525000000 0000077476-18-000012 2017 FY 10-K 2018-02-13 CY2017
So the target is to check and merge the data from all years.
So basically I have to check the 2 tables and create a new table with all CY Years.
For example the first table start from 2016, but the second from 2007. So I need to create a table with `val` and `frame` merged and unique values. The table basically should have all years found from an API.
Example result:
39474000000 2007
43232000000 2009
57838000000 2010
...
110855000000 2017
etc etc
There is a thing I have to check if both tables contain data for `the same year`(example `2017`), then I have to check which `val` is bigger and insert this in the new table.
How can I do an iter for loo year by year and check the val value between the two tables?
</details>
# 答案1
**得分**: 1
如果您想忽略`frame`列中包含`NaN`的行,您可以在应用`max`之前,简单地连接您过滤后的数据框并按`frame`进行分组:
```python
df1 = df1.dropna(subset='frame')[['val', 'frame']]
df2 = df2.dropna(subset='frame')[['val', 'frame']]
pd.concat([
df1[~df1['frame'].str.contains('Q')],
df2[~df2['frame'].str.contains('Q')]
]).groupby('frame').max()
输出:
val
frame
CY2007 39474000000
CY2009 43232000000
CY2010 57838000000
CY2011 66504000000
CY2013 66415000000
CY2014 66683000000
CY2015 63056000000
CY2016 90272000000
CY2017 110855000000
CY2018 136819000000
CY2019 161857000000
CY2020 182527000000
CY2021 257637000000
CY2022 282836000000
编辑:不包括frame
单元格中包含'Q'
的行。
英文:
If you want to ignore rows where frame
is NaN
, you can simply concat your filtered dfs and group by frame
before applying max
:
df1 = df1.dropna(subset='frame')[['val', 'frame']]
df2 = df2.dropna(subset='frame')[['val', 'frame']]
pd.concat([
df1[~df1['frame'].str.contains('Q')],
df2[~df2['frame'].str.contains('Q')]
]).groupby('frame').max()
Output:
val
frame
CY2007 39474000000
CY2009 43232000000
CY2010 57838000000
CY2011 66504000000
CY2013 66415000000
CY2014 66683000000
CY2015 63056000000
CY2016 90272000000
CY2017 110855000000
CY2018 136819000000
CY2019 161857000000
CY2020 182527000000
CY2021 257637000000
CY2022 282836000000
Edit: excluding frame
cells containing 'Q'
.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论