How can I ensure every US state is accounted for in a pandas DataFrame?

Question

I'm very new to pandas.

I have a CSV that contains 43 states and a count of how many times something has happened in that state.

STATE,Count
AL,1
AK,4
AZ,7

My CSV does not contain every state. How can I ensure that every state is accounted for? If a state is not in the original dataframe, it should have a Count of 0.

Here's what I have so far, but it's giving me Count_x and Count_y, and it still doesn't have all 50 states.

import pandas as pd

# Original CSV only has 43 states
states = pd.read_csv("states.csv")

# Create a new dataframe with all states and count set to 0
all_states = [["AL", 0], ["AK", 0], ["AZ", 0], ["AR", 0], ["CA", 0], ["CO", 0], ["CT", 0], ["DE", 0], ["FL", 0], ["GA", 0], ["HI", 0], ["ID", 0], ["IL", 0], ["IN", 0], ["IA", 0], ["KS", 0], ["KY", 0], ["LA", 0], ["ME", 0], ["MD", 0], ["MA", 0], ["MI", 0], ["MN", 0], ["MS", 0], ["MO", 0], ["MT", 0], ["NE", 0], ["NV", 0], ["NH", 0], ["NJ", 0], ["NM", 0], ["NY", 0], ["NC", 0], ["ND", 0], ["OH", 0], ["OK", 0], ["OR", 0], ["PA", 0], ["RI", 0], ["SC", 0], ["SD", 0], ["TN", 0], ["TX", 0], ["UT", 0], ["VT", 0], ["VA", 0], ["WA", 0], ["WV", 0], ["WI", 0], ["WY", 0]]
all_states = pd.DataFrame(all_states, columns=["STATE", "Count"])

# Merge the two Dataframes
new_df = states.merge(all_states, on="STATE")

# Still only has 43 states
new_df

Notice that AK is still missing (along with a few other states).

Answer 1

Score: 2

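You are merging in the wrong direction: you need to merge states into all_states, not the other way around. By default merge performs an inner join, so only the states present in both dataframes survive. Use a left merge, which keeps every row of all_states, and then fillna with 0: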

new_df = all_states.merge(states, on='STATE', how='left').fillna(0)

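Because both dataframes have a Count column, the result contains Count_x (from all_states) and Count_y (from states). You may decide to get rid of the Count_x column, as it will be all 0; you can do that using drop: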

new_df.drop('Count_x', axis=1, inplace=True)
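
To see the whole thing end to end, here is a minimal sketch of the left-merge approach. It makes a couple of assumptions that are not in the question: the CSV is replaced by a small in-memory sample (csv_text), and the full list of states is kept in a hypothetical state_codes list with no Count column. That way the Count_x/Count_y suffixes never appear, so there is nothing to drop afterwards.

```python
import io

import pandas as pd

# Hypothetical stand-in for states.csv; only three states have data here.
csv_text = "STATE,Count\nAL,1\nAK,4\nAZ,7\n"
states = pd.read_csv(io.StringIO(csv_text))

# All 50 two-letter state codes, in the same order as the question's list.
state_codes = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY"
).split()

# The left frame holds only the key column, so no Count_x/Count_y suffixes arise.
all_states = pd.DataFrame({"STATE": state_codes})

# how="left" keeps every row of all_states; states without data get NaN.
new_df = all_states.merge(states, on="STATE", how="left")

# Replace NaN with 0 and restore the integer dtype (NaN forces float).
new_df["Count"] = new_df["Count"].fillna(0).astype(int)

print(new_df.head())
```

If you would rather keep the question's all_states with its zero-filled Count column, the one-liner in this answer works too; you then just need to drop Count_x and probably rename Count_y back to Count.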

Answer 2

Score: 1

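You could try reindexing. Here df is the dataframe read from the CSV, and all_states must still be the original list of [state, count] pairs (not the DataFrame built from it), since zip(*all_states) is what pulls out the state codes:

df.set_index('STATE').reindex(list(zip(*all_states))[0], fill_value=0).reset_index()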
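
A small self-contained sketch of the same reindex idea, in case it helps. The in-memory CSV and the shortened state_codes list are assumptions for illustration only; in practice the list would hold all 50 codes. Note that reindex needs each STATE to appear only once in the CSV.

```python
import io

import pandas as pd

# Hypothetical stand-in for states.csv.
states = pd.read_csv(io.StringIO("STATE,Count\nAL,1\nAK,4\nAZ,7\n"))

# Shortened, illustrative list; in practice it would hold all 50 codes.
state_codes = ["AL", "AK", "AZ", "AR", "CA"]

result = (
    states.set_index("STATE")
    .reindex(state_codes, fill_value=0)  # codes absent from the CSV get Count 0
    .reset_index()
)

print(result)
```

Passing fill_value=0 directly to reindex avoids the intermediate NaN values, so Count stays an integer column and no fillna step is needed.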

huangapple
  • Posted on March 12, 2023, 11:37:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75710928.html