分享
三行代码  ›  专栏  ›  技术社区  ›  Nicholas Dela Torre

帮助我用Python连接这些数据帧

  •  0
  • Nicholas Dela Torre  · 技术社区  · 5 天前

    这是我的设想。假设我有这两个数据集-

    dic = {'firstname':['John','John','John','John','John','Susan','Susan',
                        'Susan','Susan','Susan','Mike','Mike','Mike','Mike',
                        'Mike'],
           'lastname':['Smith','Smith','Smith','Smith','Smith','Wilson',
                       'Wilson','Wilson','Wilson','Wilson','Jones','Jones',
                       'Jones','Jones','Jones'],
           'company':['KFC','BK','KFC','KFC','KFC','BK','BK','WND','WND',
                      'WND','TB','CHP','TB','CHP','TB'],
           'paid':[200,300,250,100,900,650,430,218,946,789,305,750,140,860,310]}
    df1 = pd.DataFrame(dic)
    print(df1)
    

    dic = {'firstname':['John','John','Susan','Susan','Mike','Mike'],
           'lastname':['Smith','Smith','Wilson','Wilson','Jones','Jones'],
           'company':['KFC','BK','BK','WND','TB','CHP'],
           'paid':[1450,300,1080,1953,755,1610]}
    df2 = pd.DataFrame(dic)
    print(df2)
    

    输出1为-

       firstname lastname company  paid
    0       John    Smith     KFC   200
    1       John    Smith      BK   300
    2       John    Smith     KFC   250
    3       John    Smith     KFC   100
    4       John    Smith     KFC   900
    5      Susan   Wilson      BK   650
    6      Susan   Wilson      BK   430
    7      Susan   Wilson     WND   218
    8      Susan   Wilson     WND   946
    9      Susan   Wilson     WND   789
    10      Mike    Jones      TB   305
    11      Mike    Jones     CHP   750
    12      Mike    Jones      TB   140
    13      Mike    Jones     CHP   860
    14      Mike    Jones      TB   310
    

    输出2是-

      firstname lastname company  paid
    0      John    Smith     KFC  1450
    1      John    Smith      BK   300
    2     Susan   Wilson      BK  1080
    3     Susan   Wilson     WND  1953
    4      Mike    Jones      TB   755
    5      Mike    Jones     CHP  1610
    

    我想做的是将df2 paid列添加到df1的详细视图的每个部分

    所以我的理想输出是-

       firstname lastname company  paid sum_paid
    0       John    Smith     KFC   200     1450
    1       John    Smith      BK   300      300
    2       John    Smith     KFC   250     1450
    3       John    Smith     KFC   100     1450
    4       John    Smith     KFC   900     1450
    5      Susan   Wilson      BK   650     1080
    6      Susan   Wilson      BK   430     1080
    7      Susan   Wilson     WND   218     1953
    8      Susan   Wilson     WND   946     1953
    9      Susan   Wilson     WND   789     1953
    10      Mike    Jones      TB   305      755
    11      Mike    Jones     CHP   750     1610
    12      Mike    Jones      TB   140      755
    13      Mike    Jones     CHP   860     1610
    14      Mike    Jones      TB   310      755
    
    1 回复  |  直到 5 天前
        1
  •  2
  •   NYC Coder    5 天前

    就这样做:

    df = df1.merge(df2, on=['firstname', 'lastname', 'company']).rename(columns={'paid_y': 'sum_paid', 'paid_x': 'paid'})
    print(df)
    
       firstname lastname company    paid  sum_paid
    0       John    Smith     KFC     200    1450
    1       John    Smith     KFC     250    1450
    2       John    Smith     KFC     100    1450
    3       John    Smith     KFC     900    1450
    4       John    Smith      BK     300     300
    5      Susan   Wilson      BK     650    1080
    6      Susan   Wilson      BK     430    1080
    7      Susan   Wilson     WND     218    1953
    8      Susan   Wilson     WND     946    1953
    9      Susan   Wilson     WND     789    1953
    10      Mike    Jones      TB     305     755
    11      Mike    Jones      TB     140     755
    12      Mike    Jones      TB     310     755
    13      Mike    Jones     CHP     750    1610
    14      Mike    Jones     CHP     860    1610