Flameater

Learn computer programming, for beginners and non-programmers


Python DataFrame Operations

This example shows how to summarise, sort and filter data in Python:

#%%

#import applicable libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt 

#this function makes printing nicer
def niceprint(title, msg):
    print(title)
    print(msg)
    print()

#define the basic set of name data
names = ["Aiman","Abu","Awosh","Rhiana","Rhania"]

#generate a lot of purchase records using the names data
np.random.seed(1000)
who=[] #1st list
purchase = [] #2nd list

#ceate members of 1st and 2nd lists randomly
for i in range(0,1001):
    nameidx = np.random.randint(0,5)
    amount = np.random.random()*100
    who.append(names[nameidx])
    purchase.append(amount)

#combine 1st and 2nd list
purchasedata = pd.DataFrame(data=list(zip(who,purchase)),columns=["names","purchases"])
niceprint("generated data",purchasedata)

#Now we are going to create a groupby object to group the data
purgb = purchasedata.groupby("names") #groupby names
pursum=purgb.sum() #summarise one for summation
purcount=purgb.count() #summarise one for record count

niceprint("sum",pursum)
niceprint("count",purcount)

#combine the two summations using joints
purtotals = pd.merge(pursum,purcount,how="inner",on="names")
purtotals.columns = ["amount","count"]
niceprint("merged",purtotals)

#produce sorted results

#sort in place
purtotals.sort_values("amount",ascending="true",inplace=True)
niceprint("sorted in place", purtotals)

#produce new sorted dataframes
sortamount=purtotals.sort_values("amount")
niceprint("sorted by amount",sortamount)

sortcount=purtotals.sort_values("count",ascending=False)
niceprint("sorted by count",sortcount)

#filter data for purchases between 30 and 60
filter30 = purchasedata["purchases"] >= 30
filter60 = purchasedata["purchases"] <= 60
purfiltered = purchasedata[filter30 & filter60]

niceprint("filtered data",purfiltered)




#%%

This is the output

Variables

[55]


generated data
names purchases
0 Rhiana 20.552172
1 Aiman 95.028286
2 Rhania 36.080188
3 Aiman 21.233268
4 Aiman 34.423968
5 Rhiana 23.313220
6 Awosh 77.250273
7 Rhania 39.215413
8 Abu 74.353941
9 Rhania 88.533720
10 Awosh 93.114343
11 Awosh 2.898166
12 Rhania 24.071122
13 Rhiana 36.187707
14 Awosh 86.243159
15 Rhania 65.725351
16 Rhania 84.241124
17 Rhiana 88.509294
18 Rhiana 60.165179
19 Awosh 24.462921
20 Rhiana 69.792510
21 Abu 88.312219
22 Rhiana 43.249917
23 Rhiana 45.237551
24 Aiman 46.969065
25 Rhiana 89.133705
26 Awosh 31.787830
27 Rhiana 4.544794
28 Aiman 60.172093
29 Aiman 66.304169
… … …
971 Rhania 1.978523
972 Aiman 16.752706
973 Rhiana 15.880324
974 Awosh 98.510316
975 Awosh 95.255921
976 Rhania 5.013892
977 Rhania 30.891109
978 Rhania 40.412212
979 Awosh 30.134770
980 Abu 28.299123
981 Abu 45.322875
982 Aiman 95.145937
983 Rhiana 91.555349
984 Rhania 89.832245
985 Abu 48.247856
986 Rhania 66.321709
987 Awosh 96.368585
988 Rhiana 73.525881
989 Awosh 32.661999
990 Rhiana 61.458042
991 Aiman 50.781281
992 Awosh 46.699133
993 Rhania 37.440226
994 Aiman 37.716045
995 Rhiana 1.318825
996 Aiman 2.450063
997 Rhiana 23.211527
998 Rhiana 92.106465
999 Aiman 68.334227
1000 Awosh 88.682560

[1001 rows x 2 columns]

sum
purchases
names
Abu 8603.023475
Aiman 9390.908611
Awosh 10273.769212
Rhania 9834.609034
Rhiana 10550.058871

count
purchases
names
Abu 193
Aiman 199
Awosh 200
Rhania 192
Rhiana 217

merged
amount count
names
Abu 8603.023475 193
Aiman 9390.908611 199
Awosh 10273.769212 200
Rhania 9834.609034 192
Rhiana 10550.058871 217

sorted in place
amount count
names
Abu 8603.023475 193
Aiman 9390.908611 199
Rhania 9834.609034 192
Awosh 10273.769212 200
Rhiana 10550.058871 217

sorted by amount
amount count
names
Abu 8603.023475 193
Aiman 9390.908611 199
Rhania 9834.609034 192
Awosh 10273.769212 200
Rhiana 10550.058871 217

sorted by count
amount count
names
Rhiana 10550.058871 217
Awosh 10273.769212 200
Aiman 9390.908611 199
Abu 8603.023475 193
Rhania 9834.609034 192

filtered data
names purchases
2 Rhania 36.080188
4 Aiman 34.423968
7 Rhania 39.215413
13 Rhiana 36.187707
22 Rhiana 43.249917
23 Rhiana 45.237551
24 Aiman 46.969065
26 Awosh 31.787830
34 Awosh 32.229670
36 Abu 47.401302
37 Rhiana 32.187262
41 Rhania 35.071497
46 Awosh 53.751182
49 Rhiana 39.923924
50 Rhiana 43.964812
55 Awosh 54.916138
58 Rhania 55.957500
61 Abu 54.143928
62 Aiman 47.647761
66 Abu 45.811813
69 Aiman 55.874991
70 Rhania 58.021803
74 Rhiana 56.906421
76 Rhiana 40.131755
77 Abu 51.871345
79 Aiman 51.321477
85 Rhania 45.732367
87 Aiman 44.331968
101 Abu 33.323648
102 Rhiana 44.977947
.. … …
917 Awosh 58.353926
921 Abu 56.150716
922 Awosh 42.710243
923 Rhania 56.262235
924 Rhania 51.589357
925 Rhiana 31.243809
926 Aiman 38.704033
930 Rhiana 33.129981
931 Abu 57.535410
939 Abu 46.969104
949 Rhiana 51.311065
951 Rhiana 45.759879
952 Abu 38.453125
956 Abu 35.500427
960 Rhiana 50.629506
961 Awosh 57.945835
965 Rhiana 44.395549
966 Rhania 34.966344
967 Rhania 56.642133
968 Rhiana 45.764473
977 Rhania 30.891109
978 Rhania 40.412212
979 Awosh 30.134770
981 Abu 45.322875
985 Abu 48.247856
989 Awosh 32.661999
991 Aiman 50.781281
992 Awosh 46.699133
993 Rhania 37.440226
994 Aiman 37.716045

[309 rows x 2 columns]

[56]

Type code here and press shift-enter to run



Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.