Current issue

Vol.26 No.4

Vol.26 No.4

Volumes

© 1984-2017
British APL Association
All rights reserved.

Archive articles posted online on request: ask the archivist.

archive/22/1

Volume 22, No.1

Statistical Methods
Discovered by the Power of APL

by Peter-Michael Hager (email: hager@dortmund.net)

Through its array functionality APL can treat masses of data, piled up in huge spreadsheets or databases, much more intelligently to generate meaningful perceptions, rather than just doing primitive operations such as column reductions.

Odd Man Out

Spotting the ‘odd man out’, the exceptional figure in a mass of numbers, is a recurring task of management. A simple APL technique finds and highlights anomalous figures in arrays, where human eyes are likely to overlook them. By using these results to add colour, the tables immediately gain visual appeal.

In magazines there are sometimes intelligence test with questions like this:

Which number does not belong?

1001 996 1000 1005 2004 998 997 999 1004 1003

Table 1: Magazine’s level 1 IQ question

Of course, it is easy to see that one number is twice as big as the others. Anyway, for us this indeed is a question APL can solve:

      IQuestion
1001 996 1000 1005 2004 998 997 999 1004 1003
   
      0⍕Normalize IQuestion
1 1 1 1 2 1 1 1 1 1  

All we need for that is a little function which will average the values to 1. The rounding to whole numbers will then do the rest:

      Normalize←{(⍵×⍴⍵)÷+/|⍵}

You do not think it should have been necessary to ask APL for a solution? Then let’s take on the next level in the contest.

Which number does not belong?


    507    364  1196    689    611  1235      78    1001  1014  1079

      78      56    184    106      94    190      12      154    156    166

  2691  1932  6348  3657  3243  6555    414    5313  5382  5727

  3393  2436  8004  4611  4089  8265    522    6699  6786  7221

  2457  1764  5796  3339  5922  5985    378    4851  4914  5229

  2886  2072  6808  3922  3478  7030    444    5698  5772  6142

  2847  2044  6716  3869  3431  6935    438    5621  5694  6059

  3900  2800  9200  5300  4700  9500    600    7700  7800  8300

  3471  2492  8188  4717  4183  8455    534    6853  6942  7387

    936    672  2208  1272  1128  2280    144    1848  1872  1992

Table 2: Magazine’s level 2 IQ question

I hope you could find the non-matching number! Again APL can do it for you:

         Sales←(10?100)∘.×(10?100)
         Sales[5;5]×←2
         Sales
    507  364 1196  689  611 1235  78 1001 1014 1079
     78   56  184  106   94  190  12  154  156  166
   2691 1932 6348 3657 3243 6555 414 5313 5382 5727
   3393 2436 8004 4611 4089 8265 522 6699 6786 7221
   2457 1764 5796 3339 5922 5985 378 4851 4914 5229
   2886 2072 6808 3922 3478 7030 444 5698 5772 6142
   2847 2044 6716 3869 3431 6935 438 5621 5694 6059
   3900 2800 9200 5300 4700 9500 600 7700 7800 8300
   3471 2492 8188 4717 4183 8455 534 6853 6942 7387
    936  672 2208 1272 1128 2280 144 1848 1872 1992
         0⍕Normalize Sales
    1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1
    1 1 1 1 2 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 
    1 1 1 1 1 1 1 1 1 1   
    1 1 1 1 1 1 1 1 1 1
	

Imagine that the numbers in Sales are the revenues of item groups in several regions. Then this function will show, by colouring a table, that there were far more jackets sold in Germany, when compared to other regions. It would have been difficult to get that information just by studying the numbers.

Region \ Items Pants Jeans Shirts Skirts Jackets Suits Sweater Slippers Sneakers Socks
Austria
507
364
1196
689
611
1235
78
1001
1014
1079
Belgium
78
56
184
106
94
190
12
154
156
166
Denmark
2691
1932
6348
3657
3243
6555
414
5313
5382
5727
France
3393
2436
8004
4611
4089
8265
522
6699
6786
7221
Germany
2457
1764
5796
3339
5922
5985
378
4851
4914
5229
Great Britain
2886
2072
6808
3922
3478
7030
444
5698
5772
6142
Holland
2847
2044
6716
3869
3431
6935
438
5621
5694
6059
Italy
3900
2800
9200
5300
4700
9500
600
7700
7800
8300
Spain
3471
2492
8188
4717
4183
8455
534
6853
6942
7387
Sweden
936
672
2208
1272
1128
2280
144
1848
1872
1992

Table 3: Sales chart

To handle similar variables of higher rank, it is necessary to extend the Normalize function. Now it will average one dimension after another. As averaging a further dimension changes the result of previously averaged dimensions, this needs to be iterated several times until the result becomes stable. Fortunately this algorithm converges with any number of dimensions.

Normalize←{ ⍝ Normalize an array of arbitrary rank - Dyalog V11 version
    AxialNormalize←{(⍵×⍺⊃⍴⍵)÷[(⍳⍴⍴⍵)~⍺]+/[⍺]|⍵}
    LeftChained←{⊃⍺⍺/(⌽⍺),⊂⍵}
    Preprocess←{⍵+1E¯180×0=⍵}
    Postprocess←{⍵×1E¯160<|⍵}
    Postprocess(⍳⍴⍴⍵)AxialNormalize LeftChained⍣≡Preprocess ⍵
}

Normalize←{ ⍝ Normalize an array of arbitrary rank - Compatible version
    AxialNormalize←{(⍵×⍺⊃⍴⍵)÷[(⍳⍴⍴⍵)~⍺]+/[⍺]|⍵}
    LeftChained←{⊃⍺⍺/(⌽⍺),⊂⍵}
    Until←{⍵ ⍵⍵ x←⍺⍺ ⍵:x ⋄ ∇ x}
    Preprocess←{⍵+1E¯180×0=⍵}
    Postprocess←{⍵×1E¯160<|⍵}
    Postprocess{(⍳⍴⍴⍵)AxialNormalize LeftChained ⍵}Until≡Preprocess ⍵
}

(The Preprocess and Postprocess functions are required to circumvent zero division problems.)

This population chart illustrates the power of the method very clearly:

Age \ Year
1999
2000
2001
2002
2003
0
771223 766554 735755 719250 706449
1
788905 772940 772147 739547 722845
2
814425 789766 775918 773772 741235
3
799778 814674 792198 777811 775035
4
773303 799910 816966 794141 778756
5
782784 773740 802195 819000 794971
6
814607 782802 775663 803429 819102
7
833793 814593 784716 777074 803783
8
866345 834766 817082 786980 778263
9
954074 867444 837502 819359 788064
10
943186 955399 870230 840052 820427
11
970098 945001 958622 873253 841560
12
949410 972105 948491 961356 874970
13
933225 951793 975676 951535 963201
14
901858 935755 955683 979021 953449
15
900050 905014 940218 959420 981339
16
908955 904300 911412 945119 962594
17
936004 913630 910945 916538 948796
18
937447 942036 922226 917900 921679
19
950649 947452 955826 934459 927797
20
911786 963406 963199 969933 945735
21
908151 922857 978755 976442 980454
22
910085 920340 939322 992803 987005
23
905701 921590 936609 952985 1003813
24
890669 916064 936232 948860 962464
25
912320 898814 928552 946009 956147
26
931254 918245 909454 936263 951628
27
1029212 935974 927431 916169 940907
28
1144718 1032886 943227 933124 919707
29
1202052 1147340 1038983 947840 935548
30
1296324 1203587 1151707 1042534 949498
31
1362258 1296851 1207656 1154077 1043319
32
1398811 1362199 1299552 1209961 1154414
33
1439408 1398313 1364225 1300709 1209537
34
1450859 1438889 1399998 1364686 1300032
35
1477274 1450005 1440128 1400187 1363600
36
1469050 1476581 1450670 1439945 1398377
37
1424872 1468164 1476899 1450555 1438447
38
1404723 1424090 1468690 1476463 1448475
39
1368060 1403988 1424674 1468226 1474533
40
1326138 1367322 1404263 1424198 1466359
41
1261066 1325466 1367630 1403660 1422470
42
1238416 1260096 1325031 1366909 1401548
43
1210553 1237357 1259536 1324022 1364641
44
1177514 1208956 1236607 1258214 1321898
45
1161536 1176030 1207862 1234722 1256051
46
1128557 1159795 1174673 1205707 1232037
47
1133521 1126482 1157765 1172503 1202886
48
1113454 1131138 1124315 1155464 1169433
49
1123726 1110862 1128577 1121436 1152162
50
1087755 1120698 1108022 1125457 1117790
51
1003191 1084091 1117385 1104667 1121792
52
947557 999443 1080585 1113405 1100655
53
834565 943614 995557 1076523 1108775
54
742176 830302 939581 991293 1071635
55
987985 737949 826037 934937 986273
56
1006682 981853 733724 821345 929925
57
981990 999971 975354 729066 816200
58
1189781 975077 992976 968455 723816
59
1258458 1181002 968067 985523 960921
60
1235197 1247770 1171255 959970 976976
61
1153916 1223925 1236424 1161033 951271
62
1078082 1142454 1212233 1224590 1150029
63
1049741 1066784 1130839 1199711 1211778
64
1012960 1037232 1054855 1118264 1186400
65
940888 999448 1024140 1042034 1104720
66
757258 926714 985463 1010252 1028050
67
752064 744532 912376 970422 995364
68
762724 738522 731607 896637 954207
69
799984 747726 724474 717779 880013
70
770179 782641 732099 709693 703388
71
758324 752048 764629 715508 693668
72
695572 738595 732999 745642 697782
73
673646 676352 718511 712768 725229
74
655313 653544 656444 696700 691416
75
595573 633548 632558 634908 673450
76
568893 574342 610897 610122 613167
77
573642 545926 551956 587216 586192
78
574718 547767 521892 528378 561897
79
537637 545651 520874 496753 503089
80
384281 506703 515478 492383 469764
81
235001 360199 475438 483677 462122
82
210078 218196 335240 442751 450653
83
214385 193387 201446 309447 408189
84
255958 195477 177109 184069 282744
85+
1635134 1612696 1540092 1451680 1374891
Table 4: German population chart


Differences between Products

The second sample will show you how to quantify the reasons for deviations in calculations. Though a standard solution already exists, by doing APL examinations we quickly find out that there is something wrong with it, and we can come to more meaningful and realistic results.

A pre-calculation in its simplest form consists of the addition of item expenses, each being a product of price × quantity. These can then be compared by a post-calculation against the actual prices and quantities.

The differences between pre- and post-calculation should desirably be expressible as components of the costs, as only those can get summarized.

Item

Pre-Calculation

Post-Calculation

Diff.

Price

Qty.

Costs

Price

Qty.

Costs

Bricks

0.75 €

165 pc

123.75 €

0.80 €

135 pc

108.00 €

-15.75 €

Cement

2.50 €

5.0 lb

12.50 €

2.70 €

6.0 lb

16.20 €

3.70 €

Sand

0.30 €

2.0 lb

0.60 €

0.40 €

30.0 lb

12.00 €

11.40 €

Total

 

136.85 €

 

136.20 €

-0.65 €

Table 5: Bricklayer’s calculation

To determine the reasons for the differences, we need to do a more detailed analysis on a per item basis.

In the German literature about management-economics I found the chart below to explain the concept. (Sorry about not having found it in my English literature[1].)

The pre-calculated costs of an item are represented by the grey field. The difference to the post-calculated cost consists of three fields: one resulting from the difference in the price; one from the difference in quantity; and a third one is called mixed deviation.

 

diagram

Chart 1: 1st grade and 2nd grade deviations
[Ref: Zitzmann, Betriebliches Rechnungswesen, Plankosten (chapter 9, page 31), Berner Fachhochschule]

Differences in the prices are caused by inflation or other changes in the market. For differences in quantities the calculating person is responsible. Fortunately the third component, the mixed deviation, is not very much in this sample, so economists tend to neglect it.

Applied to the bricklayer sample, these differences may be separated into the three different causes:

Item

Pre-Calculation

Post-Calculation

Differences by

Price

Qty.

Costs

Price

Qty.

Costs

Price

Qty.

Mixed

Bricks

0.75 €

165 pc

123.75 €

0.80 €

135 pc

108.00 €

8.25 €

-22.50 €

-1.50 €

Cement

2.50 €

5.0 lb

12.50 €

2.70 €

6.0 lb

16.20 €

1.00 €

2.50 €

0.20 €

Sand

0.30 €

2.0 lb

0.60 €

0.40 €

30.0 lb

12.00 €

0.20 €

8.40 €

2.80 €

Total

 

136.85 €

 

136.20 €

9.45 €

-11.60 €

1.50 €

Table 6: Bricklayer’s calculation with 1st grade and 2nd grade deviations

Though the total differences in the first table were as low as 65¢, they now expose as being 9.45€ and 11.60€, and furthermore having a mixed difference of 1.50€.

When exchanging the pre- and post-calculation values, peculiarly not only the deviations’ signs change, but also its values:

Item

Post-Calculation

Pre-Calculation

Differences by

Price

Qty.

Costs

Price

Qty.

Costs

Price

Qty.

Mixed

Bricks

0.80 €

135 pc

108.00 €

0.75 €

165 pc

123.75 €

-6.75 €

24.00 €

-1.50 €

Cement

2.70 €

6.0 lb

16.20 €

2.50 €

5.0 lb

12.50 €

-1.20 €

-2.70 €

0.20 €

Sand

0.40 €

30.0 lb

12.00 €

0.30 €

2.0 lb

0.60 €

-3.00 €

-11.20 €

2.80 €

Total

 

136.20 €

 

136.85 €

-10.95 €

10.10 €

1.50 €

Table 7: Same as Table 6 with exchanged values for pre- and post-calculation

At this point, one began to have doubts about having found an appropriate solution for the problem. The situation worsens drastically when item costs are the result of three factors:

Costs = Price × Quantity × Rebate Factor [2]

In such cases you will get:

  • One deviation based on price difference,
  • One deviation based on quantity difference,
  • One deviation based on rebate difference and
  • Four mixed deviation components.

As it is not unusual that the sum of these four mixed components will be bigger than any of the three meaningful components, the usability of this method becomes rather questionable. So we must search for a better strategy.

To help you follow the quite terse functions, I will explain the resulting strategy for two factors first.

These are the conditions:

a × b + x + y = d × e

a × b + y = a × e

a × b + x = d × b

What we further need, are these two factors…

p = (b × d) E

q = (a × e) E

...to offer this solution:

x = (d - a) × (b + (e - b) × p / (p + q))

y = (e - b) × (a + (d - a) × q / (p + q))

With the exponent E=0, the mixed deviation is cut into halves:

x = (d - a) × (b + (e - b)/2)

y = (e - b) × (a + (d - a)/2)

With E=1, the bigger causer gets the bigger part of a deviation:

x = (d - a) × (b + (e - b) × (b × d) / ((b × d) + (a × e)))

y = (e - b) × (a + (d - a) × (a × e) / ((b × d) + (a × e)))

Here is the same as a dynamic function:

ProDiff←{E←1
     b←⌽a←⍺+1E¯100
     d←⍵+1E¯100
     h←⌽g←d-a
     q←⌽p←(d×b)*E
     g×b+(h×p)÷p+q}

The additions in lines 2 and 3 are required to avoid division by zero.

These are the calculation results where the deviations are separated according to their origins.

Item

Pre-Calculation

Post-Calculation

Differences by

Price

Qty.

Costs

Price

Qty.

Costs

Price

Qty.

Bricks

0.75 €

165 pc

123.75 €

0.80 €

135 pc

108.00 €

7.50 €

-23.25 €

Cement

2.50 €

5.0 lb

12.50 €

2.70 €

6.0 lb

16.20 €

1.10 €

2.60 €

Sand

0.30 €

2.0 lb

0.60 €

0.40 €

30.0 lb

12.00 €

1.60 €

9.80 €

Total

 

136.85 €

 

136.20 €

10.20 €

-10.85 €

Table 8: Bricklayer’s calculation of two factors with E=0 (mixed component shared equally)

 

Item

Pre-Calculation

Post-Calculation

Differences by

Price

Qty.

Costs

Price

Qty.

Costs

Price

Qty.

Bricks

0.75 €

165 pc

123.75 €

0.80 €

135 pc

108.00 €

7.40 €

-23.15 €

Cement

2.50 €

5.0 lb

12.50 €

2.70 €

6.0 lb

16.20 €

1.09 €

2.61 €

Sand

0.30 €

2.0 lb

0.60 €

0.40 €

30.0 lb

12.00 €

0.43 €

10.97 €

Total

 

136.85 €

 

136.20 €

8.92 €

-9.57 €

Table 9: Bricklayer’s calculation of two factors with E=1 (appropriately shared to its causers)

With the exponent set to E=1 we get the best matching results, as it shares the mixed deviation component into parts appropriate to its cause.

As previously mentioned, these are calculation samples consisting of three factors. The four mixed deviations could get removed, and the calculation concept is symmetric. [3]

Item

Pre-Calculation

Post-Calculation

Differences by

Price

Qty.

Reba.

Costs

Price

Qty.

Reba

Costs

Price

Qty.

Reba

Bricks

0.75

165

-12 %

108.90

0.80

135

-16 %

101.52

6.82

-21.17

6.97

Cement

2.50

5.0

-5 %

11.88

2.70

6.0

-16 %

15.23

1.04

2.46

-0.14

Sand

0.30

2.0

-20 %

0.48

0.40

30.0

0 %

12.00

1.49

8.87

1.17

Total

 

 

 

121.26

 

 

 

128.75

9.34

-9.84

-7.99

Table 10: Bricklayer’s calculation of three factors with E=0 (mixed component shared equally)

 

Item

Pre-Calculation

Post-Calculation

Differences by

Price

Qty.

Reba.

Costs

Price

Qty.

Reba

Costs

Price

Qty.

Reba

Bricks

0.75

165

-12 %

108.90

0.80

135

-16 %

101.52

6.73

-20.98

6.88

Cement

2.50

5.0

-5 %

11.88

2.70

6.0

-16 %

15.23

1.03

2.46

-0.14

Sand

0.30

2.0

-20 %

0.48

0.40

30.0

0 %

12.00

0.41

10.81

0.31

Total

 

 

 

121.26

 

 

 

128.75

8.17

-7.72

7.04

Table 11: Bricklayer’s calculation of three factors with E=1 (appropriately shared to its causer)

In detail, 8.17€ of the difference are inflation based, -7.72€ are based upon the calculator’s fault, and 7.04€ are the result of a purchase manager who failed to ask for the discounts.

ProDiff←{ ⍝ Distribute differences to their causers
    E←1
    2=⍬⍴⌽⍴⍵:⍺{
      b←⌽a←⍺+1E¯100
        d←⍵+1E¯100
        h←⌽g←d-a
        q←⌽p←(d×b)*E
        g×b+(h×p)÷p+q}⍵
    3=⍬⍴⌽⍴⍵:⍺{
        c←1⌽b←1⌽a←⍺+1E¯100
        d←⍵+1E¯100
        i←1⌽h←1⌽g←d-a
        r←1⌽q←1⌽p←a*E
        s←d*E
        pt←1⌽rs←r×s
        pu←2⌽qs←q×s
        puq←1⌽rtp←1⌽qsr←qs×r
        g×(b×c)+((h×c×qs)÷qs+pt)        ...
               +((i×b×rs)÷rs+pu)        ...
               +(h×i×qsr)÷qsr+rtp+puq}⍵
    ⍵-⍺
}
    
         8 2⍕PreCalc
      0.75  165.00    0.88
      2.50    5.00    0.95
      0.30    2.00    0.80
         8 2⍕PostCalc
      0.80  135.00    0.94
      2.70    6.00    0.94
      0.40   30.00    1.00
         8 2⍕PreCalc ProDiff PostCalc
      6.73  ¯20.98    6.88
      1.03    2.46   ¯0.14
      0.41   10.81    0.31

[1] Some translations: Preis = price; Menge = quantity; Preisänderung = price change; Planmenge = planned quantity; Planpreis = planned price; Plankosten = planned costs; Preisabweichung = deviation by price; Grad = grade; Mengenabweichung = deviation by quantity; Gemischte Abweichung = mixed deviation.

[2] Though a rebate is usually declared as a percentage, it is easy to transform into a multipliable component. e.g. 3.5% corresponds to a rebate factor of 0.965.

[3] This means, when exchanging the left and the right argument, the result will change its sign, not its values. When having three factors, the APL solution does not fulfil this requirement completely, because the full solution would be an iterative formula, which has been terminated at an acceptable precision.

October 2005

Peter-Michael Hager
HAGER-ELECTRONICS GmbH
Hamburger Str. 97, 44135 Dortmund, Germany
email: hager@dortmund.net

script began 5:43:57
caching off
debug mode off
cache time 3600 sec
indmtime not found in cache
cached index is fresh
recompiling index.xml
index compiled in 0.2611 secs
read index
read issues/index.xml
identified 26 volumes, 101 issues
array (
  'id' => '10003450',
)
regenerated static HTML
article source is 'HTML'
source file encoding is 'UTF-8'
URL: mailto:hager@dortmund.net => mailto:hager@dortmund.net
URL: #_ftn1 => art10003450#_ftn1
URL: hager1.jpg => trad/v221/hager1.jpg
URL: #_ftn2 => art10003450#_ftn2
URL: #_ftn3 => art10003450#_ftn3
URL: #_ftnref1 => art10003450#_ftnref1
URL: #_ftnref2 => art10003450#_ftnref2
URL: #_ftnref3 => art10003450#_ftnref3
URL: mailto:hager@dortmund.net => mailto:hager@dortmund.net
completed in 0.2888 secs