Aggregate Data. 1

Arrays. 2

Design Issues for Arrays. 2

Array Notation. 3

Declaring Arrays. 4

Array Size. 5

Determining the Lower- and Upper-Bound of an Array. 6

Array Organization. 6

Array Starting Positions. 7

Where Do Arrays Start?. 8

Where Do Arrays End?. 9

How Many Elements Does 10 Get Us?. 9

Initializing Arrays. 10

Perl Arrays. 12

FORTRAN 90 Arrays. 13

Questions: 14

Array Traversing. 15

Array Operations===where does this go??. 16

Using foreach with Arrays. 16

Array Cross Sections. 16

Array Slices and Sections. 17

Design Issues for Array Slices. 18

Negative Array Indexes. 19

Perl Arrays. 20

Python Arrays. 20

PL/I Label Arrays. 20

Array of Size Zero. 20

Perl Shift and Push for Arrays. 21

Ragged Arrays. 22

JavaScript Arrays. 22

Ruby Arrays. 22

Questions: 22

 

 

Aggregate Data

 

Simple variables are called elementary data items. They are manipulated as a unit. For example here are some elementary or simple variables:

 

   int count = 0;

   float rate = 15.50;

 

But often we want groups or aggregations of elementary data objects. Examples are arrays or records. Objects such as arrays or records are called data structures. They are an aggregate of other data objects. In this chapter, I go over arrays. There are so many ways to handle arrays in the different programming languages, that we could devote a whole book to just arrays. But this chapter will provide a fairly good review on the many programming techniques used with arrays and will point the reader to languages that can be studied for those that want more than what is provided here.

 

Arrays

Arrays allow us to represent many values with one variable name. In some languages (e.g. COBOL) arrays are called tables. An individual value in an array is called an element. A subscript or index is used to indicate which one of the array elements we want to use.

 

Design Issues for Arrays

While arrays work about the same in most languages there are some interesting differences. Here are some design issues for arrays.

 

 

There are arrays in some language that has each of the above choices.

 

Most descriptions of arrays would indicate that an array is group of homogenous type elements, that is the same type (like all integer or all double). But then scripting languages such as JavaScript, Perl, and Ruby showed up with arrays that can have elements of any type, including other arrays!

 

Some languages do great things with arrays. If you are interested in what a language can do with arrays, you might look at APL, which has an extensive set of array operations. But APL needed many special characters for operations and thus needed special input and output devices, which probably doomed the language. Other languages that use regular keyboards but still do wonderful things with arrays are FORTRAN 77 (and later versions), PL/I, and Ada. Modern scripting languages have very interesting array operations.

 

Array Notation

One interesting challenge is how do we tell arrays from other things like functions. This is sometimes called name space for arrays. ALGOL 60 contributed the square bracket [ ] for enclosing array subscripts. Since FORTRAN was already available the people designing ALGOL had probably run into the problem of using round parentheses for both functions “sqrt(85.0)” and arrays “taxrate(4)”. For example, if we have the following FORTRAN statement:

 

     z = 2 * baz(z)

 

Is baz an array or a function? In FORTRAN baz could be either an array or a function since both use round parentheses. This makes syntax checking for the FORTRAN compiler difficult and error messages often misleading. This problem is covered in more detail in the Parentheses section of chapter 1.

 

The second question with using arrays, is how do we tell arrays from simple variables? Can we have an array name and a variable name that are the same in the program? Thus can we have an array named rates[] and a variable names rates? Each language has its own rules about this. But just reading a programming textbook will often give few hints. Try using identical array and variable names in the same program and see what happens. You can make some sophisticated guesses on what will happen. For example, languages that have array operations will make it impossible to have identical names for arrays and variables. For example, some languages (e.g., PL/I) allow this pseudocode for the array rates:

 

   rates = 0

 

In these languages all elements of the rates array are set to zero. But this would mean we can not have a simple variable by the name of rates also. Array operations are covered later on in this chapter.

 

The third array notation question is how to indicate multiple-dimensioned arrays. In FORTRAN and other languages, we use:

 

     dimension scores(5,4)

 

where array dimensions are listed in one set of parentheses. But the C family of languages use:

 

     int scores[5][4];

 

where each array dimension is in a separate set of brackets. Table x.1 illustrates some two dimensional array examples by language.

 

Notation

Languages

rates(2, 3)

FORTRAN, BASIC, COBOL, Ada

rates[2][3]

Java, C++, C#, Modula-2

rates[2, 3]

C#, Pascal, Modula-2

Array Notation by Language

Table x.1

 

As you can see, most older languages use parentheses for their arrays (the early input devices did not have brackets), but newer languages use brackets. Thus the newer languages can clearly indicate that zap(x) is a function, and zap[x] is an array. C# and Pascal use only one set of brackets, (i.e., rates[2, 3]), for a multidimensional array, while Java and C++ use a set of brackets, (i.e., rates[2][3]), for each dimension of the array. The situation where C# uses two sets of brackets is a special situation, which is covered later in this chapter when ragged arrays are discussed. Modula-2 allows either [i, k] or [i][k] for array indexes.

Declaring Arrays

The declaring of arrays is similar in many languages but there are a few variations. To illustrate the differences between languages we will declare the same arrays in several languages. First, we declare a 1-dimensional integer array maxy with 10 elements. Next, we declare a floating point 2-dimensional array total with 3 rows and 5 columns. Finally, if possible in that language, we declare a floating point array tides with indexes from –5 to 5. Here are some examples by language.

 

Java/C++/C:

     int maxy [9];     //arrays start at zero.

     float total [2][4];

 

Ada:

     type maxy is array (1..10) of integer;

     type total is array (1..3, 1..5) of float;

     type tides is array (-5..5) of float;

 

ALGOL:

     integer array maxy [1:10];

     real array total [1:3, 1:5];

     real array tides [-5:5];

 

C#:

     int[] maxy = new int[9];     //arrays start at zero.

     float[,] total = new float[2,4];

 

COBOL:

     05  MAXY    PICTURE 9999 OCCURS 10 TIMES.

     05  ROWS-TOTAL  OCCURS 3 TIMES.

         10  TOTAL  PICTURE 9999V99  OCCURS 5 TIMES.

 

FORTRAN IV:

     DIMENSION MAXY(10), TOTAL(3,5)   !arrays start at one.

     REAL, DIMENSION (-5:5) :: TIDES  ! FORTRAN 77

 

Pascal, Moldula-3 similar):

     var maxy = array [1..10] of integer;

         total = array [1..3, 1..5] of real;

         tides = array [-5..5] of real;

 

As you can see the array declarations are slightly different by language. The numbers used for the array bounds change depending on whether the array starts at zero or one. Some of the languages can handle arrays that use indexes of any starting value, like used for the array tides. “Modern” languages like ALGOL-60 (joke) can do tricky things like that. It is interesting that few modern programming languages can start an array at any place except zero.

Array Size

How array size is indicated has several variations. First, can we use variables or just constants for array size?

 

     dim taxrate(100)   or dim taxrate(size)

 

Some languages, such as early FORTRAN, limit array declarative sizes to just integer literals. But even in FORTRAN we can pass a variable to a function or subroutine and then use the variable for the array size:

 

     FUNCTION SUM(TAXRATE, SIZE)

     DIM TAXRATE(SIZE)

 

While array bounds are often restricted to an integer or integer expression, a few languages are more generous. For example, in BASIC, we can use:

 

     dim tax(11.7)

 

and the array size 11.7 will be rounded to 12. I am not clear why we need this but I am sure many people can tell me. Other languages allow real numbers to be used but truncate the value to an integer.

 

The C family allows variables to be used if the variable has a value before execution of that function. In C we could use #define to do what we need.

 

   #define SIZE 100

   main() {

     int taxrate(SIZE);

     int otherrates(SIZE * 2);

 

Not only can we use the integer constant SIZE, but we an also use integer expressions (SIZE * 2) to indicate the size of the array. In C++ we can change the #define line to:

 

     const int SIZE=100;

 

and the rest would be the same.

 

Determining the Lower- and Upper-Bound of an Array

 

Mid versions of BASIC allowed any starting and ending value. So we could have arrays like the following:

 

     dim years (1998 to 2008)

     dim tide.level(-5 to +5)

 

Many “modern” languages do not allow this freedom with declaring arrays, instead making the programmers deal with array subscript offsets (often unsuccessfully) to take care of a very simple situation.

 

Since BASIC allows programmers to indicate both the lower and upper array bound [ DIM TIDE (-3 to 3) ] and to change the array bound while execution (instead of the constant 3, variables can be used), it is useful to be able to find out what the present bounds of the array are. QBASIC has the functions lbound(array-name, n) and ubound(array-name, n) where n is the nth dimension, to determine these array bounds. Thus for the array

 

   DIM TIDE (-5 to 3)

 

Lbound(TIDE, 1) returns –5, and Ubound(TIDE, 1) returns 3. Many languages including C#, Perl, and Python have methods or functions to determine the length of the array.

Array Organization

 

FORTRAN was one of the first high-level language to have arrays. Since FORTRAN is for scientific programmers, the arrays were organized a little differently. For example, FORTRAN arrays are organized by column (column major order) instead of by row. This means the first subscript increases the most rapidly, followed by the second subscript, and so on. For example:

 

     DIMENSION  K(4, 3)

 

declares an array of 12 elements, and the order in memory will be K(1,1), K(2,1), K(3,1), K(4,1), K(1,2), ....

 

For many situations it may not matter how the array is stored in memory, but it does matter for operations that transmit entire arrays. One example is input and output, which FORTRAN can handle without specifying the elements:

 

            READ *, K

 

The above will read the next 12 items of data into the array K. If we assume the input data is the first 12 integers then the array will look like this when full:

 

     1  5   9

     2  6  10

     3  7  11

     4  8  12

 

There are other operations such as use of DATA statement to fill an array where similar results will occur. FORTRAN was one of the few languages to store arrays by column.

 

The next historical language, COBOL stores arrays by row (row major order). COBOL and most other languages store elements by row. So if we read in the first 12 integers, the array would look like this in most other languages:

 

      1   2   3

      4   5   6

      7   8   9

     10  11  12

 

Today most languages fill and process arrays by row instead of columns. But for the early mathematical programmers multi-dimension arrays were just multiple vectors and that arrangement seemed correct. FORTRAN DO loops can be used to change the order of array processing. Business applications treat arrays like records, and thus like row major order. But scientists often need columns of numbers and then column major order is more useful. It is interesting that all modern languages have followed the COBOL order instead of the FORTRAN order.

 

One unanswered or at least not agreed on characteristic for arrays is where to start them. Should arrays start with element zero or element one? If you ask programmers this question you can probably figure out what language they started programming in. Next, if we start arrays at zero and ask for 5 elements, do we get elements 0-5 or 0-4? The next sections will survey how different languages have answered these questions.

 

Array Starting Positions

In FORTRAN II and IV arrays automatically start with element 1, and cannot start with a lower or higher value. FORTRAN 77 changed this, allowing negative or positive array starting positions. Thus in FORTRAN 77 and later versions we can declare arrays like the following:

 

  

   integer years(1998:2008)

   real tides (-3:5)

   integer gamepoints (-5:5, 2:5)

 

So the subscripts of the arrays years go from 1998 to 2008, the subscripts of the array tides goes from –3 to 5 (-3, -2, -1, 0, 1, 2, 3, 4, 5), and gamepoints is a two-dimensional array with negative values on the first index.

 

Ada allows similar arrays. The above arrays would look as follows:

 

   years: array (1998..2008) of integer;

 

Visual Basic grants the same wide variety in declaring arrays.

 

Modula-2 and its related languages allow array indexes to start at any value (positive or negative) and allow other cardinal values for subscripts. Here are a few examples:

 

   TYPE

     TideLevel = ARRAY [-10..10] OF REAL;

     vector = ARRAY [1..50] OF REAL;

     lcaseCount = ARRAYa”..”z”] OF INTEGER;

     TruthType = ARRAY BOOLEAN OF INTEGER;

 

Most of these are understandable, except maybe the last line that uses FALSE and TRUE for the subscript.

Where Do Arrays Start?

As you can see by the above there is common question of where arrays start, zero or one. ALGOL got around that problem by using a bound pair for declaring the arrays (see above) and thus the beginning array subscript is indicated.

 

Why do C arrays start at zero instead of one? The address of the array is the beginning of the array. Thus the array address points to the first (zero) element of the array. So for the array rates, all the following point to the zero element (in C) of the array:

 

   *rates

   *(rates+0)

   rate[0]

 

After C started their arrays this way most modern languages adopted the same system of starting all arrays at the zero element.

 

In early BASIC, arrays of size ten or less elements did not have to declared. The subscripts went from 1 to 10. In the second edition of BASIC, the zero subscript was allowed, so now we could use A(0) or B(0,0). The other interesting item is that if you declare an array

 

            DIM X(20)

 

you actually have elements X(0) to X(20) available (21 elements). Likewise, an array of B(4,5) goes from B(0,0) to B(4,5) which gets you 30 elements instead of the 20 you get with Java or C. In C when we ask for int x(20) we start at element 0 but only get to element 19, which I have never liked. Too bad Kernighan and Ritchie didn’t read BASIC manuals more, or perhaps they did and rejected my suggestion. They are a lot smarter than me, so I assume they did the right thing. One can easily argue that either approach is best. That is, a declaration of x(20) will get either elements 0-19 or 0-20.

 

Since BASIC was designed for amateur programmers, the interpreter checked for array bound errors. Not many early languages do this for us, probably because this error checking would be a lot more difficult for a compiler than an interpreter. Nowadays computers are very fast so you see C# and VB checking array indexes.

 

There are some problems here, if the language does array processing. For example, if we have a command that prints arrays do we print the zero element (row) or not?

 

QBASIC and Visual BASIC 6.0 have solved this problem by letting you set the default to be zero or 1. But then all the arrays in that program start at zero or one in that language. To do this in these versions of BASIC we use the statement:

 

     option base 0

 

The only choices for option base are zero or one. Since arrays can have a starting and ending value in newer version of BASIC, individual arrays can override this use of option base.

 

Where Do Arrays End?

Since we cannot agree on where arrays start (zero, one, or elsewhere), it is not surprising the end of the array is also not agreed on. In FORTRAN and BASIC an array declared of size 10, ends with the 10th subscript. While in the C family of languages the array declared of size 10, ends with the 9th subscript.

 

How Many Elements Does 10 Get Us?

In FORTRAN (elements 1-10) and C (elements 0-9), we get 10 elements. While in BASIC (elements 0-10) we now get 11 elements. I have always thought the BASIC solution was best, since I do not have to use the zero element, if not interested, and I did not have to ask for 11 elements (like in C) if I want the element with a subscript of 10. It is interesting how many people feel strongly about either solution.

 

Thus when we declare an array of 10 elements, we get a different starting subscript (0 or 1), and a different number of elements, depending on the language. The following table illustrates this in different languages:

 

Language

Element range

# of elements

FORTRAN, PL/I

1-10

10

C++/Java/C#

0-9

10

VB .Net/QBasic

0-10

11

Array Elements by Language

Table x.x

 

The above table shows some of the variations and there are many languages in each category.

 

Initializing Arrays

Once we have arrays, we need some way to initialize the array elements. C/C++ automatically initializes external and static array elements to zero, but does not initialize local arrays, much to the dismay of many programming students. C# initializes all arrays when they are declared. Most languages do not initialize array elements to any value and have no easy way to initialize large number of elements to the same value.

 

For example, suppose we have an array KZAP of 10 elements and want to initialize the first three elements to zero, the next four to 9, and the last two elements to 3. In FORTRAN we would do the following:

   DIMENSION KZAP(10)

   DATA KZAP /3*0, 4*9, 2*3/

 

Many a time have I wanted some way to do something similar in C++.  In Java or C++ we would do the following:

 

   int kzap[9] = {0,0,0,9,9,9,9,3,3};

 

which is not too bad as long as I do not need 100 elements instead of 10 elements.

 

PL/I improved on the FORTRAN method as follows:

 

   DECLARE KZAP(10) FLOAT DECIMAL

           INITITAL((3)0, (4)9, (2)3);

 

which will initialize the array to the same values as the previous two languages ( 3 zeros, 4 nines, and 2 threes. This is an improvement because the array declaration and initialization is done in one statement rather than two. FORTRAN needs both a DIMENSION and a DATA statement. Both PL/I and FORTRAN have ways to repeat values in other declarations. This seems like a very nice method of initializing array elements.

 

Python allows us to repeat values for initializing arrays but their syntax is a little different. If we want an array named numbers with the first eight integers we could do this in Python:

 

   numbers = [1,2,3,4,5,6,7,8]

 

But if we wanted eight zeros instead we could do this:

 

   numbers = [0]*8

 

Likewise, we obtain an array with values similar to the previous PL/I example as follows:

 

   numbers = [0]*3 + [9]*4 + [3]*2

 

Thus all these languages have easy ways to initialize elements in an array to multiple copies of the same value.

 

We can initialize only part of the array in most languages as follows:

 

   int kzap[9] = {0,,2,,4,,6,,8,};

 

The above line would initialize the even-number elements, but leave the odd-number elements not initialized. While this may be the intention, the more likely occurrence is an extra comma being a mistake and then one element left out as follows:

 

   int kzap[9] = {0,1,2,3,4,,6,7,8,9};

 

In this code, element five was not initialized because of the extra comma. Ada, in its great wisdom, has the rule that if any element is initialized, all elements need to be initialized. Do you like this rule?

 

Ada has many ways to initialize an array. Suppose, we have an Ada array named codes that has five elements. Here are some ways to initialize that array in Ada:

 

   type codes is array (1..5) of integer;

 

   codes := (4, 1, 6, 3, 99); --positional association.

   codes := (1..5 =>4);      --sets all element to 4.

   codes := (1|3|5 =>9, others => 7); --uses an or.

   codes := (1..3 =>x, others => z); --use variables.

   codes := (4=>99, 2=>7, 3=>88, 1=>72, 5=>78);

          --above uses subscript to indicate value.

 

The composite symbol => is assignment. For you few students not programming (joke) in Ada (1..5 =>4) assign the value 4 to elements 1 to 5. Few "modern" languages can match this Ada flexibility of array initialization. Critics would say few modern languages would want to.

 

Trying to keep compatible with previous versions of BASIC and Visual Basic hampers Visual Basic. It takes two steps to declare and to initialize the array elements to non-zero values. For example, we do this in Visual Basic 6.0:

 

   Dim intSums(5) as Integer   ‘ declares 6 elements and

                               ‘  initializes them to zero.

   Dim intAges() as Integer = {0, 5, 12, 18, 45, 49}

 

In Visual Basic 6.0 we cannot declare the upper size of the array and initialize the values in the same statement. Thus the parentheses in intAges() is left empty. Some languages do the opposite, but then you can have conflicts with the elements asked for and the values supplied.

Perl Arrays

In Perl the @ symbol indicates we are using an array. Perl allows us to use the range operator to initialize arrays. For example:

 

   @numbs = (1, 2, 3, 4, 5)

 

will set the first five elements of array @numbs to the digits 1 to 5. But we can use the range operator to do it a little easier, as follows:

 

   @numbs = (1..5)

 

This range operator is available in some other languages, but in Perl, the range operator can be used with simple or complicated strings too. For example:

 

   @letters = (‘d’, ‘e’, ‘f’, ‘g’, ‘h’);

 

can be change to

 

   @letters = (‘d’ .. ‘h’);

 

Ranges do not have to be as simple as above, since we can do things like

 

   ‘part01’ .. ‘part05’

 

which get us part01, part02, part03, part04, part05

 

Or even things like ‘aa’ .. ‘cz’, which I will let you figure out.[1]

 

When initializing Perl arrays, expressions can be used. Here is an example:

 

   @x = (4, 2+9, 3*4));

 

But the values will be flatten before initializating the array, as follows:

 

   @x = (4, 11, 12));

 

 

FORTRAN 90 Arrays

Any one interested in how arrays can be handled in a programming language needs to look at FORTRAN 90. Older versions of FORTRAN also did many amazing things but the newer version added even more. First, there are several different ways to declare an array. Suppose we want an integer array of 10 elements.

 

FORTRAN IV

 

   DIMENSION X(10), MINX(10)

 

Since early FORTRAN used default types for variables depending on the first lett