This paper covers the history and use of literals (or constants) in programming languages, from the beginning of programming to the present day. It discusses literals in many languages, including C, Java, and several scripting languages, as well as older languages such as Ada, COBOL, and FORTRAN. Design issues, types of literals, and problems with literals are illustrated. Literals vary across languages much more than most programmers would expect.

 

Literals

Integer Literals

Design Issues for Integer Constants

Ada Integers

Size of Integer

C Family

Arbitrarily Long Integers

Visual BASIC 6.0 and QBasic

Visual Basic .NET Type Designations

Base or Radix of Integers

Questions

Real Literals

Design Issues for Floating Point Constants

Decimal Point Placement

Precision of Reals

Complex Numbers

What is Doubled?

FORTRAN 90 Kind Numbers

Questions

Boolean Literals

Design Issues for Boolean

Character String Literals

Design Issues for Character Strings

String Delimiters

String Escape Sequences

Perl and UNIX Shell Character Strings

Perl Alternative Quotes

Perl Additional Escape Sequences

UNIX Backquotes

Special Literals

C# Verbatim String Literals

Python Triple-Quoted Strings

Here Documents

Date Literals

Array Literals

String Comparison

Repeating Literals

Conclusion

Questions

New FORTRAN Declarations

 

Literals

 

Copyright Dennie Van Tassel 2004.

Please send suggestions and comments to dvantassel@gavilan.edu

 

Literals or constants are values written in a conventional form whose value is obvious. In contrast to variables, literals (123, 4.3, “hi”) do not change in value. They are also called explicit constants or manifest constants. I have also seen them called pure constants, but I am not sure that terminology is agreed on. At first glance one may think that all programming languages write their literals the same way. While many conventions are shared across languages, there are some interesting differences.

 

   Literal                Explanation

   285                    Typical integer
   34.67                  Typical real
   4.23E-4                Typical scientific
   140_345                Integer in Perl or Ada
   true                   Typical boolean
   0x1b or Z"1B"          Hexadecimal literal
   'B'                    Typical character
   "Hello"  or  'Hello'   Typical character string
   5HHello                Old FORTRAN Hollerith string
   null   ZERO            Special literals

Table x.1. Various Literals in Different Languages

 

Literals represent the values available in the primitive types of a language. The usual kinds of literals are integers, floating point numbers, Booleans, and character strings. Each of these is discussed in this chapter.

 

Integer Literals

Integers are commonly described as numbers without a decimal point or exponent. Another description of an integer literal is a string of decimal digits without a decimal point. Thus the following are valid integers in practically all languages:

 

   123   0   -14   21345

 

Integers may or may not have a sign and must fall within some restricted range. Negative values must be preceded by a minus sign. If integers use 32 bits, the maximum value is 2^31 - 1, since one bit is needed for the sign.

 

There are two more integer constants available in some languages:

 

   +45   5e2

 

Early C did not allow +45. Integers without a sign, such as 45, are positive by default, so C had a unary negative operator but no unary positive operator. ANSI C and Java allow the unneeded positive sign on constants, and few other languages actually forbid it.

 

The last constant, 5e2, which evaluates to 500, is a floating point value in C and FORTRAN. Their rule is that a floating point constant has a decimal point or an exponent, or both. Thus 5.0, 5e0, and 5.0e0 are all the same floating point value 5.0. But in Ada integers can have non-negative exponents, so 5e2 (or 5e+2) is the integer 500. Negative exponents are not allowed for Ada integers. Thus 5e-3 is an error in Ada[1], but 5.0e-3 is a floating-point constant.
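The C and FORTRAN rule can be tried out in any language that shares it. The following is a minimal sketch in Python, which applies the same rule (an exponent alone makes a literal floating point) and accepts the unary plus. The variable names are only illustrative.

```python
# Python follows the C/FORTRAN rule: a decimal point OR an exponent
# makes the literal floating point.
plain = 500        # no point, no exponent: an integer
with_exp = 5e2     # exponent only: a float, unlike Ada's integer 5e2
signed = +45       # the redundant unary plus is accepted

print(type(plain).__name__, type(with_exp).__name__)  # int float
print(plain == with_exp)                              # True
print(5.0 == 5e0 == 5.0e0)                            # True
```

The two values compare equal, but they have different types, which is exactly the distinction Ada draws differently.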

 

Design Issues for Integer Constants

There are a few design issues for integers. They are:

 

  • What sizes of integer constants are available? For example, do we have short integers, regular integers, and long integers?
  • How do we indicate the particular type of integer constant wanted?
  • What bases of integers are available? Examples that may be available besides decimal could be octal, hexadecimal, or any base.
  • Is there any separator available like the comma used for thousands?

 

Every one of these options is available in some language, and different languages answer the questions differently.

 

Most languages have one or more default sizes for integers. On a machine with a 16-bit word, integers range from -32,768 to +32,767, that is, from -2^15 to 2^15 - 1. On a machine with a 32-bit word, integers range from -2,147,483,648 to +2,147,483,647, that is, from -2^31 to 2^31 - 1. Today 64-bit integers are common. Unfortunately, computer integers cannot have those useful commas to mark thousands.
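The ranges quoted above follow from the word size. Here is a small sketch in Python, assuming two's-complement storage (the usual case today); the function name signed_range is made up for this example.

```python
def signed_range(bits):
    """Return (minimum, maximum) for a two's-complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in (16, 32, 64):
    low, high = signed_range(bits)
    # The commas appear only in the printed output, never in a literal.
    print(f"{bits}-bit: {low:,} to {high:,}")
```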

 

But this is an oversimplification, since hexadecimal integers use letters. We may also want octal values and some way to indicate the desired size of our integers. Also, the definition of integer given earlier is not true for all languages.

 

Ada Integers

For example, in Ada both integer and real literals can have an exponent. Thus in Ada the integer literal 2100 could also be written as:

 

   21e2   210e+1   2100e+0

 

But in many other languages the exponent would indicate that the above are floating point literals. For integers, the exponent must not be negative. Ada also allows the underscore to improve readability. The underscore is often used to separate a number into groups of three digits, the way commas are used outside programming. Here are some examples:

 

   1_234.56   408_847_1400   1_000_000   12_27_05   4_345e2

 

In most of the above numbers the underscore is placed where a comma would normally be, but the underscore can be placed in any convenient place. Perl and Ruby also allow underscores in their integers.
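Python (version 3.6 and later) accepts the same underscore convention, so the idea can be sketched there. Note one difference from Ada: in Python a literal with an exponent, such as 4_345e2, is a float and not an integer.

```python
# Underscores are ignored by the compiler; they only help the reader.
million = 1_000_000
phone = 408_847_1400      # grouping need not be by threes
price = 1_234.56          # also allowed in floating point literals

print(million == 1000000)   # True
print(phone == 4088471400)  # True
print(price == 1234.56)     # True
```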

 

Size of Integer

C Family

If we have more than one size of integer, we need some way to indicate the precision of an integer constant. The C family uses an L or l (ell) after an integer to indicate a long integer. Thus 12L is a long integer. The lower case l is legal, but few can tell the difference between 12l (twelve followed by ell) and 121 (one hundred twenty-one), so it is best to always use an upper case L. These suffixes are useful to force arithmetic into a particular precision.

 

Besides long integers, we have unsigned integers in C, which use the suffix u or U. Thus we could write 15u or 15U to get the unsigned integer fifteen. Long unsigned integers are indicated with the terminating ul or UL, so 23ul or 23UL is the unsigned long integer twenty-three. For regular integers one bit must be saved to store the sign of the integer. If a variable or constant is unsigned, that bit can be used for the value instead. Thus a signed 16-bit integer ranges from -32,768 to +32,767 (which is 2^15 - 1), but an unsigned integer in the same amount of storage ranges from 0 to 65,535 (which is 2^16 - 1).
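The effect of the sign bit can be shown by reading the same 16 bits both ways. Python has no unsigned type of its own, so this sketch uses the standard struct module as a stand-in for C's short and unsigned short; the helper name as_signed16 is made up for this example.

```python
import struct

def as_signed16(value):
    """Reinterpret the 16 bits of an unsigned value as a signed integer."""
    return struct.unpack("<h", struct.pack("<H", value))[0]

print(as_signed16(32767))  # 32767: the largest signed 16-bit value
print(as_signed16(32768))  # -32768: the top bit is now read as the sign
print(as_signed16(65535))  # -1
```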

 

If we are in a language that has long integers, then how do we use them? For example, if we write 123456789012, we do not want to end up with an integer overflow or truncation. A good compiler would automatically store this integer as a long integer, but we may want to help it (or us) with 123456789012L.

 

Arbitrarily Long Integers

In most languages long integers are restricted to some large size. Python 2 uses the same L suffix to indicate a long integer, as in 12345678901234567890L, but Python long integers can be arbitrarily big. (Python 3 dropped the suffix; every integer there is arbitrarily long.) Other languages, such as Ruby and the Lisp dialects, also have arbitrarily long integers, which are often called bignums.
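A short sketch of arbitrarily long integers, written for Python 3, where no suffix is needed or allowed:

```python
big = 12345678901234567890      # too large for a signed 64-bit integer
print(big > 2 ** 63 - 1)        # True
print(big * big)                # exact result, no overflow or truncation
print((2 ** 64).bit_length())   # 65
```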

Visual BASIC 6.0 and QBasic

These forms of BASIC have two types of integers: integer and long integer. Early BASIC did not have types for numbers, so there was no distinction between integers and floating point. Now there are several numeric types, and a suffix on a numeric constant indicates its type:

 

   Numeric Type       Suffix      Bytes of Storage

   Integer            %           2
   Long integer       &           4
   Single precision   none or !   4
   Double precision   #           8

Table x.2. Types in BASIC

 

Thus 15% is an integer, 15& is a long integer, and 15 (or 15!) is a single precision float. By default all numbers are single precision reals (floating point). If we want a double precision 15, we type 15#.

 

Visual Basic .NET Type Designations

VB .NET has broken from its BASIC parents and changed the type-designation characters appended to numeric literals. By default, whole numbers (no decimal point) are type Integer and numbers with a decimal point are type Double. Otherwise, VB .NET uses a method similar to previous dialects of BASIC, but with different codes to change the default type. The VB .NET codes are as follows:

 

S          Short integer

I          Integer

L          Long integer

F          Single-precision floating point

R          Double-precision floating point

D          Decimal

 

So there are three types of integers and two types of floating point, plus Decimal for decimal fractions such as dollars and cents. Thus 45S is a Short integer, 45I (or 45) is an Integer, and 45L is a Long integer. Likewise 234.5F is a Single-precision floating point literal and 234.5R (or 234.5) is a Double-precision floating point literal. Finally, 780.23D is a Decimal literal, the type used for currency.

 

The range of values in VB .NET is much larger than in previous dialects. For example, Long integers range over roughly ±9 x 10^18. C# .NET has similar types and value ranges.

 

Base or Radix of Integers

C Family

Sometimes we want a base or radix other than base 10 for our constants. Base 8 and base 16 are useful for storage addresses. The C family indicates an octal constant by preceding the number with a zero. So 012 is octal 12 (decimal ten), not decimal twelve. For octal values the range of digits is 0-7.

 

Putting this together with the previous section, we can use the terminating L to make the constant long and the U to make it unsigned. Thus 012UL is the unsigned long octal value 12, the equivalent of the decimal value 10.

 

For hexadecimal values we precede the number with 0x or 0X. Thus 0x12 is hexadecimal 12 (decimal 18), not decimal 12. Now the range of acceptable “digits” is 0 1 2 3 ... 9 A B ... E F, and the letters may be upper or lower case. The long integer indicator L works here too. Thus 07L is a long octal seven, and 0x7L is a long hexadecimal seven. We can also use the terminating U to make the constant unsigned. Thus 0XFUL is the unsigned long hexadecimal value F, which is equivalent to the decimal value 15.

 

Ruby does the same for octal and hexadecimal literals as C does, but Ruby has added 0b for binary numbers. So in Ruby we can have hexadecimal values like 0x12, octal values like 012, and binary values like 0b1001.
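The same prefixes can be tried in Python, with one caveat: Python 3 writes octal as 0o12 and rejects the C spelling 012 in source code. The sketch below uses int() with an explicit base to read the C-style spellings from text instead.

```python
hex_value = 0x12       # hexadecimal 12
octal_value = 0o12     # Python 3 spells C's 012 as 0o12
binary_value = 0b1001  # binary, as in Ruby

print(hex_value, octal_value, binary_value)  # 18 10 9

# int() with an explicit base accepts the C-style spellings as strings.
print(int("012", 8))    # 10
print(int("0XF", 16))   # 15
```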

 

FORTRAN 90

FORTRAN 90 does this a little differently. It allows radix (number base) 2, 8, or 16. The value starts with the letter B for binary (radix 2), the letter O (oh) for octal, or the letter Z for hexadecimal. The letter is followed by a string of digits enclosed in double or single quotes. The digits must be acceptable for the desired base (no 8 or 9 in octals). The integer value 200 would be B"11001000" for base two, O"310" for base eight, and Z"C8" for base 16. I try very hard not to be chauvinistic, but I sure like the C method better in this case.
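To make the FORTRAN 90 notation concrete, here is a rough sketch in Python of a reader for such constants. The function parse_boz and the table RADIX are hypothetical names invented for this example; this is not how any FORTRAN compiler is implemented, and it is more lenient than a real compiler about what int() accepts as digits.

```python
RADIX = {"B": 2, "O": 8, "Z": 16}   # prefix letter -> number base

def parse_boz(literal):
    """Convert a constant such as Z"C8" or O'310' to an integer."""
    prefix, body = literal[:1].upper(), literal[1:]
    quoted = len(body) >= 2 and body[0] in "\"'" and body[-1] == body[0]
    if prefix not in RADIX or not quoted:
        raise ValueError(f"not a B/O/Z constant: {literal!r}")
    # int() raises ValueError if a digit is illegal for the base.
    return int(body[1:-1], RADIX[prefix])

for text in ('B"11001000"', 'O"310"', "Z'C8'"):
    print(text, parse_boz(text))   # each line ends in 200
```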

 

This FORTRAN 90 solution illustrates the problem of adding a feature to an existing language. They cannot just decide to use the C solution, that all numbers starting with a zero are octal values. Millions of old FORTRAN programs would no longer work correctly when compiled on new FORTRAN 90 compilers, since 012 would be octal 12 instead of decimal 12. On the positive side of this change, thousands of old FORTRAN programmers would suddenly have employment.

 

Ada  

Ada, being a language with always a little more, does what C and FORTRAN do, but has added more bases and uses a different syntax. An integer can be expressed in any base from 2 to 16 by prefixing the number by its base and then bracketing the number within # symbols. Thus the decimal value 35 can be expressed in various bases as follows:

 

   2#100011#   4#203#   8#43#   10#35#   16#23#
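The Ada notation is regular enough to decode in a few lines. Below is a sketch in Python; parse_ada_based is a hypothetical helper written for this example, and it handles only based integers without an exponent.

```python
def parse_ada_based(literal):
    """Convert an Ada based integer literal such as '16#23#' to an int."""
    # Underscores are only separators; the text must be base#digits#.
    base_text, digits, rest = literal.replace("_", "").split("#")
    base = int(base_text)
    if rest != "" or not 2 <= base <= 16:
        raise ValueError(f"not a based integer literal: {literal!r}")
    return int(digits, base)

for text in ("2#100011#", "4#203#", "8#43#", "10#35#", "16#23#"):
    print(text, parse_ada_based(text))   # each line ends in 35
```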

 

While this is kind of interesting, I do not see much use for base 7 or 11, but obviously someone did. In addition, C and FORTRAN 90 offer their other bases only for integer constants, while Ada allows floating point constants in these different bases. Thus 23.45 could be expressed in base 16 or any other base from 2 to 16.

Questions

1. Suppose you wanted to add more bases to Java or C++. Presently, those languages can only handle decimal, octal, and hexadecimal. The Ada designers built their method in at the beginning, but the FORTRAN designers had to add theirs to an existing language. Try to figure out how you could add more bases to C++ or Java without breaking millions of old programs.

Real Literals

Reals are numbers with a decimal point, thus 4.3 is a real literal. Real numbers are called floats or floating point in some languages. Another description of a real is a number with a decimal point or an exponent (or both), thus 2e2 would be a real literal under this definition. As with integer literals, a positive or negative sign can precede the number and no commas are allowed. Some real literals are:

 

   0.0   -4.302   7.   3.2e-4   4.9678E+3   4e-3

 

If the language accepts both lower and upper case, the “e” for exponent can be either. Whether 4e-3 is acceptable varies by language; some require 4.0e-3 (with a decimal point). The “e” stands for exponent and means: multiply the preceding number by 10 raised to the power that follows. Thus

 

   4.3e2 = 4.3 x 10^2 = 4.3 x 100 = 430.0
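The same arithmetic can be checked in Python, which accepts every literal form listed above, including 7. and 4e-3. This is a small sketch; math.isclose is used for the computed product because floating point multiplication may not reproduce the literal bit for bit.

```python
import math

print(4.3e2 == 430.0)                          # True
print(3.2e-4 == 0.00032)                       # True
print(math.isclose(4.3e2, 4.3 * 10 ** 2))      # True
print(type(4e-3).__name__, type(7.).__name__)  # float float
```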

 

Scientific notation is useful for expressing very small numbers or very large values (such as your chances to win the lottery or the national debt).

 

Design Issues for Floating Point Constants

There are a few design issues for floating point constants. Here are some:

 

  • What sizes of floating point constants are available? For example, do we have float, double, and long double?
  • How do we indicate the particular type of floating point constant we want?
  • What bases of floating point are available? Examples that may be available besides decimal could be octal, hexadecimal, and maybe others.