


Arrays and Strings

Understanding of arrays is crucial to getting to grips with how C implements character strings. Let us first consider arrays.

Arrays

When the compiler finds a statement such as:

    char iaStr1[50];

it will reserve enough space to store 50 characters consecutively in memory. When the name of the array is used in an expression, it evaluates to the address of the first of those 50 characters. This fact is important to remember if you are to understand how individual characters within the array are accessed.
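
A minimal sketch of the point about the array name, assuming a hosted C99 compiler; the function names are hypothetical and exist only for this illustration. Strictly speaking, the name is not a pointer variable: in most expressions it converts to the address of element 0, but sizeof is one of the exceptions and reports the size of the whole array.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the array name, used in an
   expression, yields the same address as the array's first element. */
int name_is_first_address(void)
{
    char iaStr1[50];
    char *p = iaStr1;            /* the name converts to &iaStr1[0] */
    return p == &iaStr1[0];
}

/* Hypothetical helper: sizeof applied to the array itself is an
   exception to that conversion and reports the whole reserved block. */
size_t reserved_bytes(void)
{
    char iaStr1[50];
    return sizeof iaStr1;        /* 50 * sizeof(char), i.e. 50 */
}
```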

To read or change the contents of an element within the array, you have to tell the compiler which element you are on about. This is accomplished via a statement similar to:

    iaStr1[9] = 'a';

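Some compilers will warn you if the value in the square brackets, the subscript, is outside the valid range for the array, which for a 50-element array is 0 to 49. This is generally only possible when the subscript is a constant, as in the statement above. If you use a variable as the subscript, the compiler usually cannot know what value it will hold when the program runs, and so cannot warn you about possible problems; C itself does not check subscripts at run time either.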
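
Since a variable subscript goes unchecked, any range check has to be written by hand. The sketch below assumes a hypothetical macro ARRAY_COUNT and a hypothetical helper set_element, neither of which is part of the standard library. Note that ARRAY_COUNT only gives the right answer for a true array, not for a pointer to one.

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in a true array (not a pointer). */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: stores c at subscript i only if i is valid for an
   array of n elements. Returns 1 on success, or 0 if the subscript was
   out of range and nothing was written. */
int set_element(char *arr, size_t n, size_t i, char c)
{
    if (i >= n)
        return 0;            /* valid subscripts are 0 .. n-1 */
    arr[i] = c;
    return 1;
}
```

For example, set_element(iaStr1, ARRAY_COUNT(iaStr1), 9, 'a') performs the same store as iaStr1[9] = 'a', while a subscript of 50 is refused instead of being written past the end of the array.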

When the compiler sees such a statement it has to work out the address in memory of the array element that you have specified. To do this it changes the statement to look something more like this:

    *(iaStr1 + (sizeof(char) * 9)) = 'a';

Somewhat unintelligible, you may think at first glance, but look more closely and you will see what is happening. The compiler knows the address of the first element of the array; this is the value represented by the array name. It then has to work out where element number 9 (the tenth element, since numbering starts at 0) begins. To do this it takes the size of a single element in bytes, multiplies it by the subscript, 9 in our case, and adds the result to the address of the first element. This gives the address at which the element we are interested in starts. An address is only a reference to a variable, however, so the compiler has to de-reference it in order to store the data, the character 'a' in this case; hence the * at the front.

Bear in mind that this shows the arithmetic the compiler performs rather than code you would normally write. C pointer arithmetic already scales by the element size, so the source-level equivalent is simply *(iaStr1 + 9); the explicit multiplication gives the same result here only because sizeof(char) is always 1.

It may seem quite complicated, but there is only one memory access here: the one right at the end, which stores the 'a' character. All the rest of the work is done in calculating the memory address.
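
The address arithmetic described above can be checked directly. This sketch uses an int array (an assumption, chosen because sizeof(int) is normally larger than 1, so the scaling is visible) to show that a[i], *(a + i) and a hand-made byte calculation all reach the same element. The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: reads element i three equivalent ways and
   returns 1 if all three agree. */
int subscript_forms_agree(const int *a, size_t i)
{
    int by_subscript = a[i];        /* what you normally write */
    int by_pointer   = *(a + i);    /* C scales i by sizeof(int) for you */
    /* The same address worked out by hand: cast to a char pointer so
       that + counts single bytes, then add sizeof(int) * i. */
    int by_bytes = *(const int *)((const char *)a + sizeof(int) * i);
    return by_subscript == by_pointer && by_pointer == by_bytes;
}

/* Hypothetical helper: distance in bytes from element 0 to element i. */
ptrdiff_t byte_offset(const int *a, size_t i)
{
    return (const char *)(a + i) - (const char *)a;
}
```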

All accesses to elements within an array are handled like this, no matter how simple or complex the element type. So when a more complex statement such as this:

    iaStr1[9] = iaStr1[20];

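is decomposed in the same way, the result:

    *(iaStr1 + (sizeof(char) * 9)) = *(iaStr1 + (sizeof(char) * 20));

is still composed of the familiar parts from the simple statement. The only difference is that there are now two memory accesses: a read from element 20 followed by a write to element 9.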
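
To illustrate that the element type makes no difference, here is the same kind of copy applied to an array of a hypothetical struct point type. Whether written with subscripts or in the decomposed pointer form, each index is scaled by sizeof(struct point) when the address is formed. The type and function names are made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical element type, larger than a char. */
struct point {
    int x;
    int y;
};

/* Copies element src to element dst using subscripts. */
void copy_by_subscript(struct point *arr, size_t dst, size_t src)
{
    arr[dst] = arr[src];
}

/* The same copy in decomposed form: read at arr + src, write at
   arr + dst, each offset scaled by sizeof(struct point). */
void copy_by_pointer(struct point *arr, size_t dst, size_t src)
{
    *(arr + dst) = *(arr + src);
}
```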

Strings

The C language does not have a string type, but instead utilises arrays of type char and requires that they are treated in a particular manner dictated by convention. A limitation of this approach is that a string is not special in any way, but is just an array of char. There is also no real way of telling whether or not an array of char is to be treated as a string. However, by convention, all strings are terminated by the NUL character, a character with a value of 0.
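
A short sketch of the NUL convention: a string literal initialiser stores the terminating '\0' after its letters, while an array filled character by character need not contain one at all. The array names and the holds_string helper are hypothetical; holds_string is built on the standard memchr() function.

```c
#include <string.h>

/* A string literal initialiser stores its letters followed by the
   terminating NUL, so this array has 4 elements: 'a' 'b' 'c' '\0'. */
static const char word[] = "abc";

/* Three characters and no NUL: a perfectly good array of char,
   but not a string. */
static const char letters[3] = {'a', 'b', 'c'};

/* Hypothetical helper: returns 1 if a NUL occurs within the first n
   chars of arr, i.e. the array can safely be treated as a string. */
int holds_string(const char *arr, size_t n)
{
    return memchr(arr, '\0', n) != NULL;
}
```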

The NUL character convention means that every char array intended to hold a string must have room for one extra element after the last character of the string. Therefore a declaration like this:

    char iStr2[50];

will only allow a maximum string length of 49 characters.
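
Copying into such an array therefore has to stop one character short of its size. copy_string below is a hypothetical helper, sketched on the assumption that dst_size is the full size of the destination array and is at least 1.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst, which has room for dst_size
   chars. At most dst_size - 1 characters are copied so that there is
   always room for the terminating NUL. Returns the number copied. */
size_t copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';               /* always terminate the result */
    return i;
}
```

With iStr2 declared as above, copy_string(iStr2, sizeof iStr2, src) copies at most 49 characters and always terminates the result. The standard snprintf(iStr2, sizeof iStr2, "%s", src) behaves in the same truncate-and-terminate way, whereas strncpy() leaves the destination unterminated when the source is too long.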

All of the run-time library functions for string manipulation expect the characters of a string to start at the address they are given (element 0 when you pass the array name) and to continue up to the terminating NUL character. None of these functions is told how large the array is, so if you forget the NUL, or have a bug that overwrites it, a function such as strlen() or strcpy() will carry on past the end of the array, reading or writing memory that does not belong to it.
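
Why a lost NUL is so damaging becomes clearer from a simplified version of what strlen() does. my_strlen is a hypothetical name, and this is a sketch of the idea rather than any library's actual implementation.

```c
#include <stddef.h>

/* Hypothetical sketch of the idea behind strlen(): start at the address
   given and count characters until the NUL. Nothing tells the function
   how big the array is, so if the NUL has been overwritten it keeps
   reading past the end of the array (undefined behaviour). */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Passing &buf[2] instead of buf simply starts the count two characters later, which is why the functions treat whatever address they receive as the beginning of the string.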




Content of this page Copyright © Robert Quince 1996 - 2005.