Arrays and Strings
An understanding of arrays is crucial to getting to grips with how C
implements character strings, so let us consider arrays first.
Arrays
When the compiler finds a statement such as:
char iaStr1[50];
it will reserve sufficient space to store 50 characters consecutively
in memory. In most expressions, the name of the variable represents the
address of the first of those 50 characters. This fact is important to
remember in order to understand how individual characters within the array
are accessed.
To read or change the contents of an element within the array, you have to
tell the compiler which element you mean. This is accomplished via a
statement similar to:
iaStr1[9] = 'a';
Some compilers will warn you if the value in the square brackets, the
subscript value, is outside the valid range for the array (0 to 49 for the
declaration above). This is only possible if the value is a constant, as in
the statement above. The compiler is generally unable to warn you about
possible problems if you use a variable as the subscript value.
When the compiler sees such a statement it has to work out the address in
memory of the array element that you have specified. To do this it changes
the statement to look something more like this:
*(iaStr1 + (sizeof(char) * 9)) = 'a';
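Somewhat unintelligible, you may think at first glance, but look more closely
and you will see what is happening. The compiler knows the address of the
first element of the array; this is the value represented by the variable
name. It then has to figure out where the subscripted element starts. Because
subscripts count from 0, subscript 9 refers to the tenth element, which lies
9 elements beyond that address. To find it, the compiler takes the size of
a single element, in bytes, multiplies this by the subscript, in our
case 9, and adds the result to the address of the first element. This gives us
a further address, the one at which the element we are interested in starts.
But, remembering that an address is only a reference to a variable, the
compiler has to de-reference it in order to store the data, the character 'a'
in this case, hence the * at the front.
Note that the statement above illustrates the byte arithmetic the compiler
performs; it is not what you would write yourself. Pointer arithmetic in C
source is already scaled by the element size, so the hand-written equivalent
is simply *(iaStr1 + 9) = 'a'; and the two forms coincide for char only
because sizeof(char) is 1.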
It may seem quite complicated, but there is only one memory access here,
the one right at the end to store the 'a' character. All the rest
of the work is done in calculating the memory address.
All accesses to elements within an array are handled like this, no matter how
simple or complex the element type, so when a more complex statement like
this:
iaStr1[9] = iaStr1[20];
is decomposed as above, the end result:
*(iaStr1 + (sizeof(char) * 9)) = *(iaStr1 + (sizeof(char) * 20));
is still composed of the familiar parts from the simple statement.
Strings
The C language does not have a string type, but instead utilises
arrays of type char and requires that they are treated in a
particular manner dictated by convention. A limitation of this approach
is that a string is not special in any way, but is just an array
of char. There is also no real way of telling whether or not an
array of char is to be treated as a string. However, by convention,
all strings are terminated by the NUL character, a
character with a value of 0, written '\0' in C source.
The NUL character convention means that all char arrays
that are intended to hold strings must have room for the one extra element
that is required at the end of the data elements. Therefore a
declaration like this:
char iStr2[50];
will only allow a maximum string length of 49 characters.
All of the run-time support functions for string manipulation expect the
characters in the char array that are to be interpreted as part
of the string to start at the beginning of the reserved memory (that is
at element 0) and to come immediately before the terminating NUL
character. If you forget to add the NUL, or have a bug that over-writes it,
then the functions will not treat the array as you expect: they will carry
on past the end of your data until they happen to find a 0 byte, which may
lie outside the array altogether.