Simple strings

A string is simply an array of characters, as all of us know. The simplest kind of string looks like this.

    char *s = "this is a string";

In the above statement, a pointer variable to a character is created, space is allocated for the string "this is a string", and the starting address of that memory is stored in the pointer. So the pointer (s) acts like a string. Because s points to a string literal, the characters should be treated as read-only; declaring the pointer as const char * makes that explicit.

The character encoding used here is ASCII, in which one character takes only 1 byte. Another important thing is that the string is terminated by a null character ('\0'). Without it, there would be no way to identify the end of the string, because memory is just an array of bytes. So, if you pass the string s to a function, the function knows where the string ends because of the terminating null character. In the above declaration, the null character is appended automatically, so the string actually occupies 17 bytes of memory, not 16.

When using strings like this, you have to be careful because of the pointers. Say you declared two strings like this.

    char *string1 = "hello";
    char *string2 = string1;

You may think that both strings contain "hello". That is true, but the two pointers string1 and string2 are actually sharing the same string in memory.
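The two points above, the hidden terminator and the shared storage, can be checked with a small sketch. The helper names storage_bytes and shares_storage are made up for this illustration; they are not standard library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store a C string,
   including the terminating '\0'. */
size_t storage_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;   /* strlen does not count the terminator */
}

/* Hypothetical helper: returns 1 when both pointers refer to the
   very same characters in memory, 0 otherwise. */
int shares_storage(const char *a, const char *b)
{
    return a == b;          /* compares addresses, not contents */
}
```

Calling storage_bytes("this is a string") gives 17, and shares_storage(string1, string2) is true for two pointers where one was simply assigned from the other.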
So if you later write through one of the pointers, like this,

    string2[0] = 'a';

the change is visible through both string1 and string2, which both read "aello", and that may not be what we need. (Strictly speaking, writing to a string literal is undefined behavior in C, so the sharing effect is only safe to observe when the pointers refer to a modifiable array, such as one declared with char buffer[] = "hello";.)

The other commonly used character encoding is UNICODE. A char holds only 1 byte, which limits it to 256 distinct values (ASCII itself defines 128 characters). Wide characters use more bytes per character, so a far larger set of characters, covering many alphabets, can be encoded. Just as char is used to store an ASCII character, wchar_t is used to store a UNICODE character. UNICODE characters are also known as wide characters; wchar_t is 2 bytes on Windows and typically 4 bytes on Linux and other Unix-like systems.

    wchar_t *s = L"hello";

In the above statement, a unicode string is created in memory and a pointer keeps track of it, just as with the ASCII string earlier. Notice that the letter L must be prefixed to the string literal to specify that it is a wide string. As with ASCII strings, the end of the string is marked by a null character, but here the terminator is a single wide null character (L'\0'), which occupies 2 bytes when wchar_t is 2 bytes wide. So the above string s consumes 12 bytes of memory in that case (6 wide characters of 2 bytes each).

String handling functions

Copying a string

We observed in an example above that just assigning one pointer to another does not copy the string; the two pointers share the same string instead. The C language provides strcpy for copying strings correctly.

    char *string1 = "hello";
    char string2[16];
    strcpy(string2, string1);

The above statements show how to use strcpy to copy a string. Be careful not to pass a char pointer that points nowhere as the first argument to strcpy. Enough space must be allocated first, as done in the above example. Here string2 is an array of 16 bytes; when passed to strcpy it is converted to a char pointer, and that pointer refers to valid memory.

Formatting a string

The function sprintf can be used to format a value, such as the integer below, into a string.

    int i = 10;

You should be familiar with printf. sprintf works the same way as printf, but instead of printing to the screen, the formatted string is written to the buffer given as the first argument, here buf. For instance, if buf is a large enough char array, sprintf(buf, "%d", i) stores the text "10" in buf.
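Both copying and formatting depend on the destination being large enough, so here is a minimal sketch of how that check might look. The helpers copy_string and format_int and the format text "i = %d" are assumptions made for this illustration. The second helper uses snprintf, the size-limited relative of sprintf from C99, rather than sprintf itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
   included. Returns 1 on success, 0 if dst is too small (dst untouched). */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) + 1 > dst_size)
        return 0;
    strcpy(dst, src);       /* safe: we just checked that it fits */
    return 1;
}

/* Hypothetical helper: format an int into buf. snprintf never writes
   more than buf_size bytes. Returns 1 if the whole text fit, else 0. */
int format_int(char *buf, size_t buf_size, int value)
{
    assert(buf != NULL && buf_size > 0);
    int n = snprintf(buf, buf_size, "i = %d", value);
    return n >= 0 && (size_t)n < buf_size;
}
```

With char string2[16], copy_string(string2, sizeof string2, "hello") succeeds, while the same call on a 4-byte array is refused instead of overflowing it.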
As with strcpy, passing just an uninitialized pointer as the first argument to sprintf is wrong, because the memory must be allocated first.

Comparing 2 strings

The function strcmp can be used to compare 2 strings and find out whether they are equal.

    char *string1 = "hello";

strcmp takes the two strings as arguments, and its return value, stored here in ret, is 0 if the two strings are equal. In this example string1 is compared with a different string, so ret is not 0; a negative or positive value indicates which string sorts first.

All the above functions are for manipulating ASCII strings. The corresponding functions for UNICODE strings are listed below.
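As a rough sketch of the comparison described above, the helpers below wrap strcmp and its wide-character counterpart wcscmp from <wchar.h>. The helper names are invented for this example, and wide_storage_bytes is included only to show how the size of wchar_t affects memory use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: 1 when both strings have equal contents. */
int strings_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;   /* strcmp returns 0 for equal strings */
}

/* Hypothetical helper: the same test for wide strings. */
int wide_strings_equal(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    return wcscmp(a, b) == 0;
}

/* Hypothetical helper: bytes used by a wide string, including its
   terminating wide null character. Depends on sizeof(wchar_t). */
size_t wide_storage_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

Unlike comparing two pointers with ==, these helpers compare contents, so a copy made with strcpy still counts as equal to the original.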