Skip to main content

Fundamental Types

  • Describe the fundamental types of C++
  • Describe the syntax for constants for each type

"Define types appropriately, then read and write data using those types instead of thinking bits and words and addresses." Sutter, Alexandrescu (2005)

Any type relates an object in a program to its underlying implementation in hardware. A programming language's type system distinguishes the regions of memory accessible by the program from one another. The type assigned to a region of memory identifies the nature of the information stored in that region. That information may be value data, address data, or program instructions.

The type of any region of memory defines two distinct aspects of the object stored in that memory:

  • how to interpret the bit string in the region of memory
  • the operations that are admissible on the bit string in that region of memory

Each fundamental type defined in the C++ core language consists of an integral number of bytes (1, 2, etc.).

Types

This chapter describes the fundamental types in C++. These fundamental types include integer types and floating-point types. The integer types include the character types for representing the different language symbols in the world. This chapter also describes the syntax for initialization, type inference from previously declared types and the alignment of types in memory.

Integer Types

The integer types consist of:

  • standard integers; either
    • signed standard integers, or
    • unsigned standard integers
  • booleans
  • character types

Range specifiers on the standard integer types set the ranges of values associated with these types. They are:

  • signed - negative and positive values
  • unsigned - no negative values

Integer types represent their values using the binary numeration system. The ordering of the bytes within any multi-byte type is platform-implemented. Big-endian platforms store the highest order byte first, while little-endian platforms store the lowest order byte first. Some PowerPC platforms can switch modes; Intel platforms (x86) are typically little-endian. The ordering of bits within each byte is also platform dependent. These finer details are unspecified by the C++ language standard.

Standard Signed Integers

C++17 defines five signed standard integer types:

  • signed char
  • short int
  • int
  • long int
  • long long int

A signed standard integer type can hold negative and positive values. The leading word, if any, before int is its size specifier. A short int fits between a signed char and an int, an int fits between a short int and a long int, and a long int fits between an int and a long long int.

signed char

A signed char occupies enough bits (typically, one byte) to store the language's basic character set:

Signed Char Type

We use a signed char type to read character input from a file and trap the end of file mark (EOF, which typically has a value of -1).

signed char c; // for possibility of receiving EOF

We define a constant of signed char type using single quotes around the value and can express it in character, octal, or hexadecimal notation:

signed char k, m, n, p;

k = '['; // character
m = '\133'; // octal - note the leading \
n = '\x5b'; // hexadecimal - note the leading \x
p = '\X5B'; // hexadecimal - note the leading \X

The single quotes identify a character literal. The host platform converts the enclosed characters to a decimal value based on the platform's encoding sequence.

short int

A short int, or simply a short, occupies at least as much memory as a signed char (typically, at least 16 bits):

signed int

C++ does not support syntax for constant literals of short int type (i.e., integer literals on 2 bytes).

int

An int occupies at least as much memory as a short int (typically, 32 bits):

int

We define a constant of int type using decimal, octal, or hexadecimal notation:

int a, b, c, d;

a = 91; // decimal
b = 0133; // octal - note the leading 0
c = 0x5b; // hexadecimal - note the leading 0x
d = 0X5B; // hexadecimal - note the leading 0X

long int

A long int, or more concisely a long, occupies at least as much memory as an int (typically, 32 bits):

long int

We define a constant of long int type using decimal, octal, or hexadecimal notation with the suffix L or l:

long int a, b, c, d;

a = 91L; // decimal notation
b = 0133L; // octal notation; note the leading 0
c = 0x5bL; // hexadecimal notation; note the leading 0x
d = 0X5BL; // hexadecimal notation; not the leading 0X

long long int

A long long int, or more concisely a long long, occupies at least as much memory as a long int (typically, 64 bits):

long long int

We define a constant of long long int type using decimal, octal, or hexadecimal notation with the suffix LL or ll:

long long int a, b, c, d;

a = 91LL; // decimal
b = 0133LL; // octal
c = 0x5bLL; // hexadecimal
d = 0X5BLL; // hexadecimal

Ranges

The finite number of bytes in any standard integer type limits the number of different values that the type can hold. The range of values for a signed standard integer type depends on the encoding scheme for negative values. Encoding schemes for negative values include:

  • two's complement - flip the bits and subtract one
  • one's complement - flip the bits
  • sign magnitude - reserve one bit for the sign

All three schemes represent positive values identically. Two's complement, which is the most popular, renders separate ALU subtraction circuits redundant and yields only one representation of 0.

For a two's complement encoding scheme, typical ranges are:

TypeMinMax
signed char-128127
short int<= -32,768>= 32,767
int-2,147,483,6482,147,483,647
long int<= -2,147,483,648>= 2,147,483,647
long long int<= -9,223,372,036,854,775,808>= 9,223,372,036,854,775,807

Unsigned Standard Integers

The five unsigned types, which correspond to the signed standard integer types, are

  • unsigned char
  • unsigned short int
  • unsigned int
  • unsigned long int
  • unsigned long long int

We use unsigned types for variables that only hold non-negative values. For example:

  • unsigned char letter;
  • unsigned short int languages;
  • unsigned int persons;
  • unsigned long int students;
  • unsigned long long int citizens;

We define an unsigned constant using decimal, octal, or hexadecimal notation with the suffix U or u:

unsigned int g;
unsigned long int h;
unsigned long long int k;

g = 0x5bU; // unsigned int: U or u
h = 0X5BUL; // unsigned long: (U or u) and (L or l) any order
k = 456789012345ULL; // unsigned long long: (U or u, LL or ll) any order

An unsigned type occupies the same amount of memory as its signed counterpart.

Ranges

Typical ranges for the unsigned standard integer types are:

TypeMinMax
usigned char0255
unsigned short int0>=65,535
unsigned int04,294,967,295
unsigned long int0>= 4,294,967,295
unsigned long long int0>= 18,446,744,073,709,551,615

Comparing the values in this table to those in the table for a signed standard integer type shows that those values are a subset of those for the corresponding unsigned type.

bool Type

The bool type typically occupies one byte:

bool

A variable of bool type can hold one of two values:

  • false
  • true

Character Types

C++17 defines six character types:

  • char
  • signed char
  • unsigned char
  • wchar_t
  • char16_t
  • char32_t

signed char, unsigned char, char

The first three char types occupy the same amount of memory - enough to hold the implementation's basic character set:

signed char

The basic source-code character set consists of 96 characters:

  • space
  • control characters
    • horizontal tab ('\t')
    • vertical tab ('\v')
    • form feed ('\f')
    • new-line ('\n')
  • lower case letters: a through z
  • upper case letters: A through Z
  • digits: 0 1 2 3 4 5 6 7 8 9
  • other: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

The basic execution character set consists of these characters plus the following control characters

  • null character - all bits set to 0 ('\0')
  • alarm ('\a')
  • backspace ('\b')
  • carriage-return ('\r')

The char type represents Unicode characters using the 8-bit Unicode Transmission Format (UTF-8).

Locales

C++17 uses locale objects for international portability. A locale identifies the culture-specific set of features.

wchar_t

The wchar_t type is a wide character type that can store all the members of the largest character set among the supported locales. The wchar_t type has the same amount of memory and signedness as one of the other integral types. The integral type associated with the wchar_t type is called its underlying type.

We define a literal of wchar_t type using the prefix L followed by the value in single quotes around character, octal, or hexadecimal notation:

wchar_t k, m, n, p;

k = L'['; // character - note the leading L
m = L'\133'; // octal - note the leading L'\
n = L'\x5b'; // hexadecimal - note the leading L'\x
p = L'\X5B'; // hexadecimal - note the leading L'\X

The single quotes identify a character literal. The host platform converts this literal to its decimal value using the platform's encoding sequence.

char16_t

The char16_t type is a character type for representing characters using UTF-16. UTF-16 is an encoding sequence that can hold 1,112,064 code points from 0 to 0x10FFFF. This format is variable-length using one or two 16-bit code units; that is, it requires 16 or 32 bits to encode a character. 16 bits covers most of the Latin alphabets. This format treats characters outside the Basic Multilingual Plane as a special case. The Basic Multilingual Plane is the first group of continuous code points in Unicode, which contains the most commonly used characters.

The char16_t type uses the same amount of memory and signedness as one of the other integral types. The integral type associated with the char16_t type is called its underlying type.

We define a literal of char16_t type using the prefix u (lower case only) followed by single quotes around character, octal, or hexadecimal notation:

char16_t k, m, n, p;

k = u'['; // character - note the leading u
m = u'\133'; // octal - note the leading u'\
n = u'\x5b'; // hexadecimal - note the leading u'\x
p = u'\X5B'; // hexadecimal - note the leading u'\X

char32_t

The char32_t type is a character type for representing characters using UTF-32. UTF-32 is a fixed-length character encoding that can hold code points from 0 to 0x7FFFFFFF.

This type has the same amount of memory and signedness as one of the other integral types. The integral type associated with the char32_t type is called its underlying type.

We define a literal of char32_t type using the prefix U (upper case only) followed by single quotes around character, octal, or hexadecimal notation:

char32_t k, m, n, p;

k = U'['; // character - note the leading U
m = U'\133'; // octal - note the leading U'\
n = U'\x5b'; // hexadecimal - note the leading U'\x
p = U'\X5B'; // hexadecimal - note the leading U'\X

Floating-Point Types

C++17 defines three fundamental types for storing floating-point values:

  • float - a single-precision floating-point number
  • double - a double-precision floating-point number
  • long double - a double-precision floating-point number, possibly with extra precision

C++17 does not specify the sizes of the float, double or long double type. The standard leaves them open to the host's implementation.

A double type provides at least as much precision as a float type. A long double type provides at least as much precision as a double type. The set of values of float type are a subset of the values of double type. The set of values of double type are a subset of the values of long double type.

The three floating-point types store values approximately. The most popular implementation is the IEEE (Eye-triple-E for the Institute of Electrical and Electronics Engineers) Standard 754 for Binary and Floating-Point Arithmetic (Wikipedia).

float

A float type typically occupies 4 bytes of memory:

float

Data Representation

Under IEEE 754, a float type occupies 32 bits, has one sign bit, a 23-bit mantissa and an 8-bit exponent. The arrangement of the mantissa and exponent bits is open:

float Representation

or

float Representation

Under IEEE 754, the following formula determines the value stored

value=(1)s×2e×1+f1×21+f2×22+...+f23×223value = (-1)^s × 2^e × { 1 + f_{1} × 2^{-1} + f_2 × 2^{-2} + ... + f_{23} × 2^{-23} }

where fif_i is the value of bit ii (i=1,2,...,23i = 1, 2, ..., 23) of the mantissa and ee is the exponent, which has a value between -127 and 128 inclusive.

Limits and Ranges

The limits on the number of significant digits and the ranges of the exponents for an IEEE 754 float type are:

TypeSizeSignificant DigitsMin ExponentMax Exponent
float4 bytes6-3738

The exponent values in this table are decimal (base 10).

double

A double type typically occupies 8 bytes of memory:

double

Data Representation

Under IEEE 754, a double type occupies 64 bits, has one sign bit, a 52-bit mantissa and an 11-bit exponent. The arrangement of the mantissa and exponent bits is open:

double

or

double

Under IEEE 754, the following formula determines the value stored

value=(1)s×2e×1+f1×21+f2×22+...+f52×252value = (-1)^s × 2^e × { 1 + f_1 × 2^{-1} + f_2 × 2^{-2} + ... + f_52 × 2^{-52} }

where fif_i is the value of bit ii (i=1,2,...,52i = 1, 2, ..., 52) of the mantissa and ee is the exponent, which has a value between -1022 and 1023 inclusive.

Limits and Ranges

The limits on the number of significant digits and the ranges of the exponents for an IEEE 754 double type are:

TypeSizeSignificant DigitsMin ExponentMax Exponent
double8 bytes15-307308

The exponent values in this table are decimal (base 10).

long double

A long double type provides the maximum number of significant digits. A long double type typically occupies at least 8 bytes of memory:

long double

void Type

The void type is an incomplete type. A type is incomplete if it is missing some information. C++ does not allow the creation of objects of type void.

We use a void type for the return type of functions that do not return a value and for generic pointers described in the next chapter.

Initialization

We can initialize a variable at its declaration in several different ways as shown below:

  • by copying the initial value using the assignment operator on the same type (C-style)
  • by copying the initial value using the assignment operator on a different type (C-style narrowing/promotion)
  • by appending the braces-enclosed value to the declaration (direct-list initialization type-safe universal form)
  • by equating the declaration to a braces-enclosed value (copy-list initialization universal form)

For example:

// Initialization Expressions
// initializer.cpp

#include <iostream>

int main ()
{
int n0 = 7; // C-style
int n1 = 7.2; // C-style - narrowing (loss of information)
int n2 {6}; // universal form braces-enclosed list
int n3 = {5}; // = is redundant

std::cout << "n0 = " << n0 << std::endl;
std::cout << "n1 = " << n1 << std::endl;
std::cout << "n2 = " << n2 << std::endl;
std::cout << "n3 = " << n3 << std::endl;
}

outputs the following:

n0 = 7
n1 = 7
n2 = 6
n3 = 5

The universal form is recommended. It checks the type of the value against the declared type and reports an error in the case of any mismatch. Since this form does not admit a narrowing of information, it is type-safe.

Type Inference

A compiler can infer the type of an object from a previously declared object. For instance, it can infer the type from the right operand of an initializer expression. In C++, the keyword auto specifies type inference.

In the following program, the compiler infers the type of n and i from the type of the initial value (6 and 0):

// Type Inference
// auto.cpp

#include <iostream>

int main ()
{
int a[] {1, 2, 3, 4, 5, 6};
const auto n = 6;

for (auto i = 0; i < n; i++)
std::cout << a[i] << ' ';
std::cout << std::endl;
}

and outputs the following:

1 2 3 4 5 6

Note that auto cannot appear in the top-level declaration of an array type. That is, we cannot write:

// Type Inference
// auto-compilation-error.cpp

#include <iostream>

int main ()
{
auto a[] {1, 2, 3, 4, 5, 6}; // ERROR
const auto n = 6;

for (auto i = 0; i < n; i++)
std::cout << a[i] << ' ';
std::cout << std::endl;
}

Type Alignment

C++ types can be aligned in memory at different spacings.

The alignof() operator returns the alignment requirement of its argument type. The following program

// Alignment of a Type
// alignment.cpp

#include <iostream>

int main ()
{
std::cout << alignof(int) << std::endl;
std::cout << alignof(double) << std::endl;
}

outputs

4
8

An alignment requirement represents the number of bytes between successive addresses of objects of a given type. Every object type has an alignment requirement. We can specify the alignment of an object type using alignas().

In the following program, the alignment of A is 4, because the member int n needs to be allocated on a 4-byte boundary. However, the alignment of B is 16, because we have specified that aligned alignas(16), where 16 is an integral multiple of an 8-byte boundary.

// Type Alignment
// align.cpp

#include <iostream>

// A is allocated at 4-byte boundaries
struct A
{
int n; // size 4 alignment 4
char c; // size 1 alignment 1
}; // size 8 alignment 4

// B is allocated at 16-byte boundaries
struct alignas(16) B
{
int n; // size 4 alignment 4
char c; // size 1 alignment 1
}; // size 8 alignment 16

int main()
{
A a[] {1, 'a', 2, 'b', 3, 'c'};
std::cout << "align of A = "
<< alignof(A) << std::endl;

for (int i = 0; i < 3; ++i)
{
std::cout << &a[i] << ": "
<< a[i].n << ", " << a[i].c << std::endl;
}


B b[] {1, 'a', 2, 'b', 3, 'c'};
std::cout << "align of B = "
<< alignof(B) << std::endl;

for (int i = 0; i < 3; ++i)
{
std::cout << &b[i] << ": "
<< b[i].n << ", " << b[i].c << std::endl;
}
}

The program above produces the output (the addresses will different in each execution):

align of A = 4
00BDFC24: 1, a
00BDFC2C: 2, b
00BDFC34: 3, c

align of B = 16
00BDFBF0: 1, a
00BDFC00: 2, b
00BDFC10: 3, c

Exercises