Skip to content
Image with logo, providing a link to the home page
  • United Stated of America flag, representing the option for the English language.
  • Bandeira do Brasil, simbolizando a opção pelo idioma Português do Brasil.

Learn Programming: Arrays, Strings, Collections and Data Structures

Example of using arrays in four programming languages: Python, Lua, GDScript and JavaScript.

Image credits: Image created by the author using the program Spectacle.

Requirements

In the introduction to development environments, I have mentioned Python, Lua and JavaScript as good choices of programming languages for beginners. Later, I have commented about GDScript as an option for people who want to program digital games or simulations. For the introductory programming activities, you will need, at least, a development environment configured for one of the previous languages.

If you wish to try programming without configuring an environment, you can use of the online editors that I have created:

However, they do not provide all features offered by interpreters for the languages. Thus, sooner or later, you will need to set up a development environment. If you need to configure one, you can refer to the following resources.

Thus, if you have an Integrated Development Environment (IDE), or a combination of text editor and an interpreter, you are ready to start. The following example assumes that you know how to run code in your chosen language, as presented in the configuration pages.

If you want to use another language, the introduction provides links for configure development environments for the C, C++, Java, LISP, Prolog, and SQL (with SQLite) languages. In many languages, it suffices to follow the models from the experimentation section to modify syntax, commands and functions from the code blocks. C and C++ are exceptions, for they require pointers access the memory.

Composite Data Types and the End of an Era

In Data Types, the existence of composite data types was anticipated, which includes, for instance, strings, arrays, sets, unions and records. This is a good time to introduce them in more details.

This topic presents composite data types that allows storing multiple values that are accessible using a single variable. Many of them are, strictly speaking, implemented as records. In general, they can be thought as collections of data.

Like a bag allows storing multiple objects, and insertions, removals, or retrievals of stored items of interest, collections allow storing, recovering and accessing data. The access is performed using a key or identifier (id or ID), used to uniquely identify a stored element. The key or identifier can be implicit (for instance, an order) or explicit (an arbitrary value defined as a synthetic or surrogate key); the chosen data type can also vary.

Regardless of case, collections allow manipulating multiple values in a convenient and efficient way. A combination of collections with repetition structures allow to succinct processing and manipulating values in a few lines of code.

Hereafter, you will become able to process millions of data entries, if so you wish or need. To do this, concepts such as insertion, removal, search and sorting will become part of your programming vocabulary. To improve your programming abilities, it is interesting to learn how to create some traditional collections; however, how to implement data structures is out of the scope of this topic in particular.

Moreover, hereafter some languages will also be out of the scope of this introduction to programming. Thus, the end of an era approaches, for your abilities will surpass the resources of two programming languages. The topic about arrays will be the last supported by Scratch and Flowgorithm, because neither language provide implementations for dictionaries, records and files. Although it is possible to create a dictionary in these languages, this will not be done in this topic. Scratch and Flowgorithm are tools for learning; after arrays, the final resources that are going to be explored are common in general purpose programming languages, albeit uncommon in languages for learning.

In fact, after these last topics, you will be able to use most of the essential resources that programming languages provide for problem-solving. Collections will allow large scale work with data; records will enable you to build your own data types; files enables using secondary memory to store and retrieve data permanently (that is, short of hardware failure).

The entirety of the programming potential is close to your reach. Although there are other topics to explore and much to learn, many concepts will be incremental to improve techniques. The basis to build on is almost completely defined.

Arrays (Vectors) and Lists

Virtually every modern programming language provides a data type for arrays and/or lists. Some languages define arrays and/or lists with more operations and advances capabilities; others provide the bare minimum to work with. Strictly speaking, lists are data structures that are more complex and practical than arrays, providing additional features and conveniences. Nevertheless, their use is similar and many programming languages allow using lists as arrays (if a type for arrays is lacking). Except if mentioned otherwise, this topic will treat lists as arrays, for they will be, in fact, similar in languages such as Python and GDScript.

In general, an array abstract a contiguous sequence of memory positions in a single variable. One can think of an array as a box with partitions, in which in the next partition is placed next to the previous one. One can also think of an array as columns of a table with a single line.

For instance, an array that stores names of programming languages can be thought in the following way:

|-----+--------+------------+----------+---+-----+------+------+--------+-----|
| Lua | Python | JavaScript | GDScript | C | C++ | Java | Lisp | Prolog | SQL |
|-----+--------+------------+----------+---+-----+------+------+--------+-----|

An array represents a block of memory. The example of the table of programming languages represent an array of strings. Usually, one can create arrays of any data type (regardless whether primitive or composite types). It is even possible to create arrays of arrays, for, for instance, create matrices. Arrays of arrays will be discussed in a subsection of this page. Before complexity, it is suitable to learn the simpler concepts.

There are two concepts for the size of an array. One can think on the total size (in bytes) of an array and on the size as the capacity of the array (the number of values that the array can store, also called length).

The total size of an array in bytes depends on the data type that will be stored. Depending on the programming language, each partition can have a fixed size (with the same size in bytes for every partition of the array) or a variable sized one. Lower level programming languages, such as C and C++, usually allow creating fixed size arrays, because they allow programmers to manipulate and manage the memory of a program. In this case, the array is a contiguous block of memory that is divided in a certain number of smaller blocks of equal size (mathematically, the length is the total size in bytes divided by the size of the data type to be stored). Higher level programming languages, such as Lua, Python, JavaScript and GDScript can manage the memory automatically (within limits), easing the use of variable sized partitions. In particular, it is even possible that a same array store values of different data types.

In practice, however, it is more usual to refer to the size as the capacity of the array. The memory block is partitioned in a number of partitions that define its size (or length) of the array. The previous example stores ten strings in an array. The size of an array can be fixed or variable, which depends on its implementation. Some authors call fixed size arrays as arrays and variable sized arrays as vectors. In this material, the terms array and vectors are considered synonyms.

High level programming languages usually provide variable length arrays (or variable size arrays). This is the case of Lua, Python, JavaScript and GDScript. In variable size arrays, the implementation (instead of the programmer) is responible by managing the required memory, adjusting the size according to the number of elements stored in the array. If one inserts a new element, the array can automatically expand its size to store it. If one removes an existing element, the array can shrink to avoid wasting memory. For efficiency, implements often use arrays that may be larger than the number of stored elements instead of using the exact value, to amortize the costs of dynamically allocating and deallocating memory over multiple operations (instead of every one of them).

Low level programming languages typically provide fixed length arrays (fixed size arrays), which is declared statically (using static memory allocation). Fixed length arrays can provide better performance, albeit with lesser flexibility. One should know the number of values to store at development (programming) time. Otherwise, there is a risk to waste memory by declaring sizes that are larger than what is needed, or lack memory due to a declaration of an array that is too small. To create an array at use time, one can use dynamic memory allocation.

Regardless of the case, each partition of an array belongs to a specific memory address. Nevertheless, to make it easier to manipulate arrays, programming languages abstract the address as an index or offset based on the initial address of the array. In programming languages, indices are typically integer numbers. The indexing often assumes the form array_name[index_number], with array_name as an array of a determined data type and index as an integer number. Thus, using an array is similar to using a conventional variable for primitive types; the different is the need to specify the desired index to a specific memory position. For instance, the array with names of programming languages could be called language_names with the type string array. The syntax can vary, as it happens with the criteria used for indexing. For indexing, each language defined its own rules, though two choices are the most common.

The first is starting the indexing from zero, as it happens with JavaScript, Python, GDScript and Flowgorithm. In the indexing starting from zero, indices vary from 0 to size - 1. For instance, for the array of names of programming languages, language_names[0] would correspond to the value Lua, language_names[3] could refer to the GDScript value and language_names[9] would correspond to the value SQL. language_names[10] does not exist in the array; an attempt to access it would result into an error, which two probable outcomes: segmentation fault, that would make the program crash (with good luck), or accessing an improper memory position (with bad luck). Therefore, it is necessary taking care not to use incorrect indices for memory position that are not defined by the array.

Memory AddressE1E2E3E4E5E6E7E8E9E10
Index0123456789
ValueLuaPythonJavaScriptGDScriptCC++JavaLispPrologSQL

The second approach consists of starting the indexing from one, as it works in Lua and Scratch. In the indexing starting from one, indices vary from 1 to size. For instance, for the array of names of programming languages, language_names[1] would correspond to the value Lua, language_names[3] could refer to the JavaScript value and language_names[10] would correspond to the value SQL. language_names[0] does not exist in the array; an attempt to access it would result into an error.

Memory AddressE1E2E3E4E5E6E7E8E9E10
Index12345678910
ValueLuaPythonJavaScriptGDScriptCC++JavaLispPrologSQL

It can be noted that indexing matches the idiomatic form of iterating in a repetition structure. This is not a coincidence, though it is a convenience to operating with arrays' indices.

Furthermore, some programming languages or implementations allow using negative numbers as indices. In Python and GDScript, negative indices access the positive indices in a reverse order. For instance, -1 corresponds to the last position of the array, -2 to the penultimate and so on. In C and C++, the language assumes that indices are unsigned integer numbers. An attempt to use a value such as -1 as the vector index would be interpreted as an attempt to use the value of the largest possible integer value with the number of bytes used as the type of the index. In other words, the access likely would result in an error.

Arrays Versus Lists

Before starting providing examples, it is worth disclaiming that lists are not, necessarily, equivalent to arrays. Unlike what commonly happens with arrays, the values in lists do not even have to be stored in contiguous memory positions. In fact, some implementation of lists do not allow to access an element using an index. Rather, some implementations provide operations to access the next and the previous value of a list based on the currently selected one. As a result, the navigation over list elements can be significantly slower than in proper arrays (for it can require transversing all the previous elements until reaching the desired one). Some implementations soften the problem using an alternative called doubly linked list (or deque).

Nevertheless, that exist implementations that allow operating lists as if they were arrays. This commonly happens if the implementation uses an array with contiguous memory to define the list (array list). This is the case for languages such as Python and GDScript, which are considered in this introduction. For Python, the costs of operations can be consulted using the official wiki.

Furthermore, it is lists commonly provide operations that are not always available for arrays. One of the main examples is automatically readjusting the available size according to the number of stored elements. This can be performed, for instance, using linked lists or by reallocating the array whenever required.

Flowcharts: Flowgorithm

The declaration of arrays in Flowgorithm follow the same steps of the declaration of a variable. After creating a Declare block, choose the name for the array. Before confirming its creation, mark the option Array?. After marking the option, the size of the array must be declared. The chosen size will correspond to the maximum number of values that the array will be able to store (arrays, therefore, have fixed length in Flowgorithm).

Array indexing in Flowgorithm starts from 0. To assign a value to an array, you should create an Assign block. In the block, choose the name of the array in the variable, followed by the indexes between square brackets (for instance, array[0]) for the first position.

Arrays can be used in any field that supports using a variable. To do this, it suffices to write the name of the array followed by the desired index between square brackets.

Example of use of arrays in Flowgorithm.

The following code snippet provides the transcription of the text in the image.

Main

String Array array[10]
array[0] = "Hello"
array[1] = "my"
array[2] = "name"
array[3] = "is"
array[4] = "Franco"
array[5] = "!"

Output array[0]
Output array[4]
Output array[5]

array[0] = "Hi"

Output array[0]
Output array[4]
Output array[5]

End

As the array has a fixed size, it is important not to exceed the maximum size that was declared. In the example, array was declared with 10 memory positions to store data of the type string, though it has used only 6. Thus, the memory positions 6, 7, 8 e 9 could still be used to store data. However, positions starting from 10 must not be used; using they would be a logic error.

Alternatively, to avoid wastes with unused memory positions, array could be declared with size 6. Although declaring an array with size can resemble declaring a variable, the result is not, exactly, equivalent. Rules to use arrays also applies to the unitary size. For instance, it will still be required to use the index and, in many programming languages, the passage of the array as a parameter will be performed by reference. If there is a need to differentiate both, variables that are not arrays can be said scalars. Scalar is roughly the opposite term to vector. For instance, in Physics, there exists scalar quantities (such as time and position) and vector quantities (such as velocity).

Visual Programming Languages: Scratch

To create arrays in Scratch, you should access *Varaibles, then choose Make a List. After choosing a name for the list (array), new blocks will appear to manipulate it.

To execute a program several times, it is useful to start the code with a block Delete all of array. This should be performed for every array that is declared; otherwise, following runs will keep the items from the last execution in the array. To show all the stored values, you can also add show list array.

To add a value to the array, you can use the block add thing to array. To access the value of a position in the array, you can use the block item 1 of array. To modify the value of a position of the array, you can use the block replace item 1 of array with thing. Arrays in Scratch start indexing from 1 (as it happens in Lua).

Example of use of arrays in Scratch.

There exists blocks for some other operations that will be discussed in this topic. Thus, if you wish, you can use Scratch to learn arrays.

Text Programming Languages: JavaScript, Python, Lua and GDScript

The use of arrays is JavaScript, Python, Lua and GDScript is very similar. All languages are high-level and provide convenient implementations of arrays. Strictly speaking, only JavaScript and GDScript provide arrays. Python defines lists that can be used as arrays, while Lua uses tables indexed with integer values.

//           0        1     2       3     4         5
let array = ["Hello", "my", "name", "is", "Franco", "!"]
console.log(array[0])
console.log(array[4])
console.log(array[5])

array[0] = "Hi"
console.log(array[0])
console.log(array[4])
console.log(array[5])
#        0        1     2       3     4         5
array = ["Hello", "my", "name", "is", "Franco", "!"]
print(array[0])
print(array[4])
print(array[5])
print(array[-1])
print(array[-2])
print(array[-6])

array[0] = "Hi"
print(array[0])
print(array[4])
print(array[5])
--             1        2     3       4     5         6
local array = {"Hello", "my", "name", "is", "Franco", "!"}
print(array[1])
print(array[5])
print(array[6])

array[1] = "Hi"
print(array[1])
print(array[5])
print(array[6])
extends Node

func _ready():
    #            0        1     2       3     4         5
    var array = ["Hello", "my", "name", "is", "Franco", "!"]
    print(array[0])
    print(array[4])
    print(array[5])
    print(array[-1])
    print(array[-2])
    print(array[-6])

    array[0] = "Hi"
    print(array[0])
    print(array[4])
    print(array[5])

The comment before the array declaration relates the index with the value stored in the array. The examples in Python and GDScript also illustrate the usage of negative indices. Try using them in Lua or JavaScript to observe the resulting errors.

Many programming languages define a par of square brackets as the operator to access a value stored in the index of an array. Programming languages that do not provide an operator usually tend to provide subroutines to perform the operations.

To iterate over values of an array, one can use repetition structures (loops). Some programming languages also provide more practical features, such as iterators. One example is the repetition structure for each, that makes it simpler to write code to iterate over all values of a collection. Whenever possible, this topic will use iterators.

//           0        1     2       3     4         5
let array = ["Hello", "my", "name", "is", "Franco", "!"]
for (let index = 0; index < array.length; ++index) {
    console.log(array[index])
}

for (const index in array) {
    console.log(array[index])
}

// If you do not intend changing the value, you can use const instead of let.
for (let value of array) {
    console.log(value)
}
#        0        1     2       3     4         5
array = ["Hello", "my", "name", "is", "Franco", "!"]
for index in range(len(array)):
    print(array[index])

for value in array:
    print(value)
--             1        2     3       4     5         6
local array = {"Hello", "my", "name", "is", "Franco", "!"}
for index = 1, #array do
    print(array[index])
end

for index, value in ipairs(array) do
    print(index, value)
end
extends Node

func _ready():
    #            0        1     2       3     4         5
    var array = ["Hello", "my", "name", "is", "Franco", "!"]
    for index in range(len(array)):
        print(array[index])

    for value in array:
        print(value)

In the examples using the traditional for repetition structure, the size (length) of the array is used as the upper limit. To obtain the size of an array, one can use:

In programming languages that do not provide a function or a method to obtain the size, a good alternative is to create a variable or a constant to store the value (instead of typing it whenever necessary). Thus, if one wishes to modify it later, it will be simpler and faster to perform a refactor.

In the examples using for each, there are two common variations:

  1. The iterator returns each value stored in a collection;
  2. The iterator returns the key or index that allows accessing the stored value. In this case, one can access the value using value = array_name[key].

Lua, in particular, defines ipairs() (documentation), which returns the numeric index and the stored value. If you do not wish to use of the values (index or value), it is common to name the variable with an underscore (_). In the case of the value, it can be omitted (for index in ipairs(vector) do).

Moreover, JavaScript, Python, Lua and GDScript allow mixing values of different types in a same array. This means that it is possible to create an array such as [1, "Franco", -1.23, True]. Although it is possible, I would recommend avoiding this kind of use, because it requires knowledge of the expected data type for very index. In other words, it is error-prone. Therefore, it is preferable to use arrays that store data of a very same type instead of mixing them into a single array.

Operations and Techniques Using Arrays

Besides accessing a value using an index or iterate over all values, there are some common operations to manipulate arrays:

  • Obtaining the size (or length): get the current size of the array, as previously showed;
  • Check whether a list is empty: verify whether the size is equal to zero. Implementations with maximum size can also provide functions or methods to verify whether a list is full. The maximum size is, at times, called capacity of the array or listj. In practice, even in lists without limits, there will always exist a maximum capacity imposed by the quantity of free memory that is available in the computer. An attempt to allocate memory when there is not enough memory available is not possible (the program will crash, or emit an exception or another error);
  • Initialization: assignment of an initial value for every position of an array;
  • Write or print (or dump): write all values in an array. In high level programming languages, it is common that commands or procedures such as print() write all values. This is what happens, for instance, in JavaScript, Python and GDScript. The alternative is iterating over the entire array and write each value. This must be done, for instance, in C, C++ and Lua. It is convenient to define a procedure to write arrays in these languages to avoid repeating code, as it will be done in all Lua examples in this section.
  • Accumulation of values (reduce): as in repetition structures, accumulation of the result in a variable during the iteration of elements of an array;
  • Selecting or filtering values: creation of a new array with iterated values from another array that satisfy a given criterion;
  • Sequential search: attempt to discover whether a value exists (or does not exist) in an array;
  • Sorting and binary search: sorting of values according to a rule (for instance, increasing or decreasing order);
  • Copy (duplication or clone): generation of a new array that is identical to the original.

Libraries of programming languages usually provide predefined implementations for many of the previous operations.

In programming languages that implement variable size arrays or as lists, there are three additional operations:

  • Insert (add, append, push) a new element (potentially increasing the size of the array or list);
  • Remove (delete, pop) an existing element (potentially reducing the size of the array or list);
  • Clear (empty) the array, that is, remove all stored values.

With conditional and repetition structures, it is possible to implement the previous operations, if the language does not provide them.

In many programming languages, it is important noticing that arrays are commonly passed by reference to subroutines. In other words, modification of data inside subroutines are usually persistent, because they change the values using the address of the original variable instead of a copy of it. This allows the creation of subroutines to initialize and change values of arrays.

In the following examples, the Lua version of the code will often be longer than the other because it includes some useful subroutines to manipulate arrays. Besides, it is important noticing that arrays in Lua start at the index 1 and end at size. In JavaScript, Python and GDScript, arrays start at 0 and end at size -1.

Declaration and Initialization

Programming languages on which arrays are static (with fixed size) define the size of an array at its declaration. Programming languages with dynamic arrays or lists allow to resize the array according to the stored elements.

JavaScript, Python, Lua and GDScript match the second case. JavaScript, Python and GDScript allow to specify the size of an array at the declaration or with an appropriate subroutine. In Lua (and also in the other languages), it is possible to create an empty array and add new elements until the desired size is reached.

let array = new Array(100)
console.log(array.length)
// Values start as undefined.
console.log(array[0])

// Alternative:
array = []
for (let i = 0; i < 100; ++i) {
    array.push(0)
}

console.log(array.length)
console.log(array)
array = 100 * [0] # 0: initial value; it could be any other.
print(len(array))
# Values start as the initial chosen value.
print(array[0])

# Alternative:
array = []
for i in range(100):
    array.append(0)

print(len(array))
print(array)
-- Procedure to write arrays (as in other languages).
-- A better implementation could be recursive, to allow writing arrays stored
-- with arrays.
-- An example will be provided later for dictionaries in `write_table()`
-- (which can be used both for arrays and for dictionaries, as both as tables
-- in Lua).
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

-- The example starts here.
local array = {}for i = 1, 100 do
    -- Alternative: table.insert(array, 0)
    array[#array + 1] = 0
end

print(#array)
write_array(array)
extends Node

func _ready():
    var array = []
    array.resize(100)
    print(len(array))
    # Values start as null.
    print(array[0])

    # Alternative:
    array = []
    for i in range(100):
        array.append(0)

    print(len(array))
    print(array)

For value insertion, the subroutines used are:

If one wanted to create an array with 1000 memory position instead of 1000, she/he would only have to change the number. It is also possible to define the size using a constant (for instance, ARRAY_SIZE or ARRAY_LENGTH).

The alternative way is valid for the four programming languages and allow to choose an initial value for each memory position. In the example, all positions were initialized with 0. The value can be changed according to the needs for a given problem (for instance, "", False, 0.0, "Franco").

It should be noted that not every language define an initial value for pre-allocations based on size. For instance, C and C++ does not initialize arrays (except with they were explicitly initialized with a construction such as {}); the initial value will be undetermined, for each position will have whatever values were present in the memory addresses before the declaration. These values are, at time, called trash. Therefore, one must assume an initial value for newly-created arrays if the language does not provide guarantees of initializing them.

JavaScript, Python and GDScript allow to write the values stored in an array using console.log() or print(), though Lua writes the address of the array. Thus, Lua examples provide a procedure called write_array() to write the values stored in an array in a simplified way. A more robust implementation could be recursive and it would consider other possibly values stored in tables (such as dictionaries). Another alternative to write simple arrays is using table.unpack() (or simply unpack(), depending on Lua's version, such as 5.1; documentation).

print(table.unpack({1, 2, 3}))
local array = {1, 2, 3}
print(table.unpack(array))

Nevertheless, table.unpack() does not write the elements recursively, either. For instance, table.unpack({{1, 1}, 2, 3}) will write an address instead of 1 1 for the first array.

Initizalization of Few Values

For few values or values that are known beforehand (at programming time), many programming languages allow initializing initialization directly on the declaration of an array.

let array = [1.1, 2.2, 3.3, 4 * 1.1, 1.1 + 1.1 + 1.1 + 1.1 + 1.1]
console.log(array)
array = [1.1, 2.2, 3.3, 4 * 1.1, 1.1 + 1.1 + 1.1 + 1.1 + 1.1]
print(array)
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local array = {1.1, 2.2, 3.3, 4 * 1.1, 1.1 + 1.1 + 1.1 + 1.1 + 1.1}
write_array(array)
extends Node

func _ready():
    var array = [1.1, 2.2, 3.3, 4 * 1.1, 1.1 + 1.1 + 1.1 + 1.1 + 1.1]
    print(array)

In programming languages with fixed size arrays, the compiler or interpreter is usually able to infer the correct size of the array based on the defined elements.

Some programming languages required the last value not to end with a comma (for instance, [1, 2, 3]). However, modern languages or recent versions may allow such construction (for instance, [1, 2, 3,]). The use of a comma after the last value can be practical when used along version control systems, such as git, and commands to highlight differences between files, such as diff.

The diff command allow highlighting differences between two different files, which is used by version systems to highlight modifications between the previous version and a new version of a file. For instance, the modification of a code in the file 1.js, as defined as follows:

let v = [
    1,
    2,
    3,
]

To a file 2.js, defined as follows:

let v = [
    1,
    2,
    3,
    4,
]

Would only highlight the inclusion of the new line, instead of highlighting the deletion of the line 3 and the insertion of two lines (3 and 4) in a command like diff 1.js 2.js.

4a5
>     4,

For a version without a comma in the last value, the result of the difference would be as follows:

4c4,5
<     3
---
>     3,
>     4

Although the difference is small, some people prefer a cleaner difference history (something that can be useful for practices such as code reviews). Thus, it can be interesting to end the last value with a comma for this purpose, for it reduces noise and allows focusing on more significant differences.

Recent versions of JavaScript, Python, Lua and GDScript allows using a comma in the last element.

Lua and Arrays

Arrays in Lua actually are a special case of using a data structure called table in the language, optimized to use a numeric index. This is Thus, the language uses curly brackets both to declare vectors and to declare dictionaries (both are tables). A pair of square brackets in Lua does not declare an empty array.

A result from that choice is that Lua allows initialization of values for any index, defining an sparse array. In a sparse array, memory positions are allocated only for indices that store non-empty values, to avoid memory waste. For instance, writing local v = {} followed by v[123] = 123 is valid in Lua, albeit it would result into an error in many other programming languages. The resulting array would have a single defined element, at the index 123. However, the length operator # will not return the correct value for the size. In Lua 5.1, it is possible to use table.maxn() to obtain the largest numeric index of a table (documentation). In more recent versions, it is necessary to implement a subroutine to perform the same processing.

-- For Lua 5.2 or more recent.
-- For Lua 5.1, one can use table.maxn().
function maxn(a_table)
    local result = 0
    for key in pairs(a_table) do
        if (type(key) == "number") then
            if (key > result) then
                result = key
            end
        end
    end

    return result
end

-- If one wants to add maxn() to access it as table.maxn().
table.maxn = table.maxn or maxn

local v = {}
v[123] = 123
print(v[123])
-- The lenght operator will not inform the correct value until values are
-- inserted up to the defined index.
print(#v)
-- As an alternative, one can use `table.maxn()` (Lua 5.1).
-- The line is also valid if one has added the implementation of maxn() to
-- table.maxn.
-- print(table.maxn(v))
print(maxn(v))

v[1] = 1
v[2] = 1
v[3] = 1
print(#v, maxn(v))

In general, to perform the same operation in other programming languages, it would be necessary to define an array with, at least, 123 memory positions or add 123 elements to a list before being able to use the index. The other 122 memory positions would not be utilized, although they would still require memory. As an alternative, one could use a dictionary to create a similar structure.

Insertions, Removals and Cleaning

In dynamic lists or arrays, it is possible to modify the number of stored elements by using operations to insert and remove values. One can start with an empty array (for instance, [] in JavaScript, Python e GDScript, or {} in Lua), and add value as required, or she/he can start with a few elements in the array.

Insertions and removals can happen in the beginning, end or at an arbitrary position between the first and last one.

Insertions in arbitrary positions can be called add, insert or push. Insertions at the end are commonly named append or push back. Insertions at the end are commonly named prepend or push front.

Removals at the end are commonly called pop or pop back. At the beginning, they are commonly named pop front. The removal of all values is commonly called cleaning, clear or empty.

It can be interesting to reproduce the following examples line by line, to watch each result. Evidently, blocks can be reproduced at once for conditional or repetition structures and subroutines.

let array = [] // It can also be performed as array = new Array()
// Insertion at the end.
array.push(1)
array.push(2)
array.push(3)
// Write or print.
console.log(array) // [1, 2, 3]

// Removal at the end.
let removed_value = array.pop() // 3
console.log(array) // [1, 2]

// Cleaning.
// JavaScript does not provide a predefined function for cleaning.
while (array.length > 0) {
    array.pop()
}
console.log(array) // []

// Arbitrary insertions and removals.

// Initial array.
array.push(1)
array.push(3)
console.log(array) // [1, 3]

// Insertion at the beginning
array.unshift(0)
console.log(array) // [0, 1, 3]
// Removal at the beginning.
removed_value = array.shift()
console.log(array) // [1, 3]
// Insertion at arbitrary position.
array.splice(1, 0, 2) // Índice, número de valuees a remover, value a inserir.
console.log(array) // [1, 2, 3]
// Removal at arbirary position.
removed_value = array.splice(2, 1) // 2 é o índice, 1 para remover um value.
console.log(array) // [1, 2]
array = [] # It can also be performed as array = list()
# Insertion at the end.
array.append(1)
array.append(2)
array.append(3)
# Write or print.
print(array) # [1, 2, 3]

# Removal at the end.
removed_value = array.pop() # 3
print(array) # [1, 2]

# Cleaning.
array.clear()
print(array) # []

# Arbitrary insertions and removals.

# Initial array.
array.append(1)
array.append(3)
print(array) # [1, 3]

# Insertion at the beginning
array.insert(0, 0) # Índice, value.
print(array) # [0, 1, 3]
# Removal at the beginning.
removed_value = array.pop(0) # Índice.
print(array) # [1, 3]
# Insertion at arbitrary position.
array.insert(1, 2) # Índice, value a se inserir.
print(array) # [1, 2, 3]
# Removal at arbirary position.
removed_value = array.pop(2) # 2 é o índice.
print(array) # [1, 2]
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local array = {}
-- Insertion at the end.
table.insert(array, 1)
array[#array + 1] = 2 -- Alternative.
table.insert(array, #array + 1, 3) -- Equivalent construction.
-- Write or print.
write_array(array) -- [1, 2, 3]

-- Removal at the end.
local removed_value = table.remove(array) -- 3
write_array(array) -- [1, 2]

-- Cleaning.
-- Lua does not provide a predefined function for cleaning.
while (#array > 0) do
    table.remove(array)
end
write_array(array) -- []

-- Arbitrary insertions and removals.

-- Initial array.
table.insert(array, 1)
table.insert(array, 3)
write_array(array) -- [1, 3]

-- Insertion at the beginning
table.insert(array, 1, 0) -- Índice, value.
write_array(array) -- [0, 1, 3]
-- Removal at the beginning.
removed_value = table.remove(array, 1)
write_array(array) -- [1, 3]
-- Insertion at arbitrary position.
table.insert(array, 2, 2) -- Índice, value a se inserir.
write_array(array) -- [1, 2, 3]
-- Removal at arbirary position.
removed_value = table.remove(array, 3) -- 3 é o índice.
write_array(array) -- [1, 2]
extends Node

func _ready():
    var array = [] # It can also be performed as array = Array()
    # Insertion at the end.
    array.append(1)
    array.push_back(2) # Alternative.
    array.append(3)
    # Write or print.
    print(array) # [1, 2, 3]

    # Removal at the end.
    var removed_value = array.pop_back() # 3
    print(array) # [1, 2]

    # Cleaning.
    array.clear()
    print(array) # []

    # Arbitrary insertions and removals.

    # Initial array.
    array.append(1)
    array.append(3)
    print(array) # [1, 3]

    # Insertion at the beginning
    array.push_front(0) # Índice, value.
    print(array) # [0, 1, 3]
    # Removal at the beginning.
    removed_value = array.pop_front()
    print(array) # [1, 3]
    # Insertion at arbitrary position.
    array.insert(1, 2) # Índice, value a se inserir.
    print(array) # [1, 2, 3]
    # Removal at arbirary position.
    removed_value = array.pop_at(2) # 2 é o índice.
    print(array) # [1, 2]

The example introduces several subroutines. For the author's convenience, the documentation links will refer to a page with them all, instead of links to each one. My recommendation is to consult other listed subroutines in the pages as well, to know about other resources that the programming language of your choice provides for array manipulation. Some languages, such as Python, will provide several predefined operations; others, such as Lua, will provide fewer. Lua has a minimalist standard library; for this material, this minimalism can actually be advantageous, for it will allow illustrating how to implement some features in programming languages that do not provide the features as part of the standard library. After all, the possibility of implementing features using code is an important differential between software and hardware; hence the prefix soft (malleable, adaptable).

  • JavaScript: push(), pop(), unshift(), shift(), splice(): documentation;
  • Python: append(), pop(), clear(), insert(): documentation;
  • Lua: table.insert(), table.remove(): documentation;
  • GDScript: append(), push_back(), pop_back(), clear(), push_front(), pop_front(), insert() and pop_at(): documentation.

In programming languages that not provide a subroutine for cleaning, the removal operation can be repeated until the array becomes empty (in other words, that its size becomes zero).

Nevertheless, why one should not reassign an empty value to a variable to clean it? The answer depends on the context. Some programming languages not allow to assign a new array to a variable that stores one.

In Particular, this is not the case for JavaScript, Python, Lua and GDScript, which do allow assigning a new empty array. Then, a second factor must be considered: does the array share its memory?

If it is not, cleaning the existing array or assign a new one will have the same result: an empty array for the variable. However, if the array was shared, the other variables will keep the values from the original array. This will become clearer in the subsection discussing copies and shared memory.

Some programming languages provide predefined subroutines for sequential search. In programming languages that do not provide them, such as Lua, the implementation is simple: it suffices to iterate over the array and compare the value at the current index with the searched value. If the values are the same, the index is returned. Otherwise, the iteration continues to the next value. If the value is not found until the end of the array, a value is returned (for instance, -1) to inform that the element does not exist in the array.

let array = [1, 2, 3, 4, 5]

// Sequential search.
let search_index = array.indexOf(2)
console.log((search_index !== -1), search_index) // true 1
array = [1, 2, 3, 4, 5]

# Sequential search.
try:
    search_index = array.index(2)
    print(search_index) # 1
except ValueError:
    print("Value was not found.")

# To check whether a value exists in the array, one can use:
# if (value in array):
#     ...
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

function sequential_search(array, value)
    for index, current_value in ipairs(array) do
        if (value == current_value) then
            return index
        end
    end

    return -1
end

local array = {1, 2, 3, 4, 5}

-- Sequential search.
-- Lua does not have a predefined function for sequential search.
local search_index = sequential_search(array, 2)
print((search_index ~= -1), search_index) -- true 2
extends Node

func _ready():
    var array = [1, 2, 3, 4, 5]

    # Sequential search.
    var search_index = array.find(2)
    printt((search_index != -1), search_index) # true 1

The example in Lua illustrates the creation of a subroutine to perform sequential search. As anticipated, examples in Lua will usually be longer due the definition of subroutines to implement features that exist in other programming languages. It can be a good idea to group all the defined function in a file as library for future use.

Accumulators

The use of accumulators with arrays is similar to the use in repetition structures. The main difference is that the values of interest are stored in an array.

let array = [1, 2, 2, 3, 3, 3]

// Accumulation.
let accumulator = 0
for (const value of array) {
    accumulator += value
}
console.log(accumulator) // 14

let string_accumulator = ""
for (const value of array) {
    string_accumulator += value.toString()
    string_accumulator += " "
}
console.log(string_accumulator) // 1 2 2 3 3 3
array = [1, 2, 2, 3, 3, 3]

# Accumulation.
accumulator = 0
for value in array:
    accumulator += value

print(accumulator) # 14

string_accumulator = ""
for value in array:
    string_accumulator += str(value)
    string_accumulator += " "

print(string_accumulator) # 1 2 2 3 3 3
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local array = {1, 2, 2, 3, 3, 3}

-- Accumulation.
local accumulator = 0
for _, value in ipairs(array) do
    accumulator = accumulator + value
end
print(accumulator) -- 14

local string_accumulator = ""
for _, value in ipairs(array) do
    string_accumulator = string_accumulator .. value .. " "
end
print(string_accumulator) -- 1 2 2 3 3 3
extends Node

func _ready():
    var array = [1, 2, 2, 3, 3, 3]

    # Accumulation.
    var accumulator = 0
    for value in array:
        accumulator += value

    print(accumulator) # 14

    var string_accumulator = ""
    for value in array:
        string_accumulator += str(value)
        string_accumulator += " "

    print(string_accumulator) # 1 2 2 3 3 3

Accumulators can be of non-numeric types, as illustrated in string_accumulator. In the provided example, the variable concatenates all values stored in the array, using a space as a delimiter (sequence of characters used to separate items) between values.

In function programming, this operation is known as reduce.

Selecting or Filtering Values

An operation that is similar to accumulation is selecting or filtering values of an array. The difference is that, instead of combining every item of interest as a single value (usually scalar), a new array storing values of interest is created.

As an example, one can consider the situation of filtering all odd numbers from an array.

let numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

// Selection or filtering.
let odd_numbers = []
for (const number of numbers) {
    if (number % 2 === 1) {
        odd_numbers.push(number)
    }
}

console.log(odd_numbers)
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Selection or filtering.
odd_numbers = []
for number in numbers:
    if (number % 2 == 1):
        odd_numbers.append(number)

print(odd_numbers)
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local numbers = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

-- Selection or filtering.
odd_numbers = {}
for _, number in ipairs(numbers) do
    if (number % 2 == 1) then
        table.insert(odd_numbers, number)
    end
end

write_array(odd_numbers)
extends Node

func _ready():
    var numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    # Selection or filtering.
    var odd_numbers = []
    for number in numbers:
        if (number % 2 == 1):
            odd_numbers.append(number)

    print(odd_numbers)

Instead of only writing the value as it was done until the last topic, now it is possible to select and store values of interest in a collection. This allows manipulating them afterwards, without the need to repeat the loop and the data selection. An advantage of the approach is the greater convenience and the potential improved performance. A disadvantage is the greater memory consumption, due to storing the values in the new array.

In function programming, this operation is commonly referred as filter.

Sorting can be performed in the array itself (called in-place) or in a sorted copy returned as a result (called out-of-place). For out-of-place sorting, one can assign the result to the original variable if she/he wants to save the sorted version.

Binary search allows searching for a value faster and more efficiently than sequential search, though it requires that the array is sorted. The strategy for binary search was previously presented intuitively to guess numbers in Repetition structures (or loops), in the program using a pseudorandom number generator.

For reference, the Big Oh notation complexity of sequential search is (linear), while the notation for binary search is (logarithmic). For an array with 1 million positions, the worst case scenario for sequential search requires 1 million iterations. In binary search, the worst case will be , or around 20 iterations.

The difference suggests the importance of sorting and binary search in programming. They can significantly improve the performance of programs that must work with large amounts of data.

Also for reference, efficient sorting algorithms (such as quicksort and mergesort) have complexity . In other words, it is not always faster to sort an array to, afterwards, search for a value. If the goal is searching for a single value, for instance, it can be faster to perform a sequential search. On the other hand, if the array is already sorted, a binary search normally will be significantly faster than a sequential search for large arrays. Likewise, for problems requiring multiple search operations, it can be worth to sort the array first then perform binary search.

Programming language libraries commonly provide a generic implementation for sorting values. In general, the standard subroutine will order values in ascending order (from the smallest to the largest), normally providing some features to modify the sorting criteria (which is particularly useful for composite types such as records). The feature is often a function order(x, y) defined with the following return values:

  • If the value is negative, x must appear before y;
  • If the value is positive, x must appear after y;
  • If the value is zero, the order can be kept.

For numeric values, an easy way to generate the result as an expression for ascending order sorting is returning x - y. For strings, a function defining alphabetic order can be defined. It must be decided whether lowercase letter should appear before, after or in the same order of uppercase ones. The criteria that are habitually used in the daily life is called natural sorting (or natural sort order).

In Python, Lua and GDScript, the standard implementation for sorting is often the expected one. In JavaScript, however, the use of the standard subroutine will result in a surprise if the array has negative and positive values. To avoid the problem, it is necessary to provide a function with a sorting criterion. The function can be predefined or anonymous (lambda).

function ascending_order(x, y) {
    // x > y -> x - y > 0 (x must appear after y)
    // x < y -> x - y < 0 (x must appear before y)
    // x === y -> x - y = 0 (both are equal, the order can be kept)
    return (x - y)
}

let array = [7, 3, -3, -7, 9, 0]

// Sorting.
console.log(array) // [7, 3, -3, -7, 9, 0]
array = array.sort() // Out-of-place.
// Wrong:
console.log(array) // [-3, -7, 0, 3, 7, 9]
// Correct:
array = array.sort(ascending_order) // Out-of-place.
console.log(array) // [-7, -3, 0, 3, 7, 9]

// Alternatives:
// Lambda function.
array = [7, 3, -3, -7, 9, 0].sort(function(x, y) { // Out-of-place.
    return (x - y)
})
console.log(array) // [-7, -3, 0, 3, 7, 9]

// Lambda function with modern syntax.
// (x, y) is the parameter list.
// => means return
// (x - y) is the expression with the return value.
array = [7, 3, -3, -7, 9, 0].sort((x, y) => (x - y)) // Out-of-place.
console.log(array) // [-7, -3, 0, 3, 7, 9]

// Binary search.
// JavaScript does not provide a predefined function for binary search.
array = [7, 3, -3, -7, 9, 0]

# Sorting.
print(array) # [7, 3, -3, -7, 9, 0]
array.sort() # In-place
print(array) # [-7, -3, 0, 3, 7, 9]

# Binary search.
# Python does not provide a predefined function for binary search.
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local array = {7, 3, -3, -7, 9, 0}

-- Sorting.
write_array(array) -- [7, 3, -3, -7, 9, 0]
table.sort(array) -- In-place
write_array(array) -- [-7, -3, 0, 3, 7, 9]

-- Binary search.
-- Lua does not provide a predefined function for binary search.
extends Node

func _ready():
    var array = [7, 3, -3, -7, 9, 0]

    # Sorting.
    print(array) # [7, 3, -3, -7, 9, 0]
    array.sort() # In-place
    print(array) # [-7, -3, 0, 3, 7, 9]

    # Binary search.
    var array_size = len(array)
    var search_value = 7
    var search_index = array.bsearch(search_value)
    printt((search_index < array_size) and (array[search_index] == search_value), search_index)

    search_value = -4
    search_index = array.bsearch(search_value)
    printt((search_index < array_size) and (array[search_index] == search_value), search_index)

    search_value = 77
    search_index = array.bsearch(search_value)
    printt((search_index < array_size) and (array[search_index] == search_value), search_index)

For sorting, the examples used:

On the other hand, there are programming languages that do not provide predefined implementations for binary search. JavaScript, Python and Lua do not provide a subroutine in the standard library. GDScript, however, provides bsearch() (documentation). It should be noted that, case bsearch() does not find the value in the array, it returns the position where value should be inserted to keep the array sorted. Thus, one must check the stored value in the provided index to ensure that it does exist in the array.

Binary search is an important algorithm in programming. It is, thus, convenient to know how to implement it.

A recursive_binary_search() subroutine can have the parameters array, a value to search, the starting_index and the ending_index for the search. It can return an integer value as the index of the result (if found) or a negative value if value does not exist in array. The array should store values of the same type of value. For convenience of use, an additional function to perform the initial call can be defined as binary_search(array, value).

To implement a binary search, one compares the value of the median (the central value of a sample) with the searched value. If the value is equal, the index of the median is returned (middle_index). Otherwise, there are three situations to consider:

  • If the searched value is less than the median's, the value must be between the initial position of the search and the position of the median. In other words, the search can be ignored from the index of the median onwards. This, one can perform a new binary search from starting_index to middle_index - 1.
  • If the searched value is greater than the value of the median, the value must be between the position of the median and the final position of the search. Thus, one can perform a new binary search from middle_index + 1 to ending_index.
  • If starting_index is less than ending_index, the searched value was not found.
function recursive_binary_search(array, value, starting_index, ending_index) {
    if (starting_index > ending_index) {
        return -1
    }

    let middle_index = Math.floor((starting_index + ending_index) / 2)
    let middle_value = array[middle_index]
    if (value === middle_value) {
        return middle_index
    } else if (value < middle_value) {
        return recursive_binary_search(array, value, starting_index, middle_index - 1)
    } else {
        return recursive_binary_search(array, value, middle_index + 1, ending_index)
    }
}

function binary_search(array, value) {
    return recursive_binary_search(array, value, 0, array.length - 1)
}

let array = [7, 3, -3, -7, 9, 0]

// Sorting.
console.log(array) // [7, 3, -3, -7, 9, 0]
array = array.sort((x, y) => (x - y)) // Out-of-place.
console.log(array) // [-7, -3, 0, 3, 7, 9]

// Binary search.
console.log(binary_search(array, -3))
console.log(binary_search(array, 0))
console.log(binary_search(array, 9))
console.log(binary_search(array, -4))
console.log(binary_search(array, 77))
def recursive_binary_search(array, value, starting_index, ending_index):
    if (starting_index > ending_index):
        return -1

    middle_index = (starting_index + ending_index) // 2
    middle_value = array[middle_index]
    if (value == middle_value):
        return middle_index
    elif (value < middle_value):
        return recursive_binary_search(array, value, starting_index, middle_index - 1)
    else:
        return recursive_binary_search(array, value, middle_index + 1, ending_index)

def binary_search(array, value):
    return recursive_binary_search(array, value, 0, len(array) - 1)

array = [7, 3, -3, -7, 9, 0]

# Sorting.
print(array) # [7, 3, -3, -7, 9, 0]
array.sort() # In-place
print(array) # [-7, -3, 0, 3, 7, 9]

# Binary search.
print(binary_search(array, -3))
print(binary_search(array, 0))
print(binary_search(array, 9))
print(binary_search(array, -4))
print(binary_search(array, 77))
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

function recursive_binary_search(array, value, starting_index, ending_index)
    if (starting_index > ending_index) then
        return -1
    end

    -- Lua 5.3 or more recent.
    local middle_index = (starting_index + ending_index) // 2
    -- Older versions:
    -- local middle_index = math.floor((starting_index + ending_index) / 2)
    local middle_value = array[middle_index]
    if (value == middle_value) then
        return middle_index
    elseif (value < middle_value) then
        return recursive_binary_search(array, value, starting_index, middle_index - 1)
    else
        return recursive_binary_search(array, value, middle_index + 1, ending_index)
    end
end

function binary_search(array, value)
    return recursive_binary_search(array, value, 1, #array)
end

local array = {7, 3, -3, -7, 9, 0}

-- Sorting.
write_array(array) -- [7, 3, -3, -7, 9, 0]
table.sort(array) -- In-place
write_array(array) -- [-7, -3, 0, 3, 7, 9]

-- Binary search.
print(binary_search(array, -3))
print(binary_search(array, 0))
print(binary_search(array, 9))
print(binary_search(array, -4))
print(binary_search(array, 77))
extends Node

func recursive_binary_search(array, value, starting_index, ending_index):
    if (starting_index > ending_index):
        return -1

    var middle_index = (starting_index + ending_index) / 2
    var middle_value = array[middle_index]
    if (value == middle_value):
        return middle_index
    elif (value < middle_value):
        return recursive_binary_search(array, value, starting_index, middle_index - 1)
    else:
        return recursive_binary_search(array, value, middle_index + 1, ending_index)

func binary_search(array, value):
    return recursive_binary_search(array, value, 0, len(array) - 1)

func _ready():
    var array = [7, 3, -3, -7, 9, 0]

    # Sorting.
    print(array) # [7, 3, -3, -7, 9, 0]
    array.sort() # In-place
    print(array) # [-7, -3, 0, 3, 7, 9]

    # Binary search.
    print(binary_search(array, -3))
    print(binary_search(array, 0))
    print(binary_search(array, 9))
    print(binary_search(array, -4))
    print(binary_search(array, 77))

Before performing a binary search, the array must be sorted (if it is not already sorted). To better understand the algorithm, it can be interesting to perform a manual execution with a trace table.

Binary search is efficient because it effectively discards the search on half of the values of the array every recursive call. Hence, the complexity is .

Once again, sorting and binary search are import resources for programming, because they allow solving large problems efficiently.

Shallow Copy, Deep Copy and Shared Memory

Copies in programming languages are not always as simple as one could think. This happens because the assignment operators in some programming languages does not perform copies for composite date types, unlike as it does with primitive types.

A copy in programming can be a shallow copy, if one or more values share references (memory addresses), or a deep copy, if they values are duplicated.

A shallow copy normally duplicate primitive type values, though it shares values of composite types (as arrays or records). Thus, changing a value that is of a composite type affect every other copy.

A deep copy effectively duplicates all stored values, regardless of type. That is, each deep copy is unique and independent of the others. Thus, changing a value in a deep copy affects only the copy on which the operation has been performed.

There still is another situation that requires attention. Many programming languages allow sharing memory between many variables when the assignment operator is used. In this case, all variables are references to a very same memory address. In other words, a modification in any variable affects every other.

In JavaScript, Python, Lua and GDScript, the assigning operator for arrays perform memory sharing. Shallow and deep copies require using special subroutines. Subroutines for shallow copies can use a repetition structure to copy, index by index, all values of an array. If one of the values is of a composite type (for instance, an array of arrays), its memory will be shared. Subroutines for deep copies copy values recursively. In other words, if the value stored an index is of a composite type, the subroutine calls itself to perform a deep copy of the value.

let array = [1, 2, 3, 4, 5]

console.log("Shared memory")
let shared_memory = array
array[0] *= -10
shared_memory[1] *= -100
console.log("array", array)
console.log("shared_memory", shared_memory)

console.log("Shallow copy")
let shallow_copy = Array.from(array)
// Alternatives:
// shallow_copy = [...array] // Spread operator.
// shallow_copy = array.slice()
shallow_copy[2] *= -1000
console.log("array", array)
console.log("shared_memory", shared_memory)
console.log("shallow_copy", shallow_copy)

console.log("Deep copy")
// WARNING: This has limitations. Refer to the notes.
let deep_copy = JSON.parse(JSON.stringify(array))
deep_copy[3] *= -10000
console.log("array", array)
console.log("shared_memory", shared_memory)
console.log("shallow_copy", shallow_copy)
console.log("deep_copy", deep_copy)
import copy # For copy() and deepcopy().

array = [1, 2, 3, 4, 5]

print("Shared memory")
shared_memory = array
array[0] *= -10
shared_memory[1] *= -100
print("array", array)
print("shared_memory", shared_memory)

print("Shallow copy")
shallow_copy = copy.copy(array)
# Alternatives:
# shallow_copy = array.copy()
# shallow_copy = array[:]
shallow_copy[2] *= -1000
print("array", array)
print("shared_memory", shared_memory)
print("shallow_copy", shallow_copy)

print("Deep copy")
deep_copy = copy.deepcopy(array)
deep_copy[3] *= -10000
print("array", array)
print("shared_memory", shared_memory)
print("shallow_copy", shallow_copy)
print("deep_copy", deep_copy)
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write(array[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

-- <http://lua-users.org/wiki/CopyTable>
function shallowcopy(orig)
    local orig_type = type(orig)
    local copy
    if orig_type == 'table' then
        copy = {}
        for orig_key, orig_value in pairs(orig) do
            copy[orig_key] = orig_value
        end
    else -- number, string, boolean, etc
        copy = orig
    end
    return copy
end

-- <http://lua-users.org/wiki/CopyTable>
-- Save copied tables in `copies`, indexed by original table.
function deepcopy(orig, copies)
    copies = copies or {}
    local orig_type = type(orig)
    local copy
    if orig_type == 'table' then
        if copies[orig] then
            copy = copies[orig]
        else
            copy = {}
            copies[orig] = copy
            for orig_key, orig_value in next, orig, nil do
                copy[deepcopy(orig_key, copies)] = deepcopy(orig_value, copies)
            end
            setmetatable(copy, deepcopy(getmetatable(orig), copies))
        end
    else -- number, string, boolean, etc
        copy = orig
    end
    return copy
end

local array = {1, 2, 3, 4, 5}

write_array("Shared memory")
local shared_memory = array
array[1] = array[1] * -10
shared_memory[2] = shared_memory[2] * -100
write_array("array", array)
write_array("shared_memory", shared_memory)

write_array("Shallow copy")
local shallow_copy = shallowcopy(array)
shallow_copy[3] = shallow_copy[3] * -1000
write_array("array", array)
write_array("shared_memory", shared_memory)
write_array("shallow_copy", shallow_copy)

write_array("Deep copy")
local deep_copy = deepcopy(array)
deep_copy[4] = deep_copy[4] * -10000
write_array("array", array)
write_array("shared_memory", shared_memory)
write_array("shallow_copy", shallow_copy)
write_array("deep_copy", deep_copy)
extends Node

func _ready():
    var array = [1, 2, 3, 4, 5]

    print("Shared memory")
    var shared_memory = array
    array[0] *= -10
    shared_memory[1] *= -100
    print("array", array)
    print("shared_memory", shared_memory)

    print("Shallow copy")
    var shallow_copy = array.duplicate(false)
    shallow_copy[2] *= -1000
    print("array", array)
    print("shared_memory", shared_memory)
    print("shallow_copy", shallow_copy)

    print("Deep copy")
    var deep_copy = array.duplicate(true)
    deep_copy[3] *= -10000
    print("array", array)
    print("shared_memory", shared_memory)
    print("shallow_copy", shallow_copy)
    print("deep_copy", deep_copy)

Subroutines:

The version for JavaScript has limitations, case the array stores function definitions or the infinite float value. For a more robust implementation, one can use, for instance, cloneDeep() of the Lodash library.

The implementations of shallowcopy() and deepcopy() for Lua can be obtained from lua-users.org, in Copy Table. The implementation of deepcopy() uses the recursive approach that was previously mentioned. As it uses more advanced features (such as metatable), it will not be discussed in this topic.

In the example, as array is an array of integer numbers (that is, an array of a primitive type), a shallow copy works as well as a deep copy. However, if the array stored values defined by reference (for instance, a matrix such as [[1, 2, 3], [4, 5, 6], [7, 8, 9]]), only a deep copy would enable changing the values of the internal lists without affecting the other copies. An example of the difference between shallow and deep copies for data structures containing values stored as references will be provided in the section describing copies and memory sharing in dictionaries. The principles are the same for arrays, dictionaries and other data with variables stored as references.

Parallel Arrays (or Structure of Arrays)

A technique used in high performance programming is called parallel arrays, also known as structures of arrays (SoA). Parallel arrays use a same index to relate data referring to same entity. Simply, the technique allows writing high performance systems as it can use the cache memory of processors efficiently.

let polygon_names = ["Triangle", "Square", "Rectangle", "Pentagon", "Hexagon"]
let polygon_sides = [3, 4, 4, 5, 6]
for (let index = 0; index < polygon_names.length; ++index) {
    let name = polygon_names[index]
    let slides = polygon_sides[index]
    console.log("A ", name, " has ", slides, " slides.")
}
polygon_names = ["Triangle", "Square", "Rectangle", "Pentagon", "Hexagon"]
polygon_sides = [3, 4, 4, 5, 6]
for index in range(len(polygon_names)):
    name = polygon_names[index]
    slides = polygon_sides[index]
    print("A ", name, " has ", slides, " slides.")

# Alternative.
for name, lado in zip(polygon_names, polygon_sides):
    print("A ", name, " has ", slides, " slides.")
local polygon_names = {"Triangle", "Square", "Rectangle", "Pentagon", "Hexagon"}
local polygon_sides = {3, 4, 4, 5, 6}
for index = 1, #polygon_names do
    local name = polygon_names[index]
    local slides = polygon_sides[index]
    print("A ", name, " has ", slides, " slides.")
end
extends Node

func _ready():
    var polygon_names = ["Triangle", "Square", "Rectangle", "Pentagon", "Hexagon"]
    var polygon_sides = [3, 4, 4, 5, 6]
    for index in range(len(polygon_names)):
        var name = polygon_names[index]
        var slides = polygon_sides[index]
        print("A ", name, " has ", slides, " slides.")

In the previous example, polygon_names and polygon_sides use a same index to refer to a very same polygon. In the example, the first value of each array refers to a triangle. If there was more data to represent, one could define a new array and use the same index for each considered polygon. For instance, internal_angles_sum could store the internal sum of angles for each polygon. The array would resemble: [180.0, 360.0, 360.0, 540.0, 720.0].

To make it easier to read the code, it is possible to align the values of the arrays. For instance:

let polygon_names       = ["Triangle", "Square", "Rectangle", "Pentagon", "Hexagon"]
let polygon_sides       = [3,           4,       4,           5,          6]
let internal_angles_sum = [180.0,       360.0,   360.0,       540.0,      720.0]

The example has also introduced an additional iterator for Python. In Python, zip() (documentation) provides a practical way to access values of each array using iterators. The name zip is used in functional programming for iteration using the technique.

Matrices and Arrays of Arrays

In many programming languages, arrays can store data of any type -- regardless of primitive or composite. In particular, arrays can store arrays. When this happens, the result is an array of array (or a list of lists).

In the special case that each has an array, and that this array at the index is a scalar (that is, it is not an array), it is said the array is a matrix. In other words, it resembles a table.

The concept becomes easier with an example, Assuming that the indexing starts from zero, it can be imagined that a v array stores:

  • In v[0], the array [1, 2, 3];
  • In v[1], the array [4, 5, 6];
  • In v[2], the array [7, 8, 9].

The result is the following array:

v = [
   [1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]
]

v can be thought as a table.

LineColumn 1Column 2Column 3
v[0]123
v[1]456
v[2]789

Thus, for instance, v[0][0] == 1, v[1][0] == 4, v[2][2] == 9. This happens because the value accessed in v is a array. In pseudocode, row_array = v[index]. Thus, row_array[another_index] will be a value stored in a line of the array. Likewise, row_array can be manipulated as any other array.

If one follows the same reasoning, she/he can create arrays of arrays of arrays. In this case, each line of the first array would be a matrix.

Arrays can be declared in a single line (for instance, [[1, 2, 3], [4, 5, 6], [7, 8, 9]]). In many programming languages, it is also possible to declare them over multiple lines. The version with multiple lines can be potentially easier to read.

let matrix = [
   [1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]
]

for (let line = 0; line < 3; ++line) {
    var text_line = ""
    for (let column = 0; column < 3; ++column) {
        text_line += matrix[line][column] + " "
        // If you prefer, you can explict the use of toString().
        // text_line += matrix[line][column].toString() + " "
    }

    console.log(text_line)
}
matrix = [
   [1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]
]

for line in range(3):
    for column in range(3):
        print(matrix[line][column], end=" ")

    print("")
local matrix = {
   {1, 2, 3},
   {4, 5, 6},
   {7, 8, 9}
}

for line = 1, 3 do
    for column = 1, 3 do
        io.write(matrix[line][column] .. " ")
    end

    print("")
end
extends Node

func _ready():
    var matrix = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    for line in range(3):
        var text_line = ""
        for column in range(3):
            text_line += str(matrix[line][column]) + " "

        print(text_line)

Matrices are common in digital games and computer graphics. By the way, with the programming knowledge that has been acquired hitherto, it is already possible to implement simple games without terminal output. For example, many card games can be implemented using arrays, and many board games can be implemented using matrices.

Row-Major Order and Column-Major Order

Something to notice when using arrays as matrices is that there exists two ways of iterating over the elements of a matrix. One can use the order for line... for column or the order for column... for line. Although the result is the same, the performance resulting from the chosen order can vary significantly. In the cases of JavaScript, Python, Lua and GDScript, as matrices are lists of lists, the results might not be very different. However, in languages such as C and C++, the different can be significant. This happens due to the order in which the programming language stores values in memory.

Programming languages that store lines in sequence (in the example, 1 2 3, then 4 5 6, then 7 8 9) are called row-major order. Programming languages that store columns in sequence (in the example, 1 4 7, then 2 5 8, then 3 6 9) are called row-column order. In row-major order (case of C and C++), it is more efficient to iterate first over the lines (external loop), then over the columns (internal loop). In column-major order (case of MATLAB, GNU Octave, R and Julia), it is more efficient to iterate first over columns, then over rows.

Matrix as an Array

With some creativity, and programming and arithmetic knowledge, it is possible to convert matrices into a liner array.

Com um pouco de criativage, conhecimento de programação e aritmética, é possível converter matrices em um array linear.

Index01234567891011
Value123456789101112

The previous array can be thought as a matrix, if one writes the index in function of a value assumed as the number of columns. For instance, if one assumes 3 columns, each new line should start every 3 values.

Column 1Column 2Column 3
123
456
789
101112

To keep an analogy with the example using two repetition structures, one can define the following implementation to iterate over all elements of an array as if it was a matrix. It is also possible to write a variation using the remainder of integer division.

let matrix = [
   1, 2, 3,
   4, 5, 6,
   7, 8, 9,
   10, 11, 12
]

const LINES = 4
const COLUMNS = 3

for (let line = 0; line < LINES; ++line) {
    var text_line = ""
    for (let column = 0; column < COLUMNS; ++column) {
        text_line += matrix[line * COLUMNS + column] + " "
    }

    console.log(text_line)
}

console.log("As an array")
var matrix_text = ""
for (let index = 0; index < matrix.length; ++index) {
    matrix_text += matrix[index] + " "

    if ((index + 1) % COLUMNS == 0) {
        matrix_text += "\n"
    }
}

console.log(matrix_text)
from typing import Final

matrix = [
   1, 2, 3,
   4, 5, 6,
   7, 8, 9,
   10, 11, 12
]

LINES: Final = 4
COLUMNS: Final = 3

for line in range(LINES):
    for column in range(COLUMNS):
        print(matrix[line * COLUMNS + column], end=" ")

    print("")

print("As an array")
for index in range(len(matrix)):
    print(matrix[index], end=" ")

    if ((index + 1) % COLUMNS == 0):
        print("")
local matrix = {
   1, 2, 3,
   4, 5, 6,
   7, 8, 9,
   10, 11, 12
}

local LINES <const> = 4
local COLUMNS <const> = 3

for line = 1, LINES do
    for column = 1, COLUMNS do
        io.write(matrix[(line - 1) * COLUMNS + column] .. " ")
    end

    print("")
end

print("As an array")
for index = 1, #matrix do
    io.write(matrix[index] .. " ")

    if (index % COLUMNS == 0) then
        print()
    end
end
extends Node

const LINES = 4
const COLUMNS = 3

func _ready():
    var matrix = [
        1, 2, 3,
        4, 5, 6,
        7, 8, 9,
        10, 11, 12
    ]

    for line in range(LINES):
        var text_line = ""
        for column in range(COLUMNS):
            text_line += str(matrix[line * COLUMNS + column]) + " "

        print(text_line)

    print("As an array")
    var matrix_text = ""
    for index in range(len(matrix)):
        matrix_text += str(matrix[index]) + " "

        if ((index + 1) % COLUMNS == 0):
            matrix_text += "\n"

    print(matrix_text)

The examples show how it is possible to interpret a same set of data differently to modify its meaning for a very same representation. Create a trace table to understand the expression to calculate the index. It can also be interesting to declare matrix in a single line to verify that the line breaks do not interfere in the creation of the array. For instance, in JavaScript:

let matrix = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
const COLUMNS = 3
var matrix_text = ""
for (let index = 0; index < matrix.length; ++index) {
    matrix_text += matrix[index] + " "

    if ((index + 1) % COLUMNS == 0) {
        matrix_text += "\n"
    }
}

console.log(matrix_text)

The line breaks in the declaration are not the responsible to transform the array into a matrix, though the programming logic defined to manipulate the values at the indices is. In a matrix in order of lines, each new line will be exactly after every number of values defined for COLUMNS, because the next line starts after the end of the last column of the current line.

Strings

In high level programming languages that provide a type for strings, it is easier to perform basic operations such as assignment and reading values than in low level languages. Now that arrays were introduced, one can start working with strings at a lower level.

Considering the differences regarding codification, strings are similar to arrays. In some programming languages, strings are arrays of encoded characters. In others, they are arrays of bytes, that encode characters.

With repetition structures and knowledge about arrays, one can start manipulating strings character by character. Instead of thinking about a string as a single value, one can think about it as it truly is: a sequence of characters that, potentially, can be individually manipulated.

Operations and Techniques Using Strings

When strings are thought as arrays, it becomes possible to use all operations and techniques of arrays on strings. Furthermore, strings also have their own characteristic operations. Some are already familiar; others will be new.

  • Size or length: the number of characters in the string. The value will not always be equal to the size of the array that stores the string. For instance, in programming languages that deal with strings as arrays of characters, the values can variy. The variable that stores the string can have a maximum capacity, though the contents can fill only part of the available memory. Another possibility that requires attention is codification. In programming languages that treat strings as array of bytes, the real size of a string can be different (smaller) than the provided by subroutines that provide the length (a single character can be composed by multiple bytes);

  • Concatenation (concat): the combination of two strings, on which the second is added to the end of the first; The result can change the original value (in-place) or return it as a result (out-of-place);

  • Type conversion: conversion of types to string, transforming values to text (or vice-versa);

  • Case conversion: conversion os characters to lowercase or uppercase representations;

  • Comparisons (case-sensitive or not): besides equality and difference, it is common to provide operators or subroutines to verify alphabetic order in strings (which is useful for sorting arrays of strings);

  • Value verification: is the string:

    • A number or composed by digits?
    • Composed only by lowercase or uppercase characters?
    • White spaces?

    Programming languages and libraries often provide subroutines to identify what the store value means in a string;

  • Character iteration: to access and/or manipulate specific characters in a string. For instance, iteration allows accessing the characters F, r, a, n, c and o in Franco;

  • Substring search (find): search for a word or expression inside the string. In general, the result returns the position of the occurrence (if it exists), instead of simply True or False. For instance, Franco is substring of Franco Garcia, as are f and o G, although Hello or FrancoG are not. In case-sensitive programming languages, franco is not either. In case-insensitive programming languages, franco would also be;

  • Substring extraction (slice): extraction of part of the stored data in a string. For instance, in Hello, my name is Franco!, one can extract a substring such as Franco by providing the suitable initial and final indices;

  • Search and replace (find and replace): as it is possible to search for a value, it is also possible to replace it, that is, swap a substring by another, modifying part of the contents of a string. For instance, modifying the message Hello, my name is Franco! by Tchau, Franco! replaces Hello, my name is for Tchau,, keeping the remainder of the string;

  • Tokenization or split: extraction of values in an array. For instance, Hello, my name is Franco! could be split at every space (delimiter), resulting in ["Hello,", "my", "name", "is", "Franco!"];

  • Pattern matching and regular expressions (regex): search (with potential replacement) for strings using patterns defined by the programmer.

As it happens with arrays, all previous operations can be defined using conditional structures and repetition structures. By the way, such implementation is a good exercise to practice programming.

Before continuing, it is worth commenting that available features vary from one programming language to another. This is valid both feature availability, and the quality of available features. Some programming languages provide good libraries to manipulate strings (even in the standard library). In others, it is better to use a high quality external library to avoid problems.

Concatenation and Type Conversion

JavaScript, Python, Lua and GDScript provide operators to concatenate strings. In JavaScript, Python and GDScript, the operator is +. In Lua, the operator .. concatenates two values.

let s = "Franco " + 1 + " " + (-1.23) + " " + true
console.log(s)
s = "Franco " + str(1) + " " + str(-1.23) + " " + str(True)
print(s)
local s = "Franco " .. 1 .. " " .. (-1.23) .. " " .. tostring(true)
print(s)
extends Node

func _ready():
    var s = "Franco " + str(1) + " " + str(-1.23) + " " + str(true)
    print(s)

Primitive type conversion to strings were previously presented in Data Types.

In JavaScript, the type conversion of types to strings is implicit. It is also possible to use the method variable_name.toString().

In Python and GDScript, the conversion must be performed explicitly, using str().

In Lua, the conversion is implicit to all types except boolean, which requires the use of tostring(). If one would rather always use tostring(), it is also possible.

Many programming languages also allow overloading specific operators to perform conversion of valuers from/to strings. It is worth searching for the feature when working with composite types (specially with records or classes).

Size and Iteration

Programming languages usually provide a subroutine called lenght(), len() or size() (or similar) to obtain the size of a string. In some programming languages, such as C, a special character marks the end of a string (as the integer 0 or the character \0), allowing to calculate the size of a valid string using a repetition structure (however, if the string does to end with the expected symbol, a grave error will happen, called buffer overread for reads or buffer overrun for writes).

JavaScript, Python, Lua and GDScript are high level languages with subroutines to obtain the length. The following examples with use a string in Portuguese (which means Hello, my name is Franco!) to highlight a potential issue when dealing with internationalization.

let a_string = "Olá, meu nome é Franco!"
let size = a_string.length
console.log(size)

for (let character_index = 0; character_index < size; ++character_index) {
    console.log(a_string[character_index])
}

console.log("\n-------\n")

for (const character of a_string) {
   console.log(character)
}
a_string = "Olá, meu nome é Franco!"
size = len(a_string)
print(size)

for character_index in range(size):
    print(a_string[character_index])

print("\n-------\n")

for character in a_string:
   print(character)
local a_string = "Olá, meu nome é Franco!"
local size = string.len(a_string)
             -- For Lua 5.1 or more recent, you can also use #a_string.
print(size)

-- What does happen with accents?
for character_index = 1, size do
    print(string.sub(a_string, character_index, character_index))
    -- or
    -- print(a_string:sub(character_index, character_index))
end

print("\n-------\n")

for character in string.gmatch(a_string, ".") do
                 -- or a_string:gmatch(".")
   print(character)
end
extends Node

func _ready():
    var a_string = "Olá, meu nome é Franco!"
    var size = len(a_string)
    print(size)

    for character_index in range(size):
        print(a_string[character_index])

    print("\n-------\n")

    for character in a_string:
       print(character)

For Python and GDScript, len() is a known function. For JavaScript, the length (documentation) property provides the size. For Lua, string.len() provides the size, string.sub() allows accessing substrings and string.gmatch() allows performing searches (documentation). Both string.sub() and string.gmatch() will be topic of study of the next subsections.

The output of the Lua program will be different from those of the other languages. The size of a_string in JavaScript, Python and GDScript will 23, though, in Lua, it will be 25. This happens because the language is assuming that each character has a single byte, something that does happen to á and é, which have two bytes each. In fact, when there is an attempt to write the value for each position, the output will be strange, with an incorrect value. Characters belonging English language will not have the same problem. From version 5.3 of Lua, the language provides basic support to Transformation Format 8-bit (UTF-8), allowing the codification of characters from other languages using UTF-8 instead of American Standard Code for Information Interchange (ASCII).

Since version 5.3, the following Lua code provides the correct result.

local a_string = "Hello, my name is Franco!"
local size = utf8.len(a_string)
print(size)

for position_in_bytes, code_point in utf8.codes(a_string) do
   -- print(position_in_bytes, code_point, utf8.char(code_point))
   print(utf8.char(code_point))
end

The implementation uses utf8.len() to obtain the size, utf8.codes() for iteration and utf8.char() to retrieve a character (documentation).

With utf8.len(), the correct size 23 will be returned. To write only the characters, print(utf8.char(code_point)) can be used. The commented line writes the other values returned by utf8.codes(). If the comment is removed, it is interesting to note that the counter position_in_bytes increases by two units when processing the accented characters á or é, testifying that each one is, in fact, defined by two bytes in UTF-8.

The example in Lua shoes that is necessary taking care when directly manipulating indices in strings. One must not assume the result without knowing how the language store and encoded text. Thus, if the language does not provide subroutines in the standard library to process text in non-ASCII codifications, it is possible to create subroutines to handle them or (preferably) use a high quality external library to do so.

Case Conversion

In particular, case conversion are also sensible to codification. It is common that conversion subroutines for lowercase or uppercase strings work only with ASCII or UTF-8.

console.log("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç".toLowerCase())
console.log("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç".toUpperCase())
print("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç".lower())
print("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç".upper())
os.setlocale("pt_BR.UTF-8")

print(string.lower("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç"))
print(string.upper("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç"))

local s = "0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç"
print(s:lower())
print(s:upper())
extends Node

func _ready():
    print("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç".to_lower())
    print("0123 ABCDEF abcdef ÁÉÍÓÚ ÀÈÌÒÙ ÃÕ Ç áéíóú àèìòù ãõ ç".to_upper())

In the previous examples, the version in Lua will probably fail with accented characters. Unfortunately, there are not predefined functions for utf8.lower() or utf8.upper(), although the use os.setlocale() can be used in some platforms. Thus, one needs to search for a library if she/he wishes to perform case conversion for non-English languages.

Comparisons

In Relational Operations and Comparisons, it was commented about string comparisons.

console.log("Franco" === "Franco")
console.log("Franco" !== "Franco")
console.log("Franco" === "franco")
console.log("FrAnCo" === "FrAnCo")
console.log("fRaNcO".toLowerCase() === "FrAnCo".toLowerCase())
console.log("fRaNcO".toUpperCase() === "FrAnCo".toUpperCase())
print("Franco" == "Franco")
print("Franco" != "Franco")
print("Franco" == "franco")
print("FrAnCo" == "FrAnCo")
print("fRaNcO".lower() == "FrAnCo".lower())
print("fRaNcO".upper() == "FrAnCo".upper())
print("Franco" == "Franco")
print("Franco" ~= "Franco")
print("Franco" == "franco")
print("FrAnCo" == "FrAnCo")
print(string.lower("fRaNcO") == string.lower("FrAnCo"))
print(string.upper("fRaNcO") == string.upper("FrAnCo"))

local x = "fRaNcO"
local y = "FrAnCo"
print(x:lower() == y:lower())
print(x:upper() == y:upper())
extends Node

func _ready():
    print("Franco" == "Franco")
    print("Franco" != "Franco")
    print("Franco" == "franco")
    print("FrAnCo" == "FrAnCo")
    print("fRaNcO".to_lower() == "FrAnCo".to_lower())
    print("fRaNcO".to_upper() == "FrAnCo".to_upper())

The previous example is an adaptation of the example provided in the topic about relation operations. JavaScript, Python, Lua and GDScript are case-sensitive; thus, the example illustrates comparisons using only the operators and using case conversion.

In languages that do not provide an operator to verify equality of strings, one can compare values position by position. If they are all identical (and the strings have the same length), the strings will be equal.

Value Verification

Some programming languages have subroutines to check if a string contains only characters that satisfy a given criterion. This can be performed used predefined subroutines for particular cases or using regular expressions.

The existence and the number of predefined subroutines vary according to the language. The examples below illustrate the use of some of them, which can vary among languages. If the language does not provide a required check, one can implement them using regular expressions (which will be addressed later in this topic), or repetition structures and conditional structures. The implementations in Python and GDScript are the simplest, as they use predefined methods.

JavaScript uses regular expressions in snippets using /pattern/ (documentation), that have yet to be discussed. The same applies to Lua, with string.match() (documentation). Python provides several predefined methods with checks, such as isalpha() and isdecimal() (documentation; they start with the is prefix). GDScript provides some methods to check types and formats, such as for file paths and filenames, numbers, HTML colors and Internet Protocol (IP) addresses for network communication (documentation).

let valuees = ["Franco", "123", "123.45", "Franco123", "franco", "FRANCO", " \n\t"]
for (const value of valuees) {
    console.log(value)
    console.log("Letters", /^[A-Z]+$/i.test(value))
    console.log("Alphanumeric", /^\w+$/i.test(value)) // Includes underscore (_).
    console.log("Digits", /^\d+$/i.test(value))
    console.log("Lowercase letters", /^[a-z]+$/.test(value))
    console.log("Uppercase letters", /^[A-Z]+$/.test(value))
    console.log("Spaces", /^\s+$/i.test(value))

    console.log("-------\n")
}
valuees = ["Franco", "123", "123.45", "Franco123", "franco", "FRANCO", " \n\t"]
for value in valuees:
    print(value)
    print("isalpha", value.isalpha())
    print("isascii", value.isascii())
    print("isdigit ", value.isdigit())
    print("isdecimal", value.isdecimal())
    print("isnumeric", value.isnumeric())
    print("islower", value.islower())
    print("isupper", value.isupper())
    print("isspace", value.isspace())

    print("-------\n")
local valuees = {"Franco", "123", "123.45", "Franco123", "franco", "FRANCO", " \n\t"}
for _, value in ipairs(valuees) do
    print(value)
    print("Letters", string.match(value, "[^%a]") == nil) -- or value:match("[^%a]")
    print("Alphanumeric", string.match(value, "[^%w]") == nil)
    print("Digits", string.match(value, "[^%d]") == nil)
    print("Lowercase letters", string.match(value, "[^%l]") == nil)
    print("Uppercase letters", string.match(value, "[^%u]") == nil)
    print("Spaces", string.match(value, "[^%s]") == nil)

    print("-------\n")
end
extends Node

func _ready():
    var valuees = ["Franco", "123", "123.45", "Franco123", "franco", "FRANCO", " \n\t"]
    for value in valuees:
        print(value)
        printt("is_valid_float", value.is_valid_float())
        printt("is_valid_integer", value.is_valid_integer())

        print("-------\n")

The last value in values corresponds to a space, a line break or a tabulation. These values are typically considered as white spaces in programming languages.

The versions using patterns in JavaScript and Lua do not match real numbers (such as 123.45), because they verify only the existence of digits (although numbers can have a comma), nor negative numbers (due to the symbol -). Besides, they are quick examples; they require further tests before using in real programs. For convenience, it would also be interesting to refactor them into functions.

Substring Extraction

If it is necessary to obtain part of the data in a string, one can perform a substring extraction. To do this, it suffices to provide an initial index and the final index that should be used to perform the extraction. Some programming languages (such as GDScript) use a character counter instead of the final position.

let phrase = "Hello, my name is Franco!"
console.log(phrase.slice(16))
console.log(phrase.slice(16, 22))
phrase = "Hello, my name is Franco!"
print(phrase[16:])
print(phrase[16:22])
print(phrase[16:22:3]) # Alternative by three per three characters.
local phrase = "Hello, my name is Franco!"
print(string.sub(phrase, 19)) -- or phrase:sub(19)
print(string.sub(phrase, 19, 24)) -- or phrase:sub(19, 24)
extends Node

func _ready():
    var phrase = "Hello, my name is Franco!"
    print(phrase.substr(16))
    print(phrase.substr(16, 6))

Subroutines:

The differences of indices in Lua are due to the need to count accents as array positions. If only the indexation starting by 1 was considered, the index should have been 17 for the beginning and 23 for the end.

The extraction operation also can be applied to arrays. In this case, it obtains data from the desired indices, as a filter per position.

A common operation when manipulating strings consists of search whether a string (called substring) exists in another one (the considered string). Besides common in programming, it is also common in software that read or manipulate text. For instance, text editors, text processors, document readers and Internet browsers provide a feature to search for a keyword or expression (which is usually provided in the Ctrl f shortcut).

Simple searches for a specific substring can use subroutines such as find() or search(). Complex searches can use regular expressions to search for patterns with alternatives.

A special case of search for substring consists of checking if a string starts or ends with a specific substring (commonly called start_with() or begin_with() for searches at the beginning or end_with() for searching at the end). The search for a beginning allows identifying prefixes (for instance, to verify domains in web pages or paths to directories in file systems), while the search at the ends allows identifying suffixes (for instance, to check file extensions).

let name = "Franco"
console.log(name.includes("ra"))
console.log(name.indexOf("ra"))
console.log(name.startsWith("Fr"))
console.log(name.endsWith("co"))
name = "Franco"
print(name.find("ra"))
print(name.startswith("Fr"))
print(name.endswith("co"))
local name = "Franco"
print(string.find(name, "ra")) -- or name:find("ra")
local prefix = "Fr"
print(string.find(name, prefix, 1, true) == 1) -- or name:find(prefix, 1, true) == 1
local suffix = "co"
local suffix_position = #name - #suffix + 1
print(string.find(name, suffix, suffix_position, true) == suffix_position) -- ou name:find(suffix, suffix_position, true) == suffix_position
-- Alternative:
print(string.sub(name, -#suffix) == suffix)
extends Node

func _ready():
    var name = "Franco"
    print(name.find("ra"))
    print(name.begins_with("Fr"))
    print(name.ends_with("co"))

Subroutines:

In the examples, Lua does not provide predefined subroutines for searching for prefix or suffix, though they can use string.find() with suitable adjustments for positions.

Moreover, JavaScript has two alternatives for find(): includes() returns a logic value informing whether the value exists, while indexOf() returns the index of a match. In all other languages, the subroutines for search returns the index for the first match. In case of error, Lua returns nil; Python, GDScript and JavaScript (indexOf()) return -1.

It is important noticing that more than an occurrence of the searched term can appear in a string. In this case, the default behavior is often to return the first match. Implementations often allow providing an initial value as a parameter for the search. Thus, if there is interest in locating all occurrence, one can use a repetition structure to search iteratively for the next occurrence (it suffices to adjust the next initial index after every occurrence of the term).

Search and Replace

Subroutines such as find() allows locating substrings. The complementary operation to a search is replacement.

let phrase = "Hello, my name is Franco!"
phrase = phrase.replace("Franco", "FRANCO")
console.log(phrase)
phrase = "Hello, my name is Franco!"
phrase = phrase.replace("Franco", "FRANCO")
print(phrase)
local phrase = "Hello, my name is Franco!"
phrase = string.gsub(phrase, "Franco", "FRANCO", 1) -- or phrase:gsub("Franco", "FRANCO", 1)
print(phrase)
extends Node

func _ready():
    var phrase = "Hello, my name is Franco!"
    phrase = phrase.replace("Franco", "FRANCO")
    print(phrase)

Subroutines:

Some implementations, such as used in Lua, allow to replace all occurrences. The use of 1 requests it to replace only the first one.

In programming languages that only replaces the first occurrence, one can use a repetitions structure to replace the remaining ones. One should take care because, if the sizes of the searched term and the replaced one are different, the indices should be adjusted accordingly. In particular, if the modified term contains the original one, one should be careful not to replace the modification again by mistake. Another option is using a regular expression to replace all occurrences at once.

Tokenization and Split

As it is possible to convert an array in a string using an accumulator, it is also possible to convert a string into an array using operations of split and tokenization. The principle is separating substrings from the original string every time an delimiter is found. The delimiter is not included in the results.

Traditional delimiters include spaces, commas, semicolons, line breaks and tabulations. However, any sequence of characters can be adopted as a delimiter. Ideally, one should choose a delimiter that does not compromise the extraction of valid data.

let phrase = "Hello, my name is Franco!"
let delimiter = " "
let words = phrase.split(delimiter)
console.log(words)
phrase = "Hello, my name is Franco!"
delimiter = " "
words = phrase.split(delimiter)
print(words)
-- Procedure that includes double quotes between each value to make it easier
-- to read each extracted string.
function write_array(array)
    if (array == nil) then
        return
    end

    local size = #array
    io.write("[")
    for index = 1, size do
        io.write("\"" .. array[index] .. "\"")
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

function split(a_string, delimiter)
    delimiter = delimiter or " "
    local result = {}
    local size = #a_string
    local begin_at = 1
    while (begin_at <= size) do
        local end_at, next = string.find(a_string, delimiter, begin_at, true)
        if (end_at ~= nil) then
            table.insert(result, string.sub(a_string, begin_at, end_at - 1))
            begin_at = next + 1
        else
            table.insert(result, string.sub(a_string, begin_at))
            -- Any value greater than size to finish the loop.
            begin_at = size + 1
        end
    end

    -- Add an empty value case the string finishes with the delimiter.
    if (string.sub(a_string, -#delimiter) == delimiter) then
        table.insert(result, "")
    end

    return result
end

local phrase = "Hello, my name is Franco!"
local delimiter = " "
local words = split(phrase, delimiter)
write_array(words)
extends Node

func _ready():
    var phrase = "Hello, my name is Franco!"
    var delimiter = " "
    var words = phrase.split(delimiter)
    print(words)

Subroutines:

Many implementations adopt a space ( ) as the default delimiters, in the case one is not specified. Thus, it is worth consulting the documentation of the chosen subroutine.

Lua does not provide a subroutine for split in the standard library. The example provides an implementation using only features that have been described hitherto (for instance, it does not use regular expressions). The implementation accepts empty values, case a_string ends with a delimiter or presents two delimiters in sequence. This is useful, for instance, when one is working with Comma Separated Values (CSV) or Tab Separated Values (TSV). For instance, the string a,,c,,e,, delimited by a comma would result ["a", "", "c", "e", "", ""], while ,, would result in ["", "", ""].

Regular Expressions (Regex)

Regular expressions are expressions that can match values belonging to a regular language. Regular languages have several limitations; still, regular expressions can be useful for searches matching simple patterns. In programming, regular expression are a more advanced feature for searching (and, potentially, replacing) values in strings. Some previous examples have used regular expressions, albeit without further explanation.

A suitable introduction to regular expressions require its own topic. Therefore, this subsection serves only for illustrative purposes, to highlight that the feature exists. In other words, a curiosity that can be useful in the future.

Searches for regular expressions allow identifying positions of occurrences. Each occurrence can be extracted or replaced by another value.

A difficulty when using regular expressions is the fact that there exists multiple formats to define the expressions. Programming languages (or libraries) can define their own format or use famous ones. Thus, the syntax tend to vary among implementations.

Some popular formats include Perl Compatible Regular Expressions (PRCE), Basic Regular Expression (BRE) and Extended Regular Expression (ERE). Other tools define their own formats (for instance, the Emacs text editor has its own format). Many implementations use or are compatible with PCRE.

Virtually every implementation provides the following features:

  • Character classes: special patterns to represent sets of predefined characters. For instance:

    ValueMeaning
    bSpace
    sAny kind of white space
    dDigits (0 to 9)
    wAlphanumeric characters (letters, digits and underscore)
    .Any character
    $End of the text
    ^Beginning of the text

    The previous values are often prefixed by a character, such as a backslash (\), a number sign (#, also called hash, pound sign or hashtag by the youth). Some implementations define a uppercase letter as the group that is contrary to the lowercase one.

    There also exists a way to define groups using square brackets. For instance, [abc] means any character that is either a, b or c, while [A-Z] corresponds to any uppercase letter and [A-Za-z0-9] corresponds to any alphanumeric character (including underscore, also called underline).

    Once again, the form can vary among implementations; thus, one should always refer to the documentation.

  • Patterns: sequences of characters or classes of characters with, possibly, specifiers for quantities.

    SymbolMeaning
    *0 or more occurrences
    +1 or more occurrences (greed in some implementations)
    -1 or more occurrences (non greed in some implementations)
    ?0 or 1 occurrence

    For instance, writing .* means any combination of zero or more of any character (as it is the meaning of .). On the other hand, (abc)+ corresponds to any non-empty sequence of abc (for instance, abc or abcabcabc).

    The term greed means a search for the largest possible quantity of characters that satisfies the defined pattern. In a greed search, the matching of an expression continues until the last valid value that is identified. In a non greed search, it ends at the first occurrence;

  • Captures: blocks that allow obtaining values that satisfy an identified pattern. The idea of capture is providing a way to access the obtained value from a pattern match, whatever it is. Captures can be named or indexed (in the other of the definition).

    Some implementations use parenthesis to define a capture group.

To demonstrate the potential of regular expressions, one can consider a dialog format such as follows. To avoid problems, the text omits accents.

- Franco: Hello, how are you?
- Your Name: Hi!
- Your Name: Everything is fine.
- Franco: Glad to know!

Any value that is placed between the hyphen and the two dots form the name of the character. Any value after the two points and before a line break form a line of dialog (for instance, a phrase).

One can suppose that there is a need to switch the format of the dialog to something like [NAME] said "[PHRASE]".

To write the expression regular, one should build an expression following the expected formatting. In this case, the problem requires a line per line analysis. The format will resemble - , followed by any text, followed by : , followed by any text, followed by \n. Any text can be written as any alphanumeric combination or any character that is written zero or more times.

To highlight some cases, one can suppose that the name of character is not empty and it is composed by alphanumeric characters, while the phrase can be any valid text except a new line. Thus, the name matching can be written as - ([\w ]+):, while the text matching (which will come after the name) can be written as : (.*)\n. Combining both expressions, the result is - ([\w ]+): (.*)\n (one of the two points was removed to avoid duplicates).

Before starting the implementation, it is worth mentioning that is often necessary to add escape characters in the backslashes (as well as square and curly braces that should appear in the text). In other words, one should write \\ instead of \ when using a backslash in a string. Some programming languages, such as JavaScript and Python, provide special features to write regular expressions with greater ease. For instance, by prefixing a string with r"" in Python, it is possible to use a single backslash. Likewise, JavaScript provides a special syntax using slashes to write regular expressions as literals instead of strings.

For the implementation, each language will have its specificities. In Python, the part \w must be written as \\w to include the backslash in the string. In Lua, the language uses %w instead of w for the alphanumeric group. In Lua, * operator will also be replaced by -, to use the non-greedy version (otherwise, the search will extend until the last line break instead of the stopping after the first one).

let script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
let regular_expression = /- ([\w ]+): (.*)\n/g
let result = regular_expression.exec(script)
while (result !== null) {
    // To write the value obtained as answer:
    // console.log(result)
    let name = result[1]
    let phrase = result[2]
    let new_format = name + " said \"" + phrase + "\""
    console.log(new_format)
    result = regular_expression.exec(script)
}
import re

script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
regular_expression = re.compile(r"- ([\w ]+): (.*)\n")
result = regular_expression.findall(script)
for group in result:
    # To write the value obtained as answer:
    # print(group)
    name = group[0]
    phrase = group[1]
    new_format = name + " said \"" + phrase + "\""
    print(new_format)
local script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
local regular_expression = "- ([%w ]+): (.-)\n"
for name, phrase in string.gmatch(script, regular_expression) do
    local new_format = name .. " said \"" .. phrase .. "\""
    print(new_format)
end
extends Node

func _ready():
    var script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
    var regular_expression = RegEx.new()
    regular_expression.compile("- ([\\w ]+): (.*)\n")
    var result = regular_expression.search_all(script)
    for group in result:
        # To write the value obtained as answer:
        # print(group)
        var name = group.strings[1]
        var phrase = group.strings[2]
        var new_format = name + " said \"" + phrase + "\""
        print(new_format)

The result of each program should be:

Franco said "Hello, how are you?"
Your Name said "Hi!"
Your Name said "Everything is fine."
Franco said "Glad to know!"

If the script was changed, the new results would still be processed correctly, provided that the format was followed correctly for each line of dialog. Besides, it is possible to make the implementation even more concise by using replacements instead of identifications.

let script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
let regular_expression = /- ([\w ]+): (.*)\n/g
let result = script.replace(regular_expression, "$1 said \"$2\"\n",)
console.log(result)
import re

script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
regular_expression = re.compile(r"- ([\w ]+): (.*)\n")
result = regular_expression.sub(r"""\1 said "\2"\n""", script)
print(result)
local script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
local regular_expression = "- ([%w ]+): (.-)\n"
local result = string.gsub(script, regular_expression, "%1 said \"%2\"\n")
print(result)
extends Node

func _ready():
    var script = "- Franco: Hello, how are you?\n- Your Name: Hi!\n- Your Name: Everything is fine.\n- Franco: Glad to know!\n"
    var regular_expression = RegEx.new()
    regular_expression.compile("- ([\\w ]+): (.*)\n")
    var result = regular_expression.sub(script, "$1 said \"$2\"\n", true)
    print(result)

When one uses the subroutine to perform replaces, she/he can eliminate the use of the repetition structure to replace occurrences. Evidently, if one wishes, she/he can change the solution (with suitable parameters) to modify only the first occurrence instead of them all.

Regular expressions are not very legible without experience and practice. In reality, they even can be hard to read with experience and practice. There are tools to make it easier to create and visualize regular expressions, such as RegExr for JavaScript. They can be a good resource for learning and supporting the creation, especially because they show the results of modifications in a expression in real time.

Moreover, one can avoid using regular expressions by using repetition and conditional structures to write a syntactic analyser (also called a parser). Nevertheless, it is convenient learning how to use regular expressions at some time of the programming career.

Dictionaries, Maps or Tables

Arrays (or lists) are available by default in virtually every programming language, although they are not the only way to store collections of values. Modern languages often provide a default implementation for another associative data structure known as dictionary, although the implementation itself may vary. For instance, the dictionary can be implemented as a hash table (or hash map) or as tree. However, the way to operate it is similar.

Arrays associate a value to an index. Dictionaries associate a value to a key (or ID) chosen by the programmer. In particular, the implementation of dictionaries often allow associating data to a key of any data type. Strings are one of the most common data types used as keys. For instance, an association with key "name" and value "Franco" in a dictionary names dictionary allows using dictionary["Franco"] to retrieve the stored value "Franco". Thus, key and value define a pair that can be easily accessed using the key.

An advantage of using a dictionary is the ease to access a datum using the key, because the dictionary abstracts the search for the key to access the value. For instance, imagine that one wants to store a relation of recommended IDEs in this material for programming beginners by language. Using arrays, one could store the name of the language in an array and the IDE in another, using the technique of parallel arrays to map the index. To search for an IDE, first it is necessary to know the index to, then, search for the IDE.

let language_names = ["Python", "Lua", "GDScript", "JavaScript"]
let language_ides = ["Thonny", "ZeroBrane Studio", "Godot Engine", "Text editor and Internet browser"]

let searched_language = "GDScript"
let searched_language_index = language_names.indexOf(searched_language)
if (searched_language_index !== -1) {
    let recommendation = language_ides[searched_language_index]
    console.log(searched_language, ": ", recommendation)
} else {
    console.log("No recommendations for ", searched_language)
}
language_names = ["Python", "Lua", "GDScript", "JavaScript"]
language_ides = ["Thonny", "ZeroBrane Studio", "Godot Engine", "Text editor and Internet browser"]

searched_language = "GDScript"
searched_language_index = language_names.index(searched_language)
if (searched_language_index != -1):
    recommendation = language_ides[searched_language_index]
    print(searched_language, ": ", recommendation)
else:
    print("No recommendations for ", searched_language)
function sequential_search(array, value)
    for index, current_value in ipairs(array) do
        if (value == current_value) then
            return index
        end
    end

    return -1
end

local language_names = {"Python", "Lua", "GDScript", "JavaScript"}
local language_ides = {"Thonny", "ZeroBrane Studio", "Godot Engine", "Text editor and Internet browser"}

local searched_language = "GDScript"
local searched_language_index = sequential_search(language_names, searched_language)
if (searched_language_index ~= -1) then
    local recommendation = language_ides[searched_language_index]
    print(searched_language, ": ", recommendation)
else
    print("No recommendations for ", searched_language)
end
extends Node

func _ready():
    var language_names = ["Python", "Lua", "GDScript", "JavaScript"]
    var language_ides = ["Thonny", "ZeroBrane Studio", "Godot Engine", "Text editor and Internet browser"]

    var searched_language = "GDScript"
    var searched_language_index = language_names.find(searched_language)
    if (searched_language_index != -1):
        var recommendation = language_ides[searched_language_index]
        print(searched_language, ": ", recommendation)
    else:
        print("No recommendations for ", searched_language)

The disadvantage of the approach is the need to search for the name of the language. It could be improved by keeping the arrays sorted by the name of the language.

Another option could define a convention of a specific index for each programming language. For instance, the first index (0 or 1, depending on the language used in the implementation) could always correspond to Python. Each index could be stored as a constant to make it easier to access the value. The disadvantage of the approach is that it requires modifications in the source code every time a new programming language must be included.

In both approaches, the undesirable situation is the need to know the index that stores the value (either by searching it or adopting a convention). With a dictionary, it is possible to associate a value directly to a key. If one knows the key, she/he can retrieve the value.

// With JavaScript Object:
let language_ides = {
    "Python": "Thonny",
    "Lua": "ZeroBrane Studio",
    "GDScript": "Godot Engine",
    "JavaScript": "Text editor and Internet browser"
}
// It is also possible to write:
// let language_ides = {
//    Python: "Thonny",
//    Lua: "ZeroBrane Studio",
//    GDScript: "Godot Engine",
//    JavaScript: "Text editor and Internet browser"
//}

let searched_language = "GDScript"
let recommendation = language_ides[searched_language]
if (recommendation !== undefined) {
    console.log(searched_language, ": ", recommendation)
} else {
    console.log("No recommendations for ", searched_language)
}

// With Map:
language_ides = new Map()
language_ides.set("Python", "Thonny")
language_ides.set("Lua", "ZeroBrane Studio")
language_ides.set("GDScript", "Godot Engine")
language_ides.set("JavaScript", "Text editor and Internet browser")

searched_language = "GDScript"
if (language_ides.has(searched_language)) {
    let recommendation = language_ides.get(searched_language)
    console.log(searched_language, ": ", recommendation)
} else {
    console.log("No recommendations for ", searched_language)
}
language_ides = {
    "Python": "Thonny",
    "Lua": "ZeroBrane Studio",
    "GDScript": "Godot Engine",
    "JavaScript": "Text editor and Internet browser"
}

searched_language = "GDScript"
if (searched_language in language_ides):
    recommendation = language_ides[searched_language]
    print(searched_language, ": ", recommendation)
else:
    print("No recommendations for ", searched_language)
local language_ides = {
    Python = "Thonny",
    Lua = "ZeroBrane Studio",
    GDScript = "Godot Engine",
    JavaScript = "Text editor and Internet browser"
}

local searched_language = "GDScript"
local recommendation = language_ides[searched_language]
if (searched_language ~= nil) then
    print(searched_language, ": ", recommendation)
else
    print("No recommendations for ", searched_language)
end
extends Node

func _ready():
    var language_ides = {
        "Python": "Thonny",
        "Lua": "ZeroBrane Studio",
        "GDScript": "Godot Engine",
        "JavaScript": "Text editor and Internet browser"
    }

    var searched_language = "GDScript"
    if (searched_language in language_ides):
        var recommendation = language_ides[searched_language]
        print(searched_language, ": ", recommendation)
    else:
        print("No recommendations for ", searched_language)

JavaScript, Python, Lua and GDScript allow declaring a dictionary using curly brackets (in Lua, arrays and dictionaries are implemented as a structure called table; this is why both are declared using curly braces). Insertion and accessing values use the square brackets' operator, which receives the key as parameter. In Python and GDScript, one can check whether a key exists using the keyword in. In JavaScript (with JavaScript Object) and Lua, one can try accessing the value and check whether it is null.

In the case of JavaScript, using keys actually create a JavaScript Object instead of a dictionary. To create a proper dictionary, new Map() (documentation) can be used with the methods set() (documentation) to insert or update a value, get() (documentation) to access an existing value, and has() (documentation) to verify whether a key exists. The three methods allow writing equivalent code to the initial example. In many cases, JavaScript Object can be used as dictionary; however, as will be presented in some subsections, only Map provides some useful methods to manipulate dictionaries.

Arrays, lists and dictionaries are data structures (although it could be argued that arrays are at a lower level, as a composite type). All of them can store and manage collections of data. The choice of the most appropriate data structure will depend on the problem (especially because, many times, it is possible to choose between than one to solve a problem).

Dictionaries are practical data structures to program, especially for prototypes. The choice between an array and a dictionary depends on the characteristics of the problem to solve. In general, the access to an element in an array is a faster operation than accessing an element in dictionary. However, this assumes that the index for accessing the value if known. If there is a need to perform a sequential search for find the index for accessing the value, the use of dictionary should be faster.

Besides, arrays are often a better choice to preserve and define order, due to the indices. Except case the implementation affirms otherwise, one cannot assume order to iterate on dictionaries.

When performance is a priority, the choice of the data structure becomes important. In this case, it is necessary analyzing the problem regarding the use of the collection. The number (or proportion) of insertions, removals, searches and accesses to values can suggest the most appropriate choice. Each operation has complexity between to depending on the implementation of the data structure and the operation to perform.

Nevertheless, when one is learning to program, she/he should not consider performance as a priority (unless the problem requires it). Thus, at this time, you can choose the data structure that makes it easier to solve your problem. This recommendation can be useful even for professional practice. Many times, it is convenient to create an initial prototype using the simplest solution to, afterwards, measure the performance of the program with a profiler to identify performance bottlenecks. This way, you can focus on parts of the solution that truly require optimization instead of guessing them with suppositions.

Operations and Techniques Using Dictionaries

Operations and techniques to manipulate dictionaries are practically the same used for array.

  • Accessing value and iteration;
  • Obtaining the size (length);
  • Check whether the dictionary is empty;
  • Initialization;
  • Write or print (dump);
  • Accumulation of values (reduce);
  • Selecting or filtering values;
  • Search (find);
  • Copy (duplication or clone);
  • Insert (add, set), remove (delete) and clear (empty).

A difference is the absence of sequential search, sorting and binary search, which are implementation details. Dictionary implementations can have their keys sorted (for instance, using trees) or unsorted (for instance, using hash tables).

Whatever is the implementation, the programmer using the dictionary does not need to worry with it. Instead of sorting and searching, the programmer must verify whether the key exists in the dictionary. Depending on the implementation, an attempt to access a nonexistent key can result in error or exception (crashing the program), or the return of a null value. In implementations in which the accesses result in error, ideally the programmer must check if the key exists before trying to access the value (or assert the key does, indeed, exist before using it).

Declaration and Initialization

Like arrays, one can initialize values of a dictionary in the declaration or by performing insertions after the declaration. Programming languages often provide two ways of accessing values in a dictionary:

  • A subroutine to obtain the value (normally called get or read); For instance, my_dictionary.get(key);
  • An operator to access the value (normally square brackets, like arrays). For instance, my_dictionary[key].

Some languages provide one of the alternatives, others provide both. Some languages may even provide a third way, for use when the key is a string: my_dictionary.key, which is equivalent to the second form and resembles the use of variables (attributes or fields) of records (which will be the next topic of study).

The declaration often provides two variations as well:

  • The declaration using the name of the data structure (map, dictionary, dict, table...);
  • A special symbol (normally a pair of curly brackets).

The following examples provide the names of numbers written in Portuguese (for instance, um means one), to highlight how keys with accents or other special characters work.

// JavaScript object (can be used as a dictionary with proper care).
let my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}
// Stricly speaking, the best form to create uses
// let my_dictionary = Object.create(null)
// instead of {}, to avoid creating predefined properties.

my_dictionary["quatro"] = 4
my_dictionary.cinco = 5
my_dictionary["seis"] = 6

console.log(my_dictionary["um"])
console.log(my_dictionary["dois"])
console.log(my_dictionary.três) // JavaScript allows this use even if the key
                                // contains special characters
console.log(my_dictionary.quatro)
console.log(my_dictionary["cinco"])
console.log(my_dictionary["seis"])

// Proper dictionary.
my_dictionary = new Map()
my_dictionary.set("um", 1)
my_dictionary.set("dois", 2)
my_dictionary.set("três", 3)
my_dictionary.set("quatro", 4)
my_dictionary.set("cinco", 5)
my_dictionary.set("seis", 6)

console.log(my_dictionary.get("um"))
console.log(my_dictionary.get("dois"))
console.log(my_dictionary.get("três"))
console.log(my_dictionary.get("quatro"))
console.log(my_dictionary.get("cinco"))
console.log(my_dictionary.get("seis"))
my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

my_dictionary["quatro"] = 4
my_dictionary["cinco"] = 5
my_dictionary["seis"] = 6

print(my_dictionary["um"])
print(my_dictionary["dois"])
print(my_dictionary["três"])
print(my_dictionary["quatro"])
print(my_dictionary["cinco"])
print(my_dictionary["seis"])
local my_dictionary = {
    um = 1,
    dois = 2,
    ["três"] = 3 -- Keys of numerical types of with accents, spaces or other
                 -- special symbols must appear between square brackets.
                 -- In this case, ["três"] is the string "três" (three)
                 -- instead of an array (arrays in Lua use the {} syntax,
                 -- because they are also tables).
}

my_dictionary["quatro"] = 4
my_dictionary.cinco = 5
my_dictionary["seis"] = 6

print(my_dictionary["um"])
print(my_dictionary["dois"])
print(my_dictionary["três"]) -- The use of accents inhibits the access
                             -- using .três. In this case, square brackets
                             -- must be used.
print(my_dictionary.quatro)
print(my_dictionary["cinco"])
print(my_dictionary.seis)
extends Node

func _ready():
    var my_dictionary = {
        "um": 1,
        dois = 2,
        "três": 3
    }

    my_dictionary["quatro"] = 4
    my_dictionary.cinco = 5
    my_dictionary["seis"] = 6

    print(my_dictionary["um"])
    print(my_dictionary["dois"])
    print(my_dictionary["três"]) # The use of accents inhibits the access
                                 # using .três. In this case, square brackets
                                 # must be used.
    print(my_dictionary.quatro)
    print(my_dictionary["cinco"])
    print(my_dictionary.get("seis"))
    # get() allows returning a custom error value if the value does not exist.
    # print(my_dictionary.get("sete", "Error: value was not found"))

In JavaScript, the use of keys create a JavaScript Object instead of a proper dictionary. Although it can be used as a dictionary with proper care, a real dictionary can be used with the declaration new Map(). The dictionary created with Map is more modern and has more features, although it does not allow using square brackets to access values.

JavaScript, Lua e GDScript provide some alternatives for initialization and accessing values in dictionaries. Python defines a single way for each operation.

As it happens with arrays, some programming languages allow to separate the last entry with a comma, which can be useful when the using version control systems such as git.

For instance:

let my_dictionary = {
    "um" : 1,
    "dois": 2,
    "três": 3,
}

The benefits are the same described for arrays. The choice of style is yours.

Furthermore, GDScript also provides a get() (documentation) method to access values in a dictionary. The methods allow passing a second parameter as the value to be returned if the provided key does not exist in the dictionary.

Different Types for Keys and Values

Some implementations allow mixing data types for keys and for values in dictionaries. In general, personally I would normally recommend using a same type for the key and a same type for values. The type of the key and the value can be different (for instance, the key can be an integer number and the value can be a string), although mixing keys of different types and values of different types makes using the structure more complex (for instance, the key can be an integer number, a real number or a string).

Nevertheless, there are two interesting applications for mix values. The first is using a dictionary as an enumeration, in which pairs of key and value are mapped to each other. If the stored values do not change (or with proper care to update the pairs), one can create a dictionary that is, at the same time, its reverse dictionary.

let my_dictionary = new Map()
my_dictionary.set("um", 1)
my_dictionary.set(1, "um")
my_dictionary.set("dois", 2)
my_dictionary.set(2, "dois")
my_dictionary.set("três", 3)
my_dictionary.set(3, "três")

console.log(my_dictionary.get("um"))
console.log(my_dictionary.get(1))
my_dictionary = {
    "um": 1,
    1: "um",
    "dois": 2,
    2: "dois",
    "três": 3,
    3: "três"
}

print(my_dictionary["um"])
print(my_dictionary[1])
local my_dictionary = {
    um = 1,
    [1] = "um",
    dois = 2,
    [2] = "dois",
    ["três"] = 3,
    [3] = "três"
}

print(my_dictionary["um"])
print(my_dictionary[1])
extends Node

func _ready():
    var my_dictionary = {
        "um": 1,
        1: "um",
        "dois": 2,
        2: "dois",
        "três": 3,
        3: "três"
    }

    print(my_dictionary["um"])
    print(my_dictionary[1])

In the example, my_dictionary maps the name of a number (key) to its value (value), and the value of a number (key) with its name (value). Thus, instead of creating two dictionaries for pairs of values and keys, it is possible to define a single one (as it was a double dictionary).

The simplest way to create the reverse dictionary consists of using a repetition structure, using iterations. After inserting one of the sets of values, on can iterate key and value performing my_dictionary[value] = key. This assignment will insert the reverse combination in the dictionary.

The second interesting application is modeling records using dictionaries, which can be useful if the programming language does not provide features to create records or classes. For instance, it could be defined that a dictionary that stores the fields name (string), year (integer number) and rating (real number) to map ratings for books, songs or movies. This approach will be presented in an exercise.

Insertions, Removals, Cleaning and Size

Differently from arrays, that may have fixed sizes, dictionaries usually allow resizing. Thus, insertions and removals are common operations. In fact, instead of inserting null (or zero) values for keys as placeholders (a value considered null or neutral with the purpose of saving the spot for later use), it is often preferable to add a value to a dictionary only when the key will effectively store a value. In other words, dictionaries are often sparse.

The section describing initialization provided examples of data insertion. An important addendum is that the insertion of value for a key that does already exist will replace the old value. Dictionaries normally do not allow inserting duplicated keys. If there is a need to store duplicate keys with potentially different values, one can use a different data structure called a multimap. Another option is using trees that allow duplicate keys (for instance, a B-tree, which is often used in databases).

The removal can vary according to the implementation and/or programming language. Some implementations provide a subroutine or a command to remove an entry; others define a convention that assigning a null value to a key removes the entry from the dictionary.

let my_dictionary = {}

my_dictionary["a"] = 1.0
my_dictionary["b"] = 2.0
my_dictionary["c"] = 3.0
my_dictionary["d"] = 4.0
// JavaScript Object does not allow retrieving the size.

console.log(my_dictionary["a"])
my_dictionary["a"] = -1.0
console.log(my_dictionary["a"])

delete my_dictionary["a"]

// JavaScript Object does not provide a subroutine to clear the dictionary.

// Using Map.

my_dictionary = new Map()

my_dictionary.set("a", 1.0)
my_dictionary.set("b", 2.0)
my_dictionary.set("c", 3.0)
my_dictionary.set("d", 4.0)
console.log(my_dictionary.size)

console.log(my_dictionary.get("a"))
my_dictionary.set("a", -1.0)
console.log(my_dictionary.get("a"))

my_dictionary.delete("a")
console.log(my_dictionary.size)

my_dictionary.clear()
console.log(my_dictionary.size)
my_dictionary = {}

my_dictionary["a"] = 1.0
my_dictionary["b"] = 2.0
my_dictionary["c"] = 3.0
my_dictionary["d"] = 4.0
print(len(my_dictionary))

print(my_dictionary["a"])
my_dictionary["a"] = -1.0
print(my_dictionary["a"])

del my_dictionary["a"]
print(len(my_dictionary))

my_dictionary.clear()
print(len(my_dictionary))
function table_size(a_table)
    local size = 0
    for _ in pairs(a_table) do
        size = size + 1
    end

    return size
end

function clear_table(a_table)
    for key, _ in pairs(a_table) do
        a_table[key] = nil
    end
end

local my_dictionary = {}

my_dictionary["a"] = 1.0
my_dictionary["b"] = 2.0
my_dictionary["c"] = 3.0
my_dictionary["d"] = 4.0
print(table_size(my_dictionary))

print(my_dictionary["a"])
my_dictionary["a"] = -1.0
print(my_dictionary["a"])

my_dictionary["a"] = nil
print(table_size(my_dictionary))

clear_table(my_dictionary)
print(table_size(my_dictionary))
extends Node

func _ready():
    var my_dictionary = {}

    my_dictionary["a"] = 1.0
    my_dictionary["b"] = 2.0
    my_dictionary["c"] = 3.0
    my_dictionary["d"] = 4.0
    print(len(my_dictionary))

    print(my_dictionary["a"])
    my_dictionary["a"] = -1.0
    print(my_dictionary["a"])

    my_dictionary.erase("a")
    print(len(my_dictionary))

    my_dictionary.clear()
    print(len(my_dictionary))

Python and GDScript use len() to retrieve the size, similarly to getting the size string and arrays.

In JavaScript, only Map has a property (size; documentation) to retrieve the size of the dictionary and a method do clear it (clear(); documentation). The other methods have already been cited in the introduction about dictionaries.

Therefore, the JavaScript example illustrate some limitations of using a JavaScript Object as dictionary: it is not possible to obtain its size, nor clear the dictionary with a single call. Although it is possible to find the size using some alternatives, it is more appropriate using Map in more complex scenarios that require a more robust solution.

let my_dictionary = {"a": 1, "b": 2, "c": 3}
console.log(Object.keys(my_dictionary).length)

In Lua, there is not a predefined subroutine to retrieve the size of a table. One can define a function to do it, though it is important noticing that it must traverse the entire table to calculate the size. Thus, it can be preferable to store the size in a separate variable and implement subroutine to add and remove values, adjusting the variable with the size.

function table_insert(a_table, key, value, current_size)
    local size = current_size
    if (a_table[key] == nil) then
        size = size + 1
    end

    a_table[key] = value

    return size
end

function table_remove(a_table, key, current_size)
    if (a_table[key] ~= nil) then
        a_table[key] = nil
        return (current_size - 1)
    else
        return current_size
    end
end

function table_size(a_table)
    local size = 0
    for _ in pairs(a_table) do
        size = size + 1
    end

    return size
end

local my_dictionary = {}
local my_dictionary_size = 0

my_dictionary_size = table_insert(my_dictionary, "a", 1, my_dictionary_size)
print(my_dictionary_size, table_size(my_dictionary))

my_dictionary_size = table_insert(my_dictionary, "b", 2, my_dictionary_size)
print(my_dictionary_size, table_size(my_dictionary))

my_dictionary_size = table_insert(my_dictionary, "b", 3, my_dictionary_size)
print(my_dictionary_size, table_size(my_dictionary))

my_dictionary_size = table_remove(my_dictionary, "a", my_dictionary_size)
print(my_dictionary_size, table_size(my_dictionary))

my_dictionary_size = table_remove(my_dictionary, "a", my_dictionary_size)
print(my_dictionary_size, table_size(my_dictionary))

In the example, the function table_size() is called to compare the result to show that they are correct. In particular, the size cannot be incremented in an insertion for which the key already exists, or decremented in a deletion in which the key does not exist. The operations can be simpler with records. It is also possible to improve the process using a metatable (although this requires version 5.2 or more recent of Lua; documentation). To do this, one can use the function setmetatable() (documentation). Procedure:

function table_size(a_table)
    local size = 0 for _ in pairs(a_table) do
        size = size + 1
    end

    return size
end

-- Requer Lua 5.2 or more recent.
function create_table()
    local a_table = {}
    -- __len is a function to be called when the operator # is used.
    setmetatable(a_table, {__len = table_size})

    return a_table
end

local my_dictionary = create_table()
my_dictionary["a"] = 1.0
my_dictionary["b"] = 2.0
print(#my_dictionary)

my_dictionary["a"] = nil
print(#my_dictionary)

Metatables are a more advanced feature of Lua; thus, the previous block serves for illustrative purposes. It should be noted that the calculus of the size for the function table_size has complexity , which is not ideal. It can be better to store the size in a variable if it will be frequently needed.

Value Access and Iteration

Previous examples show how to access a value when the key is known. When the key is unknown, an alternative is iterating over all stored entries in the dictionary using iterators.

let my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

for (let key in my_dictionary) {
    let value = my_dictionary[key]
    console.log(key, value)
}

for (let key of Object.keys(my_dictionary)) {
    console.log(key)
}

for (let value of Object.values(my_dictionary)) {
    console.log(value)
}

// Proper dictionary.
my_dictionary = new Map()
my_dictionary.set("um", 1)
my_dictionary.set("dois", 2)
my_dictionary.set("três", 3)

for (let [key, value] of my_dictionary) {
    console.log(key, value)
}

for (let [key, value] of my_dictionary.entries()) {
    console.log(key, value)
}

for (let key of my_dictionary.keys()) {
    console.log(key)
}

for (let value of my_dictionary.values()) {
    console.log(value)
}
my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

for key in my_dictionary:
    value = my_dictionary[key]
    print(key, value)

for key, value in my_dictionary.items():
    print(key, value)

for key in my_dictionary.keys():
    print(key)

for value in my_dictionary.values():
    print(value)
local my_dictionary = {
    um = 1,
    dois = 2,
    ["três"] = 3
}

for key, value in pairs(my_dictionary) do
    print(key, value)
end

for key in pairs(my_dictionary) do
    print(key)
end

for _, value in pairs(my_dictionary) do
    print(value)
end
extends Node

func _ready():
    var my_dictionary = {
        "um": 1,
        "dois": 2,
        "três": 3
    }

    for key in my_dictionary:
        var value = my_dictionary[key]
        print(key, value)

    for key in my_dictionary.keys():
        print(key)

    for value in my_dictionary.values():
        print(value)

Subroutines:

The examples show that it is possible to iterate over combinations of keys and values, only keys, and only values. In some executions, the keys may not have a defined order.

Some implementations can sort values of keys, others can follow the order of insertion, others can use other criteria or random order. If there is a need for a criterion to access the keys, one can retrieve a arrays with the keys, sort them as suitable and use the array to access the values in the desired order.

Therefore, unless the documentation of the chosen dictionary implementation affirms otherwise, one should not assume an order for the keys during iteration. To define an order, arrays or lists can be better options.

There are two ways to perform searches in dictionaries:

  1. Search for a value stored in any key;
  2. Search for a key. Alternatively, the goal of searching for a key can be verifying whether it exists. Some dictionary implementations do not allow accessing the value of a nonexistent key, which can result in an exception or crash the program.

As the second case has already been exemplified, it is convenient starting by it.

Search For a Key and Checking Whether a Key Exists

In some dictionary implementations, a key must exist in the dictionary before a programmer tries to access its stored value. To do this, she/he can use subroutines like has() or hasKey(). If the language does not provide such subroutine, one alternative is to retrieve all keys in an array and perform the search on it. Another alternative is to iterate over the dictionary and search for they at each repetition.

The introductory example performed a verification before attempting to access a ky. It is reproduced here once again. To illustrate another approach using JavaScript Objects, the comparison with undefined was replaced to a call to hasOwnProperty() in the JavaScript version. The other versions were not modified.

// With JavaScript Object:
let language_ides = {
    "Python": "Thonny",
    "Lua": "ZeroBrane Studio",
    "GDScript": "Godot Engine",
    "JavaScript": "Text editor and Internet browser"
}

let searched_language = "GDScript"
if (language_ides.hasOwnProperty(searched_language)) {
    let recommendation = language_ides[searched_language]
    console.log(searched_language, ": ", recommendation)
} else {
    console.log("No recommendations for ", searched_language)
}

// With Map:
language_ides = new Map()
language_ides.set("Python", "Thonny")
language_ides.set("Lua", "ZeroBrane Studio")
language_ides.set("GDScript", "Godot Engine")
language_ides.set("JavaScript", "Text editor and Internet browser")

searched_language = "GDScript"
if (language_ides.has(searched_language)) {
    let recommendation = language_ides.get(searched_language)
    console.log(searched_language, ": ", recommendation)
} else {
    console.log("No recommendations for ", searched_language)
}
language_ides = {
    "Python": "Thonny",
    "Lua": "ZeroBrane Studio",
    "GDScript": "Godot Engine",
    "JavaScript": "Text editor and Internet browser"
}

searched_language = "GDScript"
if (searched_language in language_ides):
    recommendation = language_ides[searched_language]
    print(searched_language, ": ", recommendation)
else:
    print("No recommendations for ", searched_language)
local language_ides = {
    Python = "Thonny",
    Lua = "ZeroBrane Studio",
    GDScript = "Godot Engine",
    JavaScript = "Text editor and Internet browser"
}

local searched_language = "GDScript"
local recommendation = language_ides[searched_language]
if (searched_language ~= nil) then
    print(searched_language, ": ", recommendation)
else
    print("No recommendations for ", searched_language)
end
extends Node

func _ready():
    var language_ides = {
        "Python": "Thonny",
        "Lua": "ZeroBrane Studio",
        "GDScript": "Godot Engine",
        "JavaScript": "Text editor and Internet browser"
    }

    var searched_language = "GDScript"
    if (searched_language in language_ides):
        var recommendation = language_ides[searched_language]
        print(searched_language, ": ", recommendation)
    else:
        print("No recommendations for ", searched_language)

Subroutines para JavaScript: hasOwnProperty() (documentation), has() (documentation).

Programming languages that return a null value if the key does not exist, such as Lua, allowing trying to access a value to check whether the key exists. A drawback is that, in such cases, it is not possible to know whether there exists a key that stores the null value of whether the key does not exist in the dictionary. Although this may appear redundant, the difference can be important when modeling some problems.

Search By A Value Stored In Some Key

In the case of a search for a value stored in any key, one normally wants to know whether the value exists in the dictionary, and, if it exists, what key can access it.

function javascript_object_search_by_value(dictionary, desired_value) {
    for (let key in dictionary) {
        let value = dictionary[key]
        if (value === desired_value) {
            return key
        }
    }

    return undefined
}

let my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

console.log(javascript_object_search_by_value(my_dictionary, 2))
console.log(javascript_object_search_by_value(my_dictionary, -2))

// Proper dictionary.
function map_search_by_value(dictionary, desired_value) {
    for (let [key, value] of dictionary) {
        if (value === desired_value) {
            return key
        }
    }

    return undefined
}

my_dictionary = new Map()
my_dictionary.set("um", 1)
my_dictionary.set("dois", 2)
my_dictionary.set("três", 3)

console.log(map_search_by_value(my_dictionary, 2))
console.log(map_search_by_value(my_dictionary, -2))
def search_by_value(dictionary, desired_value):
    for key in dictionary:
        value = dictionary[key]
        if (value == desired_value):
            return key

    return None

my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

print(search_by_value(my_dictionary, 2))
print(search_by_value(my_dictionary, -2))
function search_by_value(dictionary, desired_value)
    for key, value in pairs(dictionary) do
        if (value == desired_value) then
            return key
        end
    end

    return nil
end

local my_dictionary = {
    um = 1,
    dois = 2,
    ["três"] = 3
}

print(search_by_value(my_dictionary, 2))
print(search_by_value(my_dictionary, -2))
extends Node

func search_by_value(dictionary, desired_value):
    for key in dictionary:
        var value = dictionary[key]
        if (value == desired_value):
            return key

    return null

func _ready():
    var my_dictionary = {
        "um": 1,
        "dois": 2,
        "três": 3
    }

    print(search_by_value(my_dictionary, 2))
    print(search_by_value(my_dictionary, -2))

The presented solution returns the first key identified with the value. In other words, if the value existed in multiple keys, only the first one would be returned. A second limitation is that the null value used as the return cannot be used as a key in the dictionary (which is not always a problem; many implementations do not allow using a null value as a key).

To remove the limitations, an array of identified keys could be returned. An empty array would mean that there were not occurrences of the key in the dictionary.

Furthermore, it should be noted that the search for a key is a slow operation (it is a sequential search, which complexity ; in potential, the complexity to access the value can make it even more complex), because it is the reverse of the definition of a dictionary. If one performs value checks often, it can be convenient defining a second dictionary in which the keys of the first are the values of the second, and vice-versa. Alternatively, in programming languages allowing mixing types of keys and values, a single dictionary could be used a double dictionary.

Accumulators

Accumulators in dictionary work similarly to their counterpart in arrays. The difference is that one can accumulate values and/or keys, depending on the problem.

let my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

let text = "{\n"
for (let key in my_dictionary) {
    let value = my_dictionary[key]
    text += "  " + key + ": " + value + ",\n"
}
text += "}"

console.log(text)

// Proper dictionary.
my_dictionary = new Map()
my_dictionary.set("um", 1)
my_dictionary.set("dois", 2)
my_dictionary.set("três", 3)

text = "{\n"
for (let [key, value] of my_dictionary) {
    text += "  " + key + ": " + value + ",\n"
}
text += "}"

console.log(text)
my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

text = "{\n"
for key in my_dictionary:
    value = my_dictionary[key]
    text += "  " + key + ": " + str(value) + ",\n"

text += "}"

print(text)
local my_dictionary = {
    um = 1,
    dois = 2,
    ["três"] = 3
}

local text = "{\n"
for key, value in pairs(my_dictionary) do
    text = text .. "  " .. key .. ": " .. value .. ",\n"
end
text = text .. "}"

print(text)
extends Node

func _ready():
    var my_dictionary = {
        "um": 1,
        "dois": 2,
        "três": 3
    }

    var text = "{\n"
    for key in my_dictionary:
        var value = my_dictionary[key]
        text += "  " + key + ": " + str(value) + ",\n"

    text += "}"

    print(text)

The previous example shows how to write all values of a dictionary in languages such as Lua, which has a print subroutine that does not show the entire contents of the array. An improved version could define a recursive function that also wrote arrays and/or dictionaries stores as keys and/or values. An example will be provided for the section describing dictionary copies.

Selection or Filtering Values

The selection or value filter is also similar to the corresponding operation for arrays.

let my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

let odd_keys = []
for (let key in my_dictionary) {
    let value = my_dictionary[key]
    if (value % 2 === 1) {
        odd_keys.push(key)
    }
}

console.log(odd_keys)

// Proper dictionary.
my_dictionary = new Map()
my_dictionary.set("um", 1)
my_dictionary.set("dois", 2)
my_dictionary.set("três", 3)

odd_keys = []
for (let [key, value] of my_dictionary) {
    if (value % 2 === 1) {
        odd_keys.push(key)
    }
}

console.log(odd_keys)
my_dictionary = {
    "um": 1,
    "dois": 2,
    "três": 3
}

odd_keys = []
for key in my_dictionary:
    value = my_dictionary[key]
    if (value % 2 == 1):
        odd_keys.append(key)

print(odd_keys)
local my_dictionary = {
    um = 1,
    dois = 2,
    ["três"] = 3
}

local odd_keys = {}
for key, value in pairs(my_dictionary) do
    if (value % 2 == 1) then
        table.insert(odd_keys, key)
    end
end

io.write("[ ")
for index, value in ipairs(odd_keys) do
    io.write(value .. ", ")
end
print("]")
extends Node

func _ready():
    var my_dictionary = {
        "um": 1,
        "dois": 2,
        "três": 3
    }

    var odd_keys = []
    for key in my_dictionary:
        var value = my_dictionary[key]
        if (value % 2 == 1):
            odd_keys.append(key)

    print(odd_keys)

The previous example select all keys based in their value (select if the value is odd). An advantage of storing keys is the ease to access the value. For instance, a construction such as my_dictionary[odd_keys[0]] allows accessing the value of the first obtained result, provided that index 0 is valid odd_keys.

Shallow Copy, Deep Copy and Shared Memory

Copy and memory sharing operations in dictionaries usually work similarly to the same operations for arrays. The rules are usually the same: the passage of dictionaries as parameters to subroutines is performed by reference. Furthermore, assignments share memory instead of creating copies.

To create independent copies, one must, thus, use (or define) a subroutine that makes deep copies. Copies of values will generate shallow copies, which are often not desirable (because data as arrays and other dictionaries stored internally will be shared).

In the following example, the original dictionary has the following types for each key:

  • integer: an integer numbers, that is, a value of a primitive type. A shallow copy or a deep copy can duplicate it;
  • real: a real number, that is, a value of a primitive type; A shallow copy or a deep copy can duplicate it;
  • array: an array or a list, that is, a composite type stored as a reference. Only a deep copy can duplicate it (and it must be recursive);
  • dictionary: a dictionary or a table, that is, a composite type stored as a reference. Only a deep copy can duplicate it (and it must be recursive).

At every change of value, it is worth inspecting each stored variable to verify changes in the value of original, shared_memory, shallow_copy and deep_copy. Each variable will be declared after illustrating the previous changes.

let original = {
    "integer": 1,
    "real": 1.1,
    "array": [1, 2, 3],
    "dictionary": {
        "hello": "Hello, my name is Franco!"
    }
}
console.log("Original: ", original)

console.log("\nShared memory")
let shared_memory = original
original["integer"] = 2
shared_memory["real"] = 2.2
console.log("Original: ", original)
console.log("Shared Memory: ", shared_memory)

console.log("\nShallow Copy")
let shallow_copy = {...original}
shared_memory["integer"] = 3
original["integer"] = 3
shared_memory["real"] = 3.3
shallow_copy["integer"] = -33
shallow_copy["real"] = -33.3
shallow_copy["array"][0] = 111
shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!"
original["array"][1] = 222
console.log("Original: ", original)
console.log("Shared Memory: ", shared_memory)
console.log("Shallow copy: ", shallow_copy)

console.log("\nDeep Copy")
// WARNING: It has limitations, as described for arrays.
let deep_copy = JSON.parse(JSON.stringify(original))
shared_memory["integer"] = 4
original["integer"] = 4
shared_memory["real"] = 4.4
shallow_copy["integer"] = -44
shallow_copy["real"] = -44.4
shallow_copy["array"][0] = 1234
shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!!!"
original["array"][1] = 1234
deep_copy["integer"] = 999
deep_copy["real"] = 999.99
deep_copy["array"][0] = 999
deep_copy["dictionary"]["hello"] = "Hello, Franco!"
console.log("Original: ", original)
console.log("Shared Memory: ", shared_memory)
console.log("Shallow copy: ", shallow_copy)
console.log("Deep copy: ", deep_copy)

// Version with Map.
original = new Map()
original.set("integer", 1)
original.set("real", 1.1)
original.set("array", [1, 2, 3])
original.set("dictionary", new Map().set("hello", "Hello, my name is Franco!"))
console.log("Original: ", original)

console.log("\nShared memory")
shared_memory = original
original.set("integer", 2)
shared_memory.set("real", 2.2)
console.log("Original: ", original)
console.log("Shared Memory: ", shared_memory)

console.log("\nShallow Copy")
shallow_copy = new Map(original)
shared_memory.set("integer", 3)
original.set("integer", 3)
shared_memory.set("real", 3.3)
shallow_copy.set("integer", -33)
shallow_copy.set("real", -33.3)
shallow_copy.get("array")[0] = 111
shallow_copy.get("dictionary").set("hello", "Hello, my name is Franco!")
original.get("array")[1] = 222
console.log("Original: ", original)
console.log("Shared Memory: ", shared_memory)
console.log("Shallow copy: ", shallow_copy)

console.log("\nDeep Copy")
// WARNING: The deep copy will be imperfect.
// Besides the previous limitations, an additional limitation is that the deep
// copy will not be recursive.
// The ideal is using a subroutine such as cloneDeep() for the Lodash library
// to create a better deep copy.
deep_copy = new Map(JSON.parse(JSON.stringify(Array.from(original))))
shared_memory.set("integer", 4)
original.set("integer", 4)
shared_memory.set("real", 4.4)
shallow_copy.set("integer", -44)
shallow_copy.set("real", -44.4)
shallow_copy.get("array")[0] = 1234
shallow_copy.get("dictionary").set("hello", "Hello, my name is Franco!!!")
original.get("array")[1] = 1234
deep_copy.set("integer", 999)
deep_copy.set("real", 999.99)
deep_copy.get("array")[0] = 999
// Here is the problem: the deep copy is not recursive.
// The value stored in "dictionary" will be a JavaScript Object instead of a Map.
// deep_copy.get("dictionary").set("hello", "Hello, Franco!") // Expected: Map.
deep_copy.get("dictionary")["hello"] = "Hello, Franco!" // Result: JavaScript Object.
console.log("Original: ", original)
console.log("Shared Memory: ", shared_memory)
console.log("Shallow copy: ", shallow_copy)
console.log("Deep copy: ", deep_copy)
import copy

original = {
    "integer": 1,
    "real": 1.1,
    "array": [1, 2, 3],
    "dictionary": {
        "hello": "Hello, my name is Franco!"
    }
}
print("Original: ", original)

print("\nShared memory")
shared_memory = original
original["integer"] = 2
shared_memory["real"] = 2.2
print("Original: ", original)
print("Shared Memory: ", shared_memory)

print("\nShallow Copy")
shallow_copy = copy.copy(original)
shared_memory["integer"] = 3
original["integer"] = 3
shared_memory["real"] = 3.3
shallow_copy["integer"] = -33
shallow_copy["real"] = -33.3
shallow_copy["array"][0] = 111
shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!"
original["array"][1] = 222
print("Original: ", original)
print("Shared Memory: ", shared_memory)
print("Shallow copy: ", shallow_copy)

print("\nDeep Copy")
deep_copy = copy.deepcopy(original)
shared_memory["integer"] = 4
original["integer"] = 4
shared_memory["real"] = 4.4
shallow_copy["integer"] = -44
shallow_copy["real"] = -44.4
shallow_copy["array"][0] = 1234
shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!!!"
original["array"][1] = 1234
deep_copy["integer"] = 999
deep_copy["real"] = 999.99
deep_copy["array"][0] = 999
deep_copy["dictionary"]["hello"] = "Hello, Franco!"
print("Original: ", original)
print("Shared Memory: ", shared_memory)
print("Shallow copy: ", shallow_copy)
print("Deep copy: ", deep_copy)
function write_indentation(level)
    for indentation = 1, level do
        -- If you prefer indentation with tabulation, use the following line.
        -- io.write("\t")
        -- To indent with spaces, you can choose the number in the next line.
        io.write("  ")
    end
end

-- a_table cannot have a reference for the own table.
-- Otherwise, the recursion will be infinite.
-- The procedures does not write the key recursively.
function write_table(a_table, level)
    -- This use of or corresponds to:
    -- if (level ~= nil) then
    --     level = level
    -- else
    --     level = 1
    -- end
    -- In this case, this initializes level if 1, if it is omitted.
    level = level or 1
    if (type(a_table) == "table") then
        io.write("{\n")
        for key, value in pairs(a_table) do
            write_indentation(level)
            io.write(tostring(key) .. ": ")
            -- The recursive call allows writing arrays or dictionaries
            -- that are written as value.
            write_table(value, level + 1)
            io.write("\n")
        end
        write_indentation(level - 1)
        io.write("},")
    else
        local quotes = ""
        if (type(a_table) == "string") then
            quotes = "\""
        end
        io.write(quotes .. tostring(a_table) .. quotes .. ",")
    end
end

-- <http://lua-users.org/wiki/CopyTable>
function shallowcopy(orig)
    local orig_type = type(orig)
    local copy
    if orig_type == 'table' then
        copy = {}
        for orig_key, orig_value in pairs(orig) do
            copy[orig_key] = orig_value
        end
    else -- number, string, boolean, etc
        copy = orig
    end
    return copy
end

-- <http://lua-users.org/wiki/CopyTable>
-- Save copied tables in `copies`, indexed by original table.
function deepcopy(orig, copies)
    copies = copies or {}
    local orig_type = type(orig)
    local copy
    if orig_type == 'table' then
        if copies[orig] then
            copy = copies[orig]
        else
            copy = {}
            copies[orig] = copy
            for orig_key, orig_value in next, orig, nil do
                copy[deepcopy(orig_key, copies)] = deepcopy(orig_value, copies)
            end
            setmetatable(copy, deepcopy(getmetatable(orig), copies))
        end
    else -- number, string, boolean, etc
        copy = orig
    end
    return copy
end

local original = {
    integer = 1,
    real = 1.1,
    array = {1, 2, 3},
    dictionary = {
        ["hello"] = "Hello, my name is Franco!"
    }
}

io.write("Original: ")
write_table(original)

print("\n\nShared memory")
local shared_memory = original
original["integer"] = 2
io.write("\nOriginal: ")
write_table(original)
io.write("\nShared Memory: ")
write_table(shared_memory)

print("\n\nShallow Copy")
local shallow_copy = shallowcopy(original)
shared_memory["integer"] = 3
original["integer"] = 3
shared_memory["real"] = 3.3
shallow_copy["integer"] = -33
shallow_copy["real"] = -33.3
shallow_copy["array"][1] = 111
shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!"
original["array"][2] = 222
io.write("\nOriginal: ")
write_table(original)
io.write("\nShared Memory: ")
write_table(shared_memory)
io.write("\nShallow copy: ")
write_table(shallow_copy)

print("\n\nDeep Copy")
local deep_copy = deepcopy(original)
shared_memory["integer"] = 4
original["integer"] = 4
shared_memory["real"] = 4.4
shallow_copy["integer"] = -44
shallow_copy["real"] = -44.4
shallow_copy["array"][1] = 1234
shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!!!"
original["array"][2] = 1234
deep_copy["integer"] = 999
deep_copy["real"] = 999.99
deep_copy["array"][0] = 999
deep_copy["dictionary"]["hello"] = "Hello, Franco!"
io.write("\nOriginal: ")
write_table(original)
io.write("\nShared Memory: ")
write_table(shared_memory)
io.write("\nShallow copy: ")
write_table(shallow_copy)
io.write("\nDeep copy: ")
write_table(deep_copy)
extends Node

func _ready():
    var original = {
        "integer": 1,
        "real": 1.1,
        "array": [1, 2, 3],
        "dictionary": {
            "hello": "Hello, my name is Franco!"
        }
    }
    print("Original: ", original)

    print("\nShared memory")
    var shared_memory = original
    original["integer"] = 2
    shared_memory["real"] = 2.2
    print("Original: ", original)
    print("Shared Memory: ", shared_memory)

    print("\nShallow Copy")
    var shallow_copy = original.duplicate() # or original.duplicate(false)
    shared_memory["integer"] = 3
    original["integer"] = 3
    shared_memory["real"] = 3.3
    shallow_copy["integer"] = -33
    shallow_copy["real"] = -33.3
    shallow_copy["array"][0] = 111
    shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!"
    original["array"][1] = 222
    print("Original: ", original)
    print("Shared Memory: ", shared_memory)
    print("Shallow copy: ", shallow_copy)

    print("\nDeep Copy")
    var deep_copy = original.duplicate(true)
    shared_memory["integer"] = 4
    original["integer"] = 4
    shared_memory["real"] = 4.4
    shallow_copy["integer"] = -44
    shallow_copy["real"] = -44.4
    shallow_copy["array"][0] = 1234
    shallow_copy["dictionary"]["hello"] = "Hello, my name is Franco!!!"
    original["array"][1] = 1234
    deep_copy["integer"] = 999
    deep_copy["real"] = 999.99
    deep_copy["array"][0] = 999
    deep_copy["dictionary"]["hello"] = "Hello, Franco!"
    print("Original: ", original)
    print("Shared Memory: ", shared_memory)
    print("Shallow copy: ", shallow_copy)
    print("Deep copy: ", deep_copy)

Subroutines (the same ones used for arrays):

The Lua implementation provides a recursive procedure to write a table, called write_table(). It is an improved version of write_array(), which is now able to write nested arrays and dictionaries. The procedures only uses features that have been described in this material. For more complex implementations, one can use an external library or consult this page of the Lua Users webiste.

The implementation in JavaScript using Map is imperfect for deep copy. As commented for arrays, ideally one should use an external library. This also applies to the version using JavaScript Object.

Moreover, the examples are long. Instead of running the entire program, it can be more interesting to write the variables after every modification to observe changes. Two good strategies to use the examples are:

  1. Analyze the changes line by line, writing the relevant dictionaries;
  2. Run each defined block, each of which started by a print().
    • print("Original: ", original);
    • print("\nShared memory");
    • print("\nShallow Copy");
    • print("\nDeep Copy").

It must be noted that only a deep copy effectively duplicates all values stored in a way to make them independent. Shallow copies duplicate only primitive types (or other mutable types), sharing types that are stored as references. Shared memory shares all values.

Therefore, when one wishes a unique copy, she/he must perform a deep copy. Any other type of assignment or copy will share the memory (either partially or fully).

Other Data Structures

Arrays and dictionaries are excellent data structures to learn programming and to prototype systems. Nevertheless, there exists other data structures that can be more convenient to solve certain problems or to improve the performance of a program.

Although performance concerns should not be a priority for beginners, greater convenience to solve problems is always desirable. Thus, this section presents some additional data structures. There are many other structures that will not be commented at this time; the ones that are described bellow can easily be implemented using arrays and/or dictionaries.

Stacks

The term stack has already appeared in previous topics. For instance, the term has been mentioned in stack overflow, which is a potential error when defining recursive subroutines that, among others, may lack a suitable base case.

Stacks are data structures in which the last inserted element is the first element to be removed. The structured is also known as last-in, first-out (LIFO).

One of the most traditional examples of stacks is a stack of plates. Another example is a stack of (plastic) chairs. A third example is a stack of pancakes, as those from movies from the United States of America. A fourth example are stacks of cards in card games. When one picks a card from the top, she/he pops a stack.

When one stacks a plate, the second one is placed over the first one. The third is placed on top of the second, and so on. The first plate to be removed is the one at the top of the stack.

Stacks in programming work the same way. For instance, if one pushes a value A, then a value B, then a value C, the result is a sequence [A, B, C]. Thus, one does not pick an order for insertion; the new element is always placed at the top.

When one pops a value from the stack, she/he does not choose one in particular, either, though the element of the top is removed. Thus, in the last example, the elements would be removed in the order C (first to leave, which was the last to enter), B and A (last to leave, which was the first to enter). The removal of all elements in a stack is called emptying the stack. It is an error trying to remove a value from an empty stack.

Naturally, real applications are usually more complex than the example. For instance, one could push A, B, C, then pop a value (C), resulting [A, B]. Next, she/he could push D and E, resulting [A, B, D, E]. Now, emptying the stack would remove the elements in the order E (first), D, B, A (last). Therefore, the available elements depend both on the order of pushes and the quantity (and order) of pops.

Stacks have a minimum size, with is zero, called a empty stack. Stacks may have a maximum size, called capacity (as in arrays, dictionaries and other data structures). A stack without an arbitrary limit is still restricted, in the last instance, by the quantity of memory in a computer. Thus, in practice, every data structure is finite in programming.

This limitation allows understanding what is a stack overflow. Simply, every process has a maximum quantity of memory that can be stored in a memory region called stack. Some allocations use memory from the stack. Local variables and function calls (returns and addresses) are usually allocated in the stack. If a program performs a number of recursive calls that extinguishes the available memory of the stack, the program cannot make new recursive calls. In fact, the program will be unable to allocate memory in the stack. Due to a lack of memory, a new attempt would result into a stack overflow.

Some programming languages provide stack implementations in the standard library. In languages that do not provide, one can use an external library or an array. To use an array, every new insertion should be performed at the end. Every removal should also be performed in the array. Alternatively, one can always add values and remove them from the beginning, although these operations are often more expensive.

let stack = []
// Insertion at the end.
stack.push(1)
stack.push(2)
stack.push(3)
console.log(stack) // [1, 2, 3]

// Removal at the end.
let removed_value = stack.pop() // 3
console.log(stack) // [1, 2]

stack.push(4)
console.log(stack) // [1, 2, 4]

stack.pop()
console.log(stack) // [1, 2]

stack.pop()
console.log(stack) // [1]

stack.pop()
console.log(stack) // []
stack = []
# Insertion at the end.
stack.append(1)
stack.append(2)
stack.append(3)
print(stack) # [1, 2, 3]

# Removal at the end.
removed_value = stack.pop() # 3
print(stack) # [1, 2]

stack.append(4)
print(stack) # [1, 2, 4]

stack.pop()
print(stack) # [1, 2]

stack.pop()
print(stack) # [1]

stack.pop()
print(stack) # []
function write_array(stack)
    if (stack == nil) then
        return
    end

    local size = #stack
    io.write("[")
    for index = 1, size do
        io.write(stack[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local stack = {}
-- Insertion at the end.
table.insert(stack, 1)
stack[#stack + 1] = 2 -- Alternative.
table.insert(stack, #stack + 1, 3) -- Equivalent construction.
write_array(stack) -- [1, 2, 3]

-- Removal at the end.
local removed_value = table.remove(stack) -- 3
write_array(stack) -- [1, 2]

table.insert(stack, 4)
write_array(stack) -- [1, 2, 4]

table.remove(stack)
write_array(stack) -- [1, 2]

table.remove(stack)
write_array(stack) -- [1]

table.remove(stack)
write_array(stack) -- []
extends Node

func _ready():
    var stack = []
    # Insertion at the end.
    stack.append(1)
    stack.push_back(2) # Alternative.
    stack.append(3)
    print(stack) # [1, 2, 3]

    # Removal at the end.
    var removed_value = stack.pop_back() # 3
    print(stack) # [1, 2]

    stack.append(4)
    print(stack) # [1, 2, 4]

    stack.pop_back()
    print(stack) # [1, 2]

    stack.pop_back()
    print(stack) # [1]

    stack.pop_back()
    print(stack) # []

When an element is popped, a call to pop() usually provides the removed value. If it does not return it, the stack will probably provide a method to consult that value that is stored at the top (for instance, top() or peek()). A consult to the top value does not remove it from the stack, although it allows discovering the stored value.

In implementations in which a removal or insertion at the end are expensive operations, it can be preferable to implement a proper stack data structure (or use a library providing an efficient implementation).

For instance, Python provides a stack implementation in the standard library.

from queue import LifoQueue

# For infinite size, one can use maxsize=0 or with a negative number.
stack = LifoQueue(maxsize = 3)
# Insertion at the end.
stack.put(1)
stack.put(2)
stack.put(3)
print(stack.qsize(), stack.full(), stack.empty()) # 3, True, False

# Removal at the end.
removed_value = stack.get() # 3
print(removed_value)

stack.put(4)

removed_value = stack.get() # 4
print(removed_value)

removed_value = stack.get() # 2
print(removed_value)

removed_value = stack.get() # 1
print(removed_value)
print(stack.empty())

The class LifoQueue (documentation) implements a stack. The method put() (documentation) pushes values and the method get() (documentation) pops the value at the top of the stack. If maxsize is defined in the constructor of LifoQueue, it is possible to define the maximum number of values to store. The method empty() (documentation) clears the stack.

Queues

In many cultures, people have the habit of forming queues for attendance of to perform some activity using the order of arrival.

Queues in programming are a data structure in which the elements are removed in the same order of insertion. The structure is also known as first-in, first-out (FIFO).

The operation of inserting an element into a queue is usually called enqueue or add. The operation of removing an element from the queue is usually called dequeue or remove. Like stacks, queues can have a maximum capacity, then can be emptied (removing values in the insertion order), and it is an error trying to dequeue an empty queue.

Queues are common in programs. For instance, a playlist for songs that reproduces in the defined order uses a queue. Memory for tasks called buffers are also defined as queues.

Similar to stacks, some programming languages provide implementations for queues in the standard library. In programming languages that do not provide them, one can use an external library or an array. To use an array, every insertion should happen at the end. Every should be performed at the beginning. It is also possible to insert every new element at the beginning, and remove every element from the end.

let queue = []
// Insertion at the end.
queue.push(1)
queue.push(2)
queue.push(3)
console.log(queue) // [1, 2, 3]

// Removal at the end.
let removed_value = queue.shift() // 1
console.log(queue) // [2, 3]

queue.push(4)
console.log(queue) // [2, 3, 4]

queue.shift()
console.log(queue) // [3, 4]

queue.shift()
console.log(queue) // [4]

queue.shift()
console.log(queue) // []
queue = []
# Insertion at the end.
queue.append(1)
queue.append(2)
queue.append(3)
print(queue) # [1, 2, 3]

# Removal at the end.
removed_value = queue.pop(0) # 1
print(queue) # [2, 3]

queue.append(4)
print(queue) # [2, 3, 4]

queue.pop(0)
print(queue) # [3, 4]

queue.pop(0)
print(queue) # [4]

queue.pop(0)
print(queue) # []
function write_array(queue)
    if (queue == nil) then
        return
    end

    local size = #queue
    io.write("[")
    for index = 1, size do
        io.write(queue[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

local queue = {}
-- Insertion at the end.
table.insert(queue, 1)
queue[#queue + 1] = 2 -- Alternative.
table.insert(queue, #queue + 1, 3) -- Equivalent construction.
write_array(queue) -- [1, 2, 3]

-- Removal at the end.
local removed_value = table.remove(queue, 1) -- 1
write_array(queue) -- [2, 3]

table.insert(queue, 4)
write_array(queue) -- [2, 3, 4]

table.remove(queue, 1)
write_array(queue) -- [3, 4]

table.remove(queue, 1)
write_array(queue) -- [4]

table.remove(queue, 1)
write_array(queue) -- []
extends Node

func _ready():
    var queue = []
    # Insertion at the end.
    queue.append(1)
    queue.push_back(2) # Alternative.
    queue.append(3)
    print(queue) # [1, 2, 3]

    # Removal at the end.
    var removed_value = queue.pop_front() # 1
    print(queue) # [2, 3]

    queue.append(4)
    print(queue) # [2, 3, 4]

    queue.pop_front()
    print(queue) # [3, 4]

    queue.pop_front()
    print(queue) # [4]

    queue.pop_front()
    print(queue) # []

Although removals from the beginning can be expensive (for instance, ) in some array implementations, they can be used for prototypes or small size problems. For problems requiring large queues, it can be preferable to choose a queue implementation with better performance.

For instance, Python provides a queue implementation in the standard library.

from queue import Queue

# For infinite size, one can use maxsize=0 or with a negative number.
a_queue = Queue(maxsize = 3)
# Insertion at the end.
a_queue.put(1)
a_queue.put(2)
a_queue.put(3)
print(a_queue.qsize(), a_queue.full(), a_queue.empty()) # 3, True, False

# Removal from the beginning.
removed_value = a_queue.get() # 1
print(removed_value)

a_queue.put(4)

removed_value = a_queue.get() # 2
print(removed_value)

removed_value = a_queue.get() # 3
print(removed_value)

removed_value = a_queue.get() # 4
print(removed_value)
print(a_queue.empty())

The class Queue (documentation) implements a queue (for a queue without size limits, SimpleQueue can also be used). The interface is the same defined for LifoQueue, though put() enqueues a value at the last position and get() dequeues the first to be removed.

Priority Queues

A variation of queue implementation is a priority queue. In a priority queue, the order of insertion of elements in the queue can depend on the value of an attribute defined as a priority. Values with higher priority are added before values with lower priority, which allows them to be dequeued sooner.

The definition of the priority for each element can be easier using records, which will be the next topic of study.

At this moment, as an alternative, one can declare several queues, each of which with a certain priority. When trying to dequeue the next element, an element will be removed from the non-empty queue with the highest priority.

Python provides a priority queue implementation in the module queue called PriorityQueue.

Sets

As in Mathematics, sets are collections that cannot store duplicated values. The order of the values does not matter, though the data structure does not admit duplicates.

It is easy to implement a set using queues: it suffices to verify whether an element exists before trying to insert it. If the value already exists, it should not be inserted again the array. Otherwise, it must be added.

function add_to_set(array, value) {
    if (!array.includes(value)) {
        array.push(value)
    }
}

let set = []
add_to_set(set, "a")
add_to_set(set, "b")
add_to_set(set, "c")
add_to_set(set, "c")
add_to_set(set, "c")
console.log(set)
console.log(set.includes("a"))
console.log(set.includes("d"))
def add_to_set(array, value):
    if (not value in array):
        array.append(value)

set = []
add_to_set(set, "a")
add_to_set(set, "b")
add_to_set(set, "c")
add_to_set(set, "c")
add_to_set(set, "c")
print(set)
print("a" in set)
print("d" in set)
function write_array(queue)
    if (queue == nil) then
        return
    end

    local size = #queue
    io.write("[")
    for index = 1, size do
        io.write(queue[index])
        if (index < size) then
            io.write(", ")
        end
    end
    print("]")
end

function sequential_search(array, value)
    for index, current_value in ipairs(array) do
        if (value == current_value) then
            return index
        end
    end

    return -1
end

function belong_to_set(array, value)
    return (sequential_search(array, value) ~= -1)
end

function add_to_set(array, value)
    if (not belong_to_set(array, value)) then
        table.insert(array, value)
    end
end

set = {}
add_to_set(set, "a")
add_to_set(set, "b")
add_to_set(set, "c")
add_to_set(set, "c")
add_to_set(set, "c")
write_array(set)
print(belong_to_set(set, "a"))
print(belong_to_set(set, "d"))
extends Node

func add_to_set(array, value):
    if (not value in array):
        array.append(value)

func _ready():
    var set = []
    add_to_set(set, "a")
    add_to_set(set, "b")
    add_to_set(set, "c")
    add_to_set(set, "c")
    add_to_set(set, "c")
    print(set)
    print("a" in set)
    print("d" in set)

It is also possible to implement sets as dictionaries. In this case, a key can be used as the guarantee of the uniqueness of the value. When a value is added, any value can be inserted as the value (for instance, True) for the required key. As dictionaries do not allow duplicated keys, the value does not matter (it suffices that the key exists). To remove the value, it suffices to remove the entry using the key.

Python ({}; documentation) and JavaScript (Set; documentation) provide implementations for sets in the standard library.

let set = new Set()
set.add("a")
set.add("b")
set.add("c")
set.add("c")
set.add("c")
console.log(set) // "a", "b", "c"
console.log(set.has("a"))
console.log(set.has("d"))
set = {"a", "b", "c", "c", "c"}
print(set) # {"a", "b", "c"}
print("a" in set)
print("d" in set)

In Python, the syntax is similar to the used for dictionaries, though the elements have only value (instead of a combination of key and value).