Square-root decomposition

We call "square-root decomposition" any algorithm where square roots play a role in some way, even if not all of them are actually decompositions. In programming contests, some more common patterns for using square roots in algorithms have emerged. There are many different such patterns, but some more common ones are:

In this lecture, we will attempt to show some examples of how these patterns can be used, in hopes that the reader will be able to invent such solutions on their own in the future.

Partitioning the array

Find the sums of segments

Many readers have probably already solved this problem and can solve it with a complexity better than $O(q \sqrt{n})$. However, since this is the simplest possible problem of its type, we will work through it to demonstrate the principle.

Given an array $A$ of length $n$, initially consisting of zeroes. There are $q$ queries of two types:
  1. Given $i$ and $x$. Add $x$ to $A[i]$.
  2. Given $l$ and $r$. Find the sum $A[l] + A[l+1] + \cdots + A[r]$.
$1 \le n, q \le 10^5$.

Divide the array into blocks of size $B$. In each block, maintain the sum of the elements in that block. Responding to the queries then works as follows:
  1. To add $x$ to $A[i]$, update $A[i]$ itself and the sum of the block containing $i$.
  2. To find the sum of $A[l \ldots r]$, add up the sums of the blocks completely contained in the interval, plus the individual elements at the two ends whose blocks are only partially covered.

Fulfilling a query of the first type always takes $O(1)$ time. To answer a query of the second type, we need to visit $O(\frac{n}{B})$ blocks and $O(B)$ individual elements, so responding to one query takes $O(\frac{n}{B} + B)$ time. This expression is minimized by taking $B = \sqrt{n}$, giving $O(q \sqrt{n})$ as the total complexity.
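As a concrete illustration, here is a minimal sketch of this solution (0-indexed; the structure `BlockSum` and its method names are our own):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
using namespace std;

struct BlockSum {
    int n, B;
    vector<long long> a, sum;  // sum[b] = sum of the elements of block b

    BlockSum(int n) : n(n), B(max(1, (int)sqrt((double)n))),
                      a(n, 0), sum(n / B + 1, 0) {}

    // Query type 1: add x to A[i] in O(1).
    void add(int i, long long x) {
        a[i] += x;
        sum[i / B] += x;
    }

    // Query type 2: sum of A[l..r] in O(n/B + B).
    long long query(int l, int r) {
        long long res = 0;
        for (int i = l; i <= r; ) {
            if (i % B == 0 && i + B - 1 <= r) {
                res += sum[i / B];  // block fully inside [l, r]
                i += B;
            } else {
                res += a[i];        // partial block: single element
                i++;
            }
        }
        return res;
    }
};
```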

This solution is, at least conceptually, perhaps simpler than one using a Fenwick tree or a segment tree. This is precisely why square-root methods are useful: for more complex queries, it can be easier to come up with a square-root algorithm.

Nastier operations on array subsegments

The next problem has somewhat more complex queries than the previous one. This problem, too, can be solved faster than $O(q \sqrt{n})$, but that solution would be quite difficult; it is likely easier to write the square-root one.

Given an array $A$ of length $n$, initially filled with zeroes. There are $q$ queries of four types, each concerning an interval $l \ldots r$. $1 \le n, q \le 10^5$.

The solution looks much the same. Given an interval $l \ldots r$ (for a query of any type), we update all blocks completely contained in the interval in $O(1)$ time each, and then update individually the elements in the interval whose blocks are not completely contained in it.

To do that, we maintain two pieces of lazy information for every block:
  1. The value (if any) that was last assigned to the whole block.
  2. The value that has been added to the whole block since then.

In addition, we maintain the value of $A[i]$ for all $i$, with two exceptions: the stored value accounts for neither a pending whole-block assignment nor a pending whole-block addition. Before individual elements of a block are modified, these tags must first be pushed down into the elements.
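Since the four query types are not reproduced above, the sketch below only illustrates this bookkeeping under the assumption that the update types are range assignment and range addition; the structure `LazyBlocks` and its method names are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <optional>
#include <vector>
using namespace std;

struct LazyBlocks {
    int n, B;
    vector<long long> a, added;           // added[b]: pending addition to all of block b
    vector<optional<long long>> assigned; // assigned[b]: pending assignment to all of block b

    LazyBlocks(int n) : n(n), B(max(1, (int)sqrt((double)n))),
                        a(n, 0), added(n / B + 1, 0), assigned(n / B + 1) {}

    // True current value of A[i], taking the block's lazy tags into account.
    long long value(int i) {
        int b = i / B;
        return (assigned[b] ? *assigned[b] : a[i]) + added[b];
    }

    // Push block b's tags down into its elements so they can be edited individually.
    void push(int b) {
        for (int i = b * B; i < min(n, (b + 1) * B); i++) a[i] = value(i);
        assigned[b].reset();
        added[b] = 0;
    }

    // Add x to (or assign x to) every element of A[l..r] in O(n/B + B).
    void rangeUpdate(int l, int r, long long x, bool assign) {
        push(l / B); push(r / B);  // clear tags on the (at most two) partial blocks
        for (int i = l; i <= r; ) {
            if (i % B == 0 && i + B - 1 <= r) {  // block fully inside [l, r]
                int b = i / B;
                if (assign) { assigned[b] = x; added[b] = 0; }
                else added[b] += x;
                i += B;
            } else {                             // edge element, tags already pushed
                a[i] = assign ? x : a[i] + x;
                i++;
            }
        }
    }
};
```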

Holes

There are $n$ holes in a row, numbered $1 \ldots n$. Each hole is characterized by its "power", an integer $A[k]$. If a ball reaches hole $k$, it will immediately bounce to hole $k + A[k]$. If there is no hole with that number, the ball exits the row. There are $q$ queries of two types:
  1. Given $k$ and $x$. Set the power of hole $k$ to $x$.
  2. Given $k$. A ball is thrown into hole $k$; find the last hole it visits before exiting.
$1 \le n, q \le 10^5$.

Similar to the previous problems, we partition the row of holes into blocks of size $\sqrt{n}$. For each hole $k$ we maintain the following values:
  1. $B[k] = k + A[k]$: the hole the ball bounces to directly from hole $k$.
  2. $C[k]$: the first hole outside the block of $k$ that a ball thrown into hole $k$ reaches; this "shortcut" may point past the end of the row, meaning the ball exits.

Now we need to figure out how to use this information to respond to queries of the second type and how to efficiently update this information after a query of the first type.

Queries of the second type are simpler. We just simulate the process, using the "shortcuts" $C[k]$ as much as possible and falling back to single bounces inside the final block. Each shortcut leaves the current block, so the ball makes $O(\sqrt{n})$ jumps in total: at most $n/B$ shortcut jumps plus at most $B$ single bounces.

Now suppose that we are asked to update the power of hole $k$. First, update $B[k]$. Then we need to recompute $C[k]$: if $B[k]$ lies outside the block of $k$ (or beyond the row), then $C[k] = B[k]$; otherwise $C[k] = C[B[k]]$.

In addition, we need to update the $C[\cdot]$ values of the holes whose paths go through hole $k$. It is sufficient to recompute (from right to left!) the $C[\cdot]$ of the holes to the left of $k$ in the same block as $k$: since $C[i]$ depends only on holes inside the block of $i$, the $C[\cdot]$ of holes in other blocks cannot change.

In the original problem, we must also print the number of jumps in addition to the last visited hole. This is only a slight addition: together with $C[k]$, we remember the number of jumps this shortcut encompasses. These counts can be updated at the same time and in the same fashion as $C[k]$.
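Putting the pieces together, here is a minimal sketch of the whole solution (1-indexed holes, assuming all powers are positive so that the ball always moves to the right; the structure `Holes` and its method names are our own):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>
using namespace std;

struct Holes {
    int n, B;
    vector<int> a;         // a[k]: power of hole k
    vector<int> c, jumps;  // c[k] = C[k]; jumps[k]: jumps that shortcut encompasses

    Holes(const vector<int>& power)  // power[1..n], assumed positive
        : n((int)power.size() - 1), B(max(1, (int)sqrt((double)n))),
          a(power), c(power.size()), jumps(power.size()) {
        for (int k = n; k >= 1; k--) recompute(k);
    }

    int blockOf(int k) { return (k - 1) / B; }

    // C[k] is B[k] if the first bounce already leaves the block, else C[B[k]].
    void recompute(int k) {
        int b = k + a[k];  // B[k]
        if (b > n || blockOf(b) != blockOf(k)) { c[k] = b; jumps[k] = 1; }
        else { c[k] = c[b]; jumps[k] = jumps[b] + 1; }
    }

    // Query type 1: set the power of hole k to x.
    void update(int k, int x) {
        a[k] = x;
        // Recompute shortcuts right to left, but only inside k's block:
        // shortcuts of holes in other blocks cannot change.
        for (int i = k; i >= 1 && blockOf(i) == blockOf(k); i--) recompute(i);
    }

    // Query type 2: throw a ball into hole k.
    // Returns {last hole visited, total number of jumps}.
    pair<int, int> throwBall(int k) {
        int total = 0;
        while (c[k] <= n) {  // take shortcuts while they stay inside the row
            total += jumps[k];
            k = c[k];
        }
        // The ball exits from k's block; simulate the last few bounces directly.
        int last = k;
        while (k <= n) { last = k; k += a[k]; total++; }
        return {last, total};
    }
};
```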

Partitioning the list of queries

Given an $n \times m$ grid, each square of which can be black or white. In the beginning, all squares are white. We are given $q$ queries, each of one of two types:
  1. Given $x, y$. Paint the square $(x, y)$ black.
  2. Given $x, y$. Find the distance of $(x, y)$ to the nearest black square.
In this context, the distance between $(x_1, y_1)$ and $(x_2, y_2)$ means $\left|x_1 - x_2\right| + \left|y_1 - y_2\right|$, i.e. the so-called Manhattan distance.
$1 \le nm \le 2 \cdot 10^5$, $1 \le q \le 2 \cdot 10^5$.

We start by describing two different slow methods for solving this problem:

  1. Maintain a list of black squares. A query of the first type is handled by simply appending to this list; to answer a query of the second type, go through all black squares and find the closest one.
  2. For each square, maintain the distance to the closest black square. To handle a query of the first type, run a BFS to update all distances. To answer a query of the second type, simply look the answer up in the table.
These methods are quite different, but both are too slow.
  1. Using the first method, queries of the first type take $O(1)$ time, but to answer a query of the second type, every black square painted so far has to be examined, which takes $O(q)$ time. We get a total complexity of $O(q^2)$; in the worst case $q^2 = 4 \cdot 10^{10}$, which is too slow.
  2. Using the second method, a query of the first type may necessitate updating the distance of nearly every square, taking $O(nm)$ time, while queries of the second type take $O(1)$ time. In total we get a complexity of $O(nmq)$; in the worst case $nmq = 4 \cdot 10^{10}$, again too slow.

The solution is to combine the two methods, for example like this:
  1. Maintain the distance table of the second method, along with a buffer of the black squares painted since the table was last computed.
  2. To handle a query of the first type, append the square to the buffer. If the buffer now contains $B$ squares, rebuild the entire distance table with one multi-source BFS (treating every black square as a source) and empty the buffer.
  3. To answer a query of the second type, take the minimum of the table entry at $(x, y)$ and the Manhattan distances to the at most $B$ squares in the buffer.

Let's compute the complexity. The table is rebuilt once per $B$ queries of the first type, and each rebuild takes $O(nm)$ time; answering a single query takes $O(B)$ time. In total, the complexity is thus $O(\frac{qnm}{B} + qB)$, which is minimized when $B = \sqrt{nm}$. We get a running time of $O(\frac{qnm}{\sqrt{nm}} + q\sqrt{nm}) = O(q \sqrt{nm})$. In the worst case, $q \sqrt{nm} \approx 2 \cdot 10^5 \cdot 450 = 9 \cdot 10^7$, which is sufficiently fast.
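Here is a minimal sketch of the combined method (0-indexed coordinates; the structure `NearestBlack` and its member names are our own). Note that before any square is painted, `query` returns the sentinel value:

```cpp
#include <algorithm>
#include <climits>
#include <cmath>
#include <cstdlib>
#include <queue>
#include <vector>
using namespace std;

struct NearestBlack {
    int n, m, B;
    vector<vector<bool>> black;
    vector<vector<int>> dist;      // method 2: table of distances to the nearest black square
    vector<pair<int, int>> buffer; // method 1: black squares painted since the last rebuild

    NearestBlack(int n, int m) : n(n), m(m), B(max(1, (int)sqrt((double)n * m))),
        black(n, vector<bool>(m, false)), dist(n, vector<int>(m, INT_MAX / 2)) {}

    // Query type 1: paint (x, y) black.
    void paint(int x, int y) {
        black[x][y] = true;
        buffer.push_back({x, y});
        if ((int)buffer.size() >= B) rebuild();
    }

    // Query type 2: distance from (x, y) to the nearest black square
    // (returns the sentinel INT_MAX / 2 if no square is black yet).
    int query(int x, int y) {
        int res = dist[x][y];
        for (auto [bx, by] : buffer)
            res = min(res, abs(x - bx) + abs(y - by));
        return res;
    }

    // Multi-source BFS from all black squares; on an unobstructed grid this
    // computes exactly the Manhattan distance to the nearest source. O(nm).
    void rebuild() {
        queue<pair<int, int>> q;
        for (int x = 0; x < n; x++)
            for (int y = 0; y < m; y++) {
                dist[x][y] = black[x][y] ? 0 : INT_MAX / 2;
                if (black[x][y]) q.push({x, y});
            }
        const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
        while (!q.empty()) {
            auto [x, y] = q.front(); q.pop();
            for (int d = 0; d < 4; d++) {
                int nx = x + dx[d], ny = y + dy[d];
                if (0 <= nx && nx < n && 0 <= ny && ny < m &&
                    dist[nx][ny] > dist[x][y] + 1) {
                    dist[nx][ny] = dist[x][y] + 1;
                    q.push({nx, ny});
                }
            }
        }
        buffer.clear();
    }
};
```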

Similarly, rebuilding some data structure after every $O(\sqrt{n})$ queries can be used to solve the following problem:

Given an array $A$ of length $n$. There are $q$ queries of two types. $1 \le n, q \le 3 \cdot 10^5$.