$\newcommand{\abs}[1]{\left| #1 \right|}$
$\renewcommand{\le}{\leqslant}$
$\renewcommand{\ge}{\geqslant}$
Square-root decomposition
We call "square-root decomposition" any algorithm where square roots play a role in
some way, even if not all of them are actually decompositions. In programming contests,
some more common patterns for using square roots in algorithms have emerged.
There are many different such patterns, but some more common ones are:
- Partitioning an array
- Partitioning the list of queries
- Dividing some "things" into "big" and "small" ones
- Mo's algorithm
In this lecture, we will attempt to show some examples of how these patterns can
be used, in hopes that the reader will be able to invent such solutions on their own
in the future.
Partitioning the array
Find the sums of segments
Many readers have probably already solved this problem, perhaps even with a complexity
better than $O(q \sqrt{n})$. However, since this is the simplest
possible problem of its type, we will work through it to demonstrate the principle.
Given an array $A$ of length $n$, initially consisting of zeroes.
There are $q$ queries of two types:
- Given $i$ and $v$. Add $v$ to $A[i]$.
- Given $l$ and $r$. Find $A[l] + A[l + 1] + \cdots + A[r]$.
$1 \le n, q \le 10^5$.
Divide the array into blocks of size $B$. In each block, maintain the
sum of elements in this block. Responding to the queries then works as follows:
- For queries of the first type, add $v$ to $A[i]$ and also add $v$ to the
corresponding block.
- For queries of the second type, add to the answer the sums of all blocks that are
completely contained in the range $l \ldots r$. Then add to the answer
all elements that are in the range but whose blocks are not.
Handling a query of the first type always takes $O(1)$ time.
To answer a query of the second type, we need to visit at most $O(\frac{n}{B})$ blocks
and at most $O(B)$ non-block elements. In total, the running time of responding
to the query is $O(\frac{n}{B} + B)$. By the AM-GM inequality, this sum is smallest
if we take $B = \sqrt{n}$, giving $O(q \sqrt{n})$ as the total complexity.
This solution is, at least conceptually, perhaps simpler than one using a
Fenwick tree or a segment tree. This is precisely what makes square-root methods useful:
for more complex queries, it is often easier to come up with a square-root algorithm.
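As an illustration, here is a minimal sketch of the block structure described above; the block size and the helper names (`add`, `sum`, `blockSum`) are ours, not prescribed by the problem.

```cpp
#include <bits/stdc++.h>
using namespace std;

int n, B;                     // array length and block size
vector<long long> A, blockSum;

void init(int n_) {
    n = n_;
    B = max(1, (int)sqrt((double)n));
    A.assign(n, 0);
    blockSum.assign(n / B + 1, 0);
}

// Query type 1: add v to A[i] in O(1).
void add(int i, long long v) {
    A[i] += v;
    blockSum[i / B] += v;
}

// Query type 2: sum of A[l..r] in O(n/B + B) = O(sqrt(n)).
long long sum(int l, int r) {
    long long s = 0;
    for (int i = l; i <= r; ) {
        if (i % B == 0 && i + B - 1 <= r) {
            s += blockSum[i / B];  // whole block inside l..r
            i += B;
        } else {
            s += A[i];             // element whose block sticks out
            i++;
        }
    }
    return s;
}
```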
Nastier operations on array subsegments
The next problem has somewhat more complex queries than the previous one. It, too,
can be solved faster than $O(q \sqrt{n})$, but that solution would be rather
difficult; it is likely easier to write the square-root one.
Given an array $A$ of length $n$, initially filled with zeroes.
There are $q$ queries of four types:
- Given $l$ and $r$. Replace $A[l], A[l + 1], \ldots, A[r]$ with ones.
- Given $l$ and $r$. Replace $A[l], A[l + 1], \ldots, A[r]$ with zeros.
- Given $l$ and $r$. Flip all of $A[l], \ldots, A[r]$: ones become zeros
and vice versa.
- Given $l$ and $r$. Find the least $i$ such that $l \le i \le r$ and $A[i] = 0$.
$1 \le n, q \le 10^5$.
The solution looks much the same. Given an interval $l \ldots r$ (for a query of any type),
we update all blocks completely contained in the interval in $O(1)$ time each, and
then update individually the elements in the interval whose blocks are not completely
contained in it.
To do that, we should maintain two bits of information for every block:
- Are all elements in this block zeros?
- Are all elements in this block flipped?
In addition, we maintain the value of $A[i]$ for all $i$ with two exceptions:
- If we have marked a block as zeroed out, we treat $A[i]$ as zero regardless
of its value.
- If we have marked a block as flipped, we treat $A[i]$ as its inverse.
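Below is a hedged sketch of this bookkeeping, covering only the per-block flags and the reading of $A[i]$; full query handling (in particular, locating the first zero quickly, say by also keeping a per-block count of zeros) is omitted. Note that "set to ones" can be expressed through these two flags as "zero out, then flip". All names are illustrative.

```cpp
#include <bits/stdc++.h>
using namespace std;

int n, B;
vector<int> A;               // stored element values (possibly stale)
vector<bool> zeroed;         // block flag: treat every stored value as 0
vector<bool> flipped;        // block flag: treat every value as inverted

void init(int n_) {
    n = n_;
    B = max(1, (int)sqrt((double)n));
    A.assign(n, 0);
    zeroed.assign(n / B + 1, false);
    flipped.assign(n / B + 1, false);
}

// The true value of A[i], after applying its block's lazy flags.
int value(int i) {
    int v = zeroed[i / B] ? 0 : A[i];
    return flipped[i / B] ? 1 - v : v;
}

// Push block b's flags down into the stored values, so that its
// elements can then be modified individually.
void push(int b) {
    int from = b * B, to = min(n, from + B);
    for (int i = from; i < to; i++) A[i] = value(i);
    zeroed[b] = flipped[b] = false;
}

// Block-level operations, O(1) each; "set to ones" is "zero, then flip".
void setZeros(int b) { zeroed[b] = true; flipped[b] = false; }
void setOnes(int b)  { zeroed[b] = true; flipped[b] = true; }
void flip(int b)     { flipped[b] = !flipped[b]; }
```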
Holes
There are $n$ holes in a row, numbered $1 \ldots n$.
Each hole is characterized by its "power", an integer $A[k]$. If a ball reaches
hole $k$, it immediately bounces to hole $k + A[k]$. If there is no hole with that number,
the ball exits the row. There are $q$ queries, of two types:
- Given $k$ and $x$. Set the power of hole $k$ to $x$.
- Given $k$. Throw a ball into hole $k$ and print the index of the last hole
the ball visits before exiting.
$1 \le n, q \le 10^5$.
Similar to the previous problems, we partition the row of holes into blocks of
size $\sqrt{n}$.
For each hole $k$ we maintain the following values:
- $B[k]$: The index of the hole the ball will bounce to from $k$.
- $C[k]$: The index of the first hole the ball will reach from $k$ that is
not in the same block as $k$.
Now we need to figure out how to use this information to respond to queries of
the second type and how to efficiently update this information after a query of
the first type.
Queries of the second type are simpler. We just simulate the process, using
the "shortcuts" $C[k]$ as much as possible. Since every shortcut takes the ball
out of its current block, the ball makes at most $O(\frac{n}{B}) = O(\sqrt{n})$ jumps.
Now suppose we are asked to update the power of hole $k$. First, update
$B[k]$. Then recompute $C[k]$: it equals $B[k]$ if $B[k]$ lies in another block,
and $C[B[k]]$ otherwise. In addition, we need to update the $C[\cdot]$ values of
holes whose paths pass through hole $k$. For that it suffices to recompute
(from right to left!) the $C[\cdot]$ of the holes to the left of $k$ in the same
block as $k$: the $C[\cdot]$ of other holes cannot change.
In the original problem, we must also print the number of jumps in addition
to the last visited hole. This is only a slight addition: together with $C[k]$,
we can remember the number of jumps the shortcut encompasses.
These values can be updated at the same time and in the same fashion as $C[k]$.
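A hedged sketch of the whole structure follows. It folds $B[k]$ into the computation $k + A[k]$, stores alongside each shortcut the last in-block hole visited (`lastHole`) and the jump count (`jumpCnt`), and assumes, as in the classic version of this problem, that all powers are positive; the names are ours.

```cpp
#include <bits/stdc++.h>
using namespace std;

int n, S;                       // number of holes and block size
vector<int> A, C, lastHole, jumpCnt;
// C[k]       : first hole outside k's block reached from k, or -1 if
//              the ball exits the row before reaching another block
// lastHole[k]: last hole inside k's block visited starting from k
// jumpCnt[k] : jumps taken until leaving the block (or exiting)

void recompute(int i) {
    int j = i + A[i];           // assumes A[i] > 0, so j > i
    if (j >= n) { C[i] = -1; lastHole[i] = i; jumpCnt[i] = 1; }
    else if (j / S != i / S) { C[i] = j; lastHole[i] = i; jumpCnt[i] = 1; }
    else {                      // same block: extend j's shortcut
        C[i] = C[j]; lastHole[i] = lastHole[j]; jumpCnt[i] = jumpCnt[j] + 1;
    }
}

void init(const vector<int>& powers) {
    n = (int)powers.size();
    S = max(1, (int)sqrt((double)n));
    A = powers;
    C.assign(n, 0); lastHole.assign(n, 0); jumpCnt.assign(n, 0);
    for (int i = n - 1; i >= 0; i--) recompute(i);
}

// Query type 1: right to left, only holes in k's block up to k change.
void update(int k, int x) {
    A[k] = x;
    for (int i = k; i >= (k / S) * S; i--) recompute(i);
}

// Query type 2: returns {last hole visited, total number of jumps}.
pair<int, int> throwBall(int k) {
    int jumps = 0;
    while (C[k] != -1) { jumps += jumpCnt[k]; k = C[k]; }
    return {lastHole[k], jumps + jumpCnt[k]};
}
```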
Partitioning the list of queries
Given an $n \times m$ grid, each square of which can be black or white.
In the beginning, all squares are white. We are given $q$ queries, each of
one of two types:
- Given $x, y$. Paint the square $(x, y)$ black.
- Given $x, y$. Find the distance of $(x, y)$ to the nearest black square.
In this context, the distance between $(x_1, y_1)$ and $(x_2, y_2)$ means
$\abs{x_1 - x_2} + \abs{y_1 - y_2}$, i.e. the so-called
Manhattan distance.
$1 \le nm \le 2 \cdot 10^5$, $1 \le q \le 2 \cdot 10^5$.
We start by describing two different slow methods for solving this problem:
- Maintain the list of black squares. A query of the first type is handled
by simply
appending to this list; to answer a query of the second type, go through
all black squares and find the nearest one.
- For each square, compute the distance to the closest black square.
To handle a query of the first type, run a BFS to recompute all distances.
To answer a query of the second type, we can simply look the answer up
from the table.
These methods are quite different, but both too slow.
- Using the first method, queries of the first type take $O(1)$ time,
but to answer queries of the second type, all previous queries have to be
reexamined: this takes $O(q)$ time. We get a complexity of $O(q^2)$.
In the worst case, this takes $q^2 = 4 \cdot 10^{10}$, which is too slow.
- Using the second method, queries of the first type may necessitate
updating the distance to nearly every square, taking $O(mn)$ time.
Queries of the second type take $O(1)$ time. In total we get a complexity
of $O(nmq)$. In the worst case, $nmq = 4 \cdot 10^{10}$, again too slow.
The solution is to combine the two methods, for example like this:
- After each query of the first type, add $(x, y)$ to a buffer.
- When the buffer is full (the number of squares exceeds some $B$),
run a simultaneous BFS from all the squares in the buffer to update
the table of distances. Then clear the buffer.
- To answer queries of the second type, first look the answer up
from the table of distances. Then go through all the squares in the buffer
and compute their distances from $(x, y)$ too. The answer is the minimum of
all those values.
Let's compute the complexity.
- Handling a query of the first type takes $O(1)$ time, except when the buffer becomes
full.
- If the buffer does become full, we take $O(nm)$ time. The buffer will be full
$O(\frac{q}{B})$ times.
- Handling queries of the second type takes $O(B)$ time.
In total, the complexity is thus $O(\frac{qnm}{B} + qB)$. One can show that this
is minimized by taking $B = \sqrt{nm}$. We get a running time of
$O(\frac{qnm}{\sqrt{nm}} + q\sqrt{nm}) = O(q \sqrt{nm})$. In the worst case,
$q \sqrt{nm} \approx 2 \cdot 10^5 \cdot 450 = 9 \cdot 10^7$, which is sufficiently fast.
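Here is a hedged sketch of the combined method, with 0-indexed squares and names of our choosing (`dist_`, `buf`, `rebuild`). The multi-source BFS gives exact Manhattan distances because movement on the grid is unobstructed; existing table entries act as already-settled distances, so each rebuild only improves them.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int INF = INT_MAX / 2;
int n, m;
vector<vector<int>> dist_;            // distance to nearest black square
vector<pair<int, int>> buf;           // black squares not yet in dist_

void init(int n_, int m_) {
    n = n_; m = m_;
    dist_.assign(n, vector<int>(m, INF));   // no black squares yet
}

// Multi-source BFS from all buffered squares at distance 0.
void rebuild() {
    queue<pair<int, int>> q;
    for (auto [x, y] : buf)
        if (dist_[x][y] > 0) { dist_[x][y] = 0; q.push({x, y}); }
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [x, y] = q.front(); q.pop();
        for (int d = 0; d < 4; d++) {
            int nx = x + dx[d], ny = y + dy[d];
            if (0 <= nx && nx < n && 0 <= ny && ny < m &&
                dist_[x][y] + 1 < dist_[nx][ny]) {
                dist_[nx][ny] = dist_[x][y] + 1;
                q.push({nx, ny});
            }
        }
    }
    buf.clear();
}

// Query type 1: buffer the square; rebuild once |buf| reaches sqrt(nm).
void paint(int x, int y) {
    buf.push_back({x, y});
    if ((long long)buf.size() * buf.size() >= (long long)n * m) rebuild();
}

// Query type 2: table lookup plus a scan over the buffer.
int nearest(int x, int y) {
    int best = dist_[x][y];
    for (auto [bx, by] : buf)
        best = min(best, abs(x - bx) + abs(y - by));
    return best;                // INF if there are no black squares yet
}
```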
Similarly, rebuilding some data structure after every $O(\sqrt{n})$ queries can
be used to solve the following problem:
Given an array $A$ of length $n$. There are $q$ queries of two types:
- Given $l$ and $r$. For each $i$, where $l \le i \le r$, add to $A[i]$
the number $F_{i - l + 1}$, where $F_k$ is the $k$-th Fibonacci number.
- Given $l$ and $r$. Find the sum $A[l] + A[l + 1] + \cdots + A[r]$ modulo
$10^9 + 9$.
$1 \le n, q \le 3 \cdot 10^5$.
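The text leaves this problem to the reader; for completeness, here is one hedged sketch in that spirit (all names are ours). Type-1 queries are buffered as $O(1)$ edits to a "Fibonacci difference" array $D[i] = A[i] - A[i-1] - A[i-2]$: because $F$ obeys the same recurrence, a range update changes $D$ in only three places. Every $O(\sqrt{q})$ updates we rebuild $A$ and its prefix sums in $O(n)$; a type-2 query combines the prefix sums with the buffered updates, using $F_1 + \cdots + F_k = F_{k+2} - 1$ to get each overlap's Fibonacci sum in $O(1)$. The total complexity is $O((n + q)\sqrt{q})$.

```cpp
#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1'000'000'009;
int n;
size_t T = 600;                         // rebuild threshold, ~sqrt(q)
vector<long long> F, A, D, pre;         // all 1-indexed, index 0 unused
vector<pair<int, int>> pending;         // buffered type-1 queries

long long fibRange(int a, int b) {      // F_a + ... + F_b, 1 <= a <= b
    return (F[b + 2] - F[a + 1] + 2 * MOD) % MOD;
}

void rebuild() {                        // O(n): rederive A and pre from D
    for (int i = 1; i <= n; i++)
        A[i] = (D[i] + A[i - 1] + (i >= 2 ? A[i - 2] : 0)) % MOD;
    for (int i = 1; i <= n; i++)
        pre[i] = (pre[i - 1] + A[i]) % MOD;
    pending.clear();
}

void init(const vector<long long>& a0) { // a0 is 1-indexed, a0[0] == 0
    n = (int)a0.size() - 1;
    F.assign(n + 4, 0); F[1] = F[2] = 1;
    for (int i = 3; i <= n + 3; i++) F[i] = (F[i - 1] + F[i - 2]) % MOD;
    A = a0; pre.assign(n + 1, 0); D.assign(n + 1, 0);
    for (int i = 1; i <= n; i++)
        D[i] = ((A[i] - A[i - 1] - (i >= 2 ? A[i - 2] : 0)) % MOD + MOD) % MOD;
    for (int i = 1; i <= n; i++) pre[i] = (pre[i - 1] + A[i]) % MOD;
}

// Query type 1: the Fibonacci recurrence makes the update O(1) on D.
void update(int l, int r) {
    int k = r - l + 1;
    D[l] = (D[l] + 1) % MOD;                                // F_1 enters at l
    if (r + 1 <= n) D[r + 1] = (D[r + 1] - F[k + 1] + MOD) % MOD;
    if (r + 2 <= n) D[r + 2] = (D[r + 2] - F[k] + MOD) % MOD;
    pending.push_back({l, r});
    if (pending.size() >= T) rebuild();
}

// Query type 2: prefix sums plus O(1) work per pending update.
long long query(int l, int r) {
    long long s = (pre[r] - pre[l - 1] + MOD) % MOD;
    for (auto [ul, ur] : pending) {
        int lo = max(l, ul), hi = min(r, ur);
        if (lo <= hi) s = (s + fibRange(lo - ul + 1, hi - ul + 1)) % MOD;
    }
    return s;
}
```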