(Optional) Hashmaps
Authors: Darren Yao, Benjamin Qi
Contributors: Neo Wang, Nathan Gong
Maintaining collections of distinct elements with hashing.
Warning!
You can almost always use ordered sets and maps instead, but it's good to know that these exist.
Resources | ||||
---|---|---|---|---|
IUSACO | module is based off this |
Hashing
Hashing refers to assigning a unique code to every variable/object which allows insertions, deletions, and searches in time, albeit with a high constant factor, as hashing requires a large constant number of operations. However, as the name implies, elements are not ordered in any meaningful way, so traversals of an unordered set will return elements in some arbitrary order.
Custom Hashing
C++
There is no built in method for hashing pairs or vectors. Namely,
unordered_set<vector<int>>
does not work. In this case, we can use an
sorted map (which supports all of the functions used
in the code above) or declare our own hash function.
Resources | ||||
---|---|---|---|---|
Mark Nelson | How to create user-defined hash function for |
The link provides an example of hashing pairs of strings, as well as other data structures like pairs.
#include <bits/stdc++.h>using namespace std;typedef pair<int, int> pi;#define f first#define s secondstruct hashPi {size_t operator()(const pi &p) const { return p.f ^ p.s; }};int main() { unordered_map<pi, int, hashPi> um; }
#include <bits/stdc++.h>using namespace std;typedef pair<int, int> pi;#define f first#define s secondnamespace std {template <> struct hash<pi> {size_t operator()(const pi &p) const { return p.f ^ p.s; }};} // namespace stdint main() { unordered_map<pi, int> um; }
Java
Java has its own hash functions for pre-defined objects like ArrayList
's.
However, a custom hash function is still needed for user-defined objects.
In order to create one, we can implement the hashCode
method.
Additionally, in order for HashSet
's and HashMap
's to work with
a custom class, we must also implement the equals
method.
import java.io.*;import java.util.*;public class HashTest {public static void main(String[] args) {// uses custom hash function in class PairSet<Pair> set = new HashSet<>();}static class Pair {
However, this hash function is quite bad; if we insert then they will all be mapped to the same bucket (so it would easily be hacked in a Codeforces contest).
C++
Java
In Java, an easy way to hash custom objects is to use the builtin
Objects.hash()
method. This method takes in multiple objects and uses them
to create a hash code. However, this is easily hackable, as the code
below shows.
import java.io.*;import java.util.*;public class HashTest {public static void main(String[] args) {// uses custom hash function in class PairSet<Pair> set = new HashSet<>();for (int i = 0; i < 100000; ++i) {// Pair p = new Pair(i, 0); // ~0.2s on ide.usaco.guidePair p = new Pair(i, -31 * i); // >5s
Python
A better way to hash pairs would be to use polynomial hashing with a randomized base (as described in this module).
Anti-Hash Tests
The built-in hashing algorithm for integers in C++ is vulnerable to pathological tests, causing abnormally slow runtimes. We describe the issue below and how to fix it. Java users are not affected (see this comment).
Should I worry about anti-hash tests in USACO?
No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.
Resources | ||||
---|---|---|---|---|
CF | Explanation of this problem and how to fix it. |
Hack Case Generator
In order to make your unordered_map
unhackable, the hash function you select must have the following property: it is hard to find integers and such that but , where is the prime modulus corresponding to the number of buckets in your unordered map. Note that:
- The default hash function, , obviously does not satisfy this property, because you can simply choose and .
- If open hacking is allowed, then any deterministic hash function does not satisfy this property, because a hacker could generate a test case specifically to break your solution. You should introduce randomness to ensure that your hash function changes from run to run.
Here is a drop-in replacement for unordered_map
that seems to work well in practice:
Resources | ||||
---|---|---|---|---|
Benq (from KACTL) |
struct chash {// any random-ish large odd number will doconst uint64_t C = uint64_t(2e18 * PI) + 71;// random 32-bit numberconst uint32_t RANDOM =chrono::steady_clock::now().time_since_epoch().count();size_t operator()(uint64_t x) const {// see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.htmlreturn __builtin_bswap64((x ^ RANDOM) * C);}};template <class K, class V> using cmap = unordered_map<K, V, chash>;// example usage: cmap<int, int>
A Faster HashMap in C++
In C++, it's usually faster to use gp_hash_table<K, V>
instead of
unordered_map<K, V>
. Read / writes are much faster than unordered_map
.
The documentation is rather confusing, so I'll just summarize the most useful
functions here. If you need a replacement for unordered_set<K>
, use
gp_hash_table<K, null_type>
.
Resources | ||||
---|---|---|---|---|
CF | Introduces gp_hash_table | |||
GCC | documentation | |||
Benq (from KACTL) |
Custom Hashing
Like unordered_map
, gp_hash_table
is vulernable to hash collisions if a
cusetom hash function isn't used. We recommend using the same custom hash
we used for unordered map above:
#include <ext/pb_ds/assoc_container.hpp>using namespace __gnu_pbds;gp_hash_table<int, int, chash> table;
Resizing
Unordered map has
reserve
.
Calling this function before inserting any elements can result in a constant
factor speedup.
We can modify the declaration of gp_hash_table
so that it supports the
resize
function, which operates similarly.
template <class K, class V>using ht = gp_hash_table<K, V, hash<K>, equal_to<K>, direct_mask_range_hashing<>, linear_probe_fn<>,hash_standard_resize_policy<hash_exponential_size_policy<>,hash_load_check_resize_trigger<>, true>>;
These are the same template arguments as the default gp_hash_table
, except
false
has been changed to true
. This modification allows us to change the
actual size of the hash table.
int main() {ht<int, null_type> g;g.resize(5);cout << g.get_actual_size() << "\n"; // 8cout << g.size() << "\n"; // 0}
When calling g.resize(x)
, x
is rounded up to the nearest power of 2. Then
the actual size of g
is changed to be equal to x
(unless x < g.size()
, in
which case an error is thrown).
Resources | ||||
---|---|---|---|---|
GCC | documentation |
Furthermore, if we construct g
with the following arguments:
ht<int, null_type> g({}, {}, {}, {}, {1 << 16});
then the actual size of g
is always at least 1<<16
(regardless of calls to
resize
). The last argument must be a power of 2 (or else errors will be
thrown).
Solving 3SUM
Focus Problem – try your best to solve this problem before continuing!
View Internal SolutionSince all the values are quite small, you can use an array instead of a hashmap. But if you didn't read the constraints carefully enough, you're in luck!
Warning!
This code passes in 1180ms when initial capacity is specified, and TLEs when it is not.
#include <bits/stdc++.h> // see C++ Tips & Tricksusing namespace std;using ll = long long;using vi = vector<int>;#define pb push_back#define rsz resize#define all(x) begin(x), end(x)#define sz(x) (int)(x).size()
Problems
Status | Source | Problem Name | Difficulty | Tags | |
---|---|---|---|---|---|
CSES | Normal | Show TagsSet |
Module Progress:
Join the USACO Forum!
Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!