Introduce an optimized extend_by_hashes function #131

Open: aneubeck wants to merge 3 commits into main from aneubeck/fastgeo

@aneubeck

Collaborator

This is needed to speed up construction of geofilters during indexing.
The speedup is about 2.4x (depending on the exact usage pattern, of course).

Copilot AI review requested due to automatic review settings June 24, 2026 10:19
@aneubeck aneubeck requested a review from a team as a code owner June 24, 2026 10:19
GitHub Advanced Security started work on behalf of aneubeck June 24, 2026 10:20 View session
GitHub Advanced Security finished work on behalf of aneubeck June 24, 2026 10:20

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a batched insertion API to GeoDiffCount to speed up filter construction/insertion by avoiding per-hash rebalancing of the dense/sparse split, along with supporting bit-chunk utilities and benchmarks to quantify the improvement.

Changes:

  • Added GeoDiffCount::extend_by_hashes plus estimate_split_bucket to batch-xor many hashes while restoring invariants once per batch.
  • Added BitVec::toggler / BitToggler to efficiently toggle many bits without repeatedly resolving the internal Cow.
  • Added parity_bit_positions to produce parity-aware BitChunk streams, plus new correctness tests and Criterion benchmarks comparing push_hash vs batched extend.
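
The idea of toggling many hashes and restoring invariants once per batch can be illustrated with a toy model. The sketch below is not the PR's implementation: `ToyDiffSet`, `toggle`, and `toggle_many` are hypothetical names, and the state is assumed to be a plain sorted list of set bucket ids rather than the dense/sparse split used by GeoDiffCount. It only shows why a batch can be reduced to its odd-parity buckets and merged in a single pass while giving the same result as per-item toggling.

```rust
use std::cmp::Ordering;

/// Toy stand-in for a diff-count filter (hypothetical type, not the real GeoDiffCount).
/// Invariant: `buckets` is strictly increasing.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ToyDiffSet {
    buckets: Vec<u32>,
}

impl ToyDiffSet {
    pub fn buckets(&self) -> &[u32] {
        &self.buckets
    }

    /// Per-item path: toggles one bucket and restores the sorted invariant immediately.
    pub fn toggle(&mut self, bucket: u32) {
        match self.buckets.binary_search(&bucket) {
            Ok(pos) => {
                self.buckets.remove(pos);
            }
            Err(pos) => self.buckets.insert(pos, bucket),
        }
    }

    /// Batched path: sort the batch, keep only buckets that occur an odd number
    /// of times, then XOR them into the state with one symmetric-difference merge.
    pub fn toggle_many<I: IntoIterator<Item = u32>>(&mut self, batch: I) {
        let mut batch: Vec<u32> = batch.into_iter().collect();
        batch.sort_unstable();

        // Equal ids are adjacent after sorting, so pairs cancel via push/pop.
        let mut odd: Vec<u32> = Vec::with_capacity(batch.len());
        for b in batch {
            if odd.last() == Some(&b) {
                odd.pop();
            } else {
                odd.push(b);
            }
        }

        // Single merge pass: ids present in both lists cancel out.
        let old = std::mem::take(&mut self.buckets);
        let mut merged = Vec::with_capacity(old.len() + odd.len());
        let (mut i, mut j) = (0, 0);
        while i < old.len() && j < odd.len() {
            match old[i].cmp(&odd[j]) {
                Ordering::Less => {
                    merged.push(old[i]);
                    i += 1;
                }
                Ordering::Greater => {
                    merged.push(odd[j]);
                    j += 1;
                }
                Ordering::Equal => {
                    i += 1;
                    j += 1;
                }
            }
        }
        merged.extend_from_slice(&old[i..]);
        merged.extend_from_slice(&odd[j..]);
        self.buckets = merged;
    }
}
```

In this toy model the per-item path pays for a shifted insert or removal on every call, while the batched path pays for one sort and one linear merge. Whether that matches the cost profile of the real dense/sparse representation is an assumption here.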
Summary per file:

| File | Description |
| --- | --- |
| crates/geo_filters/src/diff_count/bitvec.rs | Adds a toggler helper to reduce overhead when flipping many bits in a hot loop. |
| crates/geo_filters/src/diff_count.rs | Implements extend_by_hashes, split estimation, and adds tests validating equivalence to per-hash insertion. |
| crates/geo_filters/src/config/lookup.rs | Optimizes lookup table storage and lookup path (includes new unsafe indexing). |
| crates/geo_filters/src/config/bitchunks.rs | Adds parity_bit_positions and expands test coverage for bit-chunk iterators. |
| crates/geo_filters/evaluation/performance.rs | Adds Criterion benchmarks for batch construction and batch insertion. |

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 2

Comment on lines +223 to +228
    #[inline]
    pub fn toggle(&mut self, index: usize) {
        debug_assert!(index < self.num_bits);
        let (block_idx, bit_idx) = index.into_index_and_bit();
        self.blocks[block_idx] ^= bit_idx.into_block();
    }
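
For context, the pattern this method belongs to (resolving a `Cow` once and then toggling through a plain mutable slice) might look roughly like the sketch below. `ToyBitVec` and `ToyToggler` are hypothetical simplifications with `u64` blocks; the real `BitVec` / `BitToggler` types, their block type, and helpers such as `into_index_and_bit` are not reproduced here.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified bit vector whose storage may be borrowed or owned.
pub struct ToyBitVec<'a> {
    num_bits: usize,
    blocks: Cow<'a, [u64]>,
}

/// Mutable view created after the Cow has been resolved exactly once.
pub struct ToyToggler<'b> {
    num_bits: usize,
    blocks: &'b mut [u64],
}

impl<'a> ToyBitVec<'a> {
    pub fn from_borrowed(blocks: &'a [u64], num_bits: usize) -> Self {
        assert!(num_bits <= blocks.len() * 64);
        ToyBitVec { num_bits, blocks: Cow::Borrowed(blocks) }
    }

    /// Per-call path: `to_mut` checks borrowed vs. owned on every toggle.
    pub fn toggle(&mut self, index: usize) {
        assert!(index < self.num_bits);
        self.blocks.to_mut()[index / 64] ^= 1u64 << (index % 64);
    }

    /// Batched path: resolve the Cow once (cloning if it was borrowed) and
    /// hand out a view over the owned blocks.
    pub fn toggler<'b>(&'b mut self) -> ToyToggler<'b> {
        ToyToggler {
            num_bits: self.num_bits,
            blocks: self.blocks.to_mut().as_mut_slice(),
        }
    }

    pub fn test_bit(&self, index: usize) -> bool {
        assert!(index < self.num_bits);
        (self.blocks[index / 64] >> (index % 64)) & 1 == 1
    }
}

impl<'b> ToyToggler<'b> {
    /// Flips one bit; only a debug-time bounds check on the logical length.
    #[inline]
    pub fn toggle(&mut self, index: usize) {
        debug_assert!(index < self.num_bits);
        self.blocks[index / 64] ^= 1u64 << (index % 64);
    }
}
```

A caller would obtain the toggler once, flip all bits of a batch in a loop, and then drop it before reading the bit vector again. The borrowed backing slice stays untouched because `to_mut` clones it on first write.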
Comment on lines 42 to +47
      let idx = hash >> (32 - self.b - 1);
    - let offset = (hash < self.buckets[idx].1) as usize;
    - offset + self.buckets[idx].0 + (1 << self.b) * levels
    + // SAFETY: `hash` was masked to 32 bits, so `idx = hash >> (31 - b)` holds at most `b + 1`
    + // significant bits and is therefore always `< 2^(b+1) == 2 << b == self.buckets.len()`.
    + debug_assert!(idx < self.buckets.len());
    + let (base, threshold) = *unsafe { self.buckets.get_unchecked(idx) };
    + let offset = (hash < threshold as usize) as usize;
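
The bound argument in the SAFETY comment can be checked in isolation. The following is a minimal sketch under stated assumptions, not the code in lookup.rs: `ToyLookup` is a hypothetical type, its table is assumed to hold exactly `2 << b` `(base, threshold)` pairs of `u32`, and the `(1 << self.b) * levels` term from the original expression is left out because `levels` is not defined in the excerpt.

```rust
/// Hypothetical lookup table with `2 << b` entries of `(base, threshold)`.
pub struct ToyLookup {
    b: u32,
    buckets: Vec<(u32, u32)>,
}

impl ToyLookup {
    pub fn new(b: u32, buckets: Vec<(u32, u32)>) -> Self {
        // The unchecked access below is only sound if this length holds.
        assert!(b < 31);
        assert_eq!(buckets.len(), 2usize << b);
        ToyLookup { b, buckets }
    }

    /// Unchecked variant, mirroring the shape of the reviewed hunk.
    pub fn lookup(&self, hash: u64) -> usize {
        // Mask to 32 bits so the shift below leaves at most `b + 1` bits.
        let hash = (hash & 0xFFFF_FFFF) as usize;
        let idx = hash >> (32 - self.b - 1);
        debug_assert!(idx < self.buckets.len());
        // SAFETY: `hash < 2^32`, so `idx < 2^(b+1) == 2 << b`, which `new`
        // asserted to be `self.buckets.len()`.
        let (base, threshold) = *unsafe { self.buckets.get_unchecked(idx) };
        let offset = (hash < threshold as usize) as usize;
        offset + base as usize
    }

    /// Bounds-checked reference used to cross-check the unchecked variant.
    pub fn lookup_checked(&self, hash: u64) -> usize {
        let hash = (hash & 0xFFFF_FFFF) as usize;
        let idx = hash >> (32 - self.b - 1);
        let (base, threshold) = self.buckets[idx];
        (hash < threshold as usize) as usize + base as usize
    }
}
```

Keeping a checked twin like `lookup_checked` around for tests is one way to exercise the invariant; the `debug_assert!` in the hunk serves the same purpose in debug builds.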

2 participants