1. provide output diagnostics that explain how many input keys there are
   in total, how many remain after dealing with static links, and finally,
   once the algorithm is complete, how many dynamic duplicates remain.
2. fix up GATHER_STATISTICS so that it covers all instrumentation.
3. Useful idea:
a. Generate the wordlist as a contiguous block of keywords, as before.
This wordlist *must* be sorted by hash value.
  b. generate the lookup_array, an array of signed {chars,shorts,ints},
     whichever type allows full coverage of the wordlist dimensions.  If
     the value v, where v = lookup_array[hash(str,len)], is >= 0, then we
     simply use this result as a direct index into wordlist to snag the
     keyword for comparison.
  c. Otherwise, if v is < 0, this is an indication that we'll need to
     search through some number of duplicate hash values.  Using a
     hash linking scheme we'd then index into a duplicate_address
     table that provides the starting index and total length of the
     duplicate entries to consider sequentially (see the sketch below).
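A minimal sketch in C of the scheme in 3a-3c, keeping the names wordlist,
lookup_array, and duplicate_address from the text above.  The toy hash
function, the keyword set, and the encoding of a negative v as
-(duplicate_address index) - 1 are assumptions made for illustration;
this is not gperf's actual generated code.

#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 5

/* (a) Contiguous block of keywords, sorted by hash value.
   The keyword set is hypothetical. */
static const char *wordlist[] = { "do", "if", "int", "for", "go" };

/* (c) One entry per run of keywords sharing a hash value: starting
   index into wordlist and total length of the run. */
struct duplicate_entry { int start; int length; };
static const struct duplicate_entry duplicate_address[] = {
    { 0, 3 },   /* "do", "if", "int" all hash to 0 under the toy hash */
};

/* Toy hash: first character modulo table size (an assumption; a real
   gperf hash mixes selected character positions and the length). */
static int hash(const char *str, int len) {
    (void)len;
    return (unsigned char)str[0] % TABLE_SIZE;
}

/* (b) Signed lookup table, here signed chars since the wordlist is
   small.  v >= 0 is a direct index into wordlist; v < 0 means
   duplicates, and -v - 1 indexes duplicate_address.  Empty slots
   reuse index 0; the final strcmp rejects non-keywords. */
static const signed char lookup_array[TABLE_SIZE] = {
    -1,  /* hash 0: three duplicates -> duplicate_address[0] */
     0,  /* empty slot */
     3,  /* hash 2: "for" */
     4,  /* hash 3: "go" */
     0,  /* empty slot */
};

static const char *in_word_set(const char *str, int len) {
    int v = lookup_array[hash(str, len)];
    if (v >= 0)   /* direct hit: one comparison */
        return strcmp(str, wordlist[v]) == 0 ? wordlist[v] : NULL;
    /* negative: scan the run of same-hash keywords sequentially */
    const struct duplicate_entry *d = &duplicate_address[-v - 1];
    for (int i = d->start; i < d->start + d->length; i++)
        if (strcmp(str, wordlist[i]) == 0)
            return wordlist[i];
    return NULL;
}

int main(void) {
    printf("int: %s\n", in_word_set("int", 3) ? "found" : "not found");
    printf("go:  %s\n", in_word_set("go", 2) ? "found" : "not found");
    printf("xyz: %s\n", in_word_set("xyz", 3) ? "found" : "not found");
    return 0;
}

The payoff of the signed encoding is that the common case (a hash value
with a unique keyword) costs one table load plus one string comparison,
while collisions degrade gracefully into a short sequential scan.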