Yioop - PHP Search Engine

Mar 4 In-Class Exercise.

Please post your solution to the Mar 4 In-Class Exercise to this thread.

Best, Chris

(Edited: 2020-03-03)

Please post your solution to the Mar 4 In-Class Exercise to this thread. Best, Chris

-- Mar 4 In-Class Exercise

First, we set aside one of our 3 blocks of memory M to read in the file, and we set our other 2 blocks for our two buckets, B1 and B2.

Using our hash function (xmod2), we can go through the file and hash each record, sending each record to its corresponding block

For example, all instances of 2 will be hashed as 2mod2 = 0 into B1 and all instances of 1 or 3 will be hashed as 1mod2 = 3mod2 = 1 into B2

Now we have our 2 buckets,

0: B1 = {2, 2, 2, 2}

1: B2 = {1, 1, 1, 1, 3, 3, 3, 3}

We only deal with one bucket at a time in each pass in order to fit all the records in memory, so first we work on B1 and do one pass in order to eliminate all instances of 2 except for the first, making our new B1

B1 = {2}

and now we can work on our B2 and do one pass to eliminate all instances of 1 and 3 except for the first of each, making our new bucket B2

B2 = {1, 3}

Now we have no duplicates left in our data set using two pass duplicate elimination using hashing.

Note: we can fit B2 in memory if we use all blocks of our memory. 3x3 = 9 records maximum, B2 contains 8 records

First, we set aside one of our 3 blocks of memory M to read in the file, and we set our other 2 blocks for our two buckets, B1 and B2. Using our hash function (xmod2), we can go through the file and hash each record, sending each record to its corresponding block For example, all instances of 2 will be hashed as 2mod2 = 0 into B1 and all instances of 1 or 3 will be hashed as 1mod2 = 3mod2 = 1 into B2 Now we have our 2 buckets, 0: B1 = {2, 2, 2, 2} 1: B2 = {1, 1, 1, 1, 3, 3, 3, 3} We only deal with one bucket at a time in each pass in order to fit all the records in memory, so first we work on B1 and do one pass in order to eliminate all instances of 2 except for the first, making our new B1 B1 = {2} and now we can work on our B2 and do one pass to eliminate all instances of 1 and 3 except for the first of each, making our new bucket B2 B2 = {1, 3} Now we have no duplicates left in our data set using two pass duplicate elimination using hashing. Note: we can fit B2 in memory if we use all blocks of our memory. 3x3 = 9 records maximum, B2 contains 8 records

-- Mar 4 In-Class Exercise

First Pass: Hash the values from file into two buckets. Bucket 0: 2,2,2,2 Bucket 1: 1,1,1,1,3,3,3,3

Second Pass: Load Bucket 0 into memory and perform one-pass(sort and elim) to eliminate duplicates output a single 2. Then load Bucket 1 into memory and again perform one-pass to eliminate duplicates output 1,3.

Done.

First Pass: Hash the values from file into two buckets. Bucket 0: 2,2,2,2 Bucket 1: 1,1,1,1,3,3,3,3 Second Pass: Load Bucket 0 into memory and perform one-pass(sort and elim) to eliminate duplicates output a single 2. Then load Bucket 1 into memory and again perform one-pass to eliminate duplicates output 1,3. Done.

-- Mar 4 In-Class Exercise

1st pass: hash to 0 -> memory looks like [ [222] [2 ] ] (1 block with three 2's and another with one 2) -> sort and eliminate -> output "2"

2nd pass: hash to 1 -> memory looks like [ [131] [313] [13] ] (1 block of "131" another with "313" and another with "13") -> sort and eliminate -> output "1" and "3"

(Edited: 2020-03-04)

1st pass: hash to 0 -> memory looks like [ [222] [2 ] ] (1 block with three 2's and another with one 2) -> sort and eliminate -> output "2" 2nd pass: hash to 1 -> memory looks like [ [131] [313] [13] ] (1 block of "131" another with "313" and another with "13") -> sort and eliminate -> output "1" and "3"

-- Mar 4 In-Class Exercise

Pass 1: use %2 to hash all the records into two buckets:

ex: record 1 % 2 = 1 => record 1 goes into bucket 1

Bucket 0: 2,2,2,2

Bucket 1: 1,1,1,1,3,3,3,3

Pass 2: read Bucket 0 into memory to eliminate duplicates we get the output: 2 since extra 2 is removed. Then read Bucket 1 into memory and eliminate duplicates we get 1,3.

(Edited: 2020-03-04)

Pass 1: use %2 to hash all the records into two buckets: ex: record 1 % 2 = 1 => record 1 goes into bucket 1 Bucket 0: 2,2,2,2 Bucket 1: 1,1,1,1,3,3,3,3 Pass 2: read Bucket 0 into memory to eliminate duplicates we get the output: 2 since extra 2 is removed. Then read Bucket 1 into memory and eliminate duplicates we get 1,3.

-- Mar 4 In-Class Exercise

Blocks hold three (integer) records. Three blocks of main memory. Elements are hashed into their buckets.

1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1

Bucket 0: (2), (2), (2)

Bucket 1: (1), (3), (1), (3), (1), (3)

Pass 1 -> Bucket 0: (2) //Bucket 0 is loaded into memory and contains a singular 2 after duplicate elimination.

Pass 2 -> Bucket 1: (1), (3) //Bucket 1 is loaded into memory and now contains 1 and 3 after duplicate elimination.

(Edited: 2020-03-04)

Blocks hold three (integer) records. Three blocks of main memory. Elements are hashed into their buckets. * 1 % 2 = 1 * 2 % 2 = 0 * 3 % 2 = 1 * 1 % 2 = 1 * 2 % 2 = 0 * 3 % 2 = 1 * 1 % 2 = 1 * 2 % 2 = 0 * 3 % 2 = 1 Bucket 0: (2), (2), (2) Bucket 1: (1), (3), (1), (3), (1), (3) Pass 1 -> Bucket 0: (2) //Bucket 0 is loaded into memory and contains a singular 2 after duplicate elimination. Pass 2 -> Bucket 1: (1), (3) //Bucket 1 is loaded into memory and now contains 1 and 3 after duplicate elimination.

-- Mar 4 In-Class Exercise

Each block can hold 3 records.

We have 3 blocks of main memory.

M-1 blocks for hashing buckets

Use mod 2 as hash function.

1 mod 2 = 1 and 3 mod 2 = 1, so all records with value 1 and 3 will be hashed into the same bucket 1.

2 mod 2 = 0, so all records with value 2 will be hashed into the same bucket 2.

First pass, hashing all the records to buckets. Now, bucket 1 = {1, 3, 1, 3, 1, 3, 1, 3}, bucket 2 = {2, 2, 2, 2}.

Second pass, we try to eliminate duplicate in two buckets. But we can only work one bucket in the main memory at a time.

So, we work with bucket 1 first, and eliminate all the duplicates in this bucket => bucket 1 = {1, 3}.

Then, we work on bucket 2 => bucket 2 = {2}.

So far, we have two buckets with no duplicate values.

Each block can hold 3 records. We have 3 blocks of main memory. M-1 blocks for hashing buckets Use mod 2 as hash function. 1 mod 2 = 1 and 3 mod 2 = 1, so all records with value 1 and 3 will be hashed into the same bucket 1. 2 mod 2 = 0, so all records with value 2 will be hashed into the same bucket 2. First pass, hashing all the records to buckets. Now, bucket 1 = {1, 3, 1, 3, 1, 3, 1, 3}, bucket 2 = {2, 2, 2, 2}. Second pass, we try to eliminate duplicate in two buckets. But we can only work one bucket in the main memory at a time. So, we work with bucket 1 first, and eliminate all the duplicates in this bucket => bucket 1 = {1, 3}. Then, we work on bucket 2 => bucket 2 = {2}. So far, we have two buckets with no duplicate values.

-- Mar 4 In-Class Exercise

First : doing the hash part mod 2 so we have 2 bucket as

Bucket 0 : {2,2,2} Bucket 1 : {1,1,1,1,3,3,3,3}

Next: pass each bucket to memory and eliminate the duplicate Bucket 0 : {2} Bucket 1 : {1,3}

First : doing the hash part mod 2 so we have 2 bucket as Bucket 0 : {2,2,2} Bucket 1 : {1,1,1,1,3,3,3,3} Next: pass each bucket to memory and eliminate the duplicate Bucket 0 : {2} Bucket 1 : {1,3}

-- Mar 4 In-Class Exercise

hash the records into buckets by applying the hash function. You will get 2 buckets:

bucket 0: [2,2,2], bucket 1: [1,3,1,3,1,3]

In the second pass, we read each bucket individually into main memory, as long as it can fit into memory, and apply a single pass duplicate elimination for it.

duplicate elimination bucket 0: [2]

duplicates eliminated bucket 1: [1,3]

(Edited: 2020-03-04)

hash the records into buckets by applying the hash function. You will get 2 buckets: bucket 0: [2,2,2], bucket 1: [1,3,1,3,1,3] In the second pass, we read each bucket individually into main memory, as long as it can fit into memory, and apply a single pass duplicate elimination for it. duplicate elimination bucket 0: [2] duplicates eliminated bucket 1: [1,3]

-- Mar 4 In-Class Exercise

Blocks hold 3 records Records hold 1 integer Memory is 3 blocks big (2 available buckets)

Hash Function is mod 2 Integer ... Mod 2 1 1 2 0 3 1

Eliminate the duplicates from indexes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

	1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2,  3

hashes 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1

1 is hashed to bucket1(1) 2 is hashed to bucket0(2) 3 is hashed to bucket1(1,3) 1 is hashed to bucket1(1,3,1) Bucket1(1,3,1) is placed in disk >>>>>> (1,3,1) 2 is hashed to bucket0(2,2) 3 is hashed to bucket1(3) 1 is hashed to bucket1(3,1) 2 is hashed to bucket0(2,2,2) Bucket0(2,2,2) is placed in disk>>>>>>> (2,2,2) 3 is hashed to bucket1(3,1,3) Bucket1(3,1,3) is placed in disk >>>>> (1,3,1,3,1,3) 1 is hashed to bucket1(1) 2 is hashed to bucket0(2) 3 is hashed to bucket1(1,3)

Bucket0 combines to (2,2,2,2) Bucket1 combines to (1,3,1,3,1,3,1,3)

read bucket0 into memory, eliminate 2,2,2 Bucket 0 finally ends up as (2) read bucket1 into memory, eliminate 1,1,1,3,3,3 Bucket 1 finally ends up as (1,3)

(Edited: 2020-03-04)

Blocks hold 3 records Records hold 1 integer Memory is 3 blocks big (2 available buckets) Hash Function is mod 2 Integer ... Mod 2 1 1 2 0 3 1 Eliminate the duplicates from indexes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3 hashes 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1 1 is hashed to bucket1(1) 2 is hashed to bucket0(2) 3 is hashed to bucket1(1,3) 1 is hashed to bucket1(1,3,1) Bucket1(1,3,1) is placed in disk >>>>>> (1,3,1) 2 is hashed to bucket0(2,2) 3 is hashed to bucket1(3) 1 is hashed to bucket1(3,1) 2 is hashed to bucket0(2,2,2) Bucket0(2,2,2) is placed in disk>>>>>>> (2,2,2) 3 is hashed to bucket1(3,1,3) Bucket1(3,1,3) is placed in disk >>>>> (1,3,1,3,1,3) 1 is hashed to bucket1(1) 2 is hashed to bucket0(2) 3 is hashed to bucket1(1,3) Bucket0 combines to (2,2,2,2) Bucket1 combines to (1,3,1,3,1,3,1,3) read bucket0 into memory, eliminate 2,2,2 Bucket 0 finally ends up as (2) read bucket1 into memory, eliminate 1,1,1,3,3,3 Bucket 1 finally ends up as (1,3)