-- Apr 28 In-Class Exercise Thread
N = 500000
l_avg = 3
l_t (number of occurrences in the whole corpus) of "California", "Business", "Tax" and "Return" are 104, 501, 254 and 607, respectively.
DFR score = Sum over every term t in query q of q_t * (1 - P_2,t) * (-log P_1,t)
Or, written out: DFR score = Sum over every t in q of q_t * (log(1 + l_t/N) + f_t,d * log(1 + N/l_t)) / (f_t,d + 1), with all logs base 2.
f'_t,d = f_t,d * log(1 + l_avg/l_d) (length-normalized term frequency)
So, with f'_t,d in place of f_t,d: DFR score = Sum over every t in q of q_t * (log(1 + l_t/N) + f'_t,d * log(1 + N/l_t)) / (f'_t,d + 1)
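A minimal Python sketch of the scoring formula above. It assumes base-2 logs (the base that reproduces the numbers in this thread) and, unless per-document counts are passed in, that every query term occurs exactly once in the document. The helper names `normalized_tf` and `dfr_score` are made up for illustration.

```python
import math
from collections import Counter


def normalized_tf(f_td, l_avg, l_d):
    """Length-normalized term frequency f'_t,d = f_t,d * log2(1 + l_avg / l_d)."""
    return f_td * math.log2(1 + l_avg / l_d)


def dfr_score(query, l_t, N, l_avg, l_d, f_td=None):
    """DFR score of one document d for a query.

    query -- list of query terms; repeated terms increase q_t
    l_t   -- dict mapping each term to its occurrences in the whole corpus
    f_td  -- dict mapping each term to its occurrences in d
             (assumption: defaults to 1 for every query term)
    """
    q = Counter(query)  # q_t for each distinct query term
    if f_td is None:
        f_td = {t: 1 for t in q}
    score = 0.0
    for t, q_t in q.items():
        f = normalized_tf(f_td[t], l_avg, l_d)
        score += q_t * (math.log2(1 + l_t[t] / N)
                        + f * math.log2(1 + N / l_t[t])) / (f + 1)
    return score


corpus_counts = {"California": 104, "Business": 501, "Tax": 254, "Return": 607}
# Single-term check: l_d = l_avg = 3, so f'_t,d = 1
print(round(dfr_score(["California"], corpus_counts, 500_000, 3, 3), 3))  # 6.116
```

Passing the query as a list lets a repeated term raise q_t automatically, which matches the q_t factor in the formula.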
a) Query: "California Business Tax" (f_t,d = 1 for each term and l_d = 3, so f'_t,d = log(1 + 3/3) = 1)
For term “California”:
1 * (log(1+104/500,000)+ 1 * log(1 + 500,000/104)) / (1 + 1) = 6.116
For term “Business”:
1 * (log(1+501/500,000)+ 1 * log(1 + 500,000/501)) / (1 + 1) = 4.983
For term “Tax”:
1 * (log(1+254/500,000)+ 1 * log(1 + 500,000/254)) / (1 + 1) = 5.472
DFR score = 6.116 + 4.983 + 5.472 = 16.571
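The part (a) arithmetic can be checked with a short Python script. This is a sketch assuming base-2 logs and, as in the hand computation, f_t,d = 1 and l_d = 3 for every term; `term_score` is a hypothetical helper name.

```python
import math

N = 500_000
L_AVG = 3
L_D = 3  # document length for part (a), so l_avg / l_d = 1
# Occurrences of each query term in the whole corpus (l_t)
L_T = {"California": 104, "Business": 501, "Tax": 254}


def term_score(l_t, f_prime, q_t=1):
    """q_t * (log2(1 + l_t/N) + f' * log2(1 + N/l_t)) / (f' + 1)."""
    return q_t * (math.log2(1 + l_t / N)
                  + f_prime * math.log2(1 + N / l_t)) / (f_prime + 1)


f_prime = 1 * math.log2(1 + L_AVG / L_D)  # f_t,d = 1, so f'_t,d = 1.0
scores_a = {t: term_score(l_t, f_prime) for t, l_t in L_T.items()}
total_a = sum(scores_a.values())

for t, s in scores_a.items():
    print(f"{t}: {s:.3f}")
print(f"DFR score: {total_a:.3f}")
```

Run as-is, it prints 6.116, 4.983 and 5.472 for the three terms and 16.571 for the total, matching the values above.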
b) Query: "California Business Tax Return" (f_t,d = 1 for each term and l_d = 4, so f'_t,d = log(1 + 3/4) ≈ 0.807)
For term “California”:
1 * (log(1+104/500,000)+ log(1 + 3/4) * log(1 + 500,000/104)) / (log(1 + 3/4) + 1) = 5.464
For term “Business”:
1 * (log(1+501/500,000)+ log(1 + 3/4) * log(1 + 500,000/501)) / (log(1 + 3/4) + 1) = 4.452
For term “Tax”:
1 * (log(1+254/500,000)+ log(1 + 3/4) * log(1 + 500,000/254)) / (log(1 + 3/4) + 1) = 4.889
For term “Return”:
1 * (log(1+607/500,000)+ log(1 + 3/4) * log(1 + 500,000/607)) / (log(1 + 3/4) + 1) = 4.329
DFR score = 5.464 + 4.452 + 4.889 + 4.329 ≈ 19.133 (from the unrounded term scores)
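The same kind of script can check part (b). It is a sketch assuming base-2 logs, f_t,d = 1 and l_d = 4; `term_score` is again a hypothetical helper. Note that the normalization factor is log2(1 + 3/4) ≈ 0.807: plain log2(3/4) would be negative and could not produce the scores above.

```python
import math

N = 500_000
L_AVG = 3
L_D = 4  # document length for part (b)
# Occurrences of each query term in the whole corpus (l_t)
L_T = {"California": 104, "Business": 501, "Tax": 254, "Return": 607}


def term_score(l_t, f_prime, q_t=1):
    """q_t * (log2(1 + l_t/N) + f' * log2(1 + N/l_t)) / (f' + 1)."""
    return q_t * (math.log2(1 + l_t / N)
                  + f_prime * math.log2(1 + N / l_t)) / (f_prime + 1)


# f_t,d = 1, so f'_t,d = log2(1 + 3/4) = log2(1.75), about 0.807
f_prime = 1 * math.log2(1 + L_AVG / L_D)
scores_b = {t: term_score(l_t, f_prime) for t, l_t in L_T.items()}
total_b = sum(scores_b.values())

for t, s in scores_b.items():
    print(f"{t}: {s:.3f}")
print(f"DFR score: {total_b:.3f}")
```

The total prints as 19.133 because it sums the unrounded term scores; adding the four rounded values by hand gives 19.134.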
(Edited: 2021-05-03)