2021-05-03

-- Apr 28 In-Class Exercise Thread
N = 500000
 
l_avg = 3
 
The values of l_t for "California", "Business", "Tax", and "Return" over the whole corpus are 104, 501, 254, and 607, respectively.
 
DFR score = sum over every term t in query q of q_t * (1 - P_2,t) * (-log P_1,t)
Or
DFR score = sum over every term t in query q of q_t * (log(1 + l_t/N) + f_t,d * log(1 + N/l_t)) / (f_t,d + 1)
(all logs are base 2)
 
f'_t,d = f_t,d * log(1 + l_avg/l_d)
so, DFR score = sum over every term t in q of q_t * (log(1 + l_t/N) + f'_t,d * log(1 + N/l_t)) / (f'_t,d + 1)
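A minimal sketch of the length-normalized term frequency f'_t,d, to show what the adjustment evaluates to in this exercise. The function name adjusted_tf is hypothetical, and the lengths l_d = 3 and l_d = 4 with f_t,d = 1 are assumptions read off the arithmetic in parts a) and b). The log is taken base 2, which is what the posted numbers are consistent with.

```python
import math


def adjusted_tf(f_td: float, l_avg: float, l_d: float) -> float:
    """Return f'_t,d = f_t,d * log2(1 + l_avg / l_d)."""
    return f_td * math.log2(1 + l_avg / l_d)


# Assumed lengths: l_d = 3 for part a), l_d = 4 for part b); l_avg = 3.
print(round(adjusted_tf(1, 3, 3), 3))  # 1.0
print(round(adjusted_tf(1, 3, 4), 3))  # 0.807
```

A document of average length leaves the raw frequency unchanged, while a longer document scales it down.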
 
a) Query: "California Business Tax"
Here f'_t,d = 1 for every term, which corresponds to f_t,d = 1 and l_d = 3, since log(1 + 3/3) = 1 in base 2.
For term "California": 1 * (log(1 + 104/500,000) + 1 * log(1 + 500,000/104)) / (1 + 1) = 6.116
For term "Business": 1 * (log(1 + 501/500,000) + 1 * log(1 + 500,000/501)) / (1 + 1) = 4.983
For term "Tax": 1 * (log(1 + 254/500,000) + 1 * log(1 + 500,000/254)) / (1 + 1) = 5.472
DFR score = 6.116 + 4.983 + 5.472 = 16.571
 
b) Query: "California Business Tax Return"
Here f'_t,d = 1 * log(1 + 3/4) = 0.807 for every term (base-2 log, l_avg/l_d = 3/4).
For term "California": 1 * (log(1 + 104/500,000) + log(1 + 3/4) * log(1 + 500,000/104)) / (log(1 + 3/4) + 1) = 5.464
For term "Business": 1 * (log(1 + 501/500,000) + log(1 + 3/4) * log(1 + 500,000/501)) / (log(1 + 3/4) + 1) = 4.452
For term "Tax": 1 * (log(1 + 254/500,000) + log(1 + 3/4) * log(1 + 500,000/254)) / (log(1 + 3/4) + 1) = 4.889
For term "Return": 1 * (log(1 + 607/500,000) + log(1 + 3/4) * log(1 + 500,000/607)) / (log(1 + 3/4) + 1) = 4.329
DFR score = 5.464 + 4.452 + 4.889 + 4.329 = 19.133 (terms summed before rounding)
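The two totals can be checked with a short script. This is only a sketch under stated assumptions: the names dfr_term and dfr_score are hypothetical, every term is taken to have q_t = 1 and f_t,d = 1, the lengths are l_d = 3 for part a) and l_d = 4 for part b), and logs are base 2.

```python
import math

N = 500_000   # corpus size given in the exercise
L_AVG = 3     # average length given in the exercise
L_T = {"California": 104, "Business": 501, "Tax": 254, "Return": 607}


def dfr_term(l_t: int, f_td: float, l_d: float, q_t: float = 1.0) -> float:
    """Per-term DFR contribution using the length-adjusted frequency f'_t,d."""
    f_adj = f_td * math.log2(1 + L_AVG / l_d)
    numerator = math.log2(1 + l_t / N) + f_adj * math.log2(1 + N / l_t)
    return q_t * numerator / (f_adj + 1)


def dfr_score(terms: list[str], l_d: float) -> float:
    """Sum the per-term contributions, assuming f_t,d = 1 for every term."""
    return sum(dfr_term(L_T[t], f_td=1, l_d=l_d) for t in terms)


score_a = dfr_score(["California", "Business", "Tax"], l_d=3)
score_b = dfr_score(["California", "Business", "Tax", "Return"], l_d=4)
print(f"{score_a:.3f}")  # 16.571
print(f"{score_b:.3f}")  # 19.133
```

Summing the unrounded per-term values gives 19.133 for part b); adding the four rounded values by hand gives 19.134, so the last digit depends on when rounding happens.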
(Edited: 2021-05-03)