Please help me find examples of PHP algorithms to classify the $url variable exactly like YIOOP (+1/1). - 17/11/2019 Yioop Software Help

Hello Mr. Chris.
I have an algorithmic exercise to do and I have been stuck on it for a few days. I would like your help defining 4 different classification scores, one for each of the 4 classification algorithms already used by Yioop. This will allow me to better understand the concept of Yioop.
So, assuming that the PHP variable that groups the downloaded links is a $url array, how would I define, as PHP code examples, the 4 different Yioop algorithms (scores) for the links in the $url array, stored in 4 different variables: $basicScore, $centroidScore, $centroidWeightedScore and finally $graphScore?
$basicScore = The Basic summarizer computes a summary by proceeding top to bottom through the document looking for block level tags such as h1, div, p, etc. Based on the distance from the top of the document, the tag type, and the length of the tag's contents, a score for its contents is calculated. The highest scoring regions in the whole document up to the summary length are then returned in the order they appeared in the document as the summary.
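The description above could be sketched roughly as follows. This is only an illustration, not Yioop's code: the function names basicRegionScores() and basicSummary(), the tag weights, and the scoring formula are all assumptions made up for the example.

```php
<?php
// Illustrative sketch only: the tag weights and the formula below are
// assumptions, not the values used by Yioop's Basic summarizer.

/** Scores block-level regions by tag type, text length, and position. */
function basicRegionScores(string $html): array
{
    $tagWeights = ['h1' => 3.0, 'h2' => 2.5, 'h3' => 2.0, 'p' => 1.5, 'div' => 1.0];
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $query = '//*[self::h1 or self::h2 or self::h3 or self::p or self::div]';
    $regions = [];
    $position = 0;
    // Nodes come back top to bottom; nested blocks are scored separately here.
    foreach ($xpath->query($query) as $node) {
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($text === '') {
            continue;
        }
        // Heavier tags and longer text score higher; later regions score lower.
        $score = $tagWeights[$node->nodeName] * log(1 + strlen($text)) / (1 + $position);
        $regions[] = ['position' => $position, 'text' => $text, 'score' => $score];
        $position++;
    }
    return $regions;
}

/** Keeps the best regions that fit in $maxLength, in document order. */
function basicSummary(string $html, int $maxLength): string
{
    $regions = basicRegionScores($html);
    usort($regions, fn($a, $b) => $b['score'] <=> $a['score']);
    $kept = [];
    $length = 0;
    foreach ($regions as $region) {
        $regionLength = strlen($region['text']);
        if ($length + $regionLength > $maxLength) {
            continue; // assumption: skip regions that do not fit
        }
        $kept[] = $region;
        $length += $regionLength;
    }
    // Restore the order in which the regions appeared in the document.
    usort($kept, fn($a, $b) => $a['position'] <=> $b['position']);
    return implode(' ', array_column($kept, 'text'));
}
```

For each downloaded page in $url, $basicScore could then hold the array returned by basicRegionScores() for that page's HTML.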
$centroidScore = The Centroid summarizer computes a summary by stripping all tags from the document and then splitting the document into "sentence" units. For each sentence, a vector is made whose components are the terms appearing in the sentence and whose values are the term frequency times the inverse sentence frequency of that term. Using these vectors, an average sentence vector is computed. Sentences are then ranked by their normalized inner product with the average sentence. The top-scoring sentences up to the summary length are then returned in the order they appeared in the document as the summary.
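A minimal sketch of this scoring, assuming the text has already been split into sentences. The function name centroidScores(), the tokenizer, and the exact inverse sentence frequency formula, log(1 + n / sf), are assumptions for illustration and are not taken from Yioop.

```php
<?php
// Illustrative sketch only; not Yioop's implementation.

/** Returns one cosine score per sentence against the average sentence. */
function centroidScores(array $sentences): array
{
    $sentences = array_values($sentences);
    $n = count($sentences);
    $counts = [];
    $sentenceFrequency = [];
    foreach ($sentences as $i => $sentence) {
        // Naive tokenizer (assumption): lowercase, split on non-word characters.
        $terms = preg_split('/\W+/u', strtolower(strip_tags($sentence)), -1, PREG_SPLIT_NO_EMPTY);
        $counts[$i] = array_count_values($terms);
        foreach (array_keys($counts[$i]) as $term) {
            $sentenceFrequency[$term] = ($sentenceFrequency[$term] ?? 0) + 1;
        }
    }
    $vectors = [];
    $centroid = [];
    foreach ($counts as $i => $termCounts) {
        $vectors[$i] = [];
        foreach ($termCounts as $term => $tf) {
            // Term frequency times inverse sentence frequency (assumed formula).
            $weight = $tf * log(1 + $n / $sentenceFrequency[$term]);
            $vectors[$i][$term] = $weight;
            $centroid[$term] = ($centroid[$term] ?? 0.0) + $weight / $n;
        }
    }
    $norm = fn(array $v) => sqrt(array_sum(array_map(fn($x) => $x * $x, $v)));
    $centroidNorm = $norm($centroid);
    $scores = [];
    foreach ($vectors as $i => $vector) {
        $dot = 0.0;
        foreach ($vector as $term => $weight) {
            $dot += $weight * $centroid[$term];
        }
        // Normalized inner product, i.e. cosine similarity with the centroid.
        $denominator = $norm($vector) * $centroidNorm;
        $scores[$i] = $denominator > 0 ? $dot / $denominator : 0.0;
    }
    return $scores;
}
```

$centroidScore would then be the array returned by centroidScores() for the sentences of one page; sentences that share many terms with the rest of the page score higher than off-topic ones.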
$centroidWeightedScore = The Centroid-Weighted summarizer computes a summary by stripping all tags from the document and then splitting the document into "sentence" units. Then for each sentence it makes a normalized vector of term frequencies (no inverse sentence frequencies). It then computes a weighted average of these vectors where the weighting is based on distance from the start of the document. The sentence closest to the average sentence based on inner product is determined. The components of this sentence are deleted from the average, and then the next best sentence is determined using the residual average. This process is continued until up-to-summary-length text has been extracted. Sentences found up to the summary length are then returned in the order they appeared in the document as the summary.
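The greedy selection described here might look like the sketch below. It is not Yioop's code: the function name centroidWeightedSummary() and the position weight 1 / (1 + index) are assumptions, and the weighted average is left unscaled because a constant factor does not change which sentence is closest.

```php
<?php
// Illustrative sketch only; not Yioop's implementation.

/** Greedily picks sentences against a position-weighted average vector. */
function centroidWeightedSummary(array $sentences, int $maxLength): array
{
    $sentences = array_values($sentences);
    $vectors = [];
    $average = [];
    foreach ($sentences as $i => $sentence) {
        $terms = preg_split('/\W+/u', strtolower($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $counts = array_count_values($terms);
        $norm = sqrt(array_sum(array_map(fn($c) => $c * $c, $counts)));
        $positionWeight = 1 / (1 + $i); // assumption: earlier sentences weigh more
        $vectors[$i] = [];
        foreach ($counts as $term => $count) {
            $vectors[$i][$term] = $count / $norm; // normalized term frequency
            $average[$term] = ($average[$term] ?? 0.0) + $positionWeight * $vectors[$i][$term];
        }
    }
    $remaining = array_keys($sentences);
    $chosen = [];
    $length = 0;
    while ($remaining) {
        $best = null;
        $bestScore = -1.0;
        foreach ($remaining as $i) {
            $score = 0.0;
            foreach ($vectors[$i] as $term => $value) {
                $score += $value * ($average[$term] ?? 0.0);
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $i;
            }
        }
        if ($length + strlen($sentences[$best]) > $maxLength) {
            break; // summary length reached
        }
        $chosen[] = $best;
        $length += strlen($sentences[$best]);
        // Delete the chosen sentence's components from the (residual) average.
        foreach (array_keys($vectors[$best]) as $term) {
            unset($average[$term]);
        }
        $remaining = array_values(array_diff($remaining, [$best]));
    }
    sort($chosen); // back to document order
    return array_map(fn($i) => $sentences[$i], $chosen);
}
```

Because the chosen sentence's terms are removed from the average, a second sentence that mostly repeats the first one scores poorly in the next round, which is what keeps the summary from being redundant.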
$graphScore = The Graph-Based summarizer computes a summary by stripping all tags from the document and then splitting the document into "sentence" units. A weighted adjacency matrix between sentences is then computed. The distance between two sentences is calculated using a distortion measure. Using this adjacency matrix, a sentence rank is computed using the power method (similar to Google's PageRank). The top-scoring sentences up to the summary length are then returned in the order they appeared in the document as the summary.
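A rough sketch of the ranking step follows. The distortion measure is not spelled out above, so this example swaps in a plain term-overlap (Jaccard) similarity as the edge weight; that substitution, the function name graphScores(), the damping factor 0.85, and the iteration count are all assumptions rather than Yioop's actual choices.

```php
<?php
// Illustrative sketch only; Jaccard overlap stands in for the distortion
// measure, which is an assumption and not what Yioop computes.

/** Ranks sentences with the power method on a weighted adjacency matrix. */
function graphScores(array $sentences, int $iterations = 50, float $damping = 0.85): array
{
    $sentences = array_values($sentences);
    $n = count($sentences);
    if ($n === 0) {
        return [];
    }
    $termSets = array_map(
        fn($s) => array_unique(preg_split('/\W+/u', strtolower($s), -1, PREG_SPLIT_NO_EMPTY)),
        $sentences
    );
    // Symmetric weighted adjacency matrix with a zero diagonal.
    $adjacency = array_fill(0, $n, array_fill(0, $n, 0.0));
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $common = count(array_intersect($termSets[$i], $termSets[$j]));
            $union = count(array_unique(array_merge($termSets[$i], $termSets[$j])));
            $weight = $union > 0 ? $common / $union : 0.0;
            $adjacency[$i][$j] = $weight;
            $adjacency[$j][$i] = $weight;
        }
    }
    $rowSums = array_map('array_sum', $adjacency);
    $rank = array_fill(0, $n, 1 / $n);
    for ($step = 0; $step < $iterations; $step++) {
        $next = array_fill(0, $n, (1 - $damping) / $n);
        for ($i = 0; $i < $n; $i++) {
            for ($j = 0; $j < $n; $j++) {
                // Sentence i passes rank to j in proportion to the edge weight;
                // a sentence with no edges spreads its rank evenly.
                $share = $rowSums[$i] > 0 ? $adjacency[$i][$j] / $rowSums[$i] : 1 / $n;
                $next[$j] += $damping * $rank[$i] * $share;
            }
        }
        $rank = $next;
    }
    return $rank;
}
```

The returned ranks sum to 1, and a sentence that shares no terms with any other sentence ends up with the lowest rank.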
 

-- Please help me find examples of PHP algorithms to classify the $url variable exactly like YIOOP (+0/0). - 25/11/2019 Yioop Software Help

The code for the summarizers can be found in
 src/library/summarizers.
A class file for each summarizer can be found there. The main method that is called by Yioop is
 getSummary()
Roughly, what this does is split the text string passed to it into sentences with respect to the current language. Then each sentence is scored using the scoring technique of the particular summarizer, and the sentences are sorted according to this score. The summarizer then keeps the highest-scoring sentences up to a desired summary length. These are then output in the original order they appeared in the document. The examples you give above say in words how the scoring is computed for each document; the code, though, can be found in the files in that folder.
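As a rough illustration of that flow (split, score, keep the best, restore the original order), here is a small sketch. It is not the code in src/library/summarizers: summarizeText() is a hypothetical helper, the sentence splitter is a naive regular expression that ignores the language, and the scorer is passed in as a callable that returns one score per sentence.

```php
<?php
// Illustrative sketch only; not Yioop's getSummary().

/** Splits text, scores sentences with $scorer, keeps the best that fit. */
function summarizeText(string $text, callable $scorer, int $maxLength): string
{
    // Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $scores = $scorer($sentences); // one score per sentence, same keys
    arsort($scores);               // highest score first, keys preserved
    $kept = [];
    $length = 0;
    foreach ($scores as $index => $score) {
        $sentenceLength = strlen($sentences[$index]);
        if ($length + $sentenceLength > $maxLength) {
            continue; // assumption: skip sentences that do not fit
        }
        $kept[] = $index;
        $length += $sentenceLength;
    }
    sort($kept); // output in the order the sentences appeared
    return implode(' ', array_map(fn($i) => $sentences[$i], $kept));
}

// Usage with a toy scorer that simply prefers longer sentences.
$byWordCount = fn(array $sentences) => array_map('str_word_count', $sentences);
$text = 'Short one. This sentence is clearly the longest of them all. A medium length sentence.';
$summary = summarizeText($text, $byWordCount, 60);
```

Swapping the toy scorer for a different scoring function changes which sentences are kept without touching the rest of the pipeline.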
Best,
Chris

Last Edited: 25/11/2019
 

-- Please help me find examples of PHP algorithms to classify the $url variable exactly like YIOOP (+0/0). - 26/11/2019 Yioop Software Help

Hi Mr. Chris and thank you for your answer.
In fact, I am currently working on a small web crawler based on PHP's DOMElement.
Currently, the web crawler can download pages, but I would like to add some additional features, for example: classification scores, the ability to download millions of URLs at a time and very quickly, and compression and decompression of the downloaded data. In short, some of the features that YIOOP already has.
I would like you to help me finalize my personal project against an invoice, to be paid via PayPal.
So, can you help me finalize my little personal project, please? I only have a few important points left to finalize this project based on PHP's DOMElement, and I can pay an amount to make up for the time you spend on my project.
Thank you in advance for your reply

Last Edited: 26/11/2019
 