2016-05-11

May 11 Discussion Thread.

Post your answers to today's Let's Experiment here.
Best, Chris

-- May 11 Discussion Thread
In flickr.com's robots.txt, the User-agents include "coccoc".

The directives cover paths common to a landing page, such as /abuse, /signin, and /search, which lead to the Report Abuse, Sign-in, and Search pages respectively.

Some tags in the sitemap XML file are <image:loc>, the location of an image; <image:caption>, the caption of the image; and <lastmod>, the date and time at which the page was last modified.
(Edited: 2016-05-11)

-- May 11 Discussion Thread
User-agent: Twitterbot, coccoc, *
Directives: Disallow: [path]
Tags:
1) <urlset> - Encapsulates the file and references the current protocol standard.
2) <url> - Parent tag for each URL entry. The remaining tags are children of this tag.
3) <loc> - URL of the page. This URL must begin with the protocol (such as https) and end with a trailing slash, if your web server requires it.
4) <lastmod> - The date of last modification of the file.
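
The <urlset> / <url> / <loc> / <lastmod> nesting described above can be read with Python's standard xml.etree.ElementTree. This is only a minimal sketch: the sample sitemap, its URL and date, and the helper name list_entries are made up for illustration and are not taken from flickr.com's actual sitemap.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sitemap; the URL and date are invented for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/photos/</loc>
    <lastmod>2016-05-11</lastmod>
  </url>
</urlset>
"""

# The sitemap protocol namespace declared on <urlset>.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_entries(xml_text):
    """Return a (loc, lastmod) pair for every <url> child of <urlset>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default=None, namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in list_entries(SAMPLE_SITEMAP):
    print(loc, lastmod)
```

Because <urlset> declares a default namespace, every tag has to be looked up with that namespace; searching for a bare "url" would find nothing.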

-- May 11 Discussion Thread
In robots.txt:
User-agent: Twitterbot
Disallow: /report_abuse.gne
Disallow: /abuse
Disallow: /signin
Disallow: /search
Disallow: /groups/10millionphotos/
In the sitemap.xml file: <url>, <loc> = location in the web app, <lastmod> = last modified, <image:loc> = where a picture is, <image:title> = title of the image
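
The Disallow rules listed for Twitterbot above can be checked with Python's standard urllib.robotparser. A minimal sketch, assuming the rules are exactly the ones quoted in this post (the real file may contain more); the /explore path and the helper name build_parser are hypothetical and only there to show an allowed URL.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the post above; the live robots.txt may differ.
RULES = """\
User-agent: Twitterbot
Disallow: /report_abuse.gne
Disallow: /abuse
Disallow: /signin
Disallow: /search
Disallow: /groups/10millionphotos/
"""


def build_parser(text):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# /search is covered by a Disallow rule for Twitterbot.
print(parser.can_fetch("Twitterbot", "https://www.flickr.com/search"))

# /explore is a hypothetical path that no rule above matches.
print(parser.can_fetch("Twitterbot", "https://www.flickr.com/explore"))
```

Matching is by path prefix, so any URL whose path starts with /search is disallowed for this agent as well.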

-- May 11 Discussion Thread
User-agents: each User-agent line names a crawler; when that agent crawls flickr.com, the rules that follow define what it may do.
Directives: Disallow: the agent is not allowed to crawl the specified path.

In the sitemap, the crawler finds the link to each page in <loc></loc>.

-- May 11 Discussion Thread
User-agents: Twitterbot, coccoc, *
Directives: Disallow, Crawl-delay
Tags: <image:image>, <image:title>, <lastmod>, <url>, <loc>

-- May 11 Discussion Thread
User-agents: Twitterbot, coccoc, *
Set of directives:
Disallow - paths that must not be accessed by the designated crawlers.
User-agent - a means of identifying a specific crawler or a set of crawlers.
Crawl-delay - how long a crawler must wait between requests; here a crawler may access the site once every 10 seconds.
Tags in the sitemap XML files:
urlset - base tag
url - entry for each URL under the base tag
loc - location (URL) of the page
lastmod - date the URL was last modified
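
The Crawl-delay directive described above can be read back with urllib.robotparser (the crawl_delay method needs Python 3.6 or later). A minimal sketch, assuming a wildcard group with a 10-second delay as mentioned in this post; the sample rules, the agent name MyCrawler, and the helper wait_seconds are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample rules modeled on the post above, not a copy of the live file.
RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def wait_seconds(robots, agent, default=1.0):
    """Return the delay to honor between requests for this agent.

    Falls back to `default` when robots.txt gives no Crawl-delay.
    """
    delay = robots.crawl_delay(agent)
    return default if delay is None else float(delay)


# "MyCrawler" is a made-up agent name; it falls under the "*" group.
print(wait_seconds(parser, "MyCrawler"))
```

A polite crawler would sleep for this many seconds between requests to the same site, for example with time.sleep.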