Information about the fields in the "URL list" tab

The fields shown in the URL list determine how the pages are crawled and what information is shown in the XML Google Sitemap file. These fields are:

Manual

This field is selected when you manually enter a URL into the table. It is usually used together with the "Delete all non-manual links" button (below the table), which removes every URL from the list that does not have this setting active. That lets you return to your manually entered URLs and start a clean new crawl of the website. This makes sense if you have changed the navigation significantly and need to confirm that all URLs can in fact still be reached by a crawler.

Include

This field determines whether a URL is listed in the generated Google Sitemap file. If you deselect it, the URL remains known to the GSiteCrawler but is not included in the Sitemap file. This can be used to manually remove duplicate URLs (e.g. "http://domain.com/" and "http://domain.com/index.htm"), and it is changed automatically when you use the duplicate content filter (menu item "Statistics" / "Duplicate content"). Keep in mind that removing a URL from the Sitemap file does not hide it from Google's crawlers. To keep those URLs out of the search engines, use the normal robots exclusion methods instead (robots.txt or the robots meta-tag). Once those are set up correctly, the GSiteCrawler will automatically keep the URLs out of the Sitemap file anyway.

Crawl

This field determines whether the GSiteCrawler should follow links from that URL. It is usually activated for normal (X)HTML pages, i.e. anything that contains crawlable links. It can also be set automatically through the robots meta-tag (using the "nofollow" value).
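As an illustration of how a robots meta-tag could drive these two settings, here is a minimal Python sketch. It is not the GSiteCrawler's actual code: the class `RobotsMetaParser` and the function `crawl_and_include_flags` are hypothetical names, and the mapping ("nofollow" clears Crawl, "noindex" clears Include, "none" clears both) is an assumption based on the usual meaning of those values.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def crawl_and_include_flags(html):
    """Return assumed (crawl, include) defaults for one page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives
    crawl = not ({"nofollow", "none"} & directives)
    include = not ({"noindex", "none"} & directives)
    return crawl, include
```

For a page containing `<meta name="robots" content="index, nofollow">` this sketch would leave Include active and clear Crawl.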

Priority

This is the relative priority used within the Google Sitemap file. A value of "1.00" is the maximum, "0.00" is the minimum. Please also see the additional information about the priority attribute.
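To show how the Include and Priority fields could end up in a Sitemap file, here is a rough Python sketch. The function `build_sitemap` and the `(url, include, priority)` row layout are hypothetical, and the sitemaps.org 0.9 namespace is an assumption (the GSiteCrawler may write a different schema version).

```python
from xml.sax.saxutils import escape


def build_sitemap(rows):
    """Build Sitemap XML from (url, include, priority) tuples (assumed layout)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, include, priority in rows:
        if not include:
            # The URL stays known to the crawler but is left out of the file.
            continue
        # Keep the priority inside the valid 0.00 to 1.00 range.
        priority = min(1.0, max(0.0, priority))
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append(f"    <priority>{priority:.2f}</priority>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Called with the rows `("http://domain.com/", True, 1.0)` and `("http://domain.com/index.htm", False, 0.5)`, only the first URL would appear in the output.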

Freq

This is the change-frequency in days. This value is translated for the Google Sitemap file as follows:

0: always - please check this file all the time, changes might arrive from hour to hour
1-6: daily
7-29: weekly
30-364: monthly
365-998: yearly
999: never
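The table above can be written as a small lookup function. This is only a Python sketch of the mapping; the name `changefreq_from_days` is hypothetical, and treating values above 999 as "never" and rejecting negative values are assumptions, since the table does not cover them.

```python
def changefreq_from_days(days):
    """Translate a change frequency in days into a Sitemap changefreq value."""
    if days < 0:
        raise ValueError("days must not be negative")
    if days == 0:
        return "always"
    if days <= 6:
        return "daily"
    if days <= 29:
        return "weekly"
    if days <= 364:
        return "monthly"
    if days <= 998:
        return "yearly"
    # 999 means "never"; larger values are assumed to mean the same.
    return "never"
```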

When crawling, the GSiteCrawler estimates this setting from the content date returned by the server. If the page in question is a dynamic page without date information, the server will generally return "now" (the time the request was made) as the content date, which is usually incorrect. Handling the If-Modified-Since header properly in your code will help the crawler recognize the effective content date and will help the search engine crawlers determine when to re-crawl the URL in question.
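For a dynamic page, this kind of handling might look roughly like the following Python sketch. It is framework-independent and not tied to any particular server: the function `conditional_response` is a hypothetical name, and it assumes you can determine the real modification time of the content (for example from a database timestamp) as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for content last changed at last_modified.

    last_modified must be a timezone-aware datetime; if_modified_since is the
    raw request header value, or None if the client did not send one.
    """
    # HTTP dates are in GMT and have a resolution of one second.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: ignore it and send the page
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers  # Not Modified: no body needs to be sent

    return 200, headers
```

The point of the sketch is that the Last-Modified header carries the real content date rather than "now", so a crawler sending that date back in If-Modified-Since gets a 304 until the content actually changes.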

URL

This is the URL of the page in question :-).