As of the Dance/Update that began in mid-May 2003, Google has changed the way that it does its Main and Fresh crawls, and its process of incorporating new pages into the index. Until the new processes are better understood, the information below should be ignored.
Google’s Fresh Crawl explained
Google does two types of crawl:- the main crawl and the fresh crawl. The main crawl is done once a month; the fresh crawl is done more-or-less daily, but only some pages are crawled. Google is still experimenting with which sites and pages to crawl and how deep to crawl. Neither type of crawl puts any new pages into Google’s main index. That only happens at the next update – at the conclusion of the next Google Dance. Fresh crawls can be distinguished from main crawls by the IP addresses used by Googlebot. Fresh crawl: 64.68.82…; Main crawl: 216.239.46…
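As a rough illustration of telling the two crawls apart in a server log, here is a minimal Python sketch. It assumes the elided part of each address above is just the final octet, so it compares only the first three octets. It also assumes a log format where the client IP is the first field and Googlebot names itself in the user-agent string. The function names are hypothetical, and the address ranges are simply the ones quoted above, which may change.

```python
from collections import Counter

# Prefixes quoted in the text. The last part of each address is elided there,
# so only the first three octets are compared (an assumption of this sketch).
CRAWL_PREFIXES = {
    ("64", "68", "82"): "fresh",
    ("216", "239", "46"): "main",
}


def classify_crawl(ip: str) -> str:
    """Return 'fresh', 'main', or 'unknown' for a visitor IP address."""
    octets = tuple(ip.strip().split(".")[:3])
    return CRAWL_PREFIXES.get(octets, "unknown")


def count_crawls(log_lines):
    """Count Googlebot hits per crawl type.

    Assumes the client IP is the first whitespace-separated field of each
    line and that Googlebot identifies itself somewhere in the line.
    """
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue  # skip ordinary visitors and other robots
        fields = line.split()
        if fields:
            counts[classify_crawl(fields[0])] += 1
    return counts
```

Feeding it the lines of an access log gives a tally such as how many hits came from each type of crawl, which makes it easy to see on which days the fresh crawl visited.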
The fresh crawl recrawls pages that are already in the index, picking up new pages along the way. Fresh-crawled new pages are evaluated in some way and inserted into the search results straight away, which means that new pages can be found by surfers almost immediately, even though they are not yet in Google’s main index. A new page can be added to a site today and traffic could start arriving at it within hours.
Also, updated pages that are already in Google’s main index are re-evaluated in some way and inserted into the search results in places that reflect the changes. For example, the day after the link to this site’s SEO Copywriting page was placed on the index page, the index page showed up at #3 for the search term “seo copywriting”. The index page was well established in Google’s main index, but the SEO copywriting part of it was new, and was given the “fresh” treatment. Very soon after that, the SEO copywriting page itself was ‘fresh’ ranked at #1.
This is good news for surfers and webmasters, although some websites can suffer for a while due to fresh-crawled new pages pushing them down the rankings.
In practice, many new fresh-crawled pages enjoy a flurry of traffic while they are not in the main index. Once they have been included in the main index, they take their place in the rankings according to their evaluated merit, and the traffic tends to fall unless, of course, the page actually merits its ‘fresh’ ranking.
At the time of writing, the fresh crawl is still new, but my theory of the experience of a new page is this:-
Sometime during a month, the new page is found by Google and fresh-crawled. It is evaluated in some way and placed in a ‘fresh’ index. From there it is inserted into the rankings, according to its ‘fresh’ evaluation.
The page is involved in the next end-of-month dance but, because it hasn’t yet been main-crawled, it isn’t included in the actual update and isn’t placed in the main index. It continues to be a ‘fresh’ page.
Then the main crawl gets underway. If the page still exists, it is crawled and will be included in the following update, when it will enter the main index. During this period, it may keep the ‘fresh’ ranking that it achieved, provided that other new pages don’t come along to push it down. It is only after the page enters the main index that its true ranking is seen.
Because of the page’s revised evaluation when entering the main index, traffic to it is likely to drop. That’s assuming that the page didn’t really merit its ‘fresh’ ranking.
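The sequence described above can be sketched as a small Python model. It is only an illustration of my theory, not a description of how Google actually works. The class and method names are hypothetical, and the single rule it encodes is the one proposed here: an update moves a page into the main index only if the page has already been main-crawled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy model of a new page under the theory described in the text."""
    fresh_crawled: bool = False
    main_crawled: bool = False
    in_main_index: bool = False

    def fresh_crawl(self) -> None:
        self.fresh_crawled = True

    def main_crawl(self) -> None:
        self.main_crawled = True

    def update(self) -> None:
        """An end-of-month dance and update."""
        # Assumed rule: only pages that have been main-crawled are included.
        if self.main_crawled:
            self.in_main_index = True

    @property
    def status(self) -> str:
        if self.in_main_index:
            return "main index"
        if self.fresh_crawled:
            return "fresh"
        return "not listed"


def new_page_timeline() -> list:
    """Walk one page through the events in the order given in the text."""
    page = Page()
    timeline = [page.status]
    for event in (page.fresh_crawl, page.update, page.main_crawl, page.update):
        event()
        timeline.append(page.status)
    return timeline
```

Calling `new_page_timeline()` shows the page staying ‘fresh’ through its first update and its main crawl, and entering the main index only at the second update.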
It should be noted that Google is continually updating the rankings and ‘fresh’ rankings are very volatile in that they come, go and change during a page’s ‘fresh’ period.
As I said, the fresh crawl is still quite new and not yet fully understood. The experience of a new page from fresh crawl to main index is what I believe I have observed, but my conclusions could easily be wrong. I believe that new pages don’t enter the main index until the dance and update after they have been main-crawled, even though they have usually been involved in one dance already, because Google still shows no links to them until after the update following their first main crawl. This is my theory of a new page’s experience but, like any theory, it may need to be revised in the light of new observations.
Addendum
As of the New Year 2003 update, Google is applying Toolbar PR0 (zero PageRank) values to some new pages. PR0 normally indicates that a page has been penalized, but these PR0s are not penalties. From my observations, it appears that the values apply to pages that have been fresh-crawled and have gone through an update following the fresh-crawl. Such pages don’t get into the main index until after they have been main-crawled and gone through the update after that. It appears that, between the two updates, Google applies PR0 to the pages.
The reason for it may be to do with how Google inserts ‘fresh’ pages into the rankings, or it may be for some other reason entirely. Also, it may be that different PR values are applied to different pages, but the practice is brand new and, as yet, I have seen only PR0 values applied.