Maximum database size for crawler: 2GB
爬虫的最大数据库大小:2GB
Use a site map to lead the crawler around your site.
使用站点地图引导爬行器遍历您的站点。
Define the crawler subdirectories, as illustrated in Figure 8.
定义爬虫子目录,如图8所示。
This article took you through the task of creating a Web crawler.
本文向您介绍了创建Web crawler的过程。
Notice also that only the base URI string is added to the crawler.
另外请注意,只有基URI字符串被添加到了 crawler。
She still needs to prove herself before she can get with this wall-crawler.
她在和攀墙者走到一起之前还需要证明她自己。
Click on the Edit button in the query_statistic line to move to the crawler tab.
单击query_statistic行的Edit按钮,切换到爬虫选项卡。
The basic design of this crawler is to load the first link to check onto a queue.
这个爬虫的基本设计是加载第一个链接并将其放入一个队列。
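The sentence above describes the usual breadth-first design of a simple crawler: links wait on a queue and are fetched in order. A minimal sketch of that idea in Python follows; the seed URL, the page limit, and the helper class are illustrative assumptions, not taken from the quoted article.

```python
# Minimal breadth-first crawler sketch (assumptions: standard library only,
# a hypothetical seed URL, and a small page limit to keep the example short).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects href values from <a> tags on one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    queue = deque([seed])        # the first link to check is loaded onto the queue
    seen = {seed}
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue             # skip pages that cannot be fetched
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)   # newly found links wait at the back of the queue
    return seen

if __name__ == "__main__":
    print(crawl("https://example.com"))  # hypothetical seed URL
```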
The CollectUrls Web crawler program takes advantage of a fixed-size thread pool.
CollectUrls Web crawler程序利用一个固定大小的线程池。
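The CollectUrls program itself is not reproduced here, but the fixed-size thread pool pattern it relies on caps how many pages are fetched concurrently, no matter how many URLs are queued. Below is a hedged Python sketch of that pattern; the URL list and the pool size of four are assumptions made for illustration.

```python
# Sketch: fetch a batch of URLs with a fixed-size thread pool
# (assumptions: Python's concurrent.futures, four workers, placeholder URLs).
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ["https://example.com", "https://example.org"]  # placeholder URLs

def fetch(url):
    """Download one page and return its size; failures are returned, not raised."""
    try:
        return url, len(urlopen(url, timeout=10).read())
    except OSError as exc:
        return url, exc

# The pool size is fixed up front, so at most four downloads run concurrently
# regardless of how many URLs are submitted.
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, result in pool.map(fetch, URLS):
        print(url, result)
```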
The next step is to make a sample announcement document available to the crawler.
下一步是为爬行器提供一个样例通知文档。
The following steps show you how to create the collection and the crawler, and get it started.
以下步骤演示如何创建集合和爬虫,并启动它。
This information helps the Web crawler determine what the set of pages is and when to crawl them.
这一信息能帮助web爬虫程序决定要爬行哪些页面以及爬行的时间。
Select UNIX file system as the crawler type, as shown in Figure 6, and then click on the Next button.
选择UNIX file system作为爬虫类型,如图6所示,然后单击Next按钮。
To expand the Web crawler, consider collecting image references or searching for specific text strings.
要扩展该Web crawler,可以考虑收集图像引用或搜索特定的文本字符串。
Instead of converting your entire site to static URLs, pick the pages you want a crawler to index.
与其将整个站点转换为静态URL,不如挑选一些希望爬行器建立索引的页面。
What you need to do, in short, is to generate a list of page references (URLs) for a crawler to fetch.
简而言之,我们需要做的就是生成一个页面引用列表(URL),爬虫程序通过这个列表获取信息。
Provides an entry point for the search engine crawler to easily follow the links within your Web pages.
为搜索引擎爬网程序提供入口点,以使爬网程序轻松地跟踪您的Web页面内的链接。
It's one of the most important signals your page offers to a crawler, so why not include a local signal?
这是你的页面向搜索引擎爬虫提供的最为重要的信息之一,所以为什么不提供一些本地化的信息呢?
Each search engine has its own automated program called a "web spider" or "web crawler" that crawls the web.
每个搜索引擎都有自己爬行网页的自动化程序,叫做“网络蜘蛛(web spider)”或“网络爬虫(web crawler)”。
The Sitemaps 0.90 protocol offers the option of a Sitemaps index file to be provided to the crawler as well.
Sitemaps 0.90协议还提供了一个选项,可以把Sitemaps index文件提供给爬虫程序。
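A Sitemap index file is simply a small XML document listing the locations of several individual Sitemap files, so a crawler can discover them all from one place. The sketch below emits such an index with Python's standard library; the two sitemap URLs and the output file name are placeholders, not values from the quoted text.

```python
# Sketch: write a minimal Sitemap index file with the standard library.
import xml.etree.ElementTree as ET

# XML namespace defined by the Sitemaps 0.9 protocol at sitemaps.org.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

index = ET.Element("sitemapindex", xmlns=NS)
for loc in ("https://example.com/sitemap-articles.xml",    # placeholder sitemap URLs
            "https://example.com/sitemap-products.xml"):
    sitemap = ET.SubElement(index, "sitemap")
    ET.SubElement(sitemap, "loc").text = loc

ET.ElementTree(index).write("sitemap_index.xml",
                            encoding="utf-8", xml_declaration=True)
```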
First, let's look at how crawler-based search engines work (both Google and Yahoo fall in this category).
首先,让我们看看基于爬虫(crawler-based)的搜索引擎是如何工作的(Google和Yahoo都是这种类型)。
A sample scenario is based on a scheduled crawler for a website with regularly updated announcement pages.
一个样例场景就是包含定期更新的通知页面的网站的调度爬行器。
It uses a scheduler to initiate periodic events, such as crawler executions and full-text index maintenance.
它使用一个调度程序来发起周期性的事件,比如爬网程序执行和全文索引维护。
Next, navigate to the crawler details page and click 'Start full recrawl', as shown at the bottom of Figure 3.
接下来,导航到爬行器的细节页面并单击“Start full recrawl”,如图3底部所示。
Use the OmniFind database crawler wizard to configure a crawler to access the VSAM content through the nickname.
使用OmniFind数据库crawler向导来配置一个crawler,从而通过昵称访问VSAM内容。
We can see that this file-based discovery complements UDDI, and may be used in a crawler-like fashion by clients.
我们可以看到,这种基于文件的发现对UDDI是一个补充,而且可以被客户端以类似crawler的方式使用。
Define the crawler name (UNIX file system crawler 1, for example), as shown in Figure 7, and then click on the Next button.
定义爬虫名称(例如,UNIX file system crawler 1),如图7所示,然后单击Next按钮。
E-mail harvesting can be one of the easiest crawling activities, as you'll see in the final crawler example in this article.
E-mail收集可能是最容易的一种爬行行为,在本文最后一个爬虫例子中我们会看到这一点。
The mighty Crawler, which ferried Shuttles to the launchpad, will be reduced to hauling more terrestrial freight around the Space Center.
把航天飞机运到发射台的巨大的“爬行者”,将被降格为在航天中心周围运送更多的陆上货物。