Due to the large size of the Web, users increasingly rely on specialized tools to navigate through the vast volumes of data, and a number of search engines, directories, and other IR tools have been built to fill this need. While there is a plethora of smaller specialized engines and directories, the main part of the search infrastructure of the web is supplied by a handful of large crawl-based search engines, such as Google, Inktomi, AltaVista, and a few others. Such search engines are typically based on scalable clusters, consisting of a large number of low-cost servers located at one or a few locations and connected by high-speed local area or system area networks [7]. A lot of work has focused on optimizing performance on such architectures, which support up to tens of thousands of user queries per second on thousands of machines.
The last few years have also seen an explosion of activity in the area of peer-to-peer (P2P) systems, i.e., highly distributed computing or service substrates built from thousands or even millions of typically nondedicated nodes across the internet that may join or leave the system at any time. Examples range from widely used unstructured ad-hoc communities such as Napster, Gnutella, and FreeNet to recent academic work on scalable and highly structured peer-to-peer substrates such as Chord [45], Tapestry [53], Pastry [42], or CAN [38] that can support a variety of applications. From the perspective of search engines and large-scale IR this development raises two interesting issues. First, since an increasing amount of content now resides in P2P networks, it becomes necessary to provide search facilities within P2P networks.
Second, the significant computing resources provided by a P2P system could also be used to implement search and data mining functions for content located outside the system, e.g., for search and mining tasks across large intranets or global enterprises, or even to build a P2P-based alternative to the current major search engines. This second issue can be seen in the context of the following more general question: Which of the Giant Scale Services [7] currently provided by cluster-based architectures can and should be provided by more highly distributed or P2P systems? It has been established that applications such as the sharing of large static files can be very efficiently implemented in a P2P environment. However, other applications that, e.g., involve frequent updates to massive data, are more challenging, and may turn out to be more appropriately implemented on clusters or on highly-robust distributed systems of dedicated nodes with limited changes in topology (due to faults, or nodes joining or leaving).
In this paper, we describe a prototype system called ODISSEA (Open DIStributed Search Engine Architecture) that is currently under development in our group. ODISSEA attempts to address both of the above issues, by providing a “distributed global indexing and query execution service” that can be used for content residing inside or outside of a P2P network. ODISSEA is different in several ways from many other approaches to P2P search, as explained below. It encounters some basic challenges typical of those that arise when implementing more dynamic applications involving frequent updates on P2P systems, leading to interesting algorithmic problems and solutions. We describe and discuss the basic design choices and motivation and give some initial results, with focus on the issue of efficient distributed query processing.
1.1 ODISSEA Design Overview
ODISSEA is a distributed global indexing and query execution service, i.e., a system that maintains a global index structure under document insertions and updates and node joins and failures, and that executes simple but general classes of search queries in an efficient manner. This system provides the lower tier of a proposed two-tier search infrastructure. In the upper tier, there are two classes of clients that interact with this P2P-based lower tier:
1. Update clients insert new or updated documents into the system, which stores and indexes them. An update client could be a crawler inserting crawled pages, or a web server pushing documents into the index, or a node in a file sharing system.
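To make the idea of a global index maintained under document insertions and updates more concrete, the following is a minimal single-process sketch in Python. It is not the ODISSEA implementation. It assumes, purely for illustration, that the inverted list for each term is held by exactly one node and that the node is chosen by hashing the term; the class name `GlobalIndex`, its methods, and the modulo-based placement are all hypothetical. A real P2P substrate such as Chord or Pastry would use its own key-to-node mapping, and node joins and failures are not modeled here.

```python
import hashlib
from collections import defaultdict


class GlobalIndex:
    """Toy term-partitioned inverted index (illustrative assumption only)."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        # node id -> term -> set of document ids (that node's share of the index)
        self.postings = {n: defaultdict(set) for n in self.node_ids}
        # document id -> terms currently indexed for it, needed to handle updates
        self.doc_terms = {}

    def node_for(self, term):
        # Hypothetical placement rule: hash the term, then take it modulo the
        # number of nodes. This stands in for a DHT lookup.
        digest = hashlib.sha1(term.encode("utf-8")).digest()
        return self.node_ids[int.from_bytes(digest[:8], "big") % len(self.node_ids)]

    def insert(self, doc_id, text):
        """Insert a new document or replace an existing one (an update)."""
        new_terms = set(text.lower().split())
        old_terms = self.doc_terms.get(doc_id, set())
        # Remove postings for terms that no longer occur in the document.
        for term in old_terms - new_terms:
            self.postings[self.node_for(term)][term].discard(doc_id)
        # Add postings for every term in the current version.
        for term in new_terms:
            self.postings[self.node_for(term)][term].add(doc_id)
        self.doc_terms[doc_id] = new_terms

    def query(self, terms):
        """Return ids of documents containing all query terms (Boolean AND)."""
        lists = [
            self.postings[self.node_for(t.lower())].get(t.lower(), set())
            for t in terms
        ]
        if not lists:
            return set()
        return set.intersection(*lists)


if __name__ == "__main__":
    index = GlobalIndex(["node-a", "node-b", "node-c"])
    index.insert("d1", "peer to peer search")   # e.g. pushed by a crawler
    index.insert("d2", "web search engine")     # e.g. pushed by a web server
    print(sorted(index.query(["search"])))
    index.insert("d1", "cluster search")        # update of an existing document
    print(sorted(index.query(["peer"])))
```

In this sketch an update client only ever calls `insert`, and a multi-term query has to combine lists that may live on different nodes, which is where the cost of distributed query processing arises.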