DiploCloud: Efficient and Scalable Management of RDF Data in the Cloud

Despite recent advances in distributed RDF data management, processing large amounts of RDF data in the cloud is still very challenging. In spite of its seemingly simple data model, RDF actually encodes rich and complex graphs mixing both instance and schema-level data. Sharding such data using classical techniques or partitioning the graph using traditional min-cut algorithms leads to very inefficient distributed operations and to a high number of joins. In this paper, we describe DiploCloud, an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data. We describe the architecture of DiploCloud, its main data structures, and the new algorithms we use to partition and distribute data. We also present an extensive evaluation of DiploCloud showing that our system is often two orders of magnitude faster than state-of-the-art systems on standard workloads.

INTRODUCTION

As large enterprises look to cut the costs of data infrastructure, cloud computing is becoming increasingly popular. Cloud computing technologies can use commodity-class hardware to manage and retrieve large amounts of data, which makes the resulting solutions affordable. Similarly, the semantic web is increasingly popular as a framework for exchanging knowledge efficiently. Semantic web technologies are governed by the World Wide Web Consortium (W3C). The most prominent standards are the Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL). RDF is the standard for storing and representing data, and SPARQL is a query language for retrieving data from an RDF store. The power of these semantic web technologies can be harnessed in a cloud computing environment to give users the capability to efficiently store and retrieve data for data-intensive applications.

Scalability is the predominant challenge in the semantic web as datasets continue to grow larger and more datasets are integrated and linked. RDF graphs are becoming huge and graph patterns are becoming more complex. At present, existing semantic web frameworks are not sufficiently scalable. A cloud computing solution can be built to overcome these scalability and performance problems.

Companies pioneering cloud computing, such as Salesforce.com and Amazon, have platforms such as EC2, S3, and Force.com. These are proprietary, closed-source platforms. Hadoop, however, is an emerging open-source cloud computing tool supported by Amazon, the leading cloud computing hosting company. It provides a distributed file system in which files can be saved with replication, which makes it an ideal candidate for building a storage system. Hadoop features high fault tolerance and great reliability. In addition, it contains an implementation of the MapReduce [6] programming model, a functional programming model suited to the parallel processing of large amounts of data. The data is partitioned into a number of independent chunks and MapReduce processes run on these chunks, which makes parallelization easier.
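The chunk-then-process idea behind MapReduce described above can be sketched in plain Java. This is a minimal illustration and not Hadoop code: the class name PredicateCountSketch, the Triple type, the sample triples, and the chunk size are all hypothetical, and a parallel stream stands in for the distributed workers. The sketch counts how often each predicate occurs in a small set of RDF triples by counting inside each chunk independently (map) and then merging the partial counts (reduce).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: none of these names come from Hadoop or DiploCloud.
class PredicateCountSketch {

    /** A minimal RDF triple: subject, predicate, object. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Splits the input into consecutive chunks of at most chunkSize triples. */
    static List<List<Triple>> partition(List<Triple> triples, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Triple>> chunks = new ArrayList<>();
        for (int start = 0; start < triples.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, triples.size());
            chunks.add(new ArrayList<>(triples.subList(start, end)));
        }
        return chunks;
    }

    /** Map step: counts predicates within one chunk, independently of all other chunks. */
    static Map<String, Long> mapChunk(List<Triple> chunk) {
        Map<String, Long> partial = new HashMap<>();
        for (Triple t : chunk) {
            partial.merge(t.predicate, 1L, Long::sum);
        }
        return partial;
    }

    /** Reduce step: merges the per-chunk counts into one result sorted by predicate. */
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((predicate, count) -> total.merge(predicate, count, Long::sum));
        }
        return total;
    }

    /** Runs the map step on every chunk (in parallel here) and then the reduce step. */
    static Map<String, Long> countPredicates(List<Triple> triples, int chunkSize) {
        List<Map<String, Long>> partials = partition(triples, chunkSize)
                .parallelStream()
                .map(PredicateCountSketch::mapChunk)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Triple> triples = Arrays.asList(
                new Triple("ex:alice", "foaf:knows", "ex:bob"),
                new Triple("ex:bob", "foaf:knows", "ex:carol"),
                new Triple("ex:alice", "rdf:type", "foaf:Person"),
                new Triple("ex:bob", "rdf:type", "foaf:Person"),
                new Triple("ex:carol", "foaf:name", "Carol"));
        System.out.println(countPredicates(triples, 2));
    }
}
```

Because each chunk is counted without looking at any other chunk, the map step can run on separate machines. Only the small per-chunk count tables need to be brought together in the reduce step.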

EXISTING SYSTEM:

According to researchers at Berkeley, trust and security rank among the top 10 obstacles to the adoption of cloud computing. Indeed, Service-Level Agreements (SLAs). Consumers’ feedback is a good source for assessing the overall trustworthiness of cloud services. Several researchers have recognized the significance of trust management and proposed solutions to assess and manage trust based on feedback collected from participants.

DISADVANTAGES OF EXISTING SYSTEM:

Guaranteeing the availability of the trust management service (TMS) is difficult because of the unpredictable number of users and the highly dynamic nature of the cloud environment.
A self-promoting attack might have been performed on cloud service sy, which means that sx should have been selected instead.
Attackers can disadvantage a cloud service by giving multiple misleading trust feedbacks (i.e., collusion attacks).
Attackers can trick users into trusting cloud services that are not trustworthy by creating several accounts and giving misleading trust feedbacks (i.e., Sybil attacks).

PROPOSED SYSTEM:

Cloud service users’ feedback is a good source for assessing the overall trustworthiness of cloud services. In this paper, we present novel techniques that help detect reputation-based attacks and allow users to effectively identify trustworthy cloud services.
We introduce a credibility model that not only identifies misleading trust feedbacks from collusion attacks but also detects Sybil attacks, regardless of whether these attacks take place over a long or a short period of time (i.e., strategic or occasional attacks, respectively).
We also develop an availability model that maintains the trust management service at a desired level.

ADVANTAGES OF PROPOSED SYSTEM:

TrustCloud framework for accountability and trust in cloud computing. In particular, TrustCloud consists of five layers including workflow,
Proposes a multi-faceted Trust Management (TM) system architecture for cloud computing that helps cloud service users identify trustworthy cloud service providers.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System : Pentium IV, 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 1.44 MB
Monitor : 15" VGA Colour
Mouse :
RAM : 512 MB

SOFTWARE REQUIREMENTS:

Operating System : Windows XP/7
Coding Language : Java/J2EE
IDE : NetBeans 7.4
Database : MySQL
