This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks—triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages over existing solutions through a comprehensive empirical evaluation over real and synthetic datasets.
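The two baseline strategies named above, thresholding and top-1 selection, can be illustrated with a minimal sketch. It assumes an object is described by a dictionary mapping each candidate tag to its probability. The function names, the `tau` parameter, and the sample tags are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def threshold_determinize(tag_probs, tau=0.5):
    """Keep every tag whose probability is at least tau (assumed cut-off)."""
    return {tag for tag, p in tag_probs.items() if p >= tau}


def top1_determinize(tag_probs):
    """Keep only the single most probable tag (empty set if there are no tags)."""
    if not tag_probs:
        return set()
    return {max(tag_probs, key=tag_probs.get)}


# Hypothetical probabilistic tags for one image.
tags = {"beach": 0.7, "sunset": 0.55, "dog": 0.3}

print(sorted(threshold_determinize(tags, tau=0.5)))  # ['beach', 'sunset']
print(sorted(top1_determinize(tags)))                # ['beach']
```

Both functions look only at the object's own probabilities and ignore which queries will later be asked, which is the limitation the query-aware strategy is meant to address.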


  • Uploading & Downloading Algorithms
  • Searching Algorithms


Determinizing Probabilistic Data. While we are not aware of any prior work that directly addresses the issue of determinizing probabilistic data as studied in this paper, the most closely related works explore how to determinize answers to a query over a probabilistic database. In contrast, we are interested in the best deterministic representation of the data (and not that of an answer to a query), so as to continue to use existing end-applications that take only deterministic input. The differences between the two problem settings lead to different challenges. Other authors address the problem of choosing the set of uncertain objects to be cleaned in order to achieve the best improvement in the quality of query answers. However, their goal is to improve the quality of a single query, while ours is to optimize the quality of the overall query workload.


A variety of advanced probabilistic data models have been proposed in the past. Our focus, however, is on determinizing probabilistic objects, such as image tags and speech output, for which the probabilistic attribute model suffices. We note that determinizing probabilistic data stored in more advanced probabilistic models, such as and/or trees, might also be interesting. Extending our work to deal with data of such complexity remains an interesting direction for future work.

There are several related research efforts that deal with the problem of selecting terms to index a document for document retrieval. One term-centric pruning method retains the top postings for each term according to the individual score impact that each posting would have if the term appeared in an ad hoc search query. Other authors propose a scalable term selection method for text categorization that is based on the coverage of the terms. The focus of these research efforts is on relevance, that is, finding the set of terms most relevant to a document. In our problem, a set of possibly relevant terms and their relevance to the document are already given by other data processing techniques. Thus, our goal is not to explore the relevance of terms to documents, but to select keywords from the given set of terms to represent the document, such that the quality of answers to triggers/queries is optimized.
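To make the idea of selecting keywords so that answer quality over a workload is optimized more concrete, here is a minimal brute-force sketch. It is not the paper's algorithm. It assumes that tag probabilities are independent, that each query is a conjunction of terms with a frequency weight, and that quality is measured by a simple expected cost with hypothetical false-positive and false-negative penalties `c_fp` and `c_fn`. All names and numbers are illustrative assumptions.

```python
from itertools import combinations
from math import prod


def expected_cost(answer_set, tag_probs, workload, c_fp=1.0, c_fn=1.0):
    """Expected cost of representing an object by answer_set.

    workload is a list of (query_terms, frequency) pairs; a query is treated
    as a conjunction of its terms (an assumption of this sketch).
    """
    total = 0.0
    for query_terms, freq in workload:
        # Probability that the object truly satisfies the query,
        # assuming independent tags.
        p_match = prod(tag_probs.get(t, 0.0) for t in query_terms)
        if set(query_terms) <= answer_set:
            # Object is returned: pay if it does not truly match.
            total += freq * c_fp * (1.0 - p_match)
        else:
            # Object is not returned: pay if it truly matches.
            total += freq * c_fn * p_match
    return total


def best_answer_set(tag_probs, workload, c_fp=1.0, c_fn=1.0):
    """Exhaustively search all tag subsets (exponential; small inputs only)."""
    tags = sorted(tag_probs)
    best, best_cost = frozenset(), float("inf")
    for k in range(len(tags) + 1):
        for subset in combinations(tags, k):
            cost = expected_cost(set(subset), tag_probs, workload, c_fp, c_fn)
            if cost < best_cost:
                best, best_cost = frozenset(subset), cost
    return best, best_cost


# Hypothetical object and query workload.
probs = {"beach": 0.6, "sunset": 0.6, "dog": 0.3}
workload = [(("beach", "sunset"), 5), (("dog",), 1)]

thresholded = {t for t, p in probs.items() if p >= 0.5}
print(expected_cost(thresholded, probs, workload))
print(best_answer_set(probs, workload))
```

In this example the thresholded set contains both "beach" and "sunset", although under the independence assumption the object satisfies the frequent conjunctive query with probability only 0.36, so a workload-aware choice that avoids answering that query has a lower expected cost. The exhaustive search is only practical for a handful of tags.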


  • User
  • Admin


  1. A new user must register before logging in to the application.
  2. After registration, the user can log in only once the admin has approved the account.
  3. The admin's verification email is sent to the user's email address.
  4. Only then can the user log in and reach the home page.
  5. The user first searches for an image file.
  6. Any of the supported queries can be used; the results are displayed in a grid view.
  7. The grid view shows the items related to the search. The user selects an image and downloads it.
  8. The user can also change their password and update their details.
  9. The home page gives details about the project.


  1. The admin verifies the registration details and approves users.
  2. The admin uploads images to the cloud and gives each image a title.
  3. The admin views user details and can lock a particular user.
  4. The admin views the details of uploaded files and can delete unwanted files from the cloud.
  5. A chart view shows user performance in the application.
  6. A chart shows file search and execution times.
  7. A chart shows the number of image searches.
  8. A chart shows the number of images users downloaded from the cloud.
  9. A chart shows details of the most frequently accessed images.
  10. The admin can view all user actions and performance.
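
The account lifecycle described in the two lists above (register, admin approval, login, lock) can be sketched as a small state machine. This is only an illustration under stated assumptions: the class and method names are hypothetical, accounts are held in memory, and password checking and the verification email are deliberately left out.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # registered, waiting for admin approval
    APPROVED = "approved"  # admin has approved; login allowed
    LOCKED = "locked"      # admin has locked the account


class AccountRegistry:
    """Hypothetical in-memory registry; a real system would use a database."""

    def __init__(self):
        self._status = {}  # username -> Status

    def register(self, username):
        if username in self._status:
            raise ValueError(f"user {username!r} is already registered")
        self._status[username] = Status.PENDING

    def approve(self, username):
        # Admin action: only a pending account can be approved.
        if self._status.get(username) is not Status.PENDING:
            raise ValueError(f"user {username!r} is not awaiting approval")
        self._status[username] = Status.APPROVED

    def lock(self, username):
        # Admin action: lock any registered account.
        if username not in self._status:
            raise KeyError(username)
        self._status[username] = Status.LOCKED

    def can_login(self, username):
        # Only approved accounts may log in; unknown users may not.
        return self._status.get(username) is Status.APPROVED


registry = AccountRegistry()
registry.register("alice")
print(registry.can_login("alice"))  # False: not yet approved
registry.approve("alice")
print(registry.can_login("alice"))  # True
registry.lock("alice")
print(registry.can_login("alice"))  # False: locked by the admin
```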