Kepler scientific workflow system

Kepler is a free software system for designing, executing, and sharing scientific workflows [Ludäscher B., Altintas I., Berkley C., Higgins D., Jaeger-Frank E., Jones M., Lee E., Tao J., Zhao Y. 2006. Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18(10): 1039-1065.] [Altintas I., Berkley C., Jaeger E., Jones M., Ludäscher B., Mock S. 2004. Kepler: An Extensible System for Design and Execution of Scientific Workflows. Proceedings of the Future of Grid Data Environments, Global Grid Forum 10.] [Michener, William K., James H. Beach, Matthew B. Jones, Bertram Ludaescher, Deana D. Pennington, Ricardo S. Pereira, Arcot Rajasekar, and Mark Schildhauer. 2007. "A Knowledge Environment for the Biodiversity and Ecological Sciences", Journal of Intelligent Information Systems, 29(1): 111-126. doi: 10.1007/s10844-006-0034-8].

Workflows in general, and scientific workflows in particular, are directed graphs in which the nodes represent discrete computational components and the edges represent paths along which data and results flow between components [Taylor, I.J.; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.), "Workflows for e-Science: Scientific Workflows for Grids", 530 p., Springer. ISBN 978-1-84628-519-6.]. In Kepler, the nodes are called 'actors' and the edges are called 'channels'.

Kepler includes a graphical user interface (GUI) for composing workflows in a desktop environment, a runtime engine for executing workflows either within the GUI or independently from the command line, and a distributed computing option that allows workflow tasks to be distributed among compute nodes in a computer cluster or computing grid. The Kepler system principally targets the use of a workflow metaphor for organizing computational tasks that are directed towards particular scientific analysis and modeling goals. Thus, Kepler scientific workflows generally model the flow of data from one step to another in a series of computations that achieve some scientific goal.
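
The directed-graph view of a workflow can be illustrated with a short Python sketch. It is not Kepler code: the actor names, the functions attached to them, and the run_workflow helper are all hypothetical, and the sketch assumes an acyclic graph in which each actor fires exactly once.

```python
from collections import deque

# Hypothetical toy workflow: actor names map to plain Python functions.
actors = {
    "read": lambda: [3.0, 4.0, 5.0],
    "square": lambda xs: [x * x for x in xs],
    "total": lambda xs: sum(xs),
}

# Channels are directed edges (producer, consumer).
channels = [("read", "square"), ("square", "total")]


def run_workflow(actors, channels):
    """Fire each actor once, in topological order, passing results along channels."""
    inputs = {name: [] for name in actors}
    downstream = {name: [] for name in actors}
    pending = {name: 0 for name in actors}
    for src, dst in channels:
        downstream[src].append(dst)
        pending[dst] += 1

    # Actors with no incoming channels can fire immediately.
    ready = deque(name for name, count in pending.items() if count == 0)
    results = {}
    while ready:
        name = ready.popleft()
        results[name] = actors[name](*inputs[name])
        for dst in downstream[name]:
            inputs[dst].append(results[name])
            pending[dst] -= 1
            if pending[dst] == 0:
                ready.append(dst)
    return results


results = run_workflow(actors, channels)
print(results["total"])
```

Each actor only sees the values that arrive on its incoming channels, which is what lets the same component be rewired into a different workflow.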

Access to scientific data

Kepler provides direct access to scientific data that has been archived in many of the commonly used data archives. For example, Kepler provides access to data stored in the Knowledge Network for Biocomplexity (KNB) Metacat server [Jones, Matthew B., C. Berkley, J. Bojilova, M. Schildhauer. 2001. Managing Scientific Metadata. IEEE Internet Computing 5 (5): 59-68.] and described using Ecological Metadata Language. Additional data sources that are supported include data accessible using the DiGIR protocol, the OPeNDAP protocol, GridFTP, JDBC, SRB, and others.

Models of Computation

Kepler differs from many of the other bioinformatics workflow management systems in that it separates the structure of the workflow model from its model of computation, such that different models for the computation of the workflow can be bound to a given workflow graph. Kepler inherits several common models of computation from the Ptolemy system, including Synchronous Data Flow (SDF), Continuous Time (CT), Process Network (PN), and Dynamic Data Flow (DDF), among others.
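
The separation between workflow structure and model of computation can be sketched in Python as follows. This is a toy illustration only, not the Ptolemy II or Kepler directors: the pipeline, the two director functions, and their names are hypothetical. The first director runs a fixed sequential schedule, loosely in the spirit of SDF, and the second runs one thread per actor connected by FIFO queues, loosely in the spirit of PN.

```python
import queue
import threading

# The workflow structure: a linear chain of actors (hypothetical example).
pipeline = [lambda x: 2 * x, lambda x: x + 1]
tokens = [1, 2, 3]


def sequential_director(pipeline, tokens):
    """Fixed schedule: fire every actor once per input token, in order."""
    outputs = []
    for token in tokens:
        for actor in pipeline:
            token = actor(token)
        outputs.append(token)
    return outputs


def process_network_director(pipeline, tokens):
    """One thread per actor; each actor blocks on a FIFO queue until data arrives."""
    done = object()  # sentinel marking the end of the token stream
    queues = [queue.Queue() for _ in range(len(pipeline) + 1)]

    def run_actor(actor, inbox, outbox):
        while True:
            token = inbox.get()
            if token is done:
                outbox.put(done)  # pass the end marker downstream and stop
                return
            outbox.put(actor(token))

    threads = [
        threading.Thread(target=run_actor, args=(actor, queues[i], queues[i + 1]))
        for i, actor in enumerate(pipeline)
    ]
    for thread in threads:
        thread.start()
    for token in tokens:
        queues[0].put(token)
    queues[0].put(done)

    outputs = []
    while True:
        token = queues[-1].get()
        if token is done:
            break
        outputs.append(token)
    for thread in threads:
        thread.join()
    return outputs


sdf_like = sequential_director(pipeline, tokens)
pn_like = process_network_director(pipeline, tokens)
print(sdf_like, pn_like)
```

The pipeline definition is never changed; only the function that decides when each actor fires is swapped.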

Hierarchical workflows

Kepler supports hierarchy in workflows, which allows complex tasks to be composed of simpler components. This feature allows workflow authors to build re-usable, modular components that can be saved for use across many different workflows.
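
As a rough analogy in Python, a composite component can be modeled as a function built from simpler functions. The compose helper and the actor names below are hypothetical and do not correspond to Kepler classes.

```python
def compose(*actors):
    """Wrap a chain of actors into a single composite actor."""
    def composite(value):
        for actor in actors:
            value = actor(value)
        return value
    return composite


# Hypothetical atomic actors.
def drop_missing(values):
    return [v for v in values if v is not None]


def mean(values):
    return sum(values) / len(values)


# A reusable composite built from the two atomic actors.
clean_mean = compose(drop_missing, mean)

# The composite is then used as a single component inside a larger workflow.
report = compose(clean_mean, lambda m: f"mean={m:.2f}")

print(report([1.0, None, 2.0, 6.0]))
```

From the outside, clean_mean looks like any other actor, so the workflow that uses it does not need to know how many steps it contains.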

Workflow semantics

Kepler provides a model for the semantic annotation of workflow components using terms drawn from an ontology. These annotations support many advanced features, including improved search capabilities, automated workflow validation, and improved workflow editing [Berkley, Chad, Shawn Bowers, Matthew B. Jones, Bertram Ludaescher, Mark Schildhauer, Jing Tao. 2005. Incorporating Semantics in Scientific Workflow Authoring. 17th International Conference on Scientific and Statistical Database Management. IEEE Computer Society.].

Sharing workflows

Kepler components can be shared by exporting the workflow or component into a Kepler Archive (KAR) file, which is an extension of the JAR file format from Java. Once a KAR file is created, it can be emailed to colleagues, shared on web sites, or uploaded to the Kepler Component Repository. The Component Repository is a centralized system for sharing Kepler workflows that is accessible via both a web portal and a web service interface. Users can search for and use components from the repository directly from within the Kepler workflow composition GUI.
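
Because JAR files are ZIP archives, a KAR file can presumably be inspected with any ZIP tool. The Python sketch below lists the entries of such an archive; it assumes that a KAR keeps the ZIP container of the JAR format, and it uses an in-memory stand-in archive with hypothetical entry names rather than a real Kepler export.

```python
import io
import zipfile


def list_archive_entries(archive):
    """Return the entry names stored in a ZIP-based archive (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Build a stand-in archive in memory; the entry names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    zf.writestr("workflow.xml", "<entity/>\n")

entries = list_archive_entries(buffer)
print(entries)
```

Passing the path of an exported KAR file to list_archive_entries instead of the buffer would show what that archive actually contains.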

Kepler History

The Kepler Project was created in 2002 by members of the Science Environment for Ecological Knowledge (SEEK) project and the Scientific Data Management (SDM) project. The project was founded by researchers at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California, Santa Barbara and the San Diego Supercomputer Center at the University of California, San Diego. Kepler extends Ptolemy II, which is a software system for modeling, simulation, and design of concurrent, real-time, embedded systems developed at UC Berkeley. Collaboration on Kepler quickly grew as members of various scientific disciplines realized the benefits of scientific workflows for analysis and modeling and began contributing to the system. As of 2008, Kepler collaborators come from many science disciplines, including ecology, molecular biology, genetics, physics, chemistry, conservation science, oceanography, hydrology, library science, computer science, and others.

Kepler FAQs

General Q's
Q: My analyses often require the same basic components. How can I create a workflow template that includes these?
A: Create a workflow that includes all the basic components and save it with an intuitive name, such as "ANOVAtemplate.xml". To begin a new workflow based on your ANOVA template, open Kepler, choose "Open File" from the "File" menu, navigate to the directory in which you saved ANOVAtemplate.xml, and select it. Then immediately choose "Save As..." from the "File" menu and save the workflow under a more specific name, such as "ANOVA_date_project.xml". This leaves ANOVAtemplate.xml unchanged and ready to serve as a template the next time you need it.

Director Q's

Q: Why doesn't my workflow ever finish executing?
A: By default the workflow director's "iterations" are set to 0, which indicates "loop indefinitely." To change this, right-click on the director, choose "Configure Director" and change "iterations" from 0 to 1 for one iteration, or to "n" for "n" iterations, then push the "Commit" button.

Q: Why do I get the "SDF scheduler found disconnected actors!" error message?
A: The SDF Director does not expect unconnected workflow components. During workflow development, however, it can be convenient to disconnect one actor and connect another. To make the SDF Director allow this, right-click the Director, choose "Configure Director" and check the box beside "allowDisconnectedGraphs", then push the "Commit" button.
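
To see what "disconnected" means here, the Python sketch below groups the actors of a workflow into connected components, ignoring channel direction. It is an illustration of the idea only, not the SDF scheduler's actual check, and the actor names are hypothetical.

```python
def connected_components(actors, channels):
    """Group actors into sets that are linked by channels, ignoring direction."""
    neighbors = {name: set() for name in actors}
    for src, dst in channels:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    components, seen = [], set()
    for start in actors:
        if start in seen:
            continue
        # Depth-first walk from an actor that has not been visited yet.
        component, stack = set(), [start]
        while stack:
            name = stack.pop()
            if name in component:
                continue
            component.add(name)
            stack.extend(neighbors[name] - component)
        seen |= component
        components.append(component)
    return components


# "spare_plot" was unhooked during development and has no channels.
actors = ["reader", "filter", "plot", "spare_plot"]
channels = [("reader", "filter"), ("filter", "plot")]

components = connected_components(actors, channels)
print(len(components))
```

More than one component means the graph is disconnected, which is the situation the error message reports.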

RExpression Actor Q's

Q: How do I keep the R coding window of the RExpression actor open while running my workflow?
A: Right-click on the RExpression actor on your workflow and choose "Open Actor" (Ctrl-L) from the menu. When you are finished making changes to your R-script, choose "Save" (Ctrl-S) from the "File" menu. Then, push the "Run or Resume" button on the workflow toolbar (Ctrl-R) to run the workflow and see the results of your changes.

Graphing Q's
Q: Must I connect a graphing actor to my RExpression actor in order to see graphical output?
A: No. Right-click on the RExpression actor, choose "Configure Actor" and check the box beside "Automatically display graphics." Kepler will save the graphic as a PDF file in a temporary directory and open your default PDF viewer to display it.
Q: Why are some of my x-axis labels missing?
A: The RExpression actor generates *.png and *.pdf files, with a default size of 480x480 pixels. If some of your x-axis labels are long, they may be left off the plot. There are several ways to fix this. First, try switching to the other graphics file format: right-click the RExpression actor, choose "Configure Actor", click the drop-down box beside "Graphics Format", select the format not currently selected, and re-run your workflow. If that doesn't fix the problem, try changing the dimensions of the graphics file. To do so, right-click the RExpression actor, choose "Configure Actor", and change the "Number of X pixels in image" (or "Number of Y pixels in image") to a new value. The default generates a square image. Some other common height:width relationships are y/x=2/3, y/x=1/sqrt(3), and y/x=2/(1+sqrt(5)), the last corresponding to the Golden Ratio. Of course, there are aesthetic limits to stretching axes, so if none of these remedies works, you can always try abbreviating your x-axis category labels.
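
The pixel values implied by these ratios are easy to work out. The short Python sketch below computes the image height for a 480-pixel width under each ratio; the dictionary keys and the image_height helper are names chosen for this example, not Kepler parameters.

```python
import math

# Height:width ratios (y/x) mentioned above, plus the default square image.
RATIOS = {
    "square": 1.0,
    "2/3": 2 / 3,
    "1/sqrt(3)": 1 / math.sqrt(3),
    "golden": 2 / (1 + math.sqrt(5)),
}


def image_height(width, ratio):
    """Return the height in whole pixels for a given width and y/x ratio."""
    return round(width * ratio)


heights = {name: image_height(480, ratio) for name, ratio in RATIOS.items()}
for name, height in heights.items():
    print(f"{name}: 480 x {height}")
```

To widen the plot instead of shortening it, keep the height at 480 and divide by the ratio to get the new width.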

External links

Kepler Project website: [http://kepler-project.org]
Kepler Component Repository: [http://library.kepler-project.org/]
Ptolemy II project website: [http://ptolemy.eecs.berkeley.edu/ptolemyII/]
Knowledge Network for Biocomplexity (KNB) Data archive: [http://knb.ecoinformatics.org]
The Golden Ratio on Wikipedia: [http://en.wikipedia.org/wiki/Golden_ratio]

See also

*VisTrails
*Bioinformatics workflow management systems


Wikimedia Foundation. 2010.
