Machine Learning Module for Big Data Analysis in Kepler
Mai Nguyen, Daniel Crawl, Jianwu Wang, Ilkay Altintas

Citation
Mai Nguyen, Daniel Crawl, Jianwu Wang, Ilkay Altintas. "Machine Learning Module for Big Data Analysis in Kepler". Talk or presentation, 16 October 2015; Presented at the Eleventh Biennial Ptolemy Miniconference, Berkeley.

Abstract
Kepler is a scientific workflow system built on the Ptolemy II framework. Its graphical user interface lets users design scientific workflows by dragging and dropping actors that implement different operations and linking them together into the steps a workflow requires. Machine learning techniques provide a data-driven way to analyze the problem being studied and are an essential part of many scientific processes.

The machine learning module in Kepler allows users to integrate machine learning functionality into a workflow, even if the machine learning algorithms are implemented on different tools or platforms. For example, an R script can be executed within Kepler using the RExpression actor, and a Mahout algorithm or KNIME workflow can be executed in command-line mode using the ExternalExecution actor. For big data processing, actors implementing Spark MLlib algorithms are being developed in Kepler. Spark is a cluster computing framework, and MLlib is a distributed machine learning library built on top of Spark. Spark’s distributed in-memory architecture provides fast and scalable processing of iterative operations, which makes it well suited to machine learning algorithms.

The machine learning module can also provide a single actor for one machine learning algorithm backed by several implementations. As an example, the kmeans-all actor in Kepler implements the k-means clustering algorithm using R, Spark MLlib, Mahout, and KNIME. This allows the user to compare accuracy and processing results for a single algorithm across implementations using the same actor; the only change is the choice of implementation when the actor is executed. The user does not need to know much about any of the underlying tools (e.g., R or Spark) in order to use these actors. Each actor in the machine learning module can also be connected to other actors available in Kepler to build complex workflows.
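
The idea of one actor with several interchangeable implementations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Kepler module's code: the names `run_kmeans`, `kmeans_loops`, `kmeans_grouped`, and `IMPLEMENTATIONS` are hypothetical, and two small pure-Python variants of Lloyd's algorithm stand in for the R, Spark MLlib, Mahout, and KNIME back ends.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
KMeansFn = Callable[[Sequence[Point], int, int], List[Point]]


def _squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_loops(points: Sequence[Point], k: int, iterations: int) -> List[Point]:
    """Lloyd's algorithm with explicit loops (hypothetical stand-in back end)."""
    centers = list(points[:k])  # deterministic start: the first k points
    for _ in range(iterations):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        counts = [0] * k
        for p in points:
            nearest = 0
            for j in range(1, k):
                if _squared_distance(p, centers[j]) < _squared_distance(p, centers[nearest]):
                    nearest = j
            counts[nearest] += 1
            for d, value in enumerate(p):
                sums[nearest][d] += value
        # An empty cluster keeps its previous center.
        centers = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(k)
        ]
    return centers


def kmeans_grouped(points: Sequence[Point], k: int, iterations: int) -> List[Point]:
    """The same algorithm, written by grouping points per cluster."""
    centers = list(points[:k])
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers


# Registry of back ends; in the setting described above these would wrap
# external tools rather than local functions (assumption for illustration).
IMPLEMENTATIONS: Dict[str, KMeansFn] = {
    "loops": kmeans_loops,
    "grouped": kmeans_grouped,
}


def run_kmeans(points: Sequence[Point], k: int,
               implementation: str = "loops", iterations: int = 10) -> List[Point]:
    """Single entry point: only the `implementation` argument changes."""
    if implementation not in IMPLEMENTATIONS:
        raise ValueError(f"unknown implementation: {implementation!r}")
    return IMPLEMENTATIONS[implementation](points, k, iterations)


if __name__ == "__main__":
    data = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
    for name in IMPLEMENTATIONS:
        print(name, run_kmeans(data, k=2, implementation=name))
```

Because the caller only varies the `implementation` argument, results from different back ends can be compared side by side without changing the rest of the workflow.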


Citation formats  
  • HTML
    Mai Nguyen, Daniel Crawl, Jianwu Wang, Ilkay Altintas. <a
    href="http://chess.eecs.berkeley.edu/pubs/1123.html"><i>Machine
    Learning Module for Big Data Analysis in
    Kepler</i></a>, Talk or presentation, 16
    October 2015; Presented at the <a
    href="http://ptolemy.eecs.berkeley.edu/conferences/15/"
    >Eleventh Biennial Ptolemy Miniconference</a>,
    Berkeley.
  • Plain text
    Mai Nguyen, Daniel Crawl, Jianwu Wang, Ilkay Altintas.
    "Machine Learning Module for Big Data Analysis in
    Kepler". Talk or presentation, 16 October 2015;
    Presented at the Eleventh Biennial Ptolemy
    Miniconference, Berkeley.
  • BibTeX
    @presentation{NguyenCrawlWangAltintas15_MachineLearningModuleForBigDataAnalysisInKepler,
        author = {Mai Nguyen and Daniel Crawl and Jianwu Wang and
                  Ilkay Altintas},
        title = {Machine Learning Module for Big Data Analysis in
                  Kepler},
        day = {16},
        month = {October},
        year = {2015},
        note = {Presented at the Eleventh Biennial Ptolemy
                  Miniconference, Berkeley},
        abstract = {Kepler is a scientific workflow system that is
                  built on the Ptolemy II framework. Kepler provides
                  a graphical user interface that allows users to
                  easily design scientific workflows by simply
                  dragging and dropping actors implementing
                  different operations and linking them together to
                  create the steps necessary for a specific
                  workflow. Machine learning techniques provide a
                  way to analyze the problem being studied using a
                  data-driven approach, and are an essential part of
                  many scientific processes. The machine learning
                  module in Kepler allows users to integrate machine
                  learning functionality into a workflow, even if
                  the machine learning algorithms are implemented on
                  different tools or platforms. For example, an R
                  script can be executed within Kepler using the
                  RExpression actor. A Mahout algorithm or KNIME
                  workflow can be executed in command-line mode
                  using the ExternalExecution actor. For big data
                  processing, actors implementing Spark MLlib
                  algorithms are being developed in Kepler. Spark is
                  a cluster computing framework, and MLlib is a
                  distributed machine learning library on top of
                  Spark. Spark's distributed in-memory
                  architecture provides fast and scalable processing
                  of iterative operations, which is ideal for
                  machine learning algorithms. The machine learning
                  module in Kepler can also create an actor for a
                  single machine learning algorithm based on
                  different implementations. As an example, the
                  kmeans-all actor in Kepler implements the k-means
                  clustering algorithm using R, Spark MLlib, Mahout,
                  and KNIME. This feature allows the user to compare
                  accuracy and processing results for a single
                  algorithm using different implementations. This
                  can be accomplished using the same actor, with the
                  only change being the choice of implementation
                  when the actor is executed. The user does not need
                  to know much about any of the underlying tools
                  (e.g., R or Spark) in order to use these actors.
                  Each actor in the machine learning module can also
                  be connected to other actors available in Kepler
                  to build complex workflows. },
        URL = {http://chess.eecs.berkeley.edu/pubs/1123.html}
    }
    

Posted by Christopher Brooks on 19 Oct 2015.
Groups: ptolemy