The official PASS Blog is where you’ll find the latest blog posts from PASS community members and the PASS Board. Contributors share their thoughts and discuss a wide variety of topics spanning PASS and the data community.

Azure Databricks from PyCharm IDE

Azure Databricks is a powerful platform for building data pipelines with Apache Spark. It combines Spark’s distributed data processing with features that make deploying and maintaining a cluster easier, including integration with other Azure components such as Azure Data Lake Storage and Azure SQL Database. If you have tried the Databricks tutorials, you likely created a notebook, pasted in some Spark code from the example, and watched it run across a Spark cluster as if by magic. Notebooks are useful for many things, and Azure Databricks even lets you schedule them as jobs. But when a team is developing a large project that will go through many versions, many developers prefer to work in PyCharm or another IDE (Integrated Development Environment). Getting to a streamlined process of developing in PyCharm and submitting the code to a Spark cluster for testing can be a challenge, and I have been searching for better options for years.

