Declarative statistical visualizations in Scala

What is Vegas?

Vegas is a Scala API for declarative, statistical data visualizations. With Vegas, you can work with data files as well as Spark DataFrames, and perform filtering, transformations and aggregations as part of the plotting specification.

What is Declarative Visualization?

In an imperative approach, the user must massage the data set into its final form before it can be visualized. Declarative visualizations in Vegas instead work by specifying what needs to be done to the data to produce the visualization, while the implementation details are abstracted away from the user.
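To make the contrast concrete, here is a plain-Scala sketch that does not use Vegas at all. The imperative version computes a grouped mean by hand; the declarative version only describes the computation with a small `Spec` case class and leaves the work to a generic `evaluate` interpreter. `Row`, `Spec` and `evaluate` are hypothetical names chosen for illustration and are not part of the Vegas API.

```scala
import scala.collection.mutable

// Hypothetical row representation: column name -> numeric value (not Vegas API).
type Row = Map[String, Double]

// Imperative style: the caller spells out every grouping and averaging step.
def meanByHand(rows: Seq[Row], groupField: String, valueField: String): Map[Double, Double] = {
  val sums   = mutable.Map.empty[Double, Double]
  val counts = mutable.Map.empty[Double, Int]
  for (r <- rows) {
    val key = r(groupField)
    sums(key)   = sums.getOrElse(key, 0.0) + r(valueField)
    counts(key) = counts.getOrElse(key, 0) + 1
  }
  sums.map { case (k, s) => k -> s / counts(k) }.toMap
}

// Declarative style: the caller only states *what* is wanted (filter, group, mean) ...
final case class Spec(groupBy: String, meanOf: String, keep: Row => Boolean = _ => true)

// ... and a generic interpreter decides *how* to compute it.
def evaluate(spec: Spec, rows: Seq[Row]): Map[Double, Double] =
  rows.filter(spec.keep)
    .groupBy(_(spec.groupBy))
    .map { case (k, rs) => k -> rs.map(_(spec.meanOf)).sum / rs.size }

// Toy rows with made-up values, not taken from a real dataset.
val rows: Seq[Row] = Seq(
  Map("Cylinders" -> 4.0, "Horsepower" -> 90.0),
  Map("Cylinders" -> 4.0, "Horsepower" -> 110.0),
  Map("Cylinders" -> 8.0, "Horsepower" -> 200.0)
)

val imperative  = meanByHand(rows, "Cylinders", "Horsepower")
val declarative = evaluate(Spec(groupBy = "Cylinders", meanOf = "Horsepower"), rows)
```

Both produce the same grouped means; the difference is that adding a filter to the declarative version means changing the description (`keep`), not rewriting the loop.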

How is Vegas Statistical?

Vegas is inherently statistical: basic aggregations, group-by operations and faceting are built into its declarative grammar.

How does it work?

Under the hood, Vegas works by compiling Scala code into a strongly typed JSON specification that strictly adheres to the Vega-Lite specification, which is quickly becoming a powerful common language for declarative data visualization across many programming languages.
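As a rough illustration of this compile step, the sketch below models a tiny subset of a Vega-Lite encoding as Scala case classes and renders it to a Vega-Lite-style JSON string by plain string building. `Channel`, `ChartSpec` and `toJson` are hypothetical names covering only a sliver of Vega-Lite (no data source, no escaping); Vegas's own spec classes and serializer are much more complete.

```scala
// Hypothetical mini-model of a Vega-Lite spec; not the Vegas API.
final case class Channel(field: String, fieldType: String,
                         bin: Boolean = false, aggregate: Option[String] = None)

final case class ChartSpec(mark: String, encoding: Seq[(String, Channel)])

// No escaping is done: assumes plain identifiers without quotes or backslashes.
def quoted(s: String): String = "\"" + s + "\""
def kv(key: String, value: String): String = quoted(key) + ":" + value

// Optional properties (bin, aggregate) are emitted only when set.
def channelJson(c: Channel): String =
  Seq(
    Some(kv("field", quoted(c.field))),
    Some(kv("type", quoted(c.fieldType))),
    if (c.bin) Some(kv("bin", "true")) else None,
    c.aggregate.map(a => kv("aggregate", quoted(a)))
  ).flatten.mkString("{", ",", "}")

def toJson(spec: ChartSpec): String = {
  val channels = spec.encoding.map { case (name, ch) => kv(name, channelJson(ch)) }
  "{" + kv("mark", quoted(spec.mark)) + "," + kv("encoding", channels.mkString("{", ",", "}")) + "}"
}

// Roughly the area chart example, expressed in the mini-model.
val areaSpec = ChartSpec("area", Seq(
  "x"     -> Channel("Acceleration", "quantitative", bin = true),
  "y"     -> Channel("Horsepower", "quantitative", aggregate = Some("mean")),
  "color" -> Channel("Cylinders", "nominal")
))

val json = toJson(areaSpec)
```

The typed Scala values make invalid combinations harder to write, while the emitted JSON is what a Vega-Lite renderer consumes.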


Below, we illustrate some of the main concepts of plotting with Vegas through a few examples. For a more comprehensive set of examples, check out the GitHub repository.

A Simple Area Chart

Here, we illustrate several concepts. First, the encodeX and encodeY directives map the two quantitative columns Acceleration and Horsepower to the X and Y axes. Then, we say that we want to bin the Acceleration variable and, for each bin, find the mean Horsepower. Finally, via the encodeColor directive, we tell Vegas to break up the dataset by the number of cylinders, treated as a categorical variable. Note that Vegas automatically performs the binning and the aggregation, and produces the color coding together with the legend.
import vegas._
import vegas.data.External._
Vegas("Sample Area Chart", width=800, height=600)
  .withURL(Cars)
  .mark(Area)
  .encodeX("Acceleration", Quantitative, bin=Bin())
  .encodeY("Horsepower", Quantitative, AggOps.Mean, enableBin=false)
  .encodeColor(field="Cylinders", dataType=Nominal)
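Conceptually, the area-chart specification asks for two steps: assign each Acceleration value to a bin, then average Horsepower within each (Cylinders, bin) group. The plain-Scala sketch below spells that out on a few made-up records. The `Car` case class and the fixed bin width of 2.0 are assumptions for illustration; Vegas and Vega-Lite choose the bin boundaries themselves.

```scala
// Hypothetical record type with the three fields the area chart uses.
final case class Car(acceleration: Double, horsepower: Double, cylinders: Int)

// Lower edge of the fixed-width bin containing x (assumed width; Vega-Lite picks its own).
def binStart(x: Double, width: Double): Double = math.floor(x / width) * width

// Mean horsepower per (cylinders, acceleration bin): the data behind each colored area.
def meanHorsepowerByBin(cars: Seq[Car], width: Double): Map[(Int, Double), Double] =
  cars.groupBy(c => (c.cylinders, binStart(c.acceleration, width)))
    .map { case (key, group) => key -> group.map(_.horsepower).sum / group.size }

// Made-up sample values, not taken from the Cars dataset.
val sample = Seq(
  Car(12.5, 150.0, 8), Car(13.0, 170.0, 8),
  Car(15.5, 95.0, 4), Car(14.2, 105.0, 4), Car(16.0, 88.0, 4)
)

val series = meanHorsepowerByBin(sample, width = 2.0)
```

Each distinct cylinder count yields one series of (bin, mean) points, which is why the chart gets one colored area per cylinder count.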

A Binned Scatterplot

Vegas("Sample Binned Scatterplot", width=800, height=600)
  .withURL(Movies)
  .mark(Point)
  .encodeX("IMDB_Rating", Quantitative, bin=Bin(maxbins=10.0))
  .encodeY("Rotten_Tomatoes_Rating", Quantitative, bin=Bin(maxbins=10.0))
  .encodeSize(aggregate=AggOps.Count, field="*", dataType=Quantitative)
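Here both ratings are binned, and `AggOps.Count` over `field="*"` sizes each point by the number of records falling into its two-dimensional cell. A plain-Scala sketch of that counting step follows. The `cellOf` and `countPerCell` helpers and the fixed widths (1.0 for IMDB, 10.0 for Rotten Tomatoes) are illustrative assumptions, since with `maxbins=10.0` Vegas picks the actual boundaries.

```scala
// Lower-left corner of the grid cell containing (x, y), using assumed fixed widths.
def cellOf(x: Double, y: Double, xWidth: Double, yWidth: Double): (Double, Double) =
  (math.floor(x / xWidth) * xWidth, math.floor(y / yWidth) * yWidth)

// Count records per 2-D cell; this count is what drives the point size.
def countPerCell(points: Seq[(Double, Double)], xWidth: Double, yWidth: Double): Map[(Double, Double), Int] =
  points.groupBy { case (x, y) => cellOf(x, y, xWidth, yWidth) }
    .map { case (cell, members) => cell -> members.size }

// Made-up (IMDB_Rating, Rotten_Tomatoes_Rating) pairs, not real movie data.
val ratings = Seq((7.2, 85.0), (7.8, 89.0), (6.1, 40.0), (6.4, 45.0), (6.9, 48.0))

val counts = countPerCell(ratings, xWidth = 1.0, yWidth = 10.0)
```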

Scatterplot with binned color coding

Vegas("Sample Scatterplot", width=800, height=600)
  .withURL(Cars)
  .mark(Point)
  .encodeX("Horsepower", Quantitative)
  .encodeY("Miles_per_Gallon", Quantitative)
  .encodeColor(field="Acceleration", dataType=Quantitative, bin=Bin(maxbins=5.0))
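In this scatterplot, `bin=Bin(maxbins=5.0)` turns the continuous Acceleration values into at most five color classes. `maxbins` caps the number of bins rather than fixing it: a "nice" bin width is chosen so that the data range is covered by no more than that many bins. The sketch below is a simplified version of that idea, trying widths of 1, 2 or 5 times a power of ten and keeping the smallest one that fits. It is not Vega-Lite's exact algorithm (which has further parameters such as `step` and `minstep`), and `binsNeeded` and `niceStep` are hypothetical names.

```scala
// Bins needed to cover [lo, hi] when bins start at multiples of step;
// the last bin is treated as closed so that hi itself is included.
def binsNeeded(lo: Double, hi: Double, step: Double): Int =
  (math.ceil(hi / step) - math.floor(lo / step)).toInt

// Smallest "nice" step (1, 2 or 5 times a power of ten) keeping the bin count <= maxBins.
def niceStep(lo: Double, hi: Double, maxBins: Int): Double = {
  // maxBins >= 2 guarantees termination: a large enough step needs at most 2 bins.
  require(hi > lo && maxBins >= 2, "need a non-empty range and maxBins >= 2")
  // Any step below (hi - lo) / maxBins needs too many bins, so start just under it.
  val startExp = math.floor(math.log10((hi - lo) / maxBins)).toInt
  val candidates = for {
    exp  <- Iterator.from(startExp)
    mult <- Iterator(1.0, 2.0, 5.0)
  } yield mult * math.pow(10, exp)
  candidates.find(step => binsNeeded(lo, hi, step) <= maxBins).get
}
```

For a range of 8 to 25 with at most five bins, widths 1 and 2 would need 17 and 9 bins, so the sketch settles on a width of 5 (four bins).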

A Multi-series Line Chart

Vegas("Sample Multi Series Line Chart", width=800, height=600)
  .withURL(Stocks, formatType=DataFormat.Csv)
  .mark(Line)
  .encodeX("date", Temp)
  .encodeY("price", Quant)
  .encodeColor(field="symbol", dataType=Nominal,
    legend=Legend(orient="left", title="Stock Symbol"))
  .encodeDetailFields(Field(field="symbol", dataType=Nominal))
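The detail field on `symbol` splits the rows into groups, so each stock is drawn as its own line instead of one line zig-zagging through every price. The plain-Scala sketch below shows the equivalent grouping on a few made-up rows. The `Quote` case class and the `seriesBySymbol` helper are hypothetical, and the placeholder symbols and prices are not values from the Stocks dataset.

```scala
import java.time.LocalDate

// Hypothetical row type mirroring the symbol/date/price columns used above.
final case class Quote(symbol: String, date: LocalDate, price: Double)

// One series per symbol, each ordered by date: what a grouped line mark draws.
def seriesBySymbol(quotes: Seq[Quote]): Map[String, Seq[(LocalDate, Double)]] =
  quotes.groupBy(_.symbol).map { case (sym, qs) =>
    sym -> qs.sortBy(_.date.toEpochDay).map(q => (q.date, q.price))
  }

// Made-up rows with placeholder symbols; the input order is deliberately shuffled.
val quotes = Seq(
  Quote("AAA", LocalDate.of(2000, 2, 1), 28.0),
  Quote("BBB", LocalDate.of(2000, 1, 1), 39.0),
  Quote("AAA", LocalDate.of(2000, 1, 1), 25.0),
  Quote("BBB", LocalDate.of(2000, 2, 1), 36.0)
)

val lines = seriesBySymbol(quotes)
```

Sorting by `toEpochDay` avoids relying on an implicit `Ordering` for `LocalDate`, which the Scala standard library does not provide out of the box.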