Parallel Examples


Table of contents

  1. Examples
    1. Filtering Operators
    2. Aggregating, Ordering
    3. Conditionals
    4. Grouping
    5. Window Operations
  2. Upcoming

Examples

The sample programs of each table link to examples within Apache Spark programs. The programs explore


Filtering Operators

The examples do not yet include (a) the logical operators not, or, and and; or (b) the filtering operators distinct and fetch.

| sample programs | comment |
| --- | --- |
| where, filter, limit | Explicit filtering operators. |
| like, in, between, is null | Logical operators. |
| $=$, $\neq$, $\gt$, $\lt$, $\ge$, $\le$ | Relational operators. |
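
As a rough illustration of the operators in the table above, the sketch below runs a few filtering queries through Python's built-in sqlite3 module. SQLite is only a stand-in here, chosen so the snippet runs without a Spark cluster; the stations table, its columns, and its rows are invented for this example. The same predicates should carry over to Spark SQL largely unchanged, but that is an assumption to verify against the linked programs.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (name TEXT, city TEXT, capacity INTEGER)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?, ?)",
    [
        ("North Dock", "Leeds", 12),
        ("South Dock", "Leeds", 30),
        ("Market Square", "York", 18),
        ("Old Mill", None, 25),
    ],
)


def names(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]


# where + like: names that end in 'Dock'
docks = names("SELECT name FROM stations WHERE name LIKE '%Dock' ORDER BY name")

# in: the city must be one of a fixed set of values
in_cities = names(
    "SELECT name FROM stations WHERE city IN ('York', 'Hull') ORDER BY name"
)

# between: both bounds are inclusive
mid_sized = names(
    "SELECT name FROM stations WHERE capacity BETWEEN 18 AND 25 ORDER BY name"
)

# is null: a comparison such as city = NULL never matches, so nulls need IS NULL
no_city = names("SELECT name FROM stations WHERE city IS NULL")

# limit: cap the number of rows returned, here after ordering by capacity
largest = names("SELECT name FROM stations ORDER BY capacity DESC LIMIT 2")

print(docks, in_cities, mid_sized, no_city, largest)
```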


Aggregating, Ordering

| sample programs | comment |
| --- | --- |
| count(), sum(), avg(), min(), max() | For aggregating. |
| order by | |


Conditionals

| sample programs | comment |
| --- | --- |
| case statement | Read more about the case statement |
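
A minimal sketch of a case statement follows, again using Python's sqlite3 module as a stand-in for Spark SQL. The rides table and the band labels are hypothetical. The point of interest is branch order: the first matching WHEN wins, and a null value satisfies no comparison, so it would fall through to ELSE unless it is handled explicitly.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER, minutes INTEGER)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [(1, 5), (2, 25), (3, 70), (4, None)],
)

# Branches are evaluated top to bottom; the null check comes first because
# 'minutes < 10' is not true for a null and the row would otherwise reach ELSE.
query = """
SELECT id,
       CASE
           WHEN minutes IS NULL THEN 'unknown'
           WHEN minutes < 10 THEN 'short'
           WHEN minutes < 60 THEN 'medium'
           ELSE 'long'
       END AS duration_band
FROM rides
ORDER BY id
"""
bands = conn.execute(query).fetchall()
print(bands)
```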


Grouping

| sample programs | comment |
| --- | --- |
| group by, having | Dataset[Row] does not have a having function; instead, there are effective proxy functions. Beware of the SQL query structure when using having in Spark SQL. |
| roll up | For hierarchical arithmetic. |
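
The caution about query structure can be sketched as follows, using Python's sqlite3 module as a stand-in for Spark SQL; the rides table and its rows are hypothetical. The sketch assumes that the proxy for having is a plain filter applied to the already aggregated result, which is what the second query mimics with a subquery. The third query shows why clause placement matters: where removes rows before grouping, whereas having removes groups after aggregation.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (station TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [("A", 10), ("A", 20), ("A", 30), ("B", 50), ("C", 5), ("C", 15)],
)

# having: filters groups after the aggregation
having_sql = """
SELECT station, COUNT(*) AS n, SUM(minutes) AS total
FROM rides
GROUP BY station
HAVING COUNT(*) >= 2
ORDER BY station
"""

# Assumed proxy: aggregate first, then filter the aggregated rows
proxy_sql = """
SELECT station, n, total
FROM (
    SELECT station, COUNT(*) AS n, SUM(minutes) AS total
    FROM rides
    GROUP BY station
) AS grouped
WHERE n >= 2
ORDER BY station
"""

# where + having: the where clause drops rows before any group is formed
where_then_having_sql = """
SELECT station, COUNT(*) AS n, SUM(minutes) AS total
FROM rides
WHERE minutes >= 15
GROUP BY station
HAVING COUNT(*) >= 2
ORDER BY station
"""

with_having = conn.execute(having_sql).fetchall()
with_proxy = conn.execute(proxy_sql).fetchall()
where_then_having = conn.execute(where_then_having_sql).fetchall()
print(with_having, with_proxy, where_then_having)
```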


Window Operations

| sample programs | comment |
| --- | --- |
| sum().over() | |
| rank(), dense_rank() | Read more about rank() and dense_rank() |
| row_number() | Read more about row_number() |
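
The difference between the three ranking functions is easiest to see on tied values. The sketch below uses Python's sqlite3 module as a stand-in for Spark SQL (window functions need SQLite 3.25 or newer); the scores table and its rows are hypothetical. After a tie, rank() leaves a gap, dense_rank() does not, and row_number() numbers every row regardless of ties. The row_number() window adds player as a tie-breaker purely to keep the output deterministic.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("ann", "red", 90),
        ("bob", "red", 90),
        ("cat", "red", 70),
        ("dan", "blue", 80),
        ("eve", "blue", 60),
    ],
)

query = """
SELECT player,
       team,
       points,
       RANK()       OVER (PARTITION BY team ORDER BY points DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY team ORDER BY points DESC) AS dense_rnk,
       ROW_NUMBER() OVER (PARTITION BY team ORDER BY points DESC, player) AS row_num,
       SUM(points)  OVER (PARTITION BY team) AS team_total
FROM scores
ORDER BY team, row_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```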



Upcoming

  • Combinatorial queries. Via cube().
  • Joins. More join examples, e.g., right outer join and full outer join. Also left semi join, which filters the left table w.r.t. keys present in the right table, and left anti join, which filters the left table for records that are NOT present in the right table.
  • Pivoting. Pivoting via Dataset objects is explicit and elegant. The DataReconfiguration class, which reconfigures the data used by the buildings project/module, uses the Dataset pivot function for a reconfiguration step.
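
The left semi and left anti joins named in the list above can be sketched with EXISTS and NOT EXISTS, which express the same filtering in standard SQL. Python's sqlite3 module is used as a stand-in because SQLite has no dedicated semi or anti join syntax; the customers and orders tables are hypothetical. Note that a semi join keeps each left row at most once, while an inner join repeats a left row for every match.

```python
import sqlite3

# Hypothetical tables and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cat")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "bike"), (1, "helmet"), (3, "lock")],
)


def names(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Semi join equivalent: left rows whose key is present in the right table
semi = names("""
    SELECT c.name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
""")

# Anti join equivalent: left rows whose key is NOT present in the right table
anti = names("""
    SELECT c.name FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.name
""")

# Inner join, for contrast: 'ann' appears once per matching order
inner = names("""
    SELECT c.name FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name
""")

print(semi, anti, inner)
```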