Post

Solving the Islands Problem in DAX

Solving the Island problem in DAX, to return a list of continuous Dates

A while ago the SQLBI guys release a blog providing a solution for Displaying a list of selected months. The input is a selection of filter month, and the output list these months, with contiguous sequences grouped together. This was done by looking at leading and lagging values to detect edges of continuous sequences. This problem falls into the classic Island Problem, and there is another nice way to solve this.

Simple Solution

The Island problem states that we want to group ranges of continuous sequences. There is also a similar but opposite Gaps problem, which detects missing values in a sequence. To solve the Island problem we can add a extra column with sequential integers, which increases at the same cadence.

If we have daily the sequential sequence would need to increase by 1 for each row. If we have weekly data the sequence would need to increase by 7.

We can calculate the difference between the two sequences. If there is no gap then the difference will be constant. If there is a gap the difference will change, and denoting a new island. This is demonstrated with this example.

DEFINE 
VAR DateFilter = 
	TREATAS(
		{
			DATE(2021, 11, 01),
			DATE(2022, 11, 02),
			DATE(2022, 11, 03),
			DATE(2022, 11, 06),
			DATE(2022, 11, 07),
			DATE(2022, 11, 08),
			DATE(2022, 11, 11),
			DATE(2022, 11, 12)	
		},
		'Dates'[Date]
	)

EVALUATE
CALCULATETABLE(
	ADDCOLUMNS(
		VALUES( Dates[Date] )
		,"@RowNumber"
			, RANK( DENSE, VALUES( Dates[Date] ), ORDERBY( Dates[Date], ASC ) )
		,"@Islands"
			,DATEDIFF(
				RANK( DENSE, VALUES( Dates[Date] ), ORDERBY( Dates[Date], ASC ) )
				,Dates[Date]
				,DAY 
			)
	)
	,DateFilter
)
Date@RowNumber@Islands
02/11/2022 00:00:00144866
03/11/2022 00:00:00244866
06/11/2022 00:00:00344868
07/11/2022 00:00:00444868
08/11/2022 00:00:00544868
11/11/2022 00:00:00644870
12/11/2022 00:00:00744870

As you can see there are 3 separate continuous sequences, and each group gets a distinct [@Island] value.

We can use this as a base for the final solution.

// Islands
VAR DataSequence = 
	ADDCOLUMNS(
		VALUES( Dates[Date] )
		, "@Islands"
		,DATEDIFF(
			RANK( DENSE, VALUES( Dates[Date] ), ORDERBY( Dates[Date], ASC ) )
			,Dates[Date]
			,DAY 
		)
	)
VAR Islands =	
	SUMMARIZE(
		DataSequence
		,[@Islands]
		,"IslandString"
			,VAR MinDate = FORMAT( MIN( Dates[Date] ), "mmm-yy" )
			VAR MaxDate = FORMAT( MAX( Dates[Date] ), "mmm-yy" )
			RETURN
			IF( MinDate <> MaxDate, MinDate & " - " & MaxDate, MaxDate )
	)
VAR result = CONCATENATEX ( islands, [IslandString], " | " )
RETURN
result

Initial Solution

Extended Solution

This works well for a given granularity, but what about if we want to show a mixture of selected days, months and years. We can extend our solution as follows.

// Island Extended
Islands = 
VAR DataSequence = 
	ADDCOLUMNS(
		SUMMARIZE(
			Dates
			,Dates[Year]
			,Dates[Month]
			,Dates[Date]
		)
		, "@Islands"
		,DATEDIFF(
			RANK( DENSE, VALUES( Dates[Date] ), ORDERBY( Dates[Date], ASC ) )
			,Dates[Date]
			,DAY 
		)
	)
VAR Islands =	
	SUMMARIZE(
		DataSequence
		,[@Islands]
		,"IslandString"
			,VAR MinDate =  MIN ( Dates[Date] )
			VAR MaxDate =   MAX ( Dates[Date] )
			VAR MinYear = 	CALCULATE( MIN( Dates[Date] ), ALLEXCEPT( Dates, Dates[Year] ))
			VAR MaxYear = 	CALCULATE( MAX( Dates[Date] ), ALLEXCEPT( Dates, Dates[Year] ))
			VAR MinMonth =  CALCULATE( MIN( Dates[Date] ), ALLEXCEPT( Dates, Dates[Month] ))
			VAR MaxMonth =  CALCULATE( MAX( Dates[Date] ), ALLEXCEPT( Dates, Dates[Month] ))
			VAR GranularityCheckStart = 
			SWITCH(
				true
				,MinDate = MinYear && MaxDate = MaxYear
					, CALCULATE( MAX( Dates[Year] ), Dates[Date] = MinDate )
				,MinDate = MinMonth && MaxDate = MaxMonth
					, FORMAT( CALCULATE( MAX( Dates[Month] ), Dates[Date] = MinDate ), "mmm-yy" )
				,MinDate
			)
			VAR GranularityCheckEnd = 
				SWITCH(
					true
					,MinDate = MinYear && MaxDate = MaxYear
						, CALCULATE( MAX( Dates[Year] ), Dates[Date] = MaxDate )
					,MinDate = MinMonth && MaxDate = MaxMonth
						, FORMAT( CALCULATE( MAX( Dates[Month] ), Dates[Date] = MaxDate ), "mmm-yy" )
					,MaxDate
				)
			return
			IF( GranularityCheckStart = GranularityCheckEnd, GranularityCheckStart, GranularityCheckStart  & " - " & GranularityCheckEnd )
	)
VAR result = CONCATENATEX ( islands, [IslandString], UNICHAR( 10 ), [@Islands], ASC )
RETURN
result

Extended Solution

This post is licensed under CC BY 4.0 by the author.