About simulating First/Last aggregate functions in MySQL

By Joao

I came across an interesting post at Xaprb.com on selecting the first and last row of a group in MySQL.  This solution builds on a comment in that post about using GROUP_CONCAT to achieve this.

As a practical example of the usage of the first and last aggregate functions, consider an application converting financial data between different timeframes.

The table structure might look like this:

CREATE TABLE EURUSD_3600 (
    datetime datetime NOT NULL,
    open float NOT NULL,
    low float NOT NULL,
    high float NOT NULL,
    close float NOT NULL,
    PRIMARY KEY (datetime)
);

Assuming data in the hourly  timeframe (3600 is the number of seconds in one hour):

INSERT INTO EURUSD_3600 SELECT '2001-01-03 00:00:00',0.9507,0.9505,0.9509,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 01:00:00',0.9506,0.9492,0.951,0.9496;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 02:00:00',0.9496,0.9495,0.9509,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 03:00:00',0.9504,0.9498,0.9508,0.9499;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 04:00:00',0.9499,0.9499,0.9507,0.9503;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 05:00:00',0.9503,0.9503,0.9506,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 06:00:00',0.9506,0.9505,0.9507,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 07:00:00',0.9505,0.9503,0.9513,0.9509;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 08:00:00',0.951,0.9468,0.951,0.9489;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 09:00:00',0.949,0.9489,0.9543,0.9539;

SQL is a suitable tool to group this data into a more compact timeframe, for instance, the 2-hour timeframe. By using a formatting function on the datetime field, a standard GROUP BY SQL statment can be used to achieve this:

CONCAT(
    year(datetime), '-',
    month(datetime), '-',
    day(datetime), ' ',
    floor(hour(datetime) / 2) * 2, ':00:00'
)

Calculating the low and high of the period is easy, using the standard min/max aggregate functions:


SELECT
    CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME) AS datetime,
    MIN(low) AS low,
    MAX(high) AS high
FROM EURUSD_3600
GROUP BY CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME)

GROUP_CONCAT can be used for the opening(first) and close(last), the trick is to be sure to use the ORDER BY clause in the group concatenation, otherwise the returned first and last records might not be as expected:


SUBSTRING_INDEX(
GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
',',
1
) as open

SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime),
',',
-1
) as close

GROUP_CONCAT returns a comma separated string of all groups values, then SUBSTRING_INDEX is used to return the first and last values of the group. This works as long as data is properly sorted and the resulting string fits in the GROUP_CONCAT buffer.

Note however that there is a problem with the preceding code when calculating the close value. SUBSTRING_INDEX looks from the end of the string, but the GROUP_CONCAT buffer has a limited size, and the it will be truncated if the concatenated rows exceed the buffer size, in which case the returned value will not be the last in the group as desired.

An easy solution is to reverse the sort order and fetch the first element instead:


SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
',',
1
) as close

MySQL will still throw a warning about truncated group_concat when there are too many rows in the source table, but this can be safely ignored for the purpose of this example.

The final query turns out as:


SELECT
    CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME) AS datetime,
    SUBSTRING_INDEX(
        GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
        ',',
        1
    ) as open,
    MIN(low) AS low,
    MAX(high) AS high,
    SUBSTRING_INDEX(
        GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
        ',',
        1
    ) as close
FROM EURUSD_3600
GROUP BY CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME);

 

Visit sunny St. George, Utah, USA