MySQL Query GROUP_CONCAT Over Multiple Rows
Posted by PeteGO on Stack Overflow. Published on 2012-12-06T22:26:50Z.
I'm extracting name and address data from generic question / answer data to build a normalised reporting database.
The query I have uses GROUP_CONCAT and works for an individual question set, but not for multiple sets.
To simplify, I'm using just forename and surname and just 3 records: 2 for one person and 1 for another. In reality there are more than 300,000 records.
Example of results with qs.Id = 1:
QuestionSetId    Forename           Surname
-------------------------------------------------------
1                Bob                Jones
Example of results with qs.Id IN (1, 2, 3):
QuestionSetId    Forename           Surname
-------------------------------------------------------
3                Bob,Bob,Frank      Jones,Jones,Smith
What I would like to see for qs.Id IN (1, 2, 3):
QuestionSetId    Forename           Surname
-------------------------------------------------------
1                Bob                Jones
2                Bob                Jones
3                Frank              Smith
So how can I make the 2nd example return a separate row for each set of name and address information?
I realise the current way the data is stored is "questionable" but I cannot change the way the data is stored.
I can get individual answers for multiple sets (query 3 below), but I'm not sure how to combine the other fields into the same row.
My simplified schema, which I cannot change:
CREATE TABLE StaticQuestion (
    Id INT NOT NULL,
    StaticText VARCHAR(500) NOT NULL);
CREATE TABLE Question (
    Id INT NOT NULL,
    Text VARCHAR(500) NOT NULL);
CREATE TABLE StaticQuestionQuestionLink (
    Id INT NOT NULL,
    StaticQuestionId INT NOT NULL,
    QuestionId INT NOT NULL,
    DateEffective DATETIME NOT NULL);
CREATE TABLE Answer (
    Id INT NOT NULL,
    Text VARCHAR(500) NOT NULL);
CREATE TABLE QuestionSet (
    Id INT NOT NULL,
    DateEffective DATETIME NOT NULL);
CREATE TABLE QuestionAnswerLink (
    Id INT NOT NULL,
    QuestionSetId INT NOT NULL,
    QuestionId INT NOT NULL,
    AnswerId INT NOT NULL,
    StaticQuestionId INT NOT NULL);
Some example data, for forename and surname only:
INSERT INTO StaticQuestion (Id, StaticText)
VALUES (1, 'FirstName'),
       (2, 'LastName');
INSERT INTO Question (Id, Text)
VALUES (1, 'What is your first name?'),
       (2, 'What is your forename?'),
       (3, 'What is your Surname?');
INSERT INTO StaticQuestionQuestionLink (Id, StaticQuestionId, QuestionId, DateEffective)
VALUES (1, 1, 1, '2001-01-01'),
       (2, 1, 2, '2008-08-08'),
       (3, 2, 3, '2001-01-01');
INSERT INTO Answer (Id, Text)
VALUES (1, 'Bob'),
       (2, 'Jones'),
       (3, 'Bob'),
       (4, 'Jones'),
       (5, 'Frank'),
       (6, 'Smith');
INSERT INTO QuestionSet (Id, DateEffective)
VALUES (1, '2002-03-25'),
       (2, '2009-05-05'),
       (3, '2009-08-06');
INSERT INTO QuestionAnswerLink (Id, QuestionSetId, QuestionId, AnswerId, StaticQuestionId)
VALUES (1, 1, 1, 1, 1),
       (2, 1, 3, 2, 2),
       (3, 2, 2, 3, 1),
       (4, 2, 3, 4, 2),
       (5, 3, 2, 5, 1),
       (6, 3, 3, 6, 2);
In case SQLFiddle is down, here are the 3 queries from the examples I've linked to:
1: Working query, but only on 1 set of data.
SELECT MAX(QuestionSetId) AS QuestionSetId,
       GROUP_CONCAT(Forename) AS Forename,
       GROUP_CONCAT(Surname) AS Surname
FROM (SELECT
          x.QuestionSetId,
          CASE x.StaticQuestionId WHEN 1 THEN Text END AS Forename,
          CASE x.StaticQuestionId WHEN 2 THEN Text END AS Surname
      FROM (SELECT (SELECT link.StaticQuestionId
                    FROM StaticQuestionQuestionLink link
                    WHERE link.Id = qa.QuestionId
                      AND link.DateEffective <= qs.DateEffective
                      AND link.StaticQuestionId IN (1, 2)
                    ORDER BY link.DateEffective DESC LIMIT 1) AS StaticQuestionId,
                   a.Text,
                   qa.QuestionSetId
            FROM QuestionSet qs
            INNER JOIN QuestionAnswerLink qa ON qs.Id = qa.QuestionSetId
            INNER JOIN Answer a ON qa.AnswerId = a.Id
            WHERE qs.Id IN (1)) x) y
2: Working query, but with undesired results on multiple sets of data.
SELECT MAX(QuestionSetId) AS QuestionSetId,
       GROUP_CONCAT(Forename) AS Forename,
       GROUP_CONCAT(Surname) AS Surname
FROM (SELECT
          x.QuestionSetId,
          CASE x.StaticQuestionId WHEN 1 THEN Text END AS Forename,
          CASE x.StaticQuestionId WHEN 2 THEN Text END AS Surname
      FROM (SELECT (SELECT link.StaticQuestionId
                    FROM StaticQuestionQuestionLink link
                    WHERE link.Id = qa.QuestionId
                      AND link.DateEffective <= qs.DateEffective
                      AND link.StaticQuestionId IN (1, 2)
                    ORDER BY link.DateEffective DESC LIMIT 1) AS StaticQuestionId,
                   a.Text,
                   qa.QuestionSetId
            FROM QuestionSet qs
            INNER JOIN QuestionAnswerLink qa ON qs.Id = qa.QuestionSetId
            INNER JOIN Answer a ON qa.AnswerId = a.Id
            WHERE qs.Id IN (1, 2, 3)) x) y
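For what it's worth, query 2 returns a single row because an aggregate such as GROUP_CONCAT with no GROUP BY clause treats the whole result as one group. Below is a minimal sketch of one possible fix, not a tested solution for the real 300,000-record data: it groups by QuestionSetId so each set is aggregated separately. Two things in it are my assumptions rather than part of the original query: the CASE expressions are folded directly into GROUP_CONCAT (which skips NULL values), and the link lookup matches on link.QuestionId instead of link.Id, which happens to give the same result on the sample data because the two columns hold the same values there.

```sql
-- Sketch only: one output row per question set instead of one row overall.
SELECT x.QuestionSetId,
       -- GROUP_CONCAT ignores NULLs, so only the matching answer survives.
       GROUP_CONCAT(CASE x.StaticQuestionId WHEN 1 THEN x.Text END) AS Forename,
       GROUP_CONCAT(CASE x.StaticQuestionId WHEN 2 THEN x.Text END) AS Surname
FROM (SELECT qa.QuestionSetId,
             a.Text,
             -- Latest static-question mapping in force when the set was taken.
             -- ASSUMPTION: matching on link.QuestionId (the original uses link.Id).
             (SELECT link.StaticQuestionId
              FROM StaticQuestionQuestionLink link
              WHERE link.QuestionId = qa.QuestionId
                AND link.DateEffective <= qs.DateEffective
                AND link.StaticQuestionId IN (1, 2)
              ORDER BY link.DateEffective DESC
              LIMIT 1) AS StaticQuestionId
      FROM QuestionSet qs
      INNER JOIN QuestionAnswerLink qa ON qs.Id = qa.QuestionSetId
      INNER JOIN Answer a ON qa.AnswerId = a.Id
      WHERE qs.Id IN (1, 2, 3)) x
GROUP BY x.QuestionSetId
ORDER BY x.QuestionSetId;
```

On the sample data above this should give the three desired rows (1 Bob Jones, 2 Bob Jones, 3 Frank Smith). If a set ever held more than one answer for the same static question, GROUP_CONCAT would still join them with commas within that set's row.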
3: Working query on multiple sets of data, but only on 1 field (answer).
SELECT
    qs.Id AS QuestionSet,
    a.Text AS Answer
FROM
    QuestionSet qs
    INNER JOIN QuestionAnswerLink qalink ON qs.Id = qalink.QuestionSetId
    INNER JOIN StaticQuestionQuestionLink sqqlink ON qalink.QuestionId = sqqlink.QuestionId
    INNER JOIN Answer a ON qalink.AnswerId = a.Id
WHERE
    sqqlink.StaticQuestionId = 1 /* FirstName */
    AND sqqlink.DateEffective =
        (SELECT DateEffective
         FROM StaticQuestionQuestionLink
         WHERE StaticQuestionId = 1
           AND DateEffective <= qs.DateEffective
         ORDER BY DateEffective DESC
         LIMIT 1)
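The join form of query 3 could perhaps be extended to several fields with conditional aggregation. The following is a rough sketch under assumptions, not a verified answer: it keeps query 3's joins, widens the filter to both static questions, correlates the date subquery on the outer row's StaticQuestionId (my change), and pivots with MAX(CASE ...) grouped by question set. The column aliases Forename and Surname come from the desired output above; the alias l is hypothetical.

```sql
-- Sketch only: pivot forename and surname into one row per question set.
SELECT qs.Id AS QuestionSetId,
       MAX(CASE WHEN sqqlink.StaticQuestionId = 1 THEN a.Text END) AS Forename,
       MAX(CASE WHEN sqqlink.StaticQuestionId = 2 THEN a.Text END) AS Surname
FROM QuestionSet qs
INNER JOIN QuestionAnswerLink qalink ON qs.Id = qalink.QuestionSetId
INNER JOIN StaticQuestionQuestionLink sqqlink ON qalink.QuestionId = sqqlink.QuestionId
INNER JOIN Answer a ON qalink.AnswerId = a.Id
WHERE sqqlink.StaticQuestionId IN (1, 2) /* FirstName, LastName */
  -- ASSUMPTION: keep only the link that is the latest one for its static
  -- question as of the question set's date, as query 3 does for FirstName.
  AND sqqlink.DateEffective =
      (SELECT MAX(l.DateEffective)
       FROM StaticQuestionQuestionLink l
       WHERE l.StaticQuestionId = sqqlink.StaticQuestionId
         AND l.DateEffective <= qs.DateEffective)
GROUP BY qs.Id
ORDER BY qs.Id;
```

Like query 3, this drops an answer whose question was not the most recent mapping for its static question when the set was taken, so it may need adjusting if older question versions can still be answered.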
© Stack Overflow or respective owner