Any MySQL gurus out there, who also know about WordPress’s database structure? I have a question for you.
I want to write an SQL query that will return a summary table where the rows are our authors, the columns are the categories we have for posts, and the cells show the number of times author n has published a post in category m. I know nothing of SQL queries, especially when it comes to complex ones that involve joins between tables. In this case, I think the relevant tables are as follows:
wp-categories, with fields cat-ID and category-nicename (and others)
wp-posts, with fields ID, post-author (and others)
wp-users, with fields ID, user-nicename (and others)
wp-post2cat, with fields post-id and category-id.
Any help is much appreciated.
{ 24 comments }
Cranky Observer 04.22.05 at 11:32 am
Is the number of categories fixed for a reasonable length of time (e.g. months), or is it dynamic day-by-day? Would you be willing to edit the query periodically to keep up with changes in the categories?
Turning summarized rows into columns is one of the harder problems in SQL. If you can maintain the query so that it ‘knows’ the categories in advance, it is doable (in Oracle at least), but if you want the number of columns to be dynamic with no editing of the query it is a lot harder. Unless there is a built-in feature of MySQL which does this, which is possible.
Cranky
Cranky Observer 04.22.05 at 11:37 am
Here are some references:
http://www.dbforums.com/t1146187.html
The Definitive Guide to MySQL, 2nd Ed., is said to have a “SQL Recipies” chapter with examples of a pivot table, but from the review it looks like it starts assuming a high level of knowledge.
abb1 04.22.05 at 11:48 am
Run a query that returns Author, Category and Number of Posts, import it into Excel and do PivotTable.
Kieran Healy 04.22.05 at 12:01 pm
Is the number of categories fixed for a reasonable length of time (e.g. months), or is it dynamic day-by-day? Would you be willing to edit the query periodically to keep up with changes in the categories?
I only need to do it once: the goal is to generate a bit of data to illustrate correspondence analysis, not to have a dynamic query that needs to be run every day or anything.
Barry 04.22.05 at 12:01 pm
Excel (and Access) can link to an external file, so once it’s set up, you merely have to re-reun the query, and refresh the pivot table.
Kieran Healy 04.22.05 at 12:02 pm
Run a query that returns Author, Category and Number of Posts, import it into Excel and do PivotTable.
Even this may be beyond my meagre SQL skilz, but you’re right that a long-form table of this sort could easily be reshaped in Excel or R.
Barry 04.22.05 at 12:15 pm
Don’t try for number of posts. Just list the columns Author, Category, and any other information that you might want to use (such as date). It should be something like “select Author, Category, Date from TableName”.
The do the counting in the pivot table.
Cranky Observer 04.22.05 at 12:16 pm
Does wp-posts have one record per post? That is, is each row in wp-posts something like:
ID author [post title] [post text]
?
Cranky
Jeff R. 04.22.05 at 12:20 pm
See this: http://lists.mysql.com/mysql/182361. This is a little simpler than your problem. You can create a temp table with columns like authorID, catID and count then do an insert into table.
Jayanne 04.22.05 at 12:22 pm
OT, but I could do with some blogging software help! Is there a help forum for Typepad people? — Blogger and Tripod are so slow it’s a struggle to post there
TIA
Cranky Observer 04.22.05 at 12:25 pm
Here is an attempt, trying to be as vendor-neutral as I can. Formatting will get screwed up when I post though.
select wp-posts.post-author,
wp-post2cat.category-id,
count(*)
from wp-posts,
wp-post2cat
where wp-posts.ID = wp-post2cat.post-id
group by wp-posts.post-author,
wp-post2cat.category-id
order by wp-posts.post-author,
wp-post2cat.category-id
Why don’t you see if that works, and if so I can add in the fancy names if you want.
Cranky
Scott Swank 04.22.05 at 12:30 pm
Kieran,
Start out with the following. I won’t guarantee that it’s valid mysql syntax (though it is correct sql syntax) — but it should be close enough that anyone that’s worked with mysql even a bit can get it the rest of the way to a correct phrasing.
SELECT u.user-nicename,
c.category-nicename
FROM wp-users u,
wp-posts p,
wp-post2cat pc,
wp-categories c
WHERE p.post-author = u.id
AND pc.post-id = p.id
AND c.cat-id = pc.category-id;
This should give you all of the author/category pairings. Now to summarize them and count combinations you change this to the following:
SELECT u.user-nicename,
c.category-nicename,
COUNT(*) AS post-cnt
FROM wp-users u,
wp-posts p,
wp-post2cat pc,
wp-categories c
WHERE p.post-author = u.id
AND pc.post-id = p.id
AND c.cat-id = pc.category-id
GROUP BY
u.user-nicename,
c.category-nicename;
If you’ve gotten this far, then you almost certainly want to simply pull the results into a tool like Excel and pivot things from there. If you’re feeling bold, then you can…
SELECT u.user-nicename,
COUNT(DECODE(c.category-nicename, ‘academia’, ‘x’)) AS academia-cnt,
COUNT(DECODE(c.category-nicename, ‘blogging’, ‘x’)) AS blogging-cnt,
COUNT(DECODE(c.category-nicename, ‘etc’, ‘x’)) AS etc-cnt
FROM wp-users u,
wp-posts p,
wp-post2cat pc,
wp-categories c
WHERE p.post-author = u.id
AND pc.post-id = p.id
AND c.cat-id = pc.category-id
GROUP BY
u.user-nicename;
This is saying: 1) group by user, 2) count occurances of “this funny decode thing”. I won’t guarantee that mysql supports the function decode(), but here’s what it does so that you can look for a suitable replacement. It maps a value, in our case category, to a list of pairs of values. In the first case we want to compare the category with ‘academia’ and if it matches then we count up ‘x’, if it does not match then we count up nothing (called NULL in databases). So this counts the number of occurances of ‘academia’. The next column similarly counts the number of occurances of ‘blogging’, and you would add additional columns similiarly.
As the cranky indicated though, you’ll have to keep this up to date manually as you add/remove categories.
Cheers,
Scott
Scott Swank 04.22.05 at 12:33 pm
Hmm, word press doesn’t like my sql statements and mangled them. I’ll e-mail them to you directly.
Kieran Healy 04.22.05 at 12:35 pm
Hey Cranky, that works. Cool. How do I include the fancy names?
KCinDC 04.22.05 at 12:37 pm
This gets the needed data (I just tried it on a WordPress blog I manage):
SELECT u.user_nickname, c.category_nicename, COUNT(*) FROM `wp_users` u INNER JOIN wp_posts p ON u.ID = p.post_author INNER JOIN wp_post2cat pc ON p.ID = pc.post_id INNER JOIN wp_categories c ON pc.category_id = c.cat_ID GROUP BY u.user_nickname, c.category_nicename
It’s not in the table format you want, but I’d do that part in PHP or something rather than trying to do it in the query — which you could do with something long and hard-coded like
SELECT u.user_nickname,
COUNT( category_nicename = ‘category_name1’) AS category_name1_count,
COUNT( category_nicename = ‘category_name2’) AS category_name2_count,
…
FROM `wp_users` u INNER JOIN wp_posts p ON u.ID = p.post_author INNER JOIN wp_post2cat pc ON p.ID = pc.post_id INNER JOIN wp_categories c ON pc.category_id = c.cat_ID GROUP BY u.user_nickname
Cranky Observer 04.22.05 at 12:43 pm
Scott’s example looks good. Just check to be sure it isn’t giving you multiple copies of the same summary (he used a different approach from what I would have used, and I don’t have a MySQL handy to test). If it doesn’t work let me know.
_A Visual Introduction to SQL_ by Chappel et. al. is a very straightforward, readable, and usable introduction to this kind of stuff. It has been in print for 20 years so I guess others think the same! I actually prefer the 1st edition (blue cover) to the 2nd edition (white and green cover) if you can find one used.
Cranky
KCinDC 04.22.05 at 12:43 pm
Ugh, the COUNT stuff gets eaten. Not sure what’s triggering it. In the first query it should be COUNT[*] with parentheses in place of brackets, and in the second it’s supposed to be COUNT[ c.category_nicename = ‘category_name1’ ], for example.
Crossing my fingers that the brackets survive posting.
KCinDC 04.22.05 at 12:44 pm
Argh, in the second query it should actually be SUM, not COUNT.
abb1 04.22.05 at 12:46 pm
Yeah, I don’t do MySQL, but I suspect that’s right: in MySQL you probably need to do explicit joins rather than FROM table_name, table_name, table_name WHERE.
Scott Swank 04.22.05 at 12:54 pm
You can also head over to this site:
http://pipetalk.quest-pipelines.com/default.asp?boardid=mysql
And ask questions about mysql that a very knowledgable fellow named Andrew Gilfrin (among others) will answer for you. Tell him I sent you. :)
jet 04.22.05 at 3:11 pm
Anyone here had good vibes using MySQL 5.0?
KCinDC 04.22.05 at 3:31 pm
5.0? We’re still cautiously examining 4.1.
Cranky Observer 04.22.05 at 5:11 pm
Kieran,
Problem solved? Or do you need more help?
Cranky
Kieran Healy 04.22.05 at 6:56 pm
Problem solved! Thanks a million everyone — followup “here”:https://crookedtimber.org/2005/04/22/crooked-timbers-field-of-positions/.
Comments on this entry are closed.