drupal & mysql utf8 issue
Let's say you do a mysql upgrade on your Drupal server and your Drupal nodes get garbled characters in their content like this:
 1. Glen tells the world what he'll do for a client, right in his bio, (placing each listed home on 30 web sites). Do the same and use YOUR bio on PruSourceOne.com, Obeo, Realtor.com and your own web site
Obviously,  doesn't belong there. It is likely you are having this Drupal/mysql 4.0/4.1 problem.
I attempted to troubleshoot this using both methods, and both failed. Ultimately, by restoring to a normal mysqldump(from the day prior) I managed to get to the point where the only obscure characters where ÂÂ.
I ultimately removed these by literally opening the *.sql dump with gedit and doing a search/replace on those characters. I would have though this would have fixed it, but surprisingly I received the same error that I got when I follow method B on the drupal blog:
ERROR 1062 (23000) at line 2616: Duplicate entry 'content/some-clean-urlÂ' for key 2
Long story short, I tried to check 2616 when I first saw this but quanta was crashing. However, gedit was playing nice and I manage to see the line in question, it was a alias table entry. I think what was happening is that when the mysql DB was 'normalized' for the utf8 charset, it inadvertantly made two previous dissimilar aliases identical. So I couldn't do an import until I removed the dupe aliases. Make sense?
UPDATE 9/24/09
I am very frustrated right now. I have it fixed for the most part(except for one user's content who writes in Word then copies and pastes into tinMCE, and here is a recap of my day.
It became apparent to me that the body of all nodes became just a few words or first sentence. A manual query of the DB, similar to this:
SELECT r.title, r.body FROM node n INNER JOIN users u ON u.uid = n.uid INNER JOIN node_revisions r ON r.vid = n.vid WHERE n.nid = 554; where 554 is the node number
would verify the short body so I realized that my search and replace somehow magically broke my *.sql file in such a way that it would still work, but trim the content to be very short. My first thought was did I also delete the latin1 EOLs that are needed to make the *.sql wore: k properly? So I pulled up an example line in a editor like this one:
<strong>Upload Snippets & Photos of Local Restaurants or Businesses to your FaceBook and Twitter pages.</strong>
Now, I decided to delete and to push the sql into a fresh DB, which I did and sure enough, the entire node became "Upload Snippets & Photos of Local Restaurants or Businesses" So obviously I wasn't breaking and *.sql file formatting, this was actual content, or a SQL value if you will. At this point I gave up on my S&R attempts and pursued othered avenues.
First thought via a friend was open in gEdit, and save as UTF-8. This just turned the characters into ones with the weird boxes interspersed between them. Not good. Next up, GNU Recode. After 'fixing' an odd line of code, it compiled but I got nowhere using the command line and the (seemingly) lacking documentation.
I was fed up, and dropped my table, recreated, added priveleges, BUT left out this:
ALTER DATABASE drupal DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
which you will recognize from the Drupal guide about this issue. I restored to a backup from the ninth, and it loaded well. Well enough for me to be content. Yet, I thought I would try to load something from the 20th(which is what I started working with), and it was much more littered with random A characters. Not sure what would cause this, but I proceeded to load a backup from the 15th in the same manner, it was clearly less littered with random As than the 20th, but much more than the 9th. I stuck with the ninth and called it a frustrated day.
Why the slow increase of littered As? Was my brain fried and I was hallucinating? Was my DB slowly degenerating yet still sending out backups? Did the new version of mySQL take its time converting the latin1 to utf-8? I just don't know. Turns out I could have avoided the whole mysql/drupal 4.0/4.1 blog and just done a simple restore to an earlier date. At least that is what I am seeing now. Ugggh.