Need testers for some big changes on staging

Aug 28, 2024 10:10 am
Keleth says:
So looks like something in how MySQL backed up the data messed up the emojis. I'm not really sure what to do. I'll have to google.
FlyingSucculent says:
It looks like some other symbols are also messed up. In one of my games these occurred:
Quote:
Our captain makes a hard leftâ€" (originally: left—)
Pinky’s (originally: Pinky’s)
Edit: Also, I have a character named Remíe (with an accented í), and while the name is corrupted in text of posts, the name on the sheet isn't. (It's a 13th Age sheet, if it matters.)
So unicode related? Hopefully it is as easy as getting everyone to talk UTF8.
Aug 28, 2024 10:12 am
FlyingSucculent says:
Some weirdness with bookmarked characters. ... haven't changed the bookmarks in a while, so not an old data issue. ... And game bookmarks are fully gone ...
Both my Games and Characters Bookmarks are not the same on staging as on prod. I also have not changed them in a while. Both menus have entries in them, and are things I had bookmarked at one time.
Aug 28, 2024 12:09 pm
• On staging, my Subscriptions list is empty.
Aug 28, 2024 8:06 pm
vagueGM says:
So unicode related? Hopefully it is as easy as getting everyone to talk UTF8.
Yah, unicode issues and I don't know what. The current server has all its tables set to UTF8, which is what existed at that point. The new server is using utf8mb4, which supports emojis. So how is our current server supporting emojis? I checked, and we don't do anything special when storing or printing. Yet it works, when every resource I can find online says it shouldn't. And all the initial answers I've gotten say it shouldn't work. In short, I'm stuck. I even got my Stack Overflow question marked duplicate to a related but not the same question.

If anyone has experience with unicode and PHP/MySQL, please reach out, because I don't know what to do.
Aug 28, 2024 8:31 pm
Sounds like witchcraft to me. D: I wish I could help.
Out of curiosity, since sheets aren't affected by this issue, are they stored by some other method?
Aug 28, 2024 8:43 pm
@Adam might know.

Adam, sorry if I'm misremembering, but I think some of the features you added involved emojis. Do you remember if they are being stored in a special way, or need to be dealt with in a special way?
Aug 28, 2024 8:44 pm
Keleth says:
... The current server has all its tables set to UTF8, which is what existed at that point. The new server is using utf8mb4 ...
FlyingSucculent says:
... Pinky’s (originally: Pinky’s) ...
So presumably prod is using utf8mb3? utf8mb3 stores each character in up to three bytes, and we are getting three characters ' ’ ' instead of the single ' ’ ', which is U+2019.

Either the export from the old database or, more likely, the import into the new database is treating these three byte characters as three single byte CP-1252 code page characters, which is what we are seeing displayed on staging.
Keleth says:
... which supports emojis. So how is our current server supporting emojis? ...
Presumably it only supports three byte long emojis? Or is it storing those emojis as three bytes and rendering them 'combined'?
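
On the byte-length question, here is a quick Python check. It is only a sketch, and the sample characters are my own picks, not taken from the site's data. utf8mb3 holds at most three bytes per character, so anything that needs four bytes in UTF-8 should not fit.

```python
# UTF-8 byte length per character. utf8mb3 can hold at most 3 bytes per
# character; utf8mb4 can hold 4. The sample characters are illustrative only.
def utf8_len(ch: str) -> int:
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    # True when every character needs 3 bytes or fewer in UTF-8.
    return all(utf8_len(ch) <= 3 for ch in text)


samples = {
    "right single quote": "\u2019",
    "accented i": "\u00ed",
    "white smiling face": "\u263a",
    "grinning face emoji": "\U0001F600",
}
for name, ch in samples.items():
    print(f"{name}: U+{ord(ch):04X}, {utf8_len(ch)} bytes, "
          f"fits utf8mb3: {fits_utf8mb3(ch)}")
```

Most emojis sit above U+FFFF and need four bytes, so "three byte emojis" would only cover a handful of older symbols such as U+263A.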
FlyingSucculent says:
... since sheets aren't affected by this issue, are they stored by some other method?
That seems worth looking into. If they are stored differently then it may not help, but if they are exporting or importing differently then it could be an answer?

Are those from the mongodb? If so that indicates the old mysql is the source of the issue?

How did you import the data?

Maybe try adding --default-character-set=utf8 to the mysql import command? Else we might have to look into convincing the old server to export it in a way we can use?
Aug 28, 2024 8:47 pm
vagueGM says:
FlyingSucculent says:
... since sheets aren't affected by this issue, are they stored by some other method?
That seems worth looking into. If they are stored differently then it may not help, but if they are exporting or importing differently then it could be an answer?

Are those from the mongodb? If so that indicates the old mysql is the source of the issue?

How did you import the data?

Maybe try adding --default-character-set=utf8 to the mysql import command? Else we might have to look into convincing the old server to export it in a way we can use?
Yah, everything copied from mongo is fine, because mongo doesn't really bother with encoding issues; you feed it stuff, it saves it. The issue is strictly the MySQL data being copied over.

I took the latest database backup (a sql file), changed all the tables to be utf8mb4, and then ran them as sql queries. The backup is already using a default character set to export. I'm not sure what other "right way" to export may be. Is it an issue on the import side? I don't know.
Aug 28, 2024 8:50 pm
The import may be thinking those malformed three byte utf8 characters are three characters. Try telling it you really mean it to treat everything as utf8 with --default-character-set=utf8 .
Aug 28, 2024 9:19 pm
vagueGM says:
The import may be thinking those malformed three byte utf8 characters are three characters. Try telling it you really mean it to treat everything as utf8 with --default-character-set=utf8 .
Already doing that on export and import.
Aug 28, 2024 9:24 pm
Keleth says:
... Already doing that on export and import.
OK, so if that is not working, maybe

mysqldump --skip-set-charset --default-character-set=utf8mb3

to be explicit on the export?

Or experiment with

mysqldump --skip-set-charset --default-character-set=latin1

so the old server does not try to do anything with the three bytes and leaves them 'as is'?
And then with

mysql --default-character-set=utf8mb4

or

mysql --default-character-set=utf8mb3

to be explicit on the import even if using latin1 on the export?

I don't know if these import options affect how the data is stored, or just how it is treated while being imported. If the utf8mb3 import succeeds, you should be able to ask the database afterwards which character set it ended up with.
Aug 28, 2024 10:09 pm
So I tried --default-character-set=utf8 on export and --default-character-set=utf8mb3 on import. No luck. I'll try the other two options.
Aug 28, 2024 11:30 pm
Seems like something on staging is set to a single-byte encoding (Windows-1252, which MySQL calls latin1).

In UTF-8, "right single quotation mark" aka ’ is encoded as 3 bytes:
0xe2
0x80
0x99

In Windows-1252, those three bytes encode:
â
€
™

respectively.

So it seems like the data is saved properly in the database, but it's being interpreted as Windows-1252 on staging.
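
To double-check the byte reasoning, this small Python sketch encodes the text as UTF-8 and then deliberately misreads the bytes as Windows-1252. The function name is made up for illustration. It reproduces the symptom, but it does not prove where in the pipeline the misreading happens.

```python
# Reproduce the corruption: encode text as UTF-8, then misread those bytes
# as Windows-1252 (cp1252), one byte per character.
def misread_as_cp1252(text: str) -> str:
    return text.encode("utf-8").decode("cp1252")


original = "Pinky\u2019s"            # the apostrophe is U+2019
raw = original.encode("utf-8")
print(raw.hex(" "))                  # the UTF-8 bytes, space separated
print(misread_as_cp1252(original))   # what a cp1252 reader would display
```

One caveat: Python's cp1252 codec rejects five byte values that Windows-1252 leaves undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d), so some strings, including some Cyrillic text, will raise an error here instead of producing garbage.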
Aug 29, 2024 12:20 am
I guess the other possibility is that it is being switched during the data migration. Can you read the raw data? If it migrated properly, but it's being interpreted wrong, then the raw data for the incorrect ’ would be the 3 bytes above. If the data migrated incorrectly then the raw data would be:
0xc3
0xa2
0xe2
0x82
0xac
0xe2
0x84
0xa2

That might help narrow down where the issue is.
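
If you can get the raw bytes out as hex (MySQL has a HEX() function, so something like SELECT HEX(some_column) should do it; the column name here is hypothetical), a short Python helper can tell the two cases apart. This is a sketch that only looks for the right single quote, assuming the row you pick contains one.

```python
# Classify a hex dump of stored text containing a right single quote (U+2019).
# CLEAN is the correct UTF-8 encoding. DOUBLE is what results when the three
# UTF-8 bytes were read as cp1252 and then re-encoded as UTF-8.
CLEAN = "\u2019".encode("utf-8")                                    # e2 80 99
DOUBLE = "\u2019".encode("utf-8").decode("cp1252").encode("utf-8")  # 8 bytes


def classify(hex_dump: str) -> str:
    data = bytes.fromhex(hex_dump)  # accepts upper or lower case hex
    if DOUBLE in data:
        return "double-encoded in storage (migration problem)"
    if CLEAN in data:
        return "stored correctly (connection or display problem)"
    return "no right single quote found"


print(DOUBLE.hex(" "))
print(classify("50696E6B79E2809973"))
```

Feeding it the hex of a post that shows the broken apostrophe on staging should say which side of the migration to look at.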
Last edited August 29, 2024 12:33 am
Aug 29, 2024 12:29 am
Unrelated to the Unicode issue, but I just noticed that some originally private games are listed as public on staging. They aren't actually public: you can't read the forums and the posts don't appear among the recent public posts, but it is a little weird. (It includes games which have never been public.)

PS: probably expected at this point, but my test string of Cyrillic did become this:
Тест, ÑŽ, щ, ц, Ñ‹, Ñ„. (Originally: Тест, ю, щ, ц, ы, ф.)
Aug 29, 2024 7:53 am
Chalrytharendir says:
@Adam might know.

Adam, sorry if I'm misremembering, but I think some of the features you added involved emojis. Do you remember if they are being stored in a special way, or need to be dealt with in a special way?
I seem to remember that the db supported emoji and it was the code that needed changing.

Looking at these release notes: Release notes: 18th September 2021

https://i.imgur.com/Q6fX2kI.png

...and the changes that were merged at around that time...

It seems a rogue utf8_decode was in there and I removed it.
https://i.imgur.com/579kXOa.png

I'd check the code for instances of utf8_decode and see if they're needed.
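
For anyone who wants to do that check without an IDE, here is a rough Python sketch that walks a source tree and lists calls to utf8_decode and utf8_encode in .php files. The function name and the idea of pointing it at a repo root are my own assumptions. PHP's utf8_decode converts UTF-8 to ISO-8859-1 and replaces anything it can't represent with "?", which is why a stray call would break emojis.

```python
import re
from pathlib import Path

# Matches calls to PHP's utf8_decode() or utf8_encode().
PATTERN = re.compile(r"\butf8_(?:de|en)code\s*\(")


def find_utf8_calls(root):
    """Return (path, line number, stripped line) for each match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_utf8_calls on the checkout directory and printing each tuple gives a list to review by hand. It is a plain text match, so it will also flag calls inside comments.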
Aug 29, 2024 11:15 am
FlyingSucculent says:
... my test string of Cyrillic did become this:
Тест, ÑŽ, щ, ц, Ñ‹, Ñ„. (Originally: Тест, ю, щ, ц, ы, ф.)
Did you enter that as new text on staging, after the import, or was that already in the database?

I meant to try entering new unicode to see if it worked, which would say it was something with the export/import, but have not been able to access staging for a while.

If newly entered emojis or unicode still don't work then we are barking up the wrong tree with the database migration and it is probably something in the code that was there to deal with the old/wrong utf8 implementation in mysql5, and that code is now breaking things where it should stay out of the way. Thanks Adam [ref].
Aug 29, 2024 12:55 pm
FlyingSucculent says:
PS: probably expected at this point, but my test string of Cyrillic did become this:
Тест, ÑŽ, щ, ц, Ñ‹, Ñ„. (Originally: Тест, ю, щ, ц, ы, ф.)
I guess this is sort of a clue? Though I don't understand why. The mongo data (like character sheets) is being imported from JSON files via PHP and works. So why are values being input through PHP also not storing correctly? Or rendering correctly?

Unfortunately, I haven't been able to figure out how to view the raw data from a SQL query or in DBeaver.
Aug 29, 2024 1:05 pm
Keleth says:
... Unfortunately, I haven't been able to figure out how to view the raw data from a SQL query or in DBeaver.
Can you check what the command line mysql tool shows? If it is outputting correct characters then we know it is PHP messing with things. If it is outputting the strangeness we see, then we will need to work out how to replace those character sequences with the correct Unicode code points.
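
If it does come to replacing the sequences, the reverse mapping can be sketched in Python like this. It is a sketch only: the helper names are made up, and it assumes the damage is exactly one round of UTF-8 bytes read as Windows-1252.

```python
# Undo one round of "UTF-8 bytes misread as single-byte text".
# cp1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D);
# the fallback ASSUMES those were carried through as the same-numbered
# code points, which may not match what the server actually did.
def to_single_bytes(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise
            out.append(ord(ch))
    return bytes(out)


def repair(text: str) -> str:
    try:
        return to_single_bytes(text).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not mojibake of this kind; leave untouched


print(repair("Pinky\u00e2\u20ac\u2122s"))
```

It works on the whole string at once, so a field that mixes good and broken text is left unchanged. Try it on a copy of the data first.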

I haven't looked at the sql, do you know what query would output the data for this page? Or from one of the thread names from that page?
Aug 29, 2024 1:06 pm
vagueGM says:
I meant to try entering new unicode to see if it worked, which would say it was something with the export/import, but have not been able to access staging for a while.

If newly entered emojis or unicode still don't work then we are barking up the wrong tree with the database migration and it is probably something in the code that was there to deal with the old/wrong utf8 implementation in mysql5, and that code is now breaking things where it should stay out of the way. Thanks Adam [ref].
I'm gonna stop trying various backup/restore methods. I'm not seeing anything different across the board, which is telling me it may not be MySQL itself.

The PHP docs say it should handle UTF-8 by default, but I'm going to try implementing it directly. As for the code, I started by looking there, and the functions that exist to save data/display data don't do anything that should interfere with emojis. The PHP version is the same on both envs.