Sunday, December 27, 2009

de-latinisation of the web

In case you didn't know, ICANN recently de-latinised the web.


Some background:
http://online.wsj.com/article/SB125664117322309953.html?mod=WSJ_hpp_LEFTTopStories#printMode

Now that non-Latin domain names are live, how do you think this is going to be handled moving forward? Is this going to be a real problem? For example, I own the domains
http://www.mobileappstore.com/
http://www.mobileappstore.net/
http://www.mobileappstore.org/
http://www.mobileappstore.mobi/
(Yeah, I know, pretty cool domains to own - I'm selling them if you are interested...)

I was recently contacted by the domain holder for
手机应用商店.cn
手机应用商店.com

Both are "mobileappstore" in Chinese, and he had just registered them.

How the heck is this going to work?

E.g. I could register the domain for paypal.com in Hebrew tomorrow, and anyone who uses the Hebrew version of their keyboard would be going to a totally different website that I control. And as long as I'm not phishing, there is nothing PayPal can do?


Cheers,
Dean



P.S.
UPDATE: There are much bigger issues than trademarks at stake. As you can see from the Facebook conversation below, my phishing concerns were understated. It's way worse than you think.

If you cut and paste the following text into your browser bar:


>                    раyраl.com                      <


it looks like paypal.com BUT IT'S NOT!

Some of the letters are Cyrillic, so the text leads to this webpage instead: http://xn--yl-6kcb1fc.com/
(It's not registered yet, so it will lead to your ISP's default null DNS page.)
This is a major screwup by ICANN.
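
To see how the look-alike text maps to the xn-- address above, here is a minimal Python sketch using the standard library's "idna" codec. The helper name to_ascii_domain is made up for illustration, and the Cyrillic letters are written as escapes so they can't be confused with the Latin ones.

```python
# Hypothetical helper (not taken from any browser or registrar): convert a
# Unicode domain name to the ASCII form that is actually looked up in DNS.

def to_ascii_domain(domain: str) -> str:
    # Python's built-in "idna" codec encodes each dot-separated label and
    # adds the "xn--" prefix to labels that contain non-ASCII characters.
    return domain.encode("idna").decode("ascii")

# Cyrillic "er" (U+0440) and Cyrillic "a" (U+0430) stand in for Latin p and a;
# only the y and the l are real Latin letters.
lookalike = "\u0440\u0430y\u0440\u0430l.com"
genuine = "paypal.com"

print(to_ascii_domain(lookalike))  # xn--yl-6kcb1fc.com
print(to_ascii_domain(genuine))    # paypal.com
print(lookalike == genuine)        # False
```

The two names render almost identically on screen, but they are different strings and resolve to different domains.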

Regards,
Dean Collins




Alastair Bor
Yea and what about  >     раyраl.com       <  (I just typed the p and a in Cyrillic characters).

Try cutting and pasting the paypal.com text in my comment above into your browser :)

If you only have an English QWERTY keyboard, check out http://www.translit.ru/

Easy way to use foreign characters... including Greek, Hebrew, Georgian, etc... helps if you can read a bit of Russian to navigate the site...
-Alastair


Dean Collins
This is a major screwup by ICANN - there is no way people aren't going to get caught by phishing techniques like this.

What can we do about this?

23 comments:

  1. Wow. That's crazy! I always visually check the domain name to make sure it's the real thing when going to sites like PayPal - but now even that isn't safe anymore I guess.... ugh.

    Is *PayPal* aware of this? They're one of the entities that would actually REALLY want to, and be in a position to, influence ICANN enough to fix this.

  2. Hi Dean. Long time no contact.

    It's a well-known issue with the internationalisation of DNS. Wikipedia has a good summary: http://en.wikipedia.org/wiki/IDN_homograph_attack . Personally, I'm not too worried, as it's not particularly secure anyway to rely on DNS registrations for verification of identity. If you want to establish trust for a site, it is better to use a certificate, and even better to combine it with the "Web of Trust" (http://en.wikipedia.org/wiki/Web_of_trust).

    I look forward to the (unlikely) day when there is no DNS on the net! That way, people won't be tempted to place their trust in it. For example, Freenet replaces URLs with hashes. I've got memories of reading a Tim Berners-Lee quote, whereby when he invented the WWW he envisaged people clicking on textual hyperlinks and never actually viewing the URL behind it. The trust comes from the digital signature retrieved from the URL, rather than from the URL itself.

    Regards
    John Dalton

  3. Hi John,

    Why do you think digital certificates are going to solve the problem?

    Wouldn't I be able to get a VeriSign certificate for "> раyраl.com <" that you also couldn't tell apart from the real one?

  4. Just as a notice, the redirect to http://xn--yl-6kcb1fc.com/ is a fuck-up of your/my browser (I am using Chrome) which just isn't ready for UTF-8 urls yet. But that doesn't change the fact that this is evil.

  5. This is really insane.

    It means I must now register the same domain name in *all* character sets for every single website I have?

    Who the hell comes up with these bright ideas?!?

  6. The fact that this is a surprise to you all is what is really scary.
    As an example, my bank's website opens a new window with the size of the screen without the address bar.
    Do you see my point?
    All that is required is that browsers honor IDNs (unlike Firefox, at least) and at the same time unequivocally mark the URL/domain as an IDN.
    What is required is a culture of responsibility not some feature that prevents people from understanding what they are doing.
    English is not everyone's language.

  7. No, I don't see your point, anonymous.

    I have no issue with alternative languages being used - the issue is that

    1/ primary domains are now required to be purchased in 6 languages


    2/ mixed-script domains allow phishing, where even I would miss a Cyrillic 'a' in place of a Latin 'a' in the paypal URL.


    This has nothing to do with the web being English-only.

  8. In which way can alternative languages be used without this problem arising?
    What is required is a convention for marking the language the domain is in and the practice of acknowledging it.
    Sorry for the harsh tone.
    pedro

  9. I don't have the answers for you.

    I totally think that scripts other than Latin should be allowed on the web (actually, until a week ago I didn't realise they weren't - I incorrectly assumed Indian domains could always be written in Hindi).

    What I'm surprised about is that ICANN hasn't publicised this issue more widely.

    I have enough trouble trying to explain phishing emails to my wife as it is; when even I can't tell the difference, this is a bigger problem.

    Spread the URL of this blog post so that hopefully someone can explain to us how to solve this problem (an Outlook and browser plugin, maybe?)

  10. @Dean - I got a different take on the information you linked to. What it looks like they are saying, to me anyways, is that they will offer ways to get to the *same* domains that exist now using only international keyboards/characters. For instance, the WSJ post says this:

    "The change will allow the suffix -- known as a top-level domain -- to be expressed in about 16 other alphabets."

    It doesn't say that new tlds using international characters will be introduced, but that the current ones can be expressed without the need for a Latin keyboard. It's a translation thing rather than actual new domains.

    As for the guy offering you your domain in Chinese, I don't see that any different than someone offering you your domain in, say, Welsh (symudolcaisstore.com). If you actually go to the address he offered you, it redirects to a domain name that is similar to yours, but not it exactly.

  11. @Mvandemar

    Re: "It doesn't say that new tlds using international characters will be introduced, but that the current ones can be expressed without the need for a Latin keyboard. It's a translation thing rather than actual new domains"


    I understand what you are saying, but last night I registered hotmail.com with a Cyrillic 'o' on godaddy.com (no, you can't cut and paste this example, as the comments are plain text).

    Basically, if I sent you this URL in an email you would have no way of distinguishing it from a regular Latin 'o'.


    (And yes, I understand what you are saying about the Welsh 'similar translation' - that was my original concern as well, but now my primary concern is the phishing issue.)

    Cheers,
    Dean

  12. @timoreilly

    Sounds like the security community has given some thought to the non-latin domain name security issues.

    http://bit.ly/4JEmIg

  13. Interesting post.

    It seems like as a security check, browser makers should start displaying both Unicode IDN and punycode in the navigation/address bar.

  14. > UPDATE - This is really really bad - check out the paypal phishing
    > example on my blog already using Cyrillic characters
    >
    >
    > http://blog.collins.net.pr/2009/12/de-latinisation-of-web.html
    >

    That's old news, and a similar variant was essentially fixed over 4 years ago. The issue a few years back was that a browser URL bar would display a Unicode-encoded paypal.com, but direct the browser to something like "xn--pypal-4ve.com" -- which is the ASCII encoding of the Unicode characters.

    In any event, that issue -- and this one -- is not an issue with ICANN, but with the browsers and OS.

    Bruce Schneier discussing it in 2005
    http://www.schneier.com/blog/archives/2005/02/unicode_url_hac_1.html

    Also some stuff from Shmoo ( security think tank featuring directors of Apache, PGP, etc ) - which first published the paypal example.
    http://www.shmoo.com/idn/

    The shmoo page contains the IDN (internationalized domain name) advisory papers. The issue dates back to 2001, when "Homograph Attacks" were first identified.

    The underlying issue is that homographs look the same, but are not the same, i.e. a Cyrillic c vs an ASCII c.

    There have been a number of proposed fixes which haven't been adopted by browsers and OSes, particularly that browsers / OSes should say when there are mixed-script characters, or when a character is in a non-native character set. This was actually part of IETF RFC 3490, which is the base IDN standards RFC.
    e.g.:
    given A = ASCII, C = Cyrillic
    warn if CCCCCCC on ASCII browser
    warn if AAAAAAAA on Cyrillic browser
    warn if AAAACCCC on any browser
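
The three warning rules above could be sketched roughly as follows in Python. This is only an illustration, not how any real browser implements it: the function names are invented, and a character's script is approximated by the first word of its Unicode character name, which is a simplifying assumption.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Return the scripts used by the letters of a domain label.

    Assumption for this sketch: the script is taken to be the first word of
    the Unicode character name, e.g. "LATIN" or "CYRILLIC". Digits and
    hyphens are ignored.
    """
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def should_warn(label: str, browser_script: str) -> bool:
    """Apply the three rules: mixed scripts always warn, and a single-script
    label warns when its script differs from the browser's own script."""
    scripts = scripts_in(label)
    if len(scripts) > 1:           # AAAACCCC on any browser
        return True
    # CCCCCCC on an ASCII browser, or AAAAAAAA on a Cyrillic browser
    return bool(scripts) and scripts != {browser_script}

mixed = "\u0440\u0430y\u0440\u0430l"      # Cyrillic er/a mixed with Latin y, l
print(should_warn(mixed, "LATIN"))        # True
print(should_warn("paypal", "LATIN"))     # False
print(should_warn("paypal", "CYRILLIC"))  # True
```
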

    The one thing that ICANN did drop the ball on -- and it's kind of unfair saying that, as it would have been very hard to implement equitably -- is that they didn't enforce one of the smarter DNS-level security concepts: that the possible characters for IDN domain names be locked down by TLD.

    in any event, the issue is much less at the fault of ICANN than it is with the browsers and operating systems.

  15. ICANN is ONLY in it for the money! Am I surprised? NO! It's time to clean up this conflict of interest, but I doubt it will happen.

    ReplyDelete
  16. Hi Dean,

    Digital certificates on their own won't fix it. Verisign doesn't care who applies for a certificate. That's why I would combine it with the web of trust. Ultimately the only people you can trust are those who you have a relationship with. The WoT relies on exactly this (friends of friends of friends of...). A dodgy site will get a low trust rating in a WoT, irrespective of what some central "authority" thinks.

    PS. Marcel. The URL "http://xn--yl-6kcb1fc.com/" is not an error. Mozilla (and others) intentionally mangle non-Latin URLs to combat precisely the attack Dean has highlighted. What you are looking at is the solution, not an error!

    Regards
    John

  17. Yeah, it opens new attack vectors. Practically every non-trivial feature ever added to anything does that. I'm not sure what the solution is, but here are the requirements:

    - non-latin characters must be allowed in domains
    - users must have a way to verify the identity of a site/site operator

    Some ideas:

    - ICANN could disallow mixing char sets in domains
    - a landrush period during which Paypal et al could have found and claimed all their sites' homographs would have been nice...
    - browsers could alert the user when a domain's char set doesn't match their system char set (somebody else said this already)

    Browsers and email clients are already starting to report possible phishing attacks and other scams. The heuristics for detecting a possible homograph attack seem pretty straightforward. Should still be workable.

    Honestly, the biggest problem with this whole thing is the lack of publicity and coordination on the part of ICANN. They should have made a statement saying, "On {DATE > 1 year in future} we'll start allowing non-latin domains to roam freely." This would give companies time to lock down the homographs and browser makers time to implement new security warnings.

    Concept: entirely appropriate
    Formal Policy: needs work
    Roll-out: Bone-headed

  18. ok so here is the lowdown......

    Spent several hours with GoDaddy level 2 technical support to work out why my registrations for
    paypal and twitter and godaddy were not working (where I substituted 1 Cyrillic letter in place of an English letter).

    It turns out that VeriSign basically already worked out the solution to this - you can't mix letters from multiple scripts.

    So you can't substitute a Cyrillic 'a' and use English letters for the rest of > p ypal <.

    Very few English names can be spelled entirely with matching Cyrillic letters (lol, ebay is one of them - and it's already registered... hmmmm).

    Oh, and you definitely can't register paypal; not sure where Christina got her 'L' in Cyrillic from :P (http://mashable.com/2010/01/01/idn-phishing/).
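
A rough Python sketch of why ebay can be spelled entirely in Cyrillic while paypal cannot. The look-alike table is a small hand-picked assumption for illustration, not an official confusables list, and the function name is hypothetical.

```python
# Illustrative, hand-picked table of Latin letters and Cyrillic letters that
# look similar in many fonts (an assumption for this example only).
LOOKALIKES = {
    "a": "\u0430",  # Cyrillic a
    "b": "\u044c",  # Cyrillic soft sign, only a rough match for b
    "c": "\u0441",  # Cyrillic es
    "e": "\u0435",  # Cyrillic ie
    "o": "\u043e",  # Cyrillic o
    "p": "\u0440",  # Cyrillic er
    "x": "\u0445",  # Cyrillic ha
    "y": "\u0443",  # Cyrillic u
}

def all_cyrillic_lookalike(word: str):
    """Return the word respelled entirely in Cyrillic look-alikes, or None
    if any letter has no look-alike in the table."""
    try:
        return "".join(LOOKALIKES[ch] for ch in word.lower())
    except KeyError:
        return None

print(all_cyrillic_lookalike("ebay") is not None)    # True
print(all_cyrillic_lookalike("paypal") is not None)  # False, no entry for l
```

Under a no-mixing rule, only names that pass a check like this one remain spoofable in a single script.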

    So unless anyone has anything to add, I think this issue is a dud.

    Considering even the L2 support people at the registrar didn't know this was implemented, I don't feel so bad.


    Cheers,
    Dean

  19. "So unless anyone has anything to add, I think this issue is a dud.

    Considering even the L2 support people at the registrar didn't know this was implemented, I don't feel so bad."

    Looks like you found out the hard way. A lot of people are claiming that this is ridiculous, but it has in fact been something ICANN has been working out since 2000. It isn't some "hey, let's get Japanese names out... TOMORROW!" type issue that a lot of people think it is.

    There are a lot of security measures in place by ICANN to prevent something like this. For example, the banning of mixed-script domain name registration. That is why you cannot get your "Godaddy" with Cyrillic characters. Hell, they don't even allow certain symbol domains to be registered.

    I've spoken to people at ICANN and this is definitely an important issue to them, as it was a concern to me when I first read about this.

    The purpose of allowing people to use their languages as domains is to start promoting content build-up and Internet use for many countries.

    Imagine countries like Somalia, where people can start using the internet in their own language. I'm pretty sure not many people there speak English since they don't have the resources, but since they speak Arabic, they will be able to start browsing and searching and visiting websites PROPERLY. The example I've used many times is the way Google sounds in Arabic. A friend told me it sounds like "jojol" or something along those lines... so even someone familiar with the language will have a hard time figuring out HOW to spell something like Google. They just don't have letters that match the phonetic sounds of the English letters.

    As a native English speaker, having Japanese URLs will not affect me. I already DO NOT visit their Japanese sites nor do I try to read anything in Japanese. Adding this will only benefit them and won't exactly hurt me. In fact, I think this is a great way to start promoting open-mindedness and technological advancements in these types of countries.

    Well, I've written a lot and I'm not even sure I made any sense. Hope I've helped someone!

  20. Yep, basically a shame ICANN ballsed up the introduction and didn't deliver a proper education campaign.

    They might have been thinking about it since 2001 but somewhere in there if they didn't have time to train/educate registrars/press then #FAIL ICANN.

  21. You thought of it before Gizmodo, Dean.
    http://gizmodo.com/5439471/how-non+latin-domain-names-could-be-used-to-steal-your-money

  22. And I quoted it from the Wall Street Journal... so we all owe Rupert 5c

    :)
