Mohsen’s corner of the Web

Persian names


TL;DR See the list of Persian names here

Intro

I’m gonna be a daddy in 3 weeks, and we’re still thinking about names. The Persian names allowed by the Iranian government (yes, you cannot pick any name you want) are listed on the https://www.sabteahval.ir/ website. The UX of that website is really awful, so I decided to crawl it and go through the names in an easier way.

Crawling and Storing in window.localStorage

The website generates a whole page for each listing of names, and it uses an app session parameter to keep its state, so it’s a bit of work to crawl the pages with anything other than the browser. So I decided to write some JavaScript and run it on the website using this Chrome extension: Custom JavaScript for websites. The script tries to save all the names in window.localStorage. It gets stuck in some cases (the website is sometimes very slow), but I managed to go through all the names with a few restarts.

Link
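The real script is the one linked above. Just to illustrate why a few restarts are harmless when the results live in window.localStorage, here is a rough sketch of a restart-friendly way to store pages. Everything in it is an assumption for illustration: the helper names savePage and firstMissingPage, the "names-page-N" key format, and the idea of one key per page are hypothetical and not taken from the actual script.

```javascript
// Hypothetical helper: store one page of names under a per-page key,
// so a restarted crawl can skip pages that are already saved.
// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function savePage(storage, pageNumber, names) {
  var key = "names-page-" + pageNumber; // key format is an assumption
  if (storage.getItem(key) !== null) {
    return false; // already saved by an earlier run
  }
  storage.setItem(key, JSON.stringify(names));
  return true;
}

// Hypothetical helper: find the first page that has not been saved yet,
// which is where a restarted crawl would pick up again.
function firstMissingPage(storage, totalPages) {
  for (var page = 1; page <= totalPages; page++) {
    if (storage.getItem("names-page-" + page) === null) {
      return page;
    }
  }
  return null; // every page is already stored
}
```

In the browser console this would be called as savePage(window.localStorage, 3, namesFromPage3), where namesFromPage3 stands for whatever array of names was scraped from that page.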

Transferring window.localStorage to a File

Now all the names are in window.localStorage, but how do I move them somewhere more suitable for searching? I thought just running JSON.stringify(window.localStorage) would do the job, but it seems that much data in window.localStorage is too big a bite for the browser. After more than 5 minutes I was still waiting for the 6000+ names to be printed in the console, so that did not work.
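A possible alternative (a sketch only, not what was used for this post) would be to serialize the storage in small batches instead of one giant string. This assumes the size of the single string was the problem, which nothing above actually confirms. The function name storageToChunks and the batch size are made up for illustration; the storage calls (length, key, getItem) are the standard Web Storage ones.

```javascript
// Collect every stored value into batches of JSON text.
// `storage` only needs length, key(i) and getItem(k), like window.localStorage.
function storageToChunks(storage, chunkSize) {
  var chunks = [];
  var current = {};
  var count = 0;
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    current[key] = storage.getItem(key);
    count++;
    if (count === chunkSize) {
      // Batch is full: serialize it and start a new one.
      chunks.push(JSON.stringify(current));
      current = {};
      count = 0;
    }
  }
  if (count > 0) {
    // Serialize the last, partially filled batch.
    chunks.push(JSON.stringify(current));
  }
  return chunks;
}
```

Each element of storageToChunks(window.localStorage, 100) is then a much smaller string that can be printed or copied one at a time.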

So, plan B: I threw together a simple Python Flask app that accepts the names POSTed from JavaScript and stores them in a file. Here’s the Flask app: dumper python app. The app also cleans up the names and removes duplicates.
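The dumper app itself is the Python code linked above. To give an idea of what the clean-up step amounts to, here is a minimal sketch of the same idea in JavaScript. The exact rules are assumptions: trimming whitespace, dropping empty entries and sorting with the "fa" locale are guesses at what cleaning means here, and cleanNames is a hypothetical name.

```javascript
// Trim each name, drop empty entries, remove duplicates, and sort.
// These clean-up rules are assumptions, not the dumper app's actual logic.
function cleanNames(rawNames) {
  var seen = new Set();
  rawNames.forEach(function (name) {
    var trimmed = String(name).trim();
    if (trimmed !== "") {
      seen.add(trimmed); // a Set keeps only one copy of each name
    }
  });
  return Array.from(seen).sort(function (a, b) {
    return a.localeCompare(b, "fa"); // compare using Persian collation rules
  });
}
```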

And this is how I sent the names from the JavaScript console to the Python app, listening on port 5000:

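// Send every request body as JSON.
$.ajaxSetup({
  contentType: "application/json; charset=utf-8"
});
// Object.keys skips inherited Storage members such as length and getItem.
Object.keys(window.localStorage).forEach(function (key) {
  $.post("https://localhost:5000", window.localStorage[key], "json");
});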

But, oops again, the JavaScript console was spilling a lot of ERR_INSUFFICIENT_RESOURCES errors:

POST https://localhost:5000/ net::ERR_INSUFFICIENT_RESOURCES

Probably because too many POSTs were sent to the web server all at once. To work around it, I tried the following:

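// Schedule each POST at a random delay within 40 seconds,
// so the requests are not all fired at the same moment.
Object.keys(window.localStorage).forEach(function (key) {
  window.setTimeout(function () {
    $.post("https://localhost:5000", window.localStorage[key], "json");
  }, Math.random() * 40000);
});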

It spreads the POSTs randomly over 40 seconds, and this trick stopped the errors. Now I’ve got the full list of names, cleaned up and sorted: Full list of names
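Random delays can still bunch a few requests together by chance. A variation on the same trick (a sketch, not what was run for this post) is to space the requests evenly over the window. The names evenDelays and sendSpread are hypothetical, and the schedule parameter exists only so the function can be tried without real timers.

```javascript
// Compute evenly spaced delays (in ms) for `count` requests across `windowMs`.
function evenDelays(count, windowMs) {
  var delays = [];
  for (var i = 0; i < count; i++) {
    delays.push(Math.floor(i * windowMs / count));
  }
  return delays;
}

// Schedule one call of `send` per payload using those delays.
// `schedule` defaults to setTimeout; passing another function is optional.
function sendSpread(payloads, windowMs, send, schedule) {
  schedule = schedule || setTimeout;
  var delays = evenDelays(payloads.length, windowMs);
  payloads.forEach(function (body, i) {
    schedule(function () {
      send(body);
    }, delays[i]);
  });
  return delays;
}
```

In the console it could be used roughly like this, assuming the same endpoint as above:

```javascript
var bodies = Object.keys(window.localStorage).map(function (key) {
  return window.localStorage.getItem(key);
});
sendSpread(bodies, 40000, function (body) {
  $.post("https://localhost:5000", body, "json");
});
```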

Viewing

For viewing, I just loaded everything into a simple AngularJS page. You can see the final results here: Names