Large dataset over 64kb

7 years ago
Hi guys,
I have a dataset of approx 155kb and I am desperately trying to visualize it.
It contains 4000 rows, each row has a person name, a value and a hex color.
Your limit is 64kb. I cannot strip down my data, I need all of it to be visualized.

I tried a workaround: splitting the data into several sub-datasets (each one below 64kb). But it does not work. Text sizes differ* between sub-images even when the values are the same, so when I stitch the sub-images together, the overall picture is wrong. I tried many times with different settings and different shapes, and this workaround has proved impossible to implement.

What are my options? Is there a way to visualize 155kb of data? Can you help me do that by raising the size limit for me? Or, maybe, do that on your side, for extra payment?

* Text settings are set correctly, the option "use size column from word data" is checked.
Edited 13 Oct, 2016 14:42
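For readers who want to reproduce the splitting workaround described above, here is a minimal Python sketch. It assumes the 64kb limit means 65,536 bytes of UTF-8 text and that each row is a single line; the `name;value;color` layout and the sample rows are hypothetical, since the real file format is not shown in the thread. It only illustrates the split itself. As reported above, clouds built from separate chunks did not end up with comparable text sizes.

```python
LIMIT_BYTES = 64 * 1024  # assumption: "64kb" means 65,536 bytes of UTF-8 text


def split_rows_by_size(rows, limit=LIMIT_BYTES):
    """Group text rows into chunks whose UTF-8 size (newlines included) stays within limit."""
    chunks, current, current_size = [], [], 0
    for row in rows:
        row_size = len((row + "\n").encode("utf-8"))
        if row_size > limit:
            raise ValueError("a single row exceeds the size limit")
        # Start a new chunk when adding this row would push the current one over the limit.
        if current and current_size + row_size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += row_size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical rows; Cyrillic letters take 2 bytes each in UTF-8.
rows = [f"Спортсмен {i};{1000 - i % 500};#3366cc" for i in range(4000)]
chunks = split_rows_by_size(rows)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Measuring the size in encoded bytes rather than characters matters here, because Cyrillic names are roughly twice as large in UTF-8 as their character count suggests.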
7 years ago
Hi Max, actually Tagul is not very well suited for your needs because:

1) Even if you manage to overcome the 64kb limitation, Tagul will import only 999 words, as that is the defined limit.
2) We could raise the limit on our side, but then (I am afraid) you would face another issue. The "use size column from word data" option won't give you very accurate results: the rendered sizes of the words still won't exactly reflect the values in the size column. This is because Tagul's visualization algorithm tries to fit the word cloud shape, sacrificing word size accuracy. Therefore Tagul is not well suited for scientific purposes. I recommend that you experiment with 999 words and see whether this is an issue for you or not.

If it is not an issue, please send the data to the support email and I'll see what we can do.
7 years ago
Alex,

thanks for your quick reply!

My dataset is a rating of athletes. The data is not for scientific purposes, but a certain level of accuracy is desired. There are almost 4000 names, and the image is to be printed on an A2-size poster for club members. No average man will ever use a ruler to measure the size of his name. But I see large deviations among the first, say, 20-30 athletes: the sizes differ too much for athletes with the same numbers, and too much from each other (it is clearly noticeable). Slight differences could easily have been fixed in Adobe Illustrator afterwards.

I've uploaded my dataset to: c.infographer.ru/trilife/dataset.txt
It uses Cyrillic names, Pt Sans regular font works fine.
I use custom shape: c.infographer.ru/trilife/mask.png
My target outcome is: c.infographer.ru/trilife/cloud-target.png (this image is a composite of four images; I made four sub-images with partial masks, but the sizes for the top athletes are extremely different).

If this does not work out, could you recommend any other online tool? I've searched all across the web, and I cannot find any other that can do the same job.

Thank you for your cooperation,
Max
Edited 13 Oct, 2016 17:59
7 years ago
When I open the data set url I get: 404 not found
7 years ago
Whoops, sorry, I mistyped the file name. Now the link works fine:
c.infographer.ru/trilife/dataset.txt
7 years ago
Now there is another problem. The encoding seems to be wrong, as I can't read the Russian names. Which encoding is used in the text?
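An unreadable text file like this can be checked locally before uploading. The following is a rough Python sketch, not anything Tagul provides: it first tests whether the bytes are valid UTF-8, and otherwise picks between two common legacy Cyrillic encodings by counting lowercase Cyrillic letters. The candidate list (`cp1251`, `koi8-r`) and the sample row are assumptions, and the lowercase-count heuristic is only a guess that happens to work for ordinary Russian text.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(data: bytes, fallbacks=("cp1251", "koi8-r")) -> str:
    """Guess the encoding: strict UTF-8 first, then the fallback with the most lowercase Cyrillic."""
    if looks_like_utf8(data):
        return "utf-8"

    def lowercase_cyrillic(encoding: str) -> int:
        # Decoding with the wrong Cyrillic code page tends to swap letter case,
        # so the right one usually yields far more lowercase letters.
        text = data.decode(encoding, errors="replace")
        return sum("а" <= ch <= "я" for ch in text)

    return max(fallbacks, key=lowercase_cyrillic)


sample = "Иванов Иван;120;#ff0000\n"  # hypothetical row, not taken from the real dataset
print(guess_encoding(sample.encode("utf-8")))
print(guess_encoding(sample.encode("cp1251")))
```

Once the encoding is known, re-saving the file with `text.encode("utf-8")` gives a copy that should import cleanly.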
7 years ago
It should be in UTF-8.
Let's do it the other way. Here's the original MS Excel file: http://c.infographer.ru/trilife/athletes.xlsx
Is it readable? Microsoft system fonts such as Calibri and Arial are usually multilingual.

Please note that there are three different rating values:
- original value
- processed value #1, in order to "flatten" tiny differences between athletes and make differences more visible
- processed value #2, a square of previous one, to make them even more different.
As you see, I am OK if the picture is not 100% perfect in text sizes; I just need all of them to be comparable.
I would like to try all three versions and see which image looks best.

There are three ready-made tables for you, they are located to the right of the original table. Each one is copy-paste'able to your text processor.
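For illustration, the three rating values described above could be derived along these lines. The thread does not say how the "flattening" in processed value #1 was actually done, so the rounding to a fixed step used here is purely an assumed stand-in, and the function names, step size and sample ratings are hypothetical. Only the squaring for processed value #2 comes from the post itself.

```python
def flatten(value, step=10):
    """Assumed 'flattening': snap a rating to the nearest multiple of step."""
    return step * round(value / step)


def processed_values(original, step=10):
    """Return (processed value #1, processed value #2) for one original rating."""
    value1 = flatten(original, step)  # assumption: the real flattening formula is not given
    value2 = value1 ** 2              # from the post: the square of the previous value
    return value1, value2


# Hypothetical ratings, not taken from the real dataset.
ratings = {"Athlete A": 987, "Athlete B": 991, "Athlete C": 612}
for name, rating in ratings.items():
    value1, value2 = processed_values(rating)
    print(name, rating, value1, value2)
```

With this stand-in, near-equal ratings collapse to the same value #1, and squaring then widens the gap between distinct groups, which matches the stated intent of making differences more visible.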
7 years ago
I've created 3 word clouds in your account according to the different rating values you provided. BTW, the next time you need to import more than 999 words, you can open your browser console (F12), run the command TAGS_MAX_AMOUNT = 5000, and then do the import.
7 years ago
Alex,

Wow, thanks!
I created a new word cloud in my account, using the mask I need. It works! All texts are imported, the data sheet looks complete. The viz is beautiful and I am ready to buy it in vector format.

Two questions though, since I don't know how the viz script works…
1) About my latest viz "Value #2 with mask": does it contain all the texts (approx. 4,000), including the smallest ones? The setting is "Words amount: keep as is".
The preview image is small, I am not sure that there are really all of them. I expect them ALL to be there, just tiny, visible by zooming to the vector image copy (after I download it). Correct?

2) Is the pixel size (WxH) of the mask important for your viz scripts, or not?
Let's assume I upload two similar mask images, one of them is twice as big (in pixels) as the other one. Will the outcome (text sizes, vector shapes, etc) be different, or the same?

Your help and your prompt replies are much appreciated, thanks again!
Top score, you deserve it.
7 years ago
BTW, for this amount of texts, a narrower font would be interesting to use.
You have PT Sans regular already – how about PT Sans Narrow, for example?
