Tuesday, June 9, 2009


Convert HTML files to PDF automatically

We still cannot fully explain why finding a decent HTML-to-PDF converter that can be used in an online application is so problematic, and above all why Adobe has never thought of developing one (since PDF is their thing).

Oh well, that is how the world goes.

However, after two and a half years of trials, tests, assorted expletives and (a few) successes, these are our results:


Tool                           CSS   Quality     File size   Time (sec.)   Mark D.   Mark L.
html2ps                        yes   Excellent   814 KB      15-18         7         5.5
HTMLDOC                        no    Good        164 KB      4-5           6         7
Firefox + Command Line Print   yes   Excellent   382 KB      5-6           8         8
wkhtmltopdf                    yes   Excellent   207 KB      3-4           9         9


It may seem a trivial task, yet it has become an odyssey that has lasted two and a half years, and it still is not completely finished.

Why create a PDF on the fly from an HTML document?

First of all, because a PDF is harder to modify, can be sent by email (automatically too, why not) and certainly looks much better than an HTML file.

Also, the images and CSS of an HTML file are often external: it becomes difficult to send the page (or offer it for download) and be sure that the recipient sees exactly what we see. With a PDF, all of this is solved.

There are already many ways, open source ones too, to create PDF files dynamically (from PHP or Python, for example)

True, but our need (and not just ours) is to convert. Creating an HTML page, and changing it later, is very simple, so why should I have to recreate the same page in the PDF "language"? And when I change something on my page, why should I have to go back and change the PDF version as well, when I already have the source?

html2ps (http://www.tufat.com/s_html2ps_html2pdf.htm)

A PHP class that converts the page first to PostScript (for Ghostscript), then to PDF. We tested it for a long time: it really works well and its CSS 2.0 support is very good, but it is not without flaws.
First, the PHP configuration: it needs a lot of resources to do its job, so some parameters have to be adjusted in php.ini:

    max_execution_time = 600

That is 10 minutes, which should be enough. It usually finishes in 15-30 seconds, but it all depends on the server and the complexity of the page.

    memory_limit = 450M

That is 450 MB of memory to allocate. If that seems like a lot, consider that some people actually recommend 1024 MB (1 GB!).

    pcre.backtrack_limit = 2000000

Same as above; 2000000 is probably a bit exaggerated, but better safe than sorry.

Also, you must make sure that all the fonts used on the page to be converted are also present in its fonts folder (html2ps/fonts). This is an easily solvable problem if you have direct control over the origin of the pages to convert.

So yes, it works well, but it requires too many resources. Also, in the future, with the advent of CSS 3.0, the problem of outdated classes will arise, and they are VERY large.

An example of our home page converted to PDF with html2ps can be downloaded here: http://www.kgo.it/blog/wp-content/uploads/2009/05/prova-con-html2ps.pdf (814 KB).

DOMPDF (http://www.digitaljunkies.ca/dompdf/)

Another PHP class, with decidedly weaker CSS support; we have practically abandoned it by now. It is slower than html2ps, has the same defects, and ends up generating larger files.
I admit that I have not tested it recently, but it did not leave us with a good impression.

HTMLDOC (http://www.htmldoc.org/)

Here we are: this one is a little tank.
Fast (2-3 seconds on average, though here too it always depends on the server and the size/complexity of the page), reliable (in the two and a half years we have been using it, it has never stopped), open source (but the compiled version is free of charge).

One small (but big) defect: forget about CSS as well. It still works with the old HTML definitions (<font color=…> etc.).

An example (… a sad one, without CSS) of our home page converted with HTMLDOC can be seen here: http://www.kgo.it/blog/wp-content/uploads/2009/05/prova-con-htmldoc.pdf (164 KB).

Then, finally, the idea

Think about it: html2ps, dompdf, htmldoc, what do they do? They parse and interpret the HTML page and convert it to PDF.

The first two parse each tag, taking the CSS into account (html2ps with excellent results; I do not know about dompdf), and create the PDF output.

htmldoc ignores CSS (it was developed back when CSS was still living in the trees).

What software interprets HTML best?

Obviously a browser.

A browser is built to digest HTML in every flavor, with or without CSS. Then, obviously, there are those that do it better (Firefox!) and those that... well, you know.

We then proposed this idea on the Mozilla forum (http://forums.mozillazine.org/viewtopic.php?f=27&t=661813&start=0&st=0&sk=t&sd=a), and torisugari came to our aid by developing an extension (http://torisugari.googlepages.com/commandlineprint2) to print with Firefox from the command line. What can I say, it works beautifully!

However, I am a little reluctant to use a browser as a tool for this purpose. Firefox was not built specifically for this job; we could say it accounts for 5% of what Firefox is. In short, there are too many variables, too many things that could go wrong (yes, unfortunately Firefox sometimes crashes…) for me to feel at ease using it in a production environment.

However, after several tests and several adjustments, it finally works: conversion time 2-5 seconds. Quality? You be the judge: http://www.kgo.it/blog/wp-content/uploads/2009/05/prova-con-firefox.pdf (382 KB)

So, is Firefox the solution? Apparently not (or at least not until we find a way to isolate Firefox's HTML renderer and use it separately from the browser), because while googling we stumbled on this article: "HTML to PDF, why so hard?" (http://leuksman.com/log/2008/03/27/html-to-pdf-why-so-hard/), where, among the last comments, we found wkhtmltopdf.

WKHTMLTOPDF (http://code.google.com/p/wkhtmltopdf/)

And here we are, finally, at what will likely replace htmldoc on our server. It uses WebKit (the renderer used by Safari, basically what Gecko is for Firefox), has conversion times similar to those of Firefox (actually slightly lower), and excellent quality. Try it: http://www.kgo.it/blog/wp-content/uploads/2009/05/prova-con-wkhtmltopdf.pdf (207 KB)

The only drawback: no THEAD support (which Firefox does support).

Conclusions

html2ps is definitely a good choice, provided that it is used fairly sparingly (it is really too slow and takes too many resources). It becomes the only alternative for web applications hosted on low-cost hosting that does not allow you to install software on the server (but does allow you to change PHP parameters, even temporarily with ini_set(), which means safe_mode off).

HTMLDOC is without doubt the most stable and reliable (I can say so only because it is the only one we have used for more than two years without ever having problems). The quality of the PDF is very good, but its complete incompatibility with CSS is a problem that, in my opinion, will let the project slowly die (if it has not already).

Firefox with the Command Line Print extension and WKHTMLTOPDF are almost on a par: both are fast, with excellent quality and compatibility… WKHTMLTOPDF is certainly easier to install and is a recent, very active project. Firefox, on the other hand, can count on a whole community of developers and, though this is more a hope than a certainty, sooner or later something more usable will be done in this regard.

Posted by email from KGO (http://kgo.posterous.com/convertire-file-html-in-pdf-automaticamente-o)
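
For readers who want to try the wkhtmltopdf route from a script, here is a minimal sketch in Python. It is not the setup described in the post: the function names, the timeout value and the file names are hypothetical, and it only assumes that a `wkhtmltopdf` binary is installed and accepts its basic form, `wkhtmltopdf <input> <output>`. The timeout is there because, as noted for Firefox, a renderer can hang or crash.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_command(source: str, target: Path, binary: str = "wkhtmltopdf") -> List[str]:
    """Return the argument list for one conversion: <binary> <input> <output>."""
    return [binary, source, str(target)]


def html_to_pdf(source: str, target: Path, binary: str = "wkhtmltopdf",
                timeout_s: int = 60) -> Path:
    """Convert a URL or a local HTML file to PDF (hypothetical helper)."""
    # Fail early with a clear message if the converter is not installed.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} is not installed or not on PATH")
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired if the renderer never returns.
    subprocess.run(build_command(source, target, binary),
                   check=True, timeout=timeout_s)
    return target
```

A call such as `html_to_pdf("page.html", Path("page.pdf"))` would then produce the PDF next to the script; a web application would typically catch the three exceptions above and report a failed conversion instead of blocking the request.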
