{"id":130,"date":"2021-11-25T23:00:00","date_gmt":"2021-11-26T03:00:00","guid":{"rendered":"https:\/\/gerardomiranda.dev\/blog\/?p=130"},"modified":"2021-11-25T22:53:09","modified_gmt":"2021-11-26T02:53:09","slug":"manejando-grandes-cantidades-de-datos-en-ruby-on-rails","status":"publish","type":"post","link":"https:\/\/gerardomiranda.dev\/blog\/es\/manejando-grandes-cantidades-de-datos-en-ruby-on-rails\/","title":{"rendered":"Manejando grandes cantidades de datos en Ruby on Rails"},"content":{"rendered":"\n<p>El otro d\u00eda ten\u00eda una tarea para la que necesitaba generar 2,000,000 de registros en la base de datos antes de poder trabajar en ella. Quien hab\u00eda creado la tarea tuvo la amabilidad de dejarme un script para generar los datos. No le prest\u00e9 demasiada atenci\u00f3n y s\u00f3lo lo dej\u00e9 corriendo mientras completaba otras tareas. Mis otras tareas me llevaron todo el d\u00eda para completar y cuando fui a revisar el progreso del script me di cuenta que no iba ni por la mitad.<\/p>\n\n\n\n<p>Evidentemente el script no era eficiente en lo absoluto, asi que lo detuve y lo revis\u00e9. Lo primero que not\u00e9 es que estaba generando las instancias una por una, que dada la naturaleza de los objetos que estaba creando era inevitable. La verdadera causa de la ineficiencia era c\u00f3mo estaba guardando los registros en la base de datos, directamente con ActiveRecord.save.<\/p>\n\n\n\n<p>S\u00e9 de primera mano que el m\u00e9todo para guardar de ActiveRecord es muy poco eficiente para grandes cantidades de datos, pero no ten\u00eda idea de cu\u00e1nto. As\u00ed que despu\u00e9s de cambiar el script para utilizar la librer\u00eda <a href=\"https:\/\/rubygems.org\/gems\/activerecord-import\" target=\"_blank\" rel=\"noreferrer noopener\">activerecord-import<\/a>, tuve curiosidad de qu\u00e9 tan grande era la diferencia rendimiento con el ActiveRecord.save.<\/p>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Preparando las pruebas<\/h4>\n\n\n\n<p>Para eso primero necesitamos un modelo, lo mantendremos bastante simple porque no es necesario complicarnos dem\u00e1s.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Ruby\" data-shcb-language-slug=\"ruby\"><span><code class=\"hljs language-ruby\">create_table <span class=\"hljs-symbol\">:dummy_models<\/span> <span class=\"hljs-keyword\">do<\/span> <span class=\"hljs-params\">|t|<\/span>\n   t.string <span class=\"hljs-symbol\">:dummy_text1<\/span>\n   t.string <span class=\"hljs-symbol\">:dummy_text2<\/span>\n   t.string <span class=\"hljs-symbol\">:dummy_text3<\/span>\n\n   t.timestamps\n <span class=\"hljs-keyword\">end<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Ruby<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">ruby<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Lo siguiente ser\u00e1 el m\u00e9todo para generar los registros. Para generar el texto utilizaremos la gema <a href=\"https:\/\/rubygems.org\/gems\/faker\/versions\/1.6.6\" target=\"_blank\" rel=\"noreferrer noopener\">Faker<\/a>. Para respetar el problema original en todos los casos vamos a generar los registros uno por uno.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Ruby\" data-shcb-language-slug=\"ruby\"><span><code class=\"hljs language-ruby\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">self<\/span>.<span class=\"hljs-title\">generate_record<\/span><\/span>\n    DummyModel.new(<span class=\"hljs-symbol\">dummy_text1:<\/span> Faker::Lorem.word, <span class=\"hljs-symbol\">dummy_text2:<\/span> Faker::Lorem.word, <span class=\"hljs-symbol\">dummy_text3:<\/span> Faker::Lorem.word)\n<span class=\"hljs-keyword\">end<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Ruby<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">ruby<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Ahora vamos con los m\u00e9todos a comparar, primero utilizando ActiveRecord.save<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Ruby\" data-shcb-language-slug=\"ruby\"><span><code class=\"hljs language-ruby\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">self<\/span>.<span class=\"hljs-title\">with_active_record_create<\/span><span class=\"hljs-params\">(size)<\/span><\/span>\n    (<span class=\"hljs-number\">1<\/span>..size).each <span class=\"hljs-keyword\">do<\/span> <span class=\"hljs-params\">|i|<\/span>\n        record = generate_record\n        record.save!\n    <span class=\"hljs-keyword\">end<\/span> \n<span class=\"hljs-keyword\">end<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Ruby<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">ruby<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Seguimos con utilzando activerecord-import<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Ruby\" data-shcb-language-slug=\"ruby\"><span><code class=\"hljs language-ruby\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">self<\/span>.<span class=\"hljs-title\">with_active_record_import<\/span><span class=\"hljs-params\">(size)<\/span><\/span>\n    records = &#91;]\n    (<span class=\"hljs-number\">1<\/span>..size).each <span class=\"hljs-keyword\">do<\/span> <span class=\"hljs-params\">|i|<\/span>\n        records &lt;&lt; generate_record\n    <span class=\"hljs-keyword\">end<\/span>\n    DummyModel.import(records)\n<span class=\"hljs-keyword\">end<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Ruby<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">ruby<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Yendo un poco mas lejos, utilizaremos un par de opciones que nos permite la librer\u00eda, primero para saltarnos las validaciones y adicionalmente para pasar las columnas en lugar de un objeto ActiveRecord. Para esto \u00faltimo primero generaremos el ActiveRecord correspondiente y extraeremos sus valores, para ver si la diferencia en rendimiento lo justifica.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Ruby\" data-shcb-language-slug=\"ruby\"><span><code class=\"hljs language-ruby\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">self<\/span>.<span class=\"hljs-title\">with_active_record_import_without_validations<\/span><span class=\"hljs-params\">(size)<\/span><\/span>\n    records = &#91;]\n    (<span class=\"hljs-number\">1<\/span>..size).each <span class=\"hljs-keyword\">do<\/span> <span class=\"hljs-params\">|i|<\/span>\n        records &lt;&lt; generate_record\n    <span class=\"hljs-keyword\">end<\/span>\n    DummyModel.import(records, <span class=\"hljs-symbol\">validate:<\/span> <span class=\"hljs-literal\">false<\/span>)\n<span class=\"hljs-keyword\">end<\/span>\n    \n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">self<\/span>.<span class=\"hljs-title\">with_active_record_import_columns_without_validations<\/span><span class=\"hljs-params\">(size)<\/span><\/span>\n    values = &#91;]\n    columns = &#91;<span class=\"hljs-string\">'dummy_text1'<\/span>, <span class=\"hljs-string\">'dummy_text2'<\/span>, <span class=\"hljs-string\">'dummy_text3'<\/span>]\n    (<span class=\"hljs-number\">1<\/span>..size).each <span class=\"hljs-keyword\">do<\/span> <span class=\"hljs-params\">|i|<\/span>\n        record = generate_record\n        values &lt;&lt; &#91;record.dummy_text1, record.dummy_text2, record.dummy_text3]\n    <span class=\"hljs-keyword\">end<\/span>\n    DummyModel.import(columns, values, <span class=\"hljs-symbol\">validate:<\/span> <span class=\"hljs-literal\">false<\/span>)\n<span class=\"hljs-keyword\">end<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Ruby<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">ruby<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Comparando los tiempos<\/h4>\n\n\n\n<p>Listo, lo \u00fanico que queda es utilizar benchmarking para ver las diferencias. Comenzaremos pasando solo 1,000 registros.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"CSS\" data-shcb-language-slug=\"css\"><span><code class=\"hljs language-css\"><span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">save<\/span>\n  2<span class=\"hljs-selector-class\">.444752<\/span>   0<span class=\"hljs-selector-class\">.297301<\/span>   2<span class=\"hljs-selector-class\">.742053<\/span> (  5<span class=\"hljs-selector-class\">.692380<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">import<\/span>\n  0<span class=\"hljs-selector-class\">.260530<\/span>   0<span class=\"hljs-selector-class\">.028553<\/span>   0<span class=\"hljs-selector-class\">.289083<\/span> (  0<span class=\"hljs-selector-class\">.325896<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">import<\/span> <span class=\"hljs-selector-tag\">without<\/span> <span class=\"hljs-selector-tag\">validations<\/span>\n  0<span class=\"hljs-selector-class\">.250810<\/span>   0<span class=\"hljs-selector-class\">.024476<\/span>   0<span class=\"hljs-selector-class\">.275286<\/span> (  0<span class=\"hljs-selector-class\">.289907<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">using<\/span> <span class=\"hljs-selector-tag\">columns<\/span> <span class=\"hljs-selector-tag\">without<\/span> <span class=\"hljs-selector-tag\">validations<\/span>\n  0<span class=\"hljs-selector-class\">.223519<\/span>   0<span class=\"hljs-selector-class\">.028434<\/span>   0<span class=\"hljs-selector-class\">.251953<\/span> (  0<span class=\"hljs-selector-class\">.267757<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">CSS<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">css<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Ya con esta cantidad de datos, la diferencia es abismal entre ActiveRecord.save y los m\u00e9todos de activerecord-import, con mas de 5 segundos de diferencia. A\u00fan as\u00ed creo que con cantidades de esta magnitud todav\u00eda se puede utilizar ActiveRecord.save sin mucha preocupaci\u00f3n.<\/p>\n\n\n\n<p>Entre los diferentes m\u00e9todos de activerecord-import casi no se puede apreciar diferencia. Vamos a intentar subiendo un poco la vara con 10,000 registros.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"CSS\" data-shcb-language-slug=\"css\"><span><code class=\"hljs language-css\"><span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">save<\/span>\n 19<span class=\"hljs-selector-class\">.409276<\/span>   1<span class=\"hljs-selector-class\">.963945<\/span>  21<span class=\"hljs-selector-class\">.373221<\/span> ( 47<span class=\"hljs-selector-class\">.560994<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">import<\/span>\n  2<span class=\"hljs-selector-class\">.865762<\/span>   0<span class=\"hljs-selector-class\">.203885<\/span>   3<span class=\"hljs-selector-class\">.069647<\/span> (  3<span class=\"hljs-selector-class\">.313518<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">import<\/span> <span class=\"hljs-selector-tag\">without<\/span> <span class=\"hljs-selector-tag\">validations<\/span>\n  2<span class=\"hljs-selector-class\">.657060<\/span>   0<span class=\"hljs-selector-class\">.195326<\/span>   2<span class=\"hljs-selector-class\">.852386<\/span> (  2<span class=\"hljs-selector-class\">.983243<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">using<\/span> <span class=\"hljs-selector-tag\">columns<\/span> <span class=\"hljs-selector-tag\">without<\/span> <span class=\"hljs-selector-tag\">validations<\/span>\n  2<span class=\"hljs-selector-class\">.423606<\/span>   0<span class=\"hljs-selector-class\">.217493<\/span>   2<span class=\"hljs-selector-class\">.641099<\/span> (  2<span class=\"hljs-selector-class\">.779699<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">CSS<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">css<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Bueno con estos tiempos creo que ActiveRecord.save quedar\u00eda descartado para cantidades alrededor de los 10,000 registros. Pero los diferentes m\u00e9todos de activerecord-import a\u00fan se mantienen a poca distancia unos de otros a menos de 1 segundo.<\/p>\n\n\n\n<p>Hagamos trabajar un poco a nuestra PC y generemos 100,000 registros a ver qu\u00e9 sucede.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"CSS\" data-shcb-language-slug=\"css\"><span><code class=\"hljs language-css\"><span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">save<\/span>\n207<span class=\"hljs-selector-class\">.721426<\/span>  20<span class=\"hljs-selector-class\">.519692<\/span> 228<span class=\"hljs-selector-class\">.241118<\/span> (503<span class=\"hljs-selector-class\">.432937<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">import<\/span>\n 27<span class=\"hljs-selector-class\">.889573<\/span>   2<span class=\"hljs-selector-class\">.115005<\/span>  30<span class=\"hljs-selector-class\">.004578<\/span> ( 32<span class=\"hljs-selector-class\">.025789<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">import<\/span> <span class=\"hljs-selector-tag\">without<\/span> <span class=\"hljs-selector-tag\">validations<\/span>\n 27<span class=\"hljs-selector-class\">.754645<\/span>   2<span class=\"hljs-selector-class\">.084204<\/span>  29<span class=\"hljs-selector-class\">.838849<\/span> ( 31<span class=\"hljs-selector-class\">.793397<\/span>)\n<span class=\"hljs-selector-tag\">With<\/span> <span class=\"hljs-selector-tag\">ActiveRecord<\/span> <span class=\"hljs-selector-tag\">using<\/span> <span class=\"hljs-selector-tag\">columns<\/span> <span class=\"hljs-selector-tag\">without<\/span> <span class=\"hljs-selector-tag\">validations<\/span>\n 24<span class=\"hljs-selector-class\">.807959<\/span>   1<span class=\"hljs-selector-class\">.963377<\/span>  26<span class=\"hljs-selector-class\">.771336<\/span> ( 28<span class=\"hljs-selector-class\">.741473<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">CSS<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">css<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Definitivamente ActiveRecord.save queda descartado si quieres guardar 100,000+ registros, ahora entiendo por qu\u00e9 ese script no terminaba de ejecutar. En cuanto a los dem\u00e1s se puede ver que el m\u00e9todo para guardar directamente las columnas sin pasar  un objeto ActiveRecord fue mas r\u00e1pido que sus contrapartes por al menos 2 segundos. Esto no es nada del otro mundo pero hay que tomar en cuenta que primero creamos los objetos ActiveRecord y luego extrajimos los datos. Asi que en caso de que manejes millones de registros y si puedes no crear un objeto ActiveRecord antes, puede valer la pena optar por esta \u00faltima opci\u00f3n.<\/p>\n\n\n\n<p>Estas comparaciones me muestran que definitivamente cuando me encuentre desarrollando en Ruby on Rails, tengo que tener muy en cuenta la cantidad de datos con la que voy a trabajar. No sea que uno de estos d\u00edas uno de esos scripts se vaya a producci\u00f3n y la aplicaci\u00f3n se muera.<\/p>\n\n\n\n<p>El c\u00f3digo est\u00e1 disponible para que puedas verlo <a href=\"https:\/\/github.com\/gerardo-m\/rails_test_lab\/blob\/master\/lib\/generating_data.rb\" target=\"_blank\" rel=\"noreferrer noopener\">aqu\u00ed<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>S\u00e9 de primera mano que el m\u00e9todo para guardar de ActiveRecord es muy poco eficiente para grandes cantidades de datos, pero no ten\u00eda idea de cu\u00e1nto.<\/p>\n","protected":false},"author":1,"featured_media":165,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[28],"tags":[77,76,35],"class_list":["post-130","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-aprendizaje","tag-data","tag-ruby-on-rails","tag-web"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/posts\/130","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/comments?post=130"}],"version-history":[{"count":22,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/posts\/130\/revisions"}],"predecessor-version":[{"id":167,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/posts\/130\/revisions\/167"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/media\/165"}],"wp:attachment":[{"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/media?parent=130"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/categories?post=130"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gerardomiranda.dev\/blog\/wp-json\/wp\/v2\/tags?post=130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}