Discussion:
CFC method memory problem
(too old to reply)
rchinoy
2009-04-02 17:08:48 UTC
Permalink
Hi all,

I use CFCs a lot, so when I was creating a tool to migrate some database
tables, I built the actual data transfer function as a method of a CFC. The
basic code just reads a block of 1000 records from the old db, does some minor
manipulation of the data, and then inserts the records into the new db a record
at a time inside a cfloop. Lather, rinse, repeat, until all of the 500K+
records have been transferred.

I was surprised to find that this leads to Java out of memory errors around
the 200Kth record (given my current JVM memory settings). Having run across a
similar problem when sending broadcast emails from a CFC method, I simply moved
the code out of the CFC method and into the calling cfm page, and it ran
through all of the records without any problems.

The only explanation I can think of for this behavior is that when the code is
inside a CFC method, the query object used to read the blocks of 1000 records
keeps getting reallocated rather than reused, so when it gets to the point of
trying to store 200 blocks of 1000 records in memory, it runs out of room. If
that's the case, it would seem to be a significant bug that needs to be fixed.

Does anyone know if that's what is really going on?

Thanks,
Russ
Ian Skinner
2009-04-02 17:25:18 UTC
Permalink
Post by rchinoy
Does anyone know if that's what is really going on?
I have never heard of this problem, and I doubt you are the first to
move a lot of data around with a CFC.

But could it also be a bug in your code? Where you think the CFC is
using the same variable memory, but is actually creating new ones?
Seeing some example code would have really helped with this.
rchinoy
2009-04-02 17:42:41 UTC
Permalink
To be honest, I hope it IS a problem with my code, but I'm just not seeing it.
Here's the code:

<cffunction name="ImportProducts" returntype="void">

<cfset var OldInfo = "">
<cfset var StartRecID = 0>
<cfset var GetRecsCount = 1000>

<!--- Truncate the new table --->
<cfquery datasource="dsNew">
TRUNCATE TABLE products
</cfquery>

<!--- Load the old info --->
<cfquery name="OldInfo" datasource="dsOld" maxrows="#GetRecsCount#">
SELECT
products.*,
products_description.products_description,
products_description.products_featured,
products_description.products_name,
products_description.products_viewed
FROM
products
LEFT JOIN products_description ON
(products.products_id = products_description.products_id)
WHERE
products.products_id > #StartRecID#
ORDER BY
products.products_id
</cfquery>

<cfloop condition="OldInfo.RecordCount GT 0">

<!--- Keep track of where we left off --->
<cfset StartRecID = OldInfo.products_id[OldInfo.RecordCount]>

<!--- Insert the old info into the new db --->
<cfloop query="OldInfo">

<cfquery datasource="dsNew">
INSERT INTO products(
PROD_ID,
PROD_ModelNumber,
PROD_Name,
PROD_Description,
PROD_ImgURL,
PROD_NewInvQty,
PROD_AvailableAsOf,
PROD_Weight,
PROD_IsActive,
PROD_IsTaxable,
PROD_StatQtySold,
PROD_StatViewedCnt,
PROD_AddedOn,
PROD_AddedByAUSER_ID,
PROD_LastModifiedOn
)
VALUES (
#OldInfo.products_id#,
'#OldInfo.products_model#',
'#Replace(Replace(OldInfo.products_name, "\'", "'", "all"), "\", "",
"all")#',
'#Replace(Replace(OldInfo.products_description, "\'", "'", "all"), "\",
"", "all")#',
'#OldInfo.products_image#',
#OldInfo.products_quantity#,
<cfif Trim(OldInfo.products_date_available) NEQ "">
#CreateODBCDate(OldInfo.products_date_available)#,
<cfelse>
#CreateODBCDate(Now())#,
</cfif>
#OldInfo.products_weight#,
#OldInfo.products_status#,
#OldInfo.products_tax_class_id#,
#OldInfo.products_ordered#,
<cfif Trim(OldInfo.products_viewed) NEQ "">
#OldInfo.products_viewed#,
<cfelse>
0,
</cfif>
#CreateODBCDateTime(OldInfo.products_date_added)#,
1,
<cfif Trim(OldInfo.products_last_modified) NEQ "">
#CreateODBCDateTime(OldInfo.products_last_modified)#
<cfelse>
NULL
</cfif>
)
</cfquery>

</cfloop>

<!--- Load the old info --->
<cfquery name="OldInfo" datasource="dsOld" maxrows="#GetRecsCount#">
SELECT
products.*,
products_description.products_description,
products_description.products_featured,
products_description.products_name,
products_description.products_viewed
FROM
products
LEFT JOIN products_description ON
(products.products_id = products_description.products_id)
WHERE
products.products_id > #StartRecID#
ORDER BY
products.products_id
</cfquery>

</cfloop>

</cffunction>
Ian Skinner
2009-04-02 19:56:48 UTC
Permalink
Well nothing is obvious in the function, accept this only shows one
iteration of 1000 records, so what is the logic to iterate over this
code 200 times?

I suspect that is the likely source of the problem. A var scoped
variable is local to the current instance of a function. It is not
going to be reused from call to call of the function, as far as I know.
But the garbage collection is not going to run while in the middle of
a process intensive request where this function is being run 200 times.

I suspect the solution may be as simple as reasigning the OldInfo to an
empty string at the end of the function though, but I am not sure.

Also there is some strangeness I did not immediately understand in the
code. First, you run the "OldInfo" query twice in the function. Once
at the beginning and once at the end. Secondly, you truncate the new
table every time you iterate over this function. Wouldn't that lead to
just the last 1000 records existing in the new table?
Adam Cameron
2009-04-02 21:43:13 UTC
Permalink
On top of what Ian sid, you should <cfqueryparam> your dynamic values so
your not having to recompile a new query with every single insert you're
doing. This would hurt the DB rather CF though, I would have thought.
It's just poor form and a bit of a performance hit, whichever way one looks
at it, though. There's a chance CF is caching something on its side of
things for each query though.

You should also row-limit your query on the DB end as well as the CF end of
things. Using maxrows just tells CF to "hang-up" after it gets the rows it
needs... the DB will still be getting all the rest of them. Again... this
hits the DB more than it hits CF, but it will slow everything down.

You might be able to improve the performance of your query by filtering the
products table before doing the join:

FROM (
select top 1000 *
from products
where product_id > <cfqueryparam value="#startRedId#"
cfsqltype="cf_sql_integer">
order by product_id
) PRODUCTS

LEFT JOIN [etc]

None of this will make any difference vis-a-vis CFC vs CFM, but it's not
going to hurt ;-)

Is there no way you can do this DB->DB, rather than putting CF in the
middle of things?

If nothing else, I'd try to use a bulk loader rather than individual
queries, if nothing else.

What DB system is it?
--
Adam
Adam Cameron
2009-04-02 21:48:15 UTC
Permalink
Post by Adam Cameron
If nothing else, I'd try to use a bulk loader rather than individual
queries, if nothing else.
BTW, I'm particularly proud of that sentence, btw.
--
Adam
rchinoy
2009-04-02 22:31:36 UTC
Permalink
Actually, the code as shown has a loop in it to keep grabbing 1000 records (see
the line <cfloop condition="OldInfo.RecordCount GT 0">), so the cffunction only
gets called once.

I may try just assigning OldInfo to an empty string to see if that helps, but
I certainly wouldn't expect that to be required. I suppose the real difference
between the cfc-based code and the cfm-based code is one uses a local var and
the other doesn't/can't, but I'm not clear on why that would make a difference.

Regarding the query optimization, while I could certainly improve the
performance of what I'm doing, this tool is only going to be used twice, or
maybe three times max, so there isn't much point in spending the time to
optimize it.
Dan Bracuk
2009-04-03 02:01:57 UTC
Permalink
[q][i]Originally posted by: [b][b]rchinoy[/b][/b][/i]
Regarding the query optimization, while I could certainly improve the
performance of what I'm doing, this tool is only going to be used twice, or
maybe three times max, so there isn't much point in spending the time to
optimize it.[/q]
No point making it a cfc either.
Adam Cameron
2009-04-03 07:02:05 UTC
Permalink
Post by rchinoy
this tool is only going to be used twice, or
maybe three times max, so there isn't much point in spending the time to
optimize it.[/q]
No point making it a cfc either.
Very true.
--
Adam
rchinoy
2009-04-03 12:13:35 UTC
Permalink
Actually, there is a good reason to put the code in a cfc: there are almost 100
tables to migrate, so the code gets rather long and cumbersome to wade through
if it isn't broken up. I could certainly use include files to do the same
thing, but ultimately my question really has nothing to do with this particular
tool. I would really just like to understand why the out of memory issue exists
when the code is in a CFC method versus when it is in a CFM. This issue has
come up enough times now that I'd like to determine if there is a workaround.
Due to the deadline I'm on, I haven't tried simply assigning the query object
to an empty string at the end of each iteration yet, but I will try to do so
soon so I can report on the result.
Adam Cameron
2009-04-03 14:03:50 UTC
Permalink
Post by rchinoy
Actually, there is a good reason to put the code in a cfc: there are almost 100
tables to migrate, so the code gets rather long and cumbersome to wade through
if it isn't broken up.
Fair enough. Youd didn't really tell us that part before, though.

So you have a calling template which calls an equivalent to this function
for 100-odd tables? Yikes. I'll reiterate my question whether this can't
possibly be done DB to DB or via a bulk insert.
Post by rchinoy
thing, but ultimately my question really has nothing to do with this particular
tool. I would really just like to understand why the out of memory issue exists
when the code is in a CFC method versus when it is in a CFM. This issue has
Well it kinda is about this particular tool, because it's this code that's
causing the problems (or something about the calling code, or something
like that). Obviously there's some idiosyncasy of it which gives you
problems when it's in a CFC method as opposed to mainline code, but I would
not immediately say it's a generic problem with CFCs vs CFMs. Just yet.

Is this the first table that's being migrated? Or is it buried somewhere
within the other 100-odd? Is it always this one, or can it happen on any
of them?

Can you inspect the memory usage between each table's function, and see if
there's anything unexpected going on (other than it running out of memory,
which is - in itself - unexpected!).

Do you have 100-odd methods (one for each table) in the CFC, or lots of
individual CFCs?

I guess you could call a GC in between each table's processing, and see if
you can keep check on memory usage that way. Or maybe check memory levels
within that loop you've got, and if it starts flaring up, doa GC then.
It's not advised to call GCs "by hand", but I have found it's helped some
times.

Have you sued CF's server monitor or something like FusionReactor to check
if anything unexpected is consuming RAM?

I think you should try adding <cfqueryparam> tags to your queries: CF might
be cachinng some info about them, and that could be leaking memory or
something. Maybe try <cfobjectcache action = "clear"> between each method
callor something.

I wonder if there's something about the fact you're replacing oldInfo with
a new query within a loop that has a condition based on oldInfo's record
count. Possibly this is forcing CF to maintain the previous oldInfo
queries in the background, so you're not actually overwriting the previous
one with the next one when you requery, you're actually just getting a new
pointer assigned. Do me a favour, and do this:

<cfset iRecCount = OldInfo.RecordCount>
<cfloop condition="iRecCount GT 0">
[...]
<!--- Load the old info --->
<cfquery name="OldInfo" datasource="dsOld" maxrows="#GetRecsCount#">
[...]
</cfquery>
<cfset iRecCount = OldInfo.RecordCount>
</cfloop>

This is a slightly far-fetched notion, but it doesn't hurt to try: it's a
pretty easy change.
Post by rchinoy
come up enough times now that I'd like to determine if there is a workaround.
Due to the deadline I'm on, I haven't tried simply assigning the query object
to an empty string at the end of each iteration yet, but I will try to do so
soon so I can report on the result.
It might pay to apply a test rig to this notion first:

0) on a dev machine without any other activity
1) grab the heap usage
2) load a big query in
3) grab the heap usage (should losely reflect the fact you've just stuck a
chunk of data in RAM)
4) set the query variable to be an empty string
5) grab the heap usage. Check to see if the memory is reclaimed straight
away.

You might need to do a GC between 4+5 for it to take effect. I speculate
that the RAM won't get freed up straight away, and you might be consuing it
faster than the JVM can clear it.
--
Adam
Loading...