Like any complex software system, SAP Commerce Cloud generates transactional and temporary data. In this article, we outline ways for you to configure data retention and cleanup rules to ensure your data is properly removed and to eliminate any performance impact.
Data Maintenance and Cleanup in Custom Code
Let’s start with a recommendation for your custom extensions: remove any temporary data your code creates as soon as it is no longer needed.
For example, assume you have a custom job that requires the use of a temporary media item to generate result files:
MediaModel result = createMedia();
convert(batch, result);
publish(result);
If you use the approach outlined above, you will accumulate unused media that:
- Increase the size of the media table
- Increase the number of files/blobs in your media storage
Both types of behavior will decrease the performance of your system over time. Instead of leaving the temporary media items in the system, remove them after you are done with the processing:
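MediaModel result = createMedia();
try {
    convert(batch, result);
    publish(result);
} finally {
    // Always remove the temporary media, even if processing failed
    try {
        modelService.remove(result);
    } catch (Exception e) {
        // A failed cleanup must not mask the outcome of the job itself
        LOG.debug("Removal of temporary media failed", e);
    }
}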
The above pattern can be applied to any temporary resource, like temporary files on the file system or other items in SAP Commerce Cloud.
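As an illustration of the same pattern for temporary files, here is a minimal sketch in plain Java. The class name, the processBatch method, and the file prefix are hypothetical and only serve the example; the point is that the cleanup sits in a finally block so it runs whether or not processing succeeds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileCleanupExample {

    // Path of the most recent temporary file, kept so callers can verify it is gone.
    static Path lastTempFile;

    // Hypothetical processing step: writes the batch to a temporary file,
    // "publishes" it (here: just measures its size) and always deletes the file.
    static long processBatch(String batch) {
        Path tmp;
        try {
            tmp = Files.createTempFile("batch-result-", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lastTempFile = tmp;
        try {
            Files.write(tmp, batch.getBytes(StandardCharsets.UTF_8));
            return Files.size(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException e) {
                // A failed cleanup must not mask the processing result; report and move on.
                System.err.println("Removal of temporary file failed: " + e);
            }
        }
    }

    public static void main(String[] args) {
        long bytes = processBatch("sku-1;sku-2;sku-3");
        System.out.println(bytes + " bytes processed, temp file exists: " + Files.exists(lastTempFile));
    }
}
```

Swallowing the cleanup exception is a deliberate choice here, mirroring the media example: a leftover file is less harmful than a job that reports failure although its actual work succeeded.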
Personal Data Retention in Custom Code
Another topic to consider is the data retention of personal information to comply with data protection regulations, such as the General Data Protection Regulation (GDPR) of the European Union.
If you introduce new custom item types that contain personal data linked to a customer, you need to ensure this data is cleaned up properly once the data retention period is over or the customer asks to have their personal data deleted.
SAP Commerce Cloud provides hooks to add custom cleanup logic to the personal data retention framework:
- orderCleanupHooks
- customerCleanupHooks
For details on how to implement and configure such a hook, refer to Personal Data Erasure, as well as Deletion. These articles also provide an overview of all data retention rules available out-of-the-box.
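To give a rough idea of what such a hook does, here is a self-contained sketch in plain Java. Everything in it is a stand-in: the ItemCleanupHook interface only approximates the shape of the platform contract, and SketchCustomer, LoyaltyNote, and LoyaltyNoteCleanupHook are hypothetical names invented for this example. Check the linked documentation for the real interface and Spring configuration before writing production code.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the platform's hook contract (an assumption, not the real API).
interface ItemCleanupHook<T> {
    void cleanupRelatedObjects(T item);
}

// Hypothetical custom type that stores personal data linked to a customer.
final class LoyaltyNote {
    final String text;
    LoyaltyNote(String text) { this.text = text; }
}

// Stand-in for a customer item that references the custom personal data.
final class SketchCustomer {
    final String uid;
    final List<LoyaltyNote> loyaltyNotes = new ArrayList<>();
    SketchCustomer(String uid) { this.uid = uid; }
}

// Hypothetical hook: removes the custom personal data before the customer itself is erased.
final class LoyaltyNoteCleanupHook implements ItemCleanupHook<SketchCustomer> {
    // Records what was removed so the effect can be inspected.
    final List<LoyaltyNote> removed = new ArrayList<>();

    @Override
    public void cleanupRelatedObjects(SketchCustomer customer) {
        // In a real extension the items would be removed through the model service.
        removed.addAll(customer.loyaltyNotes);
        customer.loyaltyNotes.clear();
    }
}

final class CleanupHookDemo {
    public static void main(String[] args) {
        SketchCustomer customer = new SketchCustomer("jane.doe@example.com");
        customer.loyaltyNotes.add(new LoyaltyNote("prefers phone contact"));
        LoyaltyNoteCleanupHook hook = new LoyaltyNoteCleanupHook();
        hook.cleanupRelatedObjects(customer);
        System.out.println("notes left: " + customer.loyaltyNotes.size());
    }
}
```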
Data Maintenance Setup
Now that we have covered recommendations regarding custom code, let’s have a look at the platform and how you should configure data maintenance and cleanup for SAP Commerce Cloud.
This will be split into three parts:
- Generic Audit – generated by the Generic Audit feature of SAP Commerce Cloud
- Technical Data – generated by jobs, and more
- Transactional Data – generated by business logic or by your storefront
The platform provides two ways to clean up unused data:
- Maintenance Framework – the old way, requires implementation effort
- Data Retention Framework – new and improved, covers most use cases just with configuration
For the rest of this article, the focus will be on the Data Retention Framework, unless otherwise stated.
Generic Audit
The Generic Audit feature was introduced in version 6.6 and is enabled by default.
This feature tracks every change to a type and stores a before and after snapshot of the data in an audit table for the type. It can generate a lot of data very quickly, which will degrade the performance of your database and solution.
The default configuration enables the Generic Audit feature for a wide range of types. Therefore, we recommend that you review the default settings carefully and consider disabling auditing for any type you don’t need it for (for example, if you have no requirements for audit logging on Products).
You can enable/disable the auditing of a type in your properties. For example, if you wish to disable audit of changes to Product types, you can set the following in your local*.properties file(s):
audit.product.enabled=false
Tip
Completely disable audit logging for local development (and maybe also on your Continuous Integration server) to speed up platform initialization and test run time by setting the following in your local.properties and local-dev.properties files:
auditing.enabled=false
You should also consider using the Change Log Filtering feature.
This feature allows you to conditionally include or exclude data from audit logging to reduce the amount of data generated and stored in the audit logging tables.
Technical Data
This section covers the technical data that the platform accumulates over time and how to properly clean it up.
The platform ships with cleanup capabilities. However, some of these capabilities require additional configuration before they are enabled. The usual areas of consideration for this category are:
- Cronjobs
- Cronjob Logs
- Cronjob History
- ImpEx Media
- Saved Values
- Stored HTTP Sessions
- Workflows
- Distributed ImpEx
All sample configuration in this section is provided on a best-effort basis. Make sure to verify it and adapt it to your project!
Cronjobs
Over time, many cronjob instances will accumulate in your SAP Commerce Cloud database.
The most frequent jobs are:
- ImpEx Imports/Exports
- Catalog Synchronization
- Solr Jobs
To clean those up, you can easily configure a retention job with an ImpEx script like the following:
$twoWeeks = 1209600
INSERT_UPDATE FlexibleSearchRetentionRule;code[unique=true];searchQuery;retentionTimeSeconds;actionReference;
; cronjobCleanupRule;"select {c:pk}, {c:itemType}
from {CronJob as c join ComposedType as t on {c:itemtype} = {t:pk} left join Trigger as trg on {trg:cronjob} = {c:pk} }
where
{trg:pk} is null and
{c:code} like '00______%' and
{t:code} in ( 'ImpExImportCronJob', 'CatalogVersionSyncCronJob', 'SolrIndexerCronJob' ) and
{c:endTime} < ?CALC_RETIREMENT_TIME"; $twoWeeks; basicRemoveCleanupAction;
INSERT_UPDATE RetentionJob;code[unique=true];retentionRule(code);batchSize
; cronjobRetentionJob; cronjobCleanupRule; 1000
INSERT_UPDATE CronJob;code[unique=true];job(code);sessionLanguage(isoCode)[default=en]
; cronjobRetentionCronJob; cronjobRetentionJob;
INSERT_UPDATE Trigger; cronJob(code)[unique = true] ; cronExpression
; cronjobRetentionCronJob ; 0 0 0 * * ?
A few notes regarding the above configuration:
- It aggressively cleans up all jobs older than two weeks, regardless of the cronjob result
- The ‘where’ clause on code restricts it to auto-generated jobs
- It only targets unscheduled cronjobs (= cronjobs without a trigger)
- Cleaning up is done once per day, at midnight
The good part about using retention rules is that they are easily configurable, as shown in the example above.
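To make the retention arithmetic concrete, here is a small sketch in plain Java. It assumes that the CALC_RETIREMENT_TIME parameter corresponds to the current time minus the rule's retentionTimeSeconds; the class and method names are hypothetical and exist only for this illustration.

```java
import java.time.Duration;
import java.time.Instant;

final class RetentionCutoffExample {

    // Assumed semantics: cutoff = reference time minus the rule's retention time in seconds.
    static Instant cutoff(Instant now, long retentionTimeSeconds) {
        return now.minusSeconds(retentionTimeSeconds);
    }

    // An item qualifies for cleanup when its timestamp is strictly before the cutoff,
    // mirroring the "{c:endTime} < ?CALC_RETIREMENT_TIME" condition of the rule.
    static boolean isExpired(Instant itemTime, Instant now, long retentionTimeSeconds) {
        return itemTime.isBefore(cutoff(now, retentionTimeSeconds));
    }

    public static void main(String[] args) {
        // 14 days * 24 h * 60 min * 60 s, the value of $twoWeeks in the script
        long twoWeeks = Duration.ofDays(14).getSeconds();
        Instant now = Instant.parse("2024-03-15T00:00:00Z");
        System.out.println("retentionTimeSeconds = " + twoWeeks);
        System.out.println("cutoff = " + cutoff(now, twoWeeks));
    }
}
```

Under this assumption, a cronjob that ended exactly at the cutoff is kept, and only jobs that ended before it are removed.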
An alternative way to clean up cronjobs is the CleanupCronJobStrategy of the legacy Maintenance Framework. However, that strategy requires customization if you want to change which cronjobs it processes.
Cronjob Logs
To actually clean up old cronjob log files as described in CronJob Logs Clean-up, ensure that you configure a cronjob and a trigger to delete the logs.
The platform does not clean up old log files out-of-the-box!
The following is a sample ImpEx script which can be used to generate and run a cleanup job:
INSERT_UPDATE CronJob;code[unique=true];job(code);sessionLanguage(isoCode)[default=en]
; cronjobLogCleanupCronjob; cleanUpLogsJobPerformable;
INSERT_UPDATE Trigger; cronJob(code)[unique = true];cronExpression
# every hour
; cronjobLogCleanupCronjob ; 0 0 0/1 * * ?
If you have cronjobs that run very frequently (for example, every few minutes), you should schedule the log file cleanup even more frequently. Running the cleanup more frequently avoids building up too many log files that need to be deleted.
Cronjob Histories
SAP Commerce Cloud uses Cronjob Histories to track the progress of cronjobs. Similar to cronjob logs, they can accumulate quickly for frequently running jobs.
Starting with SAP Commerce 2005, the platform includes a cleanup cronjob for Cronjob Histories (documentation).
If you are on an older patch release and cannot upgrade, please refer to the SAP Knowledge Base Note 2848601 for the ImpEx file that sets up the cleanup job.
Cleaning up Cronjob Histories is critically important for performance.
Make sure that the cleanup job is enabled and active!
ImpexMedia
Every ImpEx import or export generates at least one ImpexMedia. These media stay in the system: the platform does not delete them when it deletes the ImpEx jobs they belonged to (ImpEx media can potentially be re-used for other ImpEx jobs, but that’s rarely the case). To set up a retention job for ImpexMedia, use the following sample ImpEx script:
$twoWeeks = 1209600
INSERT_UPDATE FlexibleSearchRetentionRule;code[unique=true];searchQuery;retentionTimeSeconds;actionReference;
;impexMediaCleanupRule;"select {i:pk}, {i:itemtype}
from {ImpexMedia as i}
where
{i:code} like '00______' and
{i:modifiedTime} < ?CALC_RETIREMENT_TIME"; $twoWeeks; basicRemoveCleanupAction;
INSERT_UPDATE RetentionJob;code[unique=true];retentionRule(code);batchSize
; impexMediaCleanupJob; impexMediaCleanupRule; 1000
INSERT_UPDATE CronJob;code[unique=true];job(code);sessionLanguage(isoCode)[default=en]
; impexMediaCleanupCronJob; impexMediaCleanupJob;
INSERT_UPDATE Trigger; cronJob(code)[unique = true] ; cronExpression
# every day at midnight
; impexMediaCleanupCronJob ; 0 0 0 * * ?
The retention time should be the same as for the cronjob cleanup.
Saved Values
The Backoffice uses Saved Values to track item changes by business users (the “Last Changes” in the Administration tab). We recommend keeping as few history entries as possible. You can configure the number of entries per item through a property:
# Specifies the number of entries displayed in the "Last Changes" field on the "Administration" tab
# The very last change (if available) is always displayed,
# even if this property is set to 0 (zero).
hmc.storing.modifiedvalues.size=0
If your project runs for a long time and/or was upgraded from multiple previous releases, you have most likely accumulated millions of SavedValues and SavedValueEntry records in the database. Additionally, each of those entries also generates multiple rows in the props table, which slows down the overall system even further.
If you have access to the database, you can quickly delete all entries and free up considerable space by running the following SQL script directly in the database while SAP Commerce Cloud is offline:
TRUNCATE TABLE savedvalues;
TRUNCATE TABLE savedvalueentry;
DELETE FROM props WHERE itemtypepk IN (
SELECT pk FROM composedtypes
WHERE internalcode = 'SavedValues'
OR internalcode = 'SavedValueEntry'
);
Stored HTTP Sessions
Out-of-the-box, the HTTP Session Failover mechanism of the platform stores the sessions in the database. To avoid performance degradation, clean them up as soon as they are stale by using something like the following ImpEx script:
$oneDay = 86400
INSERT_UPDATE FlexibleSearchRetentionRule;code[unique=true];searchQuery;retentionTimeSeconds;actionReference;
;storedSessionRule;"select {s:pk}, {s:itemtype}
from {StoredHttpSession as s}
where
{s:modifiedTime} < ?CALC_RETIREMENT_TIME"; $oneDay; basicRemoveCleanupAction;
INSERT_UPDATE RetentionJob;code[unique=true];retentionRule(code);batchSize
; storedSessionCleanupJob; storedSessionRule; 1000
INSERT_UPDATE CronJob;code[unique=true];job(code);sessionLanguage(isoCode)[default=en]
; storedSessionCleanupCronJob; storedSessionCleanupJob;
INSERT_UPDATE Trigger; cronJob(code)[unique = true] ; cronExpression
# every 30 minutes
; storedSessionCleanupCronJob ; 0 0/30 * * * ?
The configuration above uses a conservative retention period of one day. Depending on the traffic on your site, you may want to clean them up even more aggressively.
Workflows
If you use Workflows to coordinate the work between business users, make sure to think about proper retention rules for those too. Workflows are highly specific to your project, which is why we don’t provide a one-size-fits-all solution to clean them up. Some items to consider include:
- How long do your processes usually take? When is a process considered abandoned if it is not finished?
- How long do you need to keep finished workflows? Do you need them to audit changes?
- Do you use Comments in your workflow? How long do you need to keep them?
Based on the answers to these questions, you can set up retention rules that fit your workflows.
ImpEx Distributed Mode
If your project uses ImpEx Distributed Mode to distribute workloads across the cluster, you may want to consider setting up retention rules for the following types:
- DistributedImportProcess
- ImportBatch
- ImportBatchContent
Transactional Data
Now that we have covered most of the data generated by technical processes of SAP Commerce Cloud, let’s look at the data that is generated by your customers when they interact with the storefront. For this kind of data, you also need to consider the regulatory requirements around how long it needs to be stored or when it needs to be deleted. For example, GDPR includes the “right to be forgotten” and you are required to delete any data you have for a person if requested to do so.
SAP Commerce Cloud covers this for Customers and Orders. See Personal Data Erasure for more details.
If your project began before these jobs were available out-of-the-box or if you don’t import project data during the update process, you may need to import the jobs described in the link above into your system.
This leaves us with a few types in the system for which additional configuration is necessary:
- Carts
- Business Processes
Carts
For carts, there is a cleanup cronjob available, see Removing Old Carts with Cronjob. To enable the cleanup for your site, you need to modify the job and add your BaseSite to the configuration:
$siteUid=customBaseSite
INSERT_UPDATE OldCartRemovalCronJob;code[unique=true];job(code);sites(uid)
;oldCartRemovalCronJob;oldCartRemovalJob;$siteUid
The cronjob is provided by the commercewebservices (or the deprecated ycommercewebservices) extension. Make sure it is included in your configuration if you want to use it. Alternatively, you can always configure your own retention rules (one for anonymous carts, one for the carts of registered users).
Business Processes
Most of the operations users do in the Accelerator trigger Business Processes (for example, reset password, place an order, order fulfilment and more). Those processes obviously accumulate over time and need to be removed regularly.
$twoWeeks = 1209600
INSERT_UPDATE FlexibleSearchRetentionRule;code[unique=true];searchQuery;retentionTimeSeconds;actionReference;
;businessProcessRule;"SELECT {p:pk}, {p:itemtype}
FROM {BusinessProcess AS p JOIN ProcessState AS s ON {p:state} = {s:pk} }
WHERE
{s:code} in ('SUCCEEDED') AND
{p:modifiedTime} < ?CALC_RETIREMENT_TIME"; $twoWeeks; basicRemoveCleanupAction;
INSERT_UPDATE RetentionJob;code[unique=true];retentionRule(code);batchSize
; businessProcessCleanupJob; businessProcessRule; 1000
INSERT_UPDATE CronJob;code[unique=true];job(code);sessionLanguage(isoCode)[default=en]
; businessProcessCleanupCronJob; businessProcessCleanupJob;
INSERT_UPDATE Trigger; cronJob(code)[unique = true] ; cronExpression
; businessProcessCleanupCronJob ; 0 0 0 * * ?
This configuration cleans up all succeeded processes older than two weeks. If you have a lot of processes in other states (for example, FAILED or ERROR), you may want to configure a second retention rule for those with a longer retention period. In general, you want to keep the errors around longer for analysis.
If you have customized the business processes, make sure to clean up any additional data related to them. An out-of-the-box example of this is EmailMessage items. Those get automatically cleaned up at the end of a business process as long as they were successfully sent. Conversely, they remain in the database if they were not successfully sent.
One-time Clean Up
We have now covered most of the periodic cleanup necessary to ensure the performance of your solution remains high and doesn’t degrade over time. However, is it possible to run a one-time cleanup of the system, for example, before a migration to the cloud?
It doesn’t make sense to configure additional cronjobs for this task. To delete data you can:
- Execute SQL statements directly.
- Execute scripts in the administration console.
- Generate ImpEx scripts to remove items.
SQL statements are generally the fastest option but also the most dangerous one. They are executed outside the type system and therefore none of the automated cleanup, delete interceptors, validation and others are performed. Use with caution!
Scripts provide the maximum freedom for your cleanup logic. However, they also have two disadvantages:
- Every script executed in the administration console runs inside a database transaction by default. Deleting a lot of data may fail because of this.
- You need to implement multi-threading if you want to speed up the deletion process.
That’s why generating ImpEx scripts to delete data is usually faster than running a cleanup script:
- ImpEx is multi-threaded by default.
- You don’t need to worry about transactions.
Here is an example skeleton to generate a cleanup ImpEx script through ImpEx Export. You can re-use the Flexible Search queries provided for the retention rules; you only need to replace the CALC_RETIREMENT_TIME query parameter with a date calculation specific to your database.
# Assumption: you don't have any Impex jobs that you explicitly configure / schedule
# -> you can delete everything related to impex
REMOVE ImpexMedia;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk} from {ImpexMedia!}"");"
REMOVE ImpExExportMedia;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk} from {ImpExExportMedia!}"");"
REMOVE ImpExImportCronJob;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk} from {ImpExImportCronJob!}"");"
REMOVE ImpExExportCronJob;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk} from {ImpExExportCronJob!}"");"
# unscheduled catalog syncs, with an auto-generated code
REMOVE CatalogVersionSyncCronJob;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {cj:pk} from {CatalogVersionSyncCronJob! as cj left join trigger as t on {t:cronJob} = {cj:pk} } where {cj:code} like '0000____%' and {t:pk} is null"");"
# solr index cronjobs
REMOVE SolrIndexerCronJob;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {cj:pk} from {SolrIndexerCronJob! as cj left join trigger as t on {t:cronJob} = {cj:pk} } where {cj:code} like '0000____%' and {t:pk} is null"");"
# solr hot update jobs
REMOVE SolrIndexerHotUpdateCronJob;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk} from {SolrIndexerHotUpdateCronJob!}"");"
# solr index jobs
REMOVE ServicelayerJob;pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk},{code} from {ServicelayerJob! as j left join trigger as t on {t:job} = {j:pk} } where ( {j:code} like 'solrIndexerJob_full_%' or {j:code} like 'solrIndexerJob_update_%') and {t:pk} is null"");"
# left-over job logs
REMOVE JobLog; pk[unique=true]
"#% impex.exportItemsFlexibleSearch(""select {pk} from {JobLog} where {cronjob} is null"");"
Perform the following steps to use this script:
- Export the items to remove. (Go to HAC: Console -> ImpEx Export)
- Open Backoffice, System -> Tools -> Import
- “Upload” the zip file generated by the export, then click “Create”
- Select importscript.impex as “Import file in ZIP”
- “Next”
- Make sure that “Allow code execution from within the file” is checked
- “Start”
You can ignore any ImpEx errors when deleting the data.
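If you want to extend the export skeleton above with further types, the quoting is the easiest part to get wrong. The following plain Java sketch generates one REMOVE block in the same format; the RemoveScriptBuilder class and its removeBlock method are hypothetical helpers for this illustration and not part of the platform.

```java
final class RemoveScriptBuilder {

    // Builds one export block: a REMOVE header line followed by the quoted
    // "#% impex.exportItemsFlexibleSearch(...)" line. Double quotes inside the
    // quoted line are escaped by doubling them, as in the skeleton above.
    static String removeBlock(String typeCode, String flexibleSearchQuery) {
        String call = "impex.exportItemsFlexibleSearch(\"" + flexibleSearchQuery + "\");";
        String escaped = call.replace("\"", "\"\"");
        return "REMOVE " + typeCode + ";pk[unique=true]\n"
                + "\"#% " + escaped + "\"";
    }

    public static void main(String[] args) {
        // Reproduces the first block of the skeleton.
        System.out.println(removeBlock("ImpexMedia", "select {pk} from {ImpexMedia!}"));
    }
}
```

The helper assumes the query itself contains no double quotes other than those it should escape; use single quotes for string literals inside the Flexible Search query, as the skeleton does.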
Type System and Orphaned Attributes Clean Up
The recommendation about type systems does not apply to SAP Commerce Cloud in the Public Cloud.
Here, type systems and rolling updates are managed for you.
When performing rolling updates it is usual to create new Type Systems, but sometimes, outdated ones are not removed.
You usually only require the Default and the latest type system. Remove all others via HAC under Cleanup / Drop Type Systems, or via the droptypesystem ant task, as described in the Rolling Update on the Cluster documentation:
ant droptypesystem -DtypeSystemName=USER_DEFINED_TYPE_SYSTEM
Additionally, when attributes are removed from an items.xml file and a system update is performed, these attributes are not automatically removed from the type system. Remove such orphaned attributes carefully.
Be very careful when deleting Type Attributes from a production environment. Always verify on non-production environments first.
Sample Extension
You can find an extension that automatically configures all cleanup cronjobs as described above here:
https://github.com/sap-commerce-tools/sanecleanup
The extension is under the Apache 2 license.
As per license, it is provided as-is and is free to use for any purposes, including commercial projects. For further details please refer to the project’s README.
Conclusion
This article covered various aspects regarding regular data maintenance and cleanup.
While the topic itself isn’t the most glamorous thing to work on when delivering a project, setting it up correctly from the start ensures your SAP Commerce Cloud solution stays healthy and high-performing over time.
In summary:
- Make sure you properly delete any temporary data in your custom code as soon as feasible.
- Set up proper retention rules for all the data generated by the platform and all business processes your implementation supports.