tag:blogger.com,1999:blog-11259842385633628842024-03-08T19:49:23.055-06:00Resources: SQL ServerAn infrequently-updated repository of solutions.ISO30http://www.blogger.com/profile/01999689281636743595noreply@blogger.comBlogger1125tag:blogger.com,1999:blog-1125984238563362884.post-11420999200144469892010-10-25T18:27:00.000-05:002010-12-30T10:07:49.858-06:00Remove non-printable / Unicode characters in SQL Server 2005<span style="font-family: Arial, Helvetica, sans-serif;">A few months ago, I was upgrading some report templates from the older version of Excel (*.xls) to Excel 2007 (*.xlsx). I ran into numerous problems almost immediately when I attempted to generate the upgraded reports because the incoming data was riddled with charaters that don't play nicely with XML. The data is used for a variety of reporting purposes, so I decided to tackle the problem on the back-end by removing all but the <a href="http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters">printable ascii characters</a>.</span><br />
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">I started by writing a simple user function for individual strings, but I got to thinking that I may want to automate some of these cleanup tasks and ended up putting something together that allows for a bit more the flexibility. The following creates the basic string user function, along with two procedures to perform the cleanup at the column and table level:</span><br />
<br />
<div></div><br />
<div></div><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><strong>Function: fn_npclean_string</strong></span><br />
<div style="background-color:#eeeeee;padding:10px;"><pre>use [master]
go
set ansi_nulls on
go
set quoted_identifier on
go
CREATE function [dbo].[fn_npclean_string] (
@strIn as varchar(1000)
)
returns varchar(1000)
as
begin
declare @iPtr as int
set @iPtr = patindex('%[^ -~0-9A-Z]%', @strIn COLLATE LATIN1_GENERAL_BIN)
while @iPtr > 0 begin
set @strIn = replace(@strIn COLLATE LATIN1_GENERAL_BIN, substring(@strIn, @iPtr, 1), '')
set @iPtr = patindex('%[^ -~0-9A-Z]%', @strIn COLLATE LATIN1_GENERAL_BIN)
end
return @strIn
end</pre></div><br />
<br />
<span style="font-family: Arial, Helvetica, sans-serif;"><strong>Procedure: sp_npclean_col</strong></span><br />
<div style="background-color:#eeeeee;padding:10px;"><pre>use [master]
go
set ansi_nulls on
go
set quoted_identifier on
go
CREATE procedure [dbo].[sp_npclean_col]
@DatabaseName varchar(75) = null,
@SchemaName varchar(75) = null,
@TableName varchar(75),
@ColumnName varchar(75)
as
begin
Declare @FullTableName varchar(100)
declare @UpdateSQL nvarchar(1000)
if @DatabaseName is null begin
set @DatabaseName = db_name()
end
if @SchemaName is null begin
set @SchemaName = schema_name()
end
set @FullTableName = '[' + @DatabaseName + '].[' + @SchemaName + '].[' + @TableName + ']'
set @UpdateSQL = 'update ' + @FullTableName + ' set [' + @ColumnName + '] = dbo.fn_npclean_string([' + @ColumnName + ']) where [' + @ColumnName + '] like ''%[^ -~0-9A-Z]%'''
exec sp_ExecuteSQL @UpdateSQL
end
</pre></div><br />
<br />
<span style="font-family: Arial;"><strong>Procedure: sp_npclean_table</strong></span><br />
<div style="background-color:#eeeeee;padding:10px;"><pre>use [master]
go
set ansi_nulls on
go
set quoted_identifier on
go
create procedure [dbo].[sp_npclean_table]
@TargetDatabase varchar(75) = null,
@TargetSchema varchar(75) = null,
@TargetTable varchar(75)
as
begin
declare @getColSQL nvarchar(750)
declare @textCol CURSOR
declare @curCol varchar(75)
if @TargetDatabase is null begin
set @TargetDatabase = db_name()
end
if @TargetSchema is null begin
set @TargetSchema = schema_name()
end
set @getColSQL =
'select sc.name
from ' + @TargetDatabase + '.sys.columns sc
join ' + @TargetDatabase + '.sys.types st
on sc.system_type_id = st.system_type_id
join ' + @TargetDatabase + '.sys.objects so
on sc.object_id = so.object_id
join ' + @TargetDatabase + '.sys.schemas ss
on so.schema_id = ss.schema_id
where
so.type = ''U''
and st.name in (''text'',''ntext'',''varchar'',''char'',''nvarchar'',''nchar'')
and sc.is_rowguidcol = 0
and sc.is_identity = 0
and sc.is_computed = 0
and so.name = ''' + @TargetTable + '''
and ss.name = ''' + @TargetSchema + ''''
set @getColSQL = 'set @inCursor = cursor for ' + @getColSQL + ' open @incursor'
execute sp_executesql @getColSQL,N'@inCursor cursor out',@inCursor=@textCol OUT
fetch next from @textCol into @curCol
while @@fetch_status = 0
begin
exec sp_npclean_col @DatabaseName = @TargetDatabase, @SchemaName = @TargetSchema, @TableName = @TargetTable, @ColumnName = @curCol
fetch next from @textCol into @curCol
end
Close @textCol
DeAllocate @textCol
end</pre></div><br />
<span style="font-family: Arial, Helvetica, sans-serif;">Using these, invalid characters can be removed in the following ways:</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br />
</span><br />
<br />
<span style="font-family: Arial, Helvetica, sans-serif;"><strong>By String:</strong></span><br />
<div style="background-color:#eeeeee;padding:10px;"><pre>select master.dbo.fn_npclean_string('Stringğ withħ įņvalidđ charactersŝ')
</pre></div><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><strong>By table column:</strong></span><br />
<div style="background-color:#eeeeee;padding:10px;"><pre>exec master.dbo.sp_npclean_col [@DatabaseName = 'MyDatabaseName',] [@SchemaName = 'MySchemaName',] @TableName = 'MyTableName', @ColumnName = 'MyColumnName'
</pre></div><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><strong>By table:</strong></span><br />
<div style="background-color:#eeeeee;padding:10px;"><pre>exec master.dbo.sp_npclean_table [@TargetDatabase = 'MyDatabaseName',] [@TargetSchema = 'MySchemaName',] @TargetTable = 'MyTableName'
</pre></div>ISO30http://www.blogger.com/profile/01999689281636743595noreply@blogger.com0