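SQL Server: check for upper or lower case after a specific character

I have data like 'John is my name; Ram is my name; Adam is my name'.
My rule is that the first letter after every ; should be a capital letter.
How do I select all values that satisfy the rule?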
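The other answers show how to transform rows into something that matches your pattern.
If you just want to select the rows that match the pattern you described, you can use patindex() or like with a case-sensitive collation (or apply one with collate).
This assumes that, besides the rule that the letter after each semicolon must be upper case, the first letter of the value should be upper case as well. If that is not the case, just remove the first clause of the where.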
select *
from t
where patindex('[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%', val collate latin1_general_cs_as) = 1
and patindex('%; [^ABCDEFGHIJKLMNOPQRSTUVWXYZ]%', val collate latin1_general_cs_as) = 0
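For completeness, the same check written with like might look as follows. This is only a sketch against the same table t and column val used above. The letters are spelled out instead of written as [A-Z] because, as far as I know, a range under a non-binary case-sensitive collation follows dictionary order and can match lower-case letters as well.

```sql
-- Same rule as the patindex() query, expressed with LIKE / NOT LIKE.
select *
from t
where val collate latin1_general_cs_as like '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'            -- value starts with an upper-case letter
  and val collate latin1_general_cs_as not like '%; [^ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'    -- no "; " followed by anything else
```

Like the patindex() version, this expects exactly one space after each semicolon.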
Test setup:
create table t (id int not null identity(1,1),val varchar(256))
insert into t values
('John is my name; Ram is my name; Adam is my name')
,('john is my name; ram is my name; adam is my name')
rextester demo: http://rextester.com/DBGIS10645
Both the patindex() and the like form return:
+----+--------------------------------------------------+
| id | val                                              |
+----+--------------------------------------------------+
|  1 | John is my name; Ram is my name; Adam is my name |
+----+--------------------------------------------------+
You might do it with an XML trick like this:
DECLARE @YourString VARCHAR(100)='John is my name; Ram is my name; Adam is my name';
WITH Splitted AS
(
SELECT CAST('<x>' + REPLACE((SELECT REPLACE(@YourString,'; ','$$SplitHere$$') AS [*] FOR XML PATH('')),'$$SplitHere$$','</x><x>')+ '</x>' AS XML) AS Casted
)
,DerivedTable AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PartNr
,x.value(N'text()[1]',N'nvarchar(max)') AS Part
FROM Splitted
CROSS APPLY Casted.nodes(N'/x') AS X(x)
)
SELECT PartNr
,Part
,CASE WHEN ASCII(LEFT(Part,1)) BETWEEN ASCII('A') AND ASCII('Z') THEN 1 ELSE 0 END AS FirstIsCapital
FROM DerivedTable;
PartNr Part            FirstIsCapital
-------------------------------------
1      John is my name 1
2      Ram is my name  1
3      Adam is my name 1
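I don't know what your final goal is... Find the parts where the first letter resulting from this split is not a capital? Make sure your rule is fulfilled?
But:
The best approach would be to use this to fix your design and move these parts into a 1:n related side table.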
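A minimal sketch of what such a 1:n design could look like. The table and column names (SentenceGroup, SentencePart, GroupId, PartNr, Part) are hypothetical and only serve as an illustration; they do not come from the question.

```sql
-- Hypothetical parent table: one row per original semicolon-separated value.
CREATE TABLE SentenceGroup (
    GroupId INT NOT NULL IDENTITY(1,1) PRIMARY KEY
);

-- Hypothetical side table: one row per part, in its original position.
CREATE TABLE SentencePart (
    GroupId INT NOT NULL REFERENCES SentenceGroup (GroupId),
    PartNr  INT NOT NULL,
    Part    NVARCHAR(256) NOT NULL,
    PRIMARY KEY (GroupId, PartNr)
);

-- Groups in which no part violates the rule
-- (every part starts with an upper-case letter A-Z).
SELECT g.GroupId
FROM SentenceGroup AS g
WHERE NOT EXISTS (
    SELECT 1
    FROM SentencePart AS p
    WHERE p.GroupId = g.GroupId
      AND p.Part COLLATE Latin1_General_CS_AS NOT LIKE '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'
);
```

With the parts stored one per row, the rule becomes a plain predicate on a column and no string splitting is needed. Note that a group without any parts would also be returned by this query.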
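Test with the following source string: 'John is my name;Ram is my name;Adam is my name' –
@BogdanSahlean, then I would use L/RTRIM()... The mistake is the storage format... Any code that works around it will be a hack... – Shnugo
TRIM was introduced in SQL Server 2017. What if the source string is '!John is my name;!Ram is my name;!Adam is my name;'Hola!'? –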
A bit of an ugly solution, but you can give it a try...
Declare @str nvarchar(max) = 'John is my name; Ram is my name; Adam is my name'
Declare @xml as xml
Set @xml = cast(('<X>'+replace(@str,';' ,'</X><X>')+'</X>') as xml)
Select * from (
Select RowN = Row_Number() over (order by (SELECT NULL)), LTrim(RTrim(N.value('.', 'nvarchar(MAX)'))) as value FROM @xml.nodes('X') as T(N) -- this is to split if you are using sql server 2016 you can use string_Split
) a
Where unicode(substring(a.[value],1,1)) = unicode(upper(substring(a.[value],1,1)))
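The idea is to split the string and compare Unicode values to see whether the first character is upper case or not.
Note: the standard practice is to do this in the application, using C#/VB [.NET].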
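The STRING_SPLIT alternative mentioned in the code comment could be sketched like this. It assumes SQL Server 2016 or later with database compatibility level 130 or higher, and it swaps the Unicode comparison for a case-sensitive LIKE so that digits and punctuation are not treated as upper case. The sample string is made up for the illustration.

```sql
-- Illustrative input: the third part deliberately starts with a lower-case letter.
DECLARE @str NVARCHAR(MAX) = N'John is my name; Ram is my name; adam is my name';

-- Split on ';', trim each part, and keep only the parts starting with A-Z.
SELECT LTRIM(RTRIM(s.value)) AS part
FROM STRING_SPLIT(@str, N';') AS s
WHERE LTRIM(RTRIM(s.value)) COLLATE Latin1_General_CS_AS
      LIKE '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%';
```

STRING_SPLIT does not guarantee the order of the returned rows, so this only fits when the position of a part does not matter.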
[1] Solution:
DECLARE @Source NVARCHAR(100) = N'john is my name; Ram is my name; adam is my name'
SELECT z.Sentence
FROM (VALUES (CONVERT(XML, N'<root><i>' + REPLACE(@Source, N';', N'</i><i>;') + N'</i></root>'))) AS x(XmlCol)
CROSS APPLY x.XmlCol.nodes(N'/root/i') AS y(XmlCol)
CROSS APPLY (VALUES(y.XmlCol.value('(text())[1]', 'NVARCHAR(100)'))) AS z(Sentence)
WHERE SUBSTRING(z.Sentence, NULLIF(PATINDEX('%[a-z]%', z.Sentence), 0), 1) LIKE '%[a-z]%' COLLATE Latin1_General_BIN
ORDER BY ROW_NUMBER() OVER(ORDER BY y.XmlCol)
In this case, the result will be
john is my name
; adam is my name
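[2] If you are trying to capitalize the first letter of every sentence, then I would use the following solution (see the comments at the end of each line):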
DECLARE @Source NVARCHAR(100) = N'john is my name; ram is my name; adam is my name'
SELECT (
SELECT u.NewSentence AS '*'
FROM (VALUES (CONVERT(XML, N'<root><i>' + REPLACE(@Source, N';', N';</i><i>') + N'</i></root>'))) AS x(XmlCol) -- Converts the source string to XML; every ; acts as a sentence delimiter. The result looks like <root><i>john...;</i><i> ram ....</i>...</root>
CROSS APPLY x.XmlCol.nodes(N'/root/i') AS y(XmlCol) -- Decomposes the XML into separate sentences (still as XML)
CROSS APPLY (VALUES(y.XmlCol.value('(text())[1]', 'NVARCHAR(100)'))) AS z(Sentence) -- ... as NVARCHAR(100)
CROSS APPLY (VALUES(PATINDEX('%[a-z]%', z.Sentence))) AS t(FirstLetterIndex) -- Finds the index of the first letter
CROSS APPLY (VALUES(IIF(t.FirstLetterIndex > 0, STUFF(z.Sentence, t.FirstLetterIndex, 1, UPPER(SUBSTRING(z.Sentence, t.FirstLetterIndex, 1))), z.Sentence))) AS u(NewSentence) -- Replaces the first letter with its upper-case version, UPPER(...)
ORDER BY ROW_NUMBER() OVER(ORDER BY y.XmlCol) -- Sentences are ordered by their original position within the source string
FOR XML PATH('') -- Concatenates all sentences back into one string
)
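For example, if the source string is N'john is my name; ram is my name; adam is my name', then the result will be N'John is my name; Ram is my name; Adam is my name'.
Note: this solution (and every other solution based on XML shredding) works only if the source string does not contain XML reserved characters (such as <). Let me know if that is your case.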
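One way to cope with reserved characters is to let FOR XML PATH('') escape the string before it is cast to XML, as the $$SplitHere$$ answer earlier in this thread does. The following is a sketch of that idea only; the sample string and the @Marker variable are made up for the illustration, and the marker is assumed never to occur in the data. The semicolons are swapped for the marker before escaping because the escaped entities (&lt;, &amp;) contain semicolons themselves.

```sql
DECLARE @Source NVARCHAR(100) = N'a < b; C & d';   -- illustrative input with reserved characters
DECLARE @Marker NVARCHAR(20)  = N'$$SplitHere$$';  -- assumption: never occurs in the data

SELECT LTRIM(y.Item.value('text()[1]', 'NVARCHAR(100)')) AS Part
FROM (
    SELECT CONVERT(XML,
               N'<root><i>'
               -- 1) replace ; by the marker, 2) escape via FOR XML, 3) turn the marker into tags
               + REPLACE((SELECT REPLACE(@Source, N';', @Marker) AS [*] FOR XML PATH('')),
                         @Marker, N'</i><i>')
               + N'</i></root>') AS XmlCol
) AS x
CROSS APPLY x.XmlCol.nodes(N'/root/i') AS y(Item);
```

Reading the parts back with .value() should return them unescaped, so the < and & come out as they went in.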
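Just a note: my XML shredding has no problem with reserved characters... Searching for the first letter with PATINDEX('%[a-z]%') seems over-complicated... – Shnugo
I believe FOR XML ... adds some overhead. I mentioned that the OP should let me know if the source string contains such characters. Nothing says that the first character of each sentence has to be a letter. It could be a space, or there could be 100 spaces before the first letter. –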
You can create a function like this.
CREATE FUNCTION SPLITTER (
    @textData NVARCHAR(MAX),
    @Delimeter NVARCHAR(MAX))
RETURNS @RtnValue TABLE (Data NVARCHAR(MAX))
AS
BEGIN
    DECLARE @index INT;
    DECLARE @data NVARCHAR(1000);
    DECLARE @firstCharacter NCHAR(1);
    SET @index = CHARINDEX(@Delimeter, @textData);
    WHILE (@index > 0)
    BEGIN
        -- Take the part before the delimiter and trim surrounding spaces
        SET @data = LTRIM(RTRIM(SUBSTRING(@textData, 1, @index - 1)));
        SET @firstCharacter = SUBSTRING(@data, 1, 1);
        -- Keep the part only if its first character equals its own upper-case form
        IF UNICODE(@firstCharacter) = UNICODE(UPPER(@firstCharacter))
            INSERT INTO @RtnValue (Data) SELECT @data;
        -- Drop the processed part and the delimiter, then look for the next delimiter
        SET @textData = SUBSTRING(@textData, @index + DATALENGTH(@Delimeter) / 2, LEN(@textData));
        SET @index = CHARINDEX(@Delimeter, @textData);
    END
    -- Handle the last part; trim it too, so that a leading space does not pass the check
    SET @data = LTRIM(RTRIM(@textData));
    SET @firstCharacter = SUBSTRING(@data, 1, 1);
    IF UNICODE(@firstCharacter) = UNICODE(UPPER(@firstCharacter))
        INSERT INTO @RtnValue (Data) SELECT @data;
    RETURN;
END
Use it like this:
SELECT * FROM SPLITTER('John is my name; Ram is my name; Adam is my name', ';')
You can grab a copy of NGrams8K and do this:
-- note that I made the 3rd item start with lower-case
DECLARE @YourString VARCHAR(100)='John is my name; Ram is my name; adam is my name';
WITH D(n) AS
(
SELECT 0 UNION ALL SELECT position
FROM dbo.NGrams8k(@yourstring,1) WHERE token = ';'
),
TOKEN(token) AS
(
SELECT LTRIM(SUBSTRING(@YourString, N+1,
ISNULL(NULLIF(CHARINDEX(';', @YourString, N+1),0), 101)-(N+1)))
FROM D
)
SELECT token,
FirstLetterIsCaptial = IIF(ASCII(SUBSTRING(token,1,1)) BETWEEN 65 AND 90, 1, 0)
FROM TOKEN;
Result:
token              FirstLetterIsCaptial
------------------ --------------------
John is my name    1
Ram is my name     1
adam is my name    0
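Which version of SQL Server? – Shnugo
@Shnugo Microsoft SQL Server 2012 - 11.0.5058.0 (X64) –
This is going to be an ugly problem, especially if the number of semicolon-separated terms is unknown. A better solution would be to normalize the data and put each name/sentence in a separate record. –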