Corrupção de Dados direto das trincheiras

Olá,

Hoje venho com mais um caso de corrupção de dados extraído de um ambiente real, direto das “trincheiras”.

Tive um caso onde há alguns dias o SQL vinha gerando dumps de um problema até então não identificado. Como os dumps não vinham sendo verificados o problema ficou por alguns dias e só fomos “informados” dele quando um dia o SQL foi derrubado pelo erro. Vamos à investigação.

Primeiro erro crítico encontrado, que derrubou o SQL:

=====================================================================                                            
       BugCheck Dump                                                                                             
=====================================================================                                            
                                                                                                                 
This file is generated by Microsoft SQL Server                                                                   
version 9.00.5000.00                                                                                             
upon detection of fatal unexpected error. Please return this file,                                               
the query or program that produced the bugcheck, the database and                                                
the error log, and any other pertinent information with a Service Request.                                       
                                                                                                                 
                                                                                                                 
Computer type is AT/AT COMPATIBLE.                                                                               
Current time is 01:53:50 09/20/12.                                                                               
8 Unknown CPU 9., 3 Mhz processor (s).                                                                           
Windows NT 5.2 Build 3790 CSD Service Pack 2.                                                                    
                                     
Memory                               
MemoryLoad = 87%                     
Total Physical = 8189 MB             
Available Physical = 1048 MB         
Total Page File = 9779 MB            
Available Page File = 2785 MB        
Total Virtual = 8388607 MB           
Available Virtual = 8379991 MB       
***Stack Dump being sent to E:\MSSQL.1\MSSQL\LOG\SQLDump0129.txt                                                 
SqlDumpExceptionHandler: Process 63 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is 
terminating this process.                                                                                        
* *******************************************************************************                                
*                                                                                                                
* BEGIN STACK DUMP:                                                                                              
*   09/20/12 01:53:50 spid 63                                                                                    
*                                                                                                                
*                                                                                                                
*   Exception Address = 0000000001D20C81 Module(sqlservr+0000000000D20C81)                                       
*   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION                                                      
*   Access Violation occurred writing address 0000000000000002

Destaquei os pontos que me chamaram a atenção. Pensei, provavelmente algo grave mas vamos continuar investigando… Este erro ocorreu ainda mais uma vez e depois começaram a surgir dumps com esta característica:

Begin transaction log dump for database 'dbcorporativo'; for Index maintenance - filter on ObjId.
              
                   Current LSN = 002907c7:000019ea:01da                                                          
                     Operation = LOP_MODIFY_ROW                                                                  
                       Context = LCX_CLUSTERED                                                                   
                Transaction ID = 0000:7241332b                                                                   
                      Tag Bits = 0000                                                                            
       Log Record Fixed Length = 62                                                                              
             Log Record Length = 116                                                                             
                  Previous LSN = 002907c7:000019ea:01d8                                                          
                     Flag Bits = 0200                                                                            
                   AllocUnitId = 372292930109440                                                                 
                 AllocUnitName = dbo.mk45_pf.pk_mk45_pf                                                          
                       Page ID = 0001:004cfa63                                                                   
                       Slot ID = 3                                                                               
             Previous Page LSN = 002907c6:00039c20:015a                                                          
               Number of Locks = 3                                                                               
              Lock Information = HoBt 372292930109440:ACQUIRE_LOCK_IX OBJECT: 6:1385771994:0 ;ACQUIRE_LOCK_IX PAG
E: 6:1:5044835;ACQUIRE_LOCK_X KEY: 6:372292930109440 (0f0088d0eb93)                                              
                   Description =                                                                                 
                    Log Record = 00003e00c7072900ea190000d80102002b3341720000040263fa4c0001000300da339952c6072900
209c03005a0101000000da33995201009c000100000004000100010008001a00cd00772ad18a2e48160fe84c020100000101000c0000da339
9520000010200040204000a0f0088d0eb939952                                                                          
                                                                                                                 
                   Current LSN = 002907c7:000019ea:01d9                                                          
                     Operation = LOP_MODIFY_COLUMNS                                                              
                       Context = LCX_CLUSTERED                                                                   
                Transaction ID = 0000:7241331c                                                                   
                      Tag Bits = 0000                                                                            
       Log Record Fixed Length = 62                                                                              
             Log Record Length = 144                                                                             
                  Previous LSN = 002907c7:000019ea:01d7                                                          
                     Flag Bits = 0200                                                                            
                   AllocUnitId = 372292930109440                                                                 
                 AllocUnitName = dbo.mk45_pf.pk_mk45_pf                                                          
                       Page ID = 0001:0059445a                                                                   
                       Slot ID = 3                                                                               
             Previous Page LSN = 002907c7:000019ea:01d7                                                          
               Number of Locks = 3                                                                               
              Lock Information = HoBt 372292930109440:ACQUIRE_LOCK_IX OBJECT: 6:1385771994:0 ;ACQUIRE_LOCK_IX PAG
E: 6:1:5850202;ACQUIRE_LOCK_X KEY: 6:372292930109440 (5200e0c454d0)                                              
                   Description =                                                                                 
                    Log Record = 00003e00c7072900ea190000d70102001c334172000006025a44590001000300da339952c7072900
ea190000d70101000000da339952010000000000000008000800040008001a000200020001000100330033009c009c000200010016521b530
20100000101000c0000da3399520000010200040204000a5200e0c454d04172530104020a004e00d0000100d1339952                  

E no final deste dump, a mensagem:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
                                                   
**Dump thread - spid = 63, PSS = 0x00000000BB17DEA0, EC = 0x00000000BB17DEB0                                     
***Stack Dump being sent to E:\MSSQL.1\MSSQL\LOG\SQLDump0131.txt                                                 
* *******************************************************************************                                
*                                                                                                                
* BEGIN STACK DUMP:                                                                                              
*   09/20/12 03:05:40 spid 63                                                                                    
*                                                                                                                
* CPerIndexMetaQS::ErrorAbort - Index corruption                                                                 

Já fiquei um pouco mais tranquilo. Trata-se de uma corrupção de índice. Ótimo! Algo relativamente simples de ser tratado. Restava ainda ter absoluta certeza de que se trava de uma corrupção de índice e tratá-la. Vamos ao DBCC CHECKDB. O resultado do CHECKDB na base em questão foi o seguinte (resumo):

Msg 8951, Level 16, State 1, Line 1
Table error: table 'mk45_pf' (ID 1385771994). Data row does not have a matching index row in the index 'IK_mk45_pf' (ID 24). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:4206449:9) identified by (cd_pes = 34916789) with index values 'cd_cpf_pes = 16614763254. and cd_sgm_pes_pri = '006' and cd_sgm_pes_sec = 'AKO' and cd_pes = 34916789'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'mk45_pf' (ID 1385771994). Data row does not have a matching index row in the index 'IK_mk45_pf' (ID 24). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:5060407:26) identified by (cd_pes = 38871114) with index values 'cd_cpf_pes = NULL and cd_sgm_pes_pri = '006' and cd_sgm_pes_sec = 'AKO' and cd_pes = 38871114'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'mk45_pf' (ID 1385771994). Data row does not have a matching index row in the index 'IK_mk45_pf' (ID 24). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:5061350:3) identified by (cd_pes = 38871291) with index values 'cd_cpf_pes = NULL and cd_sgm_pes_pri = '106' and cd_sgm_pes_sec = 'AUB' and cd_pes = 38871291'.
...
CHECKDB found 0 allocation errors and 28 consistency errors in table 'mk45_pf' (object ID 1385771994).
CHECKDB found 0 allocation errors and 28 consistency errors in database 'dbcorporativo'.
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (dbcorporativo).

No dump, ele gerou o resultado do DBCC CHECKDB desta forma:

<Dbcc ID="52" Error="8952" Severity="16" State="1">Table error: table 'mk45_pf' (ID 1385771994). Index row in ind
ex 'IK_mk45_pf' (ID 24) does not match any data row. Possible extra or invalid keys for:</Dbcc>                  
<Dbcc ID="53" Error="8956" Severity="16" State="1">Index row (1:15540904:69) with values (cd_cpf_pes = 9746759701
. and cd_sgm_pes_pri = '006' and cd_sgm_pes_sec = '080' and cd_pes = 46266481) pointing to the data row identifie
d by (cd_pes = 46266481).</Dbcc>                                                                                 
<Dbcc ID="54" Error="8952" Severity="16" State="1">Table error: table 'mk45_pf' (ID 1385771994). Index row in ind
ex 'IK_mk45_pf' (ID 24) does not match any data row. Possible extra or invalid keys for:</Dbcc>                  
<Dbcc ID="55" Error="8956" Severity="16" State="1">Index row (1:15577461:126) with values (cd_cpf_pes = 859179826
00. and cd_sgm_pes_pri = '106' and cd_sgm_pes_sec = '133' and cd_pes = 38872341) pointing to the data row identif
ied by (cd_pes = 38872341).</Dbcc>                                                                               
<Dbcc ID="56" Error="8990" Severity="10" State="1">CHECKDB found 0 allocation errors and 28 consistency errors in
 table 'mk45_pf' (object ID 1385771994).</Dbcc>                                                                  
<Dbcc ID="57" Error="8989" Severity="10" State="1">CHECKDB found 0 allocation errors and 28 consistency errors in
 database 'dbcorporativo'.</Dbcc>                                                                                
<Dbcc ID="58" Error="8957" Severity="-1" State="1">DBCC CHECKDB (dbcorporativo) WITH all_errormsgs, no_infomsgs e
xecuted by BSBR\dbadmin found 28 errors and repaired 0 errors. Elapsed time: 1 hours 9 minutes 23 seconds.</Dbcc>
<Dbcc ID="59" Error="8958" Severity="10" State="1">repair_rebuild is the minimum repair level for the errors foun
d by DBCC CHECKDB (dbcorporativo).</Dbcc>                                                                        
</DbccResults>                                                                                                   
**Dump thread - spid = 59, PSS = 0x0000000193777C00, EC = 0x0000000193777C10                                     
***Stack Dump being sent to E:\MSSQL.1\MSSQL\LOG\SQLDump0161.txt                                                 
* *******************************************************************************                                
*                                                                                                                
* BEGIN STACK DUMP:                                                                                              
*   10/05/12 15:08:22 spid 59                                                                                    
*                                                                                                                
* DBCC database corruption                                                                                       
*                                                                                                                
* Input Buffer 142 bytes -                                                                                       
*             dbcc checkdb(dbcorporativo)  with all_errormsgs, no_infomsgs                                       
*                                                                                                                
* *******************************************************************************                                

Problema diagnosticado com exatidão. Realmente trata-se de uma corrupção de índice. A minha suspeita seria de que este é um índice NON-CLUSTERED e alguma referência da PK foi corrompida, ou mesmo o próprio dado do índice. Não tive dúvidas, confirmei que se tratava de um índice NC (exec sp_helpindex ‘minha_tabela’), dropei o índice e o recriei logo em seguida.

Rodei o DBCC CHECKDB na sequência e, sucesso. Problema resolvido. Exp + 1. =)

 

Abraços e bons estudos.
Erickson Ricci

3 Respostas para “Corrupção de Dados direto das trincheiras

  1. Pingback: Análise de um banco de dados corrompido « Alex Souza

Deixe uma resposta

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s