Reading Unicode Strings from RichEditCtrl and Converting Them into UTF-8 Format

Reading Unicode Strings from RichEditCtrl

The RichEditCtrl (CRichEditCtrl) in Visual C++ 7.0 supports Unicode. To obtain Unicode strings from a RichEditCtrl box, send it the EM_GETTEXTEX message instead of calling GetDlgItemText.

SendMessage(
   EM_GETTEXTEX,
   (WPARAM) &getTextEx,  // GETTEXTEX structure
   (LPARAM) lpszWChar    // output WCHAR array
);

This page explains how to obtain Unicode strings using the EM_GETTEXTEX message and convert them into UTF-8 format. The SendMessage call above takes three parameters. The first parameter is the message, EM_GETTEXTEX. The third parameter, lpszWChar, is the WCHAR array that receives the text (cast to LPARAM). To allocate enough space in this array, call GetTextLengthEx to get the length of the control's Unicode string in characters, then add one element for the terminating null character. The following code shows how to allocate the memory for the array (edit is the CRichEditCtrl pointer obtained from GetDlgItem, as in the full function at the end of this page).

   int nLength = edit->GetTextLengthEx(GTL_DEFAULT, 1200);
   LPWSTR lpszWChar = new WCHAR[nLength+1];

The second parameter of the EM_GETTEXTEX message is getTextEx, a GETTEXTEX structure (cast to WPARAM). The code below shows how to fill in the GETTEXTEX structure. The codepage member is set to 1200, which requests the text as UTF-16 (Windows Unicode). The cb member must be set to the size of the output buffer in bytes, including the extra element for the terminating null character.

  GETTEXTEX getTextEx;
  getTextEx.cb=(nLength+1)*sizeof(WCHAR);
  getTextEx.codepage=1200;
  getTextEx.flags=GT_DEFAULT;
  getTextEx.lpDefaultChar=NULL;
  getTextEx.lpUsedDefChar=NULL;

Sending the EM_GETTEXTEX message copies the Unicode string from the RichEditCtrl box into the WCHAR array. The message returns the number of characters copied, not counting the terminating null character.

   CRichEditCtrl* edit=(CRichEditCtrl*)GetDlgItem(id);
   edit->SendMessage(EM_GETTEXTEX, (WPARAM)&getTextEx, (LPARAM)lpszWChar); 
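As a rough check of the cb arithmetic, the following portable C++ sketch computes the EM_GETTEXTEX buffer size and shows the byte layout that code page 1200 (UTF-16 little-endian) implies. It assumes WCHAR is a 2-byte type, modeled here with char16_t so the sketch compiles off Windows; GetTextExBufferBytes and ToUtf16LeBytes are hypothetical helpers, not Windows APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: size in bytes of an output buffer holding nChars
// UTF-16 code units plus one terminating null, i.e. (nChars + 1) * 2.
// Assumes a 2-byte WCHAR, modeled with char16_t.
std::size_t GetTextExBufferBytes(int nChars) {
    assert(nChars >= 0);
    return (static_cast<std::size_t>(nChars) + 1) * sizeof(char16_t);
}

// Hypothetical helper: lays out a UTF-16 string as code page 1200 stores it
// in memory, each code unit as two bytes (low byte first), followed by a
// null code unit.
std::vector<unsigned char> ToUtf16LeBytes(const std::u16string& s) {
    std::vector<unsigned char> bytes;
    bytes.reserve(GetTextExBufferBytes(static_cast<int>(s.size())));
    for (char16_t c : s) {
        bytes.push_back(static_cast<unsigned char>(c & 0xFF));         // low byte
        bytes.push_back(static_cast<unsigned char>((c >> 8) & 0xFF));  // high byte
    }
    bytes.push_back(0);  // terminating null code unit: two zero bytes
    bytes.push_back(0);
    return bytes;
}
```

For example, the two-character string "Hi" needs (2 + 1) * 2 = 6 bytes: 48 00 69 00 00 00.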
    

Converting the Unicode String into UTF-8 Format

Use the ATL::AtlUnicodeToUTF8 function to convert the Unicode string into UTF-8 format. The function is declared in the atlenc.h header.

int AtlUnicodeToUTF8(
   LPCWSTR lpszWChar, // the original Unicode string
   int nLength,       // the length of the Unicode string, in WCHARs
   LPSTR data,        // output buffer array
   int len            // the size of the buffer, in bytes
);

The function takes four parameters. The first parameter points to the Unicode string. The second parameter is the length of the original Unicode string in WCHARs. The third parameter is the buffer where the converted UTF-8 string will be stored. The fourth parameter is the size of that buffer in bytes. The function returns the number of bytes of UTF-8 output.
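To illustrate this parameter contract (a source length in WCHARs, an output buffer, a buffer size in bytes, and a size-only query when the buffer is empty), here is a portable sketch of a UTF-16 to UTF-8 converter with the same parameter shape. It is not ATL's code: Utf16ToUtf8 and EncodeUtf8 are hypothetical names, char16_t stands in for WCHAR, and returning 0 for a too-small buffer and mapping an unpaired surrogate to U+FFFD are assumptions that may not match ATL's behavior. Like the usage on this page, it does not append a terminating null.

```cpp
#include <cassert>

// Encodes one Unicode code point as UTF-8 into buf; returns the byte count.
static int EncodeUtf8(unsigned long cp, unsigned char buf[4]) {
    if (cp < 0x80) { buf[0] = static_cast<unsigned char>(cp); return 1; }
    if (cp < 0x800) {
        buf[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        buf[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        buf[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
    buf[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    return 4;
}

// Hypothetical stand-in for the AtlUnicodeToUTF8 call pattern.
// dest == nullptr or nDest == 0: returns the number of UTF-8 bytes needed.
// Otherwise: writes the bytes and returns the count, or 0 if dest is too small.
int Utf16ToUtf8(const char16_t* src, int nSrc, char* dest, int nDest) {
    assert(nSrc >= 0);
    const bool sizeOnly = (dest == nullptr || nDest == 0);
    int out = 0;
    for (int i = 0; i < nSrc; ++i) {
        unsigned long cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < nSrc &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // High + low surrogate pair -> one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00u);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate (assumed replacement policy)
        }
        unsigned char buf[4];
        const int n = EncodeUtf8(cp, buf);
        if (!sizeOnly) {
            if (out + n > nDest) return 0;  // buffer too small
            for (int k = 0; k < n; ++k) dest[out + k] = static_cast<char>(buf[k]);
        }
        out += n;
    }
    return out;
}
```

Note that the UTF-8 size depends on the characters, not just the count: 'A' takes 1 byte, U+00E9 takes 2, U+20AC takes 3, and a surrogate pair (two WCHARs) takes 4. This is why the size is queried first rather than computed from nLength.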

The following code gets the required buffer size in bytes by calling the same ATL::AtlUnicodeToUTF8 function with the third and fourth parameters set to 0. In this case the function writes nothing and returns the number of bytes the UTF-8 string needs.

   int len=ATL::AtlUnicodeToUTF8(lpszWChar, nLength, 0, 0);

Allocate a buffer of len+1 bytes: len bytes for the UTF-8 string plus one for the terminating null character. Then call the ATL::AtlUnicodeToUTF8 function again to convert the Unicode string into UTF-8 format. The converted UTF-8 string is stored in the buffer.

   char *data=new char[len+1];
   ATL::AtlUnicodeToUTF8(lpszWChar, nLength, data, len);

Finally, write the terminating null character at the end of the buffer and free the WCHAR array.

   delete [] lpszWChar;
   data[len]='\0';

The code below puts the whole procedure together. The GetUtf8String(UINT id) function takes the resource ID of the RichEditCtrl and returns the text in the box as a null-terminated UTF-8 string. The returned buffer is allocated with new[], so the caller must release it with delete [].

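// Requires <atlenc.h> (AtlUnicodeToUTF8) and <richedit.h> (GETTEXTEX, EM_GETTEXTEX).
// Returns a null-terminated UTF-8 string allocated with new[];
// the caller must release it with delete [].
char* CQuoteDlg::GetUtf8String(UINT id)
{
	CRichEditCtrl* edit=(CRichEditCtrl*)GetDlgItem(id);

	// Length of the text in WCHARs (code page 1200 = UTF-16), excluding the null.
	int nLength = edit->GetTextLengthEx(GTL_DEFAULT,1200);
	LPWSTR lpszWChar = new WCHAR[nLength+1];

	// Describe the output buffer; cb is its size in bytes, including the null.
	GETTEXTEX getTextEx;
	getTextEx.cb=(nLength+1)*sizeof(WCHAR);
	getTextEx.codepage=1200;
	getTextEx.flags=GT_DEFAULT;
	getTextEx.lpDefaultChar=NULL;
	getTextEx.lpUsedDefChar=NULL;

	// wParam = GETTEXTEX*, lParam = output buffer. The return value is the
	// number of WCHARs actually copied, not counting the terminating null.
	int nCopied = (int)edit->SendMessage(EM_GETTEXTEX,
	         (WPARAM)&getTextEx, (LPARAM)lpszWChar);

	// First call: query the size of the UTF-8 string in bytes.
	int len=ATL::AtlUnicodeToUTF8(lpszWChar, nCopied, 0, 0);
	char *data=new char[len+1];

	// Second call: convert into the buffer (no null is appended).
	ATL::AtlUnicodeToUTF8(lpszWChar, nCopied, data, len);

	delete [] lpszWChar;
	data[len]='\0';

	return data;
}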
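GetUtf8String hands ownership of a raw new[] buffer to its caller. One possible alternative, sketched here under the assumption that C++11 or later is available, wraps the two-call pattern (size query, then conversion) in a helper that returns a std::string, so no delete [] is needed. ConvertTwoCall is a hypothetical name; the converter it takes is any callable with the (char* buffer, int size) shape used by the size query above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: runs a two-call converter and returns an owning
// std::string. The converter is called first with (nullptr, 0) to get the
// required size in bytes, then with a buffer of exactly that size.
// A non-positive result from either call yields an empty string.
template <typename Convert>
std::string ConvertTwoCall(Convert convert) {
    const int len = convert(nullptr, 0);
    if (len <= 0) return std::string();
    std::string out(static_cast<std::string::size_type>(len), '\0');
    const int written = convert(&out[0], len);
    if (written <= 0) return std::string();
    assert(written <= len);
    out.resize(static_cast<std::string::size_type>(written));
    return out;  // std::string maintains its own terminating null
}
```

On Windows, the converter could wrap the ATL call, for example ConvertTwoCall([&](char* d, int n) { return ATL::AtlUnicodeToUTF8(lpszWChar, nLength, d, n); }). Because std::string stores its own terminating null, the explicit data[len]='\0' step and the matching delete [] in the caller both go away.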


Erica Asai
Last Modified: Fri Sep 01 15:37:34 2006