Split String C Builder For Mac

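Today I wrote a strtok example: how to split a network MAC address. A question that naturally comes up is why the first call to strtok takes the string to split as its first argument, while the calls inside the while loop pass NULL instead. Let's walk through the source of strtok.c to find out.

[Figure: output of the example program]

In FreeBSD 7.1 Release, strtok.c lives at /usr/src/lib/libc/string/strtok.c, and the work is done by the function __strtok_r. On the first call, strtok compares each character of the input string s against the delimiters. The statement c = *s++; first sets c to *s and then advances the pointer s by one (for example from the letter T to h). Note that if the current character matches one of the delim characters, goto cont; is executed so that leading delimiters are skipped. Otherwise tok is set to point at the start of the token, and a for loop scans for the next delimiter, overwrites it with 0 to terminate the token, returns tok, and saves the address just after the delimiter as the starting point for the next call.

From then on, each call to strtok(NULL, delim) resumes from the position saved by the previous call. Once *last has been set to NULL, there is nothing left to tokenize and strtok returns NULL.

Microsoft Visual Studio implements this differently: https://research.microsoft.com/en-us/um/redmond/projects/invisible/src/crt/strtok.c.htm. Microsoft uses strpbrk in place of the for loop that compares characters, but the overall flow is much the same. Reading the code really is a good way to learn.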
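The flow described above can be sketched as follows. This is a minimal illustration, not the FreeBSD source: sketch_strtok_r and split_mac are hypothetical names, the standard strspn and strcspn functions stand in for the hand-written comparison loops, and the delimiter set ":-" is an assumption about how MAC addresses are written.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical re-implementation for illustration only (not the FreeBSD code).
// `last` stores where the next call should resume.
char* sketch_strtok_r(char* s, const char* delim, char** last) {
    // Later calls pass NULL, so resume from the saved position.
    if (s == nullptr && (s = *last) == nullptr) {
        return nullptr;
    }

    // Skip leading delimiters (the role of "goto cont" in the text above).
    s += std::strspn(s, delim);
    if (*s == '\0') {  // only delimiters were left
        *last = nullptr;
        return nullptr;
    }
    char* tok = s;  // start of the token

    // Scan to the next delimiter or to the end of the string.
    s += std::strcspn(s, delim);
    if (*s == '\0') {
        *last = nullptr;  // end reached: the next call returns NULL
    } else {
        *s = '\0';        // terminate the token in place
        *last = s + 1;    // resume just after the delimiter
    }
    return tok;
}

// Splits a MAC address such as "00:1A:2B:3C:4D:5E" into its six fields.
std::vector<std::string> split_mac(const std::string& mac) {
    // The tokenizer modifies its input, so work on a writable copy.
    std::vector<char> buffer(mac.begin(), mac.end());
    buffer.push_back('\0');

    std::vector<std::string> fields;
    char* saved = nullptr;
    for (char* tok = sketch_strtok_r(buffer.data(), ":-", &saved);
         tok != nullptr;
         tok = sketch_strtok_r(nullptr, ":-", &saved)) {
        fields.emplace_back(tok);
    }
    return fields;
}
```

The first call receives the buffer, every later call receives a null pointer, and the saved pointer is what lets the tokenizer pick up where it stopped.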


How to split a string in C++? That is to say, how to get a collection of substrings representing the words of a sentence, or the pieces of data contained in a CSV entry?

For comparison, in .NET the String.Split method creates an array of substrings by splitting the input string on one or more delimiters. It is often the easiest way to separate a string on word boundaries, and it can also split on other specific characters or strings. std::string has no such member function.

This is a simple question, but one which has multiple answers in C++.

  • strtok is a C solution, since it involves using C arrays. Another problem is its low-level nature: the caller must pass a pointer to help strtok along. One alternative is to derive a class splitstring from string: a splitstring can be used wherever a string is expected, because it is one, and it can also be split.
  • C++ has no built-in split function, so splitting a string into a vector means implementing it yourself, for example with find_first_of or with regular expressions, and then timing the implementations to see which is the most efficient.
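As a rough sketch of the find_first_of approach mentioned in the list above, something like the following could work. The function name split_any is hypothetical, and keeping empty fields between adjacent delimiters is a design choice made for this example.

```cpp
#include <string>
#include <vector>

// Splits `text` at every character contained in `delims`.
// Empty fields (for example between two adjacent delimiters) are kept.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = text.find_first_of(delims, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last field
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;  // continue just after the delimiter
    }
    return parts;
}
```

Calling split_any("a,b;c", ",;") yields the three fields a, b and c.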

We will see 3 solutions, each one having advantages and drawbacks. Pick the one corresponding best to your needs. The point of this post as an episode of the STL learning resource is also to show you how the iterator interface goes beyond the scope of simple containers. And this illustrates how powerful the design of the STL is.

Solution 1 uses standard components. Solution 2 is better but relies on boost. And Solution 3 is even better but uses ranges. So the one for you really depends on what you need and what you have access to.

Solution 1: Iterating on a stream

Stepping into the world of streams

A stream is an object that creates a connection with a source or with a destination of interest. A stream can obtain information from the source (std::istream) or provide information to the destination (std::ostream), or both (std::iostream).

The source and destination of interest can typically be the standard input (std::cin) or output (std::cout), a file or a string, but really anything can be connected to a stream, provided that the right machinery is put in place.

The main operations done on a stream are

  • for input streams: draw something from it with operator>>,
  • for output streams: push something into it with operator<<.

This is illustrated in the below picture:

The input stream that connects to a string, std::istringstream, has an interesting property: its operator>> produces a string going to the next space in the source string.
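A small sketch of these two operations on string streams, with hypothetical helper names (join_with_space and first_two_words):

```cpp
#include <sstream>
#include <string>
#include <utility>

// Output stream: push values into it with operator<<.
std::string join_with_space(const std::string& a, const std::string& b) {
    std::ostringstream oss;
    oss << a << ' ' << b;
    return oss.str();
}

// Input stream: draw values from it with operator>>.
// For std::string, each extraction skips leading whitespace and stops
// at the next whitespace character.
std::pair<std::string, std::string> first_two_words(const std::string& text) {
    std::istringstream iss(text);
    std::string first;
    std::string second;
    iss >> first;
    iss >> second;
    return {first, second};
}
```

If the text holds fewer than two words, the corresponding strings are simply left empty.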

istream_iterator

std::istream_iterator is an iterator that can connect with an input stream.

It presents the regular interface of an input iterator (++, dereferencing), but its operator++ actually draws from the input stream.

istream_iterator is templated on the type it draws from the stream. We will use istream_iterator<std::string>, which draws a string from the stream and provides a string when dereferenced.

When the stream has nothing more to extract from its source it signals it to the iterator, and the iterator is flagged as finished.
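To make the iterator's behaviour concrete, here is a minimal sketch (collect_words is a hypothetical name) that walks an istream_iterator by hand until it compares equal to a default-constructed one, which represents the finished state:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> collect_words(const std::string& text) {
    std::istringstream iss(text);
    std::istream_iterator<std::string> it(iss);   // draws the first word
    std::istream_iterator<std::string> finished;  // default-constructed: "finished"

    std::vector<std::string> words;
    for (; it != finished; ++it) {  // each ++ draws the next word from the stream
        words.push_back(*it);       // dereferencing yields the last word drawn
    }
    return words;
}
```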


Solution 1.1

Now with the iterator interface we can use algorithms, and this really shows the flexibility of the design of the STL. To be able to use STL algorithms and range constructors (see Inserting several elements into an STL container efficiently), we need a begin and an end. The begin would be the iterator on an untouched istringstream on the string to split: std::istream_iterator<std::string>(iss). For the end, by convention, a default-constructed istream_iterator is flagged as finished: std::istream_iterator<std::string>().

Here is the resulting code:
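The original listing is not available here, so the following is a reconstruction based on the description above: the vector is built from a begin iterator on the stream and a default-constructed end iterator. The function name split_on_spaces is hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> split_on_spaces(const std::string& text) {
    std::istringstream iss(text);
    // The extra parentheses around the first argument keep this line from
    // being parsed as a function declaration.
    std::vector<std::string> results((std::istream_iterator<std::string>(iss)),
                                     std::istream_iterator<std::string>());
    return results;
}
```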

The extra parentheses around the first parameter are there to keep the line from being parsed as a function declaration: see the “most vexing parse” in Item 6 of Scott Meyers’ Effective STL.

As pointed out by Chris in the comments section, in C++11 we can use uniform initialization using braces to work around that vexing phenomenon:
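A sketch of that variant, assuming a C++11 compiler (split_on_spaces_braces is a hypothetical name). Only the iterators are brace-initialized; the vector itself still uses parentheses so that its iterator-range constructor is selected.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> split_on_spaces_braces(const std::string& text) {
    std::istringstream iss(text);
    // Braces cannot be read as a function declaration, so no extra
    // parentheses are needed around the first argument.
    std::vector<std::string> results(std::istream_iterator<std::string>{iss},
                                     std::istream_iterator<std::string>{});
    return results;
}
```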

Advantages:

  • uses standard components only,
  • works on any stream, not just strings.

Drawbacks:

  • it can’t split on anything other than whitespace, which can be an issue, for example when parsing a CSV,
  • it can be improved in terms of performance (but until profiling has shown that this is your bottleneck, this is not a real issue),
  • arguably a lot of code for just splitting a string!

Solution 1.2: Pimp my operator>>

(Solution 1.2 is useful to read in order to understand the reasoning that leads to Solution 1.3, but Solution 1.3 is more practical in the end)

The causes of two of the above drawbacks lie in the same place: the operator>> called by the istream_iterator that draws a string from the stream. This operator>> turns out to do a lot of things: stopping at the next space (which is what we wanted initially but cannot be customized), doing some formatting, reading and setting some flags, constructing objects, etc. And most of this we don’t need here.

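So we want to change the behaviour of the operator>> overload that extracts a std::string from a std::istream, conceptually std::istream& operator>>(std::istream& is, std::string& output).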


We can’t actually change this because it is in the standard library. We can overload it with another type though, but this type still needs to be kind of like a string.

So the need is to have a string disguised as another type. There are 2 solutions for this: inheriting from std::string, and wrapping a string with implicit conversion. Let’s choose inheritance here.

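Say we want to split a string by commas. To do so, we can define a type WordDelimitedByCommas that inherits publicly from std::string and adds nothing to it.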

Ok, I must admit that this point is controversial. Some would say: “std::string doesn’t have a virtual destructor, so you shouldn’t inherit from it!” and even, maybe, hypothetically, become a tiny little trifle emotional about this.

What I can say here is that the inheritance does not cause a problem in itself. Granted, a problem will occur if a pointer to WordDelimitedByCommas is deleted in the form of a pointer to std::string. Or with the slicing problem. But we’re not going to do this, as you’ll see when you read on. Now can we prevent someone from instantiating a WordDelimitedByCommas and coldly shooting the program in the foot with it? No we can’t. But is the risk worth taking? Let’s see the benefit and you’ll judge for yourself.

Now operator>> can be overloaded for this type, in order to perform only the operations we need: getting the characters until the next comma. This can be accomplished with std::getline, passing ',' as the delimiter.

(Returning the stream with return is; allows calls to operator>> to be chained.)

Now the initial code can be rewritten:
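The original listings are missing here, so this is a reconstruction under assumptions: the empty class deriving from std::string and the getline-based operator>> follow the description above, and split_by_commas is a hypothetical wrapper around the rewritten code.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// A string in disguise: same data, but a distinct type for overloading.
class WordDelimitedByCommas : public std::string {};

// Reads characters up to the next comma instead of the next whitespace.
std::istream& operator>>(std::istream& is, WordDelimitedByCommas& output) {
    std::getline(is, output, ',');
    return is;  // returning the stream allows chained calls
}

std::vector<std::string> split_by_commas(const std::string& text) {
    std::istringstream iss(text);
    // Each WordDelimitedByCommas is copied into the vector as a plain string.
    std::vector<std::string> results(
        (std::istream_iterator<WordDelimitedByCommas>(iss)),
        std::istream_iterator<WordDelimitedByCommas>());
    return results;
}
```

Because only commas delimit fields, spaces inside a field are preserved.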

This can be generalized to any delimiter by templating the WordDelimitedByCommas class:

Now to split with semicolon for instance:
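A sketch of the templated version, again reconstructed from the description rather than taken from the original listing. The names WordDelimitedBy and split_by_semicolons are assumptions.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// The delimiter is a template parameter, so it is fixed at compile time.
template <char delimiter>
class WordDelimitedBy : public std::string {};

template <char delimiter>
std::istream& operator>>(std::istream& is, WordDelimitedBy<delimiter>& output) {
    std::getline(is, output, delimiter);
    return is;
}

std::vector<std::string> split_by_semicolons(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> results(
        (std::istream_iterator<WordDelimitedBy<';'>>(iss)),
        std::istream_iterator<WordDelimitedBy<';'>>());
    return results;
}
```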

Advantages:

  • allows any delimiter specified at compile time,
  • works on any stream, not just strings,
  • faster than solution 1 (20 to 30% faster)

Drawbacks:

  • delimiter at compile-time
  • not standard, though easy to re-use,
  • still a lot of code for just splitting a string!

Solution 1.3: stepping away from the iterators

The main problem with solution 1.2 is that the delimiter has to be specified at compile-time. Indeed, we couldn’t pass the delimiter to std::getline through the iterators. So let’s refactor solution 1.2 to remove the layers of iterators:
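One way this refactoring might look, as a sketch based on the description (the function name split and its parameter names are assumptions):

```cpp
#include <sstream>
#include <string>
#include <vector>

// The delimiter is an ordinary argument, so it can be chosen at runtime.
std::vector<std::string> split(const std::string& text, char delimiter) {
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream tokenStream(text);
    // std::getline returns the stream; the loop ends when extraction fails.
    while (std::getline(tokenStream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Note that with this approach a trailing delimiter does not produce a final empty token, while an empty field between two delimiters is kept.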

Here we use another feature of std::getline: it returns the stream that was passed to it, and that stream is convertible to bool (or to void* before C++11). This boolean indicates whether no error has occurred (so true if no error has occurred, false if an error has occurred). And that error check includes whether or not the stream is at an end.


So the while loop will nicely stop when the end of the stream (and therefore of the string) has been reached.

Advantages:

  • very clear interface
  • works on any delimiter
  • the delimiter can be specified at runtime

Drawbacks:

  • not standard, though easy to re-use

Solution 2: Using boost::split

This solution is superior to the previous ones (unless you need it to work on any stream):

The third argument passed to boost::split is a function (or a function object) that determines whether a character is a delimiter. For example here, we use a lambda taking a char and returning whether this char is a space.

The implementation of boost::split is fairly simple: it essentially performs repeated find_if calls on the string to locate the delimiters, until reaching the end. Note that, contrary to the previous solution, boost::split will provide an empty string as the last element of results if the input string ends with a delimiter.

Advantages:

  • straightforward interface,
  • allows any delimiter, even several different ones
  • 60% faster than solution 1.1

Drawbacks:

  • needs access to boost
  • the interface does not output its results via its return type

Solution 3 (for the future): Using ranges

Even if they are not as widely available as standard or even boost components today, ranges are the future of the STL and should be widely available in a couple of years.

To get a glimpse of it, the range-v3 library of Eric Niebler offers a very nice interface for creating a split view of a string:

And it comes with several interesting features like, amongst others, using a substring as delimiter. Ranges should be included in C++20, so we can hope to be able to use this feature easily in a couple of years.

So, how do I split my string?

If you have access to boost, then by all means do Solution 2. Or you can consider rolling your own algorithm that, like boost, splits strings based on find_if.
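A home-made algorithm along those lines might be sketched as follows, using only the standard library. The name split_if is hypothetical, and this is not Boost's implementation. Like boost::split, it yields an empty last element when the input ends with a delimiter.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Splits `text` at every character for which `is_delimiter` returns true.
template <typename Predicate>
std::vector<std::string> split_if(const std::string& text,
                                  Predicate is_delimiter) {
    std::vector<std::string> parts;
    auto start = text.begin();
    while (true) {
        // Find the next delimiter, or the end of the string.
        auto stop = std::find_if(start, text.end(), is_delimiter);
        parts.emplace_back(start, stop);
        if (stop == text.end()) {
            break;
        }
        start = stop + 1;  // continue just after the delimiter
    }
    return parts;
}
```

Because the delimiter test is a predicate, several different delimiters can be handled at once, for example with a lambda that returns true for both ',' and ';'.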

If you don’t want to do this, you can do Solution 1.1, which is standard, unless you need a specific delimiter or profiling has shown that this is a bottleneck, in which case Solution 1.3 is for you.

And when you have access to ranges, Solution 3 should be the way to go.
