在看一个项目的代码时,发现了它用到了strtok 这个API, 它的作用是 split 一个字符串。这个功能在各个语言都很常见,而且接口也大同小异,比如 Golang:

1
2
3
s := strings.Split("a,b,c", ",")
fmt.Println(s)
// Output: [a b c]

Python:

1
2
3
4
5
txt = "hello, my name is Peter, I am 26 years old"

x = txt.split(", ")

print(x) 

strtok 的接口就比较奇怪了,看一个例子:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
Live Demo

#include <string.h>
#include <stdio.h>

int main () {
   char str[80] = "This is - www.tutorialspoint.com - website";
   const char s[2] = "-";
   char *token;
   
   /* get the first token */
   token = strtok(str, s);
   
   /* walk through other tokens */
   while( token != NULL ) {
      printf( " %s\n", token );
    
      token = strtok(NULL, s);
   }
   
   return(0);
}

输出为:

1
2
3
This is 
  www.tutorialspoint.com 
  website

也是同样的作用。但是它的实现上,第一次调用,看起来很正常,返回结果是第一个token.之后的循环调用,只用传分隔符,不断地拿剩下的 token。 man page 里的说明是:

The strtok() function is used to isolate sequential tokens in a null-terminated string, str. These tokens are separated in the string by at least one of the characters in sep. The first time that strtok() is called, str should be specified; subsequent calls, wishing to obtain further tokens from the same string, should pass a null pointer instead. The separator string, sep, must be supplied each time, and may change between calls.

这里是一个示例的 strtok 实现:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21

char *
strtok(s, delim)
    char *s;            /* string to search for tokens */
    const char *delim;  /* delimiting characters */
{
    static char *lasts;
    register int ch;

    if (s == 0)
	s = lasts;
    do {
	if ((ch = *s++) == '\0')
	    return 0;
    } while (strchr(delim, ch));
    --s;
    lasts = s + strcspn(s, delim);
    if (*lasts != 0)
	*lasts++ = 0;
    return s;
}

它在内部用了一个 static 变量来记录上一次的信息,所以也是不可重入的,也不是线程安全的。这函数应该是比较老了,当时线程的library可能还不存在,可重入性也没那么重要。 另外,它也修改了传入的 str。所以有些地方建议在token之前,copy一份数据再调用strtok:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
char *token;
const char *path = getenv("PATH");
/* PATH is something like "/usr/bin:/bin:/usr/sbin:/sbin" */
 
char *copy = (char *)malloc(strlen(path) + 1);
if (copy == NULL) {
  /* Handle error */
}
strcpy(copy, path);
token = strtok(copy, ":");
puts(token);
 
while (token = strtok(0, ":")) {
  puts(token);
}
 
free(copy);
copy = NULL;
 
printf("PATH: %s\n", path);
/* PATH is still "/usr/bin:/bin:/usr/sbin:/sbin" */

所以有些地方也建议不要用这个函数:

Never use this function. This function modifies its first argument. The identity of the delimiting character is lost. This function cannot be used on constant strings.

(The Linux Programmer’s Manual (man) page on strtok(3))

只能感叹说 C/C++的很多 library 设计真是太古老了。